1. Introduction
Iron is one of the most globally demanded metals, being the main component for steel manufacturing. This essential alloy forms the backbone of modern global infrastructure, from skyscrapers and pipelines to household appliances. Brazil stands out as the world’s second-largest producer of iron ore, with the states of Minas Gerais and Pará accounting for 98% of national production [1]. In 2024, Minas Gerais produced 238.92 million tons and Pará produced 167.98 million tons, representing 64.35% and 34.5% of the national total, respectively [2].
The state of Bahia is the third-largest mineral producer in Brazil, with a strong emphasis on chromium, gold, copper, and nickel [2]. However, despite possessing estimated reserves of at least 12.5 billion tons of iron ore, the state contributes only a small fraction to the country’s iron output. In 2024, Bahia’s production was approximately 1.2 million tons, or 0.016% of the national total [3]. The primary constraint on growth is the lack of an effective logistical infrastructure to facilitate exports.
The state is working to overcome its past export limitations by developing key infrastructure such as the FIOL railway, which will link Bahia’s largest iron mine, Pedra de Ferro Mine, to Porto Sul, the deep-water port in the southeast of the state. However, numerous other potential iron reserves within the state, documented in geological surveys for over 80 years, remain unsubstantiated because comprehensive studies to delineate and quantify them are lacking. This discourages mining companies and, consequently, the government from investing in these regions.
In this context, remote sensing emerges as an essential tool for mineral prospectivity mapping (MPM). Satellite platforms such as Landsat, ASTER, and Sentinel-2 offer a consistent, large-scale, and accessible data source for applying geological knowledge to the identification of surface indicators of mineralization, such as lithological units, structural features, and hydrothermal alteration zones [4].
Combining these geoscience datasets with Artificial Intelligence (AI) methods, such as Machine Learning (ML) algorithms, allows a data-driven approach that can assist in the identification of complex, non-linear relationships between geological evidence and mineralization. It also systematically reduces the exploration search space and increases the effectiveness of discovering new profitable mineral deposits while lowering costs by redirecting prospecting field work.
Comparative studies [5,6,7] suggest that Random Forest (RF), one of the ML methods that can be used in MPM, is particularly well-suited for geological applications because of its overall performance, feature importance measures, and robustness against noise and outliers, and because it is less prone to overfitting than other ML algorithms such as Support Vector Machines (SVM). However, it performs poorly with the limited and often imbalanced training datasets typical of mineral exploration. A hybrid approach may be used to overcome this.
Geological mapping has long employed the Spectral Angle Mapper (SAM) to identify mineralized zones of interest by comparing training data with a spectral reference library to determine which samples most closely resemble the target mineral’s spectrum [8,9,10,11]. This strategy can help define the training set of iron-rich positive samples and act as a class weight to mitigate the RF’s bias towards the majority class in imbalanced datasets. It also reduces the likelihood of a “false negative” error, in which a prospective area is labeled as non-mineralized.
This study aims to assess a simple and accessible workflow based on remote sensing data for large-scale, low-cost, and rapid regional MPM. By design, this approach is intended to be a first-pass, reconnaissance-level tool that is adequate for quickly evaluating large-scale areas with a minimum of data available, as many high-performing models rely on the integration of multiple, often expensive and heterogeneous, datasets, including geophysical, geochemical, and drilling data. This can be a significant barrier in the early stages of exploration or in regions where such data is not readily available.
3. Materials and Methods
The methodological framework of this study was designed as an end-to-end workflow, from satellite data acquisition to the generation and validation of the final MPM. The software used includes Google Earth Engine (GEE; Google LLC, Mountain View, CA, USA) for image acquisition, sample collection, model generation by Random Forest (RF), and validation; ENVI 5.3 (L3Harris Geospatial, Broomfield, CO, USA) for Spectral Angle Mapper (SAM) analysis and preparation of the training dataset; and QGIS 3.4 (QGIS Development Team, Hannover, Germany) for map production, validation, and additional spatial analysis. The workflow is summarized in a flow diagram (Figure 2).
3.1. Sensor Selection and Data Acquisition
The sole data product used was imagery from the Sentinel-2 multispectral sensor. The choice of this sensor was strategic for several reasons. Technically, the Sentinel-2 mission offers a distinct advantage for mapping iron-rich minerals. It includes several narrow bands in the visible and near-infrared (VNIR), specifically bands 5, 6, 7, 8, and 8A, covering the wavelength range of 0.7 to 0.9 µm, a crucial spectral region where iron-bearing minerals such as goethite, jarosite, and hematite exhibit diagnostic absorption features [15,16,17]. In contrast, other multispectral sensors, such as Landsat, offer only one or two bands in this region, limiting spectral detail [18,19,20]. From a practical standpoint, Sentinel-2 data is freely accessible, has wide coverage, and offers a high revisit rate (approximately 10 days). These features give it a practical edge over other sensors, like ASTER, which, despite being well suited for geological mapping, does not have a high revisit rate, making it impractical for large-scale regions such as the state of Bahia.
The specific data product used was the Harmonized Sentinel-2 Level-2A Surface Reflectance (SR) collection, accessed via the GEE data catalog. These products include atmospheric correction with the Sen2Cor method [21] and bi-directional reflectance distribution function (BRDF) normalization within the GEE cloud environment.
For the image selection, we focused on selecting images that maximize cloud-free coverage. There are three distinct biomes in the state of Bahia, each with its own rainfall dynamics. Therefore, it was necessary to expand the time window for selecting images.
The image collection was filtered to cover the entire study area and a time interval from 1 January 2020 to 31 December 2022. This multi-year window helped reduce areas that were obscured by cloud presence. A cloud-masking function was applied to the collection. This function utilized the Sentinel-2 QA60 quality assessment band to identify and remove pixels contaminated by opaque clouds or cirrus clouds, which would otherwise introduce significant noise into the spectral data.
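The QA60 test described above can be illustrated with a minimal sketch. It assumes the standard QA60 bit layout (bit 10 for opaque clouds, bit 11 for cirrus) and uses plain Python lists rather than the GEE API; the function names and sample values are hypothetical and are not taken from the study's scripts.

```python
# Bit positions in the Sentinel-2 QA60 band:
# bit 10 flags opaque clouds, bit 11 flags cirrus clouds.
OPAQUE_CLOUD_BIT = 1 << 10
CIRRUS_BIT = 1 << 11


def is_clear(qa60_value: int) -> bool:
    """Return True when neither the opaque-cloud nor the cirrus bit is set."""
    return (qa60_value & OPAQUE_CLOUD_BIT) == 0 and (qa60_value & CIRRUS_BIT) == 0


def mask_pixels(reflectance, qa60):
    """Replace cloud-contaminated reflectance values with None (masked)."""
    return [r if is_clear(q) else None for r, q in zip(reflectance, qa60)]


# Hypothetical values for four pixels of a single band
reflectance = [0.21, 0.65, 0.19, 0.40]
qa60 = [0, 1024, 0, 2048]  # second pixel: opaque cloud, fourth pixel: cirrus
masked = mask_pixels(reflectance, qa60)
```

Masked pixels are kept as `None` so that a later compositing step can skip them instead of averaging cloud reflectance into the mosaic.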
The filtered and masked image collection was subjected to a median reducer to produce a single, cloud-free mosaic image. This statistical method determines the median value for each pixel across all available images in the time series. This effectively helped to mitigate the effects of seasonal variations in vegetation and soil moisture.
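As a rough illustration of the median reducer, the sketch below computes a per-pixel median over a small stack of acquisitions while ignoring masked observations. It is a simplified stand-in for the GEE reducer, written in plain Python; the data layout (one flat list per image, `None` for masked pixels) and the sample values are assumptions made for the example.

```python
from statistics import median


def median_composite(time_series):
    """Per-pixel median over a stack of images, ignoring masked (None) values.

    time_series: list of images, each a flat list of pixel values for one band.
    Pixels masked in every image stay None in the composite.
    """
    composite = []
    for pixel_values in zip(*time_series):
        valid = [v for v in pixel_values if v is not None]
        composite.append(median(valid) if valid else None)
    return composite


# Three hypothetical acquisitions of the same three pixels
stack = [
    [0.20, None, 0.31],
    [0.22, 0.18, None],
    [0.60, 0.19, None],  # 0.60 mimics an undetected bright outlier
]
composite = median_composite(stack)
```

In the first pixel the outlier value 0.60 does not affect the result, which is the property that makes the median attractive for multi-year mosaics.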
3.2. Sampling Design
The sampling strategy involved selecting two reference areas with proven mineralization, evidenced by the presence of active mining operations. The Pedra de Ferro Mine, located near the city of Caetité, in southwestern Bahia, and the Mocó Mine, located further towards the central part of the state, were chosen as reference areas (Figure 3), representing high-grade and low-grade iron ore deposits, respectively. Samples were selected within each mine pit and in its surroundings, inside buffer zones drawn around the perimeter of each mine at distances of 100 m, 200 m, 500 m, and 10 km.
3.2.1. Ore Sampling Strata
The “ore” dataset represents the target class of iron mineralization. It corresponds to the coordinates where the ore samples, later used to build the reference library, were collected in the field in both mines.
3.2.2. Ancillary Classes Sampling Strata
These samples were selected to characterize the spectral background and the potential “halo” effect of mineralization on the surrounding environment. They included “soil” for non-vegetated areas, “vegetation” for vegetated areas, and “mixed soil-vegetation” for shrubby and sparsely vegetated areas. A set of randomly selected points was created within each stratum, defined as the region between two consecutive buffer limits. The number of points for each class in the respective strata was deliberately balanced, with validation data included. The complete sampling scheme is detailed in Table 1.
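One way to picture the stratified design is the sketch below, which assigns candidate points to ring-shaped strata by their distance from the mine perimeter and then draws an equal number of points per stratum. The buffer distances come from the text; the stratum labels, function names, and the assumption that a distance to the perimeter is already available for each point are illustrative only.

```python
from bisect import bisect_left
import random

BUFFER_LIMITS_M = [100, 200, 500, 10_000]  # buffer distances stated in the text
STRATA = ["0-100 m", "100-200 m", "200-500 m", "500 m-10 km"]  # illustrative labels


def stratum_for(distance_m: float):
    """Map a distance from the mine perimeter to its ring-shaped stratum."""
    if distance_m <= 0:
        return "mine"  # inside the pit outline
    idx = bisect_left(BUFFER_LIMITS_M, distance_m)
    return STRATA[idx] if idx < len(STRATA) else None  # None: beyond the outer buffer


def balanced_sample(points, per_stratum, seed=0):
    """Draw up to `per_stratum` random points from every stratum.

    points: list of (point_id, distance_m) tuples.
    """
    rng = random.Random(seed)
    groups = {}
    for point_id, dist in points:
        label = stratum_for(dist)
        if label is not None:
            groups.setdefault(label, []).append(point_id)
    return {
        label: rng.sample(ids, min(per_stratum, len(ids)))
        for label, ids in groups.items()
    }
```

For example, `balanced_sample([(1, 50), (2, 80), (3, 150)], per_stratum=1)` would return one point for the 0-100 m ring and one for the 100-200 m ring.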
3.3. Spectral Libraries and Analysis
To establish the reference signatures for iron ore, samples collected in the field were measured in the laboratory using a FieldSpec® 4 Hi-Res spectroradiometer (Malvern Panalytical Ltd., Boulder, CO, USA). The measurement results were then converted to reflectance units compatible with those of the Sentinel-2 image to build the reference library. Meanwhile, the input data included the set of selected points exported to ENVI, where they were defined as regions of interest (ROIs), and two scenes from the Sentinel-2 image, cropped from the mosaic, containing the buffer zones of the two mines, defined as test areas.
For this study, the ENVI Spectral Hourglass Workflow [22] was used. It includes the Minimum Noise Fraction (MNF) transform, which reduces dimensionality and noise by identifying which bands contribute most to the analysis, and the Pixel Purity Index (PPI), which automatically selects the spectrally purest pixels (endmembers) corresponding to the classes defined in the initial collection.
Class-specific SAM angle thresholds were adopted to balance false positives in spectrally similar materials and false negatives under spectral mixing. Lower thresholds were applied to enforce stricter spectral conformity to the laboratory reference, whereas higher thresholds were used to accommodate greater intra-class variability. Accordingly, three thresholds were established: 0.10 radians for hematite, to minimize spurious matches with iron oxides under lateritic coatings; 0.12 radians for itabirite, to reflect its mixed quartz–iron spectral character; and 0.15 radians for soil and vegetation, to tolerate illumination/BRDF effects and VNIR–SWIR mixing in partially vegetated or thin-soil surfaces. These values are consistent with typical operational SAM thresholds of 0.08 to 0.15 radians reported in recent mineral mapping studies [23,24,25]. The thresholds were empirically tuned using held-out strata samples. Angles were scanned in 0.02-rad increments, and the triplet that maximized macro-F1 while limiting ore-class false positives in lateritic and urban areas was selected (see Section 3.5). By applying this class-specific scheme, misclassification of non-mineralized backgrounds was reduced while sensitivity to iron-bearing targets was preserved.
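The class-specific thresholding can be sketched as follows. The spectral angle is the arccosine of the normalized dot product between a pixel spectrum and a reference spectrum, and a match is accepted only if the angle falls within the threshold of that class. The threshold values are those given in the text; the function names and the dictionary-based library layout are assumptions for illustration and do not reproduce the ENVI implementation.

```python
import math

# Class-specific angle thresholds in radians, as stated in the text
THRESHOLDS = {"hematite": 0.10, "itabirite": 0.12, "soil": 0.15, "vegetation": 0.15}


def spectral_angle(pixel, reference):
    """Angle (radians) between two spectra treated as vectors in band space."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos
    cos_theta = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cos_theta)


def classify(pixel, library, thresholds=THRESHOLDS):
    """Return (class, angle) for the closest reference within its own threshold.

    Returns (None, inf) when no reference satisfies its class threshold.
    """
    best_class, best_angle = None, math.inf
    for name, reference in library.items():
        angle = spectral_angle(pixel, reference)
        if angle <= thresholds[name] and angle < best_angle:
            best_class, best_angle = name, angle
    return best_class, best_angle
```

Because the angle depends only on the direction of the spectrum, a pixel that is a uniformly brighter or darker copy of a reference yields an angle close to zero, which is why SAM is relatively tolerant of illumination differences.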
The training dataset was created by reclassifying the samples into seven new classes based on their mean similarity scores to related minerals and the level of iron content present (Table 2). Classes with Fe content were further restricted to geological units consistent with Fe-hosting environments, identified from state-level geological maps as promising for iron mineralization. These units primarily included the Archean–Proterozoic greenstone belts and lateritic covers, where Fe grades are higher. The negative Fe content class consisted of samples from any previous class or stratum that scored lower than 70% in the SAM analysis.
3.4. Machine Learning Classification
For the classification of the entire state, the RF supervised machine learning algorithm was used [26,27,28]. RF is an ensemble method that builds multiple decision trees during training to improve accuracy and control overfitting. The main hyperparameters of the model include the number of trees, the splitting criterion, and the number of features per split (Table 3).
The number of trees hyperparameter (n_estimators) defines the number of decision trees in the forest. A larger number generally leads to a more stable and robust model, with a value chosen where the classification error stabilizes.
Splitting criterion (criterion) is the function for measuring the quality of a split. The Gini index was used, which measures the impurity of a node and seeks to maximize the purity of the child nodes in each split.
The number of features per split (max_features) is the number of features (spectral bands) to be considered when searching for the best split. This parameter introduces randomness, which helps reduce the correlation between trees and improves the model’s generalization ability.
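To make the splitting criterion concrete, the sketch below computes the Gini impurity of a node and the weighted impurity of a candidate split on one spectral band. It is a didactic illustration of the criterion only, not the RF implementation used in the study; the function names and the toy reflectance values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted Gini impurity of the child nodes for the rule value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical reflectance values of one band for four training samples
band = [0.10, 0.20, 0.80, 0.90]
classes = ["soil", "soil", "ore", "ore"]
clean_split = split_gini(band, classes, 0.5)   # separates the classes perfectly
poor_split = split_gini(band, classes, 0.15)   # leaves a mixed right-hand node
```

A tree grows by choosing, among the randomly drawn candidate features, the threshold with the lowest weighted impurity; here the split at 0.5 would be preferred over the split at 0.15.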
The model was trained with the collected samples and then used to classify each pixel in the state of Bahia. The results were reclassified into potentiality levels (high, medium, low) based on their spectral similarity to the reference ore in the spectral library.
3.5. Validation
The resulting MPM was assessed both qualitatively and quantitatively. Qualitative validation involved a visual comparison with mining concession and research request data from the National Mining Agency (ANM), and with known geology occurrences from the Geological Survey of Brazil (CPRM-SGB).
To assess the model’s performance, an accuracy test set comprising 30% (175 points) of the strata samples was used to generate a confusion matrix. The metrics employed were the overall accuracy, the per-class F1-score, and Cohen’s Kappa coefficient (κ).
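The three metrics can all be derived from the confusion matrix, as the following sketch shows. It assumes rows hold the true classes and columns the predicted classes; the function name and the small two-class matrix in the usage note are illustrative and unrelated to the study's results.

```python
def metrics(cm):
    """Overall accuracy, Cohen's kappa, and F1-scores from a confusion matrix.

    cm[i][j]: number of samples of true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]

    observed = sum(cm[i][i] for i in range(k)) / total  # overall accuracy
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    kappa = (observed - expected) / (1 - expected)  # undefined if expected == 1

    f1 = []
    for i in range(k):
        # F1 = 2TP / (2TP + FP + FN) = 2TP / (row total + column total)
        denom = row_totals[i] + col_totals[i]
        f1.append(2 * cm[i][i] / denom if denom else 0.0)

    return {"accuracy": observed, "kappa": kappa, "f1": f1, "macro_f1": sum(f1) / k}
```

For a toy matrix `[[8, 2], [1, 9]]`, the observed agreement is 0.85 and the chance agreement is 0.5, so kappa is 0.7.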
Robustness was verified by varying the SAM thresholds by ±0.02 radians. Performance trends remained stable, and the selected values provided the best precision–recall balance for the ore class.
4. Results
The initial spectral analysis focused on evaluating how the spectral signatures of the sampled classes vary in each stratum. The spectral curves (Figure 4) revealed that pixels sampled from soil, vegetation, and soil/vegetation exhibit distinct absorption features in the wavelength interval between 0.6 and 0.9 μm. This indicates the presence of iron oxides in these targets, even when covered by soil or vegetation.
The lines show the similarity between the spectral curve of soil samples in the mine stratum and the ore signature. As distance increases, a change in the reflection pattern in the SWIR band is observed, with increasing reflectance values in the strata furthest from the mine areas.
The signatures of soil and soil/vegetation collected near the mines displayed similar spectral behaviors to the ore spectrum, indicating a “halo” effect where iron-rich material is present in the surrounding surface cover. As the distance from the mines increased, the spectral similarity to the ore signature decreased, and the influence of vegetation, characterized by high reflectance in the NIR, and mineralogically distinct soils became more dominant.
Further spectral analysis was conducted to assess the spectral signatures of the sample classes and understand their separability.
Figure 5 displays the mean spectral reflectance signatures of the pixels sampled from ore, soil, vegetation, and mixed soil–vegetation. The soil signatures are characterized by a generally increasing reflectance from the visible to the SWIR region, a typical feature of mineral soils. The curves exhibit a significant absorption feature around 2.20 µm, which is commonly associated with the presence of hydroxyl groups in clay minerals. Additionally, they feature a peak reflectance in the NIR region (~0.85 µm), which is highly diagnostic for the presence of iron oxides.
The vegetation signatures that are closest to the mines exhibit a sharp increase in reflectance between 0.70 µm and 0.80 µm in the Red Edge region, indicating the presence of iron oxides. The signatures of mixed soil–vegetation display a less pronounced curve in the Red Edge region and a lower reflectance value in the NIR region compared to the pure vegetation class, while also showing higher reflectance in the visible spectrum than pure vegetation. This behavior reflects the linear or non-linear mixing of spectral signals from multiple components within a given area.
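The linear case of spectral mixing can be illustrated with a short sketch in which the mixed-pixel spectrum is the area-fraction-weighted sum of its components. The reflectance values below are hypothetical and chosen only to mimic the qualitative behavior described above; non-linear mixing is not modeled.

```python
def linear_mix(spectra, fractions):
    """Fraction-weighted sum of endmember spectra (fractions must sum to 1)."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n_bands = len(spectra[0])
    return [sum(f * s[b] for f, s in zip(fractions, spectra)) for b in range(n_bands)]


# Hypothetical reflectances in (red, red edge, NIR), for illustration only
soil = [0.20, 0.25, 0.30]
vegetation = [0.04, 0.20, 0.50]
mixed = linear_mix([soil, vegetation], [0.5, 0.5])
```

With equal fractions, the mixed spectrum has a higher red reflectance and a lower NIR reflectance than pure vegetation, matching the pattern reported for the mixed soil–vegetation class.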
Boxplot graphs were also generated to assess the dispersion behavior of the data collected in each band of the Sentinel-2 image (Figure 6).
The graph shows the behavior of the data in each region. In the strata closest to the mines, the ore data show great variability in the Red to NIR regions. The soil data show greater variability in the Red and SWIR regions. The mixed soil–vegetation class shows greater variability in the SWIR region, and the vegetation data in the Red-edge and NIR regions.
The routine for reducing data noise using the MNF function calculated the eigenvalues of the Sentinel-2 image, ordering its information and noise content. The result of this operation is expressed in the form of a table, where the calculated eigenvalues are classified in descending order, with the values closest to zero representing noise. By applying the components corresponding to the information content generated in the image, it is possible to obtain a color composition where the tonal variations of the different types of elements can be identified according to the themes of the samples. The SAM technique allowed the mapping of the similarity between the spectrum of an image pixel and the reference spectrum from the spectral libraries. This resulted in indices (endmembers) corresponding to the reference materials that spectrally predominate in the pixel showing the order of similarity with the sample type from the reference library for each stratum class.
Figure 7 summarizes the results according to class, percentage of similarity, and collection stratum.
The results of processing with the RF model show areas with potential for iron deposits distributed throughout the state (Figure 8).
The analysis located areas of medium potential between the cities of Caetité and Licínio de Almeida, belonging to the Southwest District of Bahia. These bodies are aligned in a north–northeast to south–southwest trend, approximately 150 km long. Also in this region, it is possible to note two branches of low-potential areas, heading north–northwest. South of Brumado, clusters of areas can be observed, with a predominance of low potential, which, as they extend north–northwest, passing through Piatã and ending in Boninal, increase their iron potential.
In the Southeast District of Bahia, a north–south strip with medium potential is observed. It extends from north of Jequié and diverges near Iguaí, northeast–southwest towards Vitória da Conquista and north–northwest to south–southeast to Itapebi. It is possible to note, in this region, the occurrence of two more trends parallel to that of Jequié-Iguaí.
Between the cities of Nazaré, Coração de Maria, and Mundo Novo, lenticular-shaped areas extend from north–northwest to south–southeast. These zones, belonging to the Recôncavo District, were mapped with levels ranging from low to medium potential and branch out towards the cities of Queimadas and Jaguarari, further north in the state.
In the north of the state, the analysis mapped the strip corresponding to the Northern Iron District, located near the cities of Remanso and Sento Sé. This area extends northeast–southwest to the region near Santa Rita de Cássia, with a gradual increase in the level of mineral potential.
In Figure 9, the MPM was overlaid with the district boundaries suggested by Ribeiro [14], ANM mining concession areas, as well as points referring to mineral occurrence locations. In general, the processing was able to map areas within and beyond the known districts, as observed on the map between the cities of Caetité and Jequié, an intermediate portion between the Southwest and Southeast districts. Other regions outside the districts include a discontinuous zone north of the Recôncavo District, near Macururé, and a strip extending from Sobradinho to Umburanas.
Visually, the MPM showed high consistency with ground reality. Operating iron mine areas, including non-sampled mines such as the Jacuípe mine (Coração de Maria) and the Tombador mine (Sento Sé), were correctly classified with the highest potential, as their respective mineral deposits are associated with iron formations.
In addition, traces of iron were found around mines that produce other types of ore, such as the Caraíba mine (copper), Ipueira mine (chromium), and Vanádio de Maracás mine (vanadium) (Figure 10). These signals probably correspond to host rocks that are mostly rich in iron. Although these sites are not considered iron deposits, iron is commonly produced as a secondary substance in these mines.
Figure 11 shows, in detail, the location of iron occurrences in relation to the prospectivity results. Although certain portions of the mining concession areas do not yet have an open pit, they are sites where trenches and trails have already been opened as a result of extensive prospecting. On the map, the locations of known Fe occurrences coincide with medium–high potential values in the north district and the iron-bearing district in southwest Bahia. The model also highlighted potential areas of interest where no Fe mineralization has yet been found.
A heat map was created to represent the trend in demand for areas requested for research (Figure 12). From 1992 to 2025, 3491 areas were requested for iron research with the ANM. Of these, 1853 (53.07%) are located in regions with moderate to high prospectivity potential for iron ore. It is evident that there is a pattern related to the access to information over the years, since most research requests are made by small prospectors or local mining companies. In the first period (1992–2009), applications were more scattered throughout the state, while from 2010 onwards, there has been an increase in applications near areas where large mining corporations operate. It is also noteworthy that from 2020 to 2025, there is a trend toward applications in the central and southeastern parts of the state.
The performance metrics from the accuracy assessment are summarized in Table 4, and Figure 13 displays the confusion matrix. The model achieved an overall accuracy of 69.8%. Cohen’s Kappa coefficient was 0.623, a value in the range commonly interpreted as substantial agreement (0.61–0.80 on the Landis and Koch scale) between the model’s predictions and the ground truth. The macro-averaged F1-score, which provides a balanced measure of precision and recall across all classes, was 0.697. An analysis of the per-class metrics shows that the model performed best at identifying the “Soil” class (F1-score = 0.750) and struggled most with the mixed soil–vegetation class (F1-score = 0.638), likely due to the high degree of spectral mixing in this category. The F1-score for the “Ore” class was 0.686.
5. Discussion
The integrated SAM–RF framework has shown consistency between the modeled potentiality patterns and the geological architecture of Bahia. The high- and medium-potential zones identified by the model align spatially with units known for hosting iron ore, particularly within the metavolcano–sedimentary sequences of the Licínio de Almeida Complex and the greenstone belts of the Santaluz and Rio Itapicuru complexes. These findings suggest that the model effectively captures the lithological and alteration patterns inherent to Banded Iron Formations (BIFs) and related structures. Additionally, the model has demonstrated the ability to differentiate varying levels of iron enrichment, from exposed ore to subtle anomalies beneath vegetation or lateritic covers, thereby confirming its capacity to detect both direct and indirect spectral indicators of iron mineralization.
The validation metrics, including an overall accuracy of 69.8%, a macro-averaged F1-score of 0.697, and Cohen’s Kappa of 0.623, offer quantitative evidence of the model’s reliability. These results are comparable to, or slightly exceed, those found in similar studies utilizing RF models for MPM in data-scarce environments, such as the findings of Lachaud et al. [5] (accuracy 61%–70%) and Kong [28] (accuracy 66%). This underscores that the combination of SAM with RF produces competitive outcomes relative to global benchmarks, even when faced with limited training data. The similarity weights derived from SAM effectively reduced class imbalances and addressed the RF’s tendency to favor the dominant non-mineralized background, a common issue in mineral exploration. As a result, the model achieved a more favorable balance between sensitivity and specificity compared to the single classifiers typically employed in similar studies, such as those by Saremi et al. [29], Gong et al. [30], Hronsky and Groves [31], and Rajesh [32].
A critical aspect of the model’s performance relates to false positives, which were primarily observed in regions dominated by lateritic soils, ferruginous crusts, and certain urbanized areas. These features exhibit strong reflectance in the 0.7–0.9 µm range, the same spectral region as the diagnostic absorption features of hematite and goethite, resulting in confusion between iron-bearing lithologies and surficial iron oxides. Although these occurrences slightly reduced the model’s precision, they highlight the spectral complexity of tropical regolith environments, where chemical weathering and surface coatings alter the reflectance properties of rocks. This limitation does not invalidate the model’s predictive capacity but rather identifies zones of spectral ambiguity that warrant further geophysical or geochemical validation. Similar challenges have been reported in iron MPM studies in Australia and Africa, where regolith and lateritic cover severely complicate spectral interpretation [14,16,22,23,24].
To address this issue, the proposed SAM–RF integration introduces a semi-physically constrained data weighting scheme, which improves class separability compared to conventional data-driven RF or SVM models. The SAM component enhances the physical interpretability of spectral relationships by identifying endmember minerals that typify iron oxide alteration zones [8,9,10,11]. At the same time, RF provides nonlinear decision boundaries that effectively generalize to unseen spatial contexts. This dual mechanism represents an incremental methodological advance in MPM because it merges spectral physics-based classification with statistical learning, reducing overfitting while improving robustness in spectrally complex terrains. The superior performance compared to SVM-based approaches, which are commonly reported to exhibit instability under small or imbalanced datasets [5,19,27], supports findings from Yu and Li [18] and Riquelme [26], who emphasized the enhanced stability and interpretability of ensemble models for geological applications.
From a comparative perspective, the results achieved in Bahia are consistent with those reported in well-studied provinces such as the Yilgarn Craton (Western Australia), where Duuring et al. [33] used ASTER data to map BIFs with approximately 70% accuracy. However, unlike those semi-arid regions, Bahia presents a dense vegetation cover and extensive weathering crust, both of which degrade the spectral signal. Achieving comparable accuracy under such conditions underscores the robustness of the proposed approach for tropical environments. Similarly, research in Algeria, Mauritania, and Morocco [14,16] demonstrated the potential of multispectral data for regional-scale iron exploration but also highlighted the difficulty in distinguishing ore from lateritic soils. The present model mitigated this issue through the SAM weighting strategy, which prioritized spectral endmembers closely aligned with laboratory-calibrated ore signatures. Comparable to Iranian studies [17,19,34], this work reinforces the need for future integration of geophysical datasets (magnetometry and gravimetry) to reduce misclassifications and confirm subsurface mineral continuity.
In general terms, three main challenges emerged: (i) managing spectral confusion in areas of strong lateritization, (ii) ensuring representativeness of training samples under data scarcity, and (iii) validating results in the absence of dense ground-truth datasets. The use of Sentinel-2’s narrow VNIR bands (0.7–0.9 µm) and multi-strata sampling strategy—covering ore, soil, soil–vegetation mixtures, and vegetation—proved critical to minimize these effects. The MNF transformation further enhanced feature selection by reducing noise and emphasizing spectral components relevant to iron mineralization. Future model refinements should focus on fusing multisource datasets, such as magnetometric and gravimetric layers, to improve the discrimination between lithological and pedological iron signatures. Additionally, exploring Explainable AI (XAI) tools can elucidate the model’s decision process and link it more explicitly to geological reasoning.
This study contributes to the ongoing evolution of MPM. The field is shifting from conventional statistical models to hybrid and deep learning architectures capable of capturing nonlinear, multiscale relationships between geology and mineralization [27,28,29,30,35,36]. However, geological processes are inherently non-stationary, and the relationships among ore-controlling factors vary significantly across regions [36]. In this sense, the SAM–RF hybrid approach offers a pragmatic and scalable framework for regional-scale reconnaissance, capable of providing meaningful predictions even when data are sparse or heterogeneous. Such adaptability positions the method as a potential foundation for developing next-generation geospatial models that integrate physical spectral knowledge with data-driven generalization.
The SAM–RF integrated workflow provided a validated, cost-effective, and interpretable approach for large-scale iron MPM in Bahia. Its accuracy metrics are on par with or surpass those of comparable studies, while its methodological structure introduces an innovative fusion of physics-based and statistical learning techniques. The main limitations relate to the spectral confusion caused by lateritic covers and the absence of geophysical validation layers; however, these also define clear directions for future research. Expanding the approach through data fusion, transfer learning, and deep neural models could further enhance predictive precision and contribute to a more comprehensive understanding of mineralization processes in tropical metallogenic provinces.
6. Conclusions
The work demonstrated the potential of using remote sensing and machine learning techniques, processed in the cloud, to generate large-scale mineral potentiality maps. The combination of SAM and RF classifiers was effective in characterizing different targets based on their spectral signatures, resulting in a map that proved to be consistent with known deposits and that points out new promising areas for exploration.
The Bahia mineral exploration study is a real-world illustration of an international scientific undertaking. The growing availability of multi-source geodata and the strength of artificial intelligence are driving the rapid evolution of the MPM field. A global drive to develop more precise, reliable, and interpretable prediction tools is reflected in the achievements and difficulties observed in Bahia.
The future of MPM depends on tackling its core issues comprehensively rather than merely implementing increasingly sophisticated algorithms. This includes using intelligent sampling and augmentation to manage sparse and unbalanced data, using explainable AI to allow the model’s reasoning to be compared with established geological knowledge and developing models that take into account the spatial heterogeneity of geological systems.
As future prospects and next steps, the integration of geophysical data, specifically magnetometry and gravimetry, can enhance the model by offering essential physical constraints. This approach distinguishes targets that are only spectrally similar, such as laterite cover, from those that exhibit both spectral and magnetic anomalies, indicating potential iron ore. Therefore, it provides insights into subsurface structures that optical sensors do not capture. Similarly, the use of hyperspectral data, when available, would allow for more detailed mineralogical discrimination. It would also be useful to explore deep learning algorithms, such as Convolutional Neural Networks (CNNs), which utilize stacked kernels, such as 3 × 3 convolutions, for hierarchical feature extraction, enabling them to model the spatially dependent, multiscale lithology-structure associations inherent to mineral systems that pixel-based classifiers, such as RF, are unable to capture. It may enhance spatial pattern recognition and could further refine the results.
It is recommended that the products generated by this work serve as guidelines to direct future mineral research campaigns, which should include detailed geological field studies, such as drilling and rock sampling, to confirm the new areas suggested by the model. The evaluation method presented here, being relatively fast and low-cost, is a critical tool for mineral modeling and exploration in a state with the vast geological potential of Bahia.