Article

Influences of Sampling Design and Model Selection on Predictions of Chemical Compounds in Petroferric Formations in the Brazilian Amazon

by Niriele Bruno Rodrigues 1, Theresa Rocco Barbosa 1, Helena Saraiva Koenow Pinheiro 1, Marcelo Mancini 2, Quentin D. Read 3, Joshua Blackstock 4, Edwin H. Winzeler 5, David Miller 6, Phillip R. Owens 4 and Zamir Libohova 4,*

1 Soil Department, Federal Rural University of Rio de Janeiro, Seropédica, Rio de Janeiro 23897-970, Brazil
2 Department of Soil Science, Federal University of Lavras, Lavras 37200-900, Brazil
3 United States Department of Agriculture, Agriculture Research Service, Southeast Area, Stoneville, MS 38776, USA
4 United States Department of Agriculture, Agriculture Research Service, Dale Bumpers Small Farms Research Center, Booneville, AR 72927, USA
5 Department of Mathematics, The University of Texas at Arlington, Arlington, TX 76019, USA
6 Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR 72701, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(9), 1644; https://doi.org/10.3390/rs17091644
Submission received: 1 February 2025 / Revised: 23 April 2025 / Accepted: 27 April 2025 / Published: 6 May 2025
(This article belongs to the Special Issue GIS and Remote Sensing in Soil Mapping and Modeling (Second Edition))

Abstract

Morro dos Seis Lagos, a region in the Brazilian Amazon, contains a small (less than 1%) formation of siderite carbonatites which is considered to be one of the world’s largest niobium reserves. This highly weathered geological and pedological occurrence makes the site ideal for studying the pedogenetic process of lateritization and the spatial variability of chemical elements. The aim of this study was to investigate the influences of various sampling combinations (scenarios) derived from three sampling designs on the spatial predictions associated with chemical compounds (Al2O3, Fe2O3, MnO, Nb2O5, TiO2, and SiO2), using multiple machine learning algorithms combined with remotely sensed imagery. The dataset comprised 341 samples from the Geological Survey of Brazil (CPRM). Covariates included remotely sensed data collected from Sentinel-2 MSI, Sentinel-1A, and the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), and topographic attributes calculated from a 20 m hydrologically consistent digital elevation model (HC-DEM). The machine learning algorithms (Generalized Linear Models with Elastic Net Regularization (GLMNET), Nearest Neighbors (KNN), Neural Network (NNET), Random Forest (RF), and Support Vector Machine (SVMRadial)) were used in combination with covariates and measured elements at point locations to spatially map the concentrations of these chemical elements. The optimal covariates for modeling were selected using Recursive Feature Elimination (RFE), processing 10 runs for each chemical element. The RF, SVMRadial, and KNN models performed best, followed by the models from the Neural Network group (NNET). The sampling scenarios were not significantly different, based on root mean square error (F = 1.7; p-value = 0.15) and mean absolute error (F = 0.4; p-value = 0.79); however, significant differences were observed in the coefficient of determination (F = 41.2; p-value < 0.00) across all models.
Overall, the models performed poorly for all elements, with R2 ranging from 0.07 to 0.27, regardless of sampling scenario (F = 1.6; p-value = 0.08). RF, GLMNET, and KNN performed relatively better than the other models. The terrain attributes were significantly more successful than the remote sensing spectral indices in spatially predicting the elements contained in laterites, likely because the underlying spatial structures of the two formations (laterite and talus) occur at different elevations.

1. Introduction

Soil is the product of various soil-forming factors [1,2] that influence pedogenetic processes through weathering (chemical, physical, and biological) and the translocation of weathered materials over and through the soil [3,4]. Pedogenetic processes, as an expression of the past and current soil environment, are reflected by currently observed features and measured soil properties [5]. The pedogenic processes that leave evidence in the form of morphological, physical, and chemical properties are used to differentiate various soil types. Further, the interactions of pedogenic processes with lithological conditions define the regions and pedo-geomorphological units that influence drainage patterns and hydrological functions [6].
Morro dos Seis Lagos (MSL) is a rare formation of siderite carbonatite with laterization/paleo-pedogenesis weathering processes, and is the only known niobium (Nb) deposit in which rutile is the primary ore mineral [7]. The formation is unique in its pedological, geomorphological, and geological characteristics, which are associated with a tropical climate that leads to intense weathering, thus reshaping the landscape and controlling the mechanisms of the soil formation processes in the tropical environment. In addition, the mineral formations contain rare earth elemental (REE) concentrations of economic interest, namely, niobium (Nb2O5), titanium (TiO2), and tungsten (W) [7]. Niobium is known for properties such as resistance to chemical corrosion and high temperatures [7], with many applications in the construction of materials used for ocean exploitation, space exploration equipment, and pharmaceutical and chemical equipment [8,9]. While niobium has a limited biological role in the human body, niobates and niobium chloride, which are soluble in water, are toxic and can induce damage to human DNA and cause immune cell mortality [10].
Mineralization consists of the primary, supergene, clastic, and authigenic types [11]. Due to intense weathering, minerals can become concentrated in shallow surfaces, which makes them attractive for mineral extraction [7,11,12]. The process of rare earth element (REE) laterite mineralization in the MSL is dominated by cerianite (Ce), associated with manganiferous laterite. The accumulation of REE in the sedimentary environment occurs in a karst basin, in which sedimentary evolution, as well as REE mineralization, provide new constraints on lateritization, relief, and water table evolution. In this sense, intense lateritization was responsible for the original characteristics of the REE mineralization associated with laterite and sediments.
Morro dos Seis Lagos is made up of laterites from the weathering of primary brookite, which is rich in Nb. By weight, the central siderite carbonatite contains up to 30.82% Nb2O5, making it richer in Nb than the Nb-rich rutile and the Nb-rich brookite (which, by weight, contains up to 16.03% Nb2O5) [7]. The substitution 3Ti4+ = Fe2+ + 2Nb5+ explains the greater enrichment in Nb in primary brookite and characterizes a reducing environment, in contrast to the substitution 2Ti4+ = Fe3+ + Nb5+, which occurs in the Nb-rich brookite of laterites formed by the weathering of Nb-rutile during lateritization processes [7]. Although not common, brookite does occur in some carbonatites; with increasing depth, the volume of these minerals increases from 0.12% by weight in the pisolitic type to 3.5% by weight in the lateritic type, classified as “brown laterite”, according to Giovannini et al. [13].
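As a quick consistency check, both substitutions conserve charge and the number of cation sites. The tally below uses only the formal oxidation states written in the text and is not taken from the cited work.

```latex
\begin{align*}
3\,\mathrm{Ti}^{4+} &: 3(+4) = +12, & \mathrm{Fe}^{2+} + 2\,\mathrm{Nb}^{5+} &: (+2) + 2(+5) = +12 \quad (\text{3 sites} \rightarrow \text{3 sites}),\\
2\,\mathrm{Ti}^{4+} &: 2(+4) = +8,  & \mathrm{Fe}^{3+} + \mathrm{Nb}^{5+} &: (+3) + (+5) = +8 \quad (\text{2 sites} \rightarrow \text{2 sites}).
\end{align*}
```

In the first substitution, two of every three replaced Ti sites are filled by Nb; in the second, one of every two. That difference in stoichiometry is separate from the measured Nb2O5 contents reported above.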
However, in many instances, these rich formations occur in isolated areas to which there is limited access. Remote sensing, combined with field data and machine learning, has been used successfully for high-quality predictions associated with digital soil mapping applications, especially in hard-to-reach areas [14,15,16]. According to Wadoux [17], map accuracy depends in part on the number and location of sampling sites used for the calibration and validation of machine learning models. The determination of the effect of sampling designs on property prediction in digital soil mapping is still in its infancy [17]. Designing proper sampling schemes is challenging and depends on the study’s objectives, whether these be soil fertility determinations, soil surveys, or other mapping efforts. Some of the major sampling designs in soil science include systematic grids for detailed soil fertility mapping; transects and/or purposive sampling for traditional soil survey campaigns; random stratified and/or completely randomized sampling for ecological studies; and spatial coverage, like the Conditioned Latin Hypercube (cLHC) [18,19] for DSM applications, among many other designs.
The sampling approach utilized in Morro dos Seis Lagos is defined by a combination of projects that includes Projeto Seis Lagos [20], Projeto Uaupés [21], and Projeto Terras Raras [22]; these are summarized in the Projeto Avaliação do Potencial de Terras Raras no Brasil: Área Morro dos Seis Lagos, Noroeste do Amazonas [22]. Each project implemented its own sampling design and density, which offers an opportunity to test the performance of these sampling designs and models. Morro dos Seis Lagos therefore represents a unique situation for assessing different sampling designs relative to the spatial accuracy of predictions of minerals in the siderite carbonatite formation using the DSM approach.
Under the DSM approach [23], we focus on some of the most commonly used machine learning techniques: Generalized Linear Models with Elastic Net Regularization (GLMNET) [24]; Neural Network (NNET) [25]; Nearest Neighbors (KNN) [26]; Random Forest (RF) [27]; and Support Vector Machine (SVMRadial). Briefly, GLMNET [24] fits a generalized linear model via penalized maximum likelihood, using elastic net regularization. The feed-forward Neural Network (NNET) uses a multilayer feed-forward perceptron architecture with a sigmoidal transfer function and logarithmic learning for error backpropagation and resilient propagation variation (RPROP). In this algorithm, the connection weights are adjusted iteratively by backpropagating the error until convergence is reached. K-Nearest Neighbor (KNN) is a non-parametric technique that predicts an observation from the values of its nearest neighbors. The algorithm determines a class or an average value from among the k known objects (k nearest neighbors) that are closest to the unknown sample [28].
Random Forest (RF) is an ensemble of trees, each generated from a random vector that is sampled independently and with the same distribution for all trees [27]. For regression, the final prediction is the average of the predictions of all the trees. The model is considered robust, as it is not very sensitive to noise in the data or to overfitting; it provides measures of the importance of covariates in the prediction [29], and can handle continuous and categorical data. The Support Vector Machines Radial Sigma (SVM-R) algorithm performs non-linear regression, which is usually obtained by applying the kernel trick [30]. In its classification form, the SVM operates as a binary classifier, separating two categories of samples in order to minimize empirical risk while maximizing confidence; its hyperparameters include the “kernel” and “C” (penalty) [31]. According to [31], the SVM’s distinctive feature is its ability to define complex decision functions that optimally separate two classes of data samples.
The objectives of this study were to evaluate the performance of five machine learning models under five sampling scenarios derived from three sampling designs, and to assess how these factors and their interactions affect the accuracy of the predictions for the six main chemical compounds/minerals (Al2O3, Fe2O3, MnO, Nb2O5, TiO2, and SiO2) in the Morro dos Seis Lagos petroferric/lateritic formation. The following alternative hypotheses will be tested: (i) The predictions of different chemical elements are influenced by the sampling design. (ii) The performance of all the models will depend on the sampling design. (iii) The model performance will differ for different chemical elements. (iv) The predictions of different chemical elements are influenced by the sampling design and the model (interaction).

2. Materials and Methods

The methodological procedures are divided into three stages (Figure 1). Stage one includes the compilation of model inputs (geochemical data and covariates from topography and remote sensing). Stage two includes the selection of covariates through Spearman correlation and recursive feature elimination, and modeling based on different sampling scenarios derived from the initial sampling designs. Stage three includes evaluation of model performance and spatial predictions of the selected chemical elements.

2.1. Study Area

The Morro dos Seis Lagos region (Figure 2) is located in the northwestern part of the Amazon basin (43°12′34.56″W, 22°58′12.34″S), with a total area of 35.96 km2, and is relatively close to (<150 km) Venezuela and Colombia. The area is within three overlapping conservation units (UCs) [32]: the Balaio Indigenous Land, the Pico da Neblina National Park, and the Morro dos Seis Lagos Biological Reserve.
The geochemical data for soil, sediments, and rock materials used in this study were made available by the Geological Survey of Brazil (CPRM) through the GeoSBG platform, as summarized in the Rare Earth Elements Potential Assessment in Brazil: Morro dos Seis Lagos Area, Northwest Amazonas [22]. The surface sampling (0–40 cm) grid consists of 341 points, for which geochemical analysis of 56 elements was carried out by the SGS Geosol Laboratory (Belo Horizonte, Brazil). The samples were analyzed using the following methods [22]: SGS XRF79C was used for identification of the minerals, comparing the diffractogram with the ICDD-PDF database (International Center for Diffraction Data, Powder Diffraction File) to determine the principal oxides; Inductively Coupled Plasma Mass Spectrometry (ICP-MS) was used together with Inductively Coupled Plasma Optical Emission Spectrometry (ICP-OES) to determine 4 rare earth elements (REE), plus Nb, Sn, W, Y, and 11 other elements; and ISE03A and B were used for the elements F and Cl. The summary statistics for the rare earth elements were computed using R software (v. 4.1.3) [33] and RStudio (v. 2022.02.3 Build 492) [34] (Table 1).

2.2. Environmental Covariates

The terrain attributes were derived from a Hydrologically Consistent Digital Elevation Model (HC-DEM) with a spatial resolution of 20 m, obtained using the Topo to Raster interpolator in ArcGIS Desktop v. 10.6 [35] (Table S1). The DEM was based on primary vector data of contour lines with an equidistance of 10 m, with elevation points and hydrography extracted from the cartographic base of the Brazilian Institute of Geography and Statistics (IBGE) at a scale of 1:25,000. The following terrain attributes were derived from the HC-DEM using the System for Automated Geoscientific Analyses (SAGA-GIS) v. 2.1.2 [36,37]: Aspect, Real Surface Area, Convergence Index, Curvature Flow Line, Curvature Features (General, Maximum, Minimum, Planar, Profile, Tangential, and Total), Elevation, Mid Slope Position, Multiresolution Index of Valley Bottom Flatness (MRVBF), Multiresolution Index of the Ridge Top Flatness (MRRTF), SAGA Wetness Index, Vector Terrain Ruggedness (VRM), Standardized Height, Slope Height, Terrain Surface Convexity, Terrain Surface Texture, Valley Depth, Closed Depressions, Hill, Hill Index, Slope Index, Surface Specific Points, Valley, and Valley Index.
The covariates from remote sensing data were obtained from the MultiSpectral Instrument (MSI) of the Sentinel-2A satellite and the synthetic aperture radar (SAR) of the Sentinel-1A satellite (Airbus Defense and Space, Taufkirchen, Germany), and from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) sensor of the Terra satellite (National Aeronautics and Space Administration, Washington, DC, USA), at a spatial resolution of 20 m. Several indices were generated by combining spectral bands from Sentinel-2A to identify the following: Ferrous silicates, Altered zones (Band 8/Band 11) [34], Hydrothermally altered rocks, Laterite, Clay minerals, Iron oxide, Normalized difference vegetation index (NDVI), and Ferrous regolith. For Sentinel-1A (SAR), the covariates were the Vertical transmit–vertical receive (VV) and Vertical transmit–horizontal receive (VH) polarizations, while the ASTER data were used for the detection of advanced clay alteration (Band 4/Band 6) [38] and for the identification of ferric iron (Fe3+), ferrous iron (Fe2+), and gossan.
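To make the band-ratio indices concrete, the sketch below computes the Altered zones ratio (Band 8/Band 11) and NDVI per pixel in plain Python. The function names and the tiny 2 × 2 reflectance grids are hypothetical, and the use of Band 8 (near infrared) and Band 4 (red) for NDVI is the usual Sentinel-2 convention rather than something stated in the text.

```python
def band_ratio(numerator, denominator):
    """Per-pixel ratio of two equally sized band grids; None where the denominator is 0."""
    return [
        [n / d if d != 0 else None for n, d in zip(row_n, row_d)]
        for row_n, row_d in zip(numerator, denominator)
    ]


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return [
        [(n - r) / (n + r) if (n + r) != 0 else None for n, r in zip(row_n, row_r)]
        for row_n, row_r in zip(nir, red)
    ]


# Hypothetical 2 x 2 reflectance grids (illustrative values only, not study data).
band8 = [[0.40, 0.30], [0.50, 0.00]]   # near infrared
band4 = [[0.10, 0.10], [0.25, 0.00]]   # red
band11 = [[0.20, 0.15], [0.25, 0.00]]  # shortwave infrared

altered_zones = band_ratio(band8, band11)  # Band 8 / Band 11
vegetation = ndvi(band8, band4)
```

Pixels with a zero denominator are returned as None so that they can be masked before modeling instead of raising an error.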

2.3. Selection of Environmental Covariates

The procedure used to select environmental covariates for the modeling involved three steps: (1) removing covariates with zero or near-zero variance; (2) removing highly correlated covariates, with an r threshold of 0.95; and (3) implementing the Recursive Feature Elimination algorithm [39] to select potential predictor covariates.
The removal of variables with zero or near-zero variance (the nearZeroVar procedure) was part of the pre-processing stage and sought to identify covariates with a variance below a pre-defined threshold. In this first stage, the following covariates were removed: Closed Depressions, Hill, Hill Index, Slope Index, Surface Specific Points, Valley, and Valley Index; these were considered uninformative for modeling.
Subsequently, a new analysis was carried out to exclude correlated covariates, as the removal of highly correlated covariates favors better accuracy and interpretability of the model, while avoiding overfitting. A threshold criterion of r = 0.95 was established to identify covariates that were significantly correlated with one or more other covariates. As a result, the following covariates were eliminated: Band 7, Band 8, Band 8a, Band 11, Band 5, Mass Balance Index, Ferrous Silica, Terrain Ruggedness Index, Clay Minerals, Curvature Cross-Sectional, Curvature Longitudinal, and Gossan (ASTER).
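The first two filtering steps can be sketched in plain Python as follows. This is a simplified stand-in, not the caret routines the study used: the variance cutoff, the greedy keep-the-first rule for correlated pairs, the use of Pearson correlation, and the covariate values are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation of two equally long numeric sequences (non-zero variance assumed)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def drop_low_variance(covariates, min_variance=1e-8):
    """Step 1: keep only covariates whose variance exceeds a (hypothetical) cutoff."""
    return {name: v for name, v in covariates.items() if pvariance(v) > min_variance}


def drop_correlated(covariates, threshold=0.95):
    """Step 2: greedily keep a covariate only if |r| with every kept covariate is below the threshold."""
    kept = {}
    for name, values in covariates.items():
        if all(abs(pearson_r(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical covariate values at five sample points.
covariates = {
    "elevation": [100, 120, 140, 160, 180],
    "standardized_height": [1.0, 1.2, 1.4, 1.6, 1.8],  # perfectly correlated with elevation
    "slope": [5, 3, 8, 2, 7],
    "closed_depressions": [0, 0, 0, 0, 0],              # zero variance
}

selected = drop_correlated(drop_low_variance(covariates))
```

Running the variance filter first matters here, because a constant covariate would otherwise cause a division by zero in the correlation.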
The third stage in selecting the predictor covariates consisted of using the Recursive Feature Elimination (RFE) algorithm [39]. The method performs a recursive elimination by assessing the interactions between the covariates and the predicted variables. The most important covariates were selected, while collinear covariates were removed, as collinearity (multicollinearity) can impair model performance [40] and lead to overfitting.
The RFE algorithm was implemented for each model using the caret package [41]. The RFE subset-size hyperparameter was adjusted to generate subsets with different numbers of covariates, drawn from the covariates remaining after the previous selection stages. Five specific sets of subset sizes were tested: 2:25, 30, 35, 40, and 45 covariates. The subsets were then selected using five-fold cross-validation, with R-squared as the accuracy metric for evaluating each subset.
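The elimination loop can be sketched generically as below. This is an illustration of the idea, not caret's implementation: `rank_fn` and `score_fn` are hypothetical callables, the importances and the scoring rule are made up, and reading "2:25" as every size from 2 to 25 is an assumption. In the study, the score would be the five-fold cross-validated R-squared of the fitted model.

```python
def recursive_feature_elimination(features, sizes, rank_fn, score_fn):
    """Shrink the feature set through the requested sizes, largest first.

    rank_fn(subset)  -> the subset ordered from most to least important
    score_fn(subset) -> resampled accuracy of a model fitted on the subset (higher is better)
    Returns (best_subset, scores_by_size).
    """
    current = rank_fn(list(features))
    subsets, scores = {}, {}
    for size in sorted(sizes, reverse=True):
        if size > len(current):
            continue  # cannot build a subset larger than what is left
        current = rank_fn(current[:size])  # drop the least important, then re-rank
        subsets[size] = current
        scores[size] = score_fn(current)
    best_size = max(scores, key=scores.get)
    return subsets[best_size], scores


# Candidate subset sizes, reading "2:25" as 2, 3, ..., 25 (assumption).
candidate_sizes = list(range(2, 26)) + [30, 35, 40, 45]

# Hypothetical importances and a toy score that rewards importance and penalizes size.
importance = {"elevation": 0.5, "valley_depth": 0.3, "ndvi": 0.1, "vv": 0.02, "vh": 0.01}
rank = lambda subset: sorted(subset, key=importance.get, reverse=True)
score = lambda subset: sum(importance[f] for f in subset) - 0.05 * len(subset)

best_subset, scores = recursive_feature_elimination(importance, candidate_sizes, rank, score)
```

With only five toy covariates, sizes above five are skipped, so only subsets of two to five covariates are scored.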

2.4. Modelling

Predictive models differ in how they handle the training and validation datasets and in the covariates they select based on importance. The following models were tested for the selected chemical compounds (Al2O3, Fe2O3, MnO, Nb2O5, TiO2, and SiO2): Generalized Linear Models with Elastic Net Regularization (GLMNET); Neural Network (NNET); Nearest Neighbors (KNN); Random Forest (RF); and Support Vector Machine (SVMRadial).
The GLMNET algorithm was implemented via the glmnet package [24]. The hyperparameters α and λ were optimized via the caret package [41]. The feed-forward Neural Network (NNET) was implemented via the nnet package [25], with the hyperparameters size and decay optimized via the caret package [41]. The network was trained in two steps across its layers: “one step forward”, or propagation, and “one step back”, or backpropagation [42]. The K-Nearest Neighbor (KNN) was implemented via the caret package [41]; the hyperparameter distance was kept constant, while max was optimized. Random Forest (RF) was implemented via caret with the randomForest package [43]; three hyperparameters, (i) ntree, (ii) nodesize, and (iii) mtry, were adjusted to optimize the performance. The Support Vector Machines Radial Sigma (SVM-R) algorithm was implemented via caret with the kernlab package [44], with two hyperparameters: sigma, which was kept fixed, and C, which was adjusted.
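As an illustration of what optimizing a hyperparameter means in this setting, the sketch below tunes the number of neighbors of a plain KNN regressor by leave-one-out cross-validation, the resampling scheme the study applies to the training set. It is a minimal pure-Python stand-in under stated assumptions: unweighted Euclidean neighbors, made-up data, and hypothetical function names. It does not reproduce caret's tuning grid.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Mean response of the k training points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loocv_rmse(x, y, k):
    """Leave-one-out RMSE: each sample is predicted from all the other samples."""
    squared_errors = []
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        squared_errors.append((y[i] - knn_predict(rest_x, rest_y, x[i], k)) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def tune_k(x, y, candidates):
    """Return the candidate k with the lowest leave-one-out RMSE."""
    return min(candidates, key=lambda k: loocv_rmse(x, y, k))


# Hypothetical one-covariate training data (illustrative values only).
x = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
y = [0.0, 1.1, 1.9, 3.2, 3.9, 5.1]

best_k = tune_k(x, y, candidates=[1, 2, 3])
prediction = knn_predict(x, y, (2.5,), k=2)  # averages the responses at x = 2 and x = 3
```

The same pattern (fit on all but one sample, score the held-out sample, repeat) applies to the other models; only the fitting step and the tuned hyperparameters change.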

2.5. Sampling Scenarios

The surficial samples were collected based on three transect designs. Two of the designs consisted of collection points running across the site in diagonal and perpendicular fashion (D/P) and around the site in box fashion (B). The third design consisted of transects with high point density (D) collected in a small area. To evaluate the influence of sampling design, several sampling scenarios, consisting of combinations of D/P, B, and D, were tested; in each, one design or combination served as the training set while the others were used as the validation set. We tested five scenarios (Figure 3); thus, the data were separated into training and validation sets according to Scenarios I, II, III, IV, and V. For Scenario V, the split of samples between training and validation was random, in contrast to the other scenarios. The leave-one-out cross-validation (LOOCV) method was applied to the training set in order to validate the model. Analysis of variance (ANOVA) was used to assess the overall effect of scenarios, models, elements, and interactions between sampling scenarios and models using the Fisher test, followed by comparison of means using the Tukey test at p = 0.05, in the R software program [33].
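A design-based split can be sketched as below. The training and validation designs for Scenarios I to III follow the description given in Section 4.1; Scenario IV is omitted because it additionally restricts validation points to the area of the dense grid, which needs coordinates. The sample records, the function names, the 75% training fraction for Scenario V, and the random seed are hypothetical.

```python
import random

# (training designs, validation designs) per scenario, as described in Section 4.1.
SCENARIOS = {
    "I": ({"D"}, {"D/P", "B"}),
    "II": ({"B", "D"}, {"D/P"}),
    "III": ({"B"}, {"D/P", "D"}),
}


def split_by_design(samples, train_designs, validation_designs):
    """Assign each sample to training or validation according to its sampling design."""
    train = [s for s in samples if s["design"] in train_designs]
    validation = [s for s in samples if s["design"] in validation_designs]
    return train, validation


def split_randomly(samples, train_fraction=0.75, seed=0):
    """Scenario V style split: shuffle all samples, then cut at the training fraction."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample records: four dense-grid, three D/P, and three box points.
samples = [{"id": i, "design": d} for i, d in enumerate(["D"] * 4 + ["D/P"] * 3 + ["B"] * 3)]

train_i, validation_i = split_by_design(samples, *SCENARIOS["I"])
train_v, validation_v = split_randomly(samples)
```

Keeping the scenario definitions in one table makes it easy to loop over scenarios, models, and elements when collecting performance metrics.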

2.6. Model Performance Evaluation

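The models’ performance was evaluated using the following metrics: Root Mean Squared Error (RMSE) (Equation (1)), Mean Absolute Error (MAE) (Equation (2)), Coefficient of Determination (R2) (Equation (3)), Concordance Correlation Coefficient (ρc) (Equation (4)), and Bias (Equation (5)).
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (obs_i - pred_i)^2}{n}} \tag{1}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| obs_i - pred_i \right| \tag{2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (obs_i - pred_i)^2}{\sum_{i=1}^{n} (obs_i - \overline{obs})^2} \tag{3}$$
$$\rho_c = \frac{2 \rho \, \sigma_{obs} \, \sigma_{pred}}{\sigma_{obs}^2 + \sigma_{pred}^2 + (\mu_{pred} - \mu_{obs})^2} \tag{4}$$
$$\mathrm{Bias} = \frac{1}{n} \sum_{i=1}^{n} (pred_i - obs_i) \tag{5}$$
where “obs” and “pred” are observed and predicted values for each element, n is the number of samples, $\overline{obs}$ is the mean of the observed values, ρ is the correlation between observed and predicted values, and σ and μ are the corresponding standard deviations and means.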
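The five metrics can be sketched in plain Python as below, using the standard definitions of each metric. The function names and the four observed/predicted pairs are hypothetical, and population (divide-by-n) variances are assumed for the concordance coefficient.

```python
import math
from statistics import mean


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(obs, pred)))


def mae(obs, pred):
    """Mean absolute error."""
    return mean(abs(o - p) for o, p in zip(obs, pred))


def r2(obs, pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    obs_mean = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - obs_mean) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def ccc(obs, pred):
    """Concordance correlation coefficient; 2 * covariance equals 2 * rho * sd_obs * sd_pred."""
    mo, mp = mean(obs), mean(pred)
    var_o = mean((o - mo) ** 2 for o in obs)
    var_p = mean((p - mp) ** 2 for p in pred)
    cov = mean((o - mo) * (p - mp) for o, p in zip(obs, pred))
    return 2 * cov / (var_o + var_p + (mp - mo) ** 2)


def bias(obs, pred):
    """Mean signed error; positive values indicate overprediction."""
    return mean(p - o for o, p in zip(obs, pred))


# Hypothetical observed and predicted concentrations (illustrative values only).
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]
```

In this toy case the bias is zero while the MAE is 2.5, which shows why a signed and an unsigned error metric are reported side by side.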
Each model was run 10 times (repetitions) for each chemical element, making it possible to calculate the mean prediction maps and the range (predicted minimums and maximums) for each element. This approach follows the procedure adopted by [45,46], which consists of repeating the processing of the datasets to calculate the variability of the performance parameters.
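Summarizing the repeated runs amounts to a per-location mean, minimum, and maximum across repetitions. A minimal sketch follows, with three hypothetical runs standing in for the ten used in the study; the function name and values are illustrative only.

```python
from statistics import mean


def summarize_runs(runs):
    """runs: one list of predictions per repetition, all ordered by the same locations.

    Returns a (mean, minimum, maximum) tuple for each location.
    """
    return [(mean(values), min(values), max(values)) for values in zip(*runs)]


# Three hypothetical repetitions predicting the same three locations.
runs = [
    [10.0, 20.0, 30.0],
    [12.0, 18.0, 33.0],
    [11.0, 22.0, 27.0],
]

summary = summarize_runs(runs)
```

The difference between the maximum and minimum at each location gives the range map, a simple indication of how stable the prediction is across repetitions.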

3. Results

3.1. Factors Influencing Performance

In general, there were significant differences between models and elements, with models such as Random Forest and KNN being more sensitive to heterogeneous datasets, dataset size, and spatial dependence (Table 2). As for the sampling scenarios, there was no statistical evidence of a difference between the means for the RMSE (p-value = 0.15) and MAE (p-value = 0.79). On the other hand, the means for the models and elements showed statistical differences (Table 3).
For the training dataset, GLMNET explained statistically more variability (R2 = 0.93) than the other models (Table 3). For the validation dataset, the KNN model had a higher R2 (0.11) than the other models (Table 3). NNET had the highest and most significant prediction errors for both training (RMSE = 17.05; MAE = 13.47) and validation (RMSE = 18.52; MAE = 13.58), compared to the other models. The prediction errors for the elements increased from training to validation for the RMSE (9.39 to 12.39) and the MAE (6.21 to 7.53) (Table 3).
The average prediction errors (MAE and RMSE) for Fe2O3 were significantly higher than those for the other elements, for both training and validation, except with respect to R2 for training (Table 3). As with the models, there was a subtle increase in the mean validation metrics for the elements (MAE = 7.87; RMSE = 12.73) compared to the mean training metrics (MAE = 6.21; RMSE = 9.72) (Table 3). The interactions between sampling scenarios and models were not significant for any of the model performance parameters (Table 2).
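For readers less familiar with the F statistics behind these comparisons, the sketch below computes the F ratio for a single factor. It is a one-way simplification with made-up numbers; the study's ANOVA includes scenarios, models, elements, and an interaction term, and was run in R. The p-value is not computed here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for one factor: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical RMSE values for three groups (e.g., three models), three runs each.
groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]

f_value = one_way_anova_f(groups)
```

A large F means that the group means differ by more than the spread within the groups would suggest, which is the pattern reported here for models and elements but not for sampling scenarios.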

3.2. Accuracy Prediction Comparisons Among Scenarios

Overall, there were differences in the model performance parameters among the scenarios for all elements (Table 4), although not for all performance parameters (Table 2). The average coefficient of determination (R2) varied significantly, as did the MAE. All models slightly overpredicted the concentrations of the elements, regardless of sampling scenarios.
The performance parameters decreased substantially between training and validation for all elements and scenarios. However, the decrease was less for Scenario IV. Scenario IV performed consistently better across all elements, with R2 values varying from 0.13 (MnO) to 0.44 (Fe2O3), followed by Scenario III for Fe2O3 (R2 = 0.02) and MnO (R2 = 0.02) (Figure 4A). Scenario II performed the worst across all elements, with R2 values varying from 0.00 to 0.12.
The distribution of samples between training and validation varied widely between scenarios (Figure 3). For example, for Scenario I there were 163 samples used for training and 173 for validation, while for Scenario II, there were 118 for training and 223 for validation. Scenario III was the most extreme case, with 47 points used for training and 294 for validation, which partly explains its poor performance. Scenario IV, on the other hand, was almost the reverse of Scenario III, with 175 samples used for training and only 19 for validation, all confined to a small area; this was followed by Scenario V, with 256 points used for training and 85 for validation, which, unlike those of Scenario IV, were spread throughout the study area.
Similar trends were observed for the other performance parameters for Scenario IV, although with some noticeable exceptions. For example, for Scenario IV, the RMSE (21.1%) and Bias (−2.46) for MnO were the highest, though it had the largest R2 (0.13). Similarly, SiO2, with the largest R2 (0.20), also had the largest RMSE (28.8%), ρc (−9.3), and Bias (−9.26). However, there was a greater variation among the scenarios for each element, with R2 values ranging from 0.00 (Al2O3 and Fe2O3, Scenario II; SiO2, Scenario V) to 0.44 (Fe2O3, Scenario IV). The performance parameters also varied widely among models and elements within each scenario (Table 3).
The elements measured at different point locations, whether based on dense grids (D) or two types of transects (D/P and B), did not show strong correlations with the landscape. The dominant geologic formations were laterite crust, in the center at higher elevation, and talus, in the periphery at lower elevation and surrounding the laterite crust (Figure 2); the two had distinct concentrations of elements (Table 1). Also, the points from the dense grid (D) and the diagonal and perpendicular (D/P) transects fell almost entirely within laterite crust. Not surprisingly, the distribution of predicted values showed two major clusters (Figure 5), indicating the underlying structure, based on the distribution of sampling points between the two geologic materials (laterite crust and talus), which was also reflected by the sampling scenarios (Figure 5A–C). The dispersion of the observed and predicted values shows that the models tended to underestimate the high levels of the elements and to display somewhat symmetrical patterns relative to the predicted values. The predicted values tend to cluster in two distinct groups, with high and low concentrations.

3.3. Accuracy Predictions Among Models

Overall, all models performed poorly, with R2 values ranging from 0.00 (SVMRadial model; SiO2) to 0.17 (GLMNET model; Fe2O3) (Table 5). The mean R2 for Fe2O3 for all the models combined was 0.14, followed by other elements ranging from 0.07 (SiO2) to 0.16 (TiO2).
However, the performance parameters for Fe2O3 varied more between the models than those for the other elements. For example, the R2 for Fe2O3 varied from 0.18 (RF) to 0.05 (NNET), compared to the other elements, with R2 values varying from 0.00 (SVMRadial) to 0.11 (KNN). In general, for all scenarios and elements combined, the RF, GLMNET, and KNN models tied in terms of performance, followed by SVMRadial and NNET, which performed worst (Figure 4B). However, no model performed consistently better for each element.
The GLMNET model for the Al2O3, MnO, Nb2O5, and TiO2 elements had R2 = 1.00 for the training and close to zero for the validation. The high R2 was likely due to the model overfitting the training data, which partially explains the close-to-zero R2 values for the validation data. Another reason could be the imbalance between the training and validation data. The higher MAE values for GLMNET compared to other models for most of the elements indicate the poor predictive ability of the model, despite high R2 values.

3.4. Interaction of Models and Sampling Scenarios in Accuracy Predictions

Model performance varied by element and scenario (Figure 6). For aluminum (Al2O3), the performance was responsive to scenarios but insensitive to models. For iron (Fe2O3), model performance was somewhat sensitive to scenarios; however, the differences seem to be obscured by the much greater variability among models (NNET performed very poorly, and all other models performed about the same). For manganese (MnO), performance was similar between models but strongly dependent on scenarios. However, there were some interactions in which some models performed relatively better in some scenarios and worse in others. Niobium (Nb2O5) had a stronger dependence on scenarios than on models, with much better performance in Scenario IV. Silica (SiO2) was also more responsive to scenarios, but in the opposite direction, as the performance was very poor in Scenario IV but good in Scenarios I and III. Lastly, titanium (TiO2) was highly dependent on scenarios, but only because Scenario I performed very poorly compared to all other scenarios.

4. Discussion

At the beginning of our study, we formulated several hypotheses regarding the influences of sampling scenarios (five) and models (five) on the spatial predictions for selected elements (six). We postulated that the predictions of different chemical elements would be influenced by the sampling design (Hypothesis 1) and that the performance of all the models would depend on the sampling design (Hypothesis 2). We further stated that the model performance would differ for different chemical elements (Hypothesis 3) and that the predictions of different chemical elements would be influenced by the sampling design and the model (Hypothesis 4). We examine each hypothesis in light of the results from our analysis.

4.1. Influence of Sampling Scenarios

The sampling scenarios tested in this study sought to evaluate the representativeness of existing samples to capture the variability in the elements used for spatial modeling based on the relationships between the soil and its environment. The location and distribution of the existing sampling points for each set were different, as they were designed to meet certain objectives of the three different projects [22].
Across all models, Scenarios I, II, and III did not perform well, as the combinations of different sampling designs for training and validation did not equally capture the variability for any of the measured elements. For example, for Scenario I, models were trained on the high point density grid (D) but validated using the combination of diagonal and perpendicular (D/P) transects through the study site and surrounding it in a box square fashion (B) (Figure 3). The differences in the distribution and density between the sample sets used for training (D) and validation (D/P and B) and their ability to capture the variability likely contributed to the poor performance of the models for Scenario I. A similar situation can also explain the poor performance of the models for Scenario II and III, as the model training was based on B and D (Scenario II) and B (Scenario III) while validation was based on D/P for Scenario II and a combination of D/P and D for Scenario III. In other words, when training was based on the dense grid (D) the validation was based on transects (D/P and B) and vice versa. This resulted in an uneven distribution of samples between training and validation sets (Figure 3), which likely also contributed to the differences between scenarios.
Transect sampling designs have been used successfully in many studies to explore the best spacing for capturing patterns in the landscape, especially when mapping soils [47]. In the study by [48], accurate soil ECa maps were obtained in simulations that increased the transect spacing up to 150 m. In another example, ref. [49] used a stratified random sampling design, with topography and geology as the stratifying variables, to capture soil organic carbon (SOC) variations along a gradient of environmental covariates. Spatial predictions using Random Forest and environmental covariates yielded low R2 values, with RMSE varying from 0.33 to 1.71%. The SOC spatial patterns reflected the catena sequence, showing decreasing trends (toeslopes > ridges > midslopes), and the design proved to be effective for landscapes where environmental factors, especially topography, drive the distribution of SOC. In our study, all three sampling designs and the derived sampling scenarios failed to capture the landscape variability as represented by terrain attributes, as has also been shown by studies conducted in other areas [50]. In addition, previous studies have shown that the study area does not present great heterogeneity of geological materials [51], which further contributes to weak relationships between measured elements and terrain attributes.
The models performed relatively better for the other scenarios, especially Scenario IV (Figure 4A). Scenario IV differed from the first three scenarios in that the training and validation points were co-located within the same spatial area, even though the dense grid (D) was used for training and the D/P transect for validation. For Scenario V, all sampling designs were used for both training and validation, which resulted in slightly better performance than the first three scenarios but still worse performance than Scenario IV.
The training versus validation analysis (Figure 5A–C) showed a greater range of element contents for the training data than for the validation data. The range discrepancies were consistent but varied among elements, which also explains the differences in model performance.
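One way to quantify the range discrepancy between training and validation data is to compare the widths of the two value ranges and to check how many validation values fall inside the training range. The sketch below is illustrative only: the function names and the Fe2O3 values are hypothetical and do not come from the study's dataset.

```python
def range_ratio(train_values, valid_values):
    """Width of the validation range divided by the width of the training range."""
    train_width = max(train_values) - min(train_values)
    valid_width = max(valid_values) - min(valid_values)
    if train_width == 0:
        raise ValueError("training values are constant; ratio is undefined")
    return valid_width / train_width


def share_inside_training_range(train_values, valid_values):
    """Fraction of validation values lying within [min, max] of the training values."""
    low, high = min(train_values), max(train_values)
    inside = sum(1 for v in valid_values if low <= v <= high)
    return inside / len(valid_values)


# Hypothetical Fe2O3 contents (%), invented for illustration.
train_fe = [5.0, 20.0, 45.0, 70.0, 95.0]
valid_fe = [30.0, 40.0, 55.0, 60.0]

print(range_ratio(train_fe, valid_fe))
print(share_inside_training_range(train_fe, valid_fe))
```

A ratio well below 1 indicates that the validation set spans only part of the range seen in training, so validation metrics describe model behavior over a narrower slice of the element's distribution.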

4.2. Spatial Variability and Geological Context

The bimodal distribution of measured values, due to the underlying geology, was present in all sampling scenarios (Figure 5) and influenced the predictions, which also showed a bimodal distribution for many of the elements (Figure 7).
Other studies have demonstrated the importance of understanding the spatial context of the data [50] and the usefulness of auxiliary variables (geophysics, remote sensing, and morphometric) in capturing the spatial context for improved model predictions [2,23,52,53]. Spatial mapping of elements associated with geologic formations, even after undergoing pedogenesis, can be challenging, especially for areas that are relatively small compared to the spatial scale of the variability of geologic formations, like those in our study area. Minasny and McBratney [18] found that, for DSM applications, transect-based sampling may not always capture the patterns of environmental covariates at small scales of measurement. At our study site, the transects covered the two major geologic formations. However, they likely either failed to capture the topographic features, or the features were not well expressed, especially at the sampling locations. The same was the case, but to a lesser degree, for the dense grid, perhaps due to its ability to capture spatial variability over short distances, as found by [54] in another study area. Although there were differences in model performance between the sampling scenarios, due either to the uneven distribution of samples between training and validation or to their spatial representation of the environmental covariates or underlying geology, the overall performance of all models was poor, regardless of the sampling scenario. The evidence supports our initial hypothesis that predictions of chemical elements would be influenced by the sampling design.

4.3. Geological Formation of the Study Area

The spatial distribution of the elements was driven more by geology than by topography, in contrast to some soil properties such as SOC. Each unit (talus, laterite crust, and lacustrine sediments) has a unique composition resulting from different rates and processes of weathering. In the laterite crust unit, characteristically high concentrations of Fe2O3, Nb2O5, and TiO2 stand out (Figure 7). Muhindo [55] showed that the weathering process, especially intense in tropical environments, causes laterites to have high concentrations of minerals. The laterites are highly enriched in oxidized forms of iron (Fe2O3), ranging from 39.0 to 70.0%, and manganese (MnO), ranging from 8 to 39%. According to [13], high concentrations of these elements occur in the upper and lower purple laterites and in the upper portions of the brown type. The talus, on the other hand, has a morphology that is independent of the underlying rock. It is highly weathered, with lower Fe2O3, Nb2O5, and TiO2 contents, as shown by the asymmetrical relationship between the predicted and observed values. Fe2O3 showed a spatial variability similar to that of Nb2O5 and TiO2. The study by [56] points to very high values of Fe2O3 and TiO2 in soils, characteristics inherited from the rock, which is naturally enriched in these elements. The authors emphasize that Fe2O3 and TiO2 have similar weathering rates, so the Fe2O3/TiO2 ratio is approximately the same as that found in the rock. A study by [57] shows that the spatial variability of Fe2O3 is highly correlated with that of Nb2O5. As this study shows, understanding the spatial structure of the factors that determine the spatial distribution of properties may be especially important at the outset, both for designing an efficient sampling scheme and for obtaining accurate predictions.
Rare earth element (REE) mapping is particularly challenging due to its geochemical complexity, strong spatial variability, and dependence on specific geological processes, such as the presence of petroferric and lateritic formations that often concentrate REE in the Brazilian Amazon [58].

4.4. Predictive Models

Overall, all models performed poorly, with the highest coefficient of determination being R2 = 0.44 (Fe2O3, Scenario IV). There were performance differences among the models, as shown by the wide range of performance parameters. The results also showed that model performance differed not only by scenario but also by element. Thus, although models performed poorly overall, one of our objectives was to assess the interactions between sampling scenarios, models, and elements (Hypothesis 4), so we discuss in more detail some of the interactions considered to be of interest to the modeling community. The evidence did not support our initial hypothesis that all models would perform similarly across sampling scenarios. Overall, models performed better for Fe2O3, followed by TiO2 and Nb2O5 (Figure 4C). Across and within elements, no single model was clearly superior, except for RF and KNN, and only to a certain extent (Figure 4B). For Fe2O3, KNN was the best performer (R2 = 0.19), whereas for Nb2O5, GLMNET performed best (R2 = 0.12). This suggests that not only were some of the models perhaps more sensitive to the location, spacing, and imbalances between the sizes of the training and validation datasets, but also that the degree of sensitivity varied by element, indicating an interaction between models, sampling scenarios, and elements, albeit a weak one. In a sampling study, ref. [59] highlighted the low prediction accuracy for soil properties (SOM and CEC), with the interactions between the sites, the model, and the sample density being mainly driven by site differences.
Studies by [48,59] indicate good performance for models using nearest-neighbor assumptions. For the Morro dos Seis Lagos area, we obtained relatively good performance for the KNN model in predicting the elements. The effect of sampling density on prediction accuracy assumes a spatial dependence, but one that is sensitive to noise, with a high coefficient of variation [28].
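The three performance parameters used throughout this discussion (R2, RMSE, and MAE) can be computed from paired observed and predicted values as sketched below. The text does not state which definition of R2 was used, so this sketch assumes the squared Pearson correlation between observations and predictions. The function names and the toy values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


def r_squared(observed, predicted):
    """Squared Pearson correlation (an assumed definition of R2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    return (s_op ** 2) / (s_oo * s_pp)


# Hypothetical observed and predicted contents (%), invented for illustration.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]

print("RMSE:", rmse(obs, pred))
print("MAE:", mae(obs, pred))
print("R2:", r_squared(obs, pred))
```

Note that a correlation-based R2 rewards predictions that rank samples correctly even when they are biased, whereas RMSE and MAE penalize the size of the errors. This is one reason the scenarios can differ in R2 without differing in RMSE or MAE.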
The RF model performed better for almost all elements, probably because it considers the weight of the covariates influencing the prediction and manages their variation through each tree, utilizing it for improving predictions [27]. This was also determined by [53] when comparing other models with RF.

4.5. Sampling Designs and Properties

The prediction accuracy, as influenced by models, sampling scenarios, and elements, points to a complex picture for improving predictions. Ensemble modeling has been used successfully to improve prediction accuracy when no single model is superior [60,61]. However, the poor performance of all models did not warrant such an approach. Our analysis of the six elements also showed spatial autocorrelation between elements that could be leveraged to decrease the number of interactions. For example, Fe2O3, SiO2, and TiO2 were positively or negatively spatially autocorrelated, indicating the geochemical affinity among them (Figure 8). In addition, while high Fe2O3 contents predominated in the laterite crust unit, high Nb2O5 and TiO2 contents predominated in the talus.
It was found that the spatial distribution of the elements follows a trend according to the geological classes (talus, laterite) present in the study area (Figure 8).
The laterite class showed low values of SiO2 (<10%) and Al2O3 (<12%) and the highest values of TiO2 and Nb2O5, despite the internal gradient in composition. It should be noted that these elements are less mobile at the earth’s surface and tend to concentrate in the residual soil horizons during chemical weathering (Al2O3/Fe2O3 ratio). With regard to TiO2 and Nb, studies such as that by Oliveira et al. [62] point out that the high values of Fe and Ti in soils are characteristics inherited from the rock, which is naturally enriched in these elements. It is worth noting that Nb, W, and Ti are clearly enriched in Amazonian soils relative to their global averages, probably due to binding in weathering-resistant oxide minerals [63].
Other geochemical evidence of the laterite class is found in the high mean concentrations of Fe2O3 and MnO, which are inversely proportional and result from the weathering process [55]. The laterites are highly enriched in Fe2O3, with an average of 39.4% (ranging from 39 to 70.0%), but relatively less for manganese (MnO), with an average of 9.5% (ranging from 8 to 39%). During the weathering process, the geochemical behavior of manganese is similar to that of iron and can substitute for a fraction of the iron in primary minerals, forming oxidized compounds. The talus class had low concentrations of Nb2O5 and TiO2 (<3%), probably due to its colluvial processes, showing a lower degree of chemical weathering compared to other geological units. The high positive correlation between TiO2 and Nb2O5 stands out, due to the geochemical affinity between these elements and their resistance to weathering.
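The direction of the relationships described above (a positive association between TiO2 and Nb2O5, an inverse one between Fe2O3 and MnO) can be checked with a rank correlation, which does not assume linearity. The sketch below computes Spearman's coefficient as the Pearson correlation of average ranks. The choice of Spearman here is an assumption for illustration, the function names are hypothetical, and the values are invented and deliberately perfectly monotonic.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical contents (%) at five sample points, invented for illustration.
tio2 = [0.5, 1.2, 2.8, 4.1, 6.0]
nb2o5 = [0.1, 0.4, 1.5, 2.2, 3.9]
fe2o3 = [40.0, 48.0, 55.0, 63.0, 70.0]
mno = [30.0, 22.0, 15.0, 12.0, 9.0]

print("TiO2 vs Nb2O5:", spearman(tio2, nb2o5))
print("Fe2O3 vs MnO:", spearman(fe2o3, mno))
```

This measures association between paired values at the same locations only. It does not account for spatial structure, which would require a geostatistical tool such as a cross-variogram.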
The Fe2O3 concentrations were quite high, and the sampling design (D/P) showed a better distribution of the data and a wider amplitude between the minimum and maximum values, with a greater number of outliers, indicating greater variation in the data. The sampling design (P) showed relatively less variability, with concentration values varying between 30 and 100%, mostly closer to 100% (Figure 9).
For Nb2O5, the sampling design (D/P) showed greater variation in the data and contained more outliers than the other sampling designs. There was a large number of zero values and/or values concentrated close to zero, a factor that may have influenced the model performance metrics. With regard to Fe2O3, the (D) and (P) sampling designs showed less variation in the data, which made them less suitable for efficient representation.
For TiO2, the sampling design (D/P) also showed greater variation in the data, with a large number of outliers. Again, the (D) and (P) sampling designs showed less variation. The sampling design (D/P) better represents the distribution of elements across the geologic units and likely better captures the spatial patterns.
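The comparison of spread and outliers among sampling designs can be reproduced with a per-design summary. The sketch below assumes the common boxplot rule (values beyond 1.5 times the interquartile range from the quartiles count as outliers). The text does not state which rule underlies Figure 9, and the function names and data are hypothetical.

```python
import statistics


def count_outliers(values):
    """Count values outside the 1.5 * IQR fences of a standard boxplot."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for v in values if v < low or v > high)


def summarize(values):
    """Return the minimum, maximum, and outlier count for one design."""
    return {"min": min(values), "max": max(values), "outliers": count_outliers(values)}


# Hypothetical Fe2O3 contents (%) grouped by sampling design, invented for illustration.
by_design = {
    "D/P": [2.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0, 98.0],
    "D": [44.0, 46.0, 48.0, 50.0, 52.0, 54.0, 56.0, 58.0, 60.0],
}

for design, values in by_design.items():
    print(design, summarize(values))
```

In this toy example the transect-like set spans a wider range and flags two outliers, while the grid-like set is compact with none, mirroring the qualitative contrast described above.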

5. Conclusions

We investigated the influences of sampling designs and models on the prediction accuracy of selected chemical compounds (Al2O3, Fe2O3, MnO, Nb2O5, TiO2, and SiO2). Model performance was influenced by the sampling scenario, but only slightly, and the degree of influence varied by element. The five sampling scenarios, created by varying the sampling designs between the training and validation datasets, led to an unbalanced number of samples between training and validation, which affected the performance of the models. As a result, the performance parameters decreased substantially from training to validation for all elements.
The exception was Scenario IV, in which the number of points used for training was higher than for validation, and for which the training and validation points were contained in a small area. In this context, the RF model performed best, demonstrating the greatest accuracy in predicting the elements when compared to the other models, followed by the KNN model. In contrast, the NNET model performed poorly across scenarios and elements. The GLMNET and SVMRadial models performed well in recognizing patterns for the prediction of elements when compared to NNET, although GLMNET overfitted the training dataset.
However, in general, model performance was poor for all sampling scenarios and chemical elements, mainly due to the weak correlations between chemical element concentrations, prediction factors, and the underlying geological formation, reflecting the need to include new covariates and consider stratification based on geology. The spatial distribution of samples, combined with the limited number of geologic formations in our study area and their lack of spatial heterogeneity, may have contributed to the poor performance seen in all models and their lack of interaction with the sampling scenarios. In this sense, future studies can benefit from including new covariates, stratified sampling designs, and hybrid modeling techniques to improve the modeling process, given the complexity of the soil environment.
Understanding the signature of geology on the concentrations and spatial distributions of chemical elements under the intense weathering conditions observed at Morro de Seis Lagos is as important for characterizing the pedogenic process as the characteristics of the terrain and landscape.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs17091644/s1, Table S1: Environmental Covariates used for modeling. Table S2: Mean comparisons of performance parameters among models and elements for training and validation. Refs. [64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79] are cited in the Supplementary Materials.

Author Contributions

Conceptualization, N.B.R., T.R.B., H.S.K.P., M.M., J.B., P.R.O. and Z.L.; methodology, N.B.R., H.S.K.P. and Z.L.; software, N.B.R., Q.D.R. and T.R.B.; validation, N.B.R., H.S.K.P. and Z.L.; formal analysis, N.B.R., H.S.K.P., Q.D.R. and Z.L.; resources, H.S.K.P., D.M. and P.R.O.; data curation, N.B.R. and Q.D.R.; writing—original draft preparation, N.B.R., H.S.K.P. and Z.L.; writing—review and editing, N.B.R., H.S.K.P., M.M., Q.D.R., J.B., E.H.W., D.M., P.R.O. and Z.L.; visualization, N.B.R., Q.D.R. and Z.L.; supervision, H.S.K.P., D.M., P.R.O. and Z.L.; project administration, H.S.K.P., D.M. and P.R.O.; funding acquisition, H.S.K.P. and P.R.O. All authors have read and agreed to the published version of the manuscript.

Funding

Brazilian Ministry of Education, Coordination for the Improvement of Higher Education Personnel (CAPES), Visiting Doctoral Student Program # 44/2022. USDA-ARS Innovations for Small Farms Research Center # 6020-21310-011-000D.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Jenny, H. Factors of Soil Formation: A System of Quantitative Pedology; McGraw-Hill Book Co.: New York, NY, USA, 1941.
  2. McBratney, A.B.; Mendonça Santos, M.L.; Minasny, B. On Digital Soil Mapping. Geoderma 2003, 117, 3–52.
  3. Runge, E.C.A. Soil Development Sequences and Energy Models. Soil Sci. 1973, 115, 183–193.
  4. Simonson, R.W. A Multiple-Process Model of Soil Genesis. In Quaternary Soils; Geo Abstracts: Norwich, UK, 1978; pp. 1–25. ISBN 0-86094-012-8.
  5. Kämpf, N.; Curi, N. Formação e Evolução do Solo (Pedogênese). In Pedologia: Fundamentos; Ker, J.C., Curi, N., Schaefer, C.E.G.R., Vidal-Torrado, P., Eds.; Brazilian Society of Soil Science (SBCS): Viçosa, Brazil, 2012; pp. 207–302.
  6. Pinheiro Junior, C.R.; Pereira, M.G.; Silva Neto, E.C.; Anjos, L.H.C. Soils of Brazil: Genesis, Classification and Limitations to Use. In Soils of Brazil; Atena: Ponta Grossa, Brazil, 2020; pp. 183–199.
  7. Giovannini, A.L.; Mitchell, R.H.; Neto, A.C.B.; Moura, C.A.V.; Pereira, V.P.; Porto, C.G. Mineralogy and Geochemistry of the Morro Dos Seis Lagos Siderite Carbonatite, Amazonas, Brazil. Lithos 2020, 360–361, 105433.
  8. Zardiackas, L.D.; Kraay, M.J.; Freese, H.L.; International, A. Titanium, Niobium, Zirconium, and Tantalum for Medical and Surgical Applications; ASTM STP 1471; ASTM: West Conshohocken, PA, USA, 2006; ISBN 978-0-8031-3497-3.
  9. Babaei, K.; Fattah-alhosseini, A.; Chaharmahali, R. A Review on Plasma Electrolytic Oxidation (PEO) of Niobium: Mechanism, Properties and Applications. Surf. Interfaces 2020, 21, 100719.
  10. Chen, Q.; Thouas, G.A. Metallic Implant Biomaterials. Mater. Sci. Eng. R Rep. 2015, 87, 1–57.
  11. Giovannini, A.L.; Neto, A.C.B.; Porto, C.G.; Takehara, L.; Pereira, V.P.; Bidone, M.H. REE Mineralization (Primary, Supergene and Sedimentary) Associated to the Morro Dos Seis Lagos Nb (REE, Ti) Deposit (Amazonas, Brazil). Ore Geol. Rev. 2021, 137, 104308.
  12. Palmieri, M.; Brod, J.A.; Cordeiro, P.; Gaspar, J.C.; Barbosa, P.A.R.; de Assis, L.C.; Junqueira-Brod, T.C.; e Silva, S.E.; Milanezi, B.P.; Machado, S.A.; et al. The Carbonatite-Related Morro Do Padre Niobium Deposit, Catalão II Complex, Central Brazil. Econ. Geol. 2022, 117, 1497–1520.
  13. Giovannini, A.L.; Neto, A.C.B.; Porto, C.G.; Pereira, V.P.; Takehara, L.; Barbanson, L.; Bastos, P.H.S. Mineralogy and Geochemistry of Laterites from the Morro Dos Seis Lagos Nb (Ti, REE) Deposit (Amazonas, Brazil). Ore Geol. Rev. 2017, 88, 461–480.
  14. Ferreira, A.C.S.; Pinheiro, É.F.M.; Costa, E.M.; Ceddia, M.B. Predicting Soil Carbon Stock in Remote Areas of the Central Amazon Region Using Machine Learning Techniques. Geoderma Reg. 2023, 32, e00614.
  15. Gelsleichter, Y.A.; Costa, E.M.; dos Anjos, L.H.C.; Marcondes, R.A.T. Enhancing Soil Mapping with Hyperspectral Subsurface Images Generated from Soil Lab Vis-SWIR Spectra Tested in Southern Brazil. Geoderma Reg. 2023, 33, e00641.
  16. Padarian, J.; Minasny, B.; McBratney, A.B. Machine Learning and Soil Sciences: A Review Aided by Machine Learning Tools. Soil 2020, 6, 35–52.
  17. Wadoux, A.M.J.-C.; Brus, D.J.; Heuvelink, G.B.M. Sampling Design Optimization for Soil Mapping with Random Forest. Geoderma 2019, 355, 113913.
  18. Minasny, B.; McBratney, A.B. A Conditioned Latin Hypercube Method for Sampling in the Presence of Ancillary Information. Comput. Geosci. 2006, 32, 1378–1388.
  19. Roudier, P. Clhs: A R Package for Conditioned Latin Hypercube Sampling; R Foundation for Statistical Computing: Vienna, Austria, 2011.
  20. de Viegas Filho, J.R.; Bonow, C.W. Seis Lagos Project: Final Report; CPRM: Manaus, Brazil, 1976.
  21. Justo, L.J.E.C. Projeto Uaupés: Relatório Final de Pesquisa; CPRM: Manaus, Brazil, 1983.
  22. Takehara, L.; Almeida, M. Informe de Recursos Minerais—Avaliação do Potencial de Terras Raras no Brasil—Área Seis Lagos, Estado do Amazonas; Serviço Geológico do Brasil—CPRM: Brasília, Brazil, 2019.
  23. Hengl, T.; Nussbaum, M.; Wright, M.N.; Heuvelink, G.B.M.; Gräler, B. Random Forest as a Generic Framework for Predictive Modeling of Spatial and Spatio-Temporal Variables. PeerJ 2018, 6, e5518.
  24. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 2010, 33, 1–22.
  25. Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S, 4th ed.; Springer: New York, NY, USA, 2002.
  26. Hechenbichler, K.; Schliep, K. Weighted K-Nearest-Neighbor Techniques and Ordinal Classification; LMU Munich: Munich, Germany, 2004; discussion paper 399.
  27. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  28. McRoberts, R.E. Estimating Forest Attribute Parameters for Small Areas Using Nearest Neighbors Techniques. For. Ecol. Manag. 2012, 272, 3–12.
  29. Heung, B.; Bulmer, C.E.; Schmidt, M.G. Predictive Soil Parent Material Mapping at a Regional-Scale: A Random Forest Approach. Geoderma 2014, 214–215, 141–154.
  30. Dias, R.L.S.; da Silva, D.D.; Fernandes-Filho, E.I.; do Amaral, C.H.; dos Santos, E.P.; Marques, J.F.; Veloso, G.V. Machine Learning Models Applied to TSS Estimation in a Reservoir Using Multispectral Sensor Onboard to RPA. Ecol. Inform. 2021, 65, 101414.
  31. Duan, M.; Song, X.; Li, Z.; Zhang, X.; Ding, X.; Cui, D. Identifying Soil Groups and Selecting a High-Accuracy Classification Method Based on Multi-Textural Features with Optimal Window Sizes Using Remote Sensing Images. Ecol. Inform. 2024, 81, 102563.
  32. BRASIL Decree Law No. 9.985, of 18 July 2000. Establishes the National System of Nature Conservation Units (SNUC). Available online: https://www.planalto.gov.br/ccivil_03/decreto/2000/D9985.htm (accessed on 26 April 2025).
  33. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
  34. RStudio Team. RStudio: Integrated Development Environment for R; RStudio, PBC: Boston, MA, USA, 2020.
  35. ESRI. ArcGIS Desktop, version 10.6; ESRI: Redlands, CA, USA, 2016.
  36. Conrad, O.; Bechtel, B.; Bock, M.; Dietrich, H.; Fischer, E.; Gerlitz, L.; Wehberg, J.; Wichmann, V.; Böhner, J. System for Automated Geoscientific Analyses (SAGA) v. 2.1.4. Geosci. Model Dev. 2015, 8, 1991–2007.
  37. IBGE – Instituto Brasileiro de Geografia e Estatística. Modelos Digitais de Superfície – MDS. Available online: https://www.ibge.gov.br/geociencias/informacoes-ambientais/modelo-digital-de-terreno/15851-modelos-digitais-de-superficie.html (accessed on 26 April 2025).
  38. Sora, A.M.; Simbe, M.; Dias, J.; Uacane, M.S. Integration of Satellite Images to Identify Changes in the Xiluvo-Nhamatanda Carbonatite Suite. Educ.-Educ. Soc. Environ. 2018, 21, 251–263. Available online: https://periodicos.ufam.edu.br/index.php/educamazonia/article/view/5106 (accessed on 26 April 2025).
  39. Jeong, G.; Oeverdieck, H.; Park, S.J.; Huwe, B.; Ließ, M. Spatial Soil Nutrients Prediction Using Three Supervised Learning Methods for Assessment of Land Potentials in Complex Terrain. Catena 2017, 154, 73–84.
  40. Svetnik, V.; Liaw, A.; Tong, C.; Wang, T. Application of Breiman’s Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules. In Multiple Classifier Systems; Roli, F., Kittler, J., Windeatt, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; pp. 334–343.
  41. Kuhn, M. Building Predictive Models in R Using the Caret Package. J. Stat. Softw. 2008, 28, 1–26.
  42. Silveira, C.T.; Oka-Fiori, C.; Santos, L.J.C.; Sirtoli, A.E.; Silva, C.R.; Botelho, M.F. Soil Prediction Using Artificial Neural Networks and Topographic Attributes. Geoderma 2013, 195–196, 165–172.
  43. Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2, 18–22.
  44. Karatzoglou, A.; Smola, A.; Hornik, K. Kernlab: Kernel-Based Machine Learning Lab; CRAN: Vienna, Austria, 2024.
  45. Reis, G.B.; da Silva, D.D.; Filho, E.I.F.; Moreira, M.C.; Veloso, G.V.; de Fraga, M.S.; Pinheiro, S.A.R. Effect of Environmental Covariable Selection in the Hydrological Modeling Using Machine Learning Models to Predict Daily Streamflow. J. Environ. Manag. 2021, 290, 112625.
  46. Moquedace, C.M.; Baldi, C.G.O.; Siqueira, R.G.; Cardoso, I.M.; de Souza, E.F.M.; Fontes, R.L.F.; Francelino, M.R.; Gomes, L.C.; Fernandes-Filho, E.I. High-Resolution Mapping of Soil Carbon Stocks in the Western Amazon. Geoderma Reg. 2024, 36, e00773.
  47. Ditzler, C.; Scheffe, K.; Monger, H.C. (Eds.) Soil Survey Staff Soil Survey Manual; USDA Handbook: Washington, DC, USA, 2017.
  48. Rodrigues, H.M.; Vasques, G.M.; Oliveira, R.P.; Tavares, S.R.L.; Ceddia, M.B.; Hernani, L.C. Finding Suitable Transect Spacing and Sampling Designs for Accurate Soil ECa Mapping from EM38-MK2. Soil Syst. 2020, 4, 56.
  49. Grimm, R.; Behrens, T.; Märker, M.; Elsenbeer, H. Soil Organic Carbon Concentrations and Stocks on Barro Colorado Island—Digital Soil Mapping Using Random Forests Analysis. Geoderma 2008, 146, 102–113.
  50. Kirkwood, C.; Cave, M.; Beamish, D.; Grebby, S.; Ferreira, A. A Machine Learning Approach to Geochemical Mapping. J. Geochem. Explor. 2016, 167, 49–61.
  51. Bento, J.P.P.; Porto, C.G.; Takehara, L.; da Silva, F.J.; Bastos Neto, A.C.; Machado, M.L.; Duarte, A.C. Mineral Potential Re-Evaluation of the Seis Lagos Carbonatite Complex, Amazon, Brazil. Braz. J. Geol. 2022, 52, e20210031.
  52. Odeh, I.O.A.; McBratney, A.B.; Chittleborough, D.J. Further Results on Prediction of Soil Properties from Terrain Attributes: Heterotopic Cokriging and Regression-Kriging. Geoderma 1995, 67, 215–226.
  53. McKenzie, N.J.; Ryan, P.J. Spatial Prediction of Soil Properties Using Environmental Correlation. Geoderma 1999, 89, 67–94.
  54. Stumpf, F.; Schmidt, K.; Behrens, T.; Schönbrodt-Stitt, S.; Buzzo, G.; Dumperth, C.; Wadoux, A.; Xiang, W.; Scholten, T. Incorporating Limited Field Operability and Legacy Soil Samples in a Hypercube Sampling Design for Digital Soil Mapping. J. Plant Nutr. Soil Sci. 2016, 179, 499–509.
  55. Muhindo, K.G. Geology, Geochemistry and Economic Potential of the Bingo Carbonatite and Its Associated Laterites in Beni Territory, North Kivu, Democratic Republic of Congo (DRC). Master’s Thesis, University of Nairobi, Nairobi, Kenya, 2018.
  56. de Oliveira, J.R.S.; Pruski, F.F.; da Silva, J.M.A.; da Silva, D.P. Comparative Analysis of the Performance of Mixed Terraces and Level and Graded Terraces. Acta Sci. Agron. 2012, 34, 351–357.
  57. Rodrigues, N.; Lopes da Silva, J.C.; Pereira Marinatti da Silva, R.; Saraiva Koenow Pinheiro, H.; Carvalho Junior, W. Mapping of Fe2O3, Nb and TiO2, as a Support to Classify Outcropping Materials in “Morro DOS Seis Lagos” Carbonatite Complex, Brazilian Amazon. In Proceedings of the EGU General Assembly, Vienna, Austria, 23–27 May 2022; EGU22-12326.
  58. Guimarães, J.T.F.; Sahoo, P.K.; e Souza-Filho, P.W.M.; da Silva, M.S.; Rodrigues, T.M.; da Silva, E.F.; Reis, L.S.; de Figueiredo, M.M.J.C.; da Lopes, K.S.; Moraes, A.M.; et al. Landscape and Climate Changes in Southeastern Amazonia from Quaternary Records of Upland Lakes. Atmosphere 2023, 14, 621.
  59. Safaee, S.; Libohova, Z.; Kladivko, E.J.; Brown, A.; Winzeler, E.; Read, Q.; Rahmani, S.; Adhikari, K. Influence of Sample Size, Model Selection, and Land Use on Prediction Accuracy of Soil Properties. Geoderma Reg. 2024, 36, e00766.
  60. Malone, B.P.; Minasny, B.; Odgers, N.P.; McBratney, A.B. Using Model Averaging to Combine Soil Property Rasters from Legacy Soil Maps and from Point Data. Geoderma 2014, 232–234, 34–44.
  61. Zhang, J.; Schmidt, M.G.; Heung, B.; Bulmer, C.E.; Knudby, A. Using an Ensemble Learning Approach in Digital Soil Mapping of Soil pH for the Thompson-Okanagan Region of British Columbia. Can. J. Soil Sci. 2022, 102, 579–596.
  62. De Oliveira, S.M.B.; Pessenda, L.C.R.; Gouveia, S.E.M.; Fávaro, D.I.T.; Babinski, M. Geochemical Evidence of Soils Formed by the Interaction of Guano with Volcanic Rocks, Rata Island, Fernando de Noronha (PE). Geologia USP. Sci. Ser. 2009, 9, 3–12.
  63. Matschullat, J.; Martins, G.C.; Enzweiler, J.; von Fromm, S.F.; van Leeuwen, J.; de Lima, R.M.B.; Schneider, M.; Zurba, K. What Influences Upland Soil Chemistry in the Amazon Basin, Brazil? Major, Minor and Trace Elements in the Upper Rhizosphere. J. Geochem. Explor. 2020, 211, 106433.
  64. Thompson, J.A.; Bell, J.C.; Butler, C.A. Digital elevation model resolution: Effects on terrain attribute calculation and quantitative soil-landscape modeling. Geoderma 2001, 100, 67–89.
  65. Wilson, J.P.; Gallant, J.C.; Walsh, S.J. Topographic Mapping and Analysis in GIS. In Geographic Information Systems and Science; Ramsay, R.H., McCleary, R.E., Eds.; IntechOpen: London, UK, 2007; pp. 87–112.
  66. Hutchinson, M.F.; Gallant, J.C. Digital elevation models and representation of terrain shape. Math. Comput. Simul. 2000, 42, 135–150.
  67. Moore, I.D.; Burch, G.J. Physical basis of the length-slope factor in the Universal Soil Loss Equation. Soil Sci. Soc. Am. J. 1986, 50, 1294–1298.
  68. Olaya, V. SAGA GIS—Morphometry Tools. 2004. Available online: https://saga-gis.sourceforge.io/saga_tool_doc/2.2.3/ta_morphometry_6.html (accessed on 26 April 2025).
  69. Conrad, O. Mid-Slope Position Analysis in SAGA GIS; Geoscientific Model Development; SAGA GIS: San Diego, CA, USA, 2008.
  70. Gallant, J.C.; Dowling, T.I. A multiresolution index of valley bottom flatness for mapping depositional areas. Water Resour. Res. 2003, 39, 12.
  71. Conrad, O. Slope Index Calculation in SAGA GIS; SAGA GIS Documentation; SAGA GIS: San Diego, CA, USA, 2010.
  72. Rouse, J.W.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring vegetation systems in the Great Plains with ERTS. NASA Spec. Publ. 1973, 351, 309–317.
  73. Huete, A.R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
  74. Van der Meer, F.D.; Van der Werff, H.M.A.; Van Ruitenbeek, F.J.A.; Hecker, C.A.; Bakker, W.H.; Noomen, M.F.; Woldai, T. Multi-and hyperspectral geologic remote sensing: A review. Int. J. Appl. Earth Obs. Geoinf. 2014, 33, 161–175.
  75. Sabins, F.F. Remote Sensing Principles and Interpretation, 3rd ed.; W.H. Freeman and Company: New York, NY, USA, 1997.
  76. Kalinowski, A.; Oliver, S. ASTER Mineral Index Processing Manual; Geoscience Australia Technical Report; Geoscience Australia: Canberra, Australia, 2004.
  77. Rowan, L.C.; Mars, J.C.; Simpson, C.J. Hyperspectral analysis of the ultramafic complex of the Gossan Lead, Virginia, USA. Remote Sens. Environ. 2003, 88, 123–139.
  78. Sora, S.; Van der Meer, F.; Hecker, C. Spectral indices for detecting advanced argillic alteration in active hydrothermal systems. Int. J. Appl. Earth Obs. Geoinf. 2018, 69, 134–147.
  79. European Space Agency (ESA). Sentinel-1 SAR User Guide; ESA: Paris, France, 2021; Available online: https://sentinel.esa.int (accessed on 10 January 2024).
Figure 1. The methodological steps included (1) compilation of geochemical data from CPRM [22]; (2) extraction of remote sensing covariates (ASTER, SAR, and Sentinel-2A) and topographic data (DEM); (3) harmonization of geochemical data and covariates into a single database; (4) selection of covariates using Spearman’s correlation (threshold 0.95) and Recursive Feature Elimination (RFE), with partitioning of the database into training and test sets and use of Leave-One-Out Cross-Validation; (5) definition of sampling scenarios; (6) processing of models (RF, SVM, NNET, KNN, and GLMNET); (7) model performance evaluation based on precision, accuracy, and sampling design; and (8) calculation of the average spatial predictions of elements.
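The correlation screening named in step (4) can be illustrated with a minimal sketch. It assumes a greedy rule (a covariate is kept only if its absolute Spearman correlation with every covariate already kept does not exceed 0.95); the caption does not state which member of a correlated pair was dropped, so that rule, the function names, and the covariate values below are all hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_correlated(covariates, threshold=0.95):
    """Greedy filter (an assumption): keep a covariate only if |rho| with
    every previously kept covariate is at or below the threshold."""
    kept = []
    for name, values in covariates.items():
        if all(abs(spearman(values, covariates[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical covariate values at six sampling points.
covariates = {
    "elevation": [120.0, 135.0, 150.0, 142.0, 160.0, 128.0],
    "elevation_smoothed": [121.0, 134.0, 151.0, 143.0, 158.0, 129.0],
    "ndvi": [0.61, 0.42, 0.55, 0.70, 0.38, 0.66],
}
selected = drop_correlated(covariates)
print(selected)
```

In this toy input the two elevation layers rank the points identically, so only the first is retained, while the vegetation index is kept because its rank correlation with elevation is weak.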
Figure 2. Location of the study area and the sampling point locations.
Figure 3. Different scenarios for a combination of sampling designs in the composition of training and validation sets for modeling.
Figure 4. Coefficients of determination (R2) (A) among scenarios for all models and elements combined; (B) among models for all scenarios and elements combined; and (C) among elements for all models and scenarios combined.
Figure 5. Distributions of the expected values for the sampling scenarios for the elements Al2O3 (A), Fe2O3 (B), MnO (C), Nb2O5 (D), TiO2 (E), and SiO2 (F).
Figure 6. RMSE for each element by scenario and model (A), and for each sampling scenario by model and element (B).
Figure 7. Distribution of measured and predicted values for Fe2O3 (A), Nb2O5 (B), and TiO2 (C), based on the KNN, GLMNET, and RF models, respectively, and the locations of sampling points underlain by two major formations (laterite crust and talus).
Figure 8. Spatial distributions of element content: SiO2 (A), Al2O3 (B), Fe2O3 (C), TiO2 (D), MnO (E), and Nb2O5 (F).
Figure 9. Visualization of soil data distribution by sampling design for Fe2O3 (A), Nb2O5 (B), and TiO2 (C).
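Table 1. Summary statistics of elements, based on geologic formations.

| Element (% Wt) | Geology | Mean | Median | Min. | Max. | SD | CV (%) | Cs | Ck |
|---|---|---|---|---|---|---|---|---|---|
| Al2O3 | Combined | 3.28 | 1.67 | 0.19 | 58.0 | 6.4 | 195.9 | 5.3 | 31.4 |
| Al2O3 | Laterite/Depressions | 4.73 | 1.76 | 0.75 | 36.7 | 8.9 | 189.4 | 3.0 | 7.8 |
| Al2O3 | Laterite | 2.07 | 1.42 | 0.19 | 32.5 | 3.1 | 148.2 | 7.4 | 65.4 |
| Al2O3 | Talus | 6.31 | 2.71 | 0.26 | 58.0 | 10.5 | 165.6 | 3.1 | 9.6 |
| Fe2O3 | Combined | 67.0 | 77.4 | 0.12 | 95.1 | 22.6 | 508.0 | 1.7 | 1.9 |
| Fe2O3 | Laterite/Depressions | 68.6 | 73.9 | 7.10 | 87.5 | 19.7 | 28.7 | 19.7 | 3.5 |
| Fe2O3 | Laterite | 75.8 | 78.8 | 6.96 | 95.1 | 13.5 | 17.9 | −2.1 | 6.3 |
| Fe2O3 | Talus | 51.3 | 61.0 | 0.12 | 92.7 | 31 | 60.3 | −0.4 | −1.5 |
| TiO2 | Combined | 2.93 | 1.42 | bdl | 29.9 | 4.1 | 16.7 | 2.8 | 9.9 |
| TiO2 | Laterite/Depressions | 4.82 | 2.36 | 0.73 | 17.1 | 4.9 | 100.9 | 1.3 | 0.3 |
| TiO2 | Laterite | 3.46 | 1.86 | 0.04 | 29.9 | 4.4 | 126.6 | 2.5 | 8.2 |
| TiO2 | Talus | 1.25 | 0.50 | bdl | 18.3 | 2.3 | 182.1 | 5.0 | 33.1 |
| Nb2O5 | Combined | 0.74 | 0.39 | 0.00 | 4.02 | 0.9 | 0.8 | 1.4 | 1.5 |
| Nb2O5 | Laterite/Depressions | 1.53 | 1.12 | 0.53 | 3.81 | 1.0 | 66.9 | 0.9 | −0.4 |
| Nb2O5 | Laterite | 0.81 | 0.49 | bdl | 4.02 | 0.9 | 111.8 | 1.2 | 0.8 |
| Nb2O5 | Talus | 0.34 | 0.22 | bdl | 2.38 | 0.5 | 149.0 | 2.2 | 4.8 |
| MnO | Combined | 5.09 | 0.20 | bdl | 64.1 | 14.0 | 2.8 | 10.9 | 3.0 |
| MnO | Laterite/Depressions | 0.41 | 0.06 | bdl | 4.51 | 1.1 | 272.6 | 3.1 | 8.1 |
| MnO | Laterite | 1.62 | 0.17 | bdl | 63.8 | 6.8 | 416.5 | 6.5 | 47.0 |
| MnO | Talus | 14.67 | 0.66 | bdl | 64.1 | 22.0 | 150.1 | 1.1 | −0.4 |
| SiO2 | Combined | 2.74 | 0.56 | bdl | 97.8 | 11.6 | 422.6 | 7.0 | 50.7 |
| SiO2 | Laterite/Depressions | 2.95 | 0.56 | 0.12 | 36.5 | 9.3 | 314.3 | 3.1 | 8.5 |
| SiO2 | Laterite | 0.84 | 0.46 | bdl | 26.7 | 2.2 | 259.7 | 9.3 | 95.9 |
| SiO2 | Talus | 7.86 | 1.45 | 0.14 | 97.8 | 21.5 | 273.7 | 3.5 | 10.9 |

Min. = minimum; Max. = maximum; SD = standard deviation; CV = coefficient of variation; Cs = coefficient of asymmetry; Ck = kurtosis; bdl = below detection limit.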
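As a reading aid for the columns of Table 1, the sketch below computes the same kinds of summary statistics for a small sample. The estimators are assumptions, since the table does not state them: SD uses the n − 1 denominator, CV is 100 × SD / mean, and Cs and Ck are moment-based skewness and excess kurtosis (negative Ck values in the table suggest an excess form). The sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, median


def summarize(values):
    """Summary statistics in the style of Table 1 (estimators are assumed)."""
    n = len(values)
    m = mean(values)
    # Sample standard deviation with the n - 1 denominator.
    sd = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    # Central moments with the n denominator, used for shape statistics.
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return {
        "mean": m,
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "sd": sd,
        "cv_percent": 100 * sd / m,   # coefficient of variation
        "cs": m3 / m2 ** 1.5,         # skewness (coefficient of asymmetry)
        "ck": m4 / m2 ** 2 - 3,       # excess kurtosis
    }


# Hypothetical right-skewed concentrations (% weight).
stats = summarize([1.0, 2.0, 2.0, 3.0, 12.0])
for key, value in stats.items():
    print(f"{key}: {value:.2f}")
```

A single large value pulls the mean above the median and yields a positive Cs, the same pattern seen for most elements in the table.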
Table 2. Factors influencing model performance parameters, based on the Analysis of Variance (ANOVA), using the metrics RMSE (Root Mean Square Error), MAE (Mean Absolute Error), and R2 (Coefficient of Determination). The table presents the Sources of Variation, Degrees of Freedom (DF), Sum of Squares (SS), Mean Squares (MS), F-Statistics (F), and p-Values for each factor (Scenario, Model, Element, Interaction, and Error).

| Metric | Source of Variation | DF | SS | MS | F | p-Value |
|---|---|---|---|---|---|---|
| RMSE | Scenario | 4 | 551.2 | 137.8 | 1.7 | 0.15 |
| RMSE | Model | 4 | 1415.5 | 353.9 | 4.0 | 0.00 |
| RMSE | Element | 5 | 16,731.6 | 3346.3 | 41.5 | 0.00 |
| RMSE | Scenario × Model | 16 | 58.4 | 3.7 | 0.04 | 1.00 |
| RMSE | Error | 120 | 9675.9 | 80.6 | | |
| MAE | Scenario | 4 | 127.7 | 31.9 | 0.4 | 0.79 |
| MAE | Model | 4 | 1404.7 | 351.2 | 4.6 | 0.00 |
| MAE | Element | 5 | 11,783.7 | 2356.7 | 30.6 | 0.00 |
| MAE | Scenario × Model | 16 | 48.1 | 3.0 | 0.03 | 1.00 |
| MAE | Error | 120 | 9247.0 | 77.1 | | |
| R2 | Scenario | 4 | 1.0 | 0.25 | 41.22 | 0.00 |
| R2 | Model | 4 | 0.1 | 0.02 | 4.5 | 0.00 |
| R2 | Element | 5 | 0.1 | 0.02 | 3.6 | 0.00 |
| R2 | Scenario × Model | 16 | 0.2 | 0 | 1.6 | 0.08 |
| R2 | Error | 120 | 0.8 | 0 | | |
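The relationship between the columns of Table 2 can be checked with a short sketch: each mean square is SS / DF, and each F-statistic is the factor mean square divided by the error mean square. The example uses the Scenario and Element rows of the RMSE block; the function and variable names are illustrative only.

```python
def f_statistic(df, ss, error_df, error_ss):
    """Return (mean square, F) for one ANOVA factor."""
    ms = ss / df
    ms_error = error_ss / error_df
    return ms, ms / ms_error


# DF and SS values taken from the RMSE block of Table 2.
factors = {"Scenario": (4, 551.2), "Element": (5, 16731.6)}
error_df, error_ss = 120, 9675.9

results = {
    name: f_statistic(df, ss, error_df, error_ss)
    for name, (df, ss) in factors.items()
}
for name, (ms, f) in results.items():
    print(f"{name}: MS = {ms:.1f}, F = {f:.1f}")
```

Rounded to one decimal place, these reproduce the tabulated MS and F values for the two rows (137.8 and 1.7 for Scenario; 3346.3 and 41.5 for Element).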
Table 3. Mean comparisons of performance parameters among models and elements for the training and validation.

| Model | Training RMSE | Training MAE | Training R2 | Validation RMSE | Validation MAE | Validation R2 |
|---|---|---|---|---|---|---|
| NNET | 17.05 a | 13.47 a | 0.08 d | 18.52 a | 13.58 a | 0.05 b |
| SVMRadial | 7.87 b | 4.15 b | 0.56 b | 10.44 b | 5.19 b | 0.04 b |
| GLMNET | 7.51 b | 4.8 b | 0.93 a | 11.06 b | 6.62 b | 0.08 ab |
| KNN | 7.36 b | 4.36 b | 0.16 cd | 11.12 b | 6.14 b | 0.11 a |
| RF | 7.16 b | 4.27 b | 0.22 c | 10.84 b | 6.16 b | 0.1 ab |

| Element | Training RMSE | Training MAE | Training R2 | Validation RMSE | Validation MAE | Validation R2 |
|---|---|---|---|---|---|---|
| Fe2O3 | 29.17 a | 24.23 a | 0.34 a | 32.64 a | 26.69 a | 0.14 a |
| MnO | 11.81 b | 5.44 b | 0.47 a | 15.36 b | 7.96 b | 0.06 b |
| SiO2 | 6.07 bc | 2.43 b | 0.34 a | 15.4 b | 4.72 bc | 0.06 b |
| Al2O3 | 5.07 bc | 2.43 b | 0.4 a | 5.65 c | 2.51 bc | 0.06 b |
| TiO2 | 3.4 c | 2.13 b | 0.36 a | 4.52 c | 2.7 bc | 0.07 b |
| Nb2O5 | 0.81 c | 0.61 b | 0.42 a | 0.81 c | 0.64 c | 0.07 b |

Values followed by the same letter are not significantly different at p = 0.05.
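Table 4. Mean of performance parameters among sampling scenarios for all models combined.

| Element (%) | Scenario | Training RMSE | Training MAE | Training R2 | Validation RMSE | Validation MAE | Validation R2 | Validation Pc | Validation Bias |
|---|---|---|---|---|---|---|---|---|---|
| Al2O3 | I | 1.99 | 1.20 | 0.33 | 9.08 | 3.54 | 0.02 | 0.11 | −2.33 |
| Al2O3 | II | 9.58 | 4.56 | 0.46 | 4.22 | 2.49 | 0.02 | 0.07 | 0.24 |
| Al2O3 | III | 6.63 | 3.04 | 0.48 | 6.57 | 2.79 | 0.01 | 0.04 | −0.35 |
| Al2O3 | IV | 2.02 | 1.22 | 0.31 | 1.59 | 1.21 | 0.24 | 0.32 | −0.02 |
| Al2O3 | V | 6.63 | 2.84 | 0.43 | 6.02 | 2.47 | 0.04 | 0.15 | −0.25 |
| Al2O3 | Mean | 5.37 | 2.57 | 0.40 | 5.50 | 2.50 | 0.07 | 0.14 | −0.54 |
| Fe2O3 | I | 27.82 | 23.27 | 0.44 | 35.80 | 29.72 | 0.09 | 0.24 | 16.85 |
| Fe2O3 | II | 33.92 | 27.19 | 0.23 | 30.79 | 26.12 | 0.07 | 0.22 | 15.87 |
| Fe2O3 | III | 25.21 | 22.17 | 0.39 | 34.20 | 26.86 | 0.02 | 0.95 | −9.96 |
| Fe2O3 | IV | 28.50 | 23.74 | 0.39 | 32.40 | 25.46 | 0.44 | 0.53 | −4.39 |
| Fe2O3 | V | 30.43 | 24.84 | 0.30 | 30.01 | 25.33 | 0.09 | 0.25 | 17.29 |
| Fe2O3 | Mean | 29.18 | 24.24 | 0.35 | 32.64 | 26.70 | 0.14 | 0.44 | 12.87 |
| MnO | I | 12.13 | 5.84 | 0.50 | 18.21 | 11.71 | 0.03 | 0.16 | 6.52 |
| MnO | II | 13.64 | 5.87 | 0.42 | 13.51 | 6.17 | 0.12 | 0.27 | −2.18 |
| MnO | III | 7.03 | 3.19 | 0.32 | 15.00 | 6.31 | 0.02 | 0.11 | −3.38 |
| MnO | IV | 12.73 | 5.90 | 0.58 | 21.09 | 11.46 | 0.13 | 0.32 | −2.46 |
| MnO | V | 13.51 | 6.38 | 0.59 | 9.05 | 4.24 | 0.03 | −0.21 | 1.88 |
| MnO | Mean | 11.81 | 5.44 | 0.48 | 15.37 | 7.98 | 0.07 | 0.13 | 0.07 |
| Nb2O5 | I | 0.86 | 0.65 | 0.34 | 0.89 | 0.63 | 0.02 | 0.06 | −0.12 |
| Nb2O5 | II | 0.70 | 0.54 | 0.47 | 0.91 | 0.72 | 0.03 | 0.16 | 0.05 |
| Nb2O5 | III | 0.84 | 0.59 | 0.38 | 0.96 | 0.77 | 0.02 | 0.02 | 0.10 |
| Nb2O5 | IV | 0.86 | 0.66 | 0.47 | 0.46 | 0.38 | 0.26 | 0.29 | −0.01 |
| Nb2O5 | V | 0.83 | 0.62 | 0.49 | 0.87 | 0.71 | 0.07 | 0.25 | 0.09 |
| Nb2O5 | Mean | 0.82 | 0.61 | 0.43 | 0.82 | 0.64 | 0.08 | 0.16 | 0.02 |
| TiO2 | I | 4.59 | 2.81 | 0.32 | 3.76 | 2.69 | 0.03 | 0.13 | 0.07 |
| TiO2 | II | 3.28 | 2.05 | 0.28 | 4.57 | 2.69 | 0.05 | 0.20 | −0.95 |
| TiO2 | III | 2.60 | 1.73 | 0.30 | 4.61 | 2.66 | 0.01 | −0.26 | −0.88 |
| TiO2 | IV | 4.49 | 2.73 | 0.31 | 4.72 | 2.94 | 0.21 | 0.43 | −1.48 |
| TiO2 | V | 4.04 | 2.45 | 0.46 | 4.12 | 2.54 | 0.08 | 0.25 | −0.66 |
| TiO2 | Mean | 3.80 | 2.36 | 0.33 | 4.36 | 2.71 | 0.07 | 0.15 | −0.78 |
| SiO2 | I | 3.64 | 2.26 | 0.34 | 4.47 | 2.71 | 0.08 | 0.15 | −0.95 |
| SiO2 | II | 16.30 | 7.39 | 0.27 | 4.59 | 2.80 | 0.05 | 0.68 | 1.82 |
| SiO2 | III | 2.33 | 1.21 | 0.29 | 12.61 | 3.12 | 0.01 | 0.01 | −1.44 |
| SiO2 | IV | 0.70 | 0.38 | 0.36 | 28.79 | 9.51 | 0.20 | 0.39 | −9.26 |
| SiO2 | V | 10.21 | 2.77 | 0.51 | 14.26 | 3.65 | 0.00 | 0.04 | −0.98 |
| SiO2 | Mean | 6.64 | 2.80 | 0.35 | 12.94 | 4.36 | 0.07 | 0.25 | −2.16 |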
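For readers comparing the RMSE, MAE, R2, and Bias columns, the following minimal sketch shows how such validation metrics are commonly computed from observed and predicted values. Two conventions are assumptions rather than statements about the authors' workflow: R2 is taken as the squared Pearson correlation, and bias as the mean of predicted minus observed. The Pc column is not reproduced because its definition is not given here. The data are hypothetical.

```python
from math import sqrt


def validation_metrics(observed, predicted):
    """RMSE, MAE, R2 (squared Pearson correlation, assumed) and mean bias."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    bias = sum(errors) / n  # predicted minus observed (assumed sign convention)
    mo, mp = sum(observed) / n, sum(predicted) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    r2 = sxy ** 2 / (sxx * syy)
    return {"rmse": rmse, "mae": mae, "r2": r2, "bias": bias}


# Hypothetical case: every prediction is exactly 1 unit too high.
metrics = validation_metrics([2.0, 4.0, 6.0, 8.0], [3.0, 5.0, 7.0, 9.0])
print(metrics)
```

Under these conventions a constant offset leaves R2 at 1 while RMSE, MAE, and bias all equal the offset, which illustrates why correlation-based and error-based metrics can rank the same predictions differently.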
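Table 5. Means of performance parameters among the models for all scenarios combined.

| Element (%) | Model | Training RMSE | Training MAE | Training R2 | Validation RMSE | Validation MAE | Validation R2 | Validation Pc | Validation Bias |
|---|---|---|---|---|---|---|---|---|---|
| Al2O3 | RF | 5.16 | 2.62 | 0.12 | 5.51 | 2.64 | 0.06 | 0.19 | 0.20 |
| Al2O3 | SVMRadial | 5.30 | 2.21 | 0.64 | 5.30 | 2.05 | 0.09 | 0.17 | −1.03 |
| Al2O3 | NNET | 5.82 | 2.46 | 0.17 | 5.69 | 2.24 | 0.05 | −0.10 | −2.06 |
| Al2O3 | GLMNET | 5.27 | 2.77 | 1.00 | 5.38 | 2.77 | 0.09 | 0.26 | −0.10 |
| Al2O3 | KNN | 5.29 | 2.80 | 0.08 | 5.58 | 2.80 | 0.08 | 0.24 | 0.22 |
| Al2O3 | Mean | 5.37 | 2.57 | 0.40 | 5.50 | 2.50 | 0.07 | 0.15 | −0.56 |
| Fe2O3 | RF | 17.36 | 12.47 | 0.34 | 22.37 | 16.57 | 0.18 | −1.54 | 0.35 |
| Fe2O3 | SVMRadial | 19.16 | 13.06 | 0.39 | 22.82 | 15.51 | 0.11 | 4.22 | 0.26 |
| Fe2O3 | NNET | 71.85 | 68.53 | 0.04 | 70.78 | 66.61 | 0.05 | −66.58 | 0.01 |
| Fe2O3 | GLMNET | 18.42 | 13.63 | 0.76 | 24.41 | 18.26 | 0.17 | 1.00 | 0.31 |
| Fe2O3 | KNN | 19.08 | 13.50 | 0.21 | 22.83 | 16.53 | 0.20 | −0.57 | 0.40 |
| Fe2O3 | Mean | 29.18 | 24.24 | 0.35 | 32.64 | 26.70 | 0.14 | −12.69 | 0.27 |
| MnO | RF | 10.65 | 5.25 | 0.31 | 14.47 | 7.55 | 0.08 | 1.21 | 0.14 |
| MnO | SVMRadial | 12.48 | 4.98 | 0.78 | 14.37 | 6.18 | 0.02 | −3.08 | 0.09 |
| MnO | NNET | 13.82 | 5.11 | 0.04 | 14.29 | 5.36 | 0.06 | −4.44 | 0.13 |
| MnO | GLMNET | 11.11 | 6.58 | 1.00 | 15.59 | 9.78 | 0.09 | 1.88 | 0.29 |
| MnO | KNN | 11.37 | 5.48 | 0.26 | 16.15 | 8.83 | 0.08 | 1.96 | 0.22 |
| MnO | Mean | 11.88 | 5.48 | 0.48 | 14.97 | 7.54 | 0.07 | −0.50 | 0.18 |
| Nb2O5 | RF | 0.38 | 0.70 | 0.47 | 0.44 | 0.75 | 0.37 | 0.18 | 0.08 |
| Nb2O5 | SVMRadial | 0.70 | 0.80 | 0.68 | 0.22 | 0.78 | 0.61 | −0.01 | −0.03 |
| Nb2O5 | NNET | 0.26 | 0.78 | 0.50 | 0.22 | 0.78 | 0.50 | 0.10 | −0.01 |
| Nb2O5 | GLMNET | 0.97 | 1.29 | 1.00 | 0.21 | 1.50 | 0.93 | 0.11 | −0.10 |
| Nb2O5 | KNN | 0.27 | 0.78 | 0.51 | 0.34 | 0.77 | 0.51 | 0.33 | 0.07 |
| Nb2O5 | Mean | 0.52 | 0.87 | 0.64 | 0.28 | 0.92 | 0.59 | 0.14 | 0.00 |
| TiO2 | RF | 3.48 | 2.28 | 0.19 | 4.19 | 2.72 | 0.12 | 0.25 | −0.11 |
| TiO2 | SVMRadial | 3.80 | 2.32 | 0.32 | 4.38 | 2.63 | 0.02 | 0.07 | −1.01 |
| TiO2 | NNET | 4.33 | 2.34 | 0.04 | 4.66 | 2.52 | 0.04 | 0.18 | −2.12 |
| TiO2 | GLMNET | 3.20 | 2.55 | 1.00 | 3.88 | 3.33 | 0.51 | −0.03 | −0.62 |
| TiO2 | KNN | 3.74 | 2.40 | 0.12 | 4.08 | 2.61 | 0.12 | 0.22 | −0.17 |
| TiO2 | Mean | 3.71 | 2.38 | 0.37 | 4.24 | 2.76 | 0.16 | 0.14 | −0.81 |
| SiO2 | RF | 6.11 | 2.69 | 0.19 | 15.73 | 4.82 | 0.07 | 0.69 | −2.95 |
| SiO2 | SVMRadial | 6.06 | 1.97 | 0.35 | 14.68 | 4.10 | 0.03 | 0.11 | −3.58 |
| SiO2 | NNET | 6.15 | 2.04 | 0.13 | 14.67 | 4.06 | 0.08 | 0.18 | −3.55 |
| SiO2 | GLMNET | 6.15 | 3.00 | 0.86 | 15.72 | 5.37 | 0.09 | 0.23 | −1.94 |
| SiO2 | KNN | 5.77 | 2.42 | 0.22 | 16.28 | 5.26 | 0.06 | 0.17 | −2.13 |
| SiO2 | Mean | 6.05 | 2.43 | 0.35 | 15.41 | 4.72 | 0.07 | 0.27 | −2.83 |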
Rodrigues, N.B.; Barbosa, T.R.; Pinheiro, H.S.K.; Mancini, M.; Read, Q.D.; Blackstock, J.; Winzeler, E.H.; Miller, D.; Owens, P.R.; Libohova, Z. Influences of Sampling Design and Model Selection on Predictions of Chemical Compounds in Petroferric Formations in the Brazilian Amazon. Remote Sens. 2025, 17, 1644. https://doi.org/10.3390/rs17091644