Prospectivity Mapping of Heavy Mineral Ore Deposits Based upon Machine-Learning Algorithms: Columbite-Tantalite Deposits in West-Central Côte d’Ivoire

Abstract: This study aimed to model the prospectivity for placer deposits using geomorphic and landscape parameters. Within a geographic information system (GIS), spatial autocorrelation analysis of 3709 geochemical samples was used to identify prospective and non-prospective targets for columbite-tantalite (Nb-Ta) placer deposits of the Hana-Lobo (H-L) Geological Complex (West-Central Côte d’Ivoire, West Africa). Based on mineralization system analysis, hydrologic, geomorphologic and landscape parameters were extracted at the locations of the identified targets. Supervised automatic classification approaches, including Random Forest (RF), K-Nearest Neighbors (KNN) and Support Vector Machines (SVM), were applied to find a prospectivity model complex enough to capture the nature of the data. Metrics such as cross-validation accuracy (CVA), Receiver Operating Characteristic (ROC) curves, Area Under Curve (AUC) values and F-score values were used to evaluate the performance and robustness of the output models. The results showed that predictions from the final RF and KNN models were very close (κ = 0.56 and CVA = 0.69; κ = 0.54 and CVA = 0.68, respectively), whereas those from the SVM models were slightly lower (κ = 0.46 and CVA = 0.63). Independent validation confirmed the slightly higher performance of both the KNN and RF prospectivity models compared with the final SVM. Sensitivity analyses of both the KNN and RF prospectivity models for medium- and high-grade Nb-Ta deposits show a prediction rate of up to 90%.


Introduction
Supervised or semi-supervised machine-learning algorithms are often used to model the prospectivity of new exploration target areas when the relationships between ore deposit targets and input evidence features are complex and non-linear, or when the input datasets have different statistical distributions [1]. These algorithms include Support Vector Machines (SVM) [2][3][4][5][6], K-Nearest Neighbors (KNN) [7] and Random Forest (RF) [8][9][10][11][12]. Prediction methods based on such algorithms are increasingly used in Mineral Prospectivity Mapping (MPM) to predict the prospectivity of new regions with satisfactory success rates, especially when the input space is complex and the relationships between input criteria and mineral exploration targets are non-linear. For instance, predictive models based on logistic regression [13,14], Bayesian networks [15], SVM [4,6], neural networks [16,17] or decision trees such as RF [10] can be used to model, predict and map the prospectivity of new mineral exploration targets.
Prediction methods based on machine-learning algorithms such as restricted Boltzmann machines [18,19], wavelet neural networks [20] and group method of data handling neural networks [21] have demonstrated the high potential of artificial neural networks (ANN) to model the prospectivity of new mineral targets.
Figure 1 shows the location of the study area and of its three sites: Bemadi (southwest of the study area, 41.81 km²), Brokoua (center, 26.56 km²) and Makua (northeast, 24.47 km²). Only 14.4% (92.84 km²) of the study area has been sampled by soil geochemistry, in the Bemadi, Makua and Brokoua sites. The three explored sites (white and black dotted squares in Figure 1) lie on the periphery (Bemadi) or in the vicinity (Makua and Brokoua) of the pegmatitic-granite massifs.
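As a quick consistency check on these figures, the three site areas add up exactly to the 92.84 km² sampled, and dividing by the rounded 14.4% puts the whole study area at roughly 645 km². The sketch below only redoes this arithmetic; the total study area is not stated in this section, so the back-calculated value is an estimate.

```python
# Site areas in km², as reported for Figure 1.
SITE_AREAS_KM2 = {"Bemadi": 41.81, "Brokoua": 26.56, "Makua": 24.47}
SAMPLED_FRACTION = 0.144  # 14.4 % of the study area (rounded value from the text)


def sampled_area_km2(areas):
    """Total area of the soil-geochemistry sites, rounded to 0.01 km²."""
    return round(sum(areas.values()), 2)


def implied_study_area_km2(sampled, fraction):
    """Back-calculate the whole study area from the sampled fraction (estimate)."""
    return sampled / fraction


sampled = sampled_area_km2(SITE_AREAS_KM2)
total = implied_study_area_km2(sampled, SAMPLED_FRACTION)
print(f"sampled area: {sampled} km²; implied study area: about {total:.0f} km²")
```

Because 14.4% is itself rounded, the implied total is only good to a few km² either way.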
Precambrian rocks occupy 97% of the surface area of the country and include rocks of two orogenic episodes: the Liberian (3000-2580 Ma) and the Eburnean, of Lower Proterozoic age (2400-1550 Ma).
Figure 1. Location of the study area (Cruys, 1965 [25]; map of Africa source: © mappemonde.net, accessed on 18 October 2022).
The Precambrian basement consists of metasomatic granitic rocks (calc-alkaline granites and two-mica leucocratic granites) intruded by schistose veins or rocks of sedimentary origin. These include mica schists, sericite schists, chlorite schists, and micaceous and sandstone schists. Owing to pronounced geomorphogenesis, the region is almost entirely covered by eluvial, colluvial and alluvial deposits, which occupy almost all of the surfaces studied [30]. Moving inland from the Atlantic coast, the tropical rainforest that covers the southern part of the country gives way to semi-deciduous tropical forest and, ultimately, to tropical savannah [41].
Rarely visible in outcrop, the H-L volcano-sedimentary rocks can be observed in an altered state in exploration pits. The lithology consists primarily of sericite-schist and chlorite-schist rocks, sandstone, micaceous and arkosic schists and mica schists (grey and green in Figure 1), together with homogeneous leucocratic granites, calc-alkaline granites and two-mica granites (pink in Figure 1).
Granites occur in the region as domes extending to the northeast. The pegmatitic granite massifs of the region (Issia region, ~384 km from Abidjan) developed through several tectonic episodes, including an important phase of contact metamorphism that affected the adjoining schists and mica schists [30]; these belong to a shear zone described by Cruys [25] as favorable to Nb-Ta placer deposits. The geological complex is also marked by several kilometer-scale faults trending NS, NNW-SSE and NS-SE (red continuous or dashed lines in Figure 1).

Mineralization System Analysis and Data Used
Within the H-L Geological Complex, several studies have focused on the complex, non-linear relationships between known exploration indicators and the formation and emplacement of Nb-Ta placer deposits [28][29][30]. The mineral system under study belongs to the Man Shield (West African Craton), and its emplacement is related to the Eburnean Orogeny (2100 to 2070 Ma). Primary mineralization was emplaced in granites and pegmatites in a multi-phase tectonic context [30]. The location of the sampled Bemadi, Brokoua and Makua deposits, on the periphery or in the vicinity of mineralized granite-pegmatite massifs, is consistent with a granitic and pegmatitic origin [30]. Moreover, a previous study [30] showed that the primary Nb-Ta mineralization system fits a fractional crystallization model that explains the origin of the Nb-Ta mineralization (Figure 2a).
Figure 2. Figure from [30], modified with the consent of its author.
Chronologically, the geodynamic model of [30] (p. 273) for the primary metalliferous concentrations of beryl, lithium and especially Nb-Ta comprises four orogenic stages (A, B, C and D) that explain the formation of the granitic and pegmatitic rocks. Stage A is marked by erosion of sedimentary and volcano-sedimentary rocks (2106 ± 78 Ma). Stages B, C and D are marked by transcurrent tectonic deformation (D2, 2073 Ma) accompanied by progressive granitization (towards the surface) of a magma mineralized in niobium-tantalum (Nb-Ta), lithium (Li) and beryllium (Be).
This implies that the Nb-Ta-mineralized pegmatitic granite massifs were emplaced by exploiting the movement of geological faults (tensile fissures, Riedel fractures). The fracturing system of the Precambrian basement is therefore involved in the location of the mineralized pegmatitic granite massifs and of the Nb-Ta placer deposits in their vicinity. Moreover, there is a complex spatial relationship between fracture density and the location of the soil geochemical pits in the known Nb-Ta Bemadi, Brokoua and Makua deposits. Stage E is marked by a period of erosion by meteoric water. Field observations confirmed that the sampled Nb-Ta placer deposits are small-scale and lie near the surface, under low vegetation cover, in a flat layer of mica schists or altered pegmatitic rocks. The deposits usually consist of two gravelly layers of fine, medium or coarse grain size (characteristic of lateritic soils) [30] (Figure 2b).
Nb-Ta placer deposits have been described as secondary mineralization formed by the gravimetric separation and accumulation of valuable minerals whose specific gravity exceeds that of quartz (2.65), such as Nb-Ta minerals [25,26,30]. In addition, the Nb-Ta placer deposits developed in a humid tropical setting that favors, first, the disintegration of the mineralized primary rocks; second, the transport of altered products such as gravel; and third, the deposition of altered heavy minerals and sediments in natural trap zones. Placer deposits are favorable environments for the accumulation of sediments and minerals resulting from a succession of hydro-morphologic processes. These processes are marked by major drainage and alluviation phases, followed by a phase of fragmentation and wear of vein rocks (hydrothermal alteration) such as pegmatites and quartz [25,26]. Moreover, there is a complex spatial relationship between hydro-morphologic factors and the location of the soil geochemical pits in the known Nb-Ta Bemadi, Brokoua and Makua deposits. Analysis of the mineralization system thus indicates that hydro-morphologic parameters (such as drainage processes) and geomorphic and landscape parameters (such as ground unevenness and the fracturing of the Precambrian basement) are involved in the mineralization processes and in the location of the Nb-Ta placer deposits studied.
Table 1 summarizes the main types of data considered. Geochemical data were acquired in the field by soil sampling: a total of 3709 geochemical samples were taken within the granite-schist shear zones described by Cruys [25], in the Bemadi, Makua and Brokoua sites. Sampling pits measured 1.20 m long by 0.80 m wide, with a maximum depth of 3 m, and were positioned on a regular 100 m by 50 m grid.
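The pit layout can be sketched as a regular grid of pit centers. This minimal illustration assumes a rectangular site in hypothetical local metric coordinates, with the 100 m spacing taken along x and the 50 m spacing along y (the text does not state the grid orientation); `regular_grid` and `max_pit_volume_m3` are illustrative helpers, not part of the authors' workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pit:
    x: float  # easting in m (hypothetical local coordinates)
    y: float  # northing in m


def regular_grid(x0, y0, width_m, height_m, dx=100.0, dy=50.0):
    """Pit centers on a regular dx-by-dy grid covering a rectangular site.

    dx = 100 m and dy = 50 m follow the spacing quoted in the text;
    assigning 100 m to x and 50 m to y is an assumption.
    """
    nx = int(width_m // dx) + 1
    ny = int(height_m // dy) + 1
    return [Pit(x0 + i * dx, y0 + j * dy) for j in range(ny) for i in range(nx)]


def max_pit_volume_m3(length=1.20, width=0.80, max_depth=3.0):
    """Largest excavated volume of a 1.20 m x 0.80 m pit dug to 3 m."""
    return length * width * max_depth


pits = regular_grid(0.0, 0.0, width_m=1000.0, height_m=500.0)
print(len(pits))                      # 11 columns x 11 rows = 121 pits
print(round(max_pit_volume_m3(), 2))  # 2.88
```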
Average Nb-Ta concentrate grades (in grams per cubic meter) of the 3709 gravel samples were then calculated. The geochemical exploration survey data collected for this study consist of the geographical coordinates of each pit and the average Nb-Ta grade sampled in the pit.
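The text does not give the formula behind the average grades. One plausible reading, sketched below purely as an assumption, is a volume-weighted mean over the gravel layers of a pit (total concentrate mass in grams divided by total gravel volume in cubic meters). The two-layer pit and its values are invented for illustration, echoing the two gravelly layers described for these deposits.

```python
def pit_average_grade(layers):
    """Volume-weighted mean Nb-Ta concentrate grade (g/m³) of one pit.

    `layers` is a list of (concentrate mass in g, gravel volume in m³) pairs.
    Weighting by volume is an assumption, not the authors' stated method.
    """
    total_mass = sum(mass for mass, _ in layers)
    total_volume = sum(volume for _, volume in layers)
    if total_volume <= 0:
        raise ValueError("total gravel volume must be positive")
    return total_mass / total_volume


# Hypothetical survey record: pit coordinates plus its average grade.
# Two invented gravel layers: 120 g in 0.96 m³ and 60 g in 0.48 m³.
record = {
    "x": 250.0,
    "y": 100.0,
    "grade_g_m3": pit_average_grade([(120.0, 0.96), (60.0, 0.48)]),
}
print(record)
```

Each record then carries exactly the two items the survey data are said to contain: the pit location and its average grade.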

Data Used
Geological data in vector format were taken from the 1:200,000-scale geological map covering the study area. The geological information used in this study consists of the locations of granitic pegmatite outcrops, known Nb-Ta, Be and Li mineral occurrences, and the granitic and schist formations hosting the mineralization.
Structural data were extracted from different satellite images. These included a Sentinel-2B MSI Level-1C multispectral image (acquired on 24 December 2017 in a descending orbit), which contains 13 spectral bands ranging from the Visible and Near Infrared (VNIR) to the Shortwave Infrared (SWIR), i.e., between 443 and 2190 nm. The second image is a Sentinel-1A Interferometric Wide (IW) swath C-band radar image (HH and VV single polarization, 40 m resolution), acquired on 6 January 2018 in a descending orbit. These data were used to extract natural lineaments and to map the fracture system of the study area. Finally, Esri World Imagery and SPOT image mosaics, at resolutions of 15 m to 2.5 m, were used to validate the natural lineaments visually and to check the appearance of the mapped objects.

Methodology
Given the lack of the data that would be required to investigate the existing spatial relationships between exploration criteria and Nb-Ta placer deposits, the selected MPM approach relied on the predictive power of supervised classification algorithms to model new prospective and non-prospective Nb-Ta placer targets. The overall approach is summarized in the flowchart in Figure 3; the workflow is structured around six main steps, denoted S1 to S6 and described below.


Classification with SVM, KNN and RF Machine-Learning Algorithms
Algorithms such as SVMs were designed for binary classification and do not natively support classification tasks with more than two classes. One way to use binary classifiers for multi-class problems is to split the multi-class dataset into several binary datasets and fit a binary classifier on each. Two common strategies are One-vs.-All (OvA), which trains one classifier per class against all the others and predicts with the most confident one, and One-vs.-One (OvO), which trains one classifier per pair of classes and predicts by majority vote. The SVM model has been widely documented in the literature (e.g., [42,43]). In a k-dimensional feature space, SVM separates classes with (k-1)-dimensional hyperplanes chosen to maximize the margin between the training vectors of the different classes. SVM, which can be used for both classification and regression, is nonparametric in the sense that it makes no assumption about the statistical distribution of the data, which is an important advantage. In this study, SVM models with Gaussian, cubic and quadratic kernels were tested, offering different levels of flexibility and interpretability.
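The OvA/OvO decomposition and the three kernels can be illustrated with scikit-learn, used here only as a stand-in (the text does not name the software). The data are synthetic, generated with `make_classification` with four hypothetical classes in place of the real predictor table; "Gaussian" maps to the RBF kernel and "quadratic"/"cubic" to polynomial kernels of degree 2 and 3. Note that scikit-learn's `SVC` already handles multi-class problems internally with OvO, so the wrappers mainly make the two strategies explicit.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the predictor table (5 features, 4 hypothetical classes).
X, y = make_classification(n_samples=400, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=4, random_state=0)

# "Gaussian" -> RBF kernel; "quadratic"/"cubic" -> polynomial kernels.
KERNELS = {
    "gaussian": dict(kernel="rbf", gamma="scale"),
    "quadratic": dict(kernel="poly", degree=2),
    "cubic": dict(kernel="poly", degree=3),
}

for name, params in KERNELS.items():
    for strategy in (OneVsRestClassifier, OneVsOneClassifier):
        model = make_pipeline(StandardScaler(), strategy(SVC(**params)))
        cva = cross_val_score(model, X, y, cv=5).mean()  # 5-fold cross-validation accuracy
        print(f"{name:<9} {strategy.__name__:<20} CVA = {cva:.2f}")

# OvA fits one binary SVM per class, OvO one per pair of classes.
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)
ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)
print(len(ovr.estimators_), len(ovo.estimators_))  # 4 6
```

With four classes, OvA fits four binary SVMs and OvO fits six (4 × 3 / 2). The CVA values printed for the synthetic data say nothing about the study's results.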
The RF classification method is an ensemble classifier whose predictions are based on multiple decision trees. RF builds a set of classification and regression trees, each from a bootstrap sample drawn with replacement from the training dataset, a process referred to as "bagging" (bootstrap aggregation). Bagging fits multiple models on different bootstrap samples of the training data and then aggregates their predictions, by majority vote among the trees in the case of classification. Combined with the random selection of a subset of predictors at each split, bootstrap sampling increases the diversity of the trees and reduces their correlation, which limits overfitting [44]. RF has been widely used in classification problems; see [43,45,46] for more details on the method. RF generally produces efficient classification results, in particular when classes with large misclassification costs are over-sampled and those with low misclassification costs are under-sampled. In this study, parameters were varied during learning to obtain the best possible results; for example, the number of learners (trees) and the maximum number of splits, which controls tree depth, were tuned to adjust the flexibility of the output predictive models.
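To make bagging and predictor subsampling concrete, the sketch below builds a tiny random forest in plain Python. To stay short, each tree is limited to a single split (a decision stump, i.e. a maximum number of splits of 1); because a stump has only one split, drawing the predictor subset once per tree is equivalent to drawing it per split. `fit_stump`, `fit_forest` and `predict` are hypothetical helpers, and `n_learners` only mirrors the "number of learners" parameter mentioned in the text; the toy data are invented.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Decision tree with a single split, searched over the given predictors."""
    best = None  # (correct count, feature, threshold, left label, right label)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab, l_n = Counter(left).most_common(1)[0]
            r_lab, r_n = Counter(right).most_common(1)[0]
            if best is None or l_n + r_n > best[0]:
                best = (l_n + r_n, f, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return lambda row: majority
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def fit_forest(X, y, n_learners=30, n_predictors=None, seed=0):
    """Bagged stumps, each trained on a bootstrap sample and a predictor subset."""
    rng = random.Random(seed)
    p = len(X[0])
    m = n_predictors or max(1, int(p ** 0.5))  # common default: sqrt(p)
    forest = []
    for _ in range(n_learners):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # drawn with replacement
        predictors = rng.sample(range(p), m)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], predictors))
    return forest


def predict(forest, row):
    """Aggregate the trees' predictions by majority vote."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


# Toy data: two identical predictors, class 0 below 5 and class 1 from 5 upwards.
X = [[a, a] for a in range(10)]
y = [0 if a < 5 else 1 for a in range(10)]
forest = fit_forest(X, y, n_learners=30)
print(predict(forest, [0, 0]), predict(forest, [9, 9]))
```

Increasing the maximum number of splits would replace `fit_stump` with a recursive tree builder; the bagging and voting logic stays the same.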
The KNN algorithm is well known in remote sensing classification; it assigns each sample to the class most represented among its k nearest neighbors, found with a distance measure such as the Euclidean or weighted Euclidean distance [47]. In this study, several KNN variants were analyzed (fine, medium, coarse, cosine, cubic and weighted), and the training parameters were varied to exploit the flexibility and interpretability of the different output models.
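The KNN variants differ mainly in the number of neighbors, the distance metric and the voting weights. The sketch below assumes settings that follow the presets of the same names in MATLAB's Classification Learner (fine: k = 1; medium: k = 10; coarse: k = 100; cosine: cosine distance; cubic: Minkowski distance with p = 3; weighted: inverse squared-distance weights); the text does not state the software or the k values actually used, and `knn_predict` is a hypothetical helper.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    return math.dist(a, b)


def minkowski3(a, b):
    """'Cubic' distance: Minkowski metric with exponent p = 3."""
    return sum(abs(u - v) ** 3 for u, v in zip(a, b)) ** (1 / 3)


def cosine(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - sum(u * v for u, v in zip(a, b)) / (na * nb)


def knn_predict(X, y, query, k=10, metric=euclidean, weighted=False):
    """Vote among the k nearest samples, optionally weighted by 1/d²."""
    scored = sorted(((metric(x, query), label) for x, label in zip(X, y)),
                    key=lambda pair: pair[0])
    votes = defaultdict(float)
    for d, label in scored[:k]:
        votes[label] += 1.0 / (d * d + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)


# Assumed variant settings (names as in the text, values as described above).
VARIANTS = {
    "fine": dict(k=1),
    "medium": dict(k=10),
    "coarse": dict(k=100),
    "cosine": dict(k=10, metric=cosine),
    "cubic": dict(k=10, metric=minkowski3),
    "weighted": dict(k=10, weighted=True),
}
```

A call such as `knn_predict(X, y, query, **VARIANTS["weighted"])` selects a variant; with few training samples, unweighted votes over a large k can tie, which is one reason the weighted variant exists.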

Extraction of Structural Evidence Map
The fractures and tensile fissures generated by the transcurrent deformation (D2) (see Figure 2a) contributed to the migration of the mineralizing fluids responsible for the emplacement of the mineralized pegmatitic granite. Only one structural evidence criterion was considered in this study, namely fracture density. The fracture system is likely to favor the formation of topographic anfractuosities (i.e., winding or circuitous channels or passages) and to influence the emplacement of potentially mineralized host rocks (pegmatitic granite).
To map fracture density, lineaments were extracted from Sentinel optical and radar images (see Table 1), following the extraction techniques of previous studies [48][49][50]. The first [48] showed, by comparing five satellite image enhancement techniques (mean of all bands, Principal Component Analysis (PCA), band ratio, histogram equalization and high-pass filtering), that PCA is an effective technique for the automatic identification of geological lineaments. The second [49] used the second principal component to map geological lineaments from Radarsat-1 imagery, and the third [50] used the VNIR and SWIR bands of OLI and ASTER images.
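The PCA enhancement step can be sketched for two co-registered bands, for example B08 and B11 after resampling. This is a minimal plain-Python illustration using the closed-form eigen-decomposition of a 2 × 2 covariance matrix; which components were passed to the lineament extractor in this study is not specified, and `pca_two_bands` is a hypothetical helper.

```python
import math


def pca_two_bands(b1, b2):
    """Project two co-registered bands (flattened pixel lists) onto PC1 and PC2.

    Returns ((lambda1, lambda2), pc1, pc2), with lambda1 >= lambda2 the
    variances along the two principal axes.
    """
    n = len(b1)
    m1, m2 = sum(b1) / n, sum(b2) / n
    c1 = [v - m1 for v in b1]  # centered band 1
    c2 = [v - m2 for v in b2]  # centered band 2
    s11 = sum(v * v for v in c1) / (n - 1)
    s22 = sum(v * v for v in c2) / (n - 1)
    s12 = sum(a * b for a, b in zip(c1, c2)) / (n - 1)
    # Eigenvalues of the symmetric covariance matrix [[s11, s12], [s12, s22]].
    half_trace = (s11 + s22) / 2
    disc = math.sqrt(((s11 - s22) / 2) ** 2 + s12 ** 2)
    lam1, lam2 = half_trace + disc, half_trace - disc
    # Orientation of the major axis: tan(2*theta) = 2*s12 / (s11 - s22).
    theta = 0.5 * math.atan2(2 * s12, s11 - s22)
    e1 = (math.cos(theta), math.sin(theta))   # eigenvector of lam1
    e2 = (-math.sin(theta), math.cos(theta))  # eigenvector of lam2
    pc1 = [a * e1[0] + b * e1[1] for a, b in zip(c1, c2)]
    pc2 = [a * e2[0] + b * e2[1] for a, b in zip(c1, c2)]
    return (lam1, lam2), pc1, pc2
```

For perfectly correlated bands, PC2 carries no variance; in real scenes the lower-variance components often retain the local contrasts (edges) that lineament detection relies on.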
In this study, the B11 (SWIR) and B08 (NIR) bands were resampled to 10 m resolution, and the VV bias gamma band of the Sentinel-1 SAR IW GRD image was radiometrically and geometrically corrected at 10 m resolution. Previous works [51,52] recommend a minimum length of 10 pixels as the acceptable detection threshold (LTHR); a filter radius (RADI) of 3 to 8 pixels to avoid introducing noise into the process; a minimum edge detection level (GTHR) of 10 to 70 pixels; a maximum tolerance (FTHR) of 2 to 5 for fitting a polyline curve; a maximum angle (ATHR) of 3 to 20 degrees between two polylines to be linked; and a maximum distance (DTHR) of 10 to 45 pixels for linking two polylines. In this study, lineaments were extracted with the following optimal parameters: RADI: 5; GTHR: 30; LTHR: 5; FTHR: 5; ATHR: 30; and DTHR: 30. A regional rose diagram of bearings (angles measured from geographic north) was then computed with the Rose and Stereonet chart-plotting program (Rose.net), which draws structural geology rose charts [53]. Figure 4 shows the major regional fracture directions obtained from the lineaments.
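The parameter names RADI, GTHR, LTHR, FTHR, ATHR and DTHR match those of the LINE lineament-extraction algorithm in PCI Geomatica, although the text does not name the software. A small check, sketched below, compares the values used with the ranges quoted from [51,52]; it flags LTHR (5, below the recommended minimum of 10) and ATHR (30, above the recommended maximum of 20). The reason for these departures is not given in this section.

```python
# Ranges quoted from previous works [51,52] (pixels, except ATHR in degrees).
RECOMMENDED = {
    "RADI": (3, 8),
    "GTHR": (10, 70),
    "LTHR": (10, None),  # "minimum length of 10 pixels": no upper bound quoted
    "FTHR": (2, 5),
    "ATHR": (3, 20),
    "DTHR": (10, 45),
}
# Values used in this study.
USED = {"RADI": 5, "GTHR": 30, "LTHR": 5, "FTHR": 5, "ATHR": 30, "DTHR": 30}


def outside_range(used, recommended):
    """Return {name: (value, low, high)} for values outside the quoted ranges."""
    flagged = {}
    for name, value in used.items():
        low, high = recommended[name]
        if (low is not None and value < low) or (high is not None and value > high):
            flagged[name] = (value, low, high)
    return flagged


for name, (value, low, high) in outside_range(USED, RECOMMENDED).items():
    print(f"{name} = {value} lies outside the quoted range {low}-{high or '...'}")
```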
Minerals 2022, 12, x FOR PEER REVIEW 9 of 32
Major fracture orientations identified in this study were north to south (0° to 10° from north), northeast to southwest (20° to 50°), east to west (90° to 110°) and southeast to northwest (120° to 150°). The average bearing of the study area was estimated at 65.8° (red line in Figure 4). The extracted geological lineaments were validated against the literature and the high-resolution satellite images available for the study area. Their directions agree well with the regional tectonic deformation system that gave rise to the straight and isoclinal folds observed in the shear zone (granite schists) and previously reported by field geologists [25,29,54].
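Rose diagrams of lineaments treat orientations as axial data (a bearing of 30° and one of 210° describe the same line), so both binning and averaging work on the 0-180° range. The sketch below computes bearings from segment endpoints, bins them into 10° sectors and takes an axial mean by doubling the angles, averaging unit vectors and halving the result. How Rose.net derived the 65.8° average is not described here, so the doubled-angle mean is an assumed, standard choice rather than the program's documented method; the helper names and segments are hypothetical.

```python
import math
from collections import Counter


def bearing_deg(x1, y1, x2, y2):
    """Axial bearing (0-180°) of a segment, clockwise from geographic north.

    x is easting and y is northing, so atan2(dx, dy) measures from north.
    """
    return math.degrees(math.atan2(x2 - x1, y2 - y1)) % 180.0


def rose_bins(bearings, width=10):
    """Count bearings per sector [0, width), [width, 2*width), ..."""
    return Counter(int(b // width) * width for b in bearings)


def axial_mean(bearings):
    """Mean of axial data: double the angles, average unit vectors, halve."""
    s = sum(math.sin(math.radians(2 * b)) for b in bearings)
    c = sum(math.cos(math.radians(2 * b)) for b in bearings)
    return (math.degrees(math.atan2(s, c)) / 2) % 180.0


segments = [((0, 0), (10, 100)), ((0, 0), (80, 60)), ((0, 0), (100, -20))]
bearings = [bearing_deg(x1, y1, x2, y2) for (x1, y1), (x2, y2) in segments]
print(sorted(rose_bins(bearings).items()), round(axial_mean(bearings), 1))
```

A plain arithmetic mean would fail here: bearings of 170° and 10° average to 90°, whereas the axial mean correctly returns a north-south direction.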
These field studies [25,29,54] also agree with the results of structural mapping [55,56] of the bedrock of the Lobo Basin in west-central Côte d'Ivoire. The geological lineaments were visually validated using three data sources: a false-color composite (red, green, blue) of bands 4, 3 and 2 of the Sentinel-2B image; Esri World Imagery and SPOT-5 image mosaics of the study area; and the geological map. This step was necessary to separate natural lineaments from anthropogenic ones, such as boundaries of cultivated land or forests, boundaries of built-up areas, and road or rail routes [52,57,58]. A fracture density map (Figure 5a) was then constructed in a GIS from the validated geological lineaments.
Figure 5b,c illustrates, respectively, the flow accumulation areas and the flow amplitude (i.e., flow magnitude and flow direction), the two hydrological evidence criteria considered in this study.
The values of the density parameter increase from green to red (0 to 2.6). Yellow to yellow-orange areas correspond to medium fracturing, magenta to red areas are heavily fractured, and green, dark blue and turquoise blue areas are less fractured (Figure 5a). In step S2, fracture density values were extracted at the geographic coordinates of each pixel of the study area; the values at the pit locations were then used as the structural predictor variable for the machine-learning algorithms.
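The fracture density map and the step S2 extraction can be sketched as a line-density grid followed by a point lookup. The version below is a deliberate simplification (assumed, not the GIS tool used in the study): each lineament's full length is credited to the grid cell containing its midpoint, density is expressed in km of lineament per km², and a pit takes the value of the cell it falls in. The 500 m cell size, the coordinates and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def fracture_density(segments, cell_m=500.0):
    """Lineament length per unit area (km per km²) on a square grid.

    Simplification (assumed): each segment's whole length is credited to the
    cell containing its midpoint; GIS line-density tools use a search radius.
    """
    total_km = defaultdict(float)
    for (x1, y1), (x2, y2) in segments:
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2
        cell = (int(mx // cell_m), int(my // cell_m))
        total_km[cell] += math.hypot(x2 - x1, y2 - y1) / 1000.0
    cell_area_km2 = (cell_m / 1000.0) ** 2
    return {cell: km / cell_area_km2 for cell, km in total_km.items()}


def sample_at(density, x, y, cell_m=500.0):
    """Density of the cell containing point (x, y); 0 where no lineament lies."""
    return density.get((int(x // cell_m), int(y // cell_m)), 0.0)


# Hypothetical lineaments (m) and pit locations.
segments = [((0, 0), (300, 400)), ((600, 100), (900, 100))]
density = fracture_density(segments)
pits = [(100, 100), (700, 50), (2000, 2000)]
print([round(sample_at(density, x, y), 2) for x, y in pits])
```

The list of sampled values is what would become the structural predictor column for the classifiers.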

Extraction of Hydrological Evidence Criteria
Field observation of the pit profiles showed a sequence of graded (grain-size-sorted) gravel layers overlying Precambrian bedrock. Graded bedding in a placer-type deposit implies control by hydrologic processes (such as drainage) and by geomorphologic or landscape factors (such as relief and slope shape). These factors control, first, the erosion of the primary mineralized rocks and, second, the transport and accumulation of the erosion products (i.e., sediments and altered minerals) in natural trap areas.
Mapping the natural concentration axes of surface runoff therefore highlights where runoff can erode, mobilize, transport and deposit sediments and altered Nb-Ta minerals within catchments. The hydrologic exploration criteria considered in this study are the flow accumulation area and the drainage flow amplitude, together with the hydrographic network. Runoff catchment areas and the network through which water flows to an outlet together form the drainage system. Flow amplitude and flow accumulation surfaces represent the spatial variation of water flow, parameterized as a value computed at each pixel (or cell) of a digital elevation model (DEM) covering the extent of the study area.
The runoff accumulation capacity of the catchments increases towards the riverbeds, grading from yellow-orange to white pixels (Figure 5b). Areas favorable for runoff accumulation occupy the riverbeds and the banks of rivers or marshes. These areas have high accumulation values and generate strong river currents that favor soil erosion and sedimentation. Lower accumulation values (green pixels in Figure 5b) occupy large parts of the valleys or alluvial plains, which favor long-distance sediment transport and deposition.
Flow amplitudes or directions also highlight, on either side of the marshes and rivers, areas of asymmetric interfluves (green pixels on Figure 5c) and topographic irregularities (yellow-orange) that occupy areas of ridges, flanks or summits of the massifs, or peneplains of the region. The direction of runoff flow is naturally from orange-yellow pixels (elevated areas) to green pixels of increasingly lower topographic relief (see Figure 5c).
In step S2, the hydrologic parameters were computed with the algorithms available in ArcGIS v.10.2, and their values were extracted at the geographic coordinates of each pixel of the study area. The values extracted at each pit location were used as hydrological predictor variables for the machine-learning algorithms.
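The flow accumulation criterion can be illustrated with the classic D8 scheme: each cell drains to its steepest-descent neighbor among the eight surrounding cells, and accumulation counts the upstream cells draining through each cell. The sketch below assumes a small DEM stored as a list of lists with a uniform cell size; sinks simply keep their water and flat areas are not resolved, so it simplifies what the ArcGIS hydrology tools do rather than reproducing them.

```python
import math

# Offsets (row, column) of the eight neighbors of a cell.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_receivers(dem):
    """Map each cell to its steepest-descent neighbor, or None for sinks and flats."""
    rows, cols = len(dem), len(dem[0])
    receivers = {}
    for i in range(rows):
        for j in range(cols):
            best_drop, target = 0.0, None
            for di, dj in NEIGHBORS:
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    # Drop per unit distance; diagonal neighbors are sqrt(2) cells away.
                    drop = (dem[i][j] - dem[ni][nj]) / math.hypot(di, dj)
                    if drop > best_drop:
                        best_drop, target = drop, (ni, nj)
            receivers[(i, j)] = target
    return receivers


def flow_accumulation(dem):
    """Number of upstream cells draining through each cell (the cell itself excluded)."""
    rows, cols = len(dem), len(dem[0])
    receivers = d8_receivers(dem)
    acc = [[0] * cols for _ in range(rows)]
    # Water only flows strictly downhill, so visiting cells from highest to lowest
    # guarantees that every contributor is processed before the cell it drains into.
    for i, j in sorted(receivers, key=lambda cell: dem[cell[0]][cell[1]], reverse=True):
        target = receivers[(i, j)]
        if target is not None:
            ti, tj = target
            acc[ti][tj] += acc[i][j] + 1
    return acc


valley = [[5, 4, 5],
          [4, 3, 4],
          [5, 4, 5]]
print(flow_accumulation(valley))  # all eight border cells drain to the central cell
```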

Extraction of Geomorphologic Evidence Criteria
Landscape geomorphologic parameters are likely to influence the acceleration, deceleration, convergence or divergence of runoff at the scale of the study sub-catchments. The elements mapped in this study are slope curvature (i.e., concavity or convexity of the slope profile), slope inclination and landform (relief). The GIS software used provides all the functionality required to derive these parameters from the DEM (Bonham-Carter, 2013) [34]. Figure 6a-c illustrate the geomorphological evidence maps used.
Slope curvatures (Figure 6a) show areas with very concave slopes (dark brown) that are favorable for erosion, areas with very convex slopes (dark blue) that are favorable for runoff acceleration, and quasi-planar areas (light brown) that are unfavorable for sediment erosion or runoff.
With respect to slope inclination (Figure 6b), steep slopes (dark brown) favor sediment flow and drainage, while increasingly gentle slopes slow runoff and, consequently, trap heavy minerals over time in areas with topographic irregularities. The relief map shows that all mineral targets occur at altitudes between 200 and 393 m (Figure 6c). Dark brown pixels indicate alluvial, colluvial and eluvial zones occupied by placer deposits. Ochre-yellow to light brown pixels mark the lowest topographic levels (riverbeds and gullies), while dark brown pixels indicate increasingly higher areas ranging from shallows to hillsides, plateaus and ridges. In step S2, the geomorphologic parameter values were extracted at the geographic coordinates of each cell of the study area. Most evidence maps used are at a scale of 1:30,000, which provides sufficient accuracy for camp-scale mineral predictive mapping [42]. Finally, all the geomorphologic, hydrologic and structural parameter values extracted at the placer deposit locations were used as predictor variables for the machine-learning algorithms.
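The slope and curvature evidence layers are derived from 3 × 3 elevation windows of the DEM. The sketch below assumes a Horn-style finite-difference slope and a Zevenbergen-Thorne-style total curvature scaled by 100, with positive values for upwardly convex and negative values for upwardly concave surfaces; the exact formulas and sign conventions of the GIS software used in the study are not stated and may differ.

```python
import math


def slope_degrees(window, cell_size):
    """Horn-style slope (degrees) at the center of a 3x3 elevation window.

    Only the gradient magnitude is used, so the row/column orientation
    convention does not change the result.
    """
    (a, b, c), (d, _, f), (g, h, i) = window
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def total_curvature(window, cell_size):
    """Zevenbergen-Thorne-style curvature x 100 (positive = convex, negative = concave)."""
    (_, z2, _), (z4, z5, z6), (_, z8, _) = window
    d = ((z4 + z6) / 2 - z5) / cell_size ** 2
    e = ((z2 + z8) / 2 - z5) / cell_size ** 2
    return -2 * (d + e) * 100


ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]  # rises 1 m per 1 m cell towards the east
bowl = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]  # center lower than all its neighbors
print(slope_degrees(ramp, 1.0))           # about 45 degrees
print(total_curvature(bowl, 1.0))         # negative: upwardly concave
```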

Selection and Extraction of Mineral Targets
The first step (S1) involved a spatial autocorrelation (spatial aggregation) analysis of the geochemical dataset, called dataset 1. Dataset 1 is a table of 3709 rows, one per sample, with four columns: latitude, longitude, sample name, and average Nb-Ta grade in grams per cubic meter (g/m³).
We considered the average Nb-Ta grade and the geographic coordinates of each pit, and analyzed the geostatistical distribution of dataset 1 in order to distinguish between different types of prospective and non-prospective targets for Nb-Ta placer deposits within the H-L Complex. Dataset 1 was loaded into a GIS (ArcGIS Version 10.2, ESRI, Redlands, CA, USA), and Moran's spatial autocorrelation analysis was used to determine, from sample locations and grades, whether the mineral exploration targets of the studied placer deposits were clustered, dispersed or randomly distributed. Because the pits were arranged on a regular grid (50 m × 100 m), we calculated Moran's I [59] using the inverse of the distance between pits as weights. This weighting scheme is a reasonable a priori choice and characterizes the spatial relationships between the different sampled mineralized gravel locations.
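With inverse-distance weights, the global statistic can be written as $I = \frac{n}{S_0}\frac{\sum_i\sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$, where $z_i$ is the deviation of sample $i$ from the mean grade, $w_{ij} = 1/d_{ij}$ for $i \neq j$ (and 0 on the diagonal), and $S_0$ is the sum of all weights. The sketch below is a minimal dense implementation under those assumptions (distinct coordinates, no distance band, no row standardization). Under spatial randomness the expected value of I is $-1/(n-1)$, so values above it suggest clustering and values below it suggest dispersion.

```python
import math


def inverse_distance_weights(coords):
    """Dense inverse-distance weight matrix with w_ii = 0 (coordinates must be distinct)."""
    n = len(coords)
    return [
        [0.0 if i == j else 1.0 / math.dist(coords[i], coords[j]) for j in range(n)]
        for i in range(n)
    ]


def global_morans_i(values, weights):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(map(sum, weights))
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
w = inverse_distance_weights(coords)
print(global_morans_i([1, 1, 0, 0], w))  # clustered: above -1/(n-1) = -1/3
print(global_morans_i([1, 0, 1, 0], w))  # alternating: below -1/3
```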
The analysis yielded global Moran's I values indicating whether the overall pattern jointly expressed by sample locations and mineral grades was clustered (comparable to a negative-binomial distribution), dispersed (regular; binomial), or random (Poisson). Local Moran's I [60] was used to identify both statistically significant and non-significant spatial aggregations (clusters) of mineral grades, i.e., local patterns of High-High, Low-Low, Low-High and High-Low. Clusters resulting from the autocorrelation analysis of dataset 1 may represent different types of prospective and non-prospective Nb-Ta samples.
The ArcGIS Pro 2.7 Cluster and Outlier Analysis tool calculates these local indices, which can be used to identify spatial outliers (High-Low and Low-High groupings), together with their associated Z-scores, pseudo p-values (permutation-based estimates) and a cluster/outlier type code (COType) for each feature of the geochemical survey table (dataset 1). The COType assigned to each feature of the pit grid determines whether a sample belongs to a spatial cluster (i.e., a mineral target type). These groupings are interpreted against the null hypothesis of complete spatial randomness. For example, when the pseudo p-values of the features forming clusters (high, low, or outliers) are ≤0.05, these features are considered statistically significantly aggregated at the 95% confidence level. Clusters are thus groups of High-High (HH) or Low-Low (LL) sample locations embedded in a matrix of low or high values, respectively. Outliers occur when a high value is surrounded by predominantly low values (HL) or, conversely, when a low value is surrounded by predominantly high values (LH). A significant HH or LL clustering response (indicated by the associated Z-score and p-value) could imply the existence of underlying factors that control the spatial distribution of these mineral exploration targets at the observed population level.
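The local statistic and the COType labelling can be sketched as follows, assuming Anselin's local Moran's I, $I_i = \frac{z_i}{m_2}\sum_{j \neq i} w_{ij} z_j$ with $m_2 = \sum_i z_i^2 / n$, and a conditional permutation test in which the value at sample i is held fixed while all other values are shuffled among the remaining locations. The pseudo p-value is (R + 1)/(M + 1), where R counts the M permutations at least as extreme as the observed index. Labels follow the text, with `AL` for non-significant samples; ArcGIS Pro's implementation may differ in details such as the variance term and the permutation settings.

```python
import math
import random


def inverse_distance_weights(coords):
    """Dense inverse-distance weight matrix with w_ii = 0."""
    n = len(coords)
    return [[0.0 if i == j else 1.0 / math.dist(coords[i], coords[j])
             for j in range(n)] for i in range(n)]


def cluster_outlier_types(values, weights, permutations=999, alpha=0.05, seed=0):
    """Label each sample HH, LL, HL, LH, or AL (not significant at `alpha`)."""
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n

    def local_i(i, neighbor_ids, neighbor_values):
        lag = sum(weights[i][j] * (v - mean) for j, v in zip(neighbor_ids, neighbor_values))
        return (values[i] - mean) / m2 * lag

    labels = []
    for i in range(n):
        ids = [j for j in range(n) if j != i]
        others = [values[j] for j in ids]
        observed = local_i(i, ids, others)
        extreme = 0
        for _ in range(permutations):
            rng.shuffle(others)  # keep x_i fixed, relocate every other value
            sim = local_i(i, ids, others)
            if (sim >= observed) if observed >= 0 else (sim <= observed):
                extreme += 1
        p = (extreme + 1) / (permutations + 1)
        high = values[i] > mean
        if p > alpha:
            labels.append("AL")
        elif observed > 0:  # similar values nearby: cluster
            labels.append("HH" if high else "LL")
        else:               # dissimilar values nearby: outlier
            labels.append("HL" if high else "LH")
    return labels


coords = [(0, 0), (1, 0), (10, 0), (11, 0)]
w = inverse_distance_weights(coords)
# Four samples can never reach p <= 0.05, so alpha is relaxed to show the labelling rule.
print(cluster_outlier_types([0, 10, 0, 0], w, alpha=1.0))  # ['LH', 'HL', 'LL', 'LL']
```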
The spatial aggregation analysis step (S1) produces prospective and poorly prospective mineral target classes, which are then used to model the prospectivity of placer deposits from geomorphic and landscape parameters. The map in Figure 7 shows the clusters and outliers of the prospective and non-prospective samples selected from the Bemadi, Makua and Brokoua sites. The cluster or outlier type, its threshold value and the geoeconomic importance of the mineral target (see Section 5 for details) were used to define three classes of prospective samples in addition to the class of non-prospective samples. The AL cluster (Class 1 (AL)) was selected as the class of non-prospective samples. The LL cluster (Class 2 (LL)) is the prospective target for low-grade placer deposits. The LH outlier (Class 3 (LH)) is the prospective target for medium-grade placer deposits. The HH cluster and the HL outlier were grouped into a single class (Class 4 (HH/HL)), the prospective target for high- and very high-grade placer deposits.
Table 2 summarizes all the evidence layers and features used to build the machine-learning prediction models in this study.

| No. | Evidence layer | Feature extracted at the sample locations of deposit (clusters HH/HL, LL and LH) and non-deposit (cluster AL) targets |
|---|---|---|
| 1 | Evidence map of fracture density (Figure 5a) | Fracture density value |
| 2 | Evidence map of flow accumulation area (Figure 5b) | Flow accumulation surface value |
| 3 | Evidence map of flow magnitude areas (Figure 5c) | Flow magnitude value |
| 4 | Evidence map of slope curvature (Figure 6a) | Slope curvature value |
| 5 | Evidence map of slope inclination (Figure 6b) | Slope inclination value |
| 6 | Evidence map of relief (Figure 6c) | Elevation value |

Datasets and Processing
In step three (S3), four datasets (datasets 1 to 4) were constructed. First, dataset 1 was used for the spatial aggregation analysis (see Section 3.3). Based on its results (stored in the COType field), a new dataset 2 was constructed in which each sample was assigned a prospectivity code from 1 to 4: 1 for the non-prospective target type (AL), 2 for the prospective type LL, 3 for the prospective type LH, and 4 for the prospective type HH/HL. Dataset 2 comprised 2920 prospective (LL, LH and HH/HL) samples and 789 non-prospective (AL) samples.
Second, dataset 2 was randomly partitioned into two subsets (dataset 3 and dataset 4) for the learning phases. The training dataset for machine-learning algorithms generally comprises 60%-80% of the initial observations [61][62][63]. In this study, training dataset 3 contained 80% of dataset 2; it was composed of 172 non-prospective AL, 172 prospective HH/HL, and 90 prospective LH samples. Test dataset 4 contained the remaining 20% of dataset 2 and was used to evaluate the robustness and accuracy of the output models in Section 6.
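The class coding and the 80/20 random partition described above can be sketched as follows. The `CLASS_CODES` mapping follows the codes given in the text (HH and HL both map to 4); the sample structure (a feature list paired with a COType label), the fixed seed, the helper name and the feature values in the usage example are made up for illustration.

```python
import random

# Prospectivity codes given in the text; HH and HL share code 4.
CLASS_CODES = {"AL": 1, "LL": 2, "LH": 3, "HH": 4, "HL": 4}


def code_and_split(samples, train_fraction=0.8, seed=42):
    """Replace COType labels by codes 1-4, then split randomly into train/test lists.

    `samples` is a list of (feature_list, cotype) pairs (hypothetical structure).
    """
    coded = [(features, CLASS_CODES[cotype]) for features, cotype in samples]
    random.Random(seed).shuffle(coded)
    n_train = round(train_fraction * len(coded))
    return coded[:n_train], coded[n_train:]


# Made-up feature vectors, only to show the call.
samples = [([0.3, 12.0], "LL"), ([1.1, 40.5], "HH"), ([0.0, 3.2], "AL"),
           ([0.7, 22.1], "LH"), ([1.4, 35.0], "HL")]
train, test = code_and_split(samples)
print(len(train), len(test))
```

A stratified split (sampling 80% within each class) would be an alternative when class proportions must be preserved; the text only states that the partition was random.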

Evaluation of Predictive Models and Validation of Results
In step S5 of the methodological approach (Figure 3), several metrics were used to evaluate the performance and robustness of the SVM, KNN and RF predictive models, and then the sensitivity of the prospectivity maps generated with the best-performing models (see Section 6.4). These metrics are cross-validation accuracy (CVA, Equation (5)); sensitivity, i.e., the true positive rate (Equation (1)); specificity, i.e., the true negative rate (Equation (2)); positive and negative predictive values (Equations (3) and (4)), calculated from the confusion matrices; F-scores (Equation (6)), defined as the harmonic mean of the model's precision and recall (i.e., sensitivity); and AUC (Area Under Curve) values based on ROC (Receiver Operating Characteristic) curves. Details of these metrics and their applications can be found in the literature (e.g., [64]).

$$\text{Sensitivity} = \frac{TP}{TP + FN} \quad (1)$$
$$\text{Specificity} = \frac{TN}{TN + FP} \quad (2)$$
$$\text{PPV} = \frac{TP}{TP + FP} \quad (3)$$
$$\text{NPV} = \frac{TN}{TN + FN} \quad (4)$$
$$\text{CVA} = \frac{TP + TN}{TP + TN + FP + FN} \quad (5)$$
$$F = \frac{2 \cdot \text{PPV} \cdot \text{Sensitivity}}{\text{PPV} + \text{Sensitivity}} \quad (6)$$

where TP is the number of true positives (correctly classified positive samples), FN the number of false negatives (misclassified positive samples), FP the number of false positives (misclassified negative samples), and TN the number of true negatives (correctly classified negative samples).
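For a multiclass problem, the per-class quantities in Equations (1)-(4) and (6) are usually computed one-versus-rest from the confusion matrix. The sketch below assumes that rows are true classes and columns are predicted classes; with several classes, overall accuracy is taken as the share of samples on the diagonal, which reduces to Equation (5) in the two-class case. It is an illustration rather than the Matlab routine used in the study, and it does not guard against empty classes (division by zero).

```python
def class_metrics(cm, k):
    """One-vs-rest metrics for class index k; cm[i][j] counts true class i predicted as j."""
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # class k predicted as something else
    fp = sum(row[k] for row in cm) - tp  # other classes predicted as k
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)         # Eq. (1)
    specificity = tn / (tn + fp)         # Eq. (2)
    ppv = tp / (tp + fp)                 # Eq. (3)
    npv = tn / (tn + fn)                 # Eq. (4)
    f_score = 2 * ppv * sensitivity / (ppv + sensitivity)  # Eq. (6)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f_score": f_score}


def overall_accuracy(cm):
    """Fraction of samples on the diagonal; equals Eq. (5) for two classes."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


cm = [[8, 2],
      [1, 9]]
print(class_metrics(cm, 0))
print(overall_accuracy(cm))
```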
Confusion matrices were then constructed and used to calculate and compare the evaluation metrics during the training phases. For each category of machine-learning algorithm used in this study, the most accurate model was selected. These three best models were then used in the independent validation and robustness-testing phase with dataset 4.

Prospectivity Mapping Process
GIS-based predictive modelling of mineral prospectivity is practical and increasingly used to delineate repeatable mineral exploration targets [42]. A grid of 71,730 cells, each with an area of 25 m², covering the whole study area was constructed. First, the grid was overlaid on the six evidence criteria maps (see Section 3.2) and the centroid of each cell was computed in GIS (ArcGIS v.10.2). Treating each pit location as the centroid of a grid cell, we extracted the corresponding hydrological, geomorphological and structural predictor values into a new table, called dataset 5. Dataset 5 was used with the final KNN, SVM and RF models to predict the prospectivity of new prospective and non-prospective areas in the H-L Complex; the model responses are the coded values predicted by each final model. The predicted coded values at the centroid points were then rasterized using the most-frequent-value assignment rule, which gives priority to the most common coded value among neighboring cells. Finally, each grid cell was color-coded to represent the Nb-Ta placer deposit prospectivity graphically. The result is the prospectivity map of placer deposits in the northern H-L Geological Complex (see Section 6.4).
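The most-frequent-value rasterization can be sketched as binning the coded predictions into square cells and keeping the modal code of each cell. The cell indexing, the `origin` parameter and the tie-breaking rule (the smallest code wins) are assumptions for illustration; the GIS tool used in the study may resolve ties differently.

```python
import math
from collections import Counter


def rasterize_most_frequent(points, origin, cell_size):
    """Assign each grid cell the most frequent coded value among the points inside it.

    points: iterable of (x, y, code); returns {(column, row): code}.
    Ties are broken in favor of the smallest code (an assumption).
    """
    x0, y0 = origin
    codes_per_cell = {}
    for x, y, code in points:
        cell = (math.floor((x - x0) / cell_size), math.floor((y - y0) / cell_size))
        codes_per_cell.setdefault(cell, []).append(code)
    raster = {}
    for cell, codes in codes_per_cell.items():
        counts = Counter(codes)
        top = max(counts.values())
        raster[cell] = min(code for code, n in counts.items() if n == top)
    return raster


pts = [(1, 1, 4), (2, 2, 4), (3, 3, 1), (6, 1, 2)]
print(rasterize_most_frequent(pts, origin=(0, 0), cell_size=5))  # {(0, 0): 4, (1, 0): 2}
```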
In MPM, known deposit and non-deposit locations can be used to assess the sensitivity of mineral prospectivity models [19,65,66]. The non-prospective areas delineated by an MPM approach should coincide as closely as possible with non-deposit locations, where mineral deposits are least likely to be present because of inadequate geological parameters and a lack of exploration indicators [35,65,67]. Several approaches have been used to assess the sensitivity of mineral prospectivity models. Bonham-Carter et al., 1994 [68] divided the weights of classes of spatial values by their corresponding occupied areas (the area occupied by each class of evidential values) to estimate the probability of discovering mineral deposits in each class. Yousefi and Carranza, 2015a, 2016 [69,70] developed the Prediction-Area (P-A) plot, in which the percentage of known deposits predicted by the prospectivity classes (prediction rate) and the areas occupied by the corresponding classes are used to quantify the relative importance of different prospectivity models. In the P-A plot, both the prediction rate and the occupied area of exploration targets contribute to the evaluation of prospectivity models [71][72][73][74][75][76][77].
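The two quantities behind a Prediction-Area plot can be computed as below for a set of prospectivity thresholds: the prediction rate (percentage of known deposit cells whose score is at or above the threshold) and the occupied area (percentage of all cells at or above it). This is a minimal sketch assuming one score and one deposit flag per cell; it omits plotting and the construction of the reference map.

```python
def prediction_area(scores, is_deposit, thresholds):
    """Return (threshold, prediction rate %, occupied area %) for each threshold."""
    n_cells = len(scores)
    n_deposits = sum(is_deposit)
    rows = []
    for t in thresholds:
        selected = [s >= t for s in scores]
        occupied = 100 * sum(selected) / n_cells
        hits = sum(1 for sel, dep in zip(selected, is_deposit) if sel and dep)
        rows.append((t, 100 * hits / n_deposits, occupied))
    return rows


scores = [4, 4, 3, 2, 1, 1]  # e.g. predicted class codes of six cells
deposits = [True, True, False, True, False, False]
for t, rate, area in prediction_area(scores, deposits, [4, 3, 2]):
    print(t, round(rate, 1), round(area, 1))
```

A model that reaches the same prediction rate with a smaller occupied area delineates tighter exploration targets, which is the criterion used in the sensitivity analysis below.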
In this study, known deposit and non-deposit cells were used to assess the sensitivity of the values predicted by the final KNN, SVM and RF models. The reference deposit and non-deposit areas consisted of the cluster and outlier points (see Figure 7), each rasterized to a 25 m² cell. The reference map comprised 950 non-prospective AL cells (1.4 km²) and 2972 prospective (LL, LH and HH/HL) cells (5.8 km²) from the Bemadi, Brokoua and Makua sites. These reference cells were overlaid on the prospectivity maps generated by the best-performing models, and the areas (in km²) of the known prospective and non-prospective reference classes falling within each predicted prospective and non-prospective class were extracted and calculated. Confusion matrices were then constructed and used to quantify the relative importance of the different prospectivity models. If two prospectivity models delineate exploration targets with the same prospectivity score but in different occupied areas, the model with the smaller target areas performs better [78]. Results of the sensitivity analysis of the prospectivity models are presented in Section 6.4.

Machine-Learning Algorithms Selection
Machine-learning classifiers such as SVM, KNN and RF require relatively little training data, which increases the degree of classifier automation, and they are easily configurable [79]. This makes them more useful than traditional statistical techniques in this context. In addition, SVMs account for model complexity during training, which limits overfitting and balances the goodness of fit to the training data against the complexity of the function to be generalized.
Decision-tree methods (including RF, which is an ensemble of trees) use a hierarchical tree structure to make classification decisions. The advantage is that the tree structure is transparent and therefore easier to interpret than an artificial neural network (ANN) [1]. The RF classification method combines the outputs of many decision trees to classify or predict the value of a target variable [1,44]. To reduce the correlation between trees, RF increases their diversity by growing each tree from a different subset of the training data, created through the "bagging" (bootstrap aggregating) procedure. This makes the output model more stable and robust to slight variations in the input data, while increasing prediction accuracy [44].
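The bagging idea can be illustrated with a deliberately simplified forest in which each tree is a one-split decision stump fitted on a bootstrap sample using a single randomly chosen feature, and predictions are combined by majority vote. A full RF grows deeper trees and samples a random subset of features at every split; the function names and the `n_trees` default here are hypothetical and unrelated to the Matlab settings reported later.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """One-split tree on `feature`: (feature, threshold, label if <= t, label if > t)."""
    best = None
    for t in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        if not right:
            continue
        left_lab = Counter(left).most_common(1)[0][0]
        right_lab = Counter(right).most_common(1)[0][0]
        errors = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
        if best is None or errors < best[0]:
            best = (errors, feature, t, left_lab, right_lab)
    if best is None:  # all values identical: fall back to the majority label
        majority = Counter(y).most_common(1)[0][0]
        return feature, float("inf"), majority, majority
    return best[1:]


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        feature = rng.randrange(n_features)         # random feature decorrelates trees
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


X = [[0, 0], [1, 1], [2, 2], [10, 10], [11, 11], [12, 12]]
y = [1, 1, 1, 2, 2, 2]
forest = fit_forest(X, y)
print(predict_forest(forest, [-1, -1]), predict_forest(forest, [20, 20]))
```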
KNN is a simple algorithm developed by Fix and Hodges in 1951 [80]. It stores all available cases and classifies new cases according to a measure of similarity (weight or distance) to their nearest neighbors. Despite its simplicity, it is an effective and intuitive technique commonly used in statistical discrimination [7].
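A distance-weighted KNN vote, including the inverse (1/d) and inverse squared (1/d²) weightings that gave the best KNN results in this study, can be sketched as follows. The function name, the `weighting` option strings and the rule that an exact match returns its label directly are assumptions; Matlab's implementation may handle ties and zero distances differently.

```python
import math
from collections import defaultdict


def knn_predict(train_X, train_y, query, k=10, weighting="inverse"):
    """Classify `query` by a weighted vote of its k nearest training samples.

    weighting: "equal" (1), "inverse" (1/d) or "squared_inverse" (1/d**2).
    """
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        if d == 0:
            return label  # query coincides with a training sample
        if weighting == "equal":
            w = 1.0
        elif weighting == "inverse":
            w = 1.0 / d
        elif weighting == "squared_inverse":
            w = 1.0 / d ** 2
        else:
            raise ValueError(f"unknown weighting: {weighting}")
        votes[label] += w
    return max(votes, key=votes.get)


X = [[0, 0], [0, 1], [5, 5], [5, 6], [6, 5]]
y = [1, 1, 2, 2, 2]
print(knn_predict(X, y, [0.2, 0.2], k=5, weighting="equal"))    # 2
print(knn_predict(X, y, [0.2, 0.2], k=5, weighting="inverse"))  # 1
```

With five equally weighted neighbors, the three distant class-2 samples outvote the two nearby class-1 samples; inverse-distance weighting lets the much closer samples dominate, which is why the weighting choice can matter on clustered data.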

Training of SVM and Other Models
In step S4, the SVM, KNN and RF models were trained using a 'methods and models' learning approach. All classification processing was performed with Matlab Version R2019a. The software provides a variety of supervised classification techniques that can generate multiple KNN, SVM and RF output models by optimizing hyperparameters and testing different classification methods. Output models can be tuned through advanced options, some of which are internal parameters, or hyperparameters, that can strongly affect model performance. For a given model type, the software tries different combinations of hyperparameter values using an optimization scheme that seeks to minimize the classification error, and returns a model with optimized hyperparameters.
In addition, the software (Matlab Version R2019a) allows cross-validation options to be selected during the training phase. Cross-validation helps to choose the best-performing model by calculating the error on data that were not used for training, which indicates how well the model will generalize to future data. In this study, k-fold cross-validation with k = 5 was used, given the small size of learning dataset 3. The k-fold method partitions the data into k randomly chosen subsets (folds) of roughly equal size. In each round, k − 1 folds are used to train the model and the remaining fold is used to evaluate its performance. The process is repeated k times, so that each fold is used exactly once for validation and the training and validation samples of a given round never overlap. The average error across all k rounds was used to evaluate the accuracy and performance of the learning model.
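The 5-fold procedure can be sketched as below: indices are shuffled once, dealt into k folds, and each fold serves exactly once as the validation set while the other k − 1 folds train the model. `fit` and `predict` are hypothetical callables standing in for any classifier; averaging the per-fold accuracy is one common way to report a cross-validation accuracy and may differ slightly from how Matlab aggregates it.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs; each sample is validated exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k folds of (almost) equal size
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]


def cross_validated_accuracy(X, y, fit, predict, k=5, seed=0):
    """Mean validation accuracy over the k folds."""
    scores = []
    for train, valid in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in valid)
        scores.append(hits / len(valid))
    return sum(scores) / len(scores)


folds = list(k_fold_indices(23, k=5))
print([len(valid) for _, valid in folds])  # [5, 5, 5, 4, 4]
```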
Determining the hyperparameters used to train the classification algorithms is a crucial step in predictive modeling, as it governs the reliability of the predictions. However, it is difficult to specify appropriate settings a priori, as there are few rules for determining optimal hyperparameters in practice [42]. SVM, KNN and RF were therefore trained (interactively) with the grid search method, which builds and evaluates one model for every combination of candidate hyperparameter values; for each hyperparameter, a set of values was tested. Table 3 presents the methods used to build all the models tested in this study.

Because the ranges of the raw data values vary widely, the objective functions of some machine-learning algorithms do not work properly without normalization. For example, many classifiers calculate the distance between two points as the Euclidean distance; if one feature has a broad range of values, that feature will dominate the distance. The range of all features should therefore be normalized so that each feature contributes approximately proportionately to the final distance. All training procedures were performed in Matlab Version R2019a with the data standardization option set to "True", which normalizes each column of the predictor array using standard scores or the feature-scaling normalization procedure [81].
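Standard scores can be sketched as follows: each column is centered on its mean and divided by its standard deviation, and the parameters estimated on the training set are reused for any new data (for example the validation set or the prediction grid). Using the sample standard deviation and mapping constant columns to zero are assumptions of this sketch.

```python
import statistics


def fit_standardizer(rows):
    """Per-column (mean, standard deviation) estimated on the training rows."""
    return [(statistics.fmean(col), statistics.stdev(col)) for col in zip(*rows)]


def standardize(rows, params):
    """z = (x - mean) / sd for each column; constant columns become 0."""
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, params)]
            for row in rows]


train = [[1, 100], [2, 200], [3, 300]]  # two features on very different scales
params = fit_standardizer(train)
print(standardize(train, params))       # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

After standardization both columns contribute equally to a Euclidean distance, even though the second column's raw values are a hundred times larger.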
For hyperparameter optimization, the Matlab software provides preset SVM models with kernel-scale advanced options and a manual kernel-scale mode. SVM performance depends strongly on the kernel function, so several kernel functions were tested in this study: linear, quadratic, cubic, and Gaussian (fine, medium and coarse). The box constraint level was fixed at 1 for all multiclass classification approaches; the kernel scale was varied among the log-scaled positive values 4, 8 and 61; and the value of the detection parameter was varied among the log-scaled values 0, 1, 2 and 9.
To optimize the RF hyperparameters, the maximum number of splits was tested among the log-scaled integers 10, 20 and 30; the number of learners among the log-scaled integers 30, 60, 100 and 500; and the learning rate was set to the log-scaled real value 1.
To optimize the KNN hyperparameters, the number of neighbors was tested among the log-scaled integers 1, 10 and 100. The distance metric used to determine the nearest points was tested among Euclidean, Minkowski (cubic) and cosine distances, and the distance weight among equal, Euclidean-weighted, and inverse squared distance weighting.
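The grid search described above amounts to evaluating every combination in the Cartesian product of the candidate values. In the sketch below the KNN grid mirrors the values listed in the text, but the dictionary keys and option strings are hypothetical labels, and `score` stands for whatever evaluation is used (for example the 5-fold cross-validation accuracy).

```python
from itertools import product


def grid_search(param_grid, score):
    """Evaluate score(**params) for every combination; keep the first best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        s = score(**params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


# KNN candidate values from the text (labels are illustrative).
knn_grid = {
    "n_neighbors": [1, 10, 100],
    "distance": ["euclidean", "minkowski_cubic", "cosine"],
    "weighting": ["equal", "euclidean_weighted", "squared_inverse"],
}
print(len(list(product(*knn_grid.values()))))  # 27 candidate models
```

The number of models grows multiplicatively with each added hyperparameter, which is why only a few values per hyperparameter are usually tested.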

Discrimination of Mineral Targets
The results of the spatial autocorrelation analysis (Table 4) reveal that the geostatistical distribution of the 3709 Nb-Ta mineralized gravel samples is not the result of a random process. Figure 8 depicts the standard normal distribution of the pseudo-probability values (p) against their associated Z-scores. Here, 2920 samples (i.e., 79% of the surveyed placer deposits) can be grouped into four types of spatial clusters and outliers (mineral target classes): High-High (HH), color-coded red; High-Low (HL), color-coded orange; Low-Low (LL), color-coded green; and Low-High (LH), color-coded blue. The estimated p-values of these 2920 samples range from 0.002 to 0.038 and therefore lie below the p = 0.04 threshold, so the corresponding features exhibit statistically significant spatial aggregation at the 96% confidence level. For the remaining 789 samples (i.e., 21% of the surveyed placer deposits), the null hypothesis could not be rejected (p > 0.04); these samples are grouped into their own cluster, referred to in this study as AL (color-coded black). This cluster does not show statistically significant spatial aggregation and is characterized by a random distribution pattern.
Analysis of the mean Nb-Ta grade distributions of the output cluster types indicates that a 44 g/m³ threshold can discriminate between the different types of deposits or mineral targets. In contrast, the spatial variation in Nb-Ta grades of Class AL samples (0-1126 g/m³) shows very little similarity among neighboring centroids. Class AL represents areas where the probability of finding an HH, HL, LL or LH placer deposit is <4% and could therefore be considered negligible. It is thus the class that defines non-prospective areas relative to the other deposit types.
The LL cluster type represents a zone of accumulation of mineralized gravels with low Nb-Ta grades (<44 g/m³; average 5.63 g/m³) and therefore corresponds to low-grade prospective areas. Type LH represents an accumulation zone of medium-grade mineralized gravels (<44 g/m³; average 23.5 g/m³), i.e., medium-grade prospective zones. This type is located at the edges of HH deposits, which represent very high-grade mineralized gravel accumulation zones (>44 g/m³; average 136.61 g/m³). Finally, the HL outlier type represents zones of high-grade mineralized gravel accumulation (>44 g/m³; average 74.08 g/m³), which are generally surrounded by LL-type deposits.

Machine-Learning Training Phases Analysis
As indicated in Section 5, the SVM, KNN and RF approaches were used to develop predictive models for Nb-Ta placer deposits within the H-L Geological Complex. The results that were obtained during the training phases are presented here.
A compilation of results (Table 5) shows that, regardless of the training method used, the SVMs with linear, quadratic and cubic kernels, as well as the medium and coarse Gaussian kernel SVMs, produced CVAs of 52% or less. Only the SVM with a fine Gaussian kernel produced better results: its CVA was 61.9% with the one-versus-one (OvO) strategy and 62.4% with the one-versus-all (OvA) strategy. This model also had the highest AUC values (Table 5) and correctly predicted 75% of the observations in Class 3 (LH), 78% in Class 4 (HH/HL), 79% in Class 1 (AL), and 85% in Class 2 (LL).
The results obtained with the KNN model are shown in Table 6. The three coarse KNN models are clearly the least accurate in the series, with CVAs below 46% and lowest AUC values (63% to 67%). On the one hand, the various KNN algorithms (medium, coarse, cosine, cubic, weighted-Euclidean) display poor accuracy results (CVA <51%). On the other, the fine KNN models with cubic, Euclidean or cosine metric distances show clear improvement in results, with CVAs varying between 62% and 64%. Without question, the best performance is obtained with the KNN approach when weighting is based on inverse distance (IDW) or inverse distance squared (IDW2). In these cases, CVA reaches just over 65%, and AUC values (between 75% and 89%) are the best of those obtained for each of the four classes. As shown in Table 6, the IDW-or IDW2-weighted Euclidean KNN models predict 82% of the observations in Class 1 (AL), 89% in Class 2 (LL), 75% in Class 3 (LH), and 78% in Class 4 (HH/HL).  The last model category to be considered is the RF. Results are summarized in Table 7. CVAs that were obtained vary between 67% and 69%. There are few significant differences in the results that were obtained. For example, the RF model (30 splits (sp)/100 learners (L)) can predict 87% of Class 1 (AL) mineral targets compared to 88% for the RF model (30 sp/500 L). The two models correctly predict 91% and 81% of Class 2 (LL) and four (HH/HL) observations. The RF model (30 sp/100 L) predicts 82% of the observations in Class 3 (LH) compared to 81% for the competing model. Given the importance that is attributed to Classes 2, 3 and 4 in this study, RF model (30 sp/100 L) has been selected for further study in this category.  Overall, predictive abilities of the three models are comparable. They are higher for Class 2, and decrease slightly for Classes 1, 3 and 4. 
The different indicators (CVA, AUC, and F-score values) indicate that the KNN model is slightly better than the SVM model, but the differences appear negligible. The RF model is consistently superior to SVM and KNN, particularly for Classes 1, 2 and 4. The calculated F-scores (Figure 9b) confirm the slightly higher performance of the RF model in this study compared with SVM and KNN. Most importantly, given the minimal differences between them, any of the three models could be used to predict mineral targets.
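Per-class F-scores and one-versus-rest AUC values of the kind compared here can be computed from out-of-fold predictions. The scikit-learn sketch below uses synthetic data and assumes that labels 0 to 3 correspond to AL, LL, LH and HH/HL; the RF settings are the same assumed approximation as above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.preprocessing import label_binarize

CLASS_NAMES = ["AL", "LL", "LH", "HH/HL"]  # assumed order of labels 0..3

X, y = make_classification(n_samples=800, n_features=8, n_informative=5,
                           n_classes=4, weights=[0.2, 0.4, 0.15, 0.25],
                           random_state=0)
model = RandomForestClassifier(n_estimators=100, max_leaf_nodes=31, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold class probabilities: each sample is scored by a model that never saw it.
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")
y_pred = proba.argmax(axis=1)  # columns are ordered by class label 0..3

# F-score per class (average=None) and one-vs-rest AUC per class.
f1_per_class = f1_score(y, y_pred, average=None)
y_onehot = label_binarize(y, classes=[0, 1, 2, 3])
auc_per_class = [roc_auc_score(y_onehot[:, k], proba[:, k]) for k in range(4)]

for name, f1, auc in zip(CLASS_NAMES, f1_per_class, auc_per_class):
    print(f"{name:6s} F-score = {f1:.2f}  AUC = {auc:.2f}")
```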

Evaluation of Robustness of Output Models
In step S5 of the methodological approach (see Figure 3), several metrics were used to evaluate the performance and robustness of the different SVM, KNN and RF classifiers (see Section 3.5 for details). Good performance on training dataset 3 does not guarantee that a model is sufficiently robust to perform well on new data on which it has not been trained. Therefore, the performance and robustness of the best SVM, KNN and RF models in each category were validated using the independent validation dataset 4 (comprising 20% of the original dataset 2). The resulting confusion matrices for each category (KNN, SVM, and RF) were analyzed in terms of CVA, AUC values and Kappa coefficients. Kappa values were interpreted according to the scale proposed by Landis and Koch [82]. Table 8 shows the confusion matrices obtained during the independent validation phase.
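The independent validation step can be sketched as a stratified 80/20 hold-out split followed by a confusion matrix, overall accuracy and Cohen's kappa. Assumptions: dataset 3 is taken to be the remaining 80% of dataset 2, the split is stratified by class, the data are synthetic, and `max_leaf_nodes=31` only approximates the 30-split RF model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=8, n_informative=5,
                           n_classes=4, weights=[0.2, 0.4, 0.15, 0.25],
                           random_state=0)

# Hold out 20% as an independent validation set, preserving class proportions.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=100, max_leaf_nodes=31, random_state=0)
model.fit(X_train, y_train)
y_hat = model.predict(X_val)

# Rows = observed classes, columns = predicted classes.
cm = confusion_matrix(y_val, y_hat, labels=[0, 1, 2, 3])
acc = accuracy_score(y_val, y_hat)
kappa = cohen_kappa_score(y_val, y_hat)
print(cm)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```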
Independent validation confirms the results obtained during the learning phase. The best predictions are provided by the selected RF model, with κ = 0.56 and CVA = 69%. These metrics are slightly higher than those of the KNN model, which yields κ = 0.54 and a similar CVA of 68%. The SVM model provides slightly inferior results (CVA = 63%, κ = 0.46). The kappa value of the RF model is close to 0.6, the upper limit of the moderate-agreement band (0.41 to 0.60) on the scale proposed by Landis and Koch [82], beyond which agreement is rated substantial. The KNN and SVM models also attain kappa values between 0.4 and 0.6, indicating moderate agreement on the same scale.
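For reference, Cohen's kappa compares the observed agreement p_o with the chance agreement p_e derived from the confusion-matrix marginals, κ = (p_o - p_e) / (1 - p_e), and Landis and Koch [82] bin the result into named bands (0.41 to 0.60 "moderate", 0.61 to 0.80 "substantial"). A dependency-free Python sketch of both steps, applied to the kappa values reported above:

```python
def cohen_kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows = observed, columns = predicted)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(k)) / n
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch(kappa):
    """Strength-of-agreement label on the Landis and Koch (1977) scale."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


for model, kappa in [("RF", 0.56), ("KNN", 0.54), ("SVM", 0.46)]:
    print(f"{model}: kappa = {kappa:.2f} -> {landis_koch(kappa)}")
# RF: kappa = 0.56 -> moderate
# KNN: kappa = 0.54 -> moderate
# SVM: kappa = 0.46 -> moderate
```

The function assumes p_e < 1; a matrix in which every observation falls in a single class would make kappa undefined.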

Evaluation of Sensitivity of the Prospectivity Maps Generated
The three predictive models retained after the learning and robustness test phases provide encouraging results. Of these, the KNN and RF models, with accuracies of 68% and 69%, respectively, were used to map (following the approach described in Section 3.6) the prospectivity of new non-prospective and prospective exploration targets within the H-L Geological Complex. Figure 10a,b illustrates the results.
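One generic way to turn a trained classifier into a prospectivity map is to classify every cell of a co-registered stack of predictor rasters and then tally the area of each predicted class. The sketch below assumes a hypothetical 50 × 60 grid of 30 m cells filled with random values and random training labels, so only the mechanics (flatten, predict, reshape, sum areas) are meaningful; it is not necessarily the procedure of Section 3.6.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_rows, n_cols, n_features = 50, 60, 6
cell_area_km2 = (30 * 30) / 1e6  # hypothetical 30 m x 30 m cells

# Hypothetical training samples (predictor values at target locations) and classes 0..3.
X_train = rng.normal(size=(400, n_features))
y_train = rng.integers(0, 4, size=400)

# Hypothetical predictor raster stack: one layer per hydrologic/geomorphic/landscape parameter.
stack = rng.normal(size=(n_rows, n_cols, n_features))

model = RandomForestClassifier(n_estimators=100, max_leaf_nodes=31, random_state=0)
model.fit(X_train, y_train)

# Flatten the grid to (cells, features), classify every cell, then restore the map shape.
class_map = model.predict(stack.reshape(-1, n_features)).reshape(n_rows, n_cols)

# Area covered by each predicted class (cell count x cell area).
labels, counts = np.unique(class_map, return_counts=True)
area_km2 = dict(zip(labels.tolist(), (counts * cell_area_km2).tolist()))
print(area_km2)
```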
The light grey areas, which are potentially non-prospective, are of little interest for prospecting new economically viable mineral targets. In contrast, two categories of zones potentially favorable for the concentration of economically valuable mineralized gravels are highlighted. The first comprises areas favorable for the concentration of medium-grade (orange) and high-grade (red) gravels. These areas correspond mainly to interfluves and river or marshland edges, but they also include areas near rocky outcrops of mineralized granitic pegmatites. They occupy elevated topographic positions, are prone to poor drainage, and are marked by medium to high fracture density. The second comprises zones potentially favorable for the concentration of low-grade mineralized gravels (green). These zones occupy lower topographic positions, are subject to intense drainage, and are characterized by lower fracture density. They are very extensive and occupy most of the flats in the sub-watersheds under study. Table 9 shows the confusion matrices used to evaluate the sensitivity of the KNN and RF prospectivity models.
The performance of the two prospectivity models is very similar. Both are most sensitive for the extensive LL areas (3.12 km²) with low-grade deposits, with sensitivities of 84% for RF and 77% for KNN. Both are most specific for the smallest LH and HH/HL prospective areas (0.9 to 1.81 km²) with medium-grade and high-grade deposits, with specificities of 94% for RF and 93% for KNN. Overall, the calculated metrics (accuracy, sensitivity, specificity, and F-score) confirm the slightly higher performance of the RF model compared with KNN.
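Sensitivity and specificity in a multiclass setting are class-wise, one-versus-rest quantities derived from a confusion matrix such as those in Table 9. A minimal NumPy sketch, assuming rows hold observed classes and columns predicted classes; the 3 × 3 matrix in the example is hypothetical and is not taken from Table 9.

```python
import numpy as np


def one_vs_rest_metrics(cm):
    """Per-class sensitivity, specificity and F-score, plus overall accuracy,
    from a confusion matrix with rows = observed and columns = predicted classes."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp   # observed in class k but predicted elsewhere
    fp = cm.sum(axis=0) - tp   # predicted as class k but observed elsewhere
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = tp.sum() / total
    return sensitivity, specificity, f_score, accuracy


# Hypothetical 3-class matrix, for illustration only.
cm = [[50, 5, 5],
      [4, 30, 6],
      [2, 3, 15]]
sens, spec, f1, acc = one_vs_rest_metrics(cm)
print("sensitivity:", np.round(sens, 3))
print("specificity:", np.round(spec, 3))
print("F-score:", np.round(f1, 3), "accuracy:", round(acc, 3))
```

A class can therefore have modest sensitivity yet high specificity when it is small relative to the rest of the map, since its true negatives then dominate.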
The results of the sensitivity analysis show that, where exploration data are scarce and the spatial relationships between spatial evidence criteria and known mineral targets are complex and non-linear, an MPM approach that combines knowledge of the mineral system, geostatistical analysis of soil geochemistry data acquired in the field, and the predictive power of supervised machine-learning algorithms (e.g., SVM, KNN, RF) is a valid way to model a set of exploration criteria at a satisfactory rate.

Discussion
This study is one of the few mineral prospectivity studies of Nb-Ta placer deposits in West Africa to be based on spatial analysis and machine-learning methods. The proposed approach models prospectivity for placer deposits using geomorphic and landscape parameters. Moran's spatial autocorrelation analysis allowed us to discern geostatistical distribution patterns among the 3709 locations of Nb-Ta mineralized gravel samples. This distribution is significant at the 96% level and made it possible to distinguish (at a threshold of 44 g/m³) several classes of prospective and non-prospective mineral exploration targets. These classes reflect the numerical composition of the different placer deposit types identified in previous work in this study area [24][25][26][27][28][29][30].
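The clustering step can be sketched with Anselin's local Moran's I, a standard form of local spatial autocorrelation (LISA) analysis; whether it matches the exact tool and settings used in this study is an assumption. Each sample's standardized grade z_i is multiplied by the weighted mean of its neighbours' standardized grades; sites that pass a permutation test are labelled HH, HL, LH or LL by quadrant and the rest NS (not significant). The eight-nearest-neighbour weights, 99 conditional permutations, the 5% level and the synthetic grades are hypothetical choices, and the simple two-sided pseudo p-value is a simplification. The dense distance matrix is only practical for a few thousand points (a k-d tree such as `scipy.spatial.cKDTree` scales better), and how the 44 g/m³ threshold enters the classification is not reproduced here.

```python
import numpy as np


def knn_weights(coords, k=8):
    """Row-standardized k-nearest-neighbour spatial weights (dense matrix)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                      # a site is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    w = np.zeros_like(d)
    np.put_along_axis(w, nn, 1.0, axis=1)
    return w / w.sum(axis=1, keepdims=True)


def local_moran(values, w, n_perm=99, alpha=0.05, seed=0):
    """Local Moran's I_i = z_i * sum_j w_ij z_j with a permutation pseudo p-value."""
    rng = np.random.default_rng(seed)
    z = (values - values.mean()) / values.std()
    lag = w @ z                                      # spatial lag of the standardized grade
    local_i = z * lag
    p = np.empty(len(z))
    for i in range(len(z)):
        nbr_w = w[i][w[i] > 0]
        others = np.delete(z, i)
        # Conditional randomization: keep z_i, redraw its neighbours from the other sites.
        sims = np.array([z[i] * (nbr_w @ rng.choice(others, nbr_w.size, replace=False))
                         for _ in range(n_perm)])
        p[i] = (np.sum(np.abs(sims) >= abs(local_i[i])) + 1) / (n_perm + 1)
    quadrant = np.where(z >= 0, np.where(lag >= 0, "HH", "HL"),
                        np.where(lag >= 0, "LH", "LL"))
    labels = np.where(p <= alpha, quadrant, "NS")    # NS = not significant
    return local_i, p, labels


# Hypothetical usage: grades with a west-east trend sampled at random locations.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 1, size=(200, 2))
grades = 10 * coords[:, 0] + rng.normal(0, 1, size=200)
local_i, p_values, labels = local_moran(grades, knn_weights(coords))
print(dict(zip(*np.unique(labels, return_counts=True))))
```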
Moran's spatial clustering analysis automatically defined five clusters, some of which contained large numbers of samples (e.g., the AL class), while others were underrepresented. Random partitioning of the input dataset for the learning process satisfies scientific rigor and ensures a degree of a posteriori robustness of the models, but it can generate imbalanced training and validation subsets. In real-world applications, and particularly in mineral prediction, such imbalance in training data is common rather than exceptional.
Such imbalance can affect classification results. In semi-supervised or supervised predictive classification approaches, standard classifiers (e.g., maximum likelihood) tend to classify majority classes better, while underestimating minority classes [14]. Over large geographical areas, mineralization may be considered a rare event, with low relative representation [14,83,84]. Results obtained by standard classifiers may be affected by uncertainty in the classification of rare events, since classifier decision boundaries are largely biased towards the majority class and tend to ignore rare events [14]. The results obtained here confirm that SVM, KNN and RF models can handle imbalanced data and perform well in predicting mineral prospectivity in new regions of interest, as reported previously [85][86][87]. To overcome some of these limitations and potentially improve the results, class-balancing techniques, such as the synthetic minority oversampling technique (SMOTE) [88][89][90], could be investigated in future studies. The observed differences in model results among the four mineral classes arise not only from the inherent properties of the models, but also from the skewed distribution of the training samples. Indeed, Class 1 comprised 23% of the training dataset, compared to 46% for Class 2, 16% for Class 3, and 26% for Class 4. Analysis of the different performance metrics (CVAs, AUCs, false positives, and false negatives, among others) shows that they partly depend on sampling. For example, all three models have high AUCs for Class 2, which contains 46% of the samples, and these values tend to decrease as the number of samples per class decreases. Overall, the SVM and KNN models appeared to be more sensitive to the structure of the training dataset than the RF model.
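SMOTE, proposed here as a candidate for future work, creates synthetic minority-class samples by interpolating between a sample and one of its nearest same-class neighbours. The NumPy sketch below is a minimal illustration that oversamples every class to the size of the largest one and assumes each class has at least two samples; the class sizes are hypothetical. In practice the `imbalanced-learn` package provides `SMOTE(k_neighbors=5).fit_resample(X, y)`, and resampling should be applied to the training subset only, never to the validation data.

```python
import numpy as np


def smote(X, y, k=5, seed=0):
    """Minimal SMOTE sketch: oversample each class up to the largest class size by
    interpolating between a sample and a random one of its k nearest same-class neighbours."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0:
            continue                                   # majority class left unchanged
        X_c = X[y == cls]
        d = np.linalg.norm(X_c[:, None, :] - X_c[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        k_eff = min(k, len(X_c) - 1)
        nn = np.argsort(d, axis=1)[:, :k_eff]
        base = rng.integers(0, len(X_c), size=n_new)          # random seed samples
        nbr = nn[base, rng.integers(0, k_eff, size=n_new)]    # one random neighbour each
        gap = rng.random((n_new, 1))                          # position along the segment
        X_parts.append(X_c[base] + gap * (X_c[nbr] - X_c[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Hypothetical imbalanced training set: class sizes 20, 50, 10 and 30.
rng = np.random.default_rng(0)
y_train = np.repeat([0, 1, 2, 3], [20, 50, 10, 30])
X_train = rng.normal(size=(y_train.size, 6)) + y_train[:, None]
X_bal, y_bal = smote(X_train, y_train, k=5)
print(np.unique(y_bal, return_counts=True))
```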
The novelty of this study lies in prospectivity modeling for Nb-Ta placer deposits using geomorphic and landscape parameters. Based on key mapping indicators and a soil geochemical dataset used to establish Nb-Ta sample locations and grades, this study used the predictive power of supervised machine-learning classification algorithms to find predictive models complex enough to capture the nature of the exploration dataset. The three best models (SVM, KNN and RF) gave very similar results, especially during the learning phase. A priori, the spatial evidence criteria considered in this study as spatial predictors for Nb-Ta placer deposits are highly relevant (see Section 3.2). Independent validation confirms the performance of all three predictive models, especially the final RF and KNN models. The final RF model exhibits a CVA of 69% with κ = 0.56, and the final KNN model a CVA of 68% with κ = 0.54. On the Landis and Koch [82] scale, both values indicate moderate agreement between predictions and field observations, close to the threshold of substantial agreement. The final KNN model correctly predicts 67% of the known prospective occurrences selected as control points, and the final RF model 79%. Based on these statistical results, MPM is feasible here with a good level of confidence.
The overall performance of the final predictive models was assessed on both the training and test datasets using confusion-matrix statistics, AUC values and F-scores. The assessment indicates that the three machine-learning models presented in this study achieved satisfactory predictive accuracy. The accuracy rates obtained with the RF, KNN and SVM models are very similar to those reported in previous MPM studies [1,10,42,86], most of which attest that RF and KNN are sufficiently powerful and robust classifiers to predict mineral prospectivity at satisfactory rates. As shown by the sensitivity analysis, the two prospectivity models perform very similarly: both are most sensitive for the extensive LL areas (3.12 km²) with low-grade deposits (84% for RF and 77% for KNN) and most specific for the smallest LH and HH/HL prospective areas (0.9 to 1.81 km²) with medium-grade and high-grade deposits (94% for RF and 93% for KNN). The calculated metrics (accuracy, sensitivity, specificity and F-score) confirm the slightly higher performance of the RF model compared with KNN.
The results show that the new prospective areas for medium- and high-grade Nb-Ta are located within interfluve zones, along rivers or marshes (asymmetrically) and near rocky outcrops of mineralized pegmatitic granites. This indicates that the mineralized gravels have not been transported over long distances during successive runoff events and have accumulated progressively as placers, preferentially in colluvium and eluvium, in the vicinity of the mineralization sources. This observation confirms the results of a textural study of gravels in the Issia region [30]. Indeed, the current study showed that the Nb-Ta minerals released on the crests and flanks of the interfluves are generally large and angular. Previous work [30] also showed that lateritic soils rich in gravels with high Nb-Ta mineral contents developed on the interfluves. These high Nb-Ta grades result from the in situ or nearby degradation of large quantities of mineralized pegmatites.
Topographic crevice zones are not very favorable for the weathering of primary host rocks or for the drainage of poorly eroded heavy minerals. The generated prospectivity maps show that these zones are narrow and elongated along the edges of rivers or marshes and somewhat more extensive on the interfluves. Topographic crevice zones act as natural traps for heavy minerals and are fixed in space and time. Previous work [91] reported similar observations on the formation of placer diamond deposits along the Orange River in South Africa, highlighting three types of natural traps formed by topographic irregularities that are fixed in space and favor the in-place accumulation of diamonds of high economic value. Prospective areas with low-grade mineralization also deserve further attention. Our results show that these areas occupy slightly lower topographic positions and correspond to the vast lowlands and valleys subject to intense drainage.
The results of this study demonstrate that, among the landscape evidence criteria, a low potential for fragment transport and for erosion of mineralized pegmatitic host rocks can serve not only as a spatial indicator for identifying new zones favorable for the concentration of these little-altered pegmatites, but also as a spatial criterion guiding the search for new medium- and high-grade placer deposits.

Conclusions
Results showed that the final RF prospectivity model generated with 30 splits and 100 learners, and the final KNN model with inverse-distance (IDW) or inverse-distance-squared (IDW2) weighting and 10 neighbors, emerged as the best performers, followed by the final SVM model (fine Gaussian kernel). On the Landis and Koch scale, the RF model (κ = 0.56) predicts mineral targets with moderate agreement at the upper end of that band, close to substantial agreement, while the KNN (κ = 0.54) and SVM (κ = 0.46) models also show moderate agreement. Over a broad exploration area, the RF and KNN models predicted and delimited areas of topographic crevices that are potentially favorable for medium- and high-grade Nb-Ta deposits of economic interest. Sensitivity analyses of the final KNN and RF prospectivity maps confirm the slightly higher performance of the RF model. The final RF and KNN prospectivity models are both most sensitive for the wide areas occupied by low-grade placer deposits (84% and 77%, respectively) and most specific for the smallest prospective areas of economic interest, with medium-grade and high-grade placer deposits (94% and 93%, respectively). These results help reduce interpretation bias by geologists in predictive MPM, thereby reducing exploration risk and costs.