Using Machine Learning to Map Western Australian Landscapes for Mineral Exploration

: Landscapes evolve due to climatic conditions, tectonic activity, geological features, biological activity, and sedimentary dynamics. Geological processes at depth ultimately control and are linked to the resulting surface features. Large regions in Australia, West Africa, India, and China are blanketed by cover (intensely weathered surface material and/or later sediment deposition, both up to hundreds of metres thick). Mineral exploration through cover poses a signiﬁcant technological challenge worldwide. Classifying and understanding landscape types and their variability is of key importance for mineral exploration in covered regions. Landscape variability expresses how near-surface geochemistry is linked to underlying lithologies. Therefore, landscape variability mapping should inform surface geochemical sampling strategies for mineral exploration. Advances in satellite imaging and computing power have enabled the creation of large geospatial data sets, the sheer size of which necessitates automated processing. In this study, we describe a methodology to enable the automated mapping of landscape pattern domains using machine learning (ML) algorithms. From a freely available digital elevation model, derived data, and sample landclass boundaries provided by domain experts, our algorithm produces a dense map of the model region in Western Australia. Both random forest and support vector machine classiﬁcation achieve approximately 98% classiﬁcation accuracy with a reasonable runtime of 48 minutes on a single Intel ® Core™ i7-8550U CPU core. We discuss computational resources and study the effect of grid resolution. Larger tiles result in a more contiguous map, whereas smaller tiles result in a more detailed and, at some point, noisy map. Diversity and distribution of landscapes mapped in this study support previous results. In addition, our results are consistent with the geological trends and main basement features in the region. Mapping landscape variability at a large scale can be used globally as a fundamental tool for guiding more efﬁcient mineral exploration programs in regions under cover.


Geomorphology Background
The overall increase in demand for commodities to support a growing population is a powerful driver for mineral exploration. Large surface areas of the continents are covered by intensely weathered surface material and/or sediment deposition, such as in Australia, West Africa, India, and China [1]. Exploration through this cover is becoming one of the fundamental challenges for the minerals industry in this century.
Landscapes contain essential information related to the geochemical footprint of ore deposits at depth (e.g., [2][3][4]). Variable surface topographical features can be grouped to define and classify unique landscape domains. Climatic conditions, tectonic activity, geological features, biological activity, and sedimentary dynamics are fundamentally linked to landscape evolution and its diversity (e.g., [5]). Consequently, the study of landscapes can link surface features to geological processes at depth. Ore deposits and mineral systems can have dispersed and enhanced geochemical footprints due to landscape evolution (e.g., [3,6,7], and references therein). Geochemical dispersion halos through the cover can be detected by selecting suitable landscape regimes and appropriate sample media (e.g., [8][9][10][11][12][13][14][15]).
Cataloguing landscape differences and evolution at a regional scale can be challenging. The primary difficulty is associated with the selection of the geographic extension of surficial features. Field observations have been relied upon to understand landscape diversity. However, a constraint on this approach lies in the uncertainty related to the extrapolation of field observations, especially when attempting to extrapolate them to regional scales. Such extrapolation can be unreliable due to the complexity and variability of landforms, the paucity of data availability, and the difficulty in defining quantitative criteria that discriminate diverse landscape types. Modern data analytics technology and advanced satellite imaging provide access to large data sets that can assist in characterising landscape features and their distribution at regional scales. The ability to accurately map landscape domains enhances the ability to link surface geochemistry with the geology at depth.
The landscapes of Western Australia are characterised by low topographic relief and a deeply weathered blanket that covers most of the state. This landscape context is challenging for mineral exploration due to the lack of fresh rock exposure at surface. However, domain experts have identified distinct landscape variability that changes imperceptibly at small scales. Landscape evolution may record geochemical and stratigraphic information that link the basement and the surface (e.g., [5,11,[16][17][18][19]). The integration of landscape evolution and surface geochemistry in deeply and intensely weathered regions is a fundamental tool in mineral exploration used around the world (e.g., [2,6,7], and references therein).
Landscapes are defined by their surface features, such as landforms, vegetation, slopes, sedimentary systems, soils, etc. [20]. The variability of these features allows landscapes to be differentiated from one another, which facilitates classification. From a geological perspective, landscapes are fundamentally linked to: (i) climatic conditions, which control, for example, water availability and weathering processes; (ii) tectonic activity, a main driver of the evolution of surface elevation, which also has a significant impact on sedimentary dynamics and erosional processes; and (iii) geological features, such as structures and lithologies, which control the architecture and variability of the land at surface and at depth [5,[16][17][18][19]. Consequently, studying landscapes can reveal the link between surface features and geological processes at depth. In this work, we explored the question of whether these subtle differences between landscape types in low-relief and intensely weathered regions can be quantified on a regional scale, using machine learning approaches, with only desktop compute power.
With this paper, we aim to provide a useful tool for mineral exploration in areas of cover. The innovation presented in this paper comprises the mapping of landscape variability types at a large scale in a covered region, where mineral exploration is highly challenging. The outputs of this research allow for the classification and fine tuning of mineral exploration protocols in such regions to improve sample media selection strategies and more successfully interpret surface geochemical data. The methodology described in this paper can be implemented in other similar regions globally to assist mineral exploration for any commodity (e.g., Au, Ni, etc).

Overview of Technologies Using Remote Sensing Data for Landscape Mapping
For small scales of approximately 10 km, a wide variety of quantitative methods have been developed in geomorphology ( [21][22][23][24][25], and references therein). Others have applied quantitative methods successfully at a regional and continental scale (e.g., [26][27][28]). The difficulty lies both in identifying features that can be used to differentiate landscapes into different domains and in using those features to determine the geographic extent of the domains. This problem is exacerbated in a setting where landscapes are difficult for the untrained eye to distinguish, as is the case of the deeply weathered landscapes of Western Australia.
One of the most challenging aspects of landscape study is cataloguing their differences at regional or even continental scale (e.g., [29]). Integrating data from observations on large scales is an important problem also outside of mineral exploration (e.g., [30]). The difficulty lies not in identifying features that can differentiate landscapes into different domains but also in using those features to determine the geographic extension of the domains. Field observations have been crucial to understand landscape diversity at the regional scale [11]. However, field observations are practically limited to the immediate vicinity of roads, and hence are inherently linear, and any attempt at extrapolating to regional scale introduces significant uncertainty. Such extrapolation suffers from the complexity and variability of landforms and the difficulty in defining quantitative criteria that discriminate diverse landscape types.
Advances in satellite imaging and computing power have enabled the creation of huge yet accessible geospatial data sets. These data sets can assist in characterising landscape features and their distribution at regional scales (e.g., [31][32][33]), which directly impacts many disciplines, including mineral exploration, geomorphology, environmental studies and geoarchaeology, among others (e.g., [2,6,7,34]). Moreover, combined with the drive toward ML in recent years, such data sets have enabled a data-driven approach to automatic landscape classification, without explicitly deriving a specific metric first. Presented with a set of labeled data, an ML classifier learns how to distinguish between classes and can then predict labels for unseen data. Support vector machine and random forest classifiers typically produce the most accurate results. State-of-the-art in the field of computer vision, deep convolutional neural networks (DCNNs) are becoming more common in geology as well. DCNNs implicitly aggregate data, require little to no feature engineering [35], and classification accuracy often exceeds 90%. However, training is computationally expensive.
Classification for mapping purposes is naturally based on spatial input data, which lends itself to either a pixel or an object approach [36]. Conceptually simple, the pixel approach directly feeds pixel values into ML. However, since a pixel is a largely arbitrary region from a geographical point of view, the pixel approach is typically less accurate as it disregards spatial context. By contrast, an object is a geographically meaningful region and therefore respects natural borders (e.g., roads, for urban mapping). Pixel values from such a region are aggregated before being fed into the ML classifier, preferably in a way that preserves spatial context. This aggregation may improve accuracy, but it is more complex to implement.
Most studies of land cover/land use classification or urban mapping use the pixel approach, and typically achieve accuracies of 80%-90%, sometimes 95%. Lidberg et al. [37] showed that ML outperforms previously used single thresholds when creating highresolution wet area maps. They reported an overall accuracy of 84% for RF and artificial neural networks (ANNs), followed by 82% for SVM, as compared to 79% for singlethreshold classification.
Maxwell et al. [38] compared the performance of six ML classifiers for an agricultural and an urban land cover data set, and reported 89.1% accuracy for SVM and 87.1% for RF. Feature selection (picking only the most important features) increased accuracy to 94.4% for SVM, while decreasing that for RF to 87.8%. They also recommend that, "if possible, the analyst should experiment with multiple classifiers to determine which provides the optimal classification for a specific classification task." Similarly, Abdi [35] compared SVM, RF, extreme gradient boosting, and DCNNs for land cover and land use classification in a boreal landscape. SVM performed best in his study; however, all approaches yielded similar accuracies in the range of 73%-76%. For urban mapping, Zhang et al. [36] combined remote sensing data and social sensing data, and reported 78% accuracy using a random forest classifier. Fouedjio and Klump [39] investigated the influence of spatial autocorrelation on predictions by random forest and benchmarked them against kriging, which is used more conventionally as a geostatistical technique in mineral resources exploration.
An interesting approach that fits the landscape mapping category, but which does not make use of canonical ML classifiers, is that of Jasiewicz et al. [27,40]. They first computed a signature for a local region (e.g., a 16 × 16 px block) called a scene. Then, they computed the distance between scene signatures. Their library provides several signature functions and distance metrics. "Being able to quantify the overall similarity between two landscapes using a single number is a key element" of their work, which allowed them to use averaging for classification resulting in excellent run-time performance. Furthermore, how their algorithm makes decisions can also be easily reproduced and validated by a human. However, reducing landscape complexity to a single dimension inevitably leads to some information loss. In connection with the various averaging steps employed in their methodology, this may or may not be critical. They do not report any accuracy metric.
The literature suggests that landscape mapping algorithms based on classic ML are often less accurate than those using DCNNs, perhaps because most studies follow the pixel approach, i.e., they do not aggregate pixel values prior to classification. Our goal was to implement an algorithm that (1) performs comparably to ANNs/DCNNs (i.e., accuracy ≥90%), but is based on classic ML methods for which training is cheap; (2) is open source, written in a modern language, and, therefore, is easily extendable; and (3) can potentially be run in parallel, enabling processing on a continental scale.
We apply our algorithm to a region of approximately 1000 × 1000 km 2 in the south of Western Australia (see Figure 1). Inputs are a freely available (licensed under the terms of the Creative Commons Attribution 3.0 Australia license) digital elevation model (DEM) from Geoscience Australia [41] along with derived data, with sample landclass boundaries provided by domain experts. Our algorithm then produces a dense map of the model region. In the following section, we review a general ML workflow and detail our methodology. In Section 3, we study how tile size affects classification accuracy and demonstrate a number of extensions that may improve accuracy or provide additional functionality. Discussion and conclusion are presented in Section 4.

Machine Learning
Traditionally, for a computer to solve a problem, a series of instructions are explicitly programmed, largely of type "if this, do that", tailored to said problem. In ML, instead of explicit programming, a model is trained. One method of training is to provide known input data along with known sound decisions. After successful training, the model is able to make sound decisions from new, unseen data. This method is known as supervised learning and is what we employed in this paper. The main advantage over explicit programming is that ML models can learn complex relationships between data and decisions (provided enough training data is available), which otherwise would require a human to implement, thereby avoiding costly and error-prone algorithm develoment. Even worse, said relationships may not even be known, rendering explicit programming impossible.
The main advantage over explicit programming is that ML models can learn complex relationships between data and decisions (provided enough good training data is available), which otherwise would require a human to implement similarly complex-hence, costly and error-prone-algorithms. Even worse, said relationships may not even be known, rendering explicit programming impossible.
A general ML workflow includes three main steps: (i) preparing the source data, i.e., selecting data that is relevant to the problem at hand, and transforming it into a format suited for (ii) training a model, followed by (iii) predicting new, unseen data [43]. Machine learning methods can be split into two approaches: supervised and unsupervised learning. Supervised learning requires labeled data on which a model can be trained; such labels are usually provided by a human domain expert. Unsupervised learning requires no labels; one goal could be to detect clusters-i.e., groups with similar features-within the data. We limit this study to supervised learning, but note that an extension to unsupervised learning is straightforward.
Our task is classification: predicting a label for a sample, given a set of features. As labels, we chose arbitrary integers that uniquely identify the eight landforms indicated in Figure 1. We have tested a number of classification methods, namely (i) support vector machine classifier with linear and radial basis function kernels, denoted SVC-LIN and SVC-RBF [44,45]; (ii) decision trees (DT, [46]); and (iii) random forests, consisting of 100 or 1000 Decision Trees (RF-100 and RF-1000, [47]), all accessed via their Python wrappers in scikit-learn [48]. Implemented in Python 3.7, our code is available at https://gitlab.com/ fgradi/lpr/. This paper is based on the code dated 1 July 2021.

Data Preparation: Assembling the Feature Matrix
Many fundamental ML algorithms are designed to operate on a so-called feature matrix. Each row in this matrix corresponds to a sample of the quantity we want to predict. In our case, a sample is the smallest spatial unit that gets assigned a landclass label. We call these units "tiles" (cf. Figure 2), and chose square tiles with side lengths ranging from 51 to 1001 px (4.4 to 86 km). Columns represent features: attributes of a tile which we think are relevant for its classification, e.g., slope or flatness. Assembling the feature matrix from our source data is the major and first of the three steps of our ML approach, described in detail below.
Since the task is 2D pattern recognition and classification, it is crucial that (some) spatial context is preserved during this step. In initial exploratory analysis, we applied principal component analysis to the feature vectors for each pixel, with each pixel treated as an independent observation, and with no consideration of the spatial correlation between nearby pixels. When the pixel data were projected onto the principal components, we found that the component scores did not cluster according to the landscape classes, and the resulting map (not shown) was inaccurate and noisy. This approach produces a large feature matrix, which increases memory requirements.
A more viable approach is spatial aggregation of our source data, but in a way that preserves variation of features typical for a particular landscape class. Simple averaging does not work, as it smooths out those variations, leaving little to distinguish between classes. Instead, we decompose the data into tiles as shown in Figure 2 and compute per-tile normalised histograms of source data. Tiles may overlap, and tile size s and stride r can be specified. Treatment then depends on the type of numerical input, which is either nominal, ordinal, or integer/real. Integer and real data have both ordering (1 < 2 < 4) and distance (2 − 1 < 4 − 2). Nominals have neither, e.g., "sunny", "overcast", "rainy". Ordinals have an ordering, but no notion of distance: "cold" < "warm" < "hot". For nominals, within each tile, we calculate the fraction of each area that is equal to a possible value. For ordinals, treat as nominals except using ≤ instead of = on each possible value. For integers and reals, discretise, then treat, as for ordinals.
Nominal or ordinal source data with n possible values results in n features. For integers and reals, the chosen discretisation determines the number of features generated. For example, the slope relief classification source data consists of six slope classes of ordinal type, ranging from "level" to "very steep". We generate six slopeX features, each value being the fraction of a tile that has the corresponding slope relief classification.  Within each tile, we compute the fraction of each area that is equal to a possible value. In other words, we compute per-tile normalised histograms of source data. In this example, we use s = 6 and a generic field of "data" ranging 0 to 6, sorted into seven bins, which results in seven features "data0" . . . "data6". (c) The bounding box of a landclass training region determines the training grid (gray), which generally differs from the grid (black) used for predicting the final map. This helps to identify overfitted models. See text for details. Table 1 summarises all 37 features. We do not use the DEM directly as we aim to classify shape, not elevation. We do, however, include basic statistics of the DEM as features, such as skewness, kurtosis, and interquartile range. Normalising the data is useful as some ML algorithms are not scale invariant, i.e., they perform poorly if some features range, e.g., from 0 to 1 while others range from 0 to 100. The code is designed to make adding additional features straightforward. For example, one could easily include adjacency information, as is done in [27].
The histogram approach has a number of advantages: (i) tile size directly controls the spatial scale of classification, (ii) model training is comparably cheap, potentially enabling continent-scale classification, (iii) classification is invariant under 90 degrees' rotation, (iv) implementation is reasonably straightforward, and (v) how the algorithm decides can be easily validated by a human. Convolutional neural networks provide an alternative approach, and their applicability to landscape pattern recognition could be explored in the future.  Table 2), NH, six bins [54]

Training, Cross-Validation, Prediction
Once the feature matrix is assembled, we train a number of ML models. Table 3 shows the number of training samples per class. Class 8 has almost 10,000 samples, whereas class 7 has only 560. This is a consequence of the difference in extent of a land class, and scale over which their features vary, cf. Figure 1: class 8 in the NW of our study area covers a vast area of several hundreds of kilometres, whereas class 7 covers only some tens of kilometres. To address this imbalance, we analyse both overall accuracy (defined as the number of correct predictions divided by the total number of predictions) and per-class accuracy. Initially, we estimated accuracy by tenfold cross-validation: Our labeled data is decomposed into ten equal-sized subsets. A model is trained using nine of these, and its performance is tested on the remaining subset. Repeating this process for different subsets yields ten accuracy estimates, the mean of which we refer to as cross-validation accuracy. Accuracy may be affected by the choice of hyperparameters, i.e. parameters that are unrelated to the training data, and which can be tuned during training. The hyperparameter space is typically small enough such that a simple grid search is feasible. Once we are satisfied with model performance, we train the final model using the entire training data set, and then use it to predict a map.
As an alternative to cross-validation accuracy, one can evaluate accuracy post-training, i.e., based on said final prediction, for all tiles for which the ground-truth is known. However, this is usually considered bad practice, since accuracy would be based entirely on the same data that the model has been trained on.
In our case, however, it was cross-validation accuracy that proved problematic. For tile sizes larger than 17 km, cross-validation accuracy would often approach 100%, whereas post-training accuracy for the same model was much lower, indicating overfitting. This discrepancy is due to the fact that the grids we use for training are, in general, offset from the one on which we predict the final map, as shown in Figure 2c. For training, we extract a block of pixel data defined by the axis-aligned minimum bounding box of a landclass, then decompose that block into tiles (gray grid). The grid for predicting the final map (black), however, is defined by the extent of the input DEM. While aligning both grids would be trivial, we chose not to, as this mismatch helps detecting models that are overly grid-dependent, i.e., overfitted. Post-training accuracy was almost always lower than cross-validation accuracy. We therefore chose to report only the conservative estimate of post-training accuracy, referred to as "accuracy" below.

Computational Cost
In this section, we report the computational cost in terms of aggregation, training, and prediction CPU time, as well as memory requirements. All processing was carried out on an Intel ® Core™ i7-8550U CPU notebook with 16 2 . Decomposition into overlapping tiles using a constant stride size of 25 px produces ≈155,000 tiles of which ≈30,000 are invalid (ocean). A tile is considered valid if it contains at least 25% valid pixels. We emphasise that the total number of tiles is largely independent of tile size due to keeping the stride size constant. This means that overlap increases with tile size, from 50% at the smallest tile size, to 75% at what we report as the best tile size (see below), to 97.5% at the largest tile size studied. This decision was made in an attempt to keep the number of training samples constant, while still enabling us to study how aggregation length scale affects classification accuracy. We would expect that accuracy is low for small tiles as they fall short of the defining length scales. Increasing the tile size, without increasing the overlap at the same time, would reduce the number of training samples, which typically degrades accuracy. Figure 3 shows how aggregation, training, and prediction runtimes scale with tile size. By far the largest cost, aggregation (Figure 3a) takes between 1 and 25 h and scales approximately with tile size squared. This is expected, as the total number of tiles is basically constant. We note that aggregation is a perfectly parallel problem, i.e., parallelisation should result in a linear speed-up. However, we did not attempt parallelisation of the process in this project.  (a) (b) Figure 3. Effect of tile size on (a) aggregation and (b) training and prediction runtime. Note that, for clarity, we omitted prediction runtime for SVC-LIN. It has a similar (lack of) dependency on tile size as RF-100, but is two orders of magnitude faster.
Training (solid lines in Figure 3b) is orders of magnitudes faster, typically taking seconds to minutes depending on the ML algorithm. As the size of the feature matrix is largely independent of tile size, the same should be true of training runtime, which is indeed the case for RF-100 and SVC-LIN. For SVC-RBF, however, training runtime varies significantly, from 1.6 s at s = 86 km to 95 s at s = 4.4 km, which may be due to the large overlap and hence rather strongly correlated training samples at large tile sizes.
Finally, prediction runtime (dashed lines in Figure 3b) generally follows the same trends: it is largely independent of tile size for RF-100 and SVC-LIN, but it is strongly dependent on tile size for SVC-RBF. Prediction is about as fast as training for SVC-RBF, and about four times faster for RF-100. For SVC-LIN, prediction is 100 times faster than training, takes less than 0.05 s, and was omitted from Figure 3b for clarity.
Memory usage varied between 1.8 and 2.2 GB. A larger workstation or cluster implementation should be sufficient to run the algorithm for the entire continent of Australia.

Hyperparameter Tuning
Classification accuracy and runtime performance depend on the choice of hyperparameters: parameters specific to the ML algorithm, which are selected before training, and should be tuned for the problem at hand.
For the support vector machines, there are two hyperparameters: a regularization parameter C and a kernel coefficient γ [45]. Since training is rapid in our case, we conducted a simple grid search, which produced an optimum C = 10 6 and γ = 10 −4 for SVC-RBF, and C = 10 4 for SVC-LIN.
For the random forest classifier, one has to specify the number of decision trees. We report results for a random forest consisting of 100 decision trees (RF-100). We have also tested RF-1000, which increased training and prediction time by a factor of 10 (as expected), improved accuracy by merely 0.2%, and produced virtually identical maps.

Results
We visualise results by generating maps such as those in Figure 4. Semitransparent colours indicate the predicted landclass, dashed lines indicate training regions, and both are overlaid on top of the flatness map. A square (or circle) in the bottom right corner indicates the tile size. As explained above, tiles from the training set (i.e., those within dashed lines) may still be misclassified, which is the case if post-training accuracy is less than 100%.

Effect of Tile Size on Accuracy
In this section, we will discuss the effect of tile size on the classification accuracy for different models. As expected, this effect is quite significant: "All spatial analyses are sensitive to the length, area, or volume over which a variable is distributed in space" [55]. In Figure 4a, we varied tile size from 4.4 to 86 km (51 to 1001 px), which produced an accuracy between 39.8 and 97.8%, shown by solid lines, for each of our models. Shaded areas indicate the range of per-class accuracy, again per-model, which we will discuss further below.
On average, RF-100 (black line) produces the highest accuracy, followed by SVC-LIN (blue) and SVC-RBF (red). The best map, shown in panel (b), had an accuracy of 97.8% and was produced by RF-100 with a tile size of 8.7 km (101 px). This was the second smallest tile size we tested. The minimum per-class accuracy in this case was 94.6%, indicating excellent classification for every landclass.
The remaining algorithms achieve best accuracy at similarly small tile sizes. Accuracy peaks at s = 13 km for SVC-RBF, and s = 17 km for SVC-LIN. The resulting maps, shown in Figure 5, are not fundamentally different from the overall best map in Figure 4b.
In general, smaller tiles increase the (nominal) map resolution and the number of available training samples, too little of which typically degrade accuracy. However, smaller tiles also increase the time required for aggregation and training. More severely, smaller tiles may no longer capture the spatial scales which define a particular landscape. The result is a noisy map. Conversely, larger tiles reduce noise and generate smoother boundaries between classes: Figure 4c shows a map with a tile size of s = 21 km, which still results in 96.3% accuracy. However, larger tiles also reduce the number of available training samples, possibly to zero for small training regions.  Overlapping tiles, i.e., partly reusing data, may alleviate this problem, at the risk of producing overfitted models that fail to generalise well. This can be observed in Figure 4d, for landclass 7: the training region is smaller than the tile size, which means that without overlap, there would be no training samples at all. With a constant stride size, the overlap is 97.5% and produces 519 training samples. Yet, accuracy for this class is only 33%, due to the lack of independent training samples. In general, classes with small training regions yield the lowest class accuracy (i.e., classes 2, 5 and 7 in our example). In Figure 4a, the shaded region indicates the range of per-class accuracy. The maximum per-class accuracy is always close to 100% i.e., there is always a class that is almost perfectly predicted. The minimum per-class accuracy, however, decreases for tile sizes larger than 30 km, quite dramatically so for the support vector models (red and blue dashed lines), and slightly less severely for RF-100 (black). It is this misclassification of small training classes that reduces overall accuracy as tiles get larger; we remind the reader that we define overall accuracy as the mean of post-training class accuracy.  Figure 6 shows the Gini importance for the 37 features, averaged over all runs of different tile size and overlap, as discussed in Section 3.1. Squares denote the mean importance, while error bars show minimum and maximum importance. While there is some variation, the overall trend is confirmed: features that are important on average generally remain so irrespective of resolution and tile size. The most important features are topographic wetness index, slope IQR, and slopeVG, with the first two being significantly more important that the rest. Importance for subsequent features decreases approximately linear. The vast majority of features does indeed contribute to the model. Least important ones (VBF9, slopeST, slopeVS) are so because there are few tiles with these values.

Circular Tiles
Square tiles are easy to implement, but they come with two issues. First, the resulting classification is not rotationally invariant; rotating the source data changes which part of said data gets aggregated in a particular tile (unless we rotate by multiples of 90 degrees). Consequently, the predicted map changes, which is generally unwanted. Second, square tiles tend to produce blocky shapes, which becomes most apparent for large block sizes, and especially for RF-100. Circular tiles can alleviate both issues. We also found they improve accuracy slightly (by ≈1%), at the expense of a similar increase in aggregation time. Arrows in Figure 7 highlight some rather subtle differences between square and circular tiles for RF-100 at s = 43 km.

Transition Belts
Landscapes vary in morphology based on different climatic conditions, basement architecture and lithologies, tectonic activity, and sedimentary dynamic regimes. All of these factors have a large footprint. Major changes in any of these variables result in changes in morphology. Often gradual in nature, we refer to these changes as transition belts. Sharp boundaries, however, may be geologically important, as they may indicate a sudden change in underlying geological structures, lithological packages, or a sharp change in the sedimentary dynamic driving forces (e.g., sedimentary river dominated regions versus transgression-regression-dominated regions; Figure 1; (e.g., [11])).
However, so far, we have posed the landscape classification problem as binary; a tile belongs to one and only one class. By definition, this leads to a sharp boundary between any two classes. This section describes a feature that implements transition belts.
Of mostly cosmetic value is a purely geometric approach, in which one may define a belt of fixed width (e.g., 20 km) along class boundaries. Support vector machines and random forests allow for more advanced approaches, each providing suitable but slightly different measures. Support vector machines provide for every sample a distance to the decision boundary. Note that this is a distance in feature space, entirely unrelated to a distance on the map. This distance could be considered a proxy for the probability that a sample belongs to a class; a sample immediately upon a decision boundary is equally likely to belong to either class. Since our features are normalised, we could choose a threshold distance beyond which a sample is considered to belong to a particular class with 100% certainty. Below that threshold, we could linearly interpolate the probability.
Random forests report for every class a probability that a given sample belongs to that class. The probability is simply the portion of decision trees in the random forest that predict said class for the sample in question.
The results of implementing this idea are shown in Figure 8. Transparency represents probability or uncertainty, with opaque areas corresponding to a high probability (>80%) that a given sample belongs to the class indicated by the colour. Therefore, mostly translucent areas indicate a slow, gradual transition from one class to another and appear as rather wide belts between classes. Conversely, a narrow transition belt indicates a distinct, sharp class boundary. This is purely a postprocessing feature and comes with no significant computational costs. . Map with transition belts, RF-100 at s = 21 km. Clear, semiopaque, and mostly opaque areas denote <40%, 40-80%, and >80% probability, respectively, that a given tile belongs to the indicated class.

Novelty Detection
Located in the SW of our model region, the Stirling Range is distinct from its surroundings, rising to ≈1100 m above sea level in an otherwise largely flat area. A domain expert would clearly identify it as a separate class. However, we have ommitted the Stirling Range from our classification so far, as its rather small spatial extent provides few training samples at our 3-arc-sec DEM resolution. For the same reason, we have omitted coastal regions, which are distinct from the inland patterns due to the modern sedimentary dynamics of the coast (e.g., erosional regime of incident waves, coastline, and mobile sand dune systems).
It would be useful if our algorithm detected such instances, i.e., those which do not resemble any training data. Known as outlier or novelty detection, a number of algorithms exists for this task, of which we have tested OneClassSVM, IsolationForests, and LocalOutlierFactor. Technically, we combine training data of all classes into a single class and train a model. We then predict whether a new sample belongs to that single class (inlier) or not (outlier/novelty). Impementation is straightforward and the required additional training is cheap.
Example results are shown in Figure 9. In panel (a), the IsolationForest classifier clearly identifies as unknown (i) the Stirling Range, (ii) a similar region SW to it, and (iii) some coastal regions. Almost everything else is classified as known. This classifier detects the Stirling Range at all tile sizes we have tested, with smaller tile sizes generally resulting in more detailed shapes which better match the boundary drawn by the domain expert. OneClassSVM (not shown) performs about the same up to s = 17 km, beyond which results degrade. LocalOutlierFactor (Figure 9b) produces too many false positives to be useful at all tile sizes we have tested.

Discussion and Conclusions
One of the most challenging aspects of landscape study is cataloguing their differences at a regional scale (e.g., [29]). The difficulty is not related to classifying the features that can differentiate landscapes into different domains, but in determining the geographic extent of such features. Field observations have been relied upon as a proxy to understand landscape diversity at a regional scale [11]. However, a constraint on this approach is the uncertainty in the extrapolation of field observations, especially when attempting to extrapolate them to regional scales. Such extrapolation suffers due to the complexity and variability of landforms, the paucity of data availability, and the difficulty in defining quantitative criteria that discriminates diverse landscape types from each other. However, there are a wide variety of existing quantitative methods in geomorphology developed for small scales (10 km). Modern data analytics technology, and the advances in satellite imaging, provides access to large data sets that can assist in characterising landscape features and their distribution at regional scales (e.g., [27,[31][32][33]40]).
In this study, we have presented an automatic landscape mapping methodology. For every landclass present in the test region, a domain expert defines a sample region. After training on these sample regions, machine learning algorithms are able to predict a dense map. We tested our Python implementation using data derived from a 3-arc-second digital elevation model for a region in Western Australia of approximately 1000 × 1000 km 2 . At an optimised tile size, we achieve an accuracy of almost 98%, with a reasonable runtime of 48 min.
Spatial aggregation of the raw pixel values is crucial to classification accuracy, but it is crucial in a way that conserves some spatial variation. We found a per-tile normalised histogram approach to work well. Classification based on raw pixel values fared much worse, which is consistent with the observation that performance of the normalised histogram approach degrades as the tile size approaches zero (see Figure 4). While currently the most expensive computationally, this aggregation stage could see a significant speedup through advanced vectorisation, or possibly a C++ implementation.
We also presented three extensions: (i) circular tiles, which enable rotationally invariant classification, (ii) transition belts, which indicate how gradual or abrupt landscapes change, and (iii) novelty detection, which can identify regions which do not match any of the landclasses defined during training.
The methodology presented is a building block of a future toolbox that will allow landscape types to be mapped at continental scale using remote sensing techniques and machine learning algorithms. Our approach is scalable and repeatable, both spatially and temporally. We note, however, that ML does not necessarily remove subjectivity. Results still depend on which and how many polygons the domain expert chooses for training, and the size and stride of tiles. Larger tiles produce a more contiguous but lower resolution map. Conversely, smaller tiles result in a more detailed, and at some point, noisy map. The choice of ML algorithm is not overly critical: random forest and support vector machine classifiers generate quite similar maps, which is a favourable outcome.
This work builds on previous studies about characterisation of landscapes in the South of Western Australia and worldwide (e.g., [5,18,29], and references therein). The diversity and distribution of landscapes mapped in this study support previous results presented by domain experts, which were based on field observations [11]. This study identified four main landscape domains in several field regional transects, conducted at different latitudes in the South of Western Australia (Figure 1). These transects were designed to capture the geomorphological variability between typical landscape features in the Yilgarn Craton to the coastline, in order to characterize the landscape evolution.
The landscape changes were summarised from the topographically high, dissected environment with thick cover development and uneven basement topography, to the nearly flat coastal regions with thin cover and sand dune systems at the coastline. These differential landscape domains were identified in a previous study and classified into a framework of four different landscape types: (i) Albany (green, Figure 9); (ii) Kalgoorlie-Norseman (pink and dark blue); (iii) Esperance (bright blue and dark green); and (iv) Neale (dark yellow) [9].
Gonzalez-Alvarez et al. [11] faced three main challenges after reconstructing the transects and the observations from the field: (i) how to infer their direct observations along the traverses laterally, (ii) how to clearly identify boundaries between landscape types along the traverses, and (iii) how to profile the landscape types with specific features or combination of features. The present study identifies the four landscape domains mentioned above, which substantiates the methodology. It also addresses all three of the above challenges. In addition, our results are geologically meaningful since they are consistent with the geological trends and main basement features in the region. The extension and boundaries of the landscape types defined are an important contribution to the understanding of landscape evolution of this region in Western Australia. Landscape variability is directly related to sedimentary regimes. Sedimentary regimes dominated by erosional processes may provide the best context to detect the geochemical footprint of ore bodies located at depth. In contrast, depositional regimes may obscure that geochemical signature by bringing exotic material to the region. The stratigraphic variability of the cover captures landscape evolution, and, therefore, links the geology at depth with the surface. Mapping landscape variability allows for a better understanding of the relation between sedimentary regimes, landscape evolution, and geology. Therefore, landscape mapping can be used as a valuable tool in mineral exploration.
In the context of mineral exploration in intensely weathered regions of low relief, our results support that DEM can be used to (i) discriminate and characterise landscape variability types at large scales, (ii) map the extension of the landscape domains, (iii) quantify their surface geometrical features, and (iv) establish the transitional boundaries between landscapes. These outputs offer valuable assistance in the design and interpretation of surface geochemical surveys at a regional scale in areas of low relief and intensely weathered landscapes. The variability of the landscape types can reflect geological, tectonic, and stratigraphic changes. Identifying such changes can be fundamental for a more efficient interpretation of surface geochemical data (e.g., detecting the geochemical footprint of potential ore deposits at depth, establishing changes in surface biodiversity associated to geological changes at depth, detecting diverse tectonic regimes, etc.).
Author Contributions: Methodology, software, validation, formal analysis, investigation, data curation, writing-original draft preparation, and visualisation, Thomas Albrecht; conceptualisation, writing-review and editing, project administration, and funding acquisition, Ignacio González-Álvarez and Jens Klump. All authors have read and agreed to the published version of the manuscript.
Funding: Support by the Discovery Program through its strategic funding (CSIRO, Mineral Resources) is gratefully acknowleged.

Data Availability Statement:
Publicly available datasets were analyzed in this study, summarised and referenced in Table 1.