Multisensoral Topsoil Mapping in the Semiarid Lake Manyara Region, Northern Tanzania

This study maps the distribution of topsoils and surface substrates in the Lake Manyara area of northern Tanzania. The nine soil and lithological target classes were selected through fieldwork and laboratory analysis of soil samples. High-resolution WorldView-2 data, TerraSAR-X intensity data, medium-resolution ASTER spectral bands and indices, as well as ENVISAT ASAR intensity and SRTM-X-derived topographic parameters served as input features. Objects were derived from image segmentation. The image objects were classified with a nonlinear support vector machine approach. Recursive feature elimination was used to select the input features most relevant for separating the target classes. Despite the number of target classes, an overall accuracy of 71.9% was achieved. Confusion occurred mainly between classes with high CaCO3 content and between classes of silica-rich substrates. The incorporation of different input feature datasets improved the classification accuracy. An in-depth interpretation of the classification result was conducted with three soil profile transects.


Introduction
The spatial distribution of soils and lithology provides essential input information for different scientific and economic applications, including landscape reconstruction [1], digital soil mapping (DSM) and mineral exploration for agricultural [2] or mining applications [3]. Though the soil must be considered as a three-dimensional medium, a wide range of remote sensing sensors provide useful information for assessing the mineral composition and other physical and/or chemical properties of the uppermost parts of the soils over spatially contiguous areas [4][5][6]. The topsoil is generally the most relevant part of the soil with regard to food production, degradation and soil management [7]. Although the definition of topsoil varies between soil taxonomies [7][8][9][10], the uppermost part of the soil always belongs to the topsoil. The topsoil thickness is related to local conditions of pedogenesis, erosion and deposition processes. Normally, topsoil is characterized by a thickness of 10-30 cm [7,8]. In this study, we regard the soil surface properties as a proxy for topsoil and lithology. We hypothesize that the analysis of physical-chemical properties, the collection of field reference data and the remote sensing analysis of the upper surface strata yield valuable information about the topsoil and/or lithologic characteristics. Moreover, the topographic position and geomorphological processes also influence the topsoil characteristics and, hence, should be included in a comprehensive analysis of the spatial distribution of topsoils.
The surface reflectance received by a multi- or hyper-spectral sensor depends on the mineral composition of the surface and is additionally influenced by soil organic matter, moisture content, texture and surface roughness [11]. Backscatter signals from Synthetic Aperture Radar (SAR) sensors of different wavelengths depend on the surface roughness and are sensitive to the dielectric properties of soils [12][13][14]. Soil mapping using remote sensing data shows limitations due to the complex physical and chemical nature of soils. Remotely derived datasets can characterize the surface (optical remote sensing systems) or the uppermost part of soils (SAR systems) [5,15]. Since soils are complex three-dimensional structures, the surface characteristics may not represent the underlying layers of soil. The remote sensing signal may also be a product of different soil surface properties. This mixing effect increases as the spatial resolution of the datasets decreases. Very high-resolution sensors, like WorldView-2 and GeoEye-1, provide a high spatial differentiation. On the other hand, lower spatial resolution sensors, like the Landsat series or ASTER, provide a better spectral coverage, especially in the mid-infrared region, which is important for mineral mapping purposes [5,16]. Vegetation cover is another important factor to consider. Even sparse vegetation cover may influence the identification of soil attributes using remote sensing methods [17,18]. Spectral indices from multi- or hyper-spectral remote sensing images are effective tools for the classification and evaluation of photosynthetic vegetation activity. Vegetation indices (VI), like the Normalized Difference Vegetation Index (NDVI), utilize the difference of absorption and reflection in the spectral wavelengths of the red (0.625-0.74 µm) and near-infrared (NIR; 0.74-1 µm) [19]. Dead materials in grasslands blur VI, making it hard to distinguish between dead materials and some other land cover [20]. This is especially a problem in arid and semiarid regions, due to relatively long dry periods. A strategy to resolve these problems consists of long-term monitoring via remote sensing and collection of ground information [21,22].
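As a minimal illustration of the vegetation index mentioned above, the NDVI can be computed from red and near-infrared reflectance as (NIR - red) / (NIR + red). The sketch below is not taken from the study; the function names and the threshold of 0.3 are assumptions chosen only for demonstration.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for one pixel or object mean."""
    denom = nir + red
    if denom == 0:
        # Assumption: treat a zero-signal pixel as neutral instead of failing.
        return 0.0
    return (nir - red) / denom


def is_vegetated(nir: float, red: float, threshold: float = 0.3) -> bool:
    """Flag a pixel/object as vegetated.

    The default threshold is a hypothetical placeholder; in practice it
    would be read from the NDVI histogram of the scene at hand.
    """
    return ndvi(nir, red) > threshold


if __name__ == "__main__":
    # Illustrative reflectance values, not measurements from the study.
    print(ndvi(0.45, 0.08), is_vegetated(0.45, 0.08))  # dense green cover
    print(ndvi(0.25, 0.20), is_vegetated(0.25, 0.20))  # bare or dry surface
```

Because dead plant material has a low NDVI, a threshold of this kind only separates photosynthetically active vegetation from everything else.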
A wide range of studies proved the applicability of techniques using remote sensing data for topsoil mapping. In the following, some of them are described. Landsat 5 TM imagery was used to detect basalt outcrops for supporting soil mapping, applying reflectance values, band ratios and indices [23]. Landsat 7 ETM+ data were used to determine surface soil properties with the help of laboratory-analyzed surface soil samples [24]. The ASTER multispectral bands and derived indices and ratios were often utilized for lithological mapping [25][26][27][28]. ASTER data were also used to identify mineral components in tropical soils using reflectance spectroscopy signatures from soil samples [29].
Various studies include additional variables, especially in geostatistical approaches to the spatial soil distribution [5]. Topographical features, in particular, provide information on the terrain and, hence, on soil formation processes [30]. Mulder et al. [31] used ASTER data and derivatives, as well as elevation as a topographical proxy for DSM. Hahn and Gloaguen [32] compared different input variable combinations of ASTER-derived land use, geology, topographical parameters and others to estimate soil distribution by support vector machines (SVM). Rossel and Chen [33] used Landsat data and derivatives, topographical derivatives, climate parameters, as well as soil, geological and radiometric maps and spectrometry results from soil samples to determine the surface soil properties for Australia. Selige et al. [34] found that soil organic matter and soil texture of topsoil correlate with the spectral properties recorded by a hyperspectral sensor. They were also able to model the distribution of sand, clay, organic carbon (Corg) and nitrogen. SAR backscatter intensity information from X-, C- and L-band sensors proved to be sensitive to soil moisture differences, surface roughness and, to some extent, also to soil texture [13,14,[35][36][37][38][39][40][41]. Hengl et al. [42] applied an automated random forest approach to map soil properties of Africa with DEM-based landform parameters and MODIS data at a spatial resolution of 250 m for the Africa Soil Information Service (AfSIS) project. A comprehensive overview of remote sensing in soil mapping is provided by Mulder et al. [5] and, with a special focus on Africa, by Dewitte et al. [6].
The lithologies and the soils of the Lake Manyara basin have complex genetic origins. The Proterozoic gneissic basement, tectonic and volcanic processes, as well as the (paleo-)hydrological processes and the sedimentation of the paleolake Manyara influence soil formation. This results in a small-scale distribution and fuzzy transitions of today's soils, topsoils and outcropping lithology, which cannot be depicted by the available soil map for the region with a scale of 1:2,000,000 [43]. The categorization of soils is further complicated by their three-dimensional nature. Remotely sensed surface features therefore yield auxiliary information on topsoil characteristics and their distribution. Combined with topographic information, the analysis yields valuable information that also allows a rough identification of soil types.
The aim of this study is to map the distribution of the topsoil and surface substrate characteristics using multispectral, topographical and SAR input data. The laboratory analysis of surface samples provides soil properties used to categorize and characterize the topsoils and surface substrates. In order to improve the topsoil classification, we followed a multiscale approach using: (i) image object segments from a high-resolution WorldView-2 scene; (ii) medium-resolution ASTER multispectral data and indices; (iii) X- and C-band SAR backscatter; as well as (iv) topographical derivatives. We compare and discuss the final mapping results with soil catenae covering characteristic transects of the study area.

Study Area
The study area is located within the East African Rift System of northern Tanzania, in the surroundings of the village of Makuyuni. The area is drained towards the west by the Makuyuni River, which discharges into the endorheic Lake Manyara Basin (Figure 1). The precipitation calculations from the daily Rainfall Estimate Product 3B42 (V7) of the Tropical Rainfall Measuring Mission (TRMM) show a bimodal rainfall pattern for the years 2000-2013 [44]. For this period, the average annual precipitation of 651 mm is mainly caused by two wet seasons. One occurs between November and January and a second between March and May [45]. This results in a sparsely vegetated semiarid environment dominated by bushy grassland. The study area is also characterized by a variety of degradation processes due to long dry periods and short, but intensive rainfall events, as well as contributing anthropogenic factors, like overgrazing [46]. The lithology of the study area is very complex, because different lithological units interleave here. The underlying basement of the Masai Plateau is formed of Proterozoic intermediate quartzite and gneisses and is exposed by tectonic faults [47]. Explosive volcanism, especially from the volcano Essimingor, and faulting associated with the rifting of the basin produced alkaline lavas, like alkali basalt, phonolite and nephelinite, as well as tuffs. The volcano Ol Doinyo Lengai (90 km north of the study area) exhibits carbonatitic volcanism, and its carbonate tephra deposits are widespread [47][48][49]. Lacustrine and fluvially deposited sediments can be found 140 m above today's level of Lake Manyara. The so-called Manyara Beds crop out where the Makuyuni River and gully system incise into the lacustrine and terrestrial deposits. The lower member of the Manyara Beds is of lacustrine origin and is composed mainly of mudstones, siltstones, diatomites, marls and tuff that have been deposited in a reducing environment. These sediments have an age of approximately 1.03-0.633 Ma. A tephra layer, which was dated to 0.633 Ma, marks the transition to the younger upper member of the Manyara Beds [50][51][52].

Input Data and Pre-Processing
Multiscale remote sensing data and their derivatives, as well as topographic indices derived from a Shuttle Radar Topography Mission (SRTM) DEM, served as input information for the analysis. All image datasets were co-registered so that they can be combined at the level of image objects.

WorldView-2
WorldView-2 is a commercial multispectral sensor, which was launched in October 2009. It has a very high geometrical resolution of 1.85 m for its eight multispectral bands (MS) and of 0.46 m for the panchromatic band at nadir [53]. The scene was acquired on 21 February 2011 (Table 1), following the winter wet season and a strong precipitation event in mid-February (Figure 2).

ASTER Bands and Indices
The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) consists of three subsystems with a spectral coverage in the visible-near infrared (VNIR), the shortwave infrared (SWIR) and the thermal infrared (TIR) wavelength regions (Table 1). ASTER was launched onboard NASA's TERRA spacecraft in December 1999. The spectral resolution was mainly designed for vegetation, soil and mineral mapping [54]. An ASTER L1B scene was obtained on 23 August 2006 after a long dry season. Because of the cross-detector leakage between the SWIR bands, crosstalk correction was applied using a correction tool from the Earth Remote Sensing Data Analysis Center (ERSDAC) [55].
The six spectral bands of the SWIR system were selected as input features for the analysis. Spectral indices derived from the ASTER VNIR and SWIR bands were also included as input parameters (Table 2). The purpose of these indices is a relative amplification of selective absorption and reflection features, which are caused by different surface materials at distinct wavelengths. They help to detect different mineral compositions, but also to emphasize the spectral differences of target objects. The indices listed in Table 2 are based on a comprehensive literature review.

Topographic Indices
During the Shuttle Radar Topography Mission (SRTM) in the year 2000, X-band data were acquired, which provided a DEM with 25 m ground resolution. The SRTM-X dataset does not have full worldwide coverage; however, one track covers the study area. The DEM was referenced to the Earth Gravitational Model (EGM96) vertical datum. The short-wavelength X-band yields a DEM with good elevation accuracy [60], but also small-scale noise at the surface. To reduce this effect while preserving the topography, we applied a multidirectional Lee filter [61]. The DEM was used to calculate different topographic indices, which characterize the topographic position of the topsoils in the study area (Table 3).

SAR Data
We acquired two TerraSAR-X (TSX1) (~9.65 GHz; X-band) StripMap and two Envisat ASAR (~5.331 GHz; C-band) scenes for different dates (Table 4). Precise orbits were applied to the ASAR scenes. All SAR scenes were calibrated and radiometrically corrected for topographic effects to gamma naught (γ⁰) using the local incidence angle derived from the SRTM-X DEM. The scenes were terrain corrected, and speckle effects were reduced by applying a Lee filter [61]. The two TSX1 scenes were mosaicked to a single dataset in order to cover the whole study area. The images were acquired in the dry season (Table 4; Figure 2), to minimize the influence of soil moisture on the backscatter intensity signal [36][37][38]75].
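The radiometric normalization to gamma naught can be illustrated with the commonly used simplified relation γ⁰ = σ⁰ / cos(θ), where θ is the local incidence angle. The sketch below assumes that calibrated σ⁰ values in linear power units are already available; it is not the processing chain used in the study, and the function names are hypothetical.

```python
import math


def sigma0_to_gamma0(sigma0_linear: float, local_incidence_deg: float) -> float:
    """Convert sigma naught to gamma naught (both in linear power units).

    Uses the simplified relation gamma0 = sigma0 / cos(theta); full terrain
    flattening procedures are more involved than this.
    """
    if not 0.0 <= local_incidence_deg < 90.0:
        raise ValueError("local incidence angle must be in [0, 90) degrees")
    theta = math.radians(local_incidence_deg)
    return sigma0_linear / math.cos(theta)


def to_decibel(linear_value: float) -> float:
    """Express a linear backscatter coefficient in decibels."""
    return 10.0 * math.log10(linear_value)


if __name__ == "__main__":
    # Illustrative value only: sigma0 = 0.1 at 35 degrees local incidence.
    gamma0 = sigma0_to_gamma0(0.1, 35.0)
    print(gamma0, to_decibel(gamma0))
```

Normalizing by the local rather than the nominal incidence angle is what makes slopes facing towards and away from the sensor comparable.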

Field Reference Data, Laboratory Analysis and Target Classes
During six field campaigns from 2010 to 2014, 602 reference sites were visited within the study area, including fieldwork conducted one month after the acquisition of the WV-2 scene. Because the southern and eastern parts of the study area are remote and partly inaccessible, we decided on a random clustered sampling strategy (Figure 1). The landscape is considered as stable and the mineral components as conservative in relation to the resolution of the input data. The collected parameters consist of: texture, calcium carbonate (CaCO3) content (tested with hydrochloric acid), soil color, visible mineral components of surface substrates, vegetation cover, topographic position, GPS and photo references. The reference points serve as training and test data for the SVM analysis.
The categorization of soils and topsoils is a complex process. In addition to the description of field reference points, we also conducted laboratory analyses for a better understanding and for the target class selection. From 27 reference locations, surface substrate samples (0-2 cm) were collected and physical and chemical analyses conducted (Table 5). Soil samples were air-dried and sieved (<2 mm).
Texture was analyzed with the Bouyoucos hydrometer method after dispersing the samples with 1 N sodium hexametaphosphate and represented according to the United States Department of Agriculture (USDA) classification [8]. CaCO3 was measured using the methods proposed in Buurman et al. [76]. Corg was determined using the Springer and Klee method [77]. Available fractions of heavy metals (Fe and Mn) were extracted according to the Lindsay-Norvell procedure [78]. Exchangeable bases (K, Ca, Mg and Na) were analyzed based on the Mehlich 3 method [79].
The field reference collection and the laboratory samples resulted in seven topsoil classes (Table 5), two additional lithological classes (Figure 3) and a class for surface water (Class 1), which includes the Makuyuni River and water reservoirs for cattle farming and irrigation. "Carbonate-rich substrate" (Class 2) mostly consists of erosive areas with lacustrine sediments and more than 20% carbonate gravel or concretions. Class 2 appears with the Lower Manyara Beds, associated calcaric Regosols and secondary hardened carbonates. The CaCO3 content is rather high (128 g/kg) and the Corg content relatively low (2.75 g/kg). "Calcaric topsoil" (Class 3) features a high CaCO3 content and a comparatively high clay content. "Dark topsoil" (Class 4) shows the highest silt content, is dark in color (Munsell® color: hue of 7.5 or 10 YR (yellow red); value of ≤3; chroma of ≤2), has a low CaCO3 content compared to "Class 3" and a low Fe content compared to "Class 8". It is associated with colluvial and fluvial deposits. "Tuff outcrop" (Class 5) defines distinct outcropping layers of hardened tuff. "Reddish topsoil" (Class 6) has a distinct hue of 5 YR or redder (Munsell® color) and can be distinguished from "Class 3" by a low CaCO3 content, from "Class 4" by color and texture, from "Class 7" by texture, cations and Fe content and from "Class 8" by Fe and Mn content. "Silica-rich topsoil" (Class 7) is associated with the felsic basement and has a high quartz sand and grit content, which is a surface residual due to selective erosion. The hue of the soil is 5-7.5 YR; the color value is 4; and the chroma 6-4 (Munsell® color). The class "Topsoil with iron oxide properties" (Class 8) describes a soil associated with mafic lithology (Class 9) and with a high Fe and Mn content, which makes it clearly distinguishable from "Class 3" and "Class 4". "Mafic-dominated cover beds" (Class 9) describe outcropping and weathered mafic (nephelinite, phonolite, basalt) ridges and the Essimingor volcano. "Mafic river beds" (Class 10) consist of the same material as "Class 9", but the boulders are hardly weathered, which results in different spectral properties, and are concentrated within the river beds.
In order to validate and interpret the results of the topsoil and surface substrate classification, we recorded three soil catenae consisting of 24 soil profiles with detailed profile descriptions according to the World Reference Base for Soil Resources (WRB) 2014 [80].

Methods
The workflow consists of several steps (Figure 4): (I) image segmentation based on the high-resolution WorldView-2 images; (II) exclusion of vegetation and of areas affected by clouds and shadowing effects from further processing; (III) extraction of the mean values of the input feature sets listed in the previous section (Tables 3 and 4) for each remaining segment; (IV) construction of an SVM model; (V) reduction of the number of variables by SVM-recursive feature elimination (RFE) before classifying the segments with the SVM approach; (VI) comparison of the results of various input feature set combinations; (VII) accuracy assessment; and (VIII) external validation using soil catenae.

Image Object Segmentation
Image objects, which represent contiguous areas in the image, were delineated by multi-resolution segmentation [81]. The segmentation is purely based on the WorldView-2 bands. The reasons for applying an object-oriented approach are reduced processing costs, the possibility to extract values from multiple scales and the option to generate additional object-based input features. The multi-resolution segmentation is a bottom-up approach, which applies region merging beginning with the pixel level [82]. The heterogeneity measure f (Equation 1), which defines whether objects are merged, is controlled by a threshold. If the heterogeneity measure exceeds the threshold, which is determined by the scale parameter, the merging of image objects is terminated. ∆h_color defines the difference in spectral heterogeneity, and ∆h_shape accounts for the smoothness and compactness of the image objects. w_color and w_shape are the corresponding weights [82].
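Equation 1 is not reproduced in this text. From the definitions given above, and assuming the usual formulation of multi-resolution segmentation in which the two weights sum to one, it can be written as:

```latex
\begin{equation}
  f = w_{\mathrm{color}} \cdot \Delta h_{\mathrm{color}}
    + w_{\mathrm{shape}} \cdot \Delta h_{\mathrm{shape}},
  \qquad w_{\mathrm{color}} + w_{\mathrm{shape}} = 1
  \tag{1}
\end{equation}
```

Two neighboring objects are merged as long as the resulting f stays below the threshold set by the scale parameter, so a larger scale parameter produces larger image objects.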
Roads and buildings were easily identified from the resulting segments by spectral values, shape and spatial relations. Since the image acquisition took place shortly after the winter rainy season, we could verify in the field that, with rare exceptions, all vegetation cover was photosynthetically active. Therefore, vegetation cover was determined by NDVI thresholding, with the threshold derived from the NDVI histogram. These three land cover types were excluded from further processing, since they are not the focus of the research objective. This is especially important for the vegetation, because its influence on the spectral response (dead organic materials, as well as photosynthetic vegetation) is considered high [20].
After this pre-selection, 47% of the study area of 1200 km² was considered as open soil or vegetation-free lithology. Some of the reference points had to be excluded from further analysis, leading to 432 vegetation-free reference points. The 1,005,058 image segments result in an average mapping unit of about 550 m². For these image objects, mean values from the SAR images, ASTER bands and indices, as well as from topographic parameters were extracted. The following additional input features were computed from WV2: (i) standard deviation for all spectral bands; (ii) NDVI [19]; (iii) spectral brightness; and (iv) texture homogeneity measure following Haralick et al. [83].
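As a plausibility check of the segment statistics, the mean object size can be estimated by dividing the open-soil area by the number of segments. The sketch assumes that the 1,005,058 segments cover only the 47% of the study area retained after the pre-selection; under that assumption the mean object size is on the order of a few hundred square metres.

```python
# Figures taken from the text; the interpretation that all segments lie
# within the retained 47% of the study area is an assumption.
STUDY_AREA_KM2 = 1200
OPEN_FRACTION = 0.47
N_SEGMENTS = 1_005_058

M2_PER_KM2 = 1_000_000

open_area_m2 = STUDY_AREA_KM2 * OPEN_FRACTION * M2_PER_KM2
mean_segment_m2 = open_area_m2 / N_SEGMENTS

print(f"open area: {open_area_m2 / M2_PER_KM2:.0f} km^2")
print(f"mean segment size: {mean_segment_m2:.0f} m^2")
```

With these inputs the mean segment covers roughly 560 m², i.e. a square of about 24 m side length, which is close to the 25 m grid of the SRTM-X DEM.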

Support Vector Machines
The machine learning concept of SVM was developed to solve binary problems in pattern recognition applications. The development and theoretical background is published by Vapnik [84,85], Hearst [86], Burges [87] and Schölkopf and Smola [88]. Remote sensing studies make use of SVM properties, like high computation performance and high classification accuracies with small numbers of training samples [89]. Recent studies used SVM approaches to identify lithological units with remote sensing data [90,91].
The fundamental principle of SVM is the maximization of margins between training samples of two target classes. Not all samples of the training dataset are used for this approach; only those samples that are close to the margin. They serve as support vectors, which are used to define the boundaries of the margin. The hyperplane with the maximized margin is referred to as the optimal separating hyperplane [87]. To prevent an over-fitting of the hyperplane caused by outliers in the training dataset, a "soft margin" approach was introduced [92]. This approach uses a cost parameter (C), which determines the penalty for training samples that violate the margin. Low C values indicate a stronger generalization of the model; high values give more influence to single training samples [93].
For this study, we utilized a support vector classifier (C-SVC) provided by the Library for Support Vector Machines (LIBSVM) [94]. C-SVC works as a "one-against-one" classifier that discriminates between two target classes. A multi-class problem is solved by constructing a binary classifier for each pair of target classes. In some cases, including soil-related issues, it is hardly possible to separate the target classes in the original input space with a linear function [32]. SVMs therefore project the input features into a higher-dimensional feature space. To avoid the computational effort of explicitly projecting all input features into this feature space, kernels can be used to calculate their dot product in the feature space. Various kernels can be applied with SVMs. In this study, a radial basis kernel function (RBF) was utilized, which is widely used when a nonlinear distribution of feature values is expected [32,95]. A linear kernel serving as reference was also applied. The width of the RBF, and hence, the influence of a training sample on the adjacent feature space, is controlled by the constant γ. High values restrict the influence of a sample to its immediate neighborhood, whereas low values let it extend farther. Thirty percent of the reference samples (130) were randomly selected to serve exclusively as test datasets, and the remaining 70% was used for the training of the SVM model. All input feature sets were scaled to a range of [−1, +1]. For the derivation of the constants C and γ, a grid search was conducted by an iterative cross-validation of the performance of fitting the model to the training data [94].
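The preprocessing steps described above can be sketched in a few lines: scaling a feature to [−1, +1], evaluating the RBF kernel in the form used by LIBSVM, K(x, y) = exp(−γ‖x − y‖²), and drawing a random 70/30 split of the 432 reference points. This is a minimal illustration with hypothetical function names and an arbitrary random seed, not the code used in the study.

```python
import math
import random
from typing import List, Sequence, Tuple


def scale_to_unit_range(column: Sequence[float]) -> List[float]:
    """Linearly rescale one feature column to the range [-1, +1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in column]
    return [-1.0 + 2.0 * (v - lo) / (hi - lo) for v in column]


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def split_reference_points(
    n_points: int, test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly split point indices into (training, test) lists."""
    rng = random.Random(seed)  # the seed is an arbitrary assumption
    indices = list(range(n_points))
    rng.shuffle(indices)
    n_test = round(n_points * test_fraction)
    return indices[n_test:], indices[:n_test]


if __name__ == "__main__":
    print(scale_to_unit_range([0.0, 5.0, 10.0]))
    # A larger gamma makes the kernel value drop faster with distance.
    print(rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=0.1))
    print(rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=10.0))
    train_idx, test_idx = split_reference_points(432)
    print(len(train_idx), len(test_idx))
```

In practice, the scaling limits would be taken from the training data and then applied unchanged to the test data, so that no information leaks from the test set into the model.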

Recursive Feature Selection
In order to identify a minimum subset of features that contribute to the discrimination of the target classes, an RFE technique was applied [96]. Many of the spectral and topographical input features carry redundant information. A subset of features provides, in addition to a higher computation performance, the possibility for a better interpretation of the interrelation between the topsoil reference and the spectral and topographic parameters of the datasets explaining the topsoil distribution. RFE is a backward elimination method, which starts with a full set of features and iteratively reduces their number according to their contribution to the classification accuracy [97]. For this, the SVM classifier is trained at each iteration, and a ranking criterion is computed for all features. The feature with the smallest criterion is then removed before the next iteration [98]. SVM-RFE was performed with the e1071 package [99].
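The backward elimination loop described above can be sketched generically. In the sketch below, `rank_fn` stands in for "retrain the classifier on the remaining features and return a ranking criterion per feature"; the toy ranking function, its weights and the feature names are purely hypothetical and replace the actual SVM training. This is an illustration in Python and not the e1071-based implementation used in the study.

```python
from typing import Callable, Dict, List, Sequence, Tuple

RankFn = Callable[[Sequence[str]], Dict[str, float]]


def recursive_feature_elimination(
    features: Sequence[str], rank_fn: RankFn, n_keep: int = 1
) -> Tuple[List[str], List[str]]:
    """Backward elimination: drop the lowest-ranked feature per iteration.

    Returns (remaining features, eliminated features in removal order).
    """
    remaining = list(features)
    eliminated: List[str] = []
    while len(remaining) > n_keep:
        # Re-rank on the current subset, as the model is retrained each time.
        scores = rank_fn(remaining)
        worst = min(remaining, key=lambda name: scores[name])
        remaining.remove(worst)
        eliminated.append(worst)
    return remaining, eliminated


def toy_rank(subset: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for SVM training: squared fixed weights."""
    weights = {
        "wv2_band3": 0.9,
        "aster_calcite_index": 0.6,
        "mrrtf": 0.4,
        "tsx_intensity": 0.1,
    }
    return {name: weights[name] ** 2 for name in subset}


if __name__ == "__main__":
    kept, dropped = recursive_feature_elimination(
        ["wv2_band3", "aster_calcite_index", "mrrtf", "tsx_intensity"],
        toy_rank,
        n_keep=2,
    )
    print("kept:", kept)
    print("dropped (first removed first):", dropped)
```

Recording the classification accuracy at each iteration yields a curve of accuracy against the number of retained features, from which the best-performing subset can be read off.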

Results and Discussion
The comparison of different input feature groups shows that all additional input features increase the overall accuracy of the classification (Table 6). The classification of only the spectral bands of WorldView-2 with an RBF-kernel reaches an accuracy of 62.9%. By incorporating more features from the ASTER data, SAR scenes and topographic indices, an overall accuracy of 70.4% was achieved. By conducting the classification with the parameters selected by RFE (Table 7), the highest accuracy of 71.9% was reached. The application of a linear kernel instead of an RBF-kernel led to lower accuracies. An RFE was performed for the dataset with all 73 input features. The RFE shows that with seven input features, an accuracy exceeding 60% can be attained (Figure 5). The classification accuracy for the SVM with an RBF-kernel peaks with a selection of 36 input features and then remains relatively stable until the maximum number of input features is reached. The so-called Hughes phenomenon, which describes the decrease in classification accuracy when additional input features are added to an already large dataset, cannot be observed with the RBF-kernel [100]. Yet, a small decrease can be noted for the linear kernel (Figure 5). The 36 input features from the RFE selection represent all input feature groups (Table 7). Out of the first seven input features, two are topographic indices. The MRRTF results in high values for flat elevated areas [65], and the geomorphons (geomorphologic phonotypes) classify the topography into landscape elements [63]. Both features describe the position of the target classes in the study area. WV2 contributes, along with the spectral Bands 3 and 1, two further input datasets. The ASTER Calcite Index and the Ferric Iron (Fe³⁺) Index may explain the distribution of the two target classes with high CaCO3 content (Classes 2 and 3) and the topsoil class with iron oxide properties (Class 8). The AlOH Group Index may support the discrimination of clay minerals [56].
The confusion matrix of the RFE-selected input feature dataset reveals that the classes most frequently confused with each other, concerning the user's and the producer's accuracy, are Class 2 "carbonate-rich substrates" and Class 3 "calcaric topsoil" (Table 8). Both classes have high carbonate content, and their topographic positions overlap. The difference between both classes is related to the amount of CaCO3 concretions, which is much higher in the lacustrine deposits. If we were to merge both classes, the overall accuracy would reach 79%. However, the visual validation shows a reasonable distribution for both classes. Class 3 also overlaps with Class 4 "dark topsoil". Class 4 is associated mainly with colluvial and fluvial deposits and shows low CaCO3 content. The transition to Class 3 is gradual. The low producer's accuracy of Class 5 "tuff outcrop" can be explained by the relatively small area of these outcrops. The producer's accuracy of this particular class is higher (75%) when only applying the WV2-related input parameters, but the medium-resolution information of the ASTER- and DEM-derived features appears to impair the correct identification.
"Carbonate-rich substrates" mainly represent the lacustrine lower member of the Manyara Beds, which are exposed prevalently at the foot of slope and mid-slope positions of the Makuyuni River system, as well as in associated gully systems (Figure 6). The class "calcaric topsoil" indicates soils that show an enrichment of CaCO3 due to inputs from carbonatic volcanic ash deposits or development processes upon the "carbonate-rich substrates". In some cases, CaCO3-rich soils developed on secondary translocated carbonates or consist of eroded soils exposing CaCO3 concretions. The latter were identified during fieldwork in areas with higher slope degrees or large specific catchment areas. "Tuff outcrops" (Class 5) were recognized at a stratigraphic position above the lower member of the Manyara Beds, which coincides with the results of fieldwork and reviewed scientific literature [47]. The outcrops are too small to be displayed in the map (Figure 6). The class "reddish topsoil" is identified with satisfying accuracy. This class is located mainly on stable flat ridge tops and is used agriculturally. Consequently, topsoils are disturbed and reworked by ploughing activity, bringing leached CaCO3 back to the surface (Table 5). This distinguishes it from Class 7 "silica-rich topsoil". These soils are not disturbed, and consequently, silica is enriched at the surface due to selective erosion processes. "Silica-rich topsoils" and "reddish topsoils" developed on the Proterozoic intermediate quartzite and gneisses of the Masai Plateau and occur especially in the south of the study area. However, these areas were also subject to carbonatic volcanic ash deposits. The topsoils with iron oxide properties (Class 8) occur in association with mafic ridges (phonolite, nephelinite) or along the slopes of the Essimingor volcano (Figure 6). Class 9 "Mafic-dominated cover beds" was identified well. Like Class 8, Class 9 can be found at the volcano slopes and on the mafic ridges. Since the cover beds are densely vegetated by shrubs, only small, sparsely vegetated areas were used for the classification. The "mafic river beds" are often covered by vegetation and water. Nevertheless, the mafic material at point bars in the Makuyuni River was traced with high accuracies.
Out of 24 soil profile analyses conducted in the study area, we identified seven main soil types (see Figure 7). In the following, we show that these topsoils can be related to or associated with specific WRB soil types according to the applied catena approach. Vertisols are found in flat areas and in depressions characterized by high clay contents and representing formerly wet positions, related to a high biomass production. They are associated with "dark topsoil" (Class 4). Vertisols occur in association with Vertic Cambisols (Clayic) (Soil Profile 1; Figure 7b), which also relate to the pedolithological Class 4 "dark topsoil". In the study area, Calcisols occur with lacustrine "carbonate-rich substrates" (Class 2) and "calcaric topsoils" (Class 3), which are characterized by eroded Luvisols exposing CaCO3 concretions.
Andosols are located on flat and stable ridge positions with low erosion potential. These soils developed from parent material of volcanic origin, such as volcanic ash, tuff and pumice. They show high mineral proportions indicating fertile soils suitable for crop production. In our analysis, Andosols co-exist with "reddish topsoils" (Class 6). Cambisols are widely distributed in the study area and occur mainly on relatively flat mid-slope positions. Along the Makuyuni River terraces, they are distinguished as Cambisols (Colluvic) (Soil Profiles 15-17; Figure 7). On flat ridge positions, they develop as Andic Cambisols (Soil Profiles 6, 8 and 9; Figure 7). Rhodic Cambisols (Soil Profile 20; Figure 7c) are particularly located on intensively used agricultural fields and correlate with "reddish topsoils" (Class 6), showing a dark reddish brown 5 YR 3/4 Munsell® color for the first 15 cm of soil depth. Cambisols and Luvisols are associated with each other and correlate with "silica-rich topsoils" (Class 7) and "reddish topsoils" (Class 6). The Haplic Ferralsol (Soil Profile 14; Figure 7d) correlates with "silica-rich topsoil" (Class 7). These soils developed on a weathered felsic basement. The resulting map provides a very detailed distribution of topsoils and surface substrates for the study area, which is more detailed than other spatial soil information available for this region, like the official soil map by De Pauw [43], the 250 m Africa Soil Information Service (AfSIS) product [42] or the products from the Soil and Terrain Database (SOTER) program [101]. Furthermore, the comparison with the soil profile catenae shows that the detailed topsoil information can be related to specific WRB-based soil types with little additional fieldwork and/or expert knowledge. Nevertheless, providing detailed information on topsoils and surface substrates in comparison to other DSM studies [31,32,42] remains the main intention of the paper.
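The accuracy measures used in this section (overall, producer's and user's accuracy) and the effect of merging two frequently confused classes can be reproduced from any confusion matrix. The sketch below uses an invented 3 × 3 matrix for illustration only; the values are not those of Table 8, and the function names are hypothetical. Rows are assumed to hold the reference classes and columns the mapped classes.

```python
from typing import List, Optional

Matrix = List[List[int]]


def overall_accuracy(matrix: Matrix) -> float:
    """Share of correctly classified samples (diagonal sum / total)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def producers_accuracy(matrix: Matrix) -> List[Optional[float]]:
    """Per class: correct samples / all reference samples (row sum)."""
    return [
        row[i] / sum(row) if sum(row) else None for i, row in enumerate(matrix)
    ]


def users_accuracy(matrix: Matrix) -> List[Optional[float]]:
    """Per class: correct samples / all samples mapped to it (column sum)."""
    result: List[Optional[float]] = []
    for j in range(len(matrix)):
        col_sum = sum(row[j] for row in matrix)
        result.append(matrix[j][j] / col_sum if col_sum else None)
    return result


def merge_classes(matrix: Matrix, a: int, b: int) -> Matrix:
    """Return a new matrix in which class b is folded into class a."""
    keep = [i for i in range(len(matrix)) if i != b]

    def group(i: int) -> List[int]:
        # Original class indices represented by class i after merging.
        return [a, b] if i == a else [i]

    return [
        [sum(matrix[r][c] for r in group(i) for c in group(j)) for j in keep]
        for i in keep
    ]


if __name__ == "__main__":
    # Invented example: classes 0 and 1 are often confused with each other.
    toy = [[8, 2, 0], [3, 6, 1], [0, 1, 9]]
    print(overall_accuracy(toy))
    print(producers_accuracy(toy))
    print(users_accuracy(toy))
    print(overall_accuracy(merge_classes(toy, 0, 1)))
```

Merging removes the mutual confusion between the two classes from the error budget, which is why the overall accuracy rises when Classes 2 and 3 are combined.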

Conclusions
This study has mapped the distribution of topsoils and lithology in a study area in the semiarid Lake Manyara Basin. Applying an integrated approach that combines surface characteristics and terrain features, the spatial distribution of topsoils and related soil types was derived. The topsoils have complex genetic origins related to different substrates, resulting in a high spatial heterogeneity. The non-vegetated areas were classified with a multisensoral approach, which included WV2 and ASTER multispectral data, the TSX1 and Envisat ASAR SAR scenes, and topographical indices derived from SRTM-X data. With a C-SVC and an RBF-kernel, an overall accuracy of 71.9% was achieved for a challenging classification depth of 10 target classes. The final map is coherent with field observations and laboratory analysis of 27 soil samples. The applied methodological approach seems suitable for multiscale and multisensoral datasets of large areas. We show that the topsoil classification can be associated with soil profiles obtained by fieldwork and certain terrain positions derived from the DEM, thus allowing a distinct spatial attribution of soil types.
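For reference, the overall accuracy quoted above is conventionally the share of correctly classified reference samples, i.e., the diagonal sum of the confusion matrix divided by the total sample count. The following is a minimal sketch of that calculation, assuming a matrix with reference classes in rows and predicted classes in columns. The function names and the three-class matrix are hypothetical illustrations and are not taken from this study.

```python
def overall_accuracy(confusion):
    """Share of correctly classified samples: diagonal sum / total count."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def per_class_accuracies(confusion):
    """Return (producer's, user's) accuracies for each class.

    Assumes rows are reference classes and columns are predicted classes.
    """
    n = len(confusion)
    producers, users = [], []
    for k in range(n):
        row_total = sum(confusion[k])                       # reference samples of class k
        col_total = sum(confusion[i][k] for i in range(n))  # samples predicted as class k
        hit = confusion[k][k]
        producers.append(hit / row_total if row_total else float("nan"))
        users.append(hit / col_total if col_total else float("nan"))
    return producers, users


# Hypothetical three-class confusion matrix (NOT the study's results).
example = [
    [40, 8, 2],
    [6, 30, 4],
    [3, 5, 22],
]

oa = overall_accuracy(example)
producers, users = per_class_accuracies(example)
print(f"overall accuracy: {oa:.3f}")
```

Per-class producer's and user's accuracies computed this way show which classes are confused with each other, which a single overall figure cannot convey.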
The results of the topsoil classification and the related soil type association give valuable information, which can help to find locations for agricultural projects in the region and may thereby support the transition to the sustainable self-subsistence of the local population. This may contribute to a reduction of cattle-induced overgrazing and subsequent land degradation. For many applications, such as archaeological field studies and paleontological surveys, high-resolution topsoil and surface substrate information yields greater insight than low-resolution soil type maps. The results of this work also help to explain the geological situation of the study area and the landscape evolution. Despite the potential influence of different fluvial and mass movement processes on the topsoil distribution, this study draws a valuable picture of the general situation.

Figure 2. Precipitation, fieldwork and remote sensing data of the year 2011.

Figure 3. Target classes identified by field surveys and laboratory analysis (Class 1 = water).

Figure 5. Accuracy curves from RFE for a linear and an RBF-kernel.

Figure 6. Final classification of topsoil distribution in the study area.

Table 2. Spectral indices of ASTER VNIR and SWIR bands.

Table 6. Overall accuracies for different input feature groups. RBF, radial basis function.

Table 7. Relevance ranking of RFE-selected input features. SD, standard deviation.