A Phytolith Supported Biosphere-Hydrosphere Predictive Model for Southern Ethiopia: Insights into Paleoenvironmental Changes and Human Landscape Preferences since the Last Glacial Maximum

During the past 25 ka, southern Ethiopia has undergone tremendous climatic changes, from dry and relatively cold during the Last Glacial Maximum (LGM, 25–18 ka) to the African Humid Period (AHP, 15–5 ka), and back to present-day dry conditions. As a contribution to better understand the effects of climate change on vegetation and lakes, we here present a new Predictive Vegetation Model that is linked with a Lake Balance Model and available vegetation-proxy records from southern Ethiopia including a new phytolith record from the Chew Bahir basin. We constructed a detailed paleo-landcover map of southern Ethiopia during the LGM, AHP (with and without influence of the Congo Air Boundary) and the modern-day potential natural landcover. Compared to today, we observe a 15–20% reduction in moisture availability during the LGM with widespread open landscapes and only few remaining forest refugia. We identify 25–40% increased moisture availability during the AHP with prevailing forests in the mid-altitudes and indications that modern anthropogenic landcover change has affected the water balance. In comparison with existing archaeological records, we find that human occupations tend to correspond with open landscapes during the late Pleistocene and Holocene in southern Ethiopia.


Introduction
The formation of the East African Rift System (EARS) led to large topographical contrasts in southern Ethiopia [1] and is thus responsible for an extreme precipitation gradient between the dry lowlands of the Omo-Turkana and Chew Bahir basins and the moist Southwestern Ethiopian Highlands [2] (Figure 1). Due to this topography and its position within the global atmospheric circulation system, the prevailing vegetation is partitioned into a complex mosaic of forests, bushlands and grasslands [3]. In the past several centuries, intensified agriculture, de-and reforestation pronouncedly reorganised the biosphere in Ethiopia [3]. Such human induced landcover change may have affected ecosystem climatic boundary conditions with subsequent effects on the hydrosphere and potential consequences for the local economy and food security.

Introduction
The formation of the East African Rift System (EARS) led to large topographical contrasts in southern Ethiopia [1] and is thus responsible for an extreme precipitation gradient between the dry lowlands of the Omo-Turkana and Chew Bahir basins and the moist Southwestern Ethiopian Highlands [2] (Figure 1). Due to this topography and its position within the global atmospheric circulation system, the prevailing vegetation is partitioned into a complex mosaic of forests, bushlands and grasslands [3]. In the past several centuries, intensified agriculture, de-and reforestation pronouncedly reorganised the biosphere in Ethiopia [3]. Such human induced landcover change may have affected ecosystem climatic boundary conditions with subsequent effects on the hydrosphere and potential consequences for the local economy and food security. Overview of the study area with sites mentioned in the text, investigated lakes, overflow dynamics and modernday potential vegetation. (A) Catchment of the Omo River, lakes Abaya and Chamo and paleo-lake Chew Bahir with lake's overflow locations. Hill-shaded potential vegetation is based on Friis, et al. [3]. (B) Legend of potential vegetation arranged by altitude with (a to c) monthly temperature means in °C and precipitation in mm per month (IRI, last accessed 12/2020). The locations of the photographs of representative vegetation regimes (d to g, photos from Annett Junginger) are marked on the catchment map. (C) Cross-section from Lake Abaya to Lake Turkana showing the overflow direction and lake ladder with their prevailing potential vegetation (numbers correspond to the potential vegetation types marked on A and B).
Highly variable water availability in the EARS during the Pleistocene and Holocene must have caused a significant reorganization-pressure on plant habitats and subsequent adaptation of habitat preferences of early humans [4,5]. During the past 25,000 calibrated years before present (herein ka), the region has been characterized by high amplitude Figure 1. Overview of the study area with sites mentioned in the text, investigated lakes, overflow dynamics and modernday potential vegetation. (A) Catchment of the Omo River, lakes Abaya and Chamo and paleo-lake Chew Bahir with lake's overflow locations. Hill-shaded potential vegetation is based on Friis, et al. [3]. (B) Legend of potential vegetation arranged by altitude with (a to c) monthly temperature means in • C and precipitation in mm per month (IRI, last accessed 12/2020). The locations of the photographs of representative vegetation regimes (d to g, photos from Annett Junginger) are marked on the catchment map. (C) Cross-section from Lake Abaya to Lake Turkana showing the overflow direction and lake ladder with their prevailing potential vegetation (numbers correspond to the potential vegetation types marked on A and B).
Highly variable water availability in the EARS during the Pleistocene and Holocene must have caused a significant reorganization-pressure on plant habitats and subsequent adaptation of habitat preferences of early humans [4,5]. During the past 25,000 calibrated years before present (herein ka), the region has been characterized by high amplitude climatic change, including the drier and colder episode during the Last Glacial Maximum (LGM, 25-18 ka) [6,7], the African Humid Period (AHP, 15-5 ka) [8,9] and present-day dry conditions. These climatic fluctuations caused tremendous changes in lake water levels in eastern Africa [10][11][12][13], river flow [8,14] and vegetation composition and patterns [15][16][17][18]. The human preference for certain landscape types remains a recurring question. According to the savanna hypothesis [19,20], humans originated within a savanna landscape, and the perception of the savanna as an ideal habitat was evolutionarily imprinted onto humans [21]. Another hypothesis by Tveit, et al. [22] resulted in a more variable notion about human landscape preferences. The archaeological record shows that over time, humans lived in a wide range of landscapes.
Forested and humid landscapes were supposedly preferred habitats in Tiemassas in western Africa 44 ka [23]. In eastern Africa at Panga ya Saidi, humans lived in a forestgrassland landscape at the coast during the transition from the Middle to the Later Stone Age around 67 ka [24][25][26]. Other studies indicate that humans survived in a grassland environment along the shores of Lake Victoria during the Late Pleistocene Middle Stone Age 40-60 ka [27] and, at 105 ka, they lived in a relatively humid Kalahari far away from the coast [28]. It has been shown that humans were able to adapt to alpine conditions with gallery forests in small valleys and collected raw materials along mountain glaciers above 4000 m a.s.l. at Fincha Habera, Ethiopia from 47 to 31 ka [29]. On top of the Dendi caldera, Ethiopia (3000 m a.s.l) humans created handaxes [30]. Thus, based on the Upper Pleistocene and Holocene archaeological record, humans seem not to have shown a preference for a specific single landscape type, but rather occupied complex landscapes and developed flexible strategies to respond to a transforming environment [4,5,31,32]. For southern Ethiopia, a comparison of occupation frequencies between the lowlands and hypothesised refuge areas in, e.g., the SW Ethiopian Highlands suggested that during short-term dry spells puncturing the AHP the humid highlands could have been the sink area for vertical migration of highly mobile hunter-gatherer groups [33]. However, paleo-vegetation and paleoclimatic data covering the entire southern Ethiopian region, particularly during the LGM, the AHP and the late Holocene are scarce, yet they play a key part in understanding the potential constraints of natural resources for humans.
In an effort to address this lack of environmental context Fischer, et al. [10] have recently developed a Lake Balance Model (LBM) to reconstruct paleo-lake Chew Bahir's response to moisture changes during the AHP. The LBM identified the importance of temporarily interconnected lake catchments during major humid periods. The LBM suggested a precipitation increase of +6.5% to compensate for increased open water evaporation. Furthermore, an additional +7% increase was calculated to account for the increased ET on land due to a change in rainfall seasonality. Some studies suggest that increased precipitation during the AHP in the EARS was due to an eastward-shift of the Congo Air Boundary (CAB) [34,35]. However, the magnitude of vegetation changes was not a well-constrained parameter in our LBM, due to the lack of reliable information on regional vegetation. Instead, Fischer, et al. [10] used estimations from a Kenyan LBM study, that suggested an +7-15% precipitation increase to compensate for the biosphere-hydrosphere feedback during the AHP [36]. Hence, the LBM reconstructs a total precipitation increase of +20 to 30% during the AHP to explain maximum observed lake levels for paleo-lake Chew Bahir.
Since specific information about the paleo-vegetation and its effect on the hydrosphere is missing for southern Ethiopia, we present here the results of a Predictive Habitat or Predictive Vegetation Modelling (PVM). PVM is a common approach used to understand habitat suitability and possible habitat shifts due to changes in the environmental conditions (predictors) such as elevation and precipitation [37][38][39]. PVMs are widely used to model the impact of modern-day climate change on landcover and vegetation. If there is a strong relationship between the environmental predictors and the vegetation, the models can be used to forecast future scenarios using projected data. For this purpose, numerous techniques have been proposed in the past such as Generalized Linear Models [37,40], Generalized Additive Models [40,41], Bioclimatic Envelopes [42] and Bayesian statistics [43]. Recently, machine learning approaches have been used more frequently in this research field, such as Support Vector Machines [44,45], Neural Networks [46], Random Forest [45] and Boosted Regression Trees (BRT), also called Stochastic Gradient Boosting [37,45,47,48].
In order to understand the effect of the precipitation and the temperature on the paleo vegetation and subsequently the paleo-hydrosphere during the climatic extremes of the past 25 ka and to provide a profound discussion about human landscape preferences in southern Ethiopia, we link a PVM (based on BRTs) with the previously established LBM by Fischer, et al. [10]. BRTs combine the strength of regression or classification trees with the boosting algorithm to combine multiple weak models for improved predictive performance. BRTs work for different predictor variables with different scales and are robust against outliers [47]. The resulting model is subsequently tested using a new phytolith proxy record from the Chew Bahir basin. The combination provides independent catchmentscale estimates of paleo-precipitation during the LGM and the AHP, as well as detailed maps of the prevailing vegetation mosaic covering the orographic gradient of the EARS in southern Ethiopia.

Geology, Hydrology and Climate
The 115,613 km 2 study area covers the southwestern Ethiopian highlands, which is the source region of the Omo River and the catchment of lakes Abaya, Chamo and paleo-lake Chew Bahir ( Figure 1). The Omo River is the main tributary of Lake Turkana with a catchment of 75,000 km 2 . Lake Turkana and paleo-lake Chew Bahir (20,650 km 2 ) are both part of the Broadly Rifted Zone (BRZ) with rift floor elevations of~500 m a.s.l. The Lake Abaya catchment (16,200 km 2 ) and the Lake Chamo catchment (1800 km 2 ) are part of the Southern Main Ethiopian Rift. Lake Abaya and Lake Chamo are located at around 1000 m a.s.l. [10]. The southwestern Ethiopian highlands consists of 500 to 1500 m thick (up to 3000 m thick in places) basalts and intercalated silicic volcanics of Eocene to Late Oligocene age [1,49]. The rift floor between the Ethiopian and Somalian Plateaus is filled with Late Miocene to Quaternary sediments. In the Broadly Rifted Zone, Precambrian basement is exposed at the rift shoulders, particularly at the Hammar range.
During major humid periods, such as the AHP, a cascading lake system developed, with Lake Abaya (+18 m) and Lake Chamo (+14 m), overspilling into paleo-lake Chew Bahir (+45 m; 2500 km 2 ), which in turn was overspilling into Lake Turkana [10]. Despite increased moisture availability during the AHP, intense lake level fluctuations have been recorded. Since the recent past, paleo-lake Chew Bahir is a desiccated playa with a seasonally flooded wetland [11]. Lake Turkana is the world's largest permanent desert lake with a modern-day extent of 7000 km 2 and a catchment area of 148,000 km 2 [12,50]. During the AHP, the Lake Turkana water table was +100 m higher due to increased precipitation and inflow from paleo-lake Chew Bahir (and lakes Abaya and Chamo) as well as lakes Nakuru-Elementeita, Baringo-Bogoria and Suguta from the Kenyan plateau [13].
Today's climate in the BRZ lowlands of paleo-lake Chew Bahir and Lake Turkana can be classified as hot semiarid (Koeppen climate classification), with precipitation (P) below evapotranspiration (ET). In contrast, the majority of the southwestern Ethiopian highlands and the catchments of lakes Abaya and Chamo receive higher P with a bi-modal precipitation pattern, despite intervening intense dry seasons. The most elevated parts are humid with P > ET and a unimodal P pattern of up to 2000 mm per year [51].

Vegetation
The prevailing vegetation is a complex mosaic with desert shrubland along Lake Turkana's shore, woodlands and wooded grasslands in the Omo River lowlands and the paleo-lake Chew Bahir catchment, afro-montane forests of the Ethiopian highlands, and afro-alpine vegetation in most elevated parts ( Figure 1) [3,52]. Friis, et al. [3] summarized and mapped distinguishable vegetation classes based on characteristic species in the "Atlas of the potential vegetation in Ethiopia" as follows: The desert and semi-desert shrubland in Ethiopia, according to Friis, et al. [3], is located below 400 m a.s.l. and characterized as scarcely vegetated with highly drought tolerant species such as Poaceae (grasses) with mainly Dactyloctenium aegyptium, and rela-tively fewer perennials, such as Panicum turgidum. The Acacia-Commiphora woodland and bushland covers vast parts of the dry lowlands in eastern and southern Ethiopia and is located between 400 to 900 and 1600 to 1900 m a.s.l. with a high physiognomical diversity. It overlaps with the desert and semi-desert shrubland, as well as the Combretum-Terminalia woodland and wooded grassland. Species within the Acacia-Commiphora unit are typically drought resistant trees and shrubs with, either deciduous or small, evergreen leaves. The western boundary of the Combretum-Terminalia unit is not sharply defined, due to the shift from a bimodal to primarily unimodal precipitation pattern. Partly, on steep slopes and areas with tall grasses on deep, loamy soils, fire regimes lead to the dominance of Combretum-Terminalia, whereas flat and sandy soils are dominated by Acacia-Commiphora. Typical for the Combretum-Terminalia strata are small to moderate trees with rather large deciduous leaves of the name inherent genera Combretum and Terminalia. The ground layer is comprised of dense grass vegetation, primarily from the genera Hyparrhenia, Panicum and Pennisetum, and productivity (biomass) is strongly correlated with the seasonality of precipitation [3]. This Combretum-Terminalia vegetation unit is dominant in the lowlands of western Ethiopia and penetrates into the Ethiopian highlands through the river valleys. The elevational range of this unit is 400 to 1800 m a.s.l. [3].
Currently, the dry evergreen afro-montane forest and grassland between 1800 and up to 3000 m a.s.l. is a complex mosaic due to agricultural intensification since the late Holocene that led to soil erosion and various succession stages from grasslands to forests. Juniperus procera and Podocarpus falcatus are characteristic, while Podocarpus falcatus is also present in the drier parts of the moist evergreen afro-montane forest. Pouteria adolfi-friederici is typical of the moist forests, which are closed forests with tree heights up to 30-40 m, separated from dry forests by a mean annual precipitation >700 mm per year. This unit ranges from 1500 to 3000 m a.s.l. The Ericaceous belt is a common high-elevation vegetation type throughout eastern Africa and ranges from 3000 to 3200 m a.s.l., with local variations. The afro-alpine vegetation unit, characterised by the giant herb Lobelia rhynchopetalum, covers the highest elevations of Ethiopia [3,53].
The aquatic vegetation unit is separated into freshwater and salt-water species. Below 1000 ppm dissolved salt, open freshwater Lemnaceae are typical. Floodplain and lake shore vegetation is characteristically dominated by sedges of the genus Cyperus (primarily C 3 [54]), and the characteristic grass species Leersia hexandra (C 3 ) and Panicum hygrocharis (C 3 C 3 /C 4 [55]). The salt-lake vegetation unit is highly dependent on salinity and mostly characterized by salt tolerant taxa Suaeda monoica, Atriplex spp. and Salicornia spp. [3].

Overview of Archaeological Records of the Last 25 ka in Ethiopia
At several archaeological sites in Ethiopia, cultural sequences end just before the Last Glacial Maximum and continue after a distinct hiatus starting around 14 ka or even later in the Holocene. No human activity is visible in the cultural stratigraphy of Goda Buticha from~25-8 ka [56][57][58]. In the stratigraphy of Mochena Borago, a gap in human activity is reported from~36-10 ka [33,59,60]. Combining the different sites in the Ziway-Shala Basin, it appears that humans were not active in the area from~22-14 ka [33,61]. The youngest date at Fincha Habera is~31 ka [29]. The youngest radiocarbon dates from Porc Epic are~35 ka [62,63]. All these archaeological records point towards the absence of human activities at the investigated sites during the LGM [57,64].
In addition to the absence of humans, other scenarios might have caused this gap, such as erosion of sediments with artefacts and research bias [56,65]. Indications for such bias come from Sodicho Cave 40 km away from Mochena Borago [64]. Here, the lowermost cultural layers are dated to~27-15 ka before a gap occurs that corresponds to the AHP [64]. After the AHP, indications for human activity starts again at~4.8 ka [64]. So far, only Sodicho Cave indicates human activity in southern Ethiopia during the LGM [64].
Signs of human activity during the AHP remain, however, scarce in southern Ethiopia. The age-depth model from Sodicho Cave and manganese concentrations indicate a decline of human activities at~15 ka [64]. Younger radiocarbon dates at 13.5 ka stem from sieve bias come from Sodicho Cave 40 km away from Mochena Borago [64]. Here, the lowermost cultural layers are dated to ~27-15 ka before a gap occurs that corresponds to the AHP [64]. After the AHP, indications for human activity starts again at ~4.8 ka [64]. So far, only Sodicho Cave indicates human activity in southern Ethiopia during the LGM [64].
Signs of human activity during the AHP remain, however, scarce in southern Ethiopia. The age-depth model from Sodicho Cave and manganese concentrations indicate a decline of human activities at ~15 ka [64]. Younger radiocarbon dates at 13.5 ka stem from sieve finds and were therefore not included in the age-depth model [64]. The only other site with a similar age in the study area so far is the Harurona Cave record, dated to 14 ka [66]. Outside of the study area, human activity at Aladi Springs is dated to around 13 ka [67,68], at Laga Oda to ~12 ka [69] and at the Ziway-Shala sites to ~14 ka, ~13 ka and ~11 ka [61]. Another possible gap at Ziway-Shala corresponds to the Younger Dryas that also marks the end of the Pleistocene and the beginning of the Holocene [11,33,61]. Other Ethiopian sites with archaeological records originating from the beginning of the Holocene are Baahti Nebait [70] and Dibé Rockshelter, both dated to 11 ka [71]. During the later stages of the AHP, human activity is dated to ~10 ka at Mochena Borago [60]. Examples are three caves in the Gamo highlands [72] and six caves or rockshelters in the Kaffa region [73] (Figure 1). The Dendi Lake Rockshelter is one example of humans using high-altitude landscapes during the Middle and Late Holocene [74].

Materials and Methods
To understand the interrelation of precipitation and vegetation we developed a new PVM in R (available at Github: https://github.com/MLFischer/Paleo-Vegetation-Model (accessed 01-Oct-2021) and linked it to the previously published LBM of the southern Main Ethiopian Rift [10]. The four major steps of this study are summarized in Figure 2.   [75], the resulting lake balance of lakes Abaya, Chamo and paleo-lake Chew Bahir and, hence, the lake surface elevation of paleo-lake Chew Bahir. (C) Three different environmental scenarios are applied in this study: Pre-Industrial time with modern-day climate, reduced or eliminated human influence and to be modelled potential landcover, and AHP and LGM time based with major assumption for this model. All three scenarios are visible in the Potassium record in the lacustrine sediments of paleo-lake Chew Bahir [33]. (D) Comparison with global paleo reconstructions (macro scale) and catchment (small scale) proxy-based reconstructions including the phytoliths from the current study.

Training and Prediction Data
A multi-source and multi-annual (2001 to 2018) dataset of the Omo River and lakes Abaya, Chamo and paleo-lake Chew Bahir catchments was created, containing modern-day elevation of the Shuttle Radar Topography Mission (SRTM) [76], monthly precipitation derived from Global Precipitation Measurement (GPM) [77], landcover (LC) [78] and Vegetation Continuous Fields (VCF) [79], as well as the monthly Enhanced Vegetation Index (EVI) [80] ( Table 1). The input datasets were processed in R [81] with the packages raster [82] and rgdal [83]. The boundary of the study area was determined by catchment delineation in ArcGIS 10.4.1 using the SRTM elevation. To minimize the direct human impact on the modern environmental parameters, Open Street Map [84] based streets, buildings and industrial areas were delineated with a 500 m buffer to mask these areas from further analysis. In addition, all areas represented by a LC property [85] that is neither evergreen broadleaf forest (>60% woody cover, >2 m height), open shrubland (shrub cover 10-60%, <2 m height), savanna (forest cover 10-30%, >2 m height) nor grassland (tree cover < 10%) in at least one of the years from 2001 to 2018 were excluded. To produce a consistent LC dataset, the remaining multi-annual layers have been temporally aggregated by applying a majority decision. The aggregation of the monthly EVI (vegetation greenness) and the annual VCF (tree, non-tree and barren soil distribution) was achieved with an arithmetic mean function. Out of the monthly EVI, the annual average EVI was calculated. The average monthly precipitation from the years 2001 to 2018 was accessed and calculated using the Giovanni online data system [86]. To link the spatial domain of these datasets, all layers (bilinear resampling for continuous and natural neighbour for discrete data) have been resampled to a spatial resolution of 926 m, which is the native resolution of the EVI dataset. One raster-stack was created for training (data_tr) composed of elevation, precipitation, LC and vegetation data, that was excluded using the Open Street Map and LC mask. Another raster-stack of the entire study area for prediction (data_pr) was created, containing the elevation and monthly precipitation only.

Model Training and Validation
For the prediction of LC, VCF and annual EVI based on the elevation and monthly precipitation, Boosted Regression Trees were used [37,47], which combine classification and regression trees with the gradient boosting algorithm [87] using the R package gbm [88]. Individual models for each target were trained (LC-BRT, EVI-BRT and VCF-BRT) based on the data_tr dataset. To predict LC, a multinomial loss function, a training fraction of 75%, a five-fold cross validation and a maximum of 1000 trees was chosen. Subsequently, the bag fraction, the learning rate and the interaction depth were hyper-tuned and a weighting coefficient were set to achieve the best multiclass ROC (Receiver Operating Characteristic, R package pROC, [89]) and an optimized confusion matrix based on the R package caret [90]. To avoid overfitting, we used the number of trees with the minimized loss in the 25% validation fraction. To validate the model performance, we separately calculated the confusion matrix and the resulting accuracy, Cohen's Kappa and ROC for wildlife reserves, national parks and controlled hunting areas within the study area and further tested the model on protected areas outside of the study area (test dataset) using the balanced accuracy for each class. To predict annual EVI, a Gaussian loss function and a training fraction of 75% has been used. The model has been hyper-tuned in the same manner, but without the weighting coefficient, to optimize the resulting determination coefficient between predicted and observed annual EVI. To predict VCF, separate models were built for the tree cover (TC), non-tree cover (NTC) and non-vegetation cover (NVC) and optimized the same way. For both EVI and VCF, the determination coefficient and the average residuum of observation and prediction were calculated separately for each protected area type.

Model Link and Scenarios
To reconstruct the biosphere-hydrosphere interaction from the LGM onwards, three scenarios are defined to be modelled: (1) the hypothetical pre-industrial scenario (PI), with the potential LC (reduced human influence) based on the same precipitation amount and temperature as modern-day, (2a) the AHP with an increased precipitation amount that is distributed equally as modern day, (2b) the AHP with an CAB-based precipitation increase, assuming that the precipitation increase is happening between the months of June and September, as the increased precipitation was due to an eastward-shift of the Congo Air Boundary (CAB) [13,35] and (3) the LGM with a decreased temperature and a presumably decreased precipitation amount.
To assess the effect of paleo-precipitation changes on paleo-vegetation for each scenario, the central concept is to link this new PVM to the existing LBM from our precursor study [10]. This model link is divided into three stages: The first one is to calculate the effect of an expected range of precipitation change on the vegetation as compared to modern-day precipitation and to compute the resulting vegetation distribution in the catchments of lakes Abaya, Chamo and paleo-lake Chew Bahir. The second stage is the application of an established parametrization approach [36,75,91] to calculate the resulting ET on land as a function of the precipitation-vegetation distribution for each lake's catchment, separately. In the final, third stage, this precipitation-vegetation derived ET is used as input for the existing LBM of lakes Abaya, Chamo and paleo-lake Chew Bahir to model the new threshold of appearance (AHP and AHP-CAB) and disappearance (LGM) of paleo-lake Chew Bahir.

Stage 1-Precipitation to Vegetation
For each scenario, the data_pr dataset and the LC-BRT were used to predict the LC. For the PI scenario, the original data_pr dataset was utilised. The output of the multinomial LC-BRT is applied to classify each datapoint according to its most likely class. For the AHP scenario, the precipitation amount of each month and data point was changed by multiplying it with the percentage of the precipitation change (e.g., 1.3 for 130% precipitation amount as compared to modern-day conditions) starting from 100 to 150% with increments of 1%. For each precipitation amount within each scenario, the LC distribution in the catchments (Abaya, Chamo and Chew Bahir) is retained in a scenario specific spreadsheet for the next analysis step. For the AHP-CAB scenario, the absolute annual difference for each percentage change is calculated as an absolute value for each datapoint and converted to a percentage change in the months from June to September, covering the same precipitation range. For the LGM scenario, the significant temperature difference from modern-day conditions is considered by using the GDGT based temperature reconstructions [7,92]. For this purpose, the modern-day elevation (E) was converted into an elevation equivalent (EE) using the modern-day lapse rate (MDLR) over eastern Africa of −5.8 • C km −1 , the LGM calculated lapse rate of −6.7 • C km −1 [7], their difference of 0.9 • C km −1 (LRD) and a base cooling (BC) between 3 and 4 • C [92]. Afterwards, EE is calculated according to equation 1. With this EE, the paleo-vegetation is predicted, covering the precipitation range from 50 to 100% with increments of 1% compared to the modern-day amount. For the LGM LC-BRT model, a condition was added to classify each datapoint as afro-alpine vegetation if the EE exceeds 3000 m a.s.l. [3]. For the modern-day elevation and temperature gradient, the ratio of areas within the modelled area that have elevations higher than this threshold is negligible. Based on Fischer, et al. [10], we here use the assumption of similar temperatures during the AHP as today to simplify the approach.

Stage 2-Vegetation to ET
A parametrization approach was applied to calculate ET by a vegetation-precipitation dependent albedo, emissivity, soil moisture availability and surface drag coefficient [36,75,91]. The parameters have been calibrated meticulously ( Table 2) to acquire the same results as calculated by Fischer, et al. [10] and to account for the closed basin balanced water budget for the modern-day vegetation coverage, maintaining the parameter gradients in between the classes. ET calculation is based on the surface-drag coefficient, the wind speed, the soil moisture availability, the gas constant for dry air, the surface and air temperatures, the relative humidity and the saturation vapor pressure as a function of surface and air temperature [10]. The resulting ET for the scenario specific precipitation-vegetation distribution (AHP, AHP-CAB, LGM) has been calculated separately. Furthermore, the average temperature decrease for each catchment and lake in the LGM scenario was calculated based on the GDGT temperature reconstructions [7,92], which also resulted in an updated lake evaporation for lakes Abaya, Chamo and paleo-lake Chew Bahir.

Stage 3-ET to Lake Levels and Paleo-Precipitation
In the third stage, precipitation and the precipitation-dependent ET of each scenario was used as input for the LBM. The model used a change in precipitation of 100 to 150% for the AHP and AHP-CAB scenario and of 50 to 100% for the LGM (the upper and lower limit are set generously to cover definitely the expected precipitation amount for each scenario). For any amount of precipitation, the lake's reaction was simulated over 500 years, which is sufficient time for the system to reach its ET-precipitation equilibrium [10]. For simulation of each scenario, we analysed the resulting equilibrium lake level of paleo-lake Chew Bahir, the water fluxes between the catchments (Lake Abaya-Lake Chamo-paleo-lake Chew Bahir-Lake Turkana) and the relative importance of the extended catchment (lakes Abaya and Chamo) for the water balance of paleo-lake Chew Bahir. The precipitation threshold of the transition of lake appearance (to go from "no lake" conditions to a "flooded basin") in the AHP and AHP-CAB scenario is used as a precipitation estimate for the final LC prediction. The precipitation interval (comparison of 3 to 4 • C base cooling during LGM) of lake disappearance (to go from "flooded basin" conditions to a "no lake") is used as precipitation estimate in the LGM scenario.

Paleo-Vegetation Maps
Using the precipitation estimate from the prior step as input for the BRT model for LC, VCF and EVI, the vegetation distribution (LC, VCF and EVI) for each scenario from the LGM onwards until the modern-day conditions was modelled, presenting the most likely spatial distribution of vegetation type and density.

Phytolith Proxy
To infer and test the modelling results with proxies of past vegetation, lacustrine sediment samples covering the past 25 ka from the paleo-lake Chew Bahir basin CB-03 short-core were used [11,33,93,94]. Pollen is a well-known proxy for paleo-vegetation reconstruction but are not preserved in the Chew Bahir lacustrine sediments. Instead, phytoliths were used to gain information about paleo-vegetation, as phytoliths are particularly valuable for the identification of grass type composition [95]. Phytoliths are microscopic opal silica infillings of plant cells [95,96]. With sheet-wash, fluvial or aeolian transport distances typically up to 10 km from the sample (core) location, phytoliths reflect a mixed signal of the landscape mosaic [95,97].
We analysed 27 samples using a wet-oxidation and heavy-liquid density separation method described in Yost, et al. [95]. Phytoliths were identified using a modern comparative collection from C. Yost and the descriptive phytolith references listed in Yost, et al. [95]. Phytoliths were classified according to the International Code for Phytolith Nomenclature (ICPN 1.0; [98]) and only grass short-cell phytolith morphotypes were assigned to C 4 mesophytic, C 4 xerophytic and C 3 grass functional type (GFT) categories for percentage calculations following Table 1 from Yost, et al. [95] and the results of a modern phytoscape approach from the adjacent Turkana Basin [99]. The Iph aridity index, also used to discriminate short-grass xerophytic (Sahelian) savanna from tall-grass mesophytic (Sudanian) savanna [100], was calculated as described in Bremond, et al. [101]. The 95% confidence intervals in GFT percentages and Iph index values were calculated using the nonparametric bootstrap resampling method described in Yost, et al. [95]. Phytolith preservation was assessed using the criteria discussed in Yost, et al. [95] and was ranked on a numerical scale from good to poor to none. Phytolith concentrations were calculated by counting a Lycopodium spike added to each sample at the end of the phytolith extraction steps. Phytolith relative abundance was calculated based on the total grass short-cell phytolith count for each sample. These efforts led to a time series of phytolith preservation and C 4 mesic, C 4 xeric and C 3 grass phytolith relative abundance. Additionally, diatoms, sponge spicules, burned phytoliths and microcharcoal were counted but are not discussed in this study.

Dataset Exploration and Description
The data_tr dataset contains 65,052 datapoints, the data_pr dataset 135,368 datapoints. The elevation ranges from 360 to 3593 m a.s.l. with a mean of 1450 m a.s.l. as shown in Figure 3A. The precipitation ranges from 410 mm a −1 to 1740 mm a −1 with a mean of 1120 mm a −1 . The highest precipitation is recorded in the Southwestern Highlands (located in the north-western part of the study area, Figure 1), while the lowest annual amount is close to the shores of Lake Turkana and the southern Chew Bahir basin. The seasonality is unimodal with a strong peak during northern hemisphere summer in the forest classified areas and less intense in the savanna classified areas ( Figure 3B,C). The precipitation seasonality for grassland and open shrubland classified areas is bimodal with peaks during northern hemisphere spring and autumn, with the latter's rainy season being less intense. Roughly 45% of the study area is classified as grassland, 47% as savanna, 3% as open shrubland and around 5% as forest. EVI varies from 0.06 to 0.55 with a mean of 0.29. TC ranges from 0 to 82% with a median of 17%. Forest has the highest TC with a median of 71%, followed by savanna (18%) and grassland with a median of 8%, as shown in Figure 3B. The area with NTC comprises 0 to 87% with a median of 16%. Grasslands and savanna have a similar NTC median of 70%, in contrast to forests (27.5%) and open shrubland (22.5%). The NVC spans from 0 to 100% and has a median of 18%. The highest NVC median of 77% is encountered within the open shrubland areas. Grasslands have a higher median NVC (19%) than savanna classified areas (10%) and forest areas (2%).

Model Training and Validation
For the LC-BRT, the best fit to the data was achieved with a ROC of 0.79, 298 trees, a bag fraction of 0.5, an interaction depth of 1 and a learning rate of 0.05 (Table 3). The most important predictors are the September rains (24%), the elevation (20%), the precipitation in June (13%) and the precipitation in May (11%). The overall accuracy based on the confusion matrix (Table 4) for the multiclass prediction is 0.73 and the overall Cohen's Kappa is 0.57.
For wildlife reserves ( Figure 3A), the accuracy is 0.81, kappa is 0.61 and the ROC is 0.76. In national parks, the accuracy is 0.86, kappa is 0.73 and the ROC is 0.87. In controlled hunting areas, the accuracy is 0.87, kappa is 0.5 and the ROC is 0.78. In the test dataset, containing protected areas outside the study area, we achieved a balanced accuracy for evergreen broadleaf forests (0.9), open shrublands (0.51), savanna (0.76) and grasslands (0.58). The ROC for the test dataset is 0.72. The spatial distribution of the model prediction agrees well with the training data in most of the study area ( Figure 4), but disagrees in the transition areas, especially in the surroundings of forests and open shrubland areas. The determination coefficient (R 2 ) for EVI prediction is 0.8. The most important predictors are the precipitation in February (26%), May (21%), September (13%) and the elevation (11%). The R 2 for wildlife reserves within the study area is 0.58 and the EVI is overestimated by 5.6%. The determination coefficient for national parks is 0.64 and the EVI is underestimated by 1.4%. In controlled hunting areas, the R 2 is 0.59 and EVI is overestimated by 2.9%. In the test dataset outside the study area, the determination coefficient is 0.68 and EVI is overestimated by 14% on average.
For TC prediction, R 2 is 0.8. The most important predictors are the September rains (17%), the elevation (14%), December rains (10%) and precipitation in August (10%). The R 2 in wildlife reserves is 0.53 and TC is underestimated by around 1% in average. For national parks, the R 2 is 0.51 and the residuum is −0.28% on average. In controlled hunting areas, R 2 is 0.78 and the average residuum is −1.4%. In the test areas, the determination coefficient is 0.76 and the average residuum is −2.5%.
Even after fine-tuning the hyperparameters for the NTC, the coefficient of determination is only 0.49, indicating an insufficient correlation between the observed and predicted NTC. In the test area, R 2 is even lower with only 0.34, which makes a further use of the NTC-BRT unfeasible. For the NVC, the determination coefficient is 0.55 within the study area and only 0.4 in the test areas, making an application of the NVC-BRT also impracticable.
For TC prediction, R 2 is 0.8. The most important predictors are the September rains (17%), the elevation (14%), December rains (10%) and precipitation in August (10%). The R 2 in wildlife reserves is 0.53 and TC is underestimated by around 1% in average. For national parks, the R 2 is 0.51 and the residuum is −0.28% on average. In controlled hunting areas, R 2 is 0.78 and the average residuum is −1.4%. In the test areas, the determination coefficient is 0.76 and the average residuum is −2.5%.  The modern-day and supposed PI LC distribution of lakes Abaya, Chamo and paleolake Chew Bahir is summarized in Table 5. The most significant changes in the catchments of lakes Abaya and Chamo are the decrease of agricultural areas from 19% (Abaya) and 10.8% (Chamo) to 0% and the increase of savanna covered areas by 14% (Abaya) and 24% (Chamo). In the catchment of paleo-lake Chew Bahir, the model suggests a precipitationelevation based difference of open shrubland covered areas by around 14%. The forest covered areas are increasing moderately (0.4 to 4.1%) throughout all the catchments.
Based on the parametrizations of the LCs, we calculated a modern-day ET of 1072 mm a −1 for the catchment of Lake Abaya, which differs by −51 mm a −1 from the previously calculated value of Fischer, et al. [10]. In the Lake Chamo catchment, ET is 28 mm a −1 higher relative to the previous study, resulting in an annual ET of 1088 mm a −1 . In the paleo-lake Chew Bahir catchment, the MD ET is 939 mm a −1 compared to 892 mm a −1 of Fischer, et al. [10].
The PI (scenario 1) leads to a decreased estimated ET (PI ET of 845 mm a −1 ) in the Lake Abaya catchment, due to a decrease of agricultural area. This decrease results in a positive water budget within the catchment of Lake Abaya, which would then overflow to Lake Chamo and, hence, to the Chew Bahir basin. The estimated PI ET for the Lake Chamo catchment is 1026 mm a −1 , while it is 872 mm a −1 for the paleo-lake Chew Bahir catchment. The simulation of the precipitation dependent vegetation-ET for the 2a scenario (AHP, no CAB) is shown in Figure 5A. The forest coverage in the Lake Abaya catchment is increasing almost linearly to the 120% precipitation threshold and is then stabilizing, whereas the grassland coverage is decreasing, starting from around 45% and reaching 5% at around 125% precipitation. This forces a positive water budget at the modern-day precipitation amount and a decrease of the water surplus until the forest saturation point at around 120%. The grassland coverage of the Lake Chamo catchment is decreasing rapidly, being replaced by savanna, whereas the forest coverage is increasing slowly until 120% precipitation, after which it begins to decrease. The water budget is slightly positive within the simulated precipitation increase and ET is following the precipitation amount, starting to increase after the forest saturation point at~120%. In the paleo-lake Chew Bahir catchment, the model shows a positive water budget for the modern-day precipitation amount. The open shrubland coverage is decreasing to almost zero, whereas the forest coverage is increasing slightly with increasing precipitation. The main process in the paleo-lake Chew Bahir catchment is the replacement of grassland by savanna.

Scenario 2b-AHP, with CAB
In the CAB scenario 2b, the precipitation dependent vegetation-ET interrelation for each catchment is summarized in Figure 5B. The potential increase of forest coverage with increasing precipitation is higher compared to the 2a scenario and remains linear in the Lake Abaya catchment until a threshold at~140% precipitation. In the Lake Chamo catchment, forest coverage reacts at a greater precipitation increase and reaches 20% at 120% precipitation. In both mid-altitude catchments (Abaya and Chamo), grassland is replaced quickly by savanna due to increased precipitation. In the catchment of paleo-lake Chew Bahir, the 2b scenario leads to the same vegetation reaction to precipitation increase as in the 2a scenario, with a more pronounced increase of the forest coverage and a less pronounced decrease in grasslands.

Scenario 3-LGM
The simulation of scenario 3 (LGM) shows a~20% temperature-driven coverage of the Lake Abaya catchment with afro-alpine vegetation, which is independent from the precipitation amount ( Figure 5C). Within all catchments, the forest coverage is zero and the savanna is replaced by grassland, due to the decreased precipitation amount. In the paleo-lake Chew Bahir catchment, the savanna coverage is reaching almost zero at~80% precipitation. The grassland coverage decreased and is replaced by open shrubland with a stable plateau at~80% precipitation. In all catchments, ET is decreasing with a decline in precipitation. Water balance is positive for precipitation levels above~80% (paleo-lake Chew Bahir) and 70% (Lake Abaya) of modern levels. sciences 2021, 11, x FOR PEER REVIEW 16 of 32

PVM and LBM Model Link
The combined LBM for the 2a AHP scenario with no change in the seasonality of the precipitation shows a lake occurrence of paleo-lake Chew Bahir under modern-day precipitation conditions. With increasing precipitation from 15 to 20%, the model shows initially no lake, which changes as soon as the threshold of 25% precipitation is reached and a lake is (re)established. In the 2b scenario, as a theoretical contrast scenario with additional precipitation in June to September, the threshold for lake appearance is at~41% more precipitation compared to today. For scenario 3, the modelled LGM environmental conditions, caused the lakes to disappear below 80-85% of the modern-day precipitation. We used these thresholds (2a-125%, 2b-141%, 3-82.5%) as minimum (AHP) and maximum (LGM) precipitation thresholds and as input parameters for the spatial reconstruction of the vegetation mosaic during these time periods by combining the LC-BRT, the EVI-BRT and the TC-BRT ( Figure 6). We added the maximum lake extents of lakes Abaya, Chamo and paleo-lake Chew Bahir [10], as well as Lake Turkana [12] for the AHP scenarios. During the LGM, the model suggests that a desiccation of lakes Abaya and Chamo would have required 10% further reduction in precipitation than required for the desiccation of paleo-lake Chew Bahir ( Figure 5C). Scenario 2b AHP (with CAB) with spatial application of the LC-BRT, EVI-BRT and TC-BRT using modern-day elevation and 141% increased precipitation (threshold for lake appearance for scenario 2b). (C) Scenario 3 LGM with spatial application of the LC-BRT, EVI-BRT and TC-BRT using elevation equivalent due to cooler temperatures and 82.5% precipitation (threshold for lake disappearance for scenario 3).

Phytolith Proxy
Phytolith counts ranged from 0 to 252 per sample, with 16 (59%) yielding no phytoliths (Table 6). Almost no phytoliths were preserved before ~12 ka in the studied Chew Bahir Lake sediment record, except for a few that date to ~24 ka (Figure 7). During the AHP, preservation of phytoliths peaked at ~8 ka and decreased to almost zero at around 5 ka. Phytolith preservation remained low after the AHP. Hence, classification of the surrounding vegetation in the area is limited in this study to the AHP. With the exception of sample 524, uncertainties in the GFT percentages were on average ±9% (C3 grasses), ±10% (C4 mesic grasses) and ±7% (C4 xeric grasses). Even though the sample 524 GFT percentage was plotted in Figure 7, it should be noted that only three phytoliths were used in the GFT Figure 6. Model result map for scenario 2a, 2b and 3: (A) Scenario 2a AHP with spatial application of the LC-BRT, EVI-BRT and TC-BRT using modern-day elevation and 125% precipitation (threshold for lake appearance for scenario 2a). (B) Scenario 2b AHP (with CAB) with spatial application of the LC-BRT, EVI-BRT and TC-BRT using modern-day elevation and 141% increased precipitation (threshold for lake appearance for scenario 2b). (C) Scenario 3 LGM with spatial application of the LC-BRT, EVI-BRT and TC-BRT using elevation equivalent due to cooler temperatures and 82.5% precipitation (threshold for lake disappearance for scenario 3).

Phytolith Proxy
Phytolith counts ranged from 0 to 252 per sample, with 16 (59%) yielding no phytoliths (Table 6). Almost no phytoliths were preserved before~12 ka in the studied Chew Bahir Lake sediment record, except for a few that date to~24 ka (Figure 7). During the AHP, preservation of phytoliths peaked at~8 ka and decreased to almost zero at around 5 ka. Phytolith preservation remained low after the AHP. Hence, classification of the surrounding vegetation in the area is limited in this study to the AHP. With the exception of sample 524, uncertainties in the GFT percentages were on average ±9% (C 3 grasses), ±10% (C 4 mesic grasses) and ±7% (C 4 xeric grasses). Even though the sample 524 GFT percentage was plotted in Figure 7, it should be noted that only three phytoliths were used in the GFT calculation. Likewise, the youngest sample from a depth of 4 m yielded only 20 short-cell phytoliths and exhibited evidence of poor preservation, so the calculated GFT percentages have a high level of uncertainty (±20% for C 3 and C 4 mesic grasses) and are not used directly in our interpretations. The GFT distribution is showing the highest relative abundance of C 4 mesophytic grasses with a linear positive trend until the end of the AHP. C 4 xerophytic grasses are highest around 12 ka and are decreasing to zero at the end of the AHP. C 3 grasses have a comparatively low abundance but reach their highest values at 11 ka and 8 ka. With the exception of sample 464, Iph aridity index values during the AHP were all well below the 27.8 threshold at the 95% confidence level, indicating the presence of mesophytic tall-grass Sudanian savanna or mesic edaphic grasslands associated with paleo-lake Chew Bahir (Table 5). Even though no globular granulate phytolith morphotypes diagnostic of woody plants (trees and shrubs) were observed, during the AHP,~8% of the total phytolith count at this time was comprised of morphotypes typically derived from woody plants.

Scenario 2-AHP
In our preceding LBM study [10], it was estimated that a precipitation threshold of 120 to 130% during the AHP was required to allow paleo-lake Chew Bahir to reach the overflow sill at 543 m a.s.l., whereby +7-15% of the increase is due to vegetation feedback. However, the quantification of the vegetation feedback was previously used from a study at Lake Naivasha in Kenya by Bergner, et al. [36], where paleo-vegetation was

Results Synthesis
Our biosphere-hydrosphere modelling approach for southern Ethiopia from the LGM times to the present shows: (a) during the LGM, an annual precipitation decrease of at least 15 to 20%, along with decreased temperatures. This would explain the lake regression and would have resulted in extensive grassland cover and sparse vegetation in the lower elevations and widespread afro-alpine vegetation above approximately 2000 m a.s.l. (b) A precipitation increase of at least 25% during the AHP, which would explain the observed high lake level of paleo-lake Chew Bahir reaching the overflow sill. This precipitation increase would produce dense forest cover in the high altitudes (above 2000 m a.s.l.) and dense vegetation cover in the vicinity of northern Lake Turkana, as well as along the shores of paleo-lake Chew Bahir. (c) In the CAB scenario, it would require at least 41% more precipitation than modern levels to sustain the observed lake changes and would produce less dense vegetation in the high altitudes and favour forest coverage at the mid-altitudes (between 1000 and 2000 m a.s.l.). (d) The modern-day land-use has about a fourth of the forest coverage (10%) at high-altitudes compared to the pre-industrial potential LC (44%) and reduced forest coverage at mid altitudes from 16 to 4%.

Scenario 3-LGM
Previous studies based on reconstructed vegetation and their potential hydro-climatic habitat on a pan-African scale revealed a reduction of around 25 to 27% annual precipitation during LGM times in eastern Africa, compared to the amount of modern-day precipitation [102]. The orographic temperature driven displacement of the vegetation belts was estimated to be around −700 m in the dry, high mountains and around −1000 m in the humid mountainous regions [15], which is also supported by our modelling results that suggest an orographic lowering of the afro-alpine vegetation to around −1000 m. The pan-African vegetation reconstruction (based on sea surface temperature derived paleoprecipitation estimates and resulting vegetation transfer functions) by Anhuf, et al. [15] suggests grass savanna cover in the surrounding of Lake Turkana, dry forest or savanna vegetation in the majority of the area and forests in the highland areas. The coupled earth system model HadCM3LC yields a forest cover of less than 40% in Ethiopia during LGM times, in contrast to partly 60 to 80% forest cover during pre-industrial time [103]. Our model results for the LGM times are in overall agreement with the earth system model [103] and transfer function results [102], but paleo-lake Chew Bahir would already dry out at a precipitation decrease of 15 to 20%. In contrast to Hopcroft and Valdes [104], who used HadGEM2-ES to infer global vegetation patterns, our modelling results do not show vast spatial distribution of forests in Ethiopia, neither in the western nor in the central highlands.
There is no general decrease in the EVI (vegetation greenness or proxy for productivity), but instead a large decrease in EVI in the high altitudes of the western highlands and the southeastern escarpments. In contrast, an EVI increase is observed in the rift lakes region surrounding lakes Abaya and Chamo and the lowlands of the BRZ. For all altitudes, the tree cover decreased from 17 to 14%. Tree refugia during the LGM are mainly the Gofa range, the northwestern escarpments, as well as parts of the Agere-Selam escarpment (Figure 1). In contrast to the Agere-Selam escarpment, our model did not suggest a tree refuge in the Gamo-Gidole Horst, which suggests the possibility of a complex reorganisation of the vegetation mosaic in the EARS during the LGM.
The model result is tested with a new pollen record from the Gelba wetlands at 2300 m a.s.l. (Figure 1) [105], which are located in the Gamo-Gidole Horst at the catchment boundary of lakes Abaya and Chamo. Results from the Gelba wetland record show a high abundance of afro-alpine vegetation during late LGM times (Ericaceous, afro-montane forest), in addition to more dry conditions until about 13.5 ka BP, with increasing afro-alpine vegetation and wetlands/open water after 13 ka BP. Pollen from Podocarpus, Juniperus, Artemisia, Rumex and Poaceae are abundant in that region during the LGM [105], which indicates the presence of dry afro-montane forest species.

Scenario 2-AHP
In our preceding LBM study [10], it was estimated that a precipitation threshold of 120 to 130% during the AHP was required to allow paleo-lake Chew Bahir to reach the overflow sill at 543 m a.s.l., whereby +7-15% of the increase is due to vegetation feedback. However, the quantification of the vegetation feedback was previously used from a study at Lake Naivasha in Kenya by Bergner, et al. [36], where paleo-vegetation was extrapolated based on pollen proxy information and subsequently parametrised to calculate ET in the same manner as explained in Section 3.3.2. Since pollen data as well as precise paleovegetation estimates are scarce in southern Ethiopia, a continuous PVM was used for all precipitation amounts and a new threshold was determined. This threshold shows an addition of +18-34% is needed to compensate for the biosphere feedback.
The impact of changing seasonality on ET, due to a shift in the CAB (Scenario 2b), with the additional precipitation required to compensate for the increased ET during presentday dry months, was estimated at +7% based on SEBAL results for each month [10]. Our current hydrosphere-biosphere approach suggests an even larger effect of +16%, including the biosphere feedback. The 2b scenario (CAB caused seasonality changes) promotes the growth of forests in the mid-altitudes (between 1000 and 2000 m a.s.l.) which comprise most of the catchment areas for lakes Abaya and Chamo, both of which were significant water sources for paleo-lake Chew Bahir. Forest growth in these catchments would decrease the surface runoff to the lakes while significantly increasing ET on land. However, our modelling results provide possible realizations of a hydrosphere-biosphere interaction within the EARS as a minimum precipitation amount with two contrasting AHP scenarios, whereas a mixture of both (2a and 2b) seem most likely.
Scenario 2a (equally enhanced rainfall) suggests an approximately 40% increase in EVI and a 150% increase in tree cover throughout the study area compared to modern-day conditions. For scenario 2b, the tree cover increase is lower at 115%. The EVI increase is almost the same at 38%. In comparison, the CAB 2b scenario would increase EVI in the lowlands (below 1000 m a.s.l.) and the lake shores of Abaya and Chamo, as well as for the eastern escarpments, which are regions where the dry season is well pronounced. In contrast, the 2a scenario would increase EVI on the highlands in the northwestern part of the Omo River catchment, but the increase is not significant compared to the overall increase throughout the study area. The diverging model results between 2a and 2b for the southeastern escarpments also apply to the TC prediction. The 2b scenario produces forest growth, whereas the 2a scenario does not produce forests in that region. Furthermore, the 2b CAB scenario would result in an increase in forest coverage in the vicinity of Lake Abaya relative to modern-day conditions, which would not be the case in scenario 2a.
We conclude that the potential forest coverage during the AHP in southern Ethiopia is restructured and the overall trend observed in the modelling results is in agreement with the pollen record from Augustijns, et al. [105]. The pollen record shows wet to open water conditions and afro-montane forest pollen peaks at 13 ka and 7 ka. A pollen record from the central Ethiopian highlands from Lake Dendi (3270 m a.s.l.) revealed a low abundance of Podocarpus and Juniperus pollen during the AHP and instead yielded high Poaceae abundances [16], which would be in accordance with the modelling results of scenario 2b. This could be interpreted as supporting evidence for the concept of an eastward shift and intensification of the CAB.

Scenario 1-Pre-Industrial
How agriculture and grazing altered the landscape and affected the potential vegetation cover during the PI remains an open question. For instance, the Northwestern Ethiopian Highlands have been deforested since at least 3 ka, when agriculture and grazing progressively replaced hunting and gathering [109,110]. Soil erosion altered the landscape and may have degraded the ecosystem irreversibly [3]. This also affected the water run-off coefficient and the water budget of the lake basins [110]. Such a complete deforestation is not visible in the pollen record from the Chencha bog (3000 m a.s.l.) on the Chencha Horst (Figure 1), where the afro-montane forest cover seems to decrease at 1.7 ka but recovers after 0.8 ka [105]. In contrast, around Lake Turkana, Gil-Romera, et al. [111] did not see indications for a human-induced ecosystem shift in the past 2 ka. Instead, they concluded that bush encroachment and retraction are predominately controlled by rainfall and fire on multi-decadal to centennial cycles.
Overall, the model suggests that 11.5% of afro-montane forests have been replaced by agricultural areas. The forest coverage decreased (pre-industrial to modern-day) from 44% to 10% in high altitudes and from 16 to 4% in mid to low altitudes, showing the immense impact of agriculture on the landscape in the highlands. The model results agree with the atlas of the potential vegetation of Ethiopia [3] for the moist evergreen afro-montane forest but classifies vast parts of the dry evergreen afro-montane forest as grassland. Either the lack of sufficient training data for the dry central Ethiopian highlands could bias the result as the model is based on a learning algorithm, or grassland may already be a proper classification for the degraded potential landscape. The grassland and savanna classified areas of the PI scenario are different to the results of the potential vegetation based on Friis, et al. [3]. The species perspective from Friis, et al. [3] does not match either the observed or the predicted landscape phenology and density of the MODIS LC, which follows a north-south gradient ( Figure 3).
The shrinking of agricultural areas and the increase of savanna and grassland areas in the catchment of Lake Abaya reduces the ET rate on land for the PI scenario, which then leads to a modelled positive water budget within the lake system of lakes Abaya, Chamo and paleo-lake Chew Bahir. This could imply that agricultural activities in the Main Ethiopian Rift degraded the landscape and replaced high forest ET (low run-off) with high agricultural ET, conserving the water balance. However, if the actual soil-determined potential vegetation is a grassland complex (low ET, higher run-off), a hypothetical spontaneous end of agriculture would lead to a less negative water budget and increased lake levels. In contrast, a potential and realistic further increase in agriculture and extensive water use could result in a more negative water budget and decrease the lake levels of lakes Abaya and Chamo.
The LC in the Lake Abaya basin has been changing for decades [112], which may lead to an increase of areas with extensive water use and agriculture (coffee, banana) on the one hand, and gully eroded areas with a high run-off for the lake's water supply on the other. This explains the high sediment load of Lake Abaya, visible even in satellite imagery. For the PI, this could further imply that extensive agriculture has already affected the water budget of Lake Abaya and subsequently Lake Chamo and paleo-lake Chew Bahir. This extrapolation is supported by a report by von Höhnel [113], who observed in 1888 at the end of the dry season that the southern shore of paleo-lake Chew Bahir was occupied by a shallow lake, which is desiccated today. High lake levels after the termination of the Little Ice Age at the end of the 19th century have also been reported (e.g., Nicholson [114]) for lakes in the vicinity and under a similar climate regime. Based on the model results, we conclude that the persistent aridification of southern Ethiopia since the beginning of the 20th century, could be at least partly caused by LC changes.

Proxy Model Interference
While the atlas of the potential vegetation of Ethiopia [3] uses dominant and characteristic plant species to classify the broad variety of vegetation types, the MODIS LC uses remote sensing derived phenology and spectral properties to classify the vegetation and the landcover. Phytoliths as an environmental proxy from the Chew Bahir sediment cores [33], allow us to distinguish mesic and xeric as well as C 4 and C 3 grasses [95]. The best preservation of phytoliths is recorded during the AHP (Figure 7). The abundance of C 3 grass phytoliths is interpreted as the existence of aquatic vegetation of a wetland (dominated by C 3 ) near the sediment core site, similar to today. Due to changes in the water level of paleo-lake Chew Bahir over the past 20 ka, the wetland vegetation may have migrated up to 100 km northwards. If the wetlands are close to the drill location, the C 3 signals prevails. The wetland, with its dense aquatic vegetation seen today, may act as a phytolith filter that filters out long-distance phytolith transport. We thus assume that the origin of C 4 phytoliths in the Chew Bahir sediments are sourced in the catchment of the Hamar Range and in particular the catchment that is related to the alluvial fan 6 km west of the drill site. C 4 xeric phytolith types may indicate the existence of open shrubland to grassland on these alluvial fans, which would be the desert and semi-desert shrubland unit based on Friis, et al. [3]. C 4 mesic phytolith types indicate a grassland to savanna landscape, which would belong to the Acacia-Commiphora, respectively, the Combretum-Terminalia woodland and wooded grassland [3], especially Combretum-Terminalia woodland, as it is typically covered by dense (mesic) grass vegetation.
During the LGM, the absence of phytoliths (most likely caused by dissolution) supports the scenario of a highly alkaline paleo-Lake Chew Bahir. Our model uses this condition as a lake disappearance precipitation threshold (see Figure 5C). During the AHP, the rapid onset of phytolith preservation starting around 11 ka agrees with the rapid onset of humid conditions as recorded in the Chew Bahir K record and other records from paleo-lakes in the vicinity [13,34,35,93]. Phytolith preservation decreased continuously over a period of 3000 years (8 to 5 ka), again providing proxy evidence for a gradual decline of the AHP in the region, as previously also suggested by Foerster, et al. [33] and Fischer, et al. [10] and statistically analysed by Trauth, et al. [94]. During the main phase of the AHP (11 to 6 ka), the record shows the dominance of C 4 mesic phytoliths, and hence a savanna landscape with underlying dense productive grass, even close to the lake shore, which is a modern-day sparse grass and/or open shrubland area. The slow replacement of C 4 xeric grasses (suggesting open or semi-desert shrubland) could indicate either a slow climatic signal towards more and continuous precipitation in the lowlands, or a signal of the slow replacement process towards a more closed savanna landscape in the area. This trend could also be interpreted as a slow reaction of the landscape due to a groundwatertable-dependent vegetation structure, since groundwater may react over millennial time scales [115]. On the other hand, this trend could be an artefact in the phytolith record due to morphotype specific dissolution [116] during the AHP, showing the limitation of this phytolith record. The continuous upward trend is punctuated by two C 3 grass phytolith peaks at 10.5 ka and 9 ka, that is in agreement with the Chew Bahir K proxy record [11]. This could mark a brief period of dry conditions leading to rapid lake desiccation and a formation of a dense and productive wetland in the basin until it becomes flooded again. Fischer, et al. [10] concluded, based on the lake dynamics modelling and K-proxy results, that many of these lake level fluctuations interrupted the paleo-lake highstand during the AHP. A higher resolution phytolith analysis extending further back in time may detect additional lake desiccation phases.

Limitations and Advantages of the Method
We used elevation and monthly precipitation as the only predictors in our model, being aware that LC is also determined by other environmental factors, such as lithology, soil type or depth, geomorphological position (such as aspect, slope and terrain position) or windspeed and dry season length [117]. Precipitation and elevation, however, are the major determinants of the vegetation in tropical Africa [3,117]. Incorrect LC classifications, especially in the transition areas (forest to savanna and grassland to open shrubland), are below a significant error for our application on the hydrological cycle but may remain a consequence of the simple model structure. A better spatial precision for LC prediction could be achieved with the incorporation of more predictors, but testing, for example, the topographic position index on different spatial scales has not shown a significant effect. In addition to derivates of precipitation and elevation, reliable high-resolution data are especially scarce for soils and the lithology in southern Ethiopia. Additionally, a model with a higher complexity may lack traceability. Hence, we used these simple predictors for our LC prediction, being aware that the interpretation of results for areas of smaller scales, such as archaeological sites, incorporate uncertainties.
With a ROC of 0.79, the model fits the LC prediction well. It was necessary, however, to include an additional weighting factor, making each class equally important, as the imbalance between them made a successful model training for all classes impossible. This has led to a higher classification importance for the open shrubland and forest classes, which increases their suitable habitat in the PI prediction tremendously. This effect is acceptable since the dry parts of the grassland classified areas of the MODIS product for example, may be classified as open shrubland as well. The parameter variability (greenness, phenology, precipitation and elevation) in the LC groups overlap with the neighbouring groups. Another effect is the higher importance of the forest class in the model training and hence the broad increase of forest coverage in the high altitudes for the PI scenario. Forests are mainly limited to refugia and inaccessible areas. Hence, the environmental boundaries of the predictors (precipitation, elevation) can only be partly learned by the algorithm. The algorithm cannot learn a potential habitat of forests, if there are no remaining broadleaf forests left today, which may be true especially for the drier parts of the Ethiopian highlands [3]. Moreover, the coarse data resolution of the precipitation, compared to the landcover, leads inherently to the overlapping predictor variable space in between the LC's.
While LC, EVI and TC prediction works well, NVC and NTC prediction was not possible. The model is not able to predict low NTC correctly, as low NTC could imply high TC or high NVC, either on densely green or sparsely vegetated areas (see Figure 8A). The prediction of NVC works well for low NVC but fails to predict high NVC ( Figure 8B). The predicted NVC stabilizes at around 20-40%, while the observed NVC reaches almost 100%. Again, we encounter problems due to the coarse resolution of the GPM precipitation data and overlapping predictor variable space on a variety of plant phenology. This inappropriate model behaviour might also be caused by non-involved forcing factors such as the underlying soil. For instance, the white clayish lacustrine sediments of Chew Bahir, covering about 755 km 2 , prevents plant growth, while the predictions based on the precipitation-elevation would predict sparse vegetation. for our LC prediction, being aware that the interpretation of results for areas of smaller scales, such as archaeological sites, incorporate uncertainties. With a ROC of 0.79, the model fits the LC prediction well. It was necessary, however, to include an additional weighting factor, making each class equally important, as the imbalance between them made a successful model training for all classes impossible. This has led to a higher classification importance for the open shrubland and forest classes, which increases their suitable habitat in the PI prediction tremendously. This effect is acceptable since the dry parts of the grassland classified areas of the MODIS product for example, may be classified as open shrubland as well. The parameter variability (greenness, phenology, precipitation and elevation) in the LC groups overlap with the neighbouring groups. Another effect is the higher importance of the forest class in the model training and hence the broad increase of forest coverage in the high altitudes for the PI scenario. Forests are mainly limited to refugia and inaccessible areas. Hence, the environmental boundaries of the predictors (precipitation, elevation) can only be partly learned by the algorithm. The algorithm cannot learn a potential habitat of forests, if there are no remaining broadleaf forests left today, which may be true especially for the drier parts of the Ethiopian highlands [3]. Moreover, the coarse data resolution of the precipitation, compared to the landcover, leads inherently to the overlapping predictor variable space in between the LC's.
While LC, EVI and TC prediction works well, NVC and NTC prediction was not possible. The model is not able to predict low NTC correctly, as low NTC could imply high TC or high NVC, either on densely green or sparsely vegetated areas (see Figure 8A). The prediction of NVC works well for low NVC but fails to predict high NVC ( Figure 8B). The predicted NVC stabilizes at around 20-40%, while the observed NVC reaches almost 100%. Again, we encounter problems due to the coarse resolution of the GPM precipitation data and overlapping predictor variable space on a variety of plant phenology. This inappropriate model behaviour might also be caused by non-involved forcing factors such as the underlying soil. For instance, the white clayish lacustrine sediments of Chew Bahir, covering about 755 km 2 , prevents plant growth, while the predictions based on the precipitation-elevation would predict sparse vegetation. The model link is the central step in aligning the thresholds of the lake's system with the response of the surrounding vegetation. The parameter-based approach in estimating the annual ET [36,75,91] is widely used and tested for its sensitivity for the input parameter [10,36]. For the application, the LBM is dependent on precise input parameters, that are determined, tested and calibrated in our precursor study [10]. The calibrated surface The model link is the central step in aligning the thresholds of the lake's system with the response of the surrounding vegetation. The parameter-based approach in estimating the annual ET [36,75,91] is widely used and tested for its sensitivity for the input parameter [10,36]. For the application, the LBM is dependent on precise input parameters, that are determined, tested and calibrated in our precursor study [10]. The calibrated surface parameters (albedo, emissivity, soil moisture availability and roughness length) were tested carefully on the closed basins internal logic of the resulting ET in balance with P. This resulted in a difference of the LC surface parameters, compared to Bergner, et al. [36] and Lyons, et al. [91], but the in-between class differences remained. Land surface parameterization remains an approximation with classified LC's. The transitions between open shrublands, grasslands, savannas and forests are gradual. Continuous predictor models combined with ET land surface parameterization could improve the performance of the hydrosphere-biosphere prediction.

Human Landscape Preferences
Our model shows that the landscape in the study area during the LGM is dominated by a mixture of grasslands with a variety of vegetation types, resulting in a complex, mosaic landscape that would have offered a wide range of resources for hunter-gatherers. Huntergatherers at Sodicho Cave used exclusively obsidian raw material to produce their stone tools [118]. The use of the Humbo Baantu outcrop as obsidian source is attested for Sodicho Cave [64] and Mochena Borago [59,60]. The Humbo Baantu outcrop is around 20 km southeast from Mochena Borago [60] and around 60 km from Sodicho Cave (Figure 1). The use of these outcrops [64,118] shows the wide range of hunter-gatherers in the open landscape during the LGM. Movements through such landscapes to reach for example the Humbo Baantu obsidian outcrop, would have required at least a short-term camp and open water stop, that might have been available as residual water in the highlands [64,119] or even in the colder areas of afro-alpine vegetation or at small, temporal streams in the Lake Abaya catchment. The potential of alpine environments for human life/occupation is shown at Fincha Habera, although Ossendorf, et al. [29] suggest the existence of gallery forests within the Bale Mountain valleys. Trees are almost nonexistent at Mochena Borago according to our modelling results during the LGM, and the absence of evidence of human activity [59,60] may indicate that humans avoided the afro-alpine belt. Our modelling results support the refugium character of Ethiopian highlands during cold and dry conditions [10,33,59,60,64,120]. In summary, it seems that humans in southern Ethiopia lived in a rather open grassland with some trees during the LGM.
For the African Humid Period, the modelling results indicate a vast expansion of forests and dense vegetation in southern Ethiopia. According to the results, forests and dense vegetation covered the known obsidian outcrops in the area that were intensively used by humans during the LGM. So far, the archaeological record and our modeling results suggest that human activities ceased during very wet periods when dense vegetation and forests predominated. The exact reasons for this remain unknown, but it can be speculated that also pronounced wet phases were not necessarily favorable for humans, possibly because of dense vegetation constraining mobility or promoting the spread of diseases.
With obsidian becoming inaccessible due to the forest cover, the Main Ethiopian Rift lakes, such as Ziway-Shala presumably offered easier access to food, water and raw material [33,61,64]. The highly variable lithic technology along the lakes through time indicates a complex and changing settlement pattern coherent to changing lake levels [10,33,61]. Due to a gap in the archaeological record, rapidly changing lake levels seem to be unfavorable conditions for humans, however, it is also possible that signs of former human activities were simply washed away [33]. The shores of lakes Abaya and Chamo became densely vegetated and forested during the AHP, according to the model scenario 2b (CAB caused change in seasonality). If humans preferred more open landscapes in southern Ethiopia at this time, the closed landscape at the shores of lakes Abaya and Chamo would explain the absence of archaeological records. There are also no published archaeological records from the shores of paleo-lake Chew-Bahir, except for a brief report on rock engravings (5.66 ka ± 0.11; 4781-4274 cal BP) from an island in the southern part of the lake after the termination of the AHP [121]. The persistence of an open landscape, however, indicates a potential for archaeological sites during the AHP, similar to Lake Turkana [33]. After the AHP, human activity is recorded again in the archaeological record of Mochena Borago and Sodicho Caves [64,118,122]. The modelling results show that Mochena Borago and Sodicho Cave may have been surrounded by grassland once again.
Within the last 25 ka, hunter-gatherers used Mochena Borago and Sodicho Caves when the vicinity of the sites were characterized by grassland and solitary trees. With caution, we may propose that during the LGM and Middle Holocene, humans in southern Ethiopia occupied or even may have preferred open landscapes.

Conclusions
We developed a new Predictive Vegetation Model (PVM) based on open-source methods and multi-source data, which we linked to a Lake Balance Model (LBM) of the southern Main Ethiopian Rift [10] to independently reconstruct and disentangle the environmental processes, changes and amplitudes for the Last Glacial Maximum (LGM), the African Humid Period (AHP) and the pre-industrial (PI) time. The model output was compared to a new phytolith proxy record from Chew Bahir basin and pollen records from southern Ethiopia. The model shows a 15-20% decrease of annual precipitation during the LGM, which was dominated by open landscapes in the low and high altitudes with only a few forest refuges remaining, leading to the desiccation of paleo-lake Chew Bahir. During the AHP, a 25-40% increase in the annual precipitation amount resulted in a doubling of forest-covered areas and would explain the maximum possible lake level of paleo-lake Chew Bahir. Additional rainfall during northern hemisphere summer due to an eastward shift of the CAB would lead to a dense forest coverage in the mid-altitudes and open landscape in the highlands, which is supported by pollen records. The phytolith record indicates a rapid onset of humid conditions at~12 ka and a slow transition throughout the AHP to mesic conditions. Our comparison of model results and archeological records in that region suggests that humans may have largely occupied open landscapes. For the modern time, we conclude that the agriculturally altered landscape may endanger the water supply of the lakes Abaya and Chamo and may have contributed to the aridification of the Chew Bahir basin during the 20th century. This is of interest and concern for land and water management in Ethiopia in the near future, due to the increasing agricultural use of the Lake Abaya basin.