Inundation–Desiccation State Prediction for Salt Pans in the Western Pannonian Basin Using Remote Sensing, Groundwater, and Meteorological Data

: Salt pans are unique wetland ecosystems. In the Austrian Seewinkel region, salt pans are in an increasingly vulnerable state due to groundwater drainage and heightened climatic pressures. It is crucial to model how seasonal and long-term hydrological and climatological variations affect the salt pan dynamics in Seewinkel, yet a comprehensive understanding of the driving processes is lacking. The goal of this study is to develop random forest machine learning models driven by hydrological and meteorological data that allow us to predict in early spring (March) of each year the inundation state in the subsequent summer and fall. We utilize Earth observation data from Landsat 5 (L5), 8 (L8), and 9 (L9) to derive the time series of the inundation state for 34 salt pans for the period 1984–2022. Furthermore, we demonstrate that the groundwater level observed in March is the strongest predictor of the salt pan inundation state in summer and fall. Utilizing local groundwater data yields a Matthews correlation coefﬁcient of 0.59. Models using globally available meteorological data, either instead of or in addition to groundwater data, provide comparable results. This allows the global transfer of the approach to comparable ecosystems where no in situ data are available.


Introduction
Salt pans are a special type of terrestrial wetlands, which are formed in relation to arid climates, topographic depressions, and salt-rich groundwater [1,2].They can be defined as "(. . . ) arid zone basins (. . .), subject to ephemeral surface water inundation of variable periodicity and extent" [2].Saline lakes, such as salt pans, are of vital importance for biodiversity and water management [2,3]; however, at the global scale, their number is declining mainly due to direct human intervention in their hydrology and climate change [4,5].Although global data on salt pans are missing [6], many case studies suggest a global trend toward salt pan degradation and decline [2,5,[7][8][9][10].These trends also apply to the salt pans in Seewinkel in eastern Austria [11,12], where key regional ecosystem functions are under threat.The lives of, among others, halophytes [13], amphibians, reptiles [14,15], and birds [16,17] depend on these wetlands.Halophytes, such as communities of Puccinellio-Salicornietea, require a high groundwater level facilitating capillary rise to ensure their water supply [13].Birds, such as the kentish plover (Charadrius alexandrinus), use high water levels in spring (for hatching [18]) and summer [17], as do amphibians and reptiles [14,15].In Central Europe, such ecosystems can only be found in the Pannonian Basin [19] due to the unique tectonic conditions in the region [20].In recent years, processes such as eutrophication, paludification, siltation, overgrowth with vegetation, fragmentation, long-term drying, and in consequence, habitat loss, have accelerated [21,22].These are largely connected to excessive groundwater drainage for land use change [21,23].The potential impact of climate change on the salt pans in Seewinkel is not yet fully understood [23], although small, geographically isolated wetlands reportedly react rather quickly to meteorological forcing [24].
The salt pans in the Seewinkel region follow the salt pan cycle [1], in which the dry basin is the default, central, and recurrent moment, which is alternated by its opposite state: the varying presence of water [1,25].In summer, high evaporation rates in combination with an interruption of groundwater supply tend to outweigh precipitation [14] leading to salt pan desiccation.Especially during late winter and early spring [21], low evaporation rates allow precipitation combined with an increased contribution of groundwater to fill the basins.Wind contributes to important ecosystem processes as it influences evaporation rates and drives the mixing of water when the salt pan is inundated [21].It also strengthens capillary rise and blows out inorganic sediments from the salt pan basins during periods of desiccation [20].Salt pans in poor hydrological conditions lose additional water by surface water infiltrating into deeper soil layers [14].Thus, monitoring and predicting both the long-term and short-term variability of surface water occurrence in the Seewinkel salt pans is needed to assess ecosystem change and their resilience.
Wetland hydroperiod [26], a key characteristic and ecological indicator of intermittent wetlands, such as salt pans [19,27], can be characterized by means of water height (WH), water extent (WE), or water volume (WV) [2,28,29].WH derived from in situ water level gauges offers the most reliable and temporally frequent source of information.However, water gauges provide merely vertical, locally tied measurements and are costly to install and maintain.Especially for salt pans, the water level gauge must be positioned at the deepest point due to increased drying towards the edges.In many regions of the world, long-term, automatic in situ measurements are not widely available, as is also the case for the Seewinkel region (web address: https://wasser.bgld.gv.at/hydrographie/die-seenandhttps://ehyd.gv.at/ (accessed on 14 August 2023)).
WE is especially suited for studying the inundation state of the salt pans due to their shallow topography so that small changes in water volume cause substantial changes in water surface area.WE can be reliably retrieved from Earth observation (EO) satellite data that provide global, freely available information of high spatial and sufficient temporal resolution [30,31].Multispectral imagery has proven to be suitable for studying salt pans because of the high reflectivity of exposed salt surfaces and the absorption of infrared radiation by water surfaces [6].Although commonly suffering from cloud cover, multispectral observations are less affected by wind than radar systems [32][33][34], which have been widely used to monitor wetlands [30,31,35,36].Most studies use data from the moderate resolution imaging spectroradiometer (MODIS) [37] or a series of the Landsat missions, which together cover an observation period of nearly 50 years [38].Examples of global satellitederived WE products are the global surface water (GSW) product [39] and the dynamic surface water extent product [40].Additionally, continental-scale products exist [41].These large-scale products include data on salt pans (Figure 1); however, they are often inaccurate for small-size ecosystems, such as those encountered in Seewinkel [39].Local case studies using remote sensing to derive WE and inundation states are numerous [42][43][44][45][46][47][48][49][50].[51].Additionally, the groundwater stations used in the study are marked.The salt pans are colored based on the water occurrence product of the Global Surface Water (GSW) data set [39].The basemap stems from basemap.at (web address: https://basemap.at/(accessed on 14 August 2023)).The coordinate reference system is the MGI/Austria GK M34 (EPSG:31259; web address: https://epsg.io/31259(accessed on 14 August 2023)).The map inset shows the location of the study area (marked with an X) within Austria.
Advances in ML [70,71] have boosted the relevance of stochastic modeling for predicting lake WH [81][82][83][84][85]. Past research in modeling wetland inundation dynamics using ML methods is often restricted to using in situ measurements for identifying the presence of water [86,87].Greater data availability provided by EO [39,88,89] has contributed to studies utilizing WE for modeling wetlands, although, to our knowledge, not for salt pans and in different temporal resolutions.The monthly inundation state of freshwater playas in the Great Plains of North America has been modeled on a large spatial scale using a monthly global water extent product based on Landsat [39] and climate and land cover data [90].Inundation patterns in the Darling River Floodplain, Australia, were modeled using Landsat data and topography, meteorological, and hydrological data [91].Satellite-derived WE (lake surface area) of Lake Gregory, Australia's salt lake, has been modeled using ML with precipitation and temperature as predictors [92].Quantification of wetland permanence of four water body permanence classes in the Prairie Pothole Region, although not carried out for salt pans, was executed by ref. [93], who, in addition to climate and land cover, introduced features based on topography to ML modeling.Various ML models were used for the mentioned studies.Ref. [90] used a long short-term memory neural network, ref. [91] used RF, ref. [92] applied a generalized group method of data handling, and ref. [93] used extreme gradient boosting techniques.
The goal of this study is to combine long-term EO and ML to build seasonal prediction models of the inundation state of salt pans in the Seewinkel region of Austria.Modeling of the salt pan inundation state contributes to a better understanding of the effect of climate variability and groundwater exploitation on salt pan health.First, we derive and evaluate a reference data set of the yearly inundation states from 1984 to 2022 of 34 salt pans based on the Landsat satellite archive [38,94].Second, we use this new long-term inundation state data set as a target variable in combination with meteorological data from global reanalysis and in situ groundwater data to develop ML models to predict in early spring (end of March) if the salt pans fall dry during July, August, September, or October (JASO) of the same year.We develop models with and without the use of local in situ groundwater gauge data, which are precise but sparse at the global level.Therefore, we tested whether meteorological predictors in combination with EO-based estimates of WE are fit for use in regions with limited availability of in situ data.In total, we create three core models: a meteorology-based model, a groundwater-based model, and one combining meteorological and groundwater data.To identify the most important drivers, we apply concepts from explainable artificial intelligence [95].

Study Area
The study area is located in the Lake Neusiedl-Seewinkel National Park (Figure 1).Salt pans in Seewinkel are steppe wetlands [14] and mostly relicts of the Würm glaciation [96,97].Their size ranges between 0.03 km 2 and 1.5 km 2 (based on the data collected in the scope of this study).The salt pans are highly heterogeneous in their hydrological and ecological condition [14,98].Due to various causes, the number of salt pans in Seewinkel has decreased from 139 in the year 1855 to about 59 ecologically intact specimens in 2012 [14].Historically, the landscape has been subject to strong economic utilization, most notably through the intensification and expansion of agricultural practices since the 1960s [96] and because of tourism [14].A process referred to as "drying from beneath" [20] has been identified as the main driver behind "dying salt pans" [14], although diverging theories exist [97,99].Due to human-induced groundwater drainage, the capillary rise is disrupted starting at a depth to groundwater of approximately 70 cm [21].This finally results in the failure of the water retention capacity of the salt pans and desalinization [21].Some salt pans tend to be naturally filled with water all year (type: 'naturally perennial'), and some artificially hold water over the whole year (type: 'artificially perennial').Others tend to desiccate over continuous periods of time (type: 'periodically filled') [14].
The climate in Seewinkel can be classified as Dfb-climate (warm-summer humid continental; based on monthly ERA5-Land data from 1984 to 2022; Köppen-Geiger climate classification [100]).The total annual potential evaporation (E pot ) is 602 mm (1978-2010; [101]), the total annual precipitation (P) is 556 mm (1971-2020; [102]), and the annual mean 2 m temperature (T) is 11.1 °C (1971-2020; [102]).P is highest from May to September and lowest from October to April (based on monthly ERA5-Land from 1984 to 2022; verified via ref.[103]).While there is research on the impact of climate change on Lake Neusiedl [101,102,104,105], it remains unclear whether climate change currently affects the salt pans in Seewinkel [23].Most scenarios (Representative Concentration Pathway (RCP) 4.5 and RCP 8.5) for the Austrian state of Burgenland predict a significant increase in temperature for all seasons, whereas a slight increase in mean annual precipitation is expected together with a significant increase (+33%) in winter precipitation in the distant future (RCP 8.5) [106].Based on climate scenarios, it is estimated that the return period of moderate and extreme droughts will decrease in lowland Austria over the course of the 21st century [107], suggesting greater pressure on vulnerable wetland ecosystems in the future.

Multispectral Satellite Imagery
The salt pan inundation state was mapped using multispectral data from the Landsat 5, 8, and 9 satellites acquired during the time period from 1984 to 2022 [94,108].
Landsat 5 operated between 1984 and 2013.Its thematic mapper (TM) provided data in six bands in the visible, near-infrared (NIR), and shortwave infrared (SWIR) portions of the electromagnetic spectrum.Landsat 8 was launched in 2013, and Landsat 9 was launched in 2021.Their operational land imager (OLI) acquires data in nine bands, of which six bands are used here to be consistent with the information from the TM.Landsat data have a 30 m spatial resolution with a revisit time of approximately 16 days [38].We accessed the Level 2, Collection 2, surface reflectance data for Landsat 5 TM (DOI: 10.5066/P9IAXOVV) and Landsat 8-9 OLI (DOI: 10.5066/P9OGBGM6) using Google Earth Engine (GEE) [109].Level 2 data also include masks for clouds and cloud shadows, which were applied to the data.The WE retrieval was limited to scenes acquired between April and October of each year, as drying commonly occurs during these months and to limit the potential influence of ice cover on classification accuracy.Furthermore, scenes that contain more than 30% cloud cover according to the metadata were excluded from the classification.This resulted in a total of 286 images acquired between May 1984 and October 2022 that were used as the basis for the wetland extent classification.In 2012, there is a gap in the Landsat time series, as the Landsat 5 data for the region were only available until the end of 2011, and Landsat 8 was launched in 2013 [108].
In addition to the Landsat data, the Sentinel-2 data with similar spectral characteristics and a slightly higher spatial resolution from 10 m to 20 m was used for validation of the retrieved WEs.Seven Sentinel-2 scenes acquired within one day before or after the respective Landsat scene were used for this purpose.In addition, a cloud-free scene acquired on 31 May 2018 was used for manually delineating salt pan polygons.

Predictor Data 2.3.1. Predictor Selection
The selected features relate to the main drivers of salt pan variability, i.e., groundwater, precipitation, temperature, and evaporation (Table 1).They were narrowed down from a larger set of potential drivers based on the literature on wetlands and salt pan modeling in general, as well as the salt pans in Seewinkel, in particular [1].For example, topography, a feature proposed by ref. [93], was excluded as a predictor since the variation between salt pan basins is likely to be minimal (based on a local digital elevation model (DEM) (web address: https://geodaten.bgld.gv.at/de/downloads/hoehenmodelle-orthofotos.html (accessed on 14 August 2023))).Information on anthropogenic drivers in sufficient resolution, e.g., well extraction amounts or channel discharge, was not available [110].
WE in spring is expected to have a major impact on inundation state estimates in summer [21].However, we did not include WE as a predictor for the following reasons: First, its inclusion potentially results in overshadowing other predictors.Although possibly improving model performance, this would limit model interpretability and ecosystem understanding.Second, natural spring WE is the result of the underlying processes steered by groundwater, precipitation, temperature, and evaporation.Hence, WE information is implicitly included in the nine predictors used in this study.
To account for interannual variability in hydrological and meteorological conditions, we applied various integration periods to the selected predictors.However, the overall focus is on integration over 12 months as this period covers the entire time since the last prediction was made.Other integration periods span 6 months, whereas the SGI is a continuous variable, as indicated in Table 1.To detect trends in the features, we applied the Mann-Kendall test [111] to the nine predictors.We abstained from feature selection methods [112] as we wanted to obtain information on feature importance and partial dependencies on all introduced predictors.Importantly, our sample size is expected to be large enough with respect to the maximum number of features (nine features for the combined model), hence decreasing the likelihood of overfitting [113].Groundwater is of key importance for salt pan water abundance in Seewinkel [21] Short-term and especially long-term groundwater depletion leads to salt pan degradation [114] Austrian eHyd portal SGI [unitless] (SGI) Cont.
The Standardized Groundwater Index can serve as a robust estimation of groundwater drought [115,116] Groundwater drought in March influences the salt pan water extent in spring and therefore the inundation state in summer Level ratio [unitless] (GW level ratio) Oct./March (6 m.) Fall-winter groundwater level ratio is closely connected to regional precipitation during that time [117] The level ratio stands in relation to salt pan water extent in spring [14] Meteorology Anomalies [unitless] (T Anom.) 12 m.

Groundwater Level
Seewinkel is equipped with 84 groundwater gauges located in relative proximity to the salt pans.Of these, only six provide continuous observations throughout the time span covered by the Landsat observations and could thus be used for our study (i.e., stations 306043, 319418, 316174, 305755, 305813, and 319426).The point-based data are provided natively as monthly means through eHyd (web address: https://ehyd.gv.at/ (accessed on 14 August 2023)).First, we derived the mean monthly groundwater level of the six stations.We then calculated the groundwater anomalies to exclude long-term climatology.This was completed by subtracting the mean seasonal component from the original time series [128,129]: where A t is the anomaly at time t, D t is the monthly averaged values at time t, and C t is the long-term seasonal climatology.C t was calculated by averaging groundwater levels per month for the entire reference period from 1984 to 2022.Finally, we calculated the 12-month average of the time series between April (previous year) and March (current year) to derive presummer season groundwater anomalies for each year.All of these steps were executed with the Python packages numpy [130] and pandas [131].We decided not to apply further detrending to the anomalies to inform the model about long-term environmental changes (e.g., introduced by human management).Other predictors that were derived are the Standardized Groundwater Index (SGI) in March of each year and the October/March groundwater level ratio [115].The SGI is the only predictor with a continuous accumulation period and is based on a non-parametric normal scores transform of the groundwater level data for each calendar month [115].It represents information on the groundwater level, not as an average over 12 months, but as derived in March.The SGI was also calculated based on the monthly groundwater values averaged across all six stations.The calculation was executed through the Python package pastas [132].The groundwater level ratio was calculated as the ratio of the groundwater level in March (the time of prediction and typically the time of year of the groundwater level maximum [21]) divided by the level in October of the previous year (approx.the lowest level [21]).It serves as a proxy for groundwater recharge during winter.

ERA5-Land Meteorology
The ERA5-Land reanalysis [121] is aimed at land applications.It has the following main characteristics: the data from ERA5-Land offers a high spatial (ca. 9 km × 9 km) and temporal resolution, global availability, and temporal coverage dating back to the 1950s, thus, overlapping with Landsat retrieval periods [133][134][135][136]. Three variables were used: the total precipitation P, the potential evaporation E pot , and the 2 m temperature T. The E pot in ERA5-Land is higher than the Pannonian average of 600-800 mm per year [137] as the E pot is often overestimated because of representing open water evaporation [121].We used the ERA5-Land monthly averaged data from 1950 to the present (DOI: 10.24381/cds.68d2bb30) for the calculation of nearly all meteorological predictors.Only the number of days in a year with a maximum temperature above 25 °C was derived from hourly 2 m temperature data based on the ERA5-Land hourly data (DOI: 10.24381/cds.e2161bac).
First, the seven pixels covering the study area were combined by spatial averaging for each variable.Preprocessing of the ERA5-Land variables was performed in accordance with Equation (1) and the Python packages numpy [130] and pandas [131].Subsequently, a 12-month average was performed in the case of T and a 12-month summation in the cases of P and E pot .For the same reason mentioned in Section 2.3.2,detrending was not performed for these three features either.In addition, we used the ERA5-Land monthly P to compute the drought indicator standardized precipitation index (SPI) over 6 months and 24 months (SPI; [125,138]) using the R package SPEI [124].The SPI represents the precipitation conditions of a predefined time period in relation to the respective normal values.
It builds on the calculation of a normal distribution as in reference [138].Furthermore, the number of days in a year with a maximum temperature above 25 °C was derived.

Derivation of Water Extent Time Series
The WE mapping was based on the unique spectral properties of water, vegetation, and salt pans.While water absorbs most of the incoming radiation in the near-infrared spectrum [139], dry salt pans reflect more radiation than inundated areas throughout the visible and near-infrared wavelengths [6].Visible, NIR, and SWIR bands were used as input data for the classification.Additionally, the normalized difference vegetation index (NDVI), the normalized difference water index (NDWI) [140], and the modified normalized difference water index (MNDWI) [141] were computed as complementing the spectral bands with vegetation indices has been shown to have a positive impact on water classification [39].To classify inundated and non-inundated pixels, RF models [72] were trained using data sampled from the permanently inundated nearby Lake Neusiedl and permanently non-inundated areas from various land cover classes that are common in the region, such as agriculture and grassland.For each month that was selected for monitoring due to the reasons detailed in Section 2.2 (April to October), a separate model was trained to take the seasonality in reflectance values into account.
As the locations of the salt pans are known a priori, we used them as the baseline for the extent monitoring.The baseline extent of the salt pans was defined by manual delineation of 34 inundated and dry salt pans and water bodies visible in a cloud-free image acquired by Sentinel-2 on 31 May 2018.The rationale behind choosing a year late in the study period for the baseline inventory of wetlands was the focus of this study to model the inundation/drying behavior of currently existing water bodies.A year early in the study period likely would have resulted in a higher number of wetlands for monitoring; however, many of them would not have shown water during a large part of the study period.The salt pans were either covered by water or by salt crusts, which made them easy to distinguish visually from their vegetated surroundings.The remaining salt pans could not be clearly distinguished visually, so it was assumed that reliable monitoring would not be feasible based on the Landsat data.The baseline salt pan data set was then used to derive a time series of WE for each salt pan from the classified Landsat data.For each time step, connected water pixels were first clustered, and, for each salt pan contained in the baseline data set, the clusters intersecting its baseline extent were used for determining its water area.In the case that the baseline extent of a salt pan was covered by the cloud/cloudshadow mask ≥ 10%, the WE of the respective salt pan was set to a missing value.The classification process resulted in 34 WE time series, one for each salt pan, thus covering more than half of all functioning salt pans in the region [14].Each time series comprises between 199 (Oberer Stinkersee) and 245 (Albersee) valid data points derived from the 286 Landsat images.The number of valid data points per salt pan varies due to the influence of the cloud/cloud-shadow masking.
The retrieved WEs were validated for seven dates using reference sample points that were manually labeled as 'water' and 'non-water' samples using the Sentinel-2 data acquired within one day before or after the respective Landsat scene.As wetlands make up a relatively small portion of the study area, a stratified approach for the sampling was selected, i.e., the same number of water and non-water samples was drawn.For this purpose, each of the Sentinel-2 scenes was preliminarily classified using a threshold of NDWI > 0, and then a morphological opening (a kernel size of 3 × 3 pixels) was applied to the resulting binary raster to remove isolated pixels.This preliminary classification was then used to sample 50 reference points for each class, i.e., a number of 100 reference pixels was available for each of the seven validation dates.These samples were then assigned a preliminary label according to the NDWI-derived class, which was then manually corrected if a visual inspection of the original Sentinel-2 imagery showed discrepancies.For each validation date, a confusion matrix was computed between the reference points and our Landsat-based WE classification.The overall classification accuracy (OA), true positive rate (TPR), and kappa coefficient are reported as performance metrics for the WE classification.Accuracies were computed for each validation date because the conditions, particularly in cloud cover, can change significantly between different image acquisitions, which is reflected in the obtained accuracy measures.

Derivation of JASO Inundation State
To derive the model target variable, an inundation state, that is 'desiccated' (0) or 'inundated' (1), was assigned to each year.This was completed based on the WE time series in JASO described above for nearly every year (1984-2022) and for each of the 34 salt pans individually.Due to the WE data gaps caused by cloud cover during the year 2002 (only acquisitions on 13 June 2002 and on 20 June 2002), this year could not be used for inundation state modeling.The only acquisition for the year 1999 in JASO, on 25 September 1999, did not result in WE data for Kiesgrube and St. Martins Therme 2 due to cloud masking.We decided to manually insert two states 'inundated' into the time series for the two salt pans after a visual inspection of the image, as otherwise the entire year would have been discarded.The year 2012 is missing in the inundation state time series due to the reasons described in Section 3.1.1.
In the case of a desiccation event, that is, a WE of zero, in any of the four JASO months for each year, the year was tagged as 'desiccated'.Correspondingly, in the event that no desiccation occurred, meaning a non-zero WE was present during the entire JASO period, the year was tagged as 'inundated'.Hence, a yearly and binary target space was formed.This resulted in a total of 1258 data points (37 years times 34 salt pans).
We decided to display the inundation state for each salt pan and year to increase the understanding of the model target.This was completed with the Python package matplotlib [142], as well as with all other data visualizations.Additionally, the month of the first desiccation event per year between April and October was visualized based on the original WE time series for spotting inundation events outside the JASO period.The first desiccation event per year is, furthermore, of key ecological importance [14,21].

Exploratory Data Analysis: Separability and Correlation Analysis
As a first step, we analyzed the feature space to unveil underlying distributions, feature relations, possible non-linearities in modeling, data complexity, and class separability [143].This was completed by visual inspection of histograms per predictor and scatter plots between all predictor pairs in combination with class-based coloring using a seaborn pair plot [144].The complexity analysis was performed for two exemplary salt pans: Lange Lacke, which is one of the largest salt pans, and Unterer Stinkersee, which is known for its close connection to groundwater [14].We abstained from a complete analysis for all salt pans via the maximum Fisher's discriminant ratio or other complexity metrics [145] due to compactness and the ability of the RF to detect multi-dimensional patterns.
Furthermore, we performed a correlation analysis to gain an understanding of the temporal agreement between the predictive features.For this purpose, both Pearson's correlation coefficient r and Spearman's rank correlation coefficient ρ were calculated between the nine features with the scipy stats Python package [146].

Random Forests
RF models [72] are commonly used in classification and regression problems in remote sensing [147,148] and hydrology [69,73].Because of the assumed strong relationship between the targets (i.e., the inundation state of different salt pans), we used the scikitlearn multi-output option (web address: https://scikit-learn.org/stable/modules/tree.html#tree-multioutput (accessed on 14 August 2023)), which is expected to improve generalization accuracy by estimating the inundation state of multiple salt pans in a single model [149,150].This assumption is based on detailed information on the hydrology of the individual salt pans [14,20].We applied hyperparameter optimization (also referred to as hyperparameter tuning) to reduce overfitting [151].All concepts applied here are based on algorithms of the respective scikit-learn packages [152].

Model Setup
The inundation state in summer/fall serves as the target variable in our models.Using data until the end of March of each year, we predict whether a salt pan dries out ('desiccated') or remains 'inundated' during JASO of the same year.This binary classification scheme has already been used as the basis for modeling WE in a number of studies [87,90,91].The simplicity of the inundation state in summer/fall, meaning its low temporal resolution, its low number of classes, and its low degree of mathematical abstraction (Section 3.1.2),in combination with our predictor setup and the RF algorithm, leads to relatively good model performance and model interpretability.The inundation state in summer/fall comprises a number of advantages: it is relatively robust considering the inhomogeneous number of acquisitions per year; it closes the lack of preceding research by introducing a classification task with a low number of classes; it addresses important hydrological and ecological issues, as many plants and animals in Seewinkel rely on water abundance within the salt pans during summer/fall [14,17,18]; it is of interest to decisionmakers that can use our models to enable efficient water management by, e.g., steering artificial inundation; and it further helps us to identify the most useful predictors quantifying refilling during winter/spring months, as it is aimed at the core characteristic of salt pans [1], namely the yearly inundation state in JASO.The main limitations of the target chosen in this study are its coarse temporal resolution and the omittance of spring inundation/desiccation events that are of high ecological importance [14,17].
Although the prediction is only made once per year, lead times vary between three and six months as the drying events accounted for by our model can happen in any month from July to October.Despite desiccation sometimes also occurring before July, we aimed for a setup that enables forecasting.This is, on the one hand, more challenging because we include longer lead times, but, on the other hand, more valuable for policymakers and stakeholders.
To gain a thorough understanding of the predictors of salt pan desiccation while ensuring global model transferability, we developed four RF models.The model GROUND-WATER only uses in situ groundwater information, the model METEOROLOGY uses only meteorological data, and the COMBINED model uses both groundwater and meteorological predictors (Table 1; sometimes referred to as main models).In addition, we developed the RANDOM model, based on a single predictor randomly sampled from a uniform distribution to create a baseline for testing model performance by involving chance.
For each of the four models, we followed the same training, validation, and test splitting [153,154].We use the definition of ref. [155], who defines the validation set via the use of hyperparameter tuning and the test set via the use of a final and independent evaluation of the model.The model splits are presented in Figure 2. We followed an approximate overall 70% (training), 10% (validation), and 20% (testing) split.For hyperparameter tuning (described in Section 3.2.5),we applied a six-fold stratified cross-validation (CV) within the 80% model validation set that recurrently applied the approximate 70%/10% split.This roughly translates into an 85%/15% split relative to the entire model validation set (Figure 2).

Model Testing
Inside model testing (∼20% of the entire data set), we made use of a leave-one-out cross-validation (LOOCV) [156,157].This is meant to improve the prediction skill as always all other folds, except for the current single test fold, were used for model training as part of a recurrent 97%/3% split with respect to the entire data set (Figure 2).Hence, seven LOOCV runs were executed to test, each time, one year (at this point the independent year) of the overall seven independent test years.This independent test set (Test 7 in Figure 2) comprises the years 1985, 1991, 1997, 2004, 2010, 2017, and 2022.The years were chosen at roughly similar intervals across the entire temporal domain to ensure a balanced distribution of folds over time while maintaining the class balance of the entire data set.
While independent model testing uses ∼20% of the data (seven test years), we aimed to understand the year-wise model performance from 1984 to 2022.Hence, we reintroduced the folds used for model validation to model testing as part of a dependent test set.Here, the training data (∼80%, 30 additional years) were reused from model validation in the scope of the overall LOOCV (Test 30 in Figure 2).The year-wise information on metrics additionally benefits model understanding, as the target data exhibits skewed fold-wise class distributions that affect fold-wise model performance.Although the reintroduction enables the calculation of feature importance and partial dependencies for all folds, we abstained from such analysis due to small year-wise sample sizes.The reintroduction of the years used for model validation inherits the predisposition that these folds have been part of the hyperparameter tuning.Therefore, the metrics for the folds of the dependent test set might be inflated as the hyperparameters were adjusted to exactly this set (model validation set).
As we did not want to exclude fold estimation for the initial training set, we used LOOCV instead of nested CV [158], which is more commonly used for time series.The metrics for some salt pans may also be heightened due to a large target autocorrelation as we apply an LOOCV scheme [158], although the effect is expected to be minimal due to the random characteristics (sample bagging) of the RF algorithm [72].
The estimations within the LOOCV scheme were carried out in the scope of 30 runs per main model.These repetitions aim to address the stochastic variability (randomness within the sampling and selection of features) of the RF approach [72].The results of the LOOCV were divided into training and test scores for the four model setups and two test sets (independent test set and dependent test set) and averaged over all runs.We calculated the standard deviation (SD) for model variability quantification but did not include it in the results, as it was generally low (max.SD of the Matthews correlation coefficient (MCC; Section 3.2.6) of 0.02 for the dependent test set (model GROUNDWATER); max.SD of the MCC of 0.03 for the independent test set (model RANDOM)).
For more insight, we calculated the average fold-wise LOOCV performance over all 30 model runs for the three main models (not the RANDOM model; Section 4.2.2).This provides information on the development of the test metrics over time.Additionally, random model realizations for the three main models were chosen to gain an in-depth understanding of model test set behavior (Section 4.2.3).This allows for studying the salt pan-wise inundation prediction, meaning the outcome as true positive (TP), false positive (FP), true negative (TN), and false negative (FN), for every year.The results for the RANDOM model were based on 200 repetitions to account for the random nature.

Model Validation
For model validation, hyperparameter tuning was applied using GridSearchCV [159].The final hyperparameters used for the three models can be seen in Table 2.We used the hyperparameters from the COMBINED model for the calculation of the RANDOM model, as they offer the most robust solution against overfitting due to the larger number of features compared to the GROUNDWATER model and METEOROLOGY model.Throughout the parameter-tuning process, GridSearchCV was performed with a sixfold stratified CV that uses an approx.85%/15% split within the model validation set, as described in Section 3.2.4.Hence, for each validation run, five years were introduced as the actual validation set.The hyperparameter tuning was based on the definition of a range of values deemed sensible by the literature [151].Individual hyperparameters were varied within the predefined range and compared with performance differences of the training and validation sets.This gave a rough indication of and lever against overfitting.Subsequently, a further, more restricted range of hyperparameters was declared that allowed for trainingtest score differences of a maximum of 25%.Once more applying GridSearchCV, this range of values (Table 2) was tested using the MCC as the indicator.This resulted in the final set of hyperparameters also displayed in Table 2.
The hyperparameters were chosen using a different CV scheme and, hence, different model splits compared to the LOOCV scheme of the main models.The results obtained in the scope of the hyperparameter tuning were slightly worse compared to the test scores.In combination with the removal of the seven test folds, independent model testing is ensured, as hyperparameters adjusted to a certain split are prevented.The hyperparameters for all models (Table 2) are less complex compared to the default scikit-learn ones that are designed to fit a maximum of use cases [160].Generally, the decision trees have similar complexity between the models.The exact number of trees (40, 100, or 300) is not essential for model performance, as was verified both by varying the number of trees and by the CV results provided by GridSearchCV.

Evaluation Metrics
Global confusion matrices with an MCC, F1-Score, and OA were calculated for the training sets, test sets (Test 7 and Test 30), and validation sets as the main indicators of the classification performance.Since we have a slight class imbalance, but both classes are equally important, we performed macro-averaging to compute the F1-score to put equal weight on both classes [161].As the F1-Score does not consider TNs, and the overall accuracy is vulnerable to a skewed class distribution [162], the main metric considered in this study is the MCC.It is a robust metric regardless of class imbalances [163].In the case of the main models, the metrics, in addition to being averaged over the 30 model runs (200 for the RANDOM model), were either averaged over all folds and salt pans based on the LOOCV scheme [156,157] or averaged over all salt pans for a single fold.To gain a salt pan-wise understanding of the model performance, the metrics were averaged across all years for the random model realizations described in Section 3.2.4 for the eight salt pans that exhibit a balanced class distribution.It was not possible to include the results for many of the salt pans due to the model's tendency to predict a single class in the case of a highly skewed class distribution per salt pan.We applied the formulation of the MCC used in scikit-learn [152]:

Feature Importance
For model interpretability, we calculated the feature importance by the mean decrease in impurity (MDI, also Gini index) [164] and partial dependencies [165,166].The results for the MDI were averaged separately for each of the three main models over all CV folds and model runs.Since the predictors used in this study are all time series that exhibit high cardinality, the use of feature importance based on the MDI is justified [72,167,168].Partial dependency plots (PDP) and individual conditional expectation (ICE) plots [165,166] were calculated for each salt pan individually using the training set for the test fold 1984, as this includes the most recent information.We interpret the PDP or ICE curve as the probability of predicting 'inundated' or 'desiccated' given different predictor values, as can be performed for a binary classification [169].The PDP and ICE plots were based on the COMBINED model's run, which is also displayed in Section 4.2.3.Salt pans with an especially large partial dependency, spread with regard for the respective predictor, were manually chosen and visualized, in addition to the PDP, by the ICE plots [166].The spread was calculated by subtracting the minimum partial dependence value from the maximum partial dependence value for each salt pan and predictor combination.For Heidlacke, Hottergrube, and Gsigsee, no partial dependencies could be calculated as only the state 'desiccated' is present.Both algorithms are based on the respective scikit-learn packages [152].

Salt Pan Mapping
For most validation dates, moderate to high accuracy and kappa values could be achieved (Table 3).The WE tends to be underestimated, and reference samples along the edges of the WE are often not contained in the Landsat-derived WE.
A total of 60% of the complete inundation state data set (754 combinations of years and salt pans) are classified as 'desiccated', and 40% (504 events) are classified as 'inundated' (Figure 3a).For individual salt pans, the class distribution is highly heterogeneous, with a class imbalance of up to 100% in the case of Hottergrube, Heidlacke, and Gsigsee.For some salt pans, e.g., Lange Lacke, Unterer Stinkersee, or Herrnsee, extensive years of drought are needed to force drying.Unterer Stinkersee (type: 'naturally perennial') falls dry less frequently compared to Lange Lacke (type: 'periodically filled').Salt pans that regularly fall dry are, e.g., Hochstätten, Fuchslochlacke 1 and 2, Huldenlacke, or Oberer Stinkersee.For Darscholacke, Zicksee, Kiesgrube, and Badesee Apetlon (type: 'artificially perennial'), an inundated state can be observed for nearly all years.Numerous salt pans fell dry six years in a row from 2016 to 2022, the longest (nearly common) desiccation period since 1984.In many cases, the year-wise distribution is similarly skewed as in recent years.The 'inundated' state commonly occurs clustered in time and over multiple pans.For example, the three periods in   As Figure 3b shows, in some years, (early) desiccation is prevalent; for others, inundation is present throughout the year (assuming the presence of water in winter).Since 2016, desiccation occurs earlier in the year compared to earlier periods.A total of 41 desiccation events outside the JASO period can be found when the 'inundated' state is prevalent.

Exploratory Data Analysis
We detected no trend over the study period for five features (GW Anom., GW lvl.ratio, SGI, P Anom., and SPI 6) and an increasing trend for the other four features (T Anom., Number of days above 25 °C, E Anom., and SPI 24).As expected, the features that are connected to an increase in WE, e.g., groundwater and precipitation anomalies, have positive correlations with each other (Figure 4).The same applies to the features that are assumed to be connected to a decrease in WE, i.e., temperature, evaporation, and number of days above 25 °C.Consequently, correlations between predictors associated with water gain and water loss, respectively, are negative.The relationships between features that build on integration periods other than 12 months (e.g., SPI 6) are less clear.
The histograms in Figure 4 suggest a better separability between the desiccation and inundation states for Lange Lacke (a) compared to Unterer Stinkersee (b).This is indeed confirmed by the inundation state classes in the scatter plots, which show more distinct clusters in the case of Lange Lacke.The plots suggest that groundwater-based features, especially SGI, are the most promising predictors for desiccation forecasting.They also reveal that no two-dimensional predictor combination can lead to perfect class separability on its own and that multiple features should be included in the models.

Average Prediction Skill
On average over all 37 folds provided by the LOOCV and 30 model runs, the three models have moderate skill with an MCC of approximately 0.6 (GROUNDWATER: 0.6, METEOROLOGY: 0.59, COMBINED: 0.6).For the independent test sets (seven test folds), a 0.24 performance increase with respect to the RANDOM model is obtained on average for the three models.Generally, the test metrics as averaged over all folds (described above) and the dependent folds (Table 4) confirm the abilities of GROUNDWATER in modeling the inundation state.The independent test set performance metrics of METEOROLOGY exceed those of GROUNDWATER.The COMBINED model does not show any increase in performance with respect to METEOROLOGY or GROUNDWATER.Averaged over all independent test folds and 30 model runs together, the MCC of METEOROLOGY is 0.09 (0.07) higher than that of COMBINED (GROUNDWATER).Differences between the confusion matrices, as averaged over 30 model runs, are minimal between the three models (below 1% in regard to the entire sample; Figure 5).
All three models struggle more with the correct estimation of state 'inundated' compared to 'desiccated' (relation of FPs and FNs to the total number of state 'inundated' and 'desiccated', respectively).GROUNDWATER exhibits more skill in the classification of TNs (a surplus of nine TNs) compared to the METEOROLOGY model.COMBINED manages to achieve a surplus of one TP and nine TNs compared to METEOROLOGY.As discussed in Section 4.2.1, large salt pan inundation state variability results in extensive year-wise heterogeneity in the class distribution and model performance.As indicated by Figure 5, the performance between the folds is heterogeneous.The models perform better, and with less variability in terms of the MCC, for the independent test folds (GW 7, METEO 7, and COM 7) compared to the 30 dependent test folds (GW 30, METEO 30,and COM 30).Outliers excluded, the tested folds exhibit a pronounced dynamic over the years.Beginning in 2004, the estimates tend to improve.Here, the MCC does not fall below 0.5, neither for the seven test folds nor for the additional 30 test folds.In the years 2006 and 2007, the GROUNDWATER model performed much better for the 30 dependent test folds compared to the other two models.A very different picture is observed pre-2004: all three models struggle to achieve scores from above 0.5 to 0.6.Especially 1992 presents itself as challenging for the RF models.
All GW 7, METEO 7, and COM 7 (Figure 5) perform better compared to GW 30, METEO 30,and COM 30.With this in mind, the difference in metrics between the two sets, as displayed in Table 4 can be integrated more clearly.Table 4. Average performance of different model setups inside LOOCV scheme separated for testing the seven independent test folds (1985, 1991, 1997, 2004, 2010, 2017, and 2022), and the thirty dependent test folds that have already been part of the validation set.Results are averaged over 30 model runs.As the SD was, in all cases, below 0.03, we disregarded this information for each metric.

Detailed Analysis of Single LOOCV Model Runs
All three models successfully predict the diverse interannual dynamics of the inundation state per salt pan (Figure 6).TPs and TNs are numerous, especially in the years with average conditions.Overall, an accurate estimation in times of pronounced dry and especially wet periods appears more challenging.For instance, the models lack adequate prediction for the dry periods in 1986,1992,2003,2007,2011, and 2016, as well as for the wet periods in 1987 and around 1996 (1997), 2010, and 2015.For 18 salt pans with a dominant state, the events are correctly predicted in favor of the majority class (a co-occurrence of light blue and dark red and of light red and dark blue, respectively, in Figure 6).In these cases, the estimates between the three main models (i.e., GROUNDWATER, METEOROLOGY, and COMBINED) do not differ, e.g., Badesee Apetlon.Differences between the models exist for 16 salt pans but are only marginal for Standlacke, Ochsenbrunnlacke, Wörtenlacken 1, Sechsmahdlacke, Fuchslochlacke 1, Herrnsee, Birnbaumlacke, and Albersee.The eight cases with varying results between the models are Zicklacke, Katschitzlacke, Fuchslochlacke 3, Oberer Stinkersee, Mittlerer Stinkersee, Wörtenlacken 2, Neubruchlacke, and Lange Lacke.These salt pans feature a more balanced underlying class distribution with varying class succession.Here, the models demonstrate their flexibility, i.e., the ability to estimate different states for subsequent years.When there is a shift from periods of 'inundated' to 'desiccated', or vice-versa, the models do not correctly predict wet and dry states for many salt pans.For instance, correct estimates for wet conditions in 1994 and for dry conditions in 2016 emerge one year later after a dry period (around 1993) or wet period (around 2015) (Figure 6).The strength of model GROUNDWATER lies in its ability to more precisely identify wet, and, in particular, dry episodes compared to model METEOROLOGY (the identification can be salt pan-specific).Examples here include Lange Lacke and Wörtenlacken 2 around 1991 and Neubruchlacke, Katschitzlacke, and Zicklacke after 2004.The METEOROLOGY model struggles to achieve this, often failing to correctly estimate for a number of years in a row (e.g., Neubruchlacke around 2006 and Zicklacke around 2014).However, for many salt pans, the model performed better in 1986 and 1988.In a number of additional instances, the METEOROLOGY model manages to outperform the other two.This translates into the correct estimation of single anomalous events before and after extremely wet or dry conditions.Examples include Ochsenbrunnlacke, Mittlerer Stinkersee, Fuchslochlacke 1, Fuchslochlacke 3, and Katschitzlacke in 2016 (as outcome TP) or Lange Lacke and Wörtenlacken 2 in 1994 (as outcome TN).The COMBINED model performs similarly to the GROUNDWATER model, though it surpasses its performance in a few cases (e.g., Mittlerer Stinkersee in 1999 and 2000).In these cases, the model is able to integrate meteorological and groundwater-based information in a productive manner.
The MCCs for the eight salt pans that exhibit a more balanced class distribution differ between the three models and between the salt pans (Table 5).On average, the GROUND-WATER model performs best with a moderate MCC of 0.44 and the METEOROLOGY model performs the worst (0.29).The COMBINED model attains an MCC of 0.37.For Fuchslochlacke 3 and Mittlerer Stinkersee, the METEOROLOGY model performs better than the GROUNDWATER and COMBINED models, whereas the opposite is true for the other six salt pans.Compared to the overall model performance of 0.6, the models perform worse for these eight salt pans with an average MCC of 0.37.
Table 5. Salt pan-wise MCC (as averaged over all folds) for the three single model realizations (GW-GROUNDWATER, METEO-METEOROLOGY, COM-COMBINED) displayed in Figure 6 for the eight salt pans that exhibit a balanced class distribution.In the GROUNDWATER model, GW anomalies and the SGI have comparable importance (around 0.35, respectively) while this is lower for the GW level ratio (Figure 7).In the METEOROLOGY model, SPI 6 has little importance while the predictors derived from temperature have highest importance.SPI 24, although substantially correlated with GW anomalies and the SGI (r = 0.64, r = 0.53, respectively; Section 4.2.1), which are important in the GROUNDWATER model, has a lower importance than the temperature predictors.These observations are in contrast with COMBINED, where the SGI has the highest importance by far.With a mean importance of 0.47, it is much larger than the other predictors, most of which lie at a maximum of ∼0.05.Only groundwater and temperature anomalies have an importance >∼0.05.

Partial Dependency
The potential significance of groundwater-based predictors to estimate the inundation state of Lange Lacke (Section 4.2.1)translates into a pronounced evolution of partial dependency against the SGI (Figure 8, steel-gray dashed) in the COMBINED model (also for Wörtenlacken 2).SPI 24 has a comparable impact on Unterer Stinkersee (light blue dash-dotted), although to a lesser extent.It is striking that the other predictors do not, or only slightly, affect the prediction skills in both cases.Statistically, when involving the entire population of 34 salt pans, the variability in the inundation states of nine salt pans can be explained mainly by meteorological predictors, whereas, in 22 cases, groundwaterbased features perform best (Table A1, Appendix A).In some cases, there is only little variability of the partial dependency when plotted against any of the predictors.This is also indicated by Table A1, though it is not further regarded here due to compactness.We find that, except for the E pot and SPI 6, every predictor exhibits a strong interaction with the dynamics of at least one salt pan (Table A1; Figure 8 in dotted red).Groundwater-based features exhibit more pronounced curves that span a wider partial dependency range compared to the other predictors.
In general, the partial dependencies agree with the underlying physical process.For instance, higher temperature anomalies (Kirchsee) or the number of days above 25 °C (Katschitzlacke) contribute to a prediction probability in favor of the 'desiccated' inundation state.This rule is not true for all salt pans, although this dependence commonly applies.The PDP reveals all important probability thresholds for inundation state prediction.For example, at SGI = 0, the class attribution probability switches from the 'desiccated' state to the 'inundated' state for Lange Lacke.At around 60 days above 25 °C in the previous 12 months, the estimates change from 'inundated' to 'desiccated' for Katschitzlacke.Similar inferences for nearly all salt pans can be made.These are summarized in Appendix A.

Assumptions
The modeling framework has been built on several assumptions.First, the targets (i.e., inundation dynamics for different salt pans) are correlated.Second, the accumulation periods (particularly, the dominant 12-month period) hold explanatory power in regard to the inundation state in summer.Third, the salt pans are in an environmental condition that is good enough to allow them to react to the natural drivers applied in this study.In other words, the salt pans need to be at least sufficiently well connected to the salt pan cycle to respond to the groundwater-based and meteorological predictors.Fourth, we assume that the climatology always leads to the prediction of a drying from spring toward summer.Hence, the models cannot predict the 'inundated' state based on dry conditions in spring.This is unless the salt pan-wise class distribution is skewed towards the 'inundated' state, causing the models to always predict the 'inundated' state (outcome TN).In other words, if the situation during the lead time deviates much from the climatology, the models will not capture many of the effects on the salt pan inundation state.This is a disadvantage in years when drivers strongly change and may lead to misclassifications.
In total, desiccation in spring (in April to June; here used as a proxy for dry conditions) in combination with the 'inundated' state in JASO occurred for 41 events (Figure 3a) and resulted in 20 TNs and 21 FPs for the GROUNDWATER model.The year 2008 accumulated a notable number of fourteen FP outcomes.This circumstance also reveals that not all annual desiccation events were captured.Taking into account the desiccation events from the beginning of April to the end of October would result in a class distribution of 63%/ 37%.

Predictors
The results of the EDA (only for Lange Lacke; Figure 4a), feature importance (Figure 7), and calculation of partial dependencies (Figure 8) support the assumption of a close connection between salt pans and groundwater [14,21].We suspect that the rather long time steps of the model and the respective long-term predictor setup support the forcing of the slow-reacting features evolving around groundwater as a key predictor.It is to be determined whether the high importance of groundwater is actually due to the contribution of groundwater to salt pan water status directly or more generally to water abundance, i.e., drought conditions, in the region.The outstandingly high feature importance of SGI is presumptively connected to its continuous nature, rather than relying on artificially thresholded integration periods [115].
Still, the METEOROLOGY model achieved similar scores compared to the GROUND-WATER model.Both were able to capture many of the interannual differences in the inundation state.We managed to find meteorological predictors that are of importance for spring salt pan water abundance, which is essential for the salt pan inundation state in JASO.The importance of meteorological predictors could stem from their temporal autocorrelation from one year to the next [170,171].For the meteorological predictors, a single (or more) not included month(s) from the previous year could make a change in spring water abundance.
We find that, other than the continuous SGI, time periods of 12 months or more work best for predicting salt pan inundation state.Such predictors exhibited large feature importance within their model setups.It is up to further research to determine whether the 12-monthly anomaly mean is the most appropriate integration period.This argument is particularly relevant as shifting climate patterns influence groundwater recharge.The SPI 6 and the GW level ratio relate to a similar time period (6 months).This period does not seem to be particularly relevant, as both predictors were comparatively insignificant in all three models (also when disregarding the SGI).Temperature-based predictors were most important in the METEOROLOGY model despite exhibiting low correlations with the SGI.
For some combinations of salt pans and predictors, the PDP (Figure 8) exhibited sigmoid curves with a wide spread.The PDPs for Lange Lacke and Unterer Stinkersee showed a clear progression against the SGI and SPI 24, respectively.The SPI 24 is closely related to groundwater drought as suggested in the literature [125] and by the correlation analysis (r SPI24,SGI = 0.53 and ρ SPI24,SGI = 0.55; Section 4.2.1).Therefore, our results can confirm the observation made by ref. [14] that both salt pans are closely connected to groundwater.This is even more true for Wörtenlacken 2, which is reported to have an atypically strong connection to groundwater, even greater than that of Lange Lacke [14].Similar inferences can be made for all other salt pans (Appendix A).Additionally, the probability thresholds for the SGI were similar in the case of Lange Lacke and Wörtenlacken 2 (Figure 8).However, such an analysis is prone to misinterpretations as partial dependency behavior can vary depending on the model setup and the underlying training data.For example, the hydrology of Katschitzlacke is reportedly similar to Lange Lacke [14], whereas our results indicate a closer connection to the predictor number of days above 25 °C.
A drawback associated with the input data is their low spatial resolution.The argument is particularly valid for P anomalies, since groundwater level, E pot anomalies, and T anomalies vary less in space and time [172,173].Here, future models could improve the (spatial) representation of precipitation.An understanding of the inundation state in JASO would require seasonal forecasts of hydrological and meteorological variables.Meteorological predictors that focus on depicting changing precipitation, evaporation, and temperature patterns in the region due to climate change should additionally prove beneficial [24].
Features evolving due to the human impact on the ecosystem, such as the (e.g., monthly) amount of groundwater extraction from wells and discharge into drainage canals, were not used, as, to our knowledge, no such information is available in the region.However, the use of this information could potentially enhance the knowledge to be gained from the models, especially if such information was available at the subregional scale or for each salt pan.

Target
Our results confirm that the EO-based inundation state is a useful target variable for ML-based modeling.Data from the Landsat mission has been shown to form a useful basis for quantifying interannual dynamics in surface water dynamics [39,54,174].Although in some years, the impact of cloud cover was high, the summer/fall inundation status could be retrieved for all salt pans over the entire study period except for the years 2002 and 2012.The variation in this target variable roughly showed similar dynamics to some of the variables considered in other studies on a larger area, e.g., SPEI3 [103] or long-term precipitation [102].The year 2015 represents an exception, as it is referred to as drought year in ref. [103] but appears rather wet in our analysis.This might be because of the rather wet conditions in fall-winter 2014, which is also visible in ref. [103].
Uncertainties in the salt pan time series are expected to be larger for smaller salt pans, which have a larger relative proportion of mixed pixels with bordering land (Section 4.1).However, this argument turns out to be secondary since the accuracy of the model target is dependent on the exact recognition of desiccation and not on the precise sensing of the true WE.Higher resolution remote sensing products, such as Sentinel-2 imagery, could reduce the error connected to spotting desiccation inside the 'last' pixels.Such data would need to be used in combination with, e.g., the Landsat archive, to build the models on extensive time series.In addition to using satellite data with higher resolutions, we propose the use of alternative target variables to avoid the skewed salt pan-wise (and year-wise) class imbalances.The time of the first desiccation (Figure 3) would constitute an interesting target variable [14,21].

Model Error
Although in this study we were able to predict the salt pan inundation state in Seewinkel with only moderate accuracy, the average performance of the three independent test sets indicates a gain of 0.24 compared to the RANDOM model.We regard the average score between the models of 0.6 as acceptable only insofar as the assumed reasons for the observed model error are numerous and, depending on the salt pan and year, heavyweighing.Therefore, the model error can be, approximately, explained.Increasing the model performance based on the issues described, in detail, below is largely limited by data uncertainty and data availability.The failure of the model to make correct predictions if the meteorological conditions deviate much from the climatology, the artificial inundation, and the uncertain hydrological condition, meaning surface water possibly infiltrating into deeper layers, explain the results and provide starting points for future improvements to the model.In general, we consider the model setup performant and stable.
Many years exhibit highly skewed class distributions, especially since 2016.This influences the metrics since different years are connected to varying degrees of difficulty in correct estimation.The total skill of the three models is very similar.The indirect setup of the model, which means the prediction of the inundation state in summer via the water balance at the end of March, can be considered a major contributor to this outcome.Salt pans with a more balanced class distribution are more challenging to correctly estimate for the three models (Table 5).On average, the GROUNDWATER model performed best in predicting these eight salt pans (Zicklacke, Katschitzlacke, Fuchslochlacke 3, Oberer Stinkersee, Mittlerer Stinkersee, Wörtenlacken 2, Neubruchlacke, and Lange Lacke), al-though interpreting these results proved difficult due to the widely varying hydrological conditions of the salt pans [14].Section 4.2.2 stresses the importance of the underlying physical conditions on the fold-wise performance.As already discussed in Section 5.1, moderate success mainly lies in the struggle to estimate extreme dry, and, especially, wet conditions in summer (e.g., drought around 1992, 2003, and 2016, and wet periods around especially 1996 (1997) and 2010).
As stated in Section 4.2.3, estimates were worse for years in which the inundation state shifted to the alternative state.The misclassifications are probably due to some salt pans reacting faster to hydrometeorological changes than others.Hence, for some salt pans, the environmental conditions of the previous months and year(s) have a stronger influence on the prediction of the current year than for others.Furthermore, the misclassifications may be partly due to the fact that the input features are coarse resolution (i.e., do not differ between Lacken) and partly to the model trying to get a best fit over all the years.
Although we did not apply feature selection [112] to reduce the number of features used in this study, we were able to inhibit overfitting in model testing.This was completed by trimming the decision trees used in the four RF models in the scope of the hyperparameter optimization.This built on our model design, which enables independent model testing and, practically, on closely monitoring training-test differences throughout this study.
In addition to changing climate patterns, a process referred to as "drying from beneath" [20] challenges the water-holding capacity of the salt pans.Depending on the ecological state of the salt pans, this mechanism can directly influence WE and, therefore, the inundation state.We suppose that the worse the ecological health of the salt pan, the higher the negative impact on model performance.However, it is not possible to characterize this ecological state based on our models and using the available input data.Due to the skewed class distribution, the assumption that our predictor selection works better for more natural/ecologically healthy salt pans could not be answered inside this model setup.The disregarded large human influence on the water cycle [21] constitutes an additional source of error.
All models are subject to a division between the periods before and after 2004 (Figure 5).This pattern cannot be found in the target variable (Figure 3).Additional research is needed to clearly connect climate change and the phenomenon of "dying salt pans" to these observations.As artificially inundated salt pans were introduced into the modeling, yearwise estimates could additionally have been affected due to misguided thresholding.

Model Transferability
The model based on meteorological predictors can be transferred to any other salt pan ecosystem worldwide in combination with the use of high-resolution remote sensing imagery, such as that provided by Landsat.In general, globally available predictor data in sufficient temporal and spatial resolution with respect to the studied ecosystem are needed, at best in combination with uncertainty quantification.This can be ensured by choosing an adequate spatial resolution of the predictors with regard to the catchment size.Here, ERA5-Land offers a good starting point with its 9 km × 9 km spatial resolution.The EO data should have a suitable temporal and spatial resolution to capture the dynamics of the studied ecosystem.For example, it is not possible to retrieve the water extent information of ecosystems of a smaller size than the Landsat resolution of 30 m × 30 m. Another important constraint is that this approach will likely not be suitable in the case of water bodies whose water extent shows a low sensitivity with respect to water volume, i.e., with steep bathymetry in which a drop in the water level will not lead to a proportional decrease in the water area.

Conclusions
As salt pans in Seewinkel are increasingly vulnerable ecosystems in often poor hydrological conditions, we aimed at improving ecosystem understanding and, finally, decisionmaking by predicting the salt pan inundation state in summer and fall with ML models.
Our models stress the importance of groundwater for the estimation of the inundation state in summer/fall.This solidifies the general notion represented in the literature [14,21] and calls for sustainable groundwater management in the region to ensure the conservation of this ecosystem.We stress that the use of the SGI [115] as a predictor is promising.The model based on meteorological predictors can be transferred to any other salt pan ecosystem worldwide in combination with the use of high-resolution remote sensing imagery, such as the Landsat archive.METEOROLOGY achieved an MCC of 0.66 compared to GROUNDWATER with 0.59 and COMBINED with 0.57, with respect to the independent test set.We identified the most likely sources of error, namely the struggle to estimate the inundation state correctly in the case of extreme environmental conditions developing after March, human intervention into the water cycle by artificially inundating the salt pans, and surface water loss due to the possible infiltration into deeper layers due to a failure of the water retention capacity [20].Furthermore, we highlight the potential of the concept of partial dependency [166] to understand threshold-dependent ecosystems, such as salt pans in the Seewinkel region.
To our knowledge, the results represent the first data-driven prediction and understanding of salt pan dynamics in the Seewinkel region.We identified the main drivers and potential improvements for future model development.In this context, the use of more advanced ML algorithms could prove beneficial.
Furthermore, the possibility of transferring the METEOROLOGY model to other salt pan ecosystems in combination with EO data makes this study particularly valuable.We propose the application of our models to salt pans of larger sizes and ones that are less influenced by humans and in a better ecological condition.This could improve both performance and interpretability.
The possibility of predicting the salt pan inundation state in summer/fall is of potential importance to decision-makers in conservation and tourism [14,17].A better understanding of salt pans can contribute to preserving this unique geographic space in the Pannonian Basin.Data Availability Statement: Data and code will be made available on request.
Appendix A Table A1.Most important predictors for each salt pan according to PDP displayed together with the largest spread (in parentheses).Furthermore, the threshold for predicting a certain class for the most important predictor is indicated.

Figure 1 .
Figure 1.Location of Seewinkel in Eastern Austria.Outlines of salt pan basins were provided by Lake Neusiedl-Seewinkel National Park administration [51].Additionally, the groundwater stations used in the study are marked.The salt pans are colored based on the water occurrence product of the Global Surface Water (GSW) data set [39].The basemap stems from basemap.at (web address: https://basemap.at/(accessed on 14 August 2023)).The coordinate reference system is the MGI/Austria GK M34 (EPSG:31259; web address: https://epsg.io/31259(accessed on 14 August 2023)).The map inset shows the location of the study area (marked with an X) within Austria.

Figure 2 .
Figure 2. Model splits separated into training sets (blue), validation sets (green), and test sets (pink).The years that correspond to each fold are indicated.Additionally, the cross-validation (CV) schemes are marked on the left side, and the arrows represent the introduction of the seven independent test folds to the leave-one-out cross-validation (LOOCV) scheme.

Figure 3 .
Figure 3. (a) Binary classification into 'desiccated' state (light grey) and 'inundated' state (dark blue) for each salt pan and for each year.The years 2002 and 2012 are missing due to data gaps.(b) First month of each year in which the salt pans desiccate.Only months between April and October are shown.

Figure 4 .
Figure 4. Histograms and scatterplots for Lange Lacke (a) and Unterer Stinkersee (b) for all nine predictors with coloring in the respective classes.Correlation coefficients (Pearson's r and Spearman's ρ) are additionally displayed on blue background.

Figure 5 .
Figure 5. Fold-wise average Matthews correlation coefficient (MCC) over 30 model run for every model (GW-GROUNDWATER, METEO-METEOROLOGY, COM-COMBINED) and split.The averaged confusion matrices over 30 models run for all 37 folds are additionally displayed for all three models.The folds from independent test set are marked with (T).

Figure 6 .
Figure 6.Confusion matrix outcomes for all years (part of this study) from 1984 to 2022 for the three main models based on the results for the dependent and independent test sets.For each salt pan (3 rows), the top-most row represents the GROUNDWATER model, the middle row represents the METEOROLOGY model, and the last row represents the COMBINED model.The underlying confusion matrix is rolled out for all 34 salt pans as a function of time.Again, the folds from the independent test set are marked with (T).

Figure 7 .
Figure 7. Feature importance calculated as average across all folds and 30 model runs for the three main models.

Figure 8 .
Figure 8.The Partial Dependency Plots (PDP) are displayed for each predictor inside the COMBINED model for three selected salt pans: Lange Lacke, Unterer Stinkersee, and, additionally, a salt pan with a pronounced partial dependency dynamic against the respective predictor.For these salt pans, the individual conditional expectation (ICE) plots are also included in red with reduced line width.

Table 1 .
The nine predictors used for modeling inundation state are divided into meteorology and hydrology.Furthermore, the relation to the salt pan cycle is explained, and additional information is provided.

Table 2 .
[152]parameters chosen for the three model setups: GROUNDWATER, METEOROLOGY, and COMBINED.The numbers in parentheses relate to the average test set and training set performance given together with the standard deviation (SD; again, in parentheses).Furthermore, the entire selection of tested parameters, as well as the default parameters as proposed by scikit-learn[152], are indicated.

Table 3 .
True positive ratio (TPR), overall accuracy (OA), and kappa values derived from confusion matrices between Landsat-based water maps and reference data based on Sentinel-2.