Default Versus Configured-Geostatistical Modeling of Suspended Particulate Matter in Potter Cove, West Antarctic Peninsula

The glacier retreat observed during the last decades at Potter Cove (PC) causes an increasing amount of suspended particulate matter (SPM) in the water column, which has a high impact on sessile filter feeder’ species at PC located at the West Antarctic Peninsula. SPM presents a highly-fluctuating dynamic pattern on a daily, monthly, seasonal, and interannual basis. Geostatistical interpolation techniques are widely used by default to generate reliable spatial information and thereby to improve the ecological understanding of environmental variables, which is often fundamental for guiding decision-makers and scientists. In this study, we compared the results of default and configured settings of three geostatistical algorithms (Simple Kriging, Ordinary Kriging, and Empirical Bayesian) and developed a performance index. In order to interpolate SPM data from the summer season 2010/2011 at PC, the best performance was obtained with Empirical Bayesian Kriging (standard mean = −0.001 and root mean square standardized = 0.995). It showed an excellent performance (performance index = 0.004), improving both evaluation parameters when radio and neighborhood were configured. About 69% of the models showed improved standard means when configured compared to the default settings following a here proposed guideline.


Introduction
Mean surface air temperature records show rapid warming, although regionally variable, by approximately +0.5 • C per decade at the Western Antarctic Peninsula (WAP) between the 1950s and the early 2000s [1,2]. The temperature increase has contributed to the shortening of the winter sea ice season [3], the disintegration of major ice shelves [4], and accelerated the retreat of 90% of tidewater glaciers on the Peninsula and the adjacent archipelagos [5,6]. Some of the strongest effects of this warming are manifested in Western Antarctic coastal systems, including bays and fjords with melting glaciers. The regional and episodic glacier melt events, among others, driven by advection of lower latitude warm air masses, amplify the environmental and ecological variability of coastal systems through discharge waves of meltwater and calving ice growlers [6][7][8]. A widely observed consequence of glacial melt and erosion of sub-and proglacial regions are the surface plumes of suspended particulate matter (SPM) dispersing in the coastal and shelf systems [9][10][11][12]. All of these coastal processes shape fjordic ecosystems at the level of community composition [13][14][15][16][17][18][19][20][21] and species phenology down to the level of gene transcription [22][23][24].
A quantitative understanding of climate change effects in WAP nearshore and coastal marine communities must be gained based on glaciological processes knowledge, on the one hand, and spatial knowledge of melt effects in well-studied showcase regions, on the other hand. If both sources of knowledge are available, geostatistical modeling can be applied to quantify and spatially predict future changes of the current SPM-related processes of coastal climate change at the WAP.
Geostatistical interpolation techniques are often applied in environmental analyses on, e.g., atmospheric [32][33][34], terrestrial [35][36][37], freshwater [38], and marine ecosystems [39][40][41], to generate a spatial surface prediction of a target variable from point measurements [42]. In this context, geostatistical interpolation algorithms, such as Kriging, are probabilistic methods relying on variogram models that account for the spatial structure of measured values at given locations and their overall spatial arrangement, as well as the prediction location [43]. Semivariances for pairs of measurement stations for chosen distance intervals can be quantified through variogram analysis, allowing to model possible spatial autocorrelation patterns. Corresponding models then enable Kriging predictions at unmeasured locations in terms of moving weighted averages of surrounding sampled data points to assure a minimal interpolation error [44]. The application of geostatistical models requires a robust sampling design, which anticipates certain data coverage and distribution for a statistically valid assessment of the spatial process [38,45]. Consequently, geostatistical models are not only site and data-dependent but also rely on all configured parameters that define the model strategy, such as the distances between the measured points and the number of "neighbors". Nevertheless, besides a minimized interpolation error, a final model should also take into account expert knowledge about the natural dynamics of the target variable.
The provision of a full parameter list and configuration that defines the model strategy allows a reproducible interpolation of a target variable. Few studies related to precipitation and soil prediction inform about the model strategy, focusing on certain parameters configuration, such as algorithm used, amount of samples, and surrounding search with a detailed description on the data [34,[36][37][38]40]. Since software packages allow simple and straightforward analyses of randomly collected data sets without previous testing of quality standards and with default modeling configuration, more often, thematic maps get published without quality check and error estimation.
In Potter Cove (PC), a small fjord at the Antarctic Peninsula, glacial sediment discharge is a well-known and intensely investigated seasonal phenomenon. At three stations, SPM sampling in the water column has been conducted for 20 years, ranging from 1992-2012 as a part of the environmental long-term series for horizontal and vertical dynamics at Carlini research station [26]. More than 30 spatially distributed short cores and water column samples were taken during two summer seasons (2010/2012), yielding dense horizontal and latitudinal SPM distribution snapshots under different melt and runoff conditions at two vertical levels of the water column [46]. Besides, measurements of SPM in the water at four stations and depths have been sampled for ecological research during the summer season [27]. Just like the release of meteoric water studied by Meredith et al. [8], the concentration of SPM in any sampling point of the surface prediction depends on the meteorological conditions, mainly driven by air temperature and wind, which could also define the residual time of the particles in the column water or the resuspension of SPM from the sea-bed [46,47]. Therefore, SPM is highly variable in space and time, and its interpolation remains challenging. Even if a four-dimensional data set is available by combining different sources, the bias interpolation could increase if such variations in sources, space, and time are belittled. Further, the SPM variable with its morphology and structure characteristics in the water column is eligible as it is continuously distributed and dispersed depending on environmental conditions as long it is not captured in a water mass with a distinct density from another. In a shallow coastal system such as at PC with meltwater discharges to the upper layers, the stratification remains unstable up to depth close to the sea-bed with an occasional pycnocline in the near-surface layer at~5 m [8,26,48]. For these reasons, a partition of the input data is necessary to reduce the variation in space and time to increase the resolution for the interpolation of SPM data. This interpolation represents a snapshot of one day only under certain meteorological conditions with maximum sample sites spatially distributed within the study area.
Here, we aimed to evaluate systematically the performance of differently configured geostatistical models using the same SPM input data to generate the optimized picture of SPM distribution in Potter Cove as a case study and to assess the improvement through model configuration according to three indices developed.

Study Area
The 8.5 km 2 wide PC is a coastal fjord that disembogues into Maxwell Bay at the south-western 25 de Mayo/King George Island, the biggest island of the South Shetland archipelago (58 • 35.0 to 58 • 41.0 W and 62 • 13.9 to 62 • 15.7 S). The Fourcade Glacier surrounds the fjord in the North and the East. Like neighboring ice sheets off King George Island, a fast retreat of Fourcade Glacier has been observed by comparing satellite records going back into the 1950s [6,49,50]. Wölfl et al. [51] described the glacial retreats and advances through bedforms at the bottom of the fjord, identified as moraine formations. Two of them (M1 and M4) divide the fjord into three main sections: the inner (~1.07 km 2 ; <50 m depth), middle (~1.65 km 2 ; >50 m depth), outer cove (~2.5 km 2 ; <240 m depth). They influence the bottom water circulation, resulting in higher wave energy impacts in the inner than in the outer cove [52]. Today, the glacier is almost on land, and its retreat induces fine sediment discharge directly from beneath the glacier or through four main proglacial melt water streams (MWS) (MWS 1, 2, 4, 5; Figure 3) of between 23 × 10 3 t yr-1 and 39 × 10 3 t yr −1 into the fjord [46].

Workflow Geostatistical Modeling
Geostatistical modeling was applied using ArcGIS, Geostatistical Analyst extension (10.4.1, Geostatistical Wizard tool code, ESRI Inc., Redlands, CA, USA [53]), and RStudio (3.1.3) following a six steps workflow (Figures 1 and 2). The data compilation, georeference, and exploration (step 1) are most laborious and time-consuming and required for further data pre-processing (step 2). After these two steps, the model can be built by applying interpolation techniques with a low computational cost. The configuration of the parameters included in the model is the essential step since it defines the modeling strategy (step 3) and, consequently, the prediction of values at unsampled locations (step 4) and the outcome: a surface map based on statistically and spatially reasonable results for predictions and interpolation error (step 5) that displays the distribution pattern of the phenomenon (step 6). Expert knowledge on the processes involved for the target variable to model as well as statistically significant evaluation values, provide the continuity on the workflow to the use of the final output or a re-start of the workflow.  Detailed workflow for geostatistical studies and building a spatial model from step 1 to 6. The arrows in color show the workflow of the different processes (in boxes) to apply an accurate geostatistical model. In the circles are the outputs to be spatially georeferenced into a shapefile. The black arrows that come out of a process box indicate a result, while when they point to the process, it means input for the application.

Data Exploration and Pre-Processing
For this study, a PC-SPM georeferenced data set of 20 years (1992-2012), sampled at least once during the summer season, was compiled [26,[54][55][56][57], reinforced with daily meteorological data on wind, air temperature, precipitation, humidity (Servicio Meteorológico Nacional Argentino and Schloss et al., 2012) (n = 1352, Table 1), and published through the data archive PANGAEA [58]. Data from Carlini long-term ecological research since 1996/1997 were georeferenced by overlapping a Figure 1. Six steps workflow (modified from ESRI ArcGIS Pro) to build and to evaluate geostatistical models for spatial studies using ArcGIS 10.4.1 (ESRI, Inc.). The different steps involve the data examination, the parameter definition to structure the model strategy, the prediction, and the evaluation, resulting in an optimized model to explain the phenomena. If the resulted model is a non-precise, the process should be iterated at data examination and selection (step 1) or by restructuring the model (steps 2 and 3). Detailed configurations for each step are given in Figure 2.  The arrows in color show the workflow of the different processes (in boxes) to apply an accurate geostatistical model. In the circles are the outputs to be spatially georeferenced into a shapefile. The black arrows that come out of a process box indicate a result, while when they point to the process, it means input for the application.

Data Exploration and Pre-Processing
For this study, a PC-SPM georeferenced data set of 20 years (1992-2012), sampled at least once during the summer season, was compiled [26,[54][55][56][57], reinforced with daily meteorological data on wind, air temperature, precipitation, humidity (Servicio Meteorológico Nacional Argentino and Schloss et al., 2012) (n = 1352, Table 1), and published through the data archive PANGAEA [58]. Data from Carlini long-term ecological research since 1996/1997 were georeferenced by overlapping a Detailed workflow for geostatistical studies and building a spatial model from step 1 to 6. The arrows in color show the workflow of the different processes (in boxes) to apply an accurate geostatistical model. In the circles are the outputs to be spatially georeferenced into a shapefile. The black arrows that come out of a process box indicate a result, while when they point to the process, it means input for the application.

Data Exploration and Pre-Processing
For this study, a PC-SPM georeferenced data set of 20 years (1992-2012), sampled at least once during the summer season, was compiled [26,[54][55][56][57], reinforced with daily meteorological data on wind, air temperature, precipitation, humidity (Servicio Meteorológico Nacional Argentino and Schloss et al., 2012) (n = 1352, Table 1), and published through the data archive PANGAEA [58]. Data from Carlini long-term ecological research since 1996/1997 were georeferenced by overlapping a high-resolution satellite image and in situ spatial references using the 'Fit to display' ArcGIS tool and individually uploaded to the data archive [59]. Table 1. Suspended particulate matter (SPM) data compiled and published by several authors [26,27,[54][55][56][57] among 20 years of research in Potter Cove (PC), Antarctic (joint in Neder et al. [58]). Data increasing time resolution reduced by semivariogram cloud and normality analysis to apply the geostatistical model with two approaches (in grey): with meltwater streams discharges with SPM inputs -MWS-and without.  Figure 3 shows the spatial distribution of the compiled PC-SPM data set. The following two-decades PC-SPM statistical data exploration in space and time (step 1) leads to a reduced but homogeneous data set (Table 1, grey-shaded), which is required for an optimized geostatistical 2D-modeling of SPM in PC (step 6).  Normality analysis applied to the PC-SPM data set consisted of scatter plot, skewness as a measure of data set symmetry, kurtosis for tail-heaviness analysis, and Normal Q-Q plots comparing quartile per quartile the data distribution to a standard normal distribution. Spatial autocorrelation

Normality and Autocorrelation Analysis for a Valid Data Set to Model
Normality analysis applied to the PC-SPM data set consisted of scatter plot, skewness as a measure of data set symmetry, kurtosis for tail-heaviness analysis, and Normal Q-Q plots comparing quartile per quartile the data distribution to a standard normal distribution. Spatial autocorrelation was tested with semivariogram for data interpolation. Trend analysis was carried out by the Wizard-ArcGIS tool to check the possible spatial trends for varying expected values of the random variable.
Data distribution and autocorrelation were checked to define the data set to model. The four-dimensional PC-SPM data set spanning two decades (summer 1992-2012, n = 1352) showed the highest variation with an asymmetric distribution with outliers excluded by ± 3 interquartile and without ( Figure 3A, left and right respectively) and according to the semivariogram, with no autocorrelation (Figures S1 and S2). This reflects a strong heterogeneity of lithogenic particle concentration in water and meltwater distribution, which increases the bias and supports misclassification of a mean SPM surface distribution pattern. Hence, the data set compiled during two decades was subdivided according to the following criteria: (i) time window (sampling from one day only) and (ii) sufficient spatial resolution from three to two dimensions (longitudinal and latitudinal) in a consistent depth layer of 0-5 m ( Table 1). The statistical summary (Table 1) highlights that as skewness and kurtosis values improve among the data sets, the smaller the time window and the denser the spatial coverage of the data at the same time. The one-summer surficial SPM data set (December 2010-February 2011) showed the highest standard deviation of the SPM concentration mean (Table 1, 46.13 ± 188.27 mg/L) resulting from an extreme year of two maximum events of meltwater discharge and, therefore, could cause a high SPM variability [46]. These data display a non-normal and asymmetric distribution, without autocorrelation based on semivariogram analysis.
The data exploration of the highest temporally resolution, one day with mean summer meteorological conditions (n = 28, 09/02/2011, [46]), identified the most homogenous data set, spatially distributed across the area ( Figure S3), shallow depths of 0-5 m, normally distributed with a more reliable variance, based on Robinson and Metternicht [60], due to its low negative skewness and an improved autocorrelation compared to the other data sets explored. Consequently, this data set was determined as the most suitable, meeting the requirements for geostatistical modeling (data set A).
In a second modeling approach, SPM stations of four meltwater streams (MWS 1,2,4,5) sampled three days before (06/02/2011) were added to the one-day data set (data set B), which indeed reduces the time resolution and increases the asymmetry of data distribution but represents a more realistic situation of the phenomena expected in PC during a summer. With that, we tested the influence of local MWS lithological input on SPM distribution in surface waters. The sample points and its spatial distribution of the data used for modeling are shown in Figure 3, where stations in pink 'X' were excluded from geostatistical modeling. SPM concentration goes from 0-31.5 mg/L, on a colored scale per quartile from blue to orange, where the extreme values in red belong to the MWS stations.

Spatial Variability
Voronoi analysis interpolates SPM concentrations by Thiessen polygons corresponding to the Delaunay triangulation for the sampled points in the center and a geometrical distance to the neighbors [61]. The SPM value assigned to a polygon with simple Voronoi is the value recorded at the sample point within that polygon, whereas the one with mean Voronoi represents the mean value of samples within a given polygon and its neighbors [62]. Neighborhood analysis consists of the Euclidean distances' analysis between the samples closest and furthest away, and the spatial natural neighborhood based on the weighted overlap of Voronoi polygons and a new polygon around an interpolation point [53]. This defines the parameter neighborhood's radius by the number of neighbors included in the modeling step (step 3). We tested the spatial variability of the SPM concentrations with Voronoi and neighborhood analysis.

Modeling Strategy
Simple Kriging (SK), Ordinary Kriging (OK), and Empirical Bayesian Kriging (EBK) were the three different interpolation algorithms applied in the default version suggested by ESRI and configured to both one-day SPM data sets (data set A and B, Section 2.3, Table 1, grey-shaded). A total of 22 geostatistical models was calculated (step 4) and visualized as SPM concentration (mg/l) surface prediction with associated predicted standard error maps (PredSE). For the PredSE map, natural breaks classification was chosen. When configured, semivariogram was optimized for SK and OK with nugget and partition sill calculation and without anisotropy, and for EBK with 100 simulations, one overlap and power type except when transforming the data (Figures S4 and S5). The sampling neighborhood influences the accuracy of interpolations and is optimized for each case by configuring the following neighborhood parameters: neighborhood radius, the ratio of the maximum and the minimum number of neighbors, sector type based on trend analysis and spatial variability, smooth factor, and data transformation factor (see Supplementary Materials section II).

Model Evaluation and Performances Indices
The resulting changes in different modeling strategies were evaluated by comparing statistical errors. As an outcome of step 4, an uncertainty analysis (cross-validation) of the predicted values reveals the degree of precision by providing the misclassification difference between measured and predicted value, mean squared error (MSE), root mean square error (RMS), mean standardized (MS), root mean square standardized error (RMSS), and average standard error (ASE). The model evaluation (step 5) considered MS ≈ 0, RMSS ≈ 1, and the smallest ASE, as well as a reasonably interpolated surface that matches the general expert knowledge about the SPM distribution (Figure 5a from Monien et al. [46], Figure 2d -band 4 of satellite image-from Jerosch et al. [63]). How to assess the performance of a model and the improvement after tuning the model configuration (e.g., radius, number of neighbors, smoothing factor, etc.) is not defined in the bibliography nor internet forums. In this study, we developed three indices for model evaluation (step 5) to define the best algorithm and configuration to represent the SPM plume extension on 9 February 2011 on a solid and traceable basis (step 6).
Three indices were developed to evaluate performance differences regarding the modeling strategy applied. The MS (1) and RMSS (2) improvement indices were based, respectively, on the MS or RMSS change rate percentage between a configured geostatistical model by default (GM d ) and the tuned model (GM m ) and by the distance to the optimal value (zero for MS and one for RMSS). Each index presumes an improvement of the default model in the configured model. A weighted performance index was applied for each model as RMSS is assumed to be the most restrictive parameter (RMSS factor 0.7; MS, factor 0.3).
For Equations (1) and (2), ideal values are close to +100%, with a positive scale representing an improvement of the model, whereas negative values imply diminishment of the model confidentiality and, hence, its performance due to a distancing of the statistical parameter from the optimal value. For Equation (3), ideal values are close to zero.

Pre-Process: Voronoi and Neighborhood Analysis
One-day SPM data without (data set A) and with meltwater streams (MWS) input (data set B) differ spatially concerning the interpolated pattern of SPM concentration. In both cases, Voronoi ( Figure 4) and natural neighbor ( Figure 5) analyses indicate higher values in the inner cove and near the southern Potter Peninsula coast as well as low values in the outer cove close to Maxwell Bay. The simple Voronoi SPM distribution based on the measured values recorded at the sampled location reveals high spatial variability of SPM concentrations with three areas with high values or outliers ( Figure 4A1,B1). Remarkably, high values or outliers (5.2-6.6 mg/L SPM) identified by the red polygon class surrounded by low values (0-2.71 mg/L SPM) appear in data without meltwater discharge points included near MWS-5 and in the northern regions of the inner cove ( Figure 4A1). On the contrary, the simple Voronoi with local MWS discharge input identifies high values near the area of all MWS except the MWS-5 ( Figure 4B1). The mean Voronoi for data set B produces a smooth gradient of SPM concentrations across the entire cove from the outer western edge toward the inner south-eastern coastline ( Figure 4B2).  Depending on the data used with or without MWS input, the neighborhood analysis reveals different mathematical distances of closer and further away neighbors to additionally delimit the model neighborhood's radius through applying the sum or the mean of such distances (Table 2). Furthermore, the natural neighborhood analysis identifies spatial differences in the SPM concentration ( Figure 5), similar to mean Voronoi ( Figure 4A2,B2), with higher values at the eastern coastline of the inner cove close to the glacier front line.

Geostatistical Modeling Differences
Eleven geostatistical interpolation models are applied to each one-day SPM data set without (data set A) and with meltwater streams input (data set B) with default ( Figure 6) and configured settings (Figure 7). Figure 8 summarizes the parameter configuration for each of these models. The SPM concentrations result in similar visual patterns in all models, varying mainly for the type of input data set. High variability of the SPM concentration prevails in the area of the inner cove close to the two meltwater streams (MWS-1 and MWS-2), entering SPM directly from the melting glacier and carrying abundant eroded SPM. In general, Empirical Bayesian Kriging generates the lowest standard deviation errors compared to the other geostatistical methods. Default models ( Figure 6) produce distinct SPM plume extensions in comparison to the smooth gradients given in the configured models (Figure 7). Smoothing effects can be achieved by transforming the input data (compare Figure 6, EBK0 and Figure 7II, EBK5) and by the configuration of a smooth parameter ( Figure 7II, EBK6). On the contrary, when the neighborhood search is a 'full sector' and a minimal number of neighbors as the mean of the closest Euclidean distances among pairs, it results in a polygon output similar to Voronoi polygons ( Figure 7I, EBK4). Depending on the data used with or without MWS input, the neighborhood analysis reveals different mathematical distances of closer and further away neighbors to additionally delimit the model neighborhood's radius through applying the sum or the mean of such distances (Table 2). Furthermore, the natural neighborhood analysis identifies spatial differences in the SPM concentration ( Figure 5), similar to mean Voronoi ( Figure 4A2,B2), with higher values at the eastern coastline of the inner cove close to the glacier front line.

Geostatistical Modeling Differences
Eleven geostatistical interpolation models are applied to each one-day SPM data set without (data set A) and with meltwater streams input (data set B) with default ( Figure 6) and configured settings (Figure 7). Figure 8 summarizes the parameter configuration for each of these models. The SPM concentrations result in similar visual patterns in all models, varying mainly for the type of input data set. High variability of the SPM concentration prevails in the area of the inner cove close to the two meltwater streams (MWS-1 and MWS-2), entering SPM directly from the melting glacier and carrying abundant eroded SPM. In general, Empirical Bayesian Kriging generates the lowest standard deviation errors compared to the other geostatistical methods. Default models ( Figure 6) produce distinct SPM plume extensions in comparison to the smooth gradients given in the configured models (Figure 7). Smoothing effects can be achieved by transforming the input data (compare Figure 6, EBK0 and Figure 7II, EBK5) and by the configuration of a smooth parameter ( Figure 7II, EBK6). On the contrary, when the neighborhood search is a 'full sector' and a minimal number of neighbors as the mean of the closest Euclidean distances among pairs, it results in a polygon output similar to Voronoi polygons ( Figure 7I, EBK4).    Figure 7I, was not performed because the trend removal with optimization was not applicable due to a required estimation of the tendency at each location instead of a general one.  Figure 7I, was not performed because the trend removal with optimization was not applicable due to a required estimation of the tendency at each location instead of a general one.   Table S1.

Evaluation and Performance of the Model
For each of the geostatistical methods applied, the predicted standard error maps project higher errors for default models ( Figure 6) than for the configured models (Figure 7). Considering all models, the Empirical Bayesian Kriging algorithm shows a lower predicted standard error compared to Simple or Ordinary Kriging. For the configured models (Figure 7), the predicted standard errors are lower for one-day SPM data with MWS input than without MWS discharge included.
The deviation error between predicted and observed SPM concentrations at each station among the different geostatistical models increases from stations in the west toward the eastern stations ( Figure 9, Figure S6). The data set with MWS discharge input deviates stronger (up to +12 mg/L or −30 mg/L) at stations near coastal meltwater run-off ( Figure 9B) compared to ±4 mg/L for the data set without MWS ( Figure 9A). This results in a higher over-or underestimation of the predicted values with different deviations within the stations ( Figure S7). Models of data with MWS input, over-predicted SPM concentration in stations WC-16 and WC-02, proximal to the meltwater streams. Contrarily, MWS-5 and MWS-4 concentrations are rather under-predicted due to their spatial proximity to open water stations with low SPM concentrations.
A statistical error analysis is used for the assessment of improved/deteriorated model performance ( Figure 8). Ordinated by mean standardized (MS)~0 and root mean square standardized error (RMSS)~1, the default model OK0 and the configured model EBK8 perform best, while SK2 performs worst (Table S2). Regarding improvements indices Equations (1) and (2), about 70%, i.e., 11 out of 16 geostatistical models, have achieved an improved MS, whereas only~31%, i.e., 5 out of 16, are improved in RMSS. EBK8 is the only configured model with improvement in MS and RMSS for both SPM data sets with and without MWS input. A particularly high improvement in MS (97.11%) and RMSS (74.92%) is for the EBK8B model of one-day SPM data with MWS sediment discharges (Table S1).
According to the performance index, Simple Kriging achieves the weakest improvements. The performances between the two approaches for one-day SPM data with and without MWS are more similar among Empirical Bayesian Kriging than for Simple or Ordinary Kriging. Particularly, the combination of the configured neighborhood with a full sector radius as maximized distances between neighbors in the EBK8 model performs better in both SPM data sets with and without data transformation. The best geostatistical models are interpolated with one-day SPM data with MWS input (data B). Ordinary Kriging by default performs best (OK0, Figure 6), with a performance index of 0.001, and second, the Empirical Bayesian Kriging configured for a maximum and a minimum number of neighbors with the mean of the Euclidian distances of furthest away neighbor as maximum, and the half of it as the minimum, (EBK8, Figure 7II) with a performance index of 0.004.

Evaluation of Geostatistical Modeling Approaches of SPM Run-Off Plume in Potter Cove
We have tested and evaluated different strategies to identify the spatial extension of the SPM plume caused by glacial melting by applying geostatistical modeling in PC as a case study. With this geostatistical application to SPM concentration data in PC to achieve its spatial distribution of a normal summer day, we provide the baseline of the natural variability of this meteorological dependent variable by a homogeneous daily snapshot. The analysis of 20 years of PC-SPM concentration data has revealed a high data variability in space (latitudinal, longitudinal, and depth) and time. Despite a relatively high total number of data collection within two decades (n = 1352), this has led to a reduced data set with a temporal snapshot of one day for spatial modeling with a correct autocorrelation assumption. To test the improvement of model performances after tuning the model configuration, we have developed an evaluation strategy and applied three statistical performance indices. The estimation of the sediment plume extent represents a summer scenario of strong glacial melting. We present a detailed workflow to assess an improved performance when configuring geostatistical models.
We have shown a successful application of geostatistical modeling to the highly fluctuating SPM variable, measured and scattered inconsistently, albeit required an immense reduction of the input data to a minor time window to reduce the sampling bias to the minimum. Despite a comprehensive, spatially distributed data collection, covering three levels of the water column (0-5 m, 15 m, >15 m), the data analysis has revealed the difficulty to merge the data sets from different researches due to high variability and sampling heterogeneity caused by the different sampling years, stations, and the number of repetitions per stations. However, when the foci are on a surficial prediction, we can combine homogeneous temporal data from long-term studies and spatial snapshot data from different research activities, e.g., [26,46,55], to compensate for some sampling deficits and further identify spatial-temporal trends to make future projections.
The semivariogram analysis is crucial to presume the model performance to reduce systematic statistical errors. Comparing the spatial variability of the different time windows helps to understand the need for increasing the time resolution of SPM sampling to a snapshot of one-day. Consequently, the two-dimensional reduced but homogeneous data set is reliable for spatial interpolation, while the two decades PC-SPM data set of joint researches data set is not ( Table 2). A supplementary analysis of the weather conditions of up to five days before the measurements has shown intensified discharge of eroded sediments and can explain delayed sediment plume after days of high temperature and intensive melting. A theoretical model considering the dynamics and variation of SPM in time is the following: where each term means: the time variation (1 • ), the vertical advection (2 • ), the horizontal advection (3 • ), the vertical diffusion (4 • ) equal to the difference of SPM regarding a variety of wind (αSPM) and temperature (βSPM). The inclusion of local SPM sources from MWS discharge allows for a better understanding of the transition zone between the cryosphere and the coastal waters. High SPM concentrations are projected in the inner cove and low in the outer cove by the most suitable models. Inclusion of the MWS discharge points has highlighted misclassification of predicted SPM concentration in the peripheral areas (e.g., WC-16, WC-2 stations), which has created a major bias for the surface prediction of the general meltwater plume extension and concerning SPM concentrations in the central plume areas. We propose the comparison of SPM concentration and SPM plume extension with and without the inclusion of MWS discharge data to represent different aerial warming states or phases and, consequently, melt scenarios. While the first scenario (without MWS data) represents an early warming phase after a cold period when MWS is still frozen, the second scenario applies more to the 2nd and 3rd day of a warm air inflow when the streams have thawed, and the maximum discharge has been reached (see also Meredith et al. [8]). Thus, both scenarios are valid for short periods of only a few days of melt event and could exemplify a mean stage of the SPM distribution in the specific summer from December 2010 until February 2011 since the data represent the weather conditions on mean summer melt days (2.3 • C mean air temperature, Servicio Meteorológico Nacional Argentino). However, it cannot generalize the SPM phenomenon and its dynamic. For a general scenario (e.g., seasonally or monthly mean) on SMP distribution in PC, a significant number of homogeneous snapshot data sets like those we processed in this study would be needed to calculate a mean or a median SPM distribution map from those.

Geostatistical Modeling Strategies
The outcomes of this study recommend the configuration of models (Figure 7). Default Simple Kriging and Empirical Bayesian Kriging methods for SPM data without MWS input have shown a smooth and similar SPM interpolation pattern with a concentration between~3 and 4 mg/L ( Figure 6). Ordinary Kriging by default differs in the area between the moraines (M1 and M4) where SPM concentration values increase to~5-8 mg/L. Instead, OK interpolation is more comparable to the Voronoi map ( Figure 4A) and natural neighbor ( Figure 5A), showing the spatial variability and a general distribution trend in a robust way. These projections reveal a higher SPM concentration near the MWS-5 at the middle cove, even if the meltwater stream is not considered. In this area, additional variables, such as currents and sediment transport, would improve the prediction of the SPM concentration. In this transition area between meltwater habitat and fjord habitat [31], it is harder to predict SPM concentrations in comparison to the inner cove close to the glacier front and the meltwater stream discharge (consistent high SPM values) or to the outer cove in the open sea to Maxwell Bay (consistent low SPM values). This coincides with a new approach on a hydrographical model in PC, showing a vortex water circulation point in this area, retaining the SPM to be exported outside the cove [47].
For all tested parameters to configure the model in this study, the neighborhood is the most important as in other studies (e.g., [36,38,60]). The sampling neighborhood can explain the general trend induced due to different sample densities among the study area by defining the maximum search area and the number of neighbors included. Here, we recommend following the Euclidean distances' analysis and calculation of MFAN and the half of it (Table 2).

Limits on Methodology Evaluation
The creation of indices to assess the performance of geostatistical models and their improvement on MS or RMSS when configuring the models provide a methodology to compare different model performances. In this study, the best model with a performance index close to zero ( Figure 8) is OK0 by default (0.001), followed by EBK8 (0.004) with MWS inputs, and in the third position by EBK8 (0.005) without MWS. Even if the EBK8 model is placed second and third, we consider it the best one to explain the SPM distribution in PC because different to OK0, (i) it performs well independently of the data set used: approaches with and without MWS, (ii) after the configuration, it improves in both MS ≈ 0 and RMSS ≈ 1,~97% for MS and~75% for RMSS ( Figure 8) with an improved variogram ( Figure S5.II), (iii) it projects coincident results with a robust interpolation as the natural neighborhood map and with the SPM plume expected from satellite image from Jerosch et. al. [63] or modeled by Monien et al. [46], and (iv) coincident with Monien et al. [46], it reflects that the SPM concentration is influenced by the distance between the four meltwater streams analyzed and the glacier front. Hence, it shows an SPM concentration gradient increasing from south to north of the cove and from east to west from the further away stream MWS-5, whose origins of lithogenic particles are not from the Fourcade Glacier to the direct glacier run-off stream MWS-1.
This selected model (EBK8) with MWS inputs show a clear gradient of SPM distribution from high to low from the inner east coast to the outer west coast. An approximate area of 2 km 2 , between 1.5 and 1.7 km of distance to the Fourcade Glacier front, has mid-high values (4-35 mg/L) with high concentrations (7-35 mg/L) up to~300 m distance to the SPM origins of MWS stations.
Low concentrations (0-2 mg/L) are close to open sea conditions with more than 2 km of distance to MWS. These areas coincide, respectively, with the meltwater fjord and marine habitats at the inner and outer cove from Jerosch et al. [31] (Figure 4d from Jerosch et al., 2018), which shows the spatial extent of glacier influence in the cove. Glacier retreat with non-measured SPM input at the west coast or underwater could affect the predicted SPM concentrations. This material input will either sediment in a significant proportion, remain in the water column and transported among the cove, or be exported into Bransfield Strait [46]. But the outcome of this study has shown its superficial distribution. The high SPM concentration at the inner cove could also be deposited or be longer suspended, modifying the marine ecosystem (e.g., salinity conditions, light availability, carbon and iron concentrations) and making this area vulnerable to changes in the pelagic and benthic community [14,17,64,65]. Consequently, since glaciers are retreating among the entire west Antarctic Peninsula [5], ecosystem changes at PC due to increased SPM concentration can be expected in Antarctic fjords and coastal areas influenced by retreating glaciers.

Conclusions
The configuration of the model's parameters affects the resulting interpolation, and thus, for replicable results, they have to be reported. The default and configured interpolations applied in this study determine the detailed steps in Figures 1 and 2 for building a geostatistical model where the crucial ones are: 1.
Data exploration: Explore data via Semivariogram for a homogeneous data set and Voronoi plot to identify measurement errors or extreme values, consider as 'outliers', and to evaluate their inclusion. To also evaluate spatial variances of the target variable in a short distance among stations.

2.
Model parameters configuration: Run natural neighborhood analysis and calculate the Euclidean distances' sum and mean of furthest away and closest neighbor to determine the radius and neighborhood configuration.

3.
Model strategy definition: Build either Ordinary Kriging by default or Empirical Bayesian Kriging with a maximum neighborhood configuration as the value of the mean of the furthest away neighbors' distances and a minimum as the half of it; a configuration of a radius that is the sum of all furthest away distances. Compare models' performances with and without data transformation.

4.
Model evaluation and selection: Calculate performance index -Equation (3)-for each geostatistical model to decide and select best spatial interpolation based on mean standardized (MS), root mean squared standardized (RMSS), and previous knowledge of the system and parameter to interpolate.
As these steps reach a good interpolation performance for a highly fluctuating variable, such as SPM, we assume that these main steps of the workflow (Figure 2) could also guide the modeling of stable variables, such as soil grain size or total organic carbon content in the sediment. All tested geostatistical interpolations from this study show the importance of the model's parameter configuration and the necessity for further researches of combined interdisciplinary planning when sampling to achieve a well spatial explanation of the target phenomenon.
Supplementary Materials: The following are available online at http://www.mdpi.com/2311-5521/5/4/235/s1, Section I: Data Exploration: Figure S1: Scatter plot of suspended particulate matter values in different station points in Potter Cove. Figure S2: Semivariogram cloud plots of SPM data partitioning. Figure S3: Trend analysis for one-day SPM data with and without meltwater streams input. Section II: Detailed procedure configuration and statistical differences between geostatistical models: Table S1: Detailed numbers of the configuration of the different parameters applied. Figure S4. (I and II), S5. (I and II): Semivariogram and semivariogram model of each Simple Kriging (SK), Ordinary Kriging (OK), and Empirical Bayesian Kriging (EBK) by default and configured of suspended particulate matter data with and without meltwater streams input. Section III -Model evaluation: Cross-validation and deviation: Figure S6: Zoom of statistical error variation among the different interpolation methods applied delineated by station location sorted from west to east. Figure S7ab: Scatter plot of observed vs. predicted values for all interpolation methods of one-day SPM data with and without meltwater streams input.