Performance Evaluation of Wave Input Reduction Techniques for Modeling Inter-Annual Sandbar Dynamics

In process-based numerical models, reducing the amount of input parameters, known as input reduction (IR), is often required to reduce the computational effort of these models and to enable long-term, ensemble predictions. Currently, a comprehensive performance assessment of IR-methods is lacking, which hampers guidance on selecting suitable methods and settings in practice. In this study, we investigated the performance of 10 IR-methods and 36 subvariants for wave climate reduction to model the inter-annual evolution of nearshore bars. The performance of reduced wave climates is evaluated by means of a brute force simulation based on the full climate. Additionally, we tested how the performance is affected by the number of wave conditions, sequencing, and duration of the reduced wave climate. We found that the Sediment Transport Bins method is the most promising method. Furthermore, we found that the resolution in directional space is more important for the performance than the resolution in wave height. The results show that a reduced wave climate with fewer conditions applied on a smaller timescale performs better in terms of morphology than a climate with more conditions applied on a longer timescale. The findings of this study can be applied as initial guidelines for selecting input reduction methods at other locations, in other models, or for other domains.


Introduction
Understanding and predicting the evolution of coastal morphology is important in coastal engineering because of its implications for coastal safety, the environment, and the economy. For instance, coastal morphodynamics influence the occurrence of rip currents affecting swimmer safety, the protection of the hinterland from coastal flooding and erosion, and the establishment and development of coastal ecosystems. Often, process-based morphodynamic models are used to predict coastal evolution. These models account for a wide range of coastal processes, such as waves, currents, sediment transport, and morphology, which results in a high level of complexity and, consequently, extensive computational effort.
As complexity and computational effort increase, process or input reduction is necessary to obtain feasible computational times for engineering applications. In a morphodynamic sense, input reduction (IR) can be defined as the selection of a reduced set of representative forcing conditions that lead to accurate approximations of the long-term morphological evolution [1]. A robust input reduction method should preserve some natural variability of the environment to be able to represent the full set of conditions accurately [2,3]. In coastal environments, waves are typically the dominant forcing conditions (i.e., for wave-dominated coasts). Accurate modeling of the nearshore morphodynamics requires selecting representative wave conditions that capture the variation in both wave height and direction, including mild and extreme events.
Two main categories of wave input reduction methods exist: binning and clustering methods. Binning methods divide the wave conditions into bins, sometimes using a specific weight target, such as longshore sediment transport. Clustering methods cluster the wave conditions according to their statistical similarity. Binning methods have been previously investigated by [2,3]. However, to the best of our knowledge, clustering methods applied to inter-annual morphological predictions have not yet been addressed.
In addition to the selection of the representative wave conditions, the number, the duration, and the sequencing of the wave conditions also affect the IR performance [4]. Sequencing of wave conditions refers to the order in which the representative wave conditions occur in the model. The chronology of the storm conditions is likely to affect the morphological response of the sandbar due to non-linear effects. The sequencing of the representative wave conditions can be performed by random, systematic, or Markov Chain sequencing methods. Random sequencing draws wave conditions randomly from the reduced wave climate [2,3]. Systematic sequencing orders the wave conditions according to wave height (e.g., descending or ascending) and incident wave angle with respect to the shore-normal (e.g., either from positive to negative or vice versa, see [2]). Markov Chain sequencing utilizes the wave chronology of the full dataset to order the representative wave conditions. To the best of our knowledge, Markov Chain sequencing methods applied to inter-annual morphological predictions have not yet been addressed in the literature.
Although input reduction is a common practice in morphodynamic modeling [2,3,5-9], a comprehensive study on the performance of different IR-methods is lacking. In this study, we investigate the performance of 10 input reduction techniques with 36 subvariants (i.e., different initializations and input variables), including both binning and clustering methods. To this end, we use a cross-shore sandbar behavior model forced with measured wave time series to simulate the morphological evolution of a beach profile for the cases of Noordwijk in the Netherlands (Figure 1a) and Anmok in South Korea (Figure 1b). For the most promising method, we systematically assess the performance with respect to the number of wave conditions, the sequencing method, and the duration of the reduced wave climate (following [2]). As the cross-shore model is computationally inexpensive, we are able to test a wide range of input reduction methods and settings to derive guidelines for selecting a suitable input reduction setup in future studies.

Research Steps for Testing IR-Methods and Settings
We divided the performance assessment into three steps (see Figure 2): (1) selecting the most promising IR-method; (2) testing the influence of different settings on the performance of the most promising method (i.e., the number of conditions, the sequencing method, and the duration of the reduced wave climate); and (3) verifying the optimal settings for a second case study. For the first two steps, we used the case study of Noordwijk in the Netherlands (Figure 1a), and for the third step, the case study of Anmok in South Korea (Figure 1b). First, we assessed the performance of 36 IR-method variations (see Table 1) for different simulation times for the case study of Noordwijk. For this assessment, we kept the number of wave conditions constant at 12 and used only random sequencing. The descriptions of the IR-methods included in this study are provided in Section 3. Second, we tested the influence of the number of wave conditions, the sequencing method, and the duration of the wave climate for the most promising IR-method found in step 1 (i.e., the Sediment Transport Bins method). As the variables are interconnected, we tested different combinations of these variables. The tested settings are described in Section 4. Finally, we verified the most promising settings from the Noordwijk case by applying them to Anmok beach. The tested IR-methods and settings are summarized in Table 1.
Table 1. Overview of simulated input reduction (IR) methods and variations. The meaning of the variable symbols is as follows: $S_y$ = longshore sediment transport; $T_p$ = wave peak period; θ = wave angle; $H_{rms}$ = root-mean-square wave height.

Performance Evaluation
We assessed the performance of the 36 IR-method variants using a cross-shore sandbar behavior model (i.e., UNIBEST-TC), which was calibrated by [10] for Noordwijk beach in the Netherlands. We used the 3.3-year-long brute force simulation of this model (i.e., forced with the full measured wave time series) as the benchmark to assess the performance of the IR-methods. For the quantitative performance, we used the cumulative skill score described in [11] for the cross-shore profile at the end of the 3.3-year simulation.
The skill score (R) of the reduced wave climate model predictions ($z_{red}$) in relation to the brute force predictions ($z_{full}$) is determined for all bar profiles in time (t) over the cross-shore distance (x) on the bar profile ($x_1$ to $x_{end}$).
R = 1 means a perfect match (i.e., no deviations) between the morphological prediction with the reduced wave climate and the brute force model. R < 1 indicates discrepancies between the morphological prediction with the reduced wave conditions and with the brute force model. Because the skill score R tends to reward predictions that underestimate the overall magnitude of bed changes [12], a qualitative assessment was also performed by visual comparison of the final profiles of the reduced and full wave climates.
For the validation of the most promising settings, we applied the results obtained from the Noordwijk case in a calibrated 0.8-year-long brute force model of Anmok beach. The model was forced with wave time series measured 850 m offshore (i.e., at ca. 20 m water depth) with a temporal resolution of 1 h. The calibration was performed by means of the skill score (Equations (1)-(3)) [11] and visual comparison with the beach profile obtained from a survey campaign. The model settings for the UNIBEST-TC model are listed in Table 2.

Tested Input Reduction Methods
We selected five binning and five clustering methods for the performance assessment (see Table 1). In most of these methods, the wave conditions are clustered based on their spectral parameters, such as the root-mean-square wave height ($H_{rms}$), peak period ($T_p$), and wave direction (θ). However, we also clustered the conditions with respect to their contribution to the sediment transport by replacing $H_{rms}$ with the associated longshore sediment transport, $S_y$ (when known), or with $H_{rms}^p$, where p is the power that represents the non-linear relation between wave height and sediment transport. Typically, p varies from p = 2 to p = 3. In this study, p = 2.5 was applied. All variations in terms of input variables are shown in Table 1.
In each bin or cluster, the representative wave condition is defined as the centroid of that bin or cluster. For the spectral wave parameters, the centroids are defined as the average of the wave conditions within a cluster or bin. For $H_{rms}^p$, the centroid within a bin is defined by a non-linear weighting formula for the wave height (Equation (4)).
In Equation (4), $f_i$ is the frequency of occurrence of the root-mean-square wave height $H_{rms,i}$ of a wave condition $x_i$ belonging to the cluster or bin j with observations $C_j$. For $S_y$, the centroids are defined by the average of the wave conditions within the bin or cluster. To obtain the value of $H_{rms}$, the wave condition nearest to the centroid is used. The sub-sections below discuss the principles of the selected IR-methods.
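The non-linear weighting of the wave height can be illustrated with a short sketch. It assumes that the weighting takes the common frequency-weighted power-mean form, $H_{rms,j} = (\sum_i f_i H_{rms,i}^p / \sum_i f_i)^{1/p}$; this form is an assumption consistent with the symbol definitions above, not a reproduction of Equation (4). The function name and the sample values are hypothetical.

```python
import numpy as np

def representative_hrms(h_rms, freq, p=2.5):
    """Frequency-weighted power mean of the wave heights in one bin or cluster.

    Assumed form: (sum(f_i * H_i**p) / sum(f_i)) ** (1 / p).
    """
    h = np.asarray(h_rms, dtype=float)
    f = np.asarray(freq, dtype=float)
    return (np.sum(f * h**p) / np.sum(f)) ** (1.0 / p)

# Hypothetical bin: three wave heights [m] with their frequencies of occurrence.
h_bin = [0.5, 1.0, 2.0]
f_bin = [0.6, 0.3, 0.1]

h_rep = representative_hrms(h_bin, f_bin)   # power mean with p = 2.5
h_mean = np.average(h_bin, weights=f_bin)   # plain weighted average, for comparison
```

With p > 1, the power mean exceeds the plain weighted average, so the higher, less frequent waves contribute more to the representative wave height.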

Conditions with the Largest Transport Contribution Method
The Conditions with the Largest Transport Contribution method (CLTCM) [13] selects only the wave conditions with the highest contributions to the longshore sediment transport. Initially, the wave conditions are binned into a larger number of wave height and wave direction bins than the desired number of wave conditions for the reduced wave climate. The sediment transport contribution of each bin is determined, and the k bins with the highest sediment transport contribution are selected as the representative wave conditions. This method uses the sediment transport rates as the input variable (see Table 1) and, hence, requires the transport rates corresponding to the wave conditions to be known before the input reduction is executed.

Fixed Bins Method
The Fixed Bins method (FBM) [3] divides the wave conditions into pre-defined wave height and wave direction bins. The algorithm first divides the wave conditions into directional bins with uniform resolution. Next, each directional bin is divided into wave height bins according to its range of wave height. This results in wave height bins that can vary among the directional classes (see Figure 3a).

Energy Flux Method
The Energy Flux method (EFM) [3] divides the wave conditions into pre-defined wave direction and wave height bins with an equal amount of energy flux ($E_f$).
$$E_f = \frac{1}{8} \rho g H^2 c_g \quad (5)$$
where ρ is the water density (assumed to be 1025 kg/m³), g is the gravitational acceleration (g = 9.81 m/s²), H is the deep-water wave height, and $c_g$ is the wave group celerity in deep water. The EFM generates a higher bin resolution for conditions with more wave energy and a lower resolution for conditions with less wave energy (see Figure 3b). The wave height of the representative wave conditions is defined as the inverse function of the average energy flux of each bin, while the wave period and wave direction are defined as the average of the wave conditions in a bin.
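As an illustration of the equal-energy principle, the sketch below computes the deep-water energy flux and splits a set of wave conditions into wave height classes that carry roughly the same total flux. It is a minimal one-dimensional sketch under stated assumptions: the group celerity follows linear deep-water theory ($c_g = gT_p/4\pi$), the split is applied to wave height only (the directional split would be applied first in the same way), and the function names and sample values are hypothetical.

```python
import numpy as np

RHO = 1025.0  # water density [kg/m^3], value from the text
G = 9.81      # gravitational acceleration [m/s^2], value from the text

def energy_flux(h, t_p):
    """Deep-water energy flux E_f = rho * g * H^2 * c_g / 8 [W/m]."""
    # Assumption: deep-water group celerity from linear wave theory.
    c_g = G * np.asarray(t_p, dtype=float) / (4.0 * np.pi)
    return RHO * G * np.asarray(h, dtype=float) ** 2 * c_g / 8.0

def equal_weight_bins(values, weights, n_bins):
    """Label each sample with one of n_bins classes of roughly equal total weight."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values, kind="stable")
    w_sorted = weights[order]
    # Cumulative weight fraction evaluated at the midpoint of each sample.
    cum_mid = (np.cumsum(w_sorted) - 0.5 * w_sorted) / w_sorted.sum()
    labels = np.empty(values.size, dtype=int)
    labels[order] = np.minimum((cum_mid * n_bins).astype(int), n_bins - 1)
    return labels

# Hypothetical record: four low waves and one high wave with the same period.
h = np.array([1.0, 1.0, 1.0, 1.0, 2.0])
t = np.full(5, 8.0)
flux = energy_flux(h, t)
labels = equal_weight_bins(h, flux, n_bins=2)
```

In this example, the single 2 m wave carries as much energy flux as the four 1 m waves together, so it receives a bin of its own. This is the behavior described above: finer resolution where the wave energy is concentrated.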

Sediment Transport Bins Method
Similar to the EFM, the Sediment Transport Bins method (STBM) divides the wave data into pre-defined wave direction and wave height bins with equal weight, but the weight is determined by the longshore sediment transport obtained from the brute force simulation. In contrast to the EFM, the definition of the directional bins starts from the shore-normal angle such that wave conditions that cause opposite sediment transport rates do not average out within a bin (see Figure 3c).

Representative Wave Approach
The Representative Wave approach (RWA) is adapted from [6] and divides the wave data into bins over time. In this paper, we divided the wave data into seasons. For each season, the representative wave condition is the average of the wave conditions in that bin. This is the only method that preserves the chronology of the original wave dataset.

Clustering Methods
Table 1 provides an overview of the clustering methods and variations that were selected for this study. The clustering methods use the normalized three-dimensional (e.g., $H_{rms}$, $T_p$, θ) Euclidean distance as a measure of similarity between the wave conditions. The smaller the distance between two wave conditions, the more similar they are. Similar to the binning methods, $H_{rms}^{2.5}$ or $S_y$ can be used as an alternative input variable for $H_{rms}$. We tested these variations for the crisp k-means, fuzzy k-means, and k-harmonic means methods (see Table 1). The similarity of the wave conditions depends on the cluster initialization. Therefore, we tested different cluster initializations for the clustering methods: fixed bins, the maximum dissimilarity algorithm, and the K-means++ algorithm [14].

Maximum Dissimilarity Algorithm
The Maximum Dissimilarity algorithm (MDA) (see [15]) creates a subset of k centroids that represents the full diversity of the wave data by maximizing the dissimilarity between the vectors in the subset. To measure the dissimilarity between vectors, we used the MaxMin algorithm [16] with the efficient algorithm of [17]. The first centroid is the wave condition with the maximum distance to all other wave data. After the first centroid is excluded from the dataset, the second centroid is the wave condition with the maximum distance to the first centroid. The subsequent centroids are the wave conditions with the maximum distance among the minimum distances of the remaining wave conditions to the previous centroids.
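The selection rule described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: it assumes min-max normalization of each variable, treats the wave direction as a linear (non-circular) variable, interprets "maximum distance to all other wave data" as the largest summed distance, and builds the full N × N distance matrix, which is only practical for moderately sized datasets. The function name is hypothetical.

```python
import numpy as np

def max_dissimilarity_subset(data, k):
    """Return the indices of k samples selected with a MaxMin rule."""
    x = np.asarray(data, dtype=float)
    span = x.max(axis=0) - x.min(axis=0)
    span[span == 0] = 1.0                      # avoid division by zero
    z = (x - x.min(axis=0)) / span             # assumption: min-max normalization
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)

    # First centroid: the sample with the largest summed distance to all others.
    selected = [int(np.argmax(dist.sum(axis=1)))]
    # Distance of every sample to its nearest selected centroid.
    min_dist = dist[selected[0]].copy()
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))         # largest of the minimum distances
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return selected

# Hypothetical one-variable example: values 0, 1, 2, and 10.
subset = max_dissimilarity_subset([[0.0], [1.0], [2.0], [10.0]], k=3)
```

For the example, the outlying value 10 is selected first, followed by 0 and then 2. The rule favors the extremes of the dataset regardless of how often they occur.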

Grouping with Equal Sediment Influence Method
The Grouping with Equal Sediment Influence method (GESIM) aggregates wave conditions in clusters with approximately the same sediment transport contribution [13]. Therefore, it uses only $S_y$, $T_p$, and θ as input variables. It has the same principle as the STBM, but it aggregates the wave conditions into clusters instead of dividing them into bins. GESIM starts by selecting k initial wave conditions as individual clusters using the MDA. Subsequently, in every iteration, each cluster incorporates the closest observation to the cluster until a total sediment transport threshold is reached. The threshold is defined by the total sediment transport divided by the number of representative cases (k). When wave conditions cannot join a cluster anymore, the remaining wave conditions join the cluster to which they have the smallest distance. In the end, this results in k clusters that represent approximately the same amount of sediment transport. The centroid of each cluster is defined as the average of the wave conditions in the cluster.

Crisp K-Means Method
The Crisp K-Means method (CKM), also known as K-means, is one of the most widely used clustering methods [18,19]. It starts with k initial centroids that are defined randomly with weights based on the distance of the wave conditions through the K-means++ algorithm [14]. Then, every wave condition is assigned to the cluster it is closest to. The CKM has a hard membership function, which means that each wave condition can only be a member of one cluster. Next, the centroids are updated by averaging the wave conditions that constitute the clusters. This procedure is repeated iteratively until the difference between the current and previous centroids is smaller than a user-defined accuracy criterion (see Figure 4a). More details can be found in [20].

Fuzzy K-Means Method
The Fuzzy K-Means method (FKM) (see [21]) is similar to the CKM, but with a soft membership function. Therefore, wave conditions can be assigned to more than one cluster. This means that all wave conditions have some influence on the definition of the centroids, determined by the fuzzy membership function. Initially, the centroids are defined as in the CKM. Then, the fuzzy membership function ($M_{i,j}$) of each wave condition is calculated for every cluster.
Here, $\|x_i, v_j\|$ is the Euclidean distance between wave conditions ($x_i$) and centroids ($v_j$), i being the wave observation index of the full dataset, j the cluster index, and k the number of clusters (i.e., number of representative wave conditions). The fuzzy parameter o, where o > 1, is case specific and requires calibration.
Based on sensitivity analyses, we used o = 1.5. The new centroids are defined as the weighted average of the wave conditions using the fuzzy membership as weight. In this way, wave conditions closer to the previous centroid have a higher influence on the definition of the next centroid. This process is repeated iteratively until the algorithm converges towards a stable solution (see Figure 4b).
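One iteration of the fuzzy update can be sketched as below. The sketch assumes the membership function that is commonly used for fuzzy k-means, $M_{i,j} = \|x_i, v_j\|^{-2/(o-1)} / \sum_{l=1}^{k} \|x_i, v_l\|^{-2/(o-1)}$, which may differ in notation from the membership function used in this study. Following the description above, the memberships themselves are used as centroid weights. The function names and sample values are hypothetical.

```python
import numpy as np

def fuzzy_memberships(x, centroids, o=1.5):
    """Membership of every sample in every cluster; each row sums to 1."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(centroids, dtype=float)
    d = np.linalg.norm(x[:, None, :] - v[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)                   # guard against zero distance
    w = d ** (-2.0 / (o - 1.0))                # assumed standard fuzzy form
    return w / w.sum(axis=1, keepdims=True)

def update_centroids(x, memberships):
    """Weighted average of the samples, with the memberships as weights.

    Note: the classical formulation raises the memberships to the power o
    before averaging; the text describes the membership itself as the weight.
    """
    x = np.asarray(x, dtype=float)
    return (memberships.T @ x) / memberships.sum(axis=0)[:, None]

# Hypothetical one-variable example with two centroids.
x = np.array([[0.0], [1.0], [4.0]])
v = np.array([[0.0], [4.0]])
m = fuzzy_memberships(x, v, o=1.5)
v_new = update_centroids(x, m)
```

With o = 1.5, the exponent equals -4, so the middle sample (distance 1 to the first centroid and 3 to the second) belongs almost entirely to the first cluster. Larger values of o make the memberships softer.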

K-Harmonic Means
The K-Harmonic Means method (KHM) (see [22,23]) follows the same procedure as the FKM, but the weight used for the definition of the centroids is defined by a dynamic weighting function ($K_i$).
$$K_i = \frac{\sum_{j=1}^{k} \|x_i, v_j\|^{-o-2}}{\left(\sum_{j=1}^{k} \|x_i, v_j\|^{-o}\right)^{2}} \quad (7)$$
In this case, o ≥ 2. Higher dimensions of the dataset require a larger value for o [23]. The parameter o is case specific, and calibration is required to define it. Based on sensitivity analysis, we used o = 4.2. The dynamic weight gives outliers a larger influence on the selection of the centroids than wave conditions that are closer to the centroids (Figure 4c).


Number of Representative Wave Conditions
The performance of an IR-method depends on the number of representative wave conditions (k) included in the reduced wave climate. Therefore, we tested the influence of the number of wave conditions on the performance of the most promising method. For binning methods, the number of representative wave conditions is defined by the combination of the number of directional bins (ndir) and the number of wave height bins (nhrms). The resolution of ndir and nhrms affects the performance of the input reduction method because wave height and wave direction have distinct effects on sediment transport and, thus, morphology. In step 1 (i.e., testing different IR-methods), we used k = 12, ndir = 4, and nhrms = 3. For the sensitivity testing, we varied k from 8 to 32 for different combinations of ndir and nhrms (see Table 3).

Sequencing Methods
The sequencing of the wave conditions can have a major impact on the performance of the morphological predictions due to the non-linear response of morphology to waves [2]. Ideally, the sequencing of representative wave conditions should resemble the natural variability of the full wave climate. Therefore, we tested the influence of different sequencing methods on the performance of the most promising IR-method. The sequencing methods simulated are listed in Table 4. Figure 5 illustrates the sequencing methods applied to the STBM with k = 12 and $T_{wc}$ = 301 days. For the random sequencing and Monte Carlo methods, we used five replicates to limit the effect of the random initial choice on the performance of the method. Note that the Markov Chain sequencing has no repetitions since it does not contain randomness and that the reduced climate of the Monte Carlo Markov Chain with repetition sequencing is not repeated four times as in the other methods.

Random Sequencing
Random sequencing orders the representative wave conditions randomly. First, the representative wave conditions are assigned integers ranging from 1 to k. Next, a random permutation of the integers is performed. The integers are then sorted in ascending order and their respective representative wave conditions are sequenced accordingly. The random sequence is performed with five repetitions for each method, except for the RWA, whose sequence is determined by the chronology of the dataset.

Markov Chain Sequencing
The Markov Chain sequencing (MC) orders the representative wave conditions in the way they would most likely occur in the full dataset. The procedure is as follows:
1. Number the representative wave conditions stored in the database V from 1 to k;
2. Determine for every wave condition from the full dataset (X) which of the representative wave conditions in V is most similar to it. In this step, a new vector F is created with size N × 1 (N = number of observations of the full dataset), in which the number of the representative wave condition that is most similar to each observation is stored (see Equation (8)). In Equation (8), I is a true-false indicator that is 1 when the expression between brackets is true and 0 when it is false, $x_i$ is the wave observation in the full dataset, and $v_j$ is the representative wave condition;
3. Determine the Markov transitions for the wave conditions in F. The Markov transitions are stored in a Markov transition matrix M of size k × k, where k is the number of representative wave conditions (see Equation (9)). In Equation (9), P is the transition probability of a representative wave condition n in F given a representative wave condition m with transition index t;
4. Define two time series matrices: $A_s$ and $A_{NS}$. $A_s$ starts empty and will contain the numbers that are assigned to the wave conditions in step 1 in the sequence determined by the algorithm. $A_{NS}$ contains the numbers assigned to the wave conditions in step 1 at the start of the algorithm. When a wave condition is selected by the algorithm, its number is deleted from matrix $A_{NS}$ and added to matrix $A_s$;
5. Define the first wave condition ($A_{s,1}$) as the most similar one to the initial wave condition in the observation dataset (see Equation (10)). In Equation (10), $\|x_1, v_j\|$ and $\|x_1, V\|$ are the Euclidean distances between the initial observation of the full dataset ($x_1$) and a representative wave condition ($v_j$) and all representative wave conditions in the reduced dataset (V), respectively. The number assigned to the initial wave condition ($A_{s,1}$) is deleted from the matrix $A_{NS}$, which reduces its size to (k − 1);
6. The next wave condition to be selected for the reduced time series ($A_{s,t}$) is the one with the highest Markov transition probability (M), conditional on the previously selected wave condition ($A_{s,t−1}$) and available in the matrix $A_{NS}$ (see Equation (11)). In Equation (11), t is a transition index and q is the transition probability index of M;
7. Reorder the wave conditions in the database V according to their assigned numbers in matrix $A_s$.
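The seven steps above can be sketched compactly as follows. The sketch is illustrative and rests on a few assumptions that the procedure does not specify: indices are 0-based instead of 1 to k, the input variables are taken to be normalized already, transition probabilities are estimated as relative frequencies of consecutive entries in F, and ties (including the case where no transition to a remaining condition was observed) are resolved in favor of the lowest index. All function names are hypothetical.

```python
import numpy as np

def nearest_representative(x, v):
    """Step 2: index of the most similar representative for every observation (F)."""
    d = np.linalg.norm(x[:, None, :] - v[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def transition_matrix(f, k):
    """Step 3: k x k matrix with M[m, n] = P(F[t+1] = n | F[t] = m)."""
    counts = np.zeros((k, k))
    np.add.at(counts, (f[:-1], f[1:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def markov_chain_sequence(x, v):
    """Steps 4 to 7: order of the representatives (the content of A_s)."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    f = nearest_representative(x, v)
    m = transition_matrix(f, len(v))
    sequence = [int(f[0])]                                   # step 5
    remaining = [j for j in range(len(v)) if j != sequence[0]]  # A_NS
    while remaining:                                         # step 6
        probs = m[sequence[-1], remaining]
        nxt = remaining[int(np.argmax(probs))]
        sequence.append(nxt)
        remaining.remove(nxt)
    return sequence

# Hypothetical record that cycles through three representatives: 0 -> 2 -> 1 -> 0 ...
v = np.array([[0.0], [1.0], [2.0]])
x = np.array([[0.0], [2.0], [1.0], [0.0], [2.0], [1.0], [0.0]])
order = markov_chain_sequence(x, v)
```

For this record, the sequence follows the observed cycle: it starts at the representative closest to the first observation and then moves to the most probable successor that has not yet been used.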

Monte Carlo Markov Chain Sequencing
The Monte Carlo Markov Chain sequencing (MCMC) orders the representative wave conditions randomly according to the Markov transition probabilities. Since the MCMC sequencing contains randomness, it was performed with five repetitions. The procedure is as follows:
1. Number the representative wave conditions and determine the vector F as in steps 1 and 2 of Section 4.2.2;
2. Determine the cumulative Markov transitions ($\sum_{n=1}^{k} P(F_{t+1} = n \mid F_t = m)$) for the wave conditions in F. The cumulative Markov transitions are stored in a Markov transition matrix M of size k × k;
3. Define two time series matrices, $A_s$ and $A_{NS}$, as in step 4 of Section 4.2.2;
4. Define the first wave condition ($A_{s,1}$) as the most similar one to the initial wave condition in the observation dataset as in Equation (10). The Markov transition probability of the initial wave condition ($A_{s,1}$) is subtracted from the cumulative Markov transition matrix M and the remaining cumulative probabilities are normalized. Moreover, the number assigned to the initial wave condition ($A_{s,1}$) is deleted from the matrix $A_{NS}$; hence, its size reduces to (k − 1) × 1;
5. Draw a random number between 0 and 1 ($R_t$). The next wave condition to be selected ($A_{s,t}$) is the first occurrence with the Markov transition probability containing the random number previously defined;
6. Subtract the Markov transition probability of the selected wave condition $A_{s,t}$ from the cumulative Markov transition matrix M and normalize the remaining probabilities;
7. Exclude the selected wave condition $A_{s,t}$ from the matrix $A_{NS}$;
8. Reorder the wave conditions in the database V according to their assigned numbers in matrix $A_s$.
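The random draw in steps 5 to 7 can be illustrated with the sketch below. It takes a k × k transition matrix and the index of the first wave condition as given. Instead of subtracting the probability of a selected condition from the cumulative matrix, it renormalizes the probabilities of the remaining conditions at every step, which is intended to be equivalent. The uniform fallback for the case where no transition to a remaining condition was observed is an assumption, as are the 0-based indices and the function name.

```python
import numpy as np

def mcmc_sequence(m, first, rng):
    """Draw an order of the representatives from the transition probabilities."""
    m = np.asarray(m, dtype=float)
    sequence = [int(first)]
    remaining = [j for j in range(m.shape[0]) if j != first]   # A_NS
    while remaining:
        probs = m[sequence[-1], remaining]
        total = probs.sum()
        if total > 0:
            probs = probs / total                 # normalize remaining probabilities
        else:
            # Assumption: fall back to a uniform draw if no transition was observed.
            probs = np.full(len(remaining), 1.0 / len(remaining))
        cumulative = np.cumsum(probs)
        r = rng.random()                          # R_t in [0, 1)
        # First remaining condition whose cumulative probability exceeds R_t.
        idx = min(int(np.searchsorted(cumulative, r, side="right")), len(remaining) - 1)
        sequence.append(remaining.pop(idx))
    return sequence

# Hypothetical deterministic chain 0 -> 2 -> 1: every draw gives the same order.
m_cycle = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
order_cycle = mcmc_sequence(m_cycle, first=0, rng=np.random.default_rng(0))

# Hypothetical random transition matrix: the result is a permutation of 0..4.
rng = np.random.default_rng(1)
m_rand = rng.random((5, 5))
m_rand /= m_rand.sum(axis=1, keepdims=True)
order_rand = mcmc_sequence(m_rand, first=3, rng=rng)
```

Because the outcome depends on the random draws, several replicates with different seeds would be generated in practice, in line with the five repetitions mentioned above.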

Monte Carlo Markov Chain with Repetition Sequencing
The Monte Carlo Markov Chain with repetition sequencing (MCMCR) has the same principle as the MCMC sequencing. However, instead of excluding the selected wave condition $A_{s,t}$ immediately, it allows the wave case to be repeated $N_R$ times, where $N_R$ is the number of repetitions of the reduced wave climate (see Section 4.3). Hence, in this sequencing method, the reduced wave climate is not entirely repeated, but the wave conditions are allowed to persist $N_R$ times. The MCMCR sequencing was performed with five repetitions.

Wave Climate Duration
The wave climate duration indicates the timescale for which the reduced wave climate is applied. The durations that we tested are listed in Table 5. These values are defined according to [2]. The duration of the reduced wave climate ($T_{wc}$) is determined by the sum of the durations of its conditions (see Equation (15)).
In Equation (15), $T_R$ is the input reduction period (for Noordwijk, $T_R$ = 3.3 years) and $N_R$ is the number of repetitions of the reduced wave climate. Ideally, the reduced wave climate should resemble the natural variability of the full wave climate. The duration of each representative wave condition should be long enough for the morphology to adjust to the hydrodynamic conditions, though not so long that it causes unrealistic irreversible morphological disturbances. The duration of the wave climate is associated with the required computational time through the number of transitions, NoT = NoC × $N_R$ − 1, where NoC is the number of reduced wave conditions. The higher the number of transitions, the higher the computational demand, as models need to spin up between wave conditions [2].
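As a small worked example, the relation between duration, repetitions, and transitions can be written out for the Noordwijk settings. The sketch assumes that the duration follows $T_{wc} = T_R / N_R$; this relation is inferred from the values quoted in the text (a 1205-day period, a 301-day climate, and four repetitions) and is not a reproduction of Equation (15). The function names are hypothetical.

```python
def wave_climate_duration(t_r_days, n_r):
    """Duration of one pass of the reduced wave climate [days] (assumed T_R / N_R)."""
    return t_r_days / n_r

def number_of_transitions(n_conditions, n_r):
    """NoT = NoC * N_R - 1, as given in the text."""
    return n_conditions * n_r - 1

T_R_DAYS = 1205                                   # 3.3 years, Noordwijk
t_wc = wave_climate_duration(T_R_DAYS, n_r=4)     # about 301 days
n_t = number_of_transitions(n_conditions=12, n_r=4)
n_t_single = number_of_transitions(n_conditions=12, n_r=1)
```

With 12 conditions, repeating the climate four times raises the number of transitions from 11 to 47. This is the computational cost of the shorter, yearly-scale wave climate.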

Table 5. Duration of the wave climate (days) and number of repetitions.

Results

Performance Evaluation of Input Reduction Methods
The performance of the IR-variants in terms of average cumulative skill score is presented in Figure 6. The CLTCM and MDA methods have no performance score, as the selected representative wave conditions resulted in such unrealistic morphological changes that the model simulations crashed. Therefore, these two methods are not considered suitable for input reduction. For the remaining 34 variants, we separately discuss the results for the binning (Section 5.1.1) and clustering (Section 5.1.2) methods and reflect on the influence of the duration of the reduced wave climate.

Binning Methods
Overall, the binning methods perform better than the clustering methods (see Figure 6). In terms of skill score, the STBM and EFM are the most promising methods. The STBM performs best both in terms of skill score (see Figure 6) and the modeled morphological evolution (see Figure 7). The better performance of the STBM is probably related to the bin definition: the STBM weighs both wave height and wave direction, while the EFM only weighs wave height. Moreover, the STBM does not allow opposite wave contributions to the sediment transport to average out (its directional bin definition starts from the shore-normal wave angle), while the EFM does. Therefore, the STBM is selected as the most promising method to test the influence of the number of wave conditions, sequencing, and wave climate duration on the performance.

Binning Methods
Overall, the binning methods perform better than the clustering methods (see Figure 6). In terms of skill score, the STBM and EFM are the most promising methods. The STBM performs best both in terms of skill score (see Figure 6) and the modeled morphological evolution (see Figure 7). The better performance of the STBM is probably related to the bin definition: the STBM weighs both wave height and wave direction, while the EFM only weighs wave height. Moreover, the STBM does not allow opposite wave contributions to the sediment transport to average out (its directional bin definition starts from the shore-normal wave angle), while the EFM does. Therefore, the STBM is selected as the most promising method to test the influence of the number of wave conditions, sequencing, and wave climate duration on the performance.
The performance of the input reduction increases for shorter wave climate durations. The skill is poor when the reduced wave climate is applied for the total duration of the brute force model (T_wc = 1205 days). This is attributed to unrealistically long durations for extreme wave conditions, which result in unrealistic and irreversible morphological changes. We selected T_wc = 301 days (i.e., approximately a yearly timescale) as the optimal balance between performance (i.e., reflecting the wave climate variability) and computational effort (i.e., number of transitions).
Among the Fixed Bins methods, FBM1 has relatively good results, but only when the reduced wave climate is repeated often (T_wc = 301 days and smaller). FBM2 performs very poorly, as it does not select low wave height cases due to the weighting function. FBM3 shows consistently good skill scores for all wave climate durations. However, its morphological response is very poor (see Figure 7a). This is a result of the skill score rewarding predictions that underestimate the overall magnitude of bed changes [12]. The RWA performs poorly; the average conditions of the seasons tend to be similar, resulting in a poor selection of representative wave conditions.
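The effect of starting the directional bins at the shore-normal angle can be illustrated with a minimal sketch. It is not the STBM implementation used in this study: the transport proxy `H^2.5 * sin(2*theta)`, the sample waves, and the helper names are assumptions made for illustration only, and the wave height binning is omitted. The waves are first split by the sign of their angle relative to shore normal, and each side is then cut into bins of roughly equal absolute transport, so that no bin mixes positive and negative contributions.

```python
import math

def transport_proxy(h, theta_deg):
    # Hypothetical stand-in for the model-derived longshore transport.
    return h ** 2.5 * math.sin(2.0 * math.radians(theta_deg))

def equal_weight_bins(items, weight, n_bins):
    """Split sorted items into at most n_bins consecutive groups of
    roughly equal total weight."""
    total = sum(weight(it) for it in items)
    bins, current, acc = [], [], 0.0
    for it in items:
        current.append(it)
        acc += weight(it)
        # Close the bin once its share of the cumulative weight is reached.
        if len(bins) < n_bins - 1 and acc >= total * (len(bins) + 1) / n_bins:
            bins.append(current)
            current = []
    bins.append(current)
    return bins

def shore_normal_bins(waves, n_bins_per_side):
    """waves: list of (wave height, angle in degrees from shore normal)."""
    bins = []
    for on_side in (lambda w: w[1] < 0, lambda w: w[1] >= 0):
        side = sorted((w for w in waves if on_side(w)), key=lambda w: w[1])
        bins += equal_weight_bins(
            side, lambda w: abs(transport_proxy(*w)), n_bins_per_side)
    return bins

# Hypothetical wave records (height in m, angle in degrees).
waves = [(0.5, -60), (1.0, -40), (2.0, -20), (0.8, -10),
         (0.6, 10), (1.5, 25), (2.5, 35), (1.0, 70)]
bins = shore_normal_bins(waves, 2)
for b in bins:
    print(b)
```

Because the split at shore normal happens before the weighting, averaging within a bin can never combine waves that drive transport in opposite directions.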

Clustering Methods
The clustering methods generally perform worse than the most promising binning methods (i.e., EFM and STBM). The clustering methods rely primarily on the frequency of occurrence of the observations, which leads to an over-representation of frequently occurring low wave conditions in the selection of representative waves. Since the morphology is highly dependent on energetic conditions, this reduces the performance of the clustering methods. Of all clustering methods, the Crisp k-means methods (CKM) tend to perform better than the others. Among the CKMs, the cluster initiation MDA performs slightly better than K-means++ or Fixed Bins. The results are best with S_y as the input variable instead of H_rms or H_rms^2.5. For the Fuzzy k-means and K-harmonic means methods, the cluster initiation of Fixed Bins shows the best performance. These methods performed better with H_rms^2.5 as the input variable, because there is a balance between the weighting function and the high dependency on the dense cloud of observations that is intrinsic to these methods. The weighting function leads to a selection of centroids with higher wave height, while the dependency on the observations' frequency of occurrence leads to a selection of centroids with low wave height. When using S_y as the input variable, these methods have lower skill due to the absence of an inverse function for the sediment transport used in this study (i.e., the nearest wave condition to the centroid is selected), which results in a poor selection of representative wave conditions.
FKM8 shows similar skill scores to the EFM, but the morphological response of FKM8 is evidently worse than that of the EFM (see Figure 7b,c, respectively). This is a result of the skill score rewarding predictions that underestimate the overall magnitude of bed changes [12]. GESIM also performs poorly because it does not aggregate the observations into clusters well: once a cluster reaches the limit of sediment transport, it 'closes', and the observations closer to this cluster are aggregated into another one, which might be relatively far from the observations. This could lead to a poor selection of representative wave conditions, since they are defined as the average of the observations within a cluster.
Additionally, most of the methods that do not show an improvement of skill score with decreasing wave climate duration are associated with the input variables H_rms^2.5 and S_y (CKM5, CKM8, FKM3, FKM6, FKM9, KHM3, KHM6). When the selection of the representative wave conditions is initially poor, decreasing T_wc does not improve the performance of the method, whereas when the selection is initially reasonably good, decreasing T_wc can improve the performance.
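The statement that the skill score rewards underestimated bed changes can be made concrete with a small numerical sketch. It assumes that the skill score of Equation (1) has the common mean-squared-error form, 1 - MSE(prediction, reference) / MSE(initial, reference); that form is an assumption here, since the equation is not repeated in this section. The sinusoidal bed-change pattern and the function names are hypothetical.

```python
import math

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def skill(pred, ref, initial):
    # Assumed MSE-based skill score: 1 is perfect, 0 equals "no change".
    return 1.0 - mse(pred, ref) / mse(initial, ref)

n = 100
x = [2.0 * math.pi * i / n for i in range(n)]
initial = [0.0] * n                       # bed change relative to the initial profile
reference = [math.sin(xi) for xi in x]    # hypothetical brute force bed change

# Correct bar position, but only half the amplitude.
damped = [0.5 * math.sin(xi) for xi in x]
# Correct amplitude, but bars displaced by a sixth of a wavelength.
shifted = [math.sin(xi - math.pi / 3.0) for xi in x]

skill_damped = skill(damped, reference, initial)
skill_shifted = skill(shifted, reference, initial)
print(round(skill_damped, 3), round(skill_shifted, 3))
```

Under this assumed definition, the damped prediction scores 0.75 while the full-amplitude but displaced prediction scores about 0, which is no better than predicting no change at all. This is why the skill scores are read together with the final profiles in Figure 7.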

Performance Evaluation of Input Reduction Settings
The effects of the input reduction settings (i.e., number of wave conditions, sequencing method, and wave climate duration) are only evaluated for the STBM, as the most promising IR-method. The skill scores for simulations with a different number of conditions and sequencing methods are shown in Figure 8. Note that for all sequencing methods except Markov Chain Sequencing (S2), which contains no randomness, we used the mean skill score of five random replicates. The Monte Carlo Markov Chain Sequencing with repetition (S4) does not apply for T_wc = 1205 days, as a repetition of wave conditions is not possible when applying it on the full timescale of the reduction period.

Number of Wave Conditions in Reduced Climate
Overall, the input reduction for k = 32A has the best performance in terms of skill score, whereas k = 10 has the worst performance. The performance appears to be related to the resolution in directional bins: k = 32A has the highest number of directional bins (i.e., eight), whereas k = 10 has only two directional bins. Increasing the number of cases does not necessarily imply a substantial improvement in skill score, except when the number of directional bins is increased considerably, such as for k = 32A and k = 24C. The influence of the wave height on the longshore transport and morphology, even though non-linear, is always monotonic (i.e., transport increases with wave height). However, the influence of the wave direction on the longshore sediment transport is sinusoidal (i.e., fluctuating around the angle of maximum transport) and, hence, not monotonically increasing or decreasing with wave angle. Therefore, the resolution in directional space appears to be more important than the resolution in wave height. For the Noordwijk case in particular, the importance of the directional bins is also related to the strong influence of the cross-shore distribution of the longshore sediment transport on the inter-annual bar morphology [10].
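The role of directional resolution can be illustrated with a simple numerical sketch. It assumes a purely sinusoidal angle dependence, sin(2*theta), as a hypothetical proxy for the longshore transport; this proxy and the function names are illustrative assumptions, not the transport computed by the model. For angles spread uniformly over a bin, the sketch compares the proxy evaluated at the mean angle with the mean of the proxy over the bin.

```python
import math

def proxy(theta_deg):
    # Hypothetical sinusoidal angle dependence of the longshore transport.
    return math.sin(2.0 * math.radians(theta_deg))

def bin_error(lo, hi, n=1000):
    """|proxy(mean angle) - mean proxy| for angles uniform over [lo, hi]."""
    thetas = [lo + (hi - lo) * (i + 0.5) / n for i in range(n)]
    mean_q = sum(proxy(t) for t in thetas) / n
    return abs(proxy(0.5 * (lo + hi)) - mean_q)

def mean_error(n_bins, lo=0.0, hi=90.0):
    """Average representation error when [lo, hi] is split into n_bins."""
    width = (hi - lo) / n_bins
    return sum(bin_error(lo + j * width, lo + (j + 1) * width)
               for j in range(n_bins)) / n_bins

for n_bins in (1, 2, 4):
    print(n_bins, round(mean_error(n_bins), 4))
```

With a single bin spanning 0 to 90 degrees, the representative angle of 45 degrees gives the maximum of the proxy, although the bin average is only 2/pi of that value. Splitting the same range into four bins reduces this mismatch by more than an order of magnitude in this sketch.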

Sequencing of Wave Conditions
Although the random and MCMC sequencing methods generally have similar skill scores, visual inspection of the temporal evolution of the cross-shore profile indicated that random sequencing yields the best results. The MC and MCMCR methods perform poorly because they tend to aggregate calm conditions at the beginning of the simulation and energetic conditions at the end. This aggregation occurs because the highest Markov Chain transition probabilities are those of remaining in the same state. Therefore, the methods that introduce randomness in the sequencing tend to better resemble the natural variations in the wave climate for the case study.
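The persistence that drives this aggregation can be seen in a minimal sketch. The class-labelled series below is hypothetical, and the random sequencing is assumed to shuffle the k conditions independently within each repetition of the reduced climate; neither is taken from the study's implementation.

```python
import random

def transition_matrix(states, k):
    """Row-normalized counts of transitions between consecutive states."""
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row]
            for row in counts]

def random_sequence(k, n_repeats, seed=0):
    """Each repetition contains every condition once, in shuffled order."""
    rng = random.Random(seed)
    seq = []
    for _ in range(n_repeats):
        block = list(range(k))
        rng.shuffle(block)
        seq.extend(block)
    return seq

# Hypothetical series of wave classes (0 = calm, 1 = moderate, 2 = energetic).
observed = [0] * 6 + [1] * 4 + [0] * 5 + [2] * 3 + [1] * 4 + [0] * 6
P = transition_matrix(observed, 3)
for row in P:
    print([round(p, 2) for p in row])

seq = random_sequence(k=12, n_repeats=4)
```

Because wave conditions persist over consecutive observations, the diagonal of `P` dominates every row, so a sequence that always follows the most probable transition tends to stay in the same state. The shuffled sequence has no such memory.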

Duration of Reduced Wave Climate
The influence of the duration of the reduced wave climate was further investigated by comparing the STBM with random sequencing for k = 12 and k = 32A. When applying the reduced wave climate on a smaller timescale (e.g., 134 days), increasing the number of representative cases does not result in much improvement of the morphological evolution (see Figure 9). Although the morphological evolution is more in line with the brute force simulation for k = 32A than for k = 12 with the same T_wc, the number of transitions is much higher for a higher number of wave conditions (see Table 6). Hence, the slight performance improvement has relatively large computational costs. Therefore, k = 12 seems to be more appropriate than k = 32A. Furthermore, the morphological evolution of the nearshore bars is better represented by a reduced climate with fewer representative wave conditions applied for a shorter duration (k = 12 with T_wc = 134 days) than with more representative wave conditions applied for a longer duration (k = 32A with T_wc = 301 days). Therefore, the wave climate duration is found to be more important for the performance than the number of wave conditions in the reduced climate.
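The transition counts in Table 6 are consistent with a simple bookkeeping rule, sketched below. The rule is inferred from the tabulated values and is not stated explicitly in the text: NR is taken to be the number of repetitions of the reduced climate within the 1205-day simulation, and every change from one wave condition to the next counts as one transition. The function names are hypothetical.

```python
TOTAL_DAYS = 1205  # duration of the brute force simulation

def n_repetitions(t_wc_days, total_days=TOTAL_DAYS):
    # Assumed: NR is the total duration divided by T_wc, rounded.
    return round(total_days / t_wc_days)

def n_transitions(k, t_wc_days):
    # Assumed: k conditions per repetition, one transition between
    # each pair of consecutive conditions over the whole simulation.
    return k * n_repetitions(t_wc_days) - 1

for k, t_wc in [(12, 301), (32, 301), (12, 134), (32, 134)]:
    print(k, t_wc, n_repetitions(t_wc), n_transitions(k, t_wc))
```

Under these assumptions, halving T_wc roughly doubles the number of transitions for a given k, which is the computational cost weighed against the skill improvement above.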

Validation with Anmok Beach
The most promising input reduction setup for Noordwijk (i.e., STBM, k = 12, random sequencing, and T_wc = 134 days) was applied to Anmok beach in South Korea (Figure 1). The results show that the final profiles are well represented in amplitude, with slight errors in phase (see Figure 10). The replicate R1 presents small-scale undulations in the profile due to the coarse sediment characteristic of the profile. Despite these instabilities, the overall performance of the input reduction in Anmok is not impaired. The skill scores for Anmok are lower than those for Noordwijk. The average skill score is 0.64, with maximum and minimum skill scores of 0.75 and 0.57, respectively. This could be due to the shorter timescale analyzed in Anmok (T_R = 0.8 years): at the beginning of the simulation, the morphological variation is small, causing larger errors and very low skill scores (see Equation (1)), which weigh more heavily on the cumulative skill score of shorter periods [12]. Nevertheless, the results of the validation were considered satisfactory.

Discussion
A good selection of representative wave conditions for morphology should balance mild and energetic conditions as well as directional variability, while prioritizing directions that contribute the most to the sediment transport. In our assessment, we found that binning methods perform better than clustering methods. Among the binning methods, the ones that split the wave conditions into bins with equal weight performed better than the ones that split the wave conditions arbitrarily into bins, as long as the reduced wave climate is not very detailed (e.g., k < 16, consistent with [2]). On the other hand, [3] found that the EFM performed better than the CERC (Coastal Engineering Research Center) method proposed by them, which is analogous to the STBM used in this study but with the longshore sediment transport calculated by the CERC formula [24]. The difference in the findings of [3] and the present study could be related to the incident wave angle that is not considered in the CERC formula. Therefore, positive and negative transport contributions can cancel each other out. Yet, the EFM is the second-best method in this study. Note that in the STBM, sediment transport rates obtained from the brute force simulations were used as input, which are commonly not available. In that case, we recommend the use of sediment transport formulas considering different coast angles or other proxies, such as the energy flux. The clustering methods did not perform well because of their high dependency on the most recurrent observations. For Noordwijk, these were the mild wave conditions, resulting in a lack of energetic conditions. However, for very energetic coasts where mild conditions do not dominate over energetic conditions, the clustering methods may perform better.
The sequencing of wave conditions influences the morphological response of the simulations considerably. Random sequencing showed the best morphological response for the case studies in this study, since randomly ordered reduced wave time series retained a higher variability than the other methods, which use statistical information through Markov Chain probabilities. This is in agreement with the results of [2], who found that randomly ordered synthetic time series performed better than systematic sequencing of wave conditions, such as ascending or descending wave heights combined with wave angles towards positive and negative directions. Despite its good performance, random sequencing has the drawback that it is completely random, without any user control. Furthermore, random sequencing has its limitations, since it highly depends on the initial condition (i.e., the initial profile). For instance, a winter profile evolves differently than a summer profile under the same sequence of wave conditions. Since in Noordwijk and Anmok the bar dynamics do not seem to be related to specific storm events, the chronology is of limited relevance, and random sequencing can be applied. However, [2] found that in Hasaki, where chronology is important, random ordering of synthetic time series did not represent the inter-annual bar evolution very well. For such cases, other sequencing methods may perform better.
Regarding the number of representative wave conditions in the reduced wave climate, we found that k = 12 is a suitable number of representative wave conditions, given that the duration of the wave climate is of the order of 100 days. This aligns with [3], who indicated k = 12 as an optimal quantity. Additionally, k = 12 is in agreement with the commonly applied wave climates in morphodynamic simulations, which typically make use of about 10 wave conditions [5,25]. The wave climate duration turned out to be a very important aspect of wave climate reduction for morphological applications. This agrees with [2]. The same wave condition applied on different timescales will likely give rise to distinct morphology. Generally, decreasing the reduced wave climate duration improves the performance. The present analysis has as the lower limit. A further decrease in T_wc would not necessarily improve the skill of the reduced models even further. There is a lower limit of the wave climate duration associated with the response of the morphology to the hydrodynamic forcing. If the duration of the wave climate is too short, there is not enough time for the morphology to adjust to the hydrodynamic conditions. This is not observed in the results of this study because the simulated durations of the wave climate were well above this lower limit, which is around 10 to 20 days according to [2]. Additionally, a further decrease in T_wc implies a loss of applicability, since the number of transitions and, thus, the computational time increase when the duration of the reduced wave climate decreases.
In this study, we used the Unibest-TC profile model because its short computational time allowed us to run the considerable number of simulations required by our methodology. In reality, brute force computations are feasible with this model. Therefore, input reduction techniques are, strictly speaking, not necessary. Moreover, in Noordwijk, the alongshore variability is small, so a 1D domain is acceptable. At Anmok beach, alongshore variability is present and affects the local morphodynamics. However, the changes in the alongshore positions of the crescentic bars are very slow compared to the cross-shore evolution, allowing for a 1D approach. For larger timescales, this is not expected to be valid. The findings of this study can still be used as initial guidelines when performing input reduction with different models and domains, even for areas where alongshore variability is important.

Conclusions
In this paper, the performance of 36 variants of wave input reduction (IR) methods in modeling the interannual sandbar evolution was evaluated. The selection of the proper settings for wave-IR is a balance between the resemblance of the natural variability of the full dataset and computational effort. This study provided insights into the methods and settings that are most promising to reduce computational effort at limited performance loss. The results showed that the Sediment Transport Bins method has the best performance of all 36 methods. Generally, binning methods perform better

Figure 1. Location of the simulated beach profile and the wave observations used as forcing of the models of the study area, Noordwijk, the Netherlands (a) and the validation area, Anmok, South Korea (b). Map data: Google.

Figure 2. Flowchart of the research outline of this study; #wc is the number of wave conditions.

Figure 3. Examples of a selection of Fixed Bins (a), Energy Flux (b), and Sediment Transport Bins (c) methods with 12 representative wave conditions. The red crosses are the representative wave conditions, and the small dots are the wave data. The colors and black lines indicate the bins.

Figure 4. Examples of a selection of Crisp k-means (a), Fuzzy k-means (b), and K-harmonic (c) methods with 12 representative wave conditions. The red crosses are the representative wave conditions, and the small dots are the wave data. The colors indicate the clusters. The black lines represent the path followed by the centroids during the iterative process.

Figure 5. Wave height time-series of the reduced wave climate by the Sediment Transport Bins method (STBM) with a duration of 301 days sequenced by random (a), Markov Chain (b), Monte Carlo Markov Chain (c), and Monte Carlo Markov Chain with repetition (d) methods with k = 12. The colored lines represent the five repetitions. The dashed line marks the duration of each reduced wave climate.

Figure 6. Average cumulative skill score of the five random sequences of the simulated methods.

Figure 7. Initial and final profiles with the reduced wave climate of methods Fixed Bins method 3 (FBM3) (a), Fuzzy K-Means method 8 (FKM8) (b), Energy Flux method (EFM) (c), and Sediment Transport Bins method (STBM) (d) with k = 12 and T_wc = 301 days. The black lines are the initial (dashed) and final (solid) profiles from the brute force model. The colored lines are the final profiles of each random sequence.

Figure 8. Average cumulative skill score of the five replicates of the different sequencing and number of cases of the Sediment Transport Bins method.

Figure 9. Initial and final profiles with the reduced wave climate of the STBM, random sequencing, k = 12 and T_wc = 301 days (a), k = 32A and T_wc = 301 days (b), k = 12 and T_wc = 134 days (c), and k = 32A and T_wc = 134 days (d). The black lines are the initial (dashed) and final (solid) profiles from the brute force model. The colored lines are the final profiles of each random sequence.

Table 6. Number of cases, duration of wave climate, and number of transitions.
Number of Cases (k) | Duration of Wave Climate (T_wc, days) | Number of Transitions (NoT)
k = 12 | T_wc = 301 (NR = 4) | NoT = 47
k = 32A | T_wc = 301 (NR = 4) | NoT = 127
k = 12 | T_wc = 134 (NR = 9) | NoT = 107
k = 32A | T_wc = 134 (NR = 9) | NoT = 287

Figure 10. Initial and final profiles with the reduced wave climate of the STBM, random sequencing, k = 12 and T_wc = 98 days. The black lines are the initial (dashed) and final (solid) profiles from the brute force model. The colored lines are the final profiles of each random sequence.

Table 2. Calibration parameters of the brute force model of Anmok beach.

Table 3. Simulated number of wave conditions.