An Optimized-Parameter Spectral Clustering Approach to Coherent Structure Detection in Geophysical Flows

: In Lagrangian dynamics, the detection of coherent clusters can help understand the organization of transport by identifying regions with coherent trajectory patterns. Many clustering algorithms, however, rely on user-input parameters, requiring a priori knowledge about the ﬂow and making the outcome subjective. Building on the conventional spectral clustering method of Hadjighasem et al. (2016), a new optimized-parameter spectral clustering approach is developed that automatically identiﬁes optimal parameters within pre-deﬁned ranges. A noise-based metric for quantifying the coherence of the resulting coherent clusters is also introduced. The optimized-parameter spectral clustering is applied to two benchmark analytical ﬂows, the Bickley Jet and the asymmetric Dufﬁng oscillator, and to a realistic, numerically generated oceanic coastal ﬂow. In the latter case, the identiﬁed model-based clusters are tested using observed trajectories of real drifters. In all examples, our approach succeeded in performing the partition of the domain into coherent clusters with minimal inter-cluster similarity and maximum intra-cluster similarity. For the coastal ﬂow, the resulting coherent clusters are qualitatively similar over the same phase of the tide on different days and even different years, whereas coherent clusters for the opposite tidal phase are qualitatively different.


Introduction
In geophysical fluid flows, the Lagrangian approach, where one follows fluid parcels as they move through time and space, provides a natural perspective to study the dynamics of motion and the patterns of transport [1][2][3][4][5]. Even simple, time-periodic velocity fields can generate complicated trajectories and chaotic motion [6,7]. It thus comes as no surprise that Lagrangian transport in realistic oceanic flows, which are aperiodic, can be incredibly complex.
The Lagrangian structures that organize transport and govern coherent trajectory patterns are referred to as Lagrangian Coherent Structures (LCS), a term introduced by Haller and Yuan [8]. These structures act as the hidden skeleton of a flow [9,10] that can be uncovered using techniques from the dynamical systems theory [8,11,12]. Recent review papers of LCS detection methods include [13][14][15][16].
Cluster-based LCS methods take root in the field of unsupervised machine learning. Several algorithms have been adapted recently to LCS detection in geophysical flows [26][27][28]. A recurring concern with these methods is that the results rely on a set of user-input parameters, making the outcome subjective. Hadjighasem et al. [15] present a comparison of the number of parameters required by several clustering methods to construct a Lagrangian field and show how spectral clustering requires the least amount of user inputs. The authors describe how to "choose a reasonable set of parameters for each method" to yield "the most favorable outcome". They describe a robust outcome as an outcome where "small variations in the parameters do not lead to drastic changes in the outcome", which is a fair way to describe robustness. The reasonability of the parameters and the favorability of the outcome, however, suggest that in the end, the user will have to make decisions about the outcomes of the analyses that could be arbitrary and subjective. In this paper, we append the conventional spectral clustering method of HA16 (explained in Sections 2.1 and 2.2) with an algorithm for the automatic identification of the optimal parameters within pre-defined parameter ranges (Section 2.4). Our optimized-parameter spectral clustering method is still subject to all the usual limitations of the spectral approaches, including the need for a detectable spectral gap [15,29] and a tendency to favor clusters of equal size [30]. In addition to the parameter optimization algorithm for spectral clustering, we also introduce a noise-based metric for quantifying the coherence of the resulting coherent clusters (Section 2.4), which allows comparing the clusters to each other within the same flow, as well as inter-comparing the clusters between different flows. The optimized-parameter spectral clustering approach is then applied in Section 3 to two idealized analytical flows, the Bickley Jet and the asymmetric Duffing oscillator, and to a realistic oceanic coastal flow. In the last example, the identified clusters are tested using observed trajectories of real drifters. The results are discussed in Section 4.

Materials and Methods
Consider an unsteady flow that is known, either from measurements or simulations, over a finite-time window from t 0 to t f within a finite spatial domain. Our goal is to separate the domain into coherent clusters according to their Lagrangian behavior. Please note that the time window and the domain size should be considered part of the problem set up rather than parameters of the cluster detection method.
Below we review the conventional spectral clustering method, discuss its shortcomings associated with the need for user-input parameters, and then introduce a new and improved optimized-parameter spectral clustering method. We also review the Finite-Time-Lyapunov Exponents and Poincaré section techniques, which will be used in Section 3 for comparisons with the spectral clusters.

Conventional Spectral Clustering Method
The spectral clustering approach to LCS detection by [27], hereinafter referred to as HA16, partitions the dataset according to similarity between trajectories, such that intracluster similarity is maximized, and inter-cluster similarity is minimized. The similarity is determined from the pairwise distances between trajectories. Reviews of spectral clustering can be found in [29,31]. Below we briefly summarize the method. Let x i k = x i (t k ) and x j k = x j (t k ) be two trajectories in R d , where d = [2,3] for two-or three-dimensional flows, respectively, with positions at n discrete times from t 0 to t m−1 = T. The time-averaged distance r ij between trajectories is, following HA16: with | · | denoting the spatial Euclidean norm and where it is assumed that during each time step dt = t n+1 − t n , the distance is constant and equal to the mean between the values at endpoints. Next, convert the pairwise distances r ij into the similarity weights w ij : From the weights w ij , build the similarity graph G = (V, E, W), with nodes V = {x 1 , x 2 , . . . }, edges E = {e ij } ∈ V × V between nodes x i and x j , and similarity matrix W = {w ij } ∈ R n×n that associates the weights w ij to the edges e ij . Next, define the sparsification radius r and sparsify W by removing all weights The advantage of sparsifying W is two-fold: first, it saves computational efforts by removing a large number of entries; second, it minimizes the influences of entries with insignificant similarities.
The optimal partition of the domain is one with maximal intra-cluster similarity and minimal inter-cluster similarity; the ideal case would be one in which the dissimilarity between subsets is absolute, or the similarity is zero. The partition of the domain into k subsets that maximize the intra-cluster similarity and minimize the inter-cluster similarity can be achieved by minimizing the Ncut function, Ncut(V 1 , . . . , where V i is a subset of the nodes within the domain,V i is the complement of V i and vol(V) = ∑ i∈V deg(x i ) is the volume of set V, with deg(x i ) = ∑ n j=1 w ij . For large datasets, however, minimizing Ncut is computationally expensive. Therefore, an approximate solution is often used, which can be efficiently constructed with the eigenvectors v i of an eigenvalue problem for the graph Laplacian L = D − W (see [29,31] for derivations of this result): where the diagonal degree matrix, D, contains degrees, deg(x i ), for all nodes x i . The solutions v i s are the generalized eigenvectors associated with the unnormalized graph Laplacian L from Equation (4). As detailed in [29], it is often advantageous to use a normalized Laplacian: normalized spectral clustering performs better in partitioning the graph such that within-cluster similarity is maximized, especially for irregular graphs with vertices of highly varying degrees. We thus use the normalized graph Laplacian following the derivation in HA16. Hereinafter, the v i 's refer to the eigenvectors of the normalized graph Laplacian L N . From perturbation theory and the Davis-Kahan theorem, which bounds the differences between eigenspaces of symmetric matrices under perturbations, the eigenvectors v i 's are close to the vectors from the exact Ncut partition. The order of eigenvectors and eigenvalues is meaningful and the first k eigenvectors v i s can be used as approximate solutions for the k connected components of the graph. The optimal number of eigenvectors that are kept is retrieved by the heuristic argument of the spectral eigengap, i.e., the largest gap between successive eigenvalues λ i of L N . This number k gap equals the number of eigenvalues before the eigengap. The resulting partition separates the k gap connected components from the rest of the domain, which is assigned to the "incoherent background" subset. The wider the eigengap, the closer the approximate solution is to the exact solution of the graph cut minimization problem and the better the partition of the domain will be.
The final step is to extract the coherent clusters from the weighted graph data. To partition N trajectories, or nodes, into K = k gap + 1 clusters (i.e., into k gap coherent clusters and 1 incoherent background cluster), the K-Means algorithm ( [32], summarized below) is applied to the matrix N nodes), and the matrix U has size N-by-k gap . U contains information about the connectivity between nodes within the connected part of the graph. The output of K-Means is an N-by-1 vector that assigns a cluster index of each node. The clusters are mutually exclusive, i.e., each node only belongs to 1 cluster. For this reason, the K-Means is sometimes referred to as Hard C-Means, in contrast to the Fuzzy C-Means clustering, where a node can be assigned to different clusters with some probability.
The K-means iterative algorithm is applied to U as follows: 1. randomly assign each node to a cluster; 2. compute the cluster centers for all clusters: where the first summation goes over all nodes n belonging to cluster k and where N k is the number of nodes in cluster k; 3. compute the sum of inter-cluster distances from the corresponding cluster centers: 4. compute distances from each node n = 1, . . . , N to all cluster centers k = 1, . . . , K and assign each node to the cluster with the closest center; 5. repeat steps 2-4 until the cluster assignment does not change or until the improvement in Dist is below a tolerance threshold. Then, output the cluster assignment.
In the spectral clustering approach, the K-Means algorithm is not applied to the flow trajectories, but to the eigenvectors of the graph Laplacian L N . The clustering of spectral vectors as opposed to a trajectory dataset is the key difference with other clustering methods for LCS detection, including the Fuzzy C-Means approach by [26].

Challenges of Conventional Spectral Clustering
One important shortcoming of the conventional spectral clustering is the need for userinput parameters, including the sparsification radius r in Equation (1) and the percentage of sparsification in W, both of which significantly affect the spectral clustering results. For r, HA16 advise to choose the r value which, first, sparsifies the weights matrix by 5-10% (so that 90-95% entries are removed during the sparsification step), and second, the optimal r should correspond to the largest eigengap. As illustrated in Figure 1, however, the largest eigengap is often achieved at an r value that contradicts the 5-10% sparsification rule of thumb, and in many problems corresponds to the largest r considered. Thus, in the end, the choice of r is left entirely to the subjective judgement of the user and relies on the a priori knowledge of the system under study.
Another user input is the distance function and, subsequently, how it impacts the diagonal elements w i,i in W. [31] use a Gaussian kernel for the distance between nodes, a commonly chosen function in spectral clustering [30], and choose an offset value for the diagonal of elements w i,i in W. HA16 use a different approach, where the distance function is described by Equation (2) and the offset is added to W before computing D. This offset value in the diagonal elements is set to a large constant, which is immaterial for unnormalized graph Laplacians. Normalized Laplacians are however more common and adding the offset for the adjacency matrix W before calculating D does affect the eigenvalues and eigenvectors. The reader should thus be mindful of the scalar chosen for this diagonal offset, as it can impact the spectral clustering results.
Very large values (towards 10 16 ) of the offset can introduce numerical errors due to the limits of double-precision. On the opposite, an offset value that is too small compared to the other w ij 's entries of W can yield erroneous results. Unlike r, the offset is not a real physical parameter, but rather a numerical manipulation to avoid infinity in the diagonal. A convergence in the spectral decomposition is expected for large values of the offset, as long as these scalar values are numerically reasonable. 'Large' values, however, will depend on the flow: for this reason, the proposed approach verifies the convergence of the results and sets the offset value as a coefficient related to the maximum w ij entry in W, as described in more detail below. . Three other values of r yielded a higher eigengap, however, as seen in panels (a,c,d). Moreover, to follow the rule of thumb of 5-10% sparsification, r = 4.5 in panel (d) should be chosen instead. This value fails to detect the individual vortices, instead grouping them in pairs. It does, however, detect the meandering jet as an individual coherent structure. Looking at the maximum eigengap, the value of r = 1.5 in panel (c) should be chosen.

Improved Optimized-Parameter Spectral Clustering
Here we propose an improvement to the conventional spectral clustering algorithm, which minimizes the need for user-input parameters. The optimal parameter choices are based on the notion of convergence (for w i,i ) and an argument involving the normalized spectral eigengap (for r). With w i,i and r obtained in a user-independent fashion within predefined ranges, the sparsification number is no longer needed. With these modifications, the only user input is the pre-defined range for parameters, within which the method automatically detects optimal parameter values. We also provide guidance for the choice of the parameter range that is most inclusive and worked well in all our examples.
The sparsification radius r defines the largest allowed distance between trajectories, because all weights corresponding to larger distances are removed during sparsification. Thus, r can be thought of as the parameter defining the size of the resulting clusters. It is tempting to use the same eigengap argument to define the optimal r, which was used to define the optimal number of clusters k gap , namely that the optimal r should yield the largest eigengap. The eigenvalues of the graph Laplacian, and thus the corresponding eigengap value, however, typically decrease with r, as more and more entries are removed from the weight matrix during sparsification. Thus, the r value that yields the largest eigengap is often the largest r considered. To overcome this issue and use the spectral gap arguments for choosing the optimal r, we introduce the normalized spectral eigengap by referencing the largest eigengap to the full dynamic range of the spectral values. Specifically, we define the normalized eigengap as the absolute value of the maximum eigengap divided by the difference between the largest and smallest eigenvalues, λ max − λ min : This is similar in spirit to the recently proposed Normalized Maximum Eigengap by [33], but in our approach the denominator is automatically calculated from the spectral decomposition and does not include any user-input coefficient. As normalization does not change the physical meaning of the eigengap, large values of the normalized eigengap still yield spectral clusters with high intra-cluster similarity and low inter-cluster similarity. The optimal r should result in clusters with higher/lower intra-/inter-cluster similarity than those for both locally smaller or larger r values. Thus, the optimal r values should correspond to local maxima in the normalized eigengap. If several local maxima are present, we suggest proceeding with identifying coherent clusters for all the corresponding r values. This will ensure that in flows with more than one dominant length scale, all coherent clusters of different sizes will be recovered. Since a reliable estimate of a coherent cluster requires at least several trajectories within it, the smallest r considered should be dictated by the resolution, i.e., by the number of trajectories released in the domain and the distance between neighboring trajectories. In all our simulations, the simulated trajectories are released on a regular rectangular grid spanning the domain and the trajectory grid spacing is the initial distance between neighboring simulated trajectories. As a rule of thumb, we propose starting the r search with a value 10 times larger than the trajectory grid spacing. At smaller values of r, the number of nodes within the r radius is very low, which can result in computations on subsets with new nodes only; such small clusters may be erroneous and numerically unstable. Finally, r values approaching the domain size will result in the nearly entire domain being assigned to one cluster, and while this partition has high intraand low inter-cluster similarity, it is trivial and not physically relevant. Choosing r values that correspond to the local maxima in normalized eigengap (and not the largest value of the normalized eigengap) avoids this problem. Although the normalized eigengaps can be also high for both very small and very large values of r, they are not peaks; peaks, rather, indicate that structures at the corresponding r value are more connected in comparison to the incoherent background than the structures at neighboring values of r. Thus, these are the structures that are of most interest to us.
We now describe the strategy for eliminating the need for choosing the user-input parameter w i,i . In practice, the offset value chosen for the diagonal of W impacts the spectral clustering results. Being the artificial parameter introduced for numerical stability considerations, convergence of results is expected for increasing values of w (HA16). However, while w should be substantially higher than the rest of the weights, excessively large ws result in numerical errors due to the limits of double-precision. Insufficiently large values of the offset will also yield erroneous results. Finally, the offset depends on the flow under study, and what is large for one system may be small for the other. For this reason, we propose to seek the optimal flow-dependent offset value of the form: max(w ij ) × 10 n with integer n, and pick n which corresponds to the smallest value at which convergence of the results, specifically, convergence of the normalized spectral eigengap, is achieved.
With w and r determined based on the normalized eigengap arguments as outlined above, and the number of clusters K defined by the number of eigenvalues before the eigengap (K = k gap + 1), the new spectral clustering protocol does not require any other user input to find the optimal partition of the domain. Note, however, that the user still needs to pre-define the range over which the algorithm will look for the optimal parameters, as well as to choose the parameter increment for the parameter sweeps, so our optimized-parameter method cannot be considered entirely parameter-free.
As with all spectral methods, the proposed optimized-parameter spectral clustering requires the existence of a detectable spectral gap. As detailed in [29], the larger the eigengap, the better the spectral clustering partition will be in terms of yielding large/small intra-/inter-cluster similarity. Note, however, that we do not impose an explicit threshold on how large the spectral gap should be. Instead, we simply define the normalized spectral gap as the largest gap between successive eigenvalues of the normalized eigenspectrum. Thus, our optimized-parameter spectral clustering method can be applied to any flow (except the degenerate case when the eigenspectrum is completely flat) and will identify a partition of the domain into coherent clusters based on the eigengap heuristic, but the values of the normalized eigengap should be taken into consideration when interpreting the resulting clusters and the associated partition of the domain. Note also that in addition to the values of the normalized eigengap, the proposed noise-based coherence metric (introduced below in Section 2.4) can also be used to quantify the robustness of the detected features and their sensitivity to numerical noise and observational or model uncertainty.

Noise-Based Cluster Coherence Metric
Once the spectral clustering analysis is conducted, how coherent are the resulting clusters? The eigenvalues of the graph Laplacian L N contain information about coherence of eigenvectors but not of spectral clusters, because there is no one-to-one correspondence between former and latter. The spectral eigengap value, on the other hand, contains information about how optimal the partition is in terms of minimizing/maximizing inter-/intracluster similarity, but it does not provide information about the coherence of individual clusters. In many applications, such as identifying an optimal field experiment design or a strategy for a search-and-rescue operation, it is the coherence of individual clusters that determines the drifter release strategy or the search pattern. To determine how coherent the clusters are, we define below one possible coherence metric that is based on the robustness of the clusters with respect to a small random perturbation. This applied numerical noise can be representative, for example, of uncertainties in ocean circulation models. Our metric is similar, in spirit, to the approach by [34]. The robustness to noise is computed as follows. For trajectories starting at t 0 inside a given cluster, randomly perturb their final positions at time t f , and advect trajectories backward in time to the start time t 0 . The percentage of trajectories that return within the boundaries of their cluster is the proposed coherence metric. In all numerical simulations, we use the noise magnitude equal to 1 100 th of the trajectory grid spacing (here, the grid spacing is the initial distance between neighboring trajectories, as defined in Section 2.3) but the results are similar for slightly larger or smaller perturbations. Essentially, our noise-based coherence metric favors coherent sets with large area-to-perimeter ratio and thus punishes small and filamentated clusters. Please note that because our coherence metric is not used to detect coherent clusters, it is not a parameter of our spectral clustering method. Rather, the coherence metric is computed after the clusters have been identified, and it allows us to interpret and inter-compare the robustness of the detected clusters with respect to noise.

Optimized-Parameter Spectral Clustering Algorithm Summary
The application of the optimized-parameter spectral clustering consists of 3 steps:

1.
r-sweep: compute the normalized eigengap for variable sparsification radii r, from r min = 10 × (trajectory grid spacing) to r max = (the domain size), keeping the fixed offset value (w = max(w ij ) × 10 7 worked well in all examples), and identify all local maxima. This choice of the upper limit, r max , is the most inclusive as it allows detecting clusters that are as large as the entire domain. This choice of the lower limit, r min , ensures that clusters contain at least 10 nodes (to avoid detecting of very small clusters that might be numerically unstable.) 2.
w-sweep: for all local maxima, verify the convergence in the normalized eigengap by varying w = max(w ij ) × 10 n , n = [1, . . . , 10]. In all our examples, convergence was observed at n = 7, but we cannot guarantee that this is true for all flows. If the normalized eigengap still varies between n = 9 and n = 10, increase n past the value of 10 to ensure convergence. 3.
identify coherent sets corresponding to all local eigengap maxima and compute noise-based coherence metrics for the resulting clusters.

Finite-Time Lyapunov Exponents and Poincaré Sections
In the next section, we also use Finite-Time Lyapunov Exponents (FTLE) and Poincaré sections to provide insight about the flow and to compare with the coherent clusters obtained using the optimized-parameter spectral clustering. For completeness, below we briefly describe both methods.
Poincaré section is a technique that allows mapping the chaotic and non-chaotic (regular) regions of the domain in time-periodic flows by stroboscopically sampling trajectories after each period of the perturbation. On the Poincaré section, regular trajectories appear as discretely sampled smooth curves, and chaotic trajectories appear as clouds of dots covering finite areas.
The FTLE approach [8,14,17] is probably the most widely used LCS detection method due to its intuitive nature and numerical robustness. FTLE fields can be computed by the differentiation of the flow map, F t t 0 := x j (t 0 , x 0 ; t), obtained from numerical trajectories x j (x 0 , t 0 ; t) that start at x 0 and advected by the flow over a finite integration time t ∈ [t 0 ; t 1 ]. Specifically, the gradient of the flow map is used to compute the Cauchy-Green strain tensor, C

Results
In this section, the new optimized-parameter spectral clustering method is applied to three flows with increasing complexity: the analytically prescribed Bickley Jet (with coherent sets of the same size), the analytically prescribed asymmetric Duffing oscillator (with coherent sets of different sizes), and a numerically generated realistic coastal flow from the ocean circulation model MSEAS. In the latter case, spectral clusters from a model are compared against trajectories of real drifters.

Bickley Jet
The Bickley Jet flow consists of a zonal jet on which 2 traveling Rossby waves are superimposed. Following [35], we use the streamfunction where U 0 = 62.66 ms −1 is the velocity at the jet core; L = 1770 km is the characteristic jet width scale; k n = 2n r e are wavenumbers with r e the Earth radius, c 1 = 0.205U 0 and c 2 = 0.461U 0 are wave phase speeds; and 1 = 0.15 and 2 = 0.30 are wave amplitudes.
In the reference frame moving with c 2 , the flow consists of a steady background and a time-periodic perturbation: with The Poincaré map (computed over 1000 perturbation periods) and the forward-/backward-FTLEs (computed over 30 perturbation periods) for this flow are shown in Figure 2a. The regular (i.e., non-chaotic) meandering jet separates two chaotic zones to the north and south, with 3 regular vortices embedded into each chaotic zone. Between the vortices there exist 3 hyperbolic trajectories with stable and unstable manifolds that form heteroclinic tangles. HA16 also used the Bickley Jet as a benchmark flow but with an extra (3rd) wave with 3 = 0.0075 and c 3 = c 2 + ( (13). We chose the 2-wave Bickley Jet for the benefit of comparing spectral clusters with the Poincaré section. Following the proposed optimized-parameter spectral clustering approach, we performed sweeps in the sparsification radius r (from 0.5 to 4.25 with an incremental step of 0.25, and with finer steps around the detected peaks) and sweeps in the offset coefficient w (w = max {w ij } × 10 n with integer n from 1 to 10). The resulting normalized eigengap shows 3 local maxima in r (0.90, 1.25 and 2.0), yielding K = 6 clusters in all 3 cases. For all r, the normalized eigengap converges at n = 7. (See Figure A1a,b in Appendix A.1) The coherent clusters, color-coded by their noise-based coherence metric, for all 3 peaks in r are shown in Figure 3. In all cases, the method successfully detected 6 vortices centered at each of the 6 regular islands in the Poincaré map. The sizes of the detected clusters grow with r, which is fully consistent with our interpretation of r as the parameter responsible for the cluster size. For r = 0.9 and 1.25, vortices are smaller and less filamented than for r = 2. In the latter case, each vortex contains one long and narrow filament. Note, however, that when advected to the final time t f , identified coherent vortices do not get filamented any stronger than at t 0 , whereas for an arbitrary subset, the degree of filamentation generally increases with time. The noise-based coherence metric suggests high coherence (>90%) for all the detected coherent clusters, as well as for the incoherent background. The coherence is lower for smallest clusters (corresponding to r = 0.9) due to their smaller area-to-perimeter ratios. For r = 0.9 and 1.25, the coherence metric for the incoherent background exceeds that of the 6 vortices and nearly reaches 100%, but at r = 2.0 it drops to a value that is below that for the vortices. This is likely due to a combination of the filamentation and smaller area-to-perimeter ratio for the incoherent background at r = 2.0. Advecting the clusters beyond the final time t f of the integration window shows that the cluster with the lowest coherence metrics (cluster 3, 97.36%) starts deteriorating before the other clusters, as early as 40.5 days, whereas cluster 5, with the highest coherence metrics of 98.79%, starts deteriorating around 46.5 days.

Asymmetric Duffing Oscillator
To test the method on a system with coherent sets of different sizes, we use an asymmetric Duffing oscillator. The Duffing oscillator is another commonly used benchmark flow that consists of two gyres with the same sign of rotation located on opposite sides of the hyperbolic point at the origin [36]. The classic Duffing oscillator has two identical gyres of same size. Here, we use a longitudinal asymmetry to generate a right gyre that is smaller than the left gyre. When the gyres oscillate around their mean position periodically with time, the stable and unstable manifolds of the hyperbolic trajectory form a homoclinic tangle that induces chaos in a figure-eight-shaped chaotic region around the gyres. The asymmetric Duffing oscillator velocity is two-dimensional, incompressible and is described by a streamfunction Here, = 0.1, ω = 3π 2 , φ = π 4 and a = 0.5. The corresponding Poincaré map and FTLE fields are shown in Figure 4.  To identify coherent sets we again followed the optimized-parameter spectral clustering algorithm described in Section 2. The analysis revealed one peak in the normalized eigengap at r = [1.0] (Figure A2a in Appendix A.2) with k = 2 eigenvalues before the eigengap. Convergence of the normalized eigengap with respect to the increasing values of the offset parameter is achieved at w = max {w ij } × 10 7 (Figure A2b in Appendix A.2), the same value of n − 7 as for the Bickley Jet.
The resulting two clusters plus the incoherent background, color-coded by their noisebased coherence metrics, are shown in Figure 5. As expected, the clusters correspond to the two gyres, the larger gyre on the left and smaller on the right. Both coherent clusters, but especially the right cluster, in this example are more filamented than for the Bickley Jet flow. Nevertheless, the cluster coherence is still high, about 92 and 97%, which is above that of the incoherent background (about 86%). Interestingly, at initial time t 0 ,the filaments are approximately aligned with the stable manifolds (black curve in Figure 5, left), outlining the turnstile lobes in such a manner that when advected forward to t f , filamentation of the clusters does not increase. Overall, in this example the optimized-parameter spectral clustering performed well at identifying coherent clusters of slightly different sizes and partitioning the domain into subsets with high intra-and low inter-cluster similarity. The application of the optimized-parameter spectral clustering to the flow where cluster sizes are very different is shown in Appendix B.

Geophysical Example: Flow Around No Man's Land Island
In 2017 and 2018, two field experiments took place in coastal waters south of Martha's Vineyard, MA as part of the NSF-funded Advanced Lagrangian Predictions for Hazards Assessments (ALPHA) project. The goal of the project was to develop and test Lagrangian methods for the prediction, mitigation and response to environmental hazards (for example, oil spills), as well as to aid in the search-and-rescue operations. The experiments consisted of releasing surface drifters targeting the LCS that were predicted in near-real time based on velocity forecasts from a high-resolution ocean circulation model.
Our specific area of interest was the region around a small uninhabited No Man's Land island located approximately 5 km south of the western end of Martha's Vineyard ( Figure 6). The depth of the channel between Martha's Vineyard and No Man's Land is about 10 m on average, with steep gradient on each side. East and west of No Man's Land, the depths increase to 25 m over a few kilometers. The flow south of Martha's Vineyard is strongly affected by wind and tides [37,38].
The drifters used in our study were the same as in [37], i.e., the Coastal Dynamics Experiment (CODE) type, also called Davis type, developed by [39] and manufactured by metOcean telematics. The model used was the primitive equation Multidisciplinary Simulation, Estimation, and Assimilation Systems (MSEAS) model developed at MIT [40,41]. MSEAS was configured with a two-way nesting: the domain around the continental shelf had a 600-m resolution and the domain around Martha's Vineyard had a 200-m resolution. The model was forced by the atmospheric NCEP flux forecasts and tidal forcing from the Oregon State University barotropic tide model adapted to the bathymetry around Martha's Vineyard. MSEAS was initialized with historical data from the National Marine Fisheries Service (NMFS) for conductivity, temperature and depth (CTD), and the sea surface temperature data provided by the Johns Hopkins University's Applied Physics Lab (JHU APL). About a week prior to each field experiment, hydrographic surveys were conducted in the area of interest south of Martha's Vineyard, and the collected CTD measurements were used to adjust the model boundary conditions with the observed ocean conditions. Details about surface forcing, and initial and boundary conditions for these runs can be found on the MSEAS website http://mseas.mit.edu/Sea_exercises/NSF_ALPHA.
MSEAS was re-run in a hindcast mode using the observed NCEP reanalysis wind forcing (instead of forecasted winds) after the end of the field experiments. The hindcast runs are qualitatively similar to the forecast runs in all respects. Quantitatively, MSEAS hindcasts for both 2017 and 2018 generally agree better with the observed real drifter motion than the forecasts, although the short-term strong-wind events (discussed in more detail below) that occurred during the 2018 field experiment and that were not well-represented by the NCEP reanalysis, led to the poorer agreement between model and drifters in 2018 compared to the 2017 experiment. For brevity, in this paper we only present the analyses of the MSEAS hindcasts, but the forecast-based results obtained in real time during the field experiments are qualitatively similar. For ease and speed of Lagrangian calculations, the TRACE web-based gateway [42] (http://transport.me.berkeley.edu/thredds/catalog/public/MIT-MSEAS/catalog.html) was used to compute trajectories of simulated drifters advected by MSEAS currents, as well as the FTLE fields. The trajectory integration time in this study was 6 h, which is an important timescale for a flow with a strong M2 tidal component, and also represents the upper time limit in the real person-lost-at-sea search-and-rescue operations.

2017 Experiment
On 14 August 2017, we released 14 drifters at 10 locations (i.e., triplets were released at 2 locations-station #4 from the west and the southernmost station in Figure 7a) in a circumvent pattern around No Man's Land. Deployment took 1.5 h, with the last drifter hitting water at 15:51. The deployment strategy targeted different coherent regions delineated by the separating LCSs. The 360-degree circumvent pattern around the island ensured that drifters would be deployed on opposite sides of any LCS that protrudes from any point on the island in any direction, thus mitigating uncertainties in the exact positions of model-based LCSs. Figure 7 shows the drifter deployment locations and 6-h long drifter trajectories, as well as the forward-and backward-FTLE fields and the coherent clusters identified using the optimized-parameter spectral clustering method. For a unidirectional flow that impinges on an island from any direction, a hyperbolic trajectory is expected to form on the side of the island facing the inflow and on the opposite side facing the outflow. From the former hyperbolic trajectory, a stable manifold is expected to protrude into the incoming flow, separating trajectories passing the island on opposite sides. The real oceanic flow is not unidirectional and, with a strong tidal component, changes significantly over the 6-h analysis window. Nevertheless, consistent with the expectations described above, forward FTLEs show a strong repelling ridge extending from No Man's Land across the eastern side of the channel toward Martha's Vineyard (Figure 7a). (Some indication of multiple backward-FTLE ridges is also seen on the western side of the No Man's Land (Figure 7b) but these are not as strong as the above-mentioned forward FTLE ridge.) As we go from west to east across the red forward FTLE ridge in Figure 7a, 4 deployment locations (black dots) lie to the west of it and 3 to the east, with the remaining 3 deployment locations further away from it on the southern side of the island. The drifters on opposite sides of this ridge are expected to have qualitatively different Lagrangian motion, and indeed, trajectories of the three drifters on the eastern side of this repelling ridge are significantly different from the rest in that they barely move over the subsequent 6 h and are by far the shortest. This is also consistent with the smaller FTLE values in this region (fainter red) indicating a more quiescent character of the flow. Multiple additional forward FTLE ridges are seen west and north of the No Man's Land, which lead to the complicated behavior of drifters in this quadrant. Specifically, out of the 6 drifters deployed there, the westernmost drifter did not move much, the 2 northernmost drifters strayed far north, and the easternmost triplet of drifters looped around and came back.
The coherent clusters obtained using the optimized-parameter spectral clustering method are shown in Figure 7c,d. The sweep with increasing r (Figure A3 in Appendix A.3) revealed 2 peaks in the normalized eigengap-at 0.0125 and 0.0425 deg (1.4 and 4.7 km). With respect to the offset, convergence in normalized eigengap was again achieved at w = max {w ij } × 10 7 for all r. The first peak at r = 1.4 km yielded two coherent clusters: a less coherent purple one (78.6% coherence metric) located in the dynamically active region inside the channel, and a more coherent blue one (97.8% coherence metric) in the most quiescent southeastern corner of the domain; the rest of the domain was assigned to the green incoherent background (which was in fact very coherent with >99% coherence metric). The purple cluster is roughly delineated by the 2 forward FTLE ridges-a stronger one on the east and a weaker one on the west. When advected forward in time for 6 h, the purple cluster moves northward and shrinks in area, suggesting that convergence of the surface flow could be an important physical mechanism responsible for the existence of this coherent cluster. The second peak, at r = 4.7 km, yielded one large coherent cluster (97.5% coherence) occupying most of the quiescent eastern part of the domain, with the rest of the domain assigned to the incoherent background (again with >99% coherence).
Comparing the motion of real drifters against the model-based spectral clusters, the purple cluster in the channel did not hold the triplet of drifters that were released there. After initially heading northwest with the shrinking purple cluster, all 3 drifters looped around and headed back southeast, ending up in the green incoherent background not far from their release location after 6 h. This is most likely due to a combination of the uncertainties in the numerically simulated ocean currents (see Figure A10 in Appendix C.1 for the differences between the real and simulated drifters), the most dynamic character of the flow in this region with high sensitivity to the drifters deployment locations, and the fact that the purple cluster was the least coherent out of all clusters (only 78% coherent) and thus is most sensitive to noise. All other drifters were deployed in the green region in Figure 7c, and all stayed there after 6 h. With respect to the clusters in Figure 7d, 4 drifters were released within the blue coherent cluster and all 4 stayed there. The rest of the drifters were deployed in the green background region; 7 stayed there, whereas one triplet switched from green to blue.

2018 Experiment
The 2018 No Man's Land experiment took place on 7 August. A similar circumvent deployment route around No Man's Land was used, but with 18 drifters, instead of 14 as in 2017. The deployment was completed at 16:00 so that over the subsequent 6 h the drifters would sample the same part of the tidal cycle as in 2017.
Intermittent eastward wind gusts of up to 25 m/s started to occur shortly after the drifter deployment, from 16:00 until 20:00. (Measured wind time series for the Martha's Vineyard Airport station for 7/8/2018 can be found on The Weather Underground website). These wind gusts were absent in the NCEP wind forecasts and the corresponding MSEAS model forecasts, and were under-represented in the NCEP hindcast wind reanalysis (due to temporal intermittency and spatial variation of winds around Martha's Vineyard and No Man's Land) and the corresponding MSEAS hindcast currents. The wind gusts strongly affected the drifter trajectories, caused beaching of 2 drifters, and led to the generally poor agreement between the real and simulated drifter trajectories over the first 6 h of the experiment ( Figure A11 in Appendix C.2). Agreement between the real and simulated drifters had improved after the end of the wind gust period ( Figure A12), but the 6-h time interval from 20:00 on August 7 until 02:00 on August 8 corresponds to a different part of the tidal cycle and thus cannot be directly compared to the 2017 results. Analysis of the same tidal phase of the next tidal cycle, i.e., 04:00-10:00 on August 8, is possible but by then all drifters had moved east of No Man's Land ( Figure A13), and the initial elliptical deployment pattern had been distorted into a geometrical configuration that was sub-optimal for testing the agreement with spectral clusters. Thus, the comparison of the 2018 spectral clusters with real drifters and with the results from the 2017 experiment is challenging, no matter what time window is chosen. Here, we present the analysis for 3 different time windows, and we learn something from each. At 16:00, the first peak revealed 3 clusters: a cluster in the channel and two clusters to the south-southwest of No Man's ( Figure 8). The channel cluster is qualitatively similar to the 2017 purple cluster. The rest of the domain is assigned to the green around-cluster background. When advected forward over 6 h, the clusters to the south-southwest of No Man's Land moves anticyclonically around the island and into the channel, whereas the purple channel cluster moves northward, hugging the tip of Martha's Vineyard and shrinking in size, which is again qualitatively similar to 2017. The second peak revealed a large cluster in the center-eastern part of the domain that encompasses the dark green and turquoise coherent clusters from the first peak, No Man's Land and parts of the purple cluster. Comparing the spectral clusters for the 16:00-22:00 time window against real drifters, the agreement is poor as all drifters released within a cluster ended up in the incoherent background, very likely due to the unresolved wind gusts in the model.   The 04:00-10:00 time window (Figure 8) corresponded to a similar phase of the tidal cycle and each peak in r yielded results that had many qualitative similarities to the 16:00-22:00 time window. The first peak again revealed a purple channel cluster and a second, turquoise, cluster south of No Man's Land. After 6 h, the channel cluster moves northward around Martha's Vineyard and the turquoise cluster moves anticyclonically around No Man's Land and into the channel. The second peak revealed a large, highly coherent cluster shown in dark teal that encompasses parts of both clusters from the first peak. In both cases, l7 drifters were initially located in the green inter-cluster background and 1 drifter started in the channel, in the purple cluster in (c) and the dark teal cluster in (d). All drifters stayed within their respective cluster, but since only 1 drifter was initially inside the purple cluster, the comparison might not be statistically robust.
Looking at the result over the different phase of the tide, from 20:00 to 04:00 UTC in Figure 9, we see a qualitatively different geometry both in the FTLE fields and spectral clusters. The optimized-parameter spectral clustering has 2 peaks in the normalized eigengap-at r = [0.0325, 0.0375]. Both have very large number of clusters (17 and 18, respectively), and both do not identify the inter-cluster background. This large number of clusters suggests that over this tidal period, no part of the domain is significantly different from the rest of the domain in terms of Lagrangian connectivity, unlike in all 3 other Martha's Vineyard cases that consistently show 1 to 3 coherent clusters embedded into the inter-cluster background. For the 20:00-04:00 time window, the comparison with real drifters showed mixed results. The four drifters released within the channel clusters in Figure 9a-d stayed within their assigned channel clusters, but otherwise, about half of the drifters left their assigned clusters by 04:00.

Discussion
A reliable method for identifying coherent clusters in oceanic flows is important for a variety of applications -from understanding of mixing and exchanges of biogeophysical tracers, to hazard mitigation and search-and-rescue operations. In studies of exchange processes, coherent clusters can help visualizing tracers that stay coherent over time. In hazard mitigation applications, clusters are useful for identifying areas where mitigation assets should be focused, and in the search-and-rescue operations, cluster geometries help narrow down on the optimal search-and-rescue pattern. The conventional spectral clustering algorithm has many of the desired qualities, yielding clusters that maximize inter-cluster similarity and intra-cluster differences of the underlying Lagrangian motion. Spectral clustering also has advantages over other techniques for LCS identification, such as FTLEs, in applications with sparse Lagrangian data. The major drawback of the conventional spectral clustering is the need for subjective user-defined choices for several method parameters, which requires a priori knowledge about the flow. The parameter choices vary between different flows and depend on flow characteristics such as intensity, spatial variability, and dominant spatio-temporal scales of the currents, which might not be known in real applications. Different parameter choices generally yield substantially different coherent clusters, which drastically complicates the applicability of this method.
In this paper, we describe a new optimized-parameter spectral clustering algorithm, which automatically identifies optimal parameters within pre-defined parameter ranges for a given flow(the pre-defined parameter range and the parameter increment for the sweeps still need to be specified by the user). The specific parameters of interest are the sparsification radius r, which is responsible for the sizes of the identified clusters, and the offset parameter w i,i , which is responsible for the numerical stability of the algorithm. Automated selection of optimal r and w i,i requires modifications of the conventional spectral clustering. Specifically, we introduce a normalized spectral eigengap, which allows inter-comparison between different r-choices, instead of the absolute eigengap used in the conventional method, which typically decreases with r. (As in the conventional spectral clustering, the optimal number of clusters is defined as the number of eigenvalues before the eigengap.) The optimal r value(s) is then identified as the value(s) that corresponds to the local maximum (or maxima) of the normalized eigengap. Specifying r in this manner ensures that for clusters of that particular size the inter-/intra-cluster similarity is largest/smallest compared to clusters that are either smaller and larger. Since many geophysical flows operate on multiple spatio-temporal scales, clusters of very different sizes could simultaneously exist within the same flow, and looking at all peaks in r ensures that we identify all of them. The second method parameter, w i,i , is chosen as the smallest w i,i at which convergence of the normalized eigengap is achieved. Finally, to evaluate the coherence of the clusters, we introduce a noise-based coherence metric that allows comparing clusters within the same flow or between different flows. This metric quantifies the relative robustness of the resulting clusters with respect to applied numerical noise.
The automatic detection of multi-scale features is a well-known problem for conventional graph cut methods, including the normalized cut (which lies at the core of the spectral clustering) that favors clusters of equal size [30]. When clusters are well separated from each other and their sizes are only slightly different, such as in the periodically forced pendulum example in HA16 or in our slightly asymmetric oscillator example, all clusters can be identified using a single r value. However, when the clusters are sufficiently different in size, such as in our strongly asymmetric Duffing oscillator example, different r values are needed to correctly identify all the coherent sets. Thus, in general, no single r can be guaranteed to identify all the clusters. By doing sweeps in r and identifying all the r values that correspond to the local maxima in the normalized eigengaps, we can overcome this challenge and identify all the underlying coherent sets across a wide range of spatial scales. This is particularly important in oceanic applications, in which very little is known a priori about the flow and the scales of the coherent features. Oceanic flows often have more than one dominant length scale, and thus, multiple coherent clusters with different sizes may coexist within the same flow. This was indeed the case for the coastal flow south of Martha's Vineyard. In all our oceanic examples, the optimized-parameter spectral clustering algorithm identified 2 optimal values of the sparsification radius, r, yielding 2 different partitions of the domain into coherent clusters with different sizes.
The optimized set of parameters, as well as the specific noise-based metric of coherence that we have proposed here, present one possible way to reduce the user-input parameter choices and quantify the coherence of the resulting clusters. We explained the logical arguments underlying our proposed optimized-parameter algorithm, and we illustrated that our method reliably yields parameters and clusters that are optimal according to our specific definition. The strength of our optimized-parameter spectral clustering method is that it allows applying the method to any flow with minimal prior knowledge about it, and that the method automatically identifies clusters with minimal user input, along with a coherence value for each cluster.
Although optimized-parameter spectral clustering requires less user input than the conventional spectral clustering of HA16, we cannot claim that our method is completely parameter-free. Being of spectral nature, our optimized-parameter spectral clustering is also subject to all the limitations typical of other spectral methods. In our approach, the user still needs to specify the upper and lower limits for the parameter range, as well as the increment in the parameter values for the sweep. The method will then automatically identify the optimal parameters, detect the clusters and quantify their coherence. Since less information is needed for specifying the parameter range than specifying the exact parameter value, our method is easier to use in applications to the largely unknown oceanic flows, and it comes one step closer towards a parameter-free coherent structure identification.
We have tested the optimized-parameter spectral clustering in 2 commonly used benchmark analytic flows-Bickley Jet and Duffing oscillator-and in a real-life oceanic application to a high-resolution model-based surface flow in the coastal region south of Martha's Vineyard, MA. The results are encouraging in that our method identified all known coherent clusters in both benchmark flows, and it identified reasonably looking clusters in the realistic oceanic example. Looking at the coherence of the clusters and the optimality of the resulting partition of the domain, both analytic flows had significantly larger values of normalized eigengaps (0.4-0.5) than the real oceanic flows (about 0.1), pointing to the poorer performance of the method in the latter examples. This is not surprising as both analytic flows have a stronger contrast in Lagrangian behavior between the coherent regions associated with the (more regular) gyre cores and the incoherent background that encompasses the well-mixed (chaotic) regions around the gyres, whereas in the realistic oceanic flows, this contrast between regular and chaotic motion is smeared.
In all examples, however, our method still yielded clusters with relatively high values of noise-based coherence (above 85%), suggesting that even for the flow south of Martha's Vineyard, the detected clusters were robust features of the flow that are not excessively sensitive to numerical noise and observational and model uncertainty. For the 2018 campaign, the 20:00-02:00 experiment exhibited particularly low gap ratios and the domain partitions were far from optimal, with numerous (≥17), small clusters adjacent to each other and an incoherent cluster that was indistinguishable from the coherent ones. This illustrates the limits of the method's applicability: although the method did produce a partition into clusters, the flows need a certain amount of dissimilarity for the resulting clusters to be meaningful. This is discussed in HA16 and their notion of coherence, which exists in the presence of an incoherent background.
In the Martha's Vineyard case study, since the flow south of the Vineyard has a strong tidal component, the resulting coherent clusters are qualitatively similar over the same phase of the tide on different days and even different years, whereas coherent clusters for the opposite tidal phase are qualitatively different. The strong dependence of the cluster geometry on the tidal phase has been also observed in tidally driven flow over an experiment in Scott Reef, Australia [43]. Comparing the model-based clusters south of Martha's Vineyard to the motion of real drifters, for most of the identified clusters, drifters deployed within a cluster stayed within the same cluster for 6 h, which was the time interval chosen for our analysis. Exceptions, i.e., clusters from which real drifters escaped in less than 6 h, include small clusters with lower coherence metric or clusters identified over the time interval with strong wind gusts that were not realistically represented in the numerical model of oceanic currents.
Comparing coherent clusters with FTLE fields reveals that cluster boundaries often have similarities with the forward-time FTLE ridges. This qualitative similarity is due to the repelling character of the forward FTLE ridges, which maximize separations between neighboring drifters and delineate regions of qualitatively different Lagrangian motion. In other cases, coherent clusters were identified in areas that shrink significantly over the subsequent 6 h, suggesting that these clusters might be dominated by the surface flow convergence. Finally, some of the coherent clusters were identified over the most quiescent regions with lowest FTLEs, which encompass groups of drifters that moved (and separated) less than others. Note, however, that because the spectral clustering method identifies clusters based on a specific criterion, i.e., by maximizing intra-/inter-cluster similarities/differences, one should not expect a one-to-one agreement between spectral clusters boundaries and FTLEs or other LCSs that use another underlying flow property. Thus, various complementary LCS techniques could be applied in concert to provide the most comprehensive view of the underlying Lagrangian transport.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Acknowledgments:
The authors acknowledge the contributions from the MSEAS team, who provided the ocean circulation models and the output numerical velocity fields, and Siavash Ameli for providing the TRACE web-based gateway. The authors thank the ALPHA project team for their help with field work.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:  In this case, the r-sweep, shown in Figure A8, shows three local maxima in the normalized eigengap. The resulting coherent clusters, color-coded by their noise-based coherence metric are shown in Figure A9. Each peak in r shows 2 coherent clusters centered at the left and right gyres, and an incoherent background filling the space between and around them. The sizes of the clusters are different for different r (consistent with our interpretation of r as the parameter defining the size of the resulting clusters), as is the amount of cluster filamentation. The peak at the smallest r = 1.125 correctly identifies the smaller right gyre, but pairs it with a left cluster that is smaller than the full extent of the left gyre. The peaks at r = 2.0 correctly identifies the full extent of the larger left gyre, but pair it with a right cluster that is very filamented. The peak at the largest r = 3.75 shows two clusters which are both too large and strongly filamented. The noise-based coherence metric can then be used to choose between the co-located clusters to pick one (out of three in this case) with the higher coherence, correctly yielding two clusters (the grey right cluster at r = 1.125 and the green left cluster at r = 2) corresponding to the full extent of the left and right gyres. Figure A8. Step 1 of the spectral clustering protocol for the asymmetric Duffing oscillator shown in Figure A7: sweep of r parameters with offset coefficient 10 7 for the average distance function.