Separating Mesoscale and Submesoscale Flows from Clustered Drifter Trajectories

Drifters deployed in close proximity collectively provide a unique observational data set with which to separate mesoscale and submesoscale flows. In this paper we provide a principled approach for doing so by fitting observed velocities to a local Taylor expansion of the velocity flow field. We demonstrate how to estimate mesoscale and submesoscale quantities that evolve slowly over time, as well as their associated statistical uncertainty. We show that in practice the mesoscale component of our model can explain much first and second-moment variability in drifter velocities, especially at low frequencies. This results in much lower and more meaningful measures of submesoscale diffusivity, which would otherwise be contaminated by unresolved mesoscale flow. We quantify these effects theoretically via computing Lagrangian frequency spectra, and demonstrate the usefulness of our methodology through simulations as well as with real observations from the LatMix deployment of drifters. The outcome of this method is a full Lagrangian decomposition of each drifter trajectory into three components that represent the background, mesoscale, and submesoscale flow.


Introduction
that, even with some filtering, these Lagrangian statistics are far more sensitive than similar Eulerian measures, and called into question the interpretation of previous studies that use variations of two-particle statistics.
An alternative approach is to parameterise the energetic mesoscale flow features from the Lagrangian trajectories, in order to disentangle them from the unparameterised, possibly submesoscale, flows. The notion of accounting for, or parameterising, the mesoscale strain in order to measure the submesoscale diffusivity, appears to originate with tracer release experiments [7,8], and is based on ideas introduced in Garrett [9]. The basic idea is that one axis of the tracer grows exponentially with a rate proportional to the strain rate, σ, while the other axis reaches a steady state balanced by the compressing effect of σ and the elongating effect of diffusivity, κ. In the dye experiments, the mesoscale strain rate is determined by measuring the rate of elongation of the patch, which is then used to deduce the diffusivity. The key idea to this approach is that the mesoscale strain rate is parameterised, in order to separate its effect from the submesoscale motions.
This manuscript extends the idea of parameterising mesoscale features, in order to disentangle submesoscale flow, to a more principled and robust framework appropriate for Lagrangian particles. Our work is complementary to, but distinct from, the recent works of [3,10] who developed a method for projecting clustered drifter trajectories to reconstruct local Eulerian velocity fields using Gaussian Process regression. The goal of our work is to disentangle the trajectory in a Lagrangian sense, and explicitly separate each drifter trajectory into background, mesoscale and sub mesoscale components-where each decomposed drifter can then be analysed further within the Lagrangian framework. For example, our Lagrangian separation allows for the explicit estimation of submesoscale diffusivity, which is not a topic directly covered in [3,10]. On presentation of our methods, we shall make some more remarks on how these works stand side-by-side in the discussion section.
The structure of this paper is as follows. In Section 2 we first introduce a conceptual Lagrangian flow model, and then show how this can be parameterised using a local Taylor expansion. Then in Section 3 we show how these parameters can be estimated from clustered drifter deployments. We pay particular focus to building a hierarchy of models, where each layer in the hierarchy adds extra parameters (e.g. strain/vorticity/divergence) that represent additional flow features. We provide novel methodology for selecting between hierarchies based on the evidence from the data. In Section 4 we go further and incorporate nonstationary flow features, by allowing mesoscale parameters to slowly evolve over time and space. We provide methodology for estimating this evolution using splines, and then we provide techniques for quantifying the uncertainty of all parameter estimates using the bootstrap. We detail how this quantification of uncertainty provides the ideal mechanism from which to select the key parameter of the smoothing window length. Throughout Sections 3 and 4 we perform detailed simulation analyses to provide further insight and motivation. Then in Section 5 we test and perform our novel methodologies on data collected from drifters in the LatMix deployment, which reveals new insights and discovers previously hidden mesoscale and submesoscale structures. Discussion and conclusions can be found in Section 6.
Overall, the principle contribution of this paper is a general framework for analysing Lagrangian data from clustered drifter deployments. Specifically, this methodology provides a tool to detect for the presence of various mesoscale flow features and separate those features from the submesoscale flow-while allowing such features to evolve over space and time-together with providing quantified statistical uncertainty of output.

Modelling Framework
The primary conceptual model used throughout this manuscript is that the total velocity of a Lagrangian particle u total can be decomposed into three components, where u bg is a large scale background flow, u meso is the mesoscale flow (> 10 km, > 10 days) and u sm is the submesoscale flow (100m-10km, 1 hr-10 days). The background flow is assumed to be spatially homogeneous in some local region around the drifters, and thus includes motions such as inertial oscillations and large scale currents. Mesoscale features are structures that behave non-locally across the drifters, and are thus the smoothly varying fluid structures that will be parameterised, such as the constant strain rate used in the tracer release experiments [8]. The submesoscale currents are simply the residual motion, not captured by the background or mesoscale flow. In practice, the scales captured by these three types of motion will vary depending on the deployment details and the limitations of the data, as much as the actual physical processes themselves, as we shall show. Surface drifter motion is constrained to a fixed depth near the ocean surface, where the twodimensional positions are measured in geographic coordinates longitude and latitude. For the work here it is necessary to use map coordinates {x(t), y(t)} with a projection that locally preserves area and shape. Following [11] we use the transverse Mercator projection with central meridian placed between the minimum and maximum longitude of the drifter experiment and add a false northing and easting to shift the origin to the southwest corner. The total velocity u total of a drifter is then two-dimensional and assumed to represent the velocity at the depth of the drifter drogue. The work here will also be generally applicable to clustered deployments of RAFOS floats with minor modification, but we will use the terminology of drifters throughout the manuscript.

A local Taylor expansion
One of the simplest models for separating flow components is to perform a local Taylor expansion of the velocity field. Suppose we have observations from K clustered drifters at time t, where the position of drifter k (1 ≤ k ≤ K) in x and y orthogonal directions is given by {x k (t), y k (t)}, measured in metres, and the corresponding velocity is given by d dt {x k (t), y k (t)}, measured in metres per second. We then take a Taylor series expansion of the velocity field evaluated at the position of drifter k, such that we model its velocity as d dt where • {x k (t), y k (t)} are observations from drifter k at time t, • {u bg (t), v bg (t)} is the spatially homogeneous time-varying background flow, are the model parameters for the mesoscale flow, • {x 0 , y 0 } is the expansion location and has no consequence to the model, other than redefining {u 0 , v 0 }, , v sm k (t)} are the residual 'submesoscale' velocities for each drifter, assumed to be zero-mean in time, but also zero mean in space across drifters.
The mesoscale parameters are simply re-definitions of the standard spatial gradients: the divergence is δ = u x + v y , the vorticity is ζ = v x − u y , the normal strain rate is σ n = u x − v y , and the shear strain rate is σ s = v x + u y . The normal and shear strain rates can be combined to a scalar value for the strain rate σ = σ 2 n + σ 2 s and rotation angle θ = arctan [σ s /σ n ]/2, where σ n = σ cos(2θ), σ s = σ sin(2θ).
Equation (2) therefore separates background, mesoscale, and submesoscale features in the data, following the conceptual model of equation (1). For the moment, the eight mesoscale parameters are assumed to be sufficiently slowly varying that we can treat them as constant over some time window, although we will relax this restriction later. In practice, the mesoscale component of the model will capture any coherent feature that has constant spatial gradient across the cluster of drifters, whether that is a large scale more permanent feature like a Western boundary current or a transient mesoscale eddy-or nothing at all. The spatially homogeneous time-varying 'background' flow will capture inertial and tidal oscillations, but may also erroneously include parts of a time or spatially varying mesoscale flow. Finally, the residual 'submesoscale' velocity will include any velocity contributions not captured by the other components.
The model of equation (2) was applied to drifter observations by [12] to obtain estimates of the spatial gradient parameters, but with two key differences from the approach taken here. First, the spatial gradients were allowed to vary at each observational time point, without any constraints on the rate of fluctuation. Second, the expansion point {x 0 , y 0 } was chosen to be the time-varying centre-of-mass of the cluster of drifters. The consequence of this choice is quite significant and is worth considering in more detail. Defining the centre-of-mass (or first moment) as m x (t) ≡ 1 K ∑ K k=1 x k (t) and m y (t) ≡ 1 K ∑ K k=1 y k (t), it follows from equation (2) that the centre-of-mass velocity includes contributions from both the homogeneous background as well as the spatial gradients, where no submesoscale is assumed to be present as we have defined 1 K ∑ K k=1 u sm k (t) = 0. That the mesoscale spatial gradients have a (potentially) significant impact on the velocity of the centreof-mass is evident in the top row of Figure 1, where the entire cluster of drifters is advected by the linear flow. Now if the expansion point is taken to be the centre-of-mass, {x 0 (t), y 0 (t)} = {m x (t), m y (t)}, then equation (3) reduces the background velocity to the sample mean velocity, such that u bg (t) ≈ d dt m x (t). As a result, after subtracting equation (3) from (2), the velocities of the individual particles in the centre-of-mass frame, only depend on the spatial gradients and submesoscale flow. In some sense, the difference between equation (2) and equation (4) is quite remarkable: simply by changing to centre-of-mass , v sm k (t)} follow a Wiener increment process with diffusivity equal to 0.1 m 2 /s. The top row shows drifter positions, and the bottom row shows positions with respect to centreof-mass at each time step. From left to right we include the following model components. Left: diffusivity only. Centre left: strain and diffusivity. Centre right: strain, vorticity, and diffusivity (strain dominated). Right: strain, vorticity, and diffusivity (vorticity dominated). In each plot where a parameter is present, it has been set as κ = 0.1m 2 /s, σ = 7 × 10 −6 /s, θ = 30 • , ζ = 6 × 10 −6 /s (centre right), and ζ = 8 × 10 −6 /s (right). We have set The trajectories are simulated using the Euler-Maruyama scheme.
coordinates, the potentially complicated form of the background flow, {u bg , v bg }, is eliminated, along with all the velocity variance associated with mesoscale advection of the centre-of-mass from equation (3). With this choice of reference frame, the spatial gradients in the model now only characterise the spreading of particles, i.e., the second moment, as shown in the second row of Figure 1, along with any spreading caused by the submesoscale process.

Diffusivity
A key measure with which we evaluate our techniques is to measure the diffusivity of observed and modelled velocities. We define the submesoscale diffusivity for each drifter k as in equation (21) of [13], such that where x sm k (t) is calculated from the residual velocities, u sm k (t), such that x sm k (t) = t 0 u sm k (t)dt, and similarly for y sm k (t). As in equation (10) of [13], a joint diffusivity measure across all drifters could be defined by averaging the positions/velocities before applying the derivatives/integrals in equations (5a) and (5b); however, we choose to calculate diffusivity separately for each drifter k to reflect the fact that drifters are spatially spread in a clustered deployment, and hence their diffusivity values may depend on spatial scale within a spatially inhomogeneous flow field. In general, it is also useful to consider the isotropic diffusivity as this is rotationally invariant, and as such, does not depend on our our choice of coordinate system. The isotropic submesoscale diffusivity for drifter k is defined as where z sm The diffusivity is also related to the power spectral density of complex velocity w k (t) where S(ω) is known as the Lagrangian frequency spectrum and is related the isotropic diffusivity in equation (6) with as shown in [14]. Formally, diffusivity requires the process to be stationary and is defined in the limit as T → ∞, but in practice we are always limited to finite observation times. The total variance of a complex particle velocity is conserved with the Lagrangian frequency spectrum, 1 T w k (t) 2 dt = S(ω)dω, and in this sense it will be useful to think of how the model in equation (2) describes the distribution of variance in the frequency spectrum.
Equations (5), (6) and (7) are theoretical constructs as they require submesoscale velocities to be observed continuously in time. In practice, drifter observations are only observed at discrete time points. In Section 3 we will discuss how to estimate submesoscale diffusivity from clustered drifter data using our modelling and estimation approach.
We note that diffusivities could also be calculated directly from raw velocities { d dt x k (t), d dt y k (t)}, or from centre-of-mass velocities that have only had the background removed and still contain mesoscale flow contribution (equation (4)), and such values of diffusivity will in general be much larger than the submesoscale diffusivities. This highlights the scale-dependent nature of diffusivity, as well as the challenges in comparing different measurements of diffusivity.

Model solutions
The mesoscale component of equation (2) is a linear ordinary differential equation with tractable analytical solutions [e.g. 15,16]. However, the submesoscale component of equation (2) is assumed unknown, and may represent a range of different phenomena. Thus, for our simulation analyses that follow in this paper we generate the submesoscale process stochastically using trajectory paths defined by d dt where the function dW represents an increment of a two-dimensional Wiener process (a random walk in the discrete-time limit) and we have not included any time-dependence that affects the centre-of-mass. The Lagrangian frequency spectrum of such a process is simply The frequency spectrum of internal waves (perhaps the best known submesoscale process) will have either more or less variance, depending on the frequency. We thus consider the Wiener increment process to be a reasonably agnostic choice. Notably absent from equation (9) is the spatially homogeneous background flow. In practice this contains a significant amount of power from inertial and tidal oscillations, but it does not significantly impact the estimation of the mesoscale quantities as we shall show. The particle trajectories shown in Figure 1 are sampled from equation (9), where each column contains different choices for the mesoscale parameters, but the submesoscale diffusivity κ is held constant (the first column has no mesoscale and hence the particles follow a random walk).
In the absence of the stochastic submesoscale white noise process, the Lagrangian trajectories from equation (9) are purely deterministic and thus their Lagrangian frequency spectra can be computed exactly, as we shall now show. For the following analytical solutions we set δ = 0, but make no such assumption in the estimation procedure that follows. To integrate equation (9) with κ = 0, note that simply re-positioning a particle's initial location can be used to redefine {u 0 , v 0 }. Specifically, if the initial position of the particle is given by {x(0), y(0)} with nonzero {u 0 , v 0 }, the {u 0 , v 0 } can be set to zero, so long as the initial position is set to and the Okubo-Weiss parameter is defined by s 2 ≡ σ 2 − ζ 2 . Thus, without loss of generality, we can simply take {u 0 , v 0 } and the expansion point to be zero. The complex path z(t) = x(t) + iy(t) with initial position given by {x(0), y(0)} = {r cos α, r sin α} is rs e iα s cos st 2 + σe i2(θ−α) + iζ sin st 2 if σ 2 < ζ 2 (12) and the associated velocity w(t) = u(t) + iv(t) is given by where we have defined the complementary Okubo-Weiss parameter bys 2 ≡ ζ 2 − σ 2 . The meansquare distance of a particle from the origin is given by and total velocity variance, where and T is the length of time that has passed since the particle has moved from its initial position. The Lagrangian frequency spectrum of a particle in a linear velocity field can now be computed using equations (13) and (7) which yields where the Lagrangian frequency spectra of complex-valued velocities are permitted to be asymmetric in ω (see [17]), which will occur in equation (17) when ζ = 0. Asymmetric spectra arise when the rotary spectra are unequal and there is a preferred direction of spin [18]. With no strain and after sufficiently long observation time (T >> 1/ζ), equation (17) becomes a single frequency delta function, reflecting the rotation of a particle from the vorticity. However, for the cases considered here, observation times are at most O(1/s, 1/s), and often much less. The result is a spectrum that is generally very red (S(ω) ∼ ω −2 ), with total power increasing in observation time T.
The Lagrangian frequency spectrum in equation (17) would appear to indicate that particles advected by a linear velocity field have a diffusivity, following equation (8). However, while it is true that the linear velocity field causes particles to disperse, increasing their second moment with T, this spreading is entirely deterministic with correlations between particles spatially and across time, and thus does not formally meet the requirement that diffusivity results from a stationary random velocity process. From the perspective of trying to isolate and estimate the diffusivity of submesoscale processes (which may be stationary at these scales), the linear velocity field may be viewed as contaminating the lowest frequencies in the spectrum, providing erroneously high values of diffusivity if not removed correctly. Figure 2 shows the one-sided Lagrangian frequency spectrum of a single particle simulated using equation (9). The Lagrangian frequency spectrum thus has two distinguishing parts: the white noise submesoscale process given by equation (10) and the deterministic red process given by equation (17). In Figure 2 the observed particle spectrum is very nearly the linear addition of the theoretical Lagrangian frequency spectra of the mesoscale and submesoscale models of equations (10) and (17) respectively. In terms of Figure 2, the objective of the methodology is to remove the deterministic contribution of the mesoscale flow (in blue), in order to study the submesoscale process that remains.  Figure 2: The one-sided frequency spectrum for a particle integrated with equation (9) is shown in black. The particle is initially placed at {x(0), y(0)} = {1 km, 1 km} and integrated for 5 days with simulation parameters set to κ = 0.1 m 2 /s and σ = 1 × 10 −5 /s. The theoretical spectrum of the strain-only model, equation (17), is shown in blue, and the theoretical spectrum of the white noise process, equation (10), is shown in red.
a centre-of-mass velocity, as represented in equations (3) and (4) respectively. In other words the summation of equations (3) and (4) recovers equation (2). When put into matrix-vector notation for observations these models can be jointly written as where we have defined and In this notation, The particular ordering of the observations within each vector in equations (19) and (20) does not matter, so long as it is consistent, and in fact, there is no restriction that the drifter observations occur at the same time, despite our choice of notation. We have defined 0 KN and 1 KN to be KN × 1 column vectors of zeros and ones, respectively. Under each matrix we have given its size, where p is the number of parameters, and in this case p = 8. The vector A contains model parameters which are estimated using the least squares solution By combining equations (18) and (21) the residual submesoscale and background velocities can be estimated by taking The least-squares solution is equivalent to the optimal maximum likelihood solution when the residuals are Gaussian and independent and identically distributed. In general, weighted least squares solutions should be used if residuals are correlated or have unequal variance, and although this will likely be the case here, weighted-least squares requires prior knowledge of the distributional structure of the residuals which we do not wish to assume is known. Overall, we found the (non-weighted) least squares solution of equations (21)- (22) to be robust in simulation experiments and real data analysis, and to perform better than performing least squares directly on the representation of equation (2) on raw velocities for each drifter without removing centreof-mass. This is due to the fact that the K drifter velocities in centre-of-mass coordinates, with the addition of the centre-of-mass velocity, can be thought of as a collection of K + 1 drifters that are more independent of each other than the K drifters in fixed-reference frame coordinates. This leads to errors that are more uncorrelated over drifters yielding better least squares parameter fits.

Flow decomposition
Once the parameters have been estimated using equation (21), the constituent parts of the conceptual model of equation (1) can be computed. The mesoscale contribution to each drifter is computed using The background is assumed to be spatially homogeneous, and thus can be recovered from the residuals by averaging across drifters at each time, Finally, the submesoscale contribution to each drifter is all that remains, This accomplishes the conceptual decomposition of velocities proposed in equation (1). We emphasise that the fits of equations (18)- (22) could be performed without the centre-of-mass velocity by removing the bottom 2 rows of U, ǫ and X in equations (19) and (20). This is in effect only fitting observations to the second-moment model of equation (4), as also proposed in [12]. While this fit still obtains estimates of mesoscale quantities {σ, θ, ζ, δ}, and disentangles the submesoscale {u sm (t), v sm (t)}, the first-moment mesoscale parameters {u 0 , u 1 , v 0 , v 1 } and the background {u bg , v bg } can no longer be estimated directly (unless fitted a posteriori). This means a full decomposition of the flow as performed in equations (23)-(25) is not directly accomplished using the K drifters in centre-of-mass frame only. We shall refer to this reduced technique as the second-moment fitting method. In contrast, we refer to the full estimation technique from equations (18)-(22) as the first and second-moment fitting method.
Regardless of fitting method, we estimate the isotropic submesoscale diffusivity κ sm k,z (t), defined in equation (6), by measuring the implied square displacement of the submesoscale velocities within the window. This yields where ∆ is the sampling interval of drifter observations measured in seconds. Equation (26) is equivalent to taking 1/4 of the periodogram of the velocities-or the absolute square of the Fourier Transform-at frequency zero. This is consistent with the fact that the theoretical diffusivity of a stationary complex-valued process is determined by 1/4 of the zero-frequency of the Lagrangian frequency spectrum as per equation (8).
The above equations produce estimates of the background, mesoscale and submesoscale parts of the flow over some choice of temporal window length W = t N − t 1 . A small value of W results in a reduced number of data points in the regression causing potentially noisy estimates of the parameters. Conversely, a large value of W incorporates more distant observations in time and will smooth over this noise, but may also lead to poor estimates if the underlying mesoscale parameters are evolving over time. This is the classic bias-variance trade-off in statistical estimation. In Section 4 we will address the issue of choosing an appropriate window length, and we will use this to introduce a principled estimation method using splines that allow parameters to evolve slowly over time, resulting in smoother and less variable estimates.

Hierarchical modelling
The Taylor series model of equation (2) specifies 8 mesoscale parameters {u 0 , v 0 , u 1 , v 1 , σ, θ, ζ, δ}, and these can be estimated from clustered drifter data using the machinery of Section 3.1. However, not every clustered set of drifters will necessarily experience all of these effects (as we illustrated in Figure 1), or the data might not give statistically significant estimates of some of the parameters even if they are truly present. Alternatively, we might already know the true values of some of the parameters and so we do not wish to estimate these. Motivated by this, we now introduce a simple method of removing certain parameters from the model, by either setting them to be zero or a pre-specified fixed value, and then estimating only the remaining unspecified parameters. If we were to instead set parameters to zero (or fixed values) after estimation, we would sub-optimally lose part of the data contained in the removed estimate.  To remove a parameter from the model, one simply removes the parameter from the vector A in equation (20) and the corresponding column from the matrix X. In a similar vein, multiple parameters can be removed by repeating this procedure. Ultimately, depending on the number of parameters removed, the matrix X will be sized 2(K + 1)N × p, and the column vector A, will be sized p × 1, where p is the number of free parameters that remain in the model. If p = 8, as presented in equation (19), then this represents the full mesoscale solution. If any parameter values are known a priori then they should be inserted as fixed values into A and then multiplied by the corresponding respective columns from X and then subtracted from the vector U, before proceeding with the least squares minimisation of equation (21) to estimate remaining parameters.
We now consider the special case of only estimating the mesoscale quantities {σ, θ, ζ, δ} using the second-moment fitting method discussed in Section 3.2. If we estimate all quantities in {σ, θ, ζ, δ} then p = 4. In contrast, if we remove all mesoscale parameters such that {σ, θ, ζ, δ} = {0, 0, 0, 0}, then p = 0, and only submesoscale velocities remain in the centre-of-mass frame of equation (4). If 0 < p < 4, this represents scenarios where some mesoscale components from {σ, θ, ζ, δ} are present, and some are not, and we display this schematically in Figure 3. We consider strain rate and strain angle (or equivalently shear and normal strain rates) to be either jointly present or both missing. Overall, there are therefore eight possible models we might consider, shown explicitly in Figure 3. Regardless of the choice of model choice, the remaining non-zero parameters are estimated using equation (21) as before. Figure 3 also shows that the eight models exist in a hierarchy. The simplest model, the null hypothesis shown at the top of Figure 3, corresponds to velocities in a centre-of-mass frame that are submesoscale only. There are three direct descendants of this model in the hierarchy, the addition of vorticity or divergence, each of which requires one more parameter, or strain, which requires two additional parameters. The central philosophy is that a descendent in the hierarchy should only be used if it shows meaningful improvement in some relevant error metric, essentially disproving the null hypothesis. Because adding parameters will always produce at most the same residual (which may itself be the error metric), this approach avoids using too many degrees-offreedom and producing meaningless or noisy parameter estimates.
It is worth noting that estimating all four mesoscale parameters {σ, θ, ζ, δ} at each time point (as is often done in the literature) would benefit from this conceptual approach. With K drifters there are 2K position observations at a given time point, from which 4 parameters must be estimated. For modestly sized drifter deployments, this computation runs the risk of producing estimates with no statistical significance.
In general, when picking between model hierarchies for all 8 mesoscale parameters, {σ, θ, ζ, δ} and {u 0 , v 0 , u 1 , v 1 }, then we are faced with an increased complexity of selecting between reduced permutations of the full specification. Motivated by this, in Section 4.3 we will introduce methodology for estimating time-varying parameters using splines, which allows for a natural mechanism from which to build a full hierarchy of first and second-moment candidate models, as we shall show.

Selecting between hierarchies
We have provided a mixed background-mesoscale-submesoscale modelling framework in equation (2) and an estimation framework in Section 3.1. Then in Section 3.3 we discussed how to estimate parameters using different hierarchies of mesoscale components in the overall model. The appropriateness of a chosen model in the hierarchy, for a given set of observational drifter data, can be evaluated by estimating the error resulting from the fitted model at a given point in time. We argue there is more than one meaningful way in which error can be computed-and in this section we shall define two such ways that prove to be very useful in terms of model evaluation.

Fraction of Variance Unexplained (FVU)
The first method is perhaps the most intuitive. Here we calculate how much variance remains in the 'unexplained' residual submesoscale velocities found in equation (25). This value in itself, however, is not a meaningful quantity unless it is presented in reference to some other quantity. Therefore, to provide a normalised and meaningful metric we introduce the notion of the Fraction of Variance Unexplained (FVU), which is defined as and hence quantifies the proportion of the variability remaining in the submesoscale model, as compared to velocities that have only had the centre-of-mass removed (and will hence still contain second-moment mesoscale effects present in equation (4)). The FVU will therefore in general be some value between zero and one. An FVU value close to one occurs when there is little to no mesoscale component estimated from the data. In contrast, an FVU value equal to zero means the mesoscale model successfully explains all variability in the data after the background is removed, and there is no residual submesoscale process left behind. For mixed mesoscale and submesoscale flow the FVU will be somewhere between zero and one, and this will vary dependent on the magnitude and number of mesoscale components present in the model fit.
In Figure 4, in the left column we display FVU values obtained from our simulation setup shown in Figure 1. Specifically, we generate 100 replicated simulations of each of the four model scenarios shown in Figure 1-diffusivity only, strain+diffusivity, strain+vorticity+diffusivity (strain dominated), strain+vorticity+diffusivity (vorticity dominated)-where the stochasticity between replicates occurs from simulating submesoscale velocities from a Gaussian white noise process as in (9). Again, as in LatMix Site 1, we simulate nine drifters within each simulation with matching initial positions, but this time we just simulate half-hourly records for one day. We use the procedures described in Section 3.3 to fit four hierarchies of models to each simulation within each scenario. Note that we perform a global fit by setting the window length W to be the full length of the observations (one day). The FVU values are calculated from equation (27) and the resulting spread of values across simulations are shown by box and whisker plots in Figure 4. We also provide the spread of observed FVU values in an oracle case where the true mesoscale parameters are known.
In the figure we have also indicated the estimated theoretical FVU value obtained by combining the mesoscale variance obtained from equation (15) for each drifter k (let us denote this σ 2 w meso (k)) with the submesoscale variance of a white noise process given from the spectral form of equation (10) yielding σ 2 w sm = 4κ(1 − 1/K), which is the same for each drifter, where the (1 − 1/K) rescaling is required to account for moving to a centre-of-mass reference frame. We can then obtain an estimated theoretical FVU value, which we denote FVU, by taking This an estimated theoretical FVU, rather than an exact solution, because we have ignored the co-dependence between the mesoscale and submesoscale processes and assumed these variances aggregate separately. The results however indicate remarkable agreement between theoretical and observed quantities for FVU over all scenarios (except when insufficient mesoscale parameters are proposed in the candidate model), suggesting equation (28) is an accurate approximation for the spatial and temporal scale of the simulation performed. Overall, the key finding of Figure 4 (left column) is that the FVU helps identify the correct model in all true model scenarios considered, and correctly estimates how much of the variance is explained by the mesoscale and submesoscale components in agreement with the theory. The addition of a mesoscale parameter which is truly present significantly reduces the FVU, but adding further unnecessary mesoscale parameters (such as the divergence which is not present in any of the scenarios) does not significantly reduce FVU. This diagnostic tool therefore shows utility as a method for detecting the presence of mesoscale effects on drifter velocities, and for selecting between mesoscale model hierarchies. We shall scrutinise this further when we apply our procedures to LatMix data in Section 5.

Fraction of Diffusivity Unexplained (FDU)
The FVU is a measure of how much of the variability of the data remains in the submesoscale residuals. However, we argue this is not the only metric with which to ultimately select from a model hierarchy. First of all, as the residual velocities are being directly minimised (along with the background) in the least squares fits of equations (18)- (22), the more complex models will generally have a lower FVU than nested simpler models with fewer or no mesoscale components (as seen in Figure 4). This may lead to over-fitting models unless parameter penalisation methods are introduced. Secondly, mesoscale processes are primarily low frequency processes with decaying Lagrangian velocity frequency spectra, as we showed in Figure 2. Submesoscale processes, on the other hand, will likely have Lagrangian velocity frequency spectra that are spread across frequencies and concentrated away from frequency zero. For example, white noise submesoscale residuals will have a flat spectrum, and an internal wave process, represented by the Garrett-Munk spectrum for instance, will have significant energy at the inertial frequency f , but very small energy at frequency zero. For these reasons, we now motivate a second metric with which to evaluate different model hierarchies. Specifically, we measure the diffusivity of the residual process for each drifter, and compare this with the implied total diffusivity of each drifter when no mesoscale is removed. In other words, we compare the variability of the aggregated and submesoscale-only components in terms of their respective diffusivities, with a view that submesoscale diffusivity should be much lower than total diffusivity when even a mild mesoscale component is present (as mesoscale energy is dominant at low frequencies in the velocity spectra).
To quantify this effect we introduce the notion of the Fraction of Diffusivity Unexplained (FDU), which we define by whereκ sm k,z (t n ) has already been defined in equation (26).κ c.o.m.
k,z (t n ) is the diffusivity for drifter k with only centre-of-mass removed, which is defined by replacing u sm (26). The FDU measures how much diffusivity is present in the submesoscale residual after removing the mesoscale, as compared to the diffusivity that is observed relative to the centre-of-mass when no mesoscale has been removed. An FDU value of zero means that the submesoscale process has no diffusivity, and an FDU of one will occur when either no mesoscale is present, or the mesoscale does not create any diffusive-type behaviour on the particles.
We display observed FDU values across our simulations in the right column of Figure 4, mirroring the simulation setup used for FVU described in Section 3.4.1. The estimated theoretical FDU values are overlaid by a red horizontal value from computing where the expected submesoscale diffusivity for all drifters is κ sm z = κ(1 − 1/K) where again the (1 − 1/K) rescaling is required to account for moving to a centre-of-mass reference frame. We obtain κ meso k,z by taking 1/4 of the zero-frequency value from equation (17) (as per the definition of equation (8)). Similarly to equation (28), equation (30) is an estimated theoretical FDU because we are assuming independent dispersion caused by the mesoscale and submesoscale. Nevertheless, Figure 4 indicates consistent agreement between observed and theoretical FDU values (when the correct model is fitted), highlighting the accuracy of this approximation.
The main finding of the FDU analysis in Figure 4 is that the mesoscale explains significantly more of the total diffusivity than the total variance. This is as expected because of the lowfrequency nature of mesoscale processes (see Figure 2) and highlights the usefulness of computing FDU values to test for mesoscale presence. In all cases we can see that FDU analysis reveals the cor-rect generating mesoscale model even better than FVU does. We shall further use this diagnostic method of assessing model fits with LatMix data in Section 5.

Uncertainty Quantification
In this section we first provide a method for estimating the uncertainty of parameter estimates when applied to observational datasets. In a simulation setting, uncertainty estimates be obtained by repeating experiments several times stochastically or with different initial conditions, but this cannot be done in the real world where clustered drifter deployments are scarcely repeated in the same region of the ocean, and will likely be measuring different mesoscale and submesoscale features each time.
Instead, we resort to the bootstrap, which resamples the observed data in such a way as to provide a population of different datasets with which to measure uncertainty. Specifically, the bootstrap is implemented by taking a random sample of K trajectories from the K drifters with replacement, such that the same trajectories may be selected multiple times as if they were different drifters. Then the mesoscale parameters are estimated for this random sample of trajectories. Let us denote any one of these parameter estimates asp b . The process is then repeated B times, every time randomly resampling a set of K trajectories with replacement, such that we obtain B parameter estimates {p 1 , . . . ,p B }. These replicated bootstraps can be used to form quantiles which then provide confidence intervals for the parameter of interest, often set to values such as 90% or 95%.
Alternatively, we can also estimate the standard error ofp, the parameter estimate for p, by measuring the sample standard deviation ofp b given by wherep (·) = 1 B ∑ B i=1p (i). In Figure 5 we show a histogram of bootstrap parameter estimates for {σ, θ, ζ}, with a red vertical line at the true value, and a blue vertical line showing the average bootstrap estimate. The purpose of this simulation is simply to show that bootstrap parameter estimates are centred at their true values and symmetrically distributed, despite the fact that drifter trajectories are sampled with replacement. We found this to be a consistent feature across different true parameter values and simulation settings.
Next we establish that the bootstrap estimate for the standard error of parameter estimates, given in equation (31), agrees with standard errors of parameter estimates observed from repeated simulations. In Table 1 we compare simulated and bootstrap standard errors for two experiments: the strain-only and the strain-dominated simulations of Figure 1. The standard errors from simulations are across 100 repeated simulations, but the bootstrap standard error approximation is just from 1 simulation of drifters each time (as we would have with real data). Despite this, the average bootstrap standard error estimate is very close to the standard error from repeated simulations (with the standard deviation of the bootstrap standard error accounting for any difference). Notice also that the bootstrap standard error estimates are conservative, which is better than the converse, and correctly increase when more parameters need to be estimated. This demonstrates Figure 5: Histogram of bootstrap parameter estimates for strain rate, strain angle, and vorticity, over 100 repeated simulations where B = 100 for each simulation, thus obtaining 10,000 total bootstrapped parameter values. The trajectories are generated as in Figure 1 in the strain-dominated model, and the parameters are estimated using the second-moment fitting method. Any bootstrap estimates outside the range of the x-axis are placed in the limiting visible bar in the histogram on each side. The red vertical line is the true parameter value, and the blue vertical line is the average bootstrap estimate.  = 100), over 100 repeated simulations, for both the strain-only and strain-dominated simulations of Figure 1. We also provide the standard deviation of bootstrap standard error estimates across the 100 simulations, as indicated after the ± symbol.
the accuracy of equation (31) in estimating the standard error of parameter estimates obtained from equation (21). We will make repeated use of the bootstrap in the analysis of LatMix data in Section 5.

Time-evolving parameters using rolling windows
To estimate the temporal evolution of mesoscale features across a drifter deployment we allow the mesoscale parameters to evolve over time. We first introduce a simple method for doing so where we take a rolling time window of width W and estimate {δ(t n ), ζ(t n ), σ n (t n ), σ s (t n )}, and possibly {u 0 (t), v 0 (t), u 1 (t), v 1 (t)}, in equation (20) over time using velocity observations con- using the exact approach outlined in Section 3.1 repeated at every observation time-step t n in the experiment.
In general, the window width parameter W should be chosen to be large enough to ensure we have reduced variance and statistically significant estimates of each mesoscale parameter, but  not so large that resolution is lost from over-smoothing and information is lost. To examine this effect we display simulated trajectories in Figure 6 which exactly follows the strain-only simulation from Figure 1 except that the strain rate parameter now decreases linearly by a factor of 10 across the length of the 6.25 day simulation, and we have increased κ to 0.5m 2 /s. We then fit the second-moment fitting method with the strain-only model over rolling windows with 3 choices of W (6-hours, 1-day, or 3-days). In Figure 7 we display the time-varying strain rate estimate over time from the data in Figure 6, alongside the standard error of this estimate over time (obtained over 100 repeated simulations). With this increased diffusivity, the inherent trade-off with the rolling-window method becomes apparent. Long window lengths provide low uncertainty, but the parameter estimates are only provided in the temporal centre of the experiment (and would be biased if extended outwards). Short windows, on the other hand, provide variable estimates with large standard errors that exceed half the parameter value, as we see on the right panel-meaning such estimates cannot be statistically distinguished from zero in a "two sigma" sense. A daily window length is perhaps the most appropriate balance here. Motivated by these challenges, we shall shortly provide a more principled approach to generating smoothly-evolving parameter estimates using splines in Section 4.3. Before doing so, we present results of a large simulation analysis which we will use to guide our window length se- Figure 8: Estimated standard errors for the strain rate (in the units of the true strain rate) across a dense grid of fixed strain rate values σ and window lengths W in a strain-only simulation mirroring the setup in Figure 1. In the left panel we have set κ = 0.1 m 2 /s and in the right κ = 1 m 2 /s. The strain rate estimates are obtained using the second-moment fitting method of a strain-only model, and the standard errors are obtained using the jackknife (a variant of the bootstrap which is computationally faster to implement and was found to perform similarly). The standard errors in the heatmap are upper-bounded by 0.9 for representation purposes. We draw a red line (and best fit regression line) where the standard error is half the true parameter value for each value of the strain rate. lection choices in the LatMix experiment (where this analysis will also be useful for the spline methodology that shortly follows). Specifically, in Figure 8 we plot a heatmap of standard errors in strain rate estimation, over a grid of values of true constant strain rate σ and estimation window length W. We repeat the analysis for a low diffusivity κ = 0.1 m 2 /s and high diffusivity setting κ = 1 m 2 /s. Otherwise the settings are the LatMix-type settings used in Figure 1, using 9 drifter trajectories with matching starting locations. The standard errors are in units of the true strain rate, and we have marked with a red line the point at which the standard error is equal to "two sigma" (along with a best line regression fit for ease of reading). The way in which this plot should be interpreted is that for a given strain rate (and diffusivity), the window length should be at least as long as the best-fit red line marking the point at which estimates become statistically significant. For example, higher diffusivities, or lower strain rates, will require longer windows with which to estimate the parameters significantly. We focus on strain in these simulations, as this was found to be the most pronounced mesoscale effect in the LatMix analysis that follows, but this analysis could be repeated with other mesoscale parameters to inform window length selection for other drifter deployments.

Slowly-evolving parameters using splines
To generalise the idea of time windowing to estimate the mesoscale parameters, we represent the parameters as coefficients as a finite sum of B-splines, where M is the total number splines over the experiment window andσ m are the M coefficients. A B-spline (or basis spline) of degree S is a local piecewise polynomial that maintains nonzero continuity across S knot points placed at times τ i . These knot points define the extent of the Bsplines, and therefore let us choose an effective window length for parameter fluctuations. The lowest degree (S = 0) splines are boxcar functions between the knot points, and are thus identical to non-overlapping windows in Section 4.2. At degree S = 1 B-splines are triangle functions that span two knot points, thus providing continuity in time as well as a piecewise first derivative. This generalises to higher degrees, where a B-spline of degree S has S non-zero derivatives as reviewed in [11]. The key benefit to this approach is that we can allow for time variation in the parameters while simultaneously choosing an effective window length-all while adding only a few coefficients to the model. To extend the estimation method presented in Section 3.1, we now require M coefficients for each of the p parameters, resulting in pM total coefficients to estimate. Rewriting vector A from equation (20) we have that where each coefficient, e.g., u m 0 , is a column vector of the M B-spline coefficients (we will shortly discuss why u 1 and v 1 can be dropped here). The data matrix X correspondingly expands from p to pM columns, where each column is repeated for each of the M B-spline. Note that, because the B-splines are local functions, the resulting matrix may be relatively sparse.
Parameter estimation is as before, but equation (23) for the mesoscale flow is replaced by, The background flow and submesoscale flow are still recovered using equations (24) and (25), respectively. One of the advantages to using B-splines is that the fitting method model hierarchy is simplified. Figure 9 shows the complete model hierarchy that includes the first and second-moment fitting method, unlike Figure 3 which only showed the hierarchy for the second-moment fitting method. The key simplification is that with B-splines we can drop (u 1 , v 1 ) from X when going from equation (20) to (34), since time dependence is encoded in the B-spline estimates for (u 0 , v 0 ). p First and second-moment fitting method model hierarchy Choosing the appropriate model from Figure 9 proceeds exactly as in Section 3.3, but with the additional caveat that one must choose the spline degree S and the number of splines M. With the restriction that the spline degree S < M, a reasonable upper bound is S = 3, the cubic spline. The number of splines M can be chosen by assuming a minimum window length (as discussed in Section 4.2), treating the centre of each window as a data point, and then applying the formula for the canonical interpolating spline in [11]. To compute this explicitly, assume a time series of length T, with minimum window length W, then this results in a total of M = max(⌊T/W⌋, 1) evenly sized windows of minimum length. Now apply equations (7) and (8) in [11] using pseudo points at {t 1 , t 1 + T/M(j − 1/2), t N } where j = 2, . . . , M − 1. When the drifters are evenly sampled in time, this will result in M splines that each have support from the same number of data points, and each data point will intersect S + 1 splines. As a result, there is really only one parameter to adjust: the effective window length or, alternatively, the number of splines M. Because setting M = 1 exactly reproduces the approach in Section 3.1 using fixed parameters, the freedom for parameters to vary over time can be systematically increased by increasing M.
Quantifying uncertainty with spline solutions requires a modification to the approach in Section 4.1. This is because the resulting bootstrapped parameter estimates are no longer pointwise estimates of each parameter, but rather time-varying global solutions. This means that computing the mean of each mesoscale parameter at each instant in time will not, in general, result in a valid solution since each solution is a global fit to the data. As a result, rather than considering a mean value from bootstrap solutions, as in Figure (5), we must establish the most likely bootstrap solution. Applying the bootstrap B times results in B continuous time varying model solutions of the parameters. Thus, we compute the most likely solution (of the B solutions) from an estimated joint probability distribution function (PDF). Specifically, for each estimated parameter in the model, we use a kernel density estimator to estimate a PDF from the bootstrap replicates for each parameter at each point in time using the methodology in [19]. For example, at time t n we estimate a onedimensional PDFP ζ (t n ,ζ) using the B bootstrap parameter estimates for ζ and a two-dimensional PDFP σ n ,σ s (t n ,σ n b (t n ),σ s b (t n )) for σ n , σ s . The likelihood of each path is then found with where, in practice, we include probabilities from all estimated parameters. The most likely solution is that with maximum L and confidence intervals are similarly calculated by including the Y percent of the B most likely solutions.

Application to the LatMix Experiment
The lateral mixing (LatMix) field campaign of 2011 [1,20] deployed drifters and dye with the aim of understanding what causes mixing at the submesoscale, and how this varies both spatially and temporally. The experiment consisted of two drifter deployments in the Sargasso Sea, where the drifters were deployed in a cluster. The first deployment, which we refer to as 'Site 1', consisted of 9 drifters tracked for 6.1 days in an area of low strain, and the second deployment, 'Site 2', consisted of 8 drifters tracked for 6.3 days in an area of moderate strain. There has been a large amount of interest and research from the experiment, e.g. [21].
In Figure 10 we plot the drifter trajectories for each site both in terms of their {x, y} positions, but also with respect to the time-varying centre-of-mass across drifters. The effect of the mesoscale, especially strain, can already be seen visually by inspecting this plot, both in the absolute and centre-of-mass reference frames. There are also possible signs of divergence in Site 1 (the drifters spreading in a non-random way), and vorticity in Site 2. We will inspect this in more rigorous statistical detial using the methodology of this paper.

Fixed mesoscale parameter estimates
We first fit fixed (non-time-varying) mesoscale parameters to equation (4) using the secondmoment fitting method described in Section 3. We present the results for each site in the top half of Table 2 using several model hierarchies. For each model hierarchy we present the estimated mesoscale quantities, and the resulting submesoscale diffusivity. We also present FVU and FDU values (equations (27) and (29) respectively) to assess model fit. To select the best model we use the conceptual approach illustrated earlier in Figure 3.
For Site 1 we see reasonable evidence for adding the parameters {σ, θ} ahead of vorticity ζ or divergence δ, as this creates the lowest FDU values thereby creating low submesoscale diffusivities of κ ≈ 0.2m 2 /s, as reported in [1]. Next, we follow the hierarchy and consider adding vorticity or divergence to the strain. Here we see little evidence for vorticity, but some for divergence, with a marginal reduction in the FDU value for the latter. Finally, just for completion, we show the full hierarchy. While this full hierarchy will always yield the lowest FVU compared to all simpler models (as this is the objective function being minimised)-the FVU value does not appear to drop significantly, and the FDU value has in fact increased, suggesting this to be an overfitted choice if we are only selecting among fixed mesoscale parameters.
For Site 2 we see mixed evidence for either initially adding divergence or strain, but the vorticity only fit performs poorly and in fact adds diffusivity as compared to raw centre-of-mass velocities As divergence is only one parameter (vs two for strain), we would normally proceed  Fixed estimates (Site 1) model   Table 2: LatMix submesoscale diffusivity estimates and associated FVU and FDU, estimated over candidate models in the hierarchy at each site using either fixed, rolling window, or spline parameter estimates. For fixed estimates we also show the mesoscale parameter estimates. The fixed and rolling-window methods use the second-moment fitting methods, whereas the spline method uses the first and second-moment fitting method.
this way down the hierarchy using Figure 3. However, as we shall see when we account for timevariation in the mesoscale parameters, there will be more evidence for a strain-only model than a divergence-only model, therefore for comparison we follow this route down the hierarchy. When considering adding vorticity or divergence, then now there is interestingly more evidence for vorticity, with reduced FVU and FDU values. Overall however, we note that diffusivity values are much larger at Site 2 using fixed parameters, with κ ≈ 2m 2 /s. This is likely due to the presence of time-varying mesoscale features not being account for, as we shall now explore.

Time-evolving parameters using rolling windows
We now apply the rolling-window estimates using the second-moment fitting method, as discussed in Section 4.2. To pick a suitable window length W, we see from Table 2 that diffusivity scales as order 0.1 − 1 m 2 /s, and the strain rate when converted to days is approximately 1/3 days. Therefore, using Figure 8 as a guide we choose a window length of W = 1 day (corresponding to 49 observations over 30-minute sampling intervals for each drifter). This choice also coincides approximately with the inertial and diurnal periods meaning inertial oscillations and tides will be relatively close to zero mean within the window, thus being closer to satisfying the zero-mean assumption of the average submesoscale residuals across drifters made in equations (2)-(4).
Within Table 2 we provide the estimated submesoscale diffusivity, and FVU and FDU error metrics, using rolling one-day windowed mesoscale parameter estimates for each hierarchy. As expected, the FVU decreases everywhere (as more parameters are being fitted) in comparison to the fixed-parameter fits. The FDU values, on the other hand, decrease in some but not all cases, providing mixed evidence for time-variation at this stage. We notice the reductions in FVU and FDU are most pronounced for Site 2, indicating this is the site most likely to have a time-evolving mesoscale. Overall, there is now evidence for a time-varying strain-vorticity model. Including divergence is now a less favourable choice than with the earlier analysis with fixed estimates.
In Figure 11 we display some examples of the time-varying parameter estimates using this approach. In the top panels we show strain rate over time at each respective site using a strain-only model, where the evidence for more temporal evolution at Site 2 is clear. We overlay bootstrap trajectories of these time series (as well as the fixed parameter estimates from Table 2) which indicates this variation appears significant at Site 2, but largely not at Site 1. Also, the low values for strain rate of ≈ 0.01 f 0 in the fixed-parameter estimate appears to be a misfit due to model misspecification from not allowing time-variation. The values are now larger than Site 1 when allowing time evolution, as expected. In the bottom panels we show the time-varying strain rate and vorticity estimates using a strain-vorticity model. Again there is evidence for time-variation which we will explore further with spline fitting.
Although the parameter estimates obtained using rolling windows are overfitted and not slowly varying, these fits however provide an extremely useful lower bound, in terms of interpreting estimated submesoscale diffusivities and FVU/FDU values. This will help guide the implementation for modelling time-variation more smoothly using significantly fewer parameters in the spline methodology that follows. In contrast, the fixed parameter estimates provide a useful upper bound on diffusivities and FVU/FDU values, as this approach is the most parsimonious.

Slowly-evolving parameters using splines
We continue our analysis of the LatMix data by fitting time-evolving mesoscale parameters using the splines approach defined in Section (4.3). We will use the full first and second-moment fitting method allowing us to make a complete decomposition of the flow at both sites into background, mesoscale, and submesoscale components.
First, in Figure 12 we compare estimates of strain rate between the second-moment and the first and second-moment fitting methods during the first two days of the LatMix Site 1 experiment. This particular window has relatively low strain rates that may not be distinguishable from zero, as seen in the top-left panel of Figure 11. Using the bootstrap estimates and a kernel density estimator, the left panel of Figure 12 shows the distribution of strain rates using the second-moment fitting method. While the peak of the distribution is consistent with the strain rate estimated over the entire six day experiment, the 90% contour of the distribution includes an enormous range of strain rates, including zero. In contrast, by including the first-moment as part of the fitting method, the right-panel of Figure 12 shows a narrower range of strain rates that do not include zero. Thus, at least in this example, the combined first and second-moment fitting method provides more robust estimation than the second-moment fitting method by including extra information in the fit.
In Figure 13 we display the time-evolving parameter estimates at Sites 1 and 2 using a strainonly and strain-vorticity model respectively. We overlay confidence intervals obtained using the bootstrap procedure outlined in Section 4.3. The time evolution of the strain-vorticity parameters is clear at Site 2, where all 3 mesoscale parameters {σ, θ, ζ} are seen to change in a smooth fashion across the 6 days. In contrast, at Site 1, evidence of time variability for the strain rate is less clear, as the estimate of constant strain rate (dashed-line) fits entirely within the confidence intervals. Figure 13 also shows estimates of {u 0 , v 0 }, but their particular values are not directly interpretable, as they depend on the location of the expansion point, {x 0 , y 0 }. Instead, from equation (3), it can be seen that they contribute to the mesoscale description of the flow at the location of the centreof-mass.
We include the submesoscale diffusivity estimates, as well as FVU and FDU values, in the bottom portion of Table 2, along with comparison values from a hierarchy of models at each site. Figure 13: Parameters of the spline based strain model fits to Site 1 (left panel) and strain-vorticity model fits to Site 2 (right panel) using the first and second-moment fitting method. The most likely solution is highlighted, with 90% and 68% most likely solutions shown in grey and dark grey, respectively. The models are fit using four degrees of freedom per parameter with the splines shown in the bottom row.
What we observe is quite remarkable: we can achieve FVU and FDU values that are very close to the rolling window estimates, despite using significantly fewer parameters to describe the evolution of the mesoscale velocity field. The evidence from Table 2 continues to support the choice of a strain model at Site 1 (with minor evidence for the additional presence of divergence), and a strain-vorticity model at Site 2. The estimated submesoscale diffusivities after performing the fits are around κ = 0.2 m 2 /s at Site 1 and κ = 1.0 m 2 /s at Site 2, nearly an order-of-magnitude difference.
Finally, we complete our analysis of the LatMix data by using the spline fits of Figure 13 to decompose the flow into the three components of our conceptual model of equation (1)-background, mesoscale, and submesoscale-and then integrate over time to construct an implied set of drifter trajectories for each component. This is displayed in Figures 14 and 15 for Site 1 and Site 2 respectively. We have also included the mesoscale component in centre-of-mass coordinates. We observe that the mesoscale components meander in the fixed reference frame and follow the observed particle paths explaining most of their displacement and explain some of the spreading in the centre-of-mass frame. This can be seen by directly comparing Figures 14 and 15 with Figure 10. The submesoscale components are random-walk like and broadly resemble a diffusive process. The background components contain inertial oscillations and tides which create looping  Figure 10). The centre panel shows the same solution in the centre-of-mass frame (compare to the lower-left panel of Figure 10). The top-right and bottom-right panels show the path-integrated background and submesoscale flow, respectively.
trajectories with roughly daily periodicity.

Discussion and Conclusion
The separation in equation (1) is a compelling conceptual model, based on the ideas of nonlocal spreading in turbulence theory-but is the separation actually doing something useful in practice? This idea can be tested by considering the cross-terms in the total energy of the model, as was done in [22]. Specifically, the cross terms in the kinetic energy equation, u 2 total = u 2 bg + u 2 meso + u 2 sm + 2 u bg u meso + u meso u sm + u bg u sm , should remain small if this is truly an orthogonal linear decomposition. To assess this quantity we compute the coherence between the complex submesoscale signal and the complex mesoscale signal in the centre-of-mass frame, as shown in Figure 16. The results show remarkably low coherence (O(0.1)) at Site 1, across all frequencies, suggesting no relation between the two signals. In contrast, Site 2 does show more coherence between the two signals, likely reflecting the challenges of the separation in time-varying conditions. Despite this, the average coherence across frequency bands is ≈ 0.2, suggesting the decomposition is successfully separating two mostly distinct signals.
One of the key strengths of this methodology is how few parameters are needed to estimate the mesoscale parameters and perform the decomposition. For example, at Site 1 there are N = 294 observations of position from K = 9 drifters, resulting in 2NK degrees-of-freedom. The secondmoment fitting method uses 2N degrees-of-freedom to remove the centre-of-mass. Using a single window across the entire time series to estimate the 2 parameters in the strain model, such as Site   Figure 16: Coherence between the mesoscale signal in the centre-of-mass frame and the submesoscale signal at Site 1 and Site 2, using the disentangled velocities corresponding to the trajectories shown in Figures 14 and 15 respectively. 1 which is well described by a single set of strain rate parameters across the entire window, leaves 2N(K − 1) − 2 degrees-of-freedom to describe the submesoscale flow. In contrast, daily rolling windows with N W = 49 points (corresponding to one day) that estimate strain rate parameters at each of the N − N W time points, leaves only 2N(K − 2) + 2N W degrees-of-freedom to describe the submesoscale flow. As is evident in Figure 11, these extra degrees of freedom capture time-variability in the parameters that may not be appropriate. Finally, the spline fits require estimating M coefficients per mesoscale parameter, and thus the spline based time-varying fits leave 2N(K − 1) − 2M degrees-of-freedom to describe the submesoscale flow using the second-moment fits. With M = 4 sufficient to capture any time variability at Sites 1 and 2, this approach uses remarkably few parameters to perform this estimation. The benefit of which is that the decomposed submesoscale trajectory will contain rich statistical information with which to do further Lagrangian analysis. As discussed in the introduction, we view this works as complementary to that of [3,10] who recently developed a method for projecting clustered drifter trajectories to local Eulerian velocity fields using Gaussian Process regression. The ultimate goal of [10] was to compute horizontal velocity gradients with which to better understand vertical transport. The method was applied to the CALYPSO an LASER drifter deployments. Applying our method to these datasets is a natural avenue for further investigation. More broadly speaking, what our method provides to complement [3,10], is not the Eulerian velocity field, but rather the Lagrangian decomposition of the trajectories into various components. This allows us to extract the specific submesoscale component from the trajectory for further analysis within the Lagrangian setting. This allows for the estimation of submesoscale diffusivity, which is not a topic covered in [3,10]. However there is certainly scope to merge and compare our methodologies, particular because the constructed Eulerian velocity field can be directly compared with the mesoscale parameters we estimate locally over time (and hence space) using our slowly-evolving spline fitting. Again, this is certainly a topic that warrants further investigation. We also see potential for our work to naturally follow-on from the recent methodology developed in [23] who identify clusters of drifter trajectories that share coherent structures. For example, such clustering could be used to divide larger deployments into smaller clusters, after which our method can then be applied to each cluster to separate flow components within coherent structures.

Funding
The work of S. Oscroft was funded by the Engineering and Physical Sciences Research Council (Grant EP/L015692/1). The work of A. M. Sykulski was funded by the Engineering and Physical Sciences Research Council (Grant EP/R01860X/1). J. J. Early was funded by the National Science Foundation award OCE-1658564.