Reduced Models of Point Vortex Systems

Nonequilibrium statistical models of point vortex systems are constructed using an optimal closure method, and these models are employed to approximate the relaxation toward equilibrium of systems governed by the two-dimensional Euler equations, as well as the quasi-geostrophic equations for either single-layer or two-layer flows. Optimal closure refers to a general method of reduction for Hamiltonian systems, in which macroscopic states are required to belong to a parametric family of distributions on phase space. In the case of point vortex ensembles, the macroscopic variables describe the spatially coarse-grained vorticity. Dynamical closure in terms of those macrostates is obtained by optimizing over paths in the parameter space of the reduced model subject to the constraints imposed by conserved quantities. This optimization minimizes a cost functional that quantifies the rate of information loss due to model reduction, meaning that an optimal path represents a macroscopic evolution that is most compatible with the microscopic dynamics in an information-theoretic sense. A near-equilibrium linearization of this method is used to derive dissipative equations for the low-order spatial moments of ensembles of point vortices in the plane. These severely reduced models describe the late-stage evolution of isolated coherent structures in two-dimensional and geostrophic turbulence. For single-layer dynamics, they approximate the relaxation of initially distorted structures toward axisymmetric equilibrium states. For two-layer dynamics, they predict the rate of energy transfer in baroclinically perturbed structures returning to stable barotropic states. Comparisons against direct numerical simulations of the fully-resolved many-vortex dynamics validate the predictive capacity of these reduced models.


Introduction
The equilibrium statistical mechanics of point vortex systems now exists as a mature theory. Onsager provided the first insight that equilibrium theory is able to explain the organization of many-vortex systems into coherent structures [1]. He also indicated that these systems have novel thermodynamic properties, most notably that they allow equilibria with negative absolute temperature and that the usual equivalence of ensembles between microcanonical and canonical distributions may break down. In the decades since his prescient work, not only have these properties been established theoretically and verified numerically, but it has been understood that, due to the long-range nature of the interactions between point vortices, the appropriately-scaled continuum limit is an exact mean-field theory [2][3][4][5]. Consequently, the thermodynamic behavior of equilibrium states is completely determined by a mean-field equation, whose solutions are steady two-dimensional flows having a special functional relation between mean vorticity and streamfunction, which is derived from the populations of the vortex strengths (circulations) in the statistical ensemble [6][7][8].
model replaces the intricate configuration of point vortices by an effective elliptical distribution of vorticity.
In the first half of the paper we develop the optimal closure theory of point vortex systems governed by inviscid, incompressible Euler equations in the plane. In the second half we generalize that theory to quasi-geostrophic dynamics, first for barotropic single-layer flows and then for baroclinic two-layer flows. For these simplified geophysical fluids, we conduct numerical experiments to test the accuracy of the reduced models. We especially draw attention to the predicted rates of relaxation of vortical structures and the dependence of those rates on the key parameters in the quasi-geostrophic equations. Indeed, the useful outcome of such severe model reduction and closure is to approximate gross features and functional dependences with much greater computational efficiency than is required by direct numerical studies of the full dynamics.

Statistical Mechanics of Point Vortices
The streamfunction-vorticity formulation of the incompressible, inviscid, two-dimensional Euler equation in the plane is denotes the Poisson bracket on R 2 . (Throughout we consider an appropriately nondimensionalized flow problem, in which the physical variables x and y are scaled by L, ζ by T −1 and the fluid density by ML −2 , using characteristic length, time and mass scales, L, T, M; thus, the fluid density is unity in our formulation.) The scalar conservation law (1a), which states that the vorticity ζ(x, t) is constant along all flow trajectories,ẋ = ∂ψ/∂y,ẏ = −∂ψ/∂x, is the distinguishing property of two-dimensional flow [27,28]. In particular, it implies the conservation of Γ(ζ) = R 2 ζ dx, which by Stokes' formula coincides with the total circulation, as well as the conservation of all higher moments of the vorticity (general enstrophy integrals). The continuum dynamics (1) has a (generalized) Hamiltonian structure, and its Hamiltonian is a quadratic functional of ζ [8,28]; namely, The translational and rotational symmetries of the domain imply two additional invariants; namely, the first and (radial) second spatial moments of vorticity, We shall refer to the invariants H, B and I as the energy, linear impulse and angular impulse [27]. Technically these three invariants are the finite parts of the total kinetic energy, linear momentum and angular momentum, respectively, which are divergent for a flow in the plane induced by a compactly supported vorticity ζ, since its velocity field decays like Γ|x| −1 as |x| → ∞.
A point vortex dynamics occurs when the vorticity distribution is sharply concentrated at N points: where γ i is the vortex strength (or circulation) at the point x i , the location of the i th vortex; δ(x) is the unit delta function on R 2 . The Euler equations then collapse to advection of the point vortices by the induced flow, whose streamfunction is This classical reduction of the continuum equations to a finite-dimensional dynamical system for the point vortex locations ignores the self-induced velocity field of each vortex [29]. While intuitively evident, this desingularization can be justified analytically in the point vortex limit of the continuum dynamics under some restrictions [28]. The point vortex dynamics may be expressed as a canonical Hamiltonian system in the variables, as a function H N (q 1 , p 1 , . . . , q n , p n ) [28,29]. Our interest centers exclusively on systems of identical vortices. Accordingly, we set γ i = 1/N, so that Γ(ζ N ) = 1. To express the point vortex dynamics in a Hamiltonian form with respect to the spatial coordinates x i = (x i , y i ) themselves, we employ the rescaled Poisson bracket, F and G denote generic functions on the phase space R 2N of the system, and the planar bracket acts on their x i variables separately. The point vortex dynamics is then equivalent to the family of identities: Besides H N and Γ N = 1, this Hamiltonian dynamics conserves the linear and angular impulses, respectively, after centering the vortex system at the origin. Statistical equilibrium theory invokes the ergodic hypothesis and considers invariant probability distributions on the phase space constrained by the known invariants [30,31]. In particular, the canonical Gibbs density, is parameterized by constants β and α conjugate to the ensemble mean values of energy and angular impulse, respectively, H N = E and I N = L 2 ; φ(β, α, N) normalizes the probability density ρ N .
Here and throughout our subsequent discussion, the angle brackets · denote average, or expectation, with respect to a specified density; in this case ρ N . The invariant B N is able to be ignored, as the centering constraint B N = 0 is automatically satisfied. In (8) the canonical parameters are scaled by N in anticipation of the appropriate continuum limit, in which E and L 2 are fixed and finite as N → ∞. In that scaled limit the theory of large deviations characterizes the behavior of the empirical measure ζ N in (4), that is, the spatial density of point vortices [4]. Specifically, ζ N tends in the weak sense of measures on R 2 to the maximizer ζ of the entropic functional over probability densities ζ(x) on R 2 . The same large deviation principle quantifies the likelihood of fluctuations away from the most probable density ζ(x) for large N: the likelihood of another stateζ is exponentially small in N with a rate functional equal to S β,α [ζ] − S β,α [ζ] . The relevant large deviation theory is explained in [32] and its application to point vortex ensembles is summarized in [8].
The optimality conditions for the equilibrium state, ζ = −∆ψ, imply that its streamfunction ψ satisfies the mean-field equation for a constant µ adjusted so that ζ is a probability density on R 2 . This mean-field equation is the equilibrium condition for the constrained minimizer ζ of S β,α subject to ζ dx = 1. Namely, the Lagrange multiplier rule yields using the multiplier µ − 1; since the variation δζ is arbitrary, (9) follows. Thus the most probable state in the continuum limit with finite energy and angular impulse is a special steady solution of the Euler equations, having an exponential dependence of vorticity on streamfunction. That special dependence in the macrostate is a direct consequence of the modeling assumption that the microstate consists of many identical point vortices [28,29]. It is often advantageous to associate these equilibrium flows with the energy and angular impulse values that they attain, and to parametrize branches of solutions to the mean-field Equation (9) by E and L 2 rather than β and α. Of course, this change of perspective is realized by replacing the canonical ensemble by the microcanonical ensemble. Microcanonical equilibrium states are constrained maximizers of entropy: β, α, µ − 1 are then the Lagrange multipliers associated with the equality constraints, and the maximizer satisfies the mean-field Equation (9). A large deviation analysis similar to that for canonical equilibria is applicable to microcanonical equilibria [8]. An iterative algorithm exists to solve the constrained optimization problem (10) and thus to compute branches of equilibrium states [33]. Using this algorithm it is possible to establish that the maximum entropy is a concave function of the constraint values E, L 2 , and hence that the canonical and microcanonical formulations are equivalent in this particular problem; for vortex systems under other conditions the equivalence of ensembles may break down when β < 0 [34,35]. These equilibrium states are axisymmetric, stable, macrovortices with radially decreasing vorticity. For β = 0 they are Gaussian densities, while for β > 0 they are less concentrated, and for β < 0 they are more concentrated at the origin than a Gaussian. Figure 1 displays three representative radial profiles of the vorticity ζ(r).

Optimal Closure Method
When we endeavor to construct a nonequilibrium theory of macroscopic states, and closed evolution equations for those states, we necessarily include observables in the macroscopic description that are not exact invariants. Accordingly, each choice of macroscopic observables leads to a corresponding nonequilibrium reduced model. This arbitrariness can be circumvented only when the system under study has a strong separation of time scales, so that its slow variables can form the macroscopic description, and its fast variables can be relegated to a statistical treatment. Even in such a case a unique reduction is only achieved in the limit as the separation of time scales becomes infinite. The current paper therefore adopts a different perspective. We accept that some choice of resolved, macroscopic observables must be made, and that this choice is not unique. Given a chosen set of observables-presumably a reasonable choice from the standpoint of a desired reduced model-the mathematical problem then becomes how to devise a suitable closure in terms of those observables.
We approach this problem using a combination of statistical modeling and information theory. Namely, we construct a parametric family of "trial" probability densities on the phase space of the microscopic dynamics (assumed to be deterministic and Hamiltonian), having parameters that are in a one-to-one correspondence with the mean values of the chosen observables. Within the reduced statistical model defined by that parametric family we attempt to follow the propagation of probability induced by the underlying microscopic dynamics. To do so we quantify the deviation between the exactly propagated density and any feasible evolution of the model density, using a metric derived from the relative entropy, or Kullback-Leibler divergence, between the exact and model densities. Dynamical closure within the reduced model is then determined by minimizing this metric over feasible paths in the parameter space of the model. In this way, we obtain a statistically consistent closure, in which macrostates evolve so as to minimize the rate of information loss due to model reduction.
This optimal closure method of model reduction was introduced by one of the authors in [22]. In that paper a weighted metric is considered, the weights being adjustable parameters in the closure. Subsequent studies have shown, however, that these adjustable weights are unnecessary, and that an intrinsic optimal closure is achieved by setting all weights equal to unity [21,25,26].
In the remainder of this section we summarize the optimal closure method in the general context of reducing a Hamiltonian dynamics onto a finite set of linearly independent observables, A 1 , . . . , A m . Let z denote a generic point in the phase space R 2N ; for instance, in the point vortex system, z = (x 1 , . . . , x N ). For "trial" probability densities on the phase space we take quasi-canonical densities built from the chosen observables, namely, where ξ = (ξ 1 , . . . , ξ m ) is the parameter vector conjugate to the resolved vector A = (A 1 , . . . , A m ); φ(ξ, β) normalizes the probability. The probability densities (11) constitute an exponential family with natural parameters ξ ∈ R m and −β ∈ R [36,37]. We are interested in modeling the relaxation of macrostates toward statistical equilibrium, and so we slave β to ξ by imposing the conservation of energy, namely, that H = E, where E is the equilibrium energy. Here the expectation is taken with respect to the trial densityρ. We defer until the next section any specification of the vector observable A, except to say that A(z) represents some coarse-graining of the microstate z.
The exact propagation of probability on the phase space R 2N is expressible formally as Indeed, this statement is equivalent to the Liouville equation, which is the foundation of statistical mechanics [19,31,38]. Our closure procedure is based on submitting trial densitiesρ(·; ξ(t)) to the Liouville equation and minimizing the mean-square of the resulting residual over parameter paths ξ(t). Equivalently, our closure principle can be viewed in information-theoretic terms by analyzing the Kullback-Leibler divergence between an exactly propagated density and a feasible model density [39,40]. Namely, letρ(t) = ρ(t) at time t, and for a short interval of time ∆t consider the incremental relative entropy, expanding in a Taylor series in ∆t; the expectation in the second equality is taken with respect tõ ρ(·; ξ(t)). The leading order term in this expansion of the relative entropy is an intrinsic metric on the discrepancy between the exact and model densities. In view of this calculation, we define the Liouville residual to be on any feasible path ξ(t), and we quantify the local lack-of-fit of the reduced model by Our optimal closure minimizes the time integral of L(ξ,ξ) over feasible parameter paths. These extremal paths realize the slowest rate of information loss, or entropy production, within the reduced model, and thus represent the best fit of the trial densities to the Liouville equation.
The optimality conditions for the closure are conveniently derived by applying Hamilton-Jacobi theory [41][42][43]. That is, we view L as a Lagrangian function and introduce its value function (analogous to an action integral), v(ξ,t) = min To the lack-of-fit Lagrangian we associate a lack-of-fit Hamiltonian (not to be confused with that for the microscopic dynamics) and conjugate variables, namely, The value function (14) satisfies the Hamilton-Jacobi equation Along extremals, π k = −∂v/∂ξ k , and hence we obtain the reduced equations a closed system of m ordinary differential equations of first order in ξ(t) ∈ R m . Our optimal closure approximates the relaxation of the system from an initial statistical stateρ(·, ; ξ 0 ) to statistical equilibrium (for which ξ = 0) by the path of trial densitiesρ(·; ξ(t)) determined by the solution ξ(t) of (17) satisfying the initial condition ξ(0) = ξ 0 . The Liouville residual for a trial density (11), in which β(ξ) is determined by the mean energy constraint H = E, is given by where Q H denotes the complementary orthogonal projection onto the energy shell, that is, From this formula it is evident that R = 0, R(H − E) = 0, and hence that the dynamical diagnostic R is analogous to the score variable for a parametric statistical model [37]. In turn, its mean-square R 2 , which defines the lack-of-fit (13), is a dynamical analogue to the Fisher information [37]. The meaning of the conjugate variables π k is revealed by the calculation, which uses the fact that L(ξ,ξ) depends, quadratically, onξ. Taking the moment with respect to A k of the Liouville equation yields Thus, π k is precisely the residual in the moment equation for A k , due to replacing the exact density by the trial density. Accordingly, π k may be interpreted thermodynamically as the irreversible flux of the mean A k . The Hamilton-Jacobi analysis reveals that the flux vector, π, is the minus ξ-gradient of the value function v(ξ, t). Therefore v has the thermodynamic interpretation of a dissipation potential in the reduced equations [44]. A fuller explanation of this optimal closure is given in [22], where it is termed the "nonstationary closure", since the value function is time-dependent. In this closure the dissipation vanishes at t = 0 when the initial nonequilibrium statistical state coincides with a trial density; for large t, it approaches the associated "stationary closure", for which the value function is time-independent.
Practical implementation of this model reduction procedure is inhibited by the difficulty inherent in solving the Hamilton-Jacobi Equation (16), which is not analytically tractable apart from very special (completely separable) cases. This impediment may be overcome by resorting to numerical optimization methods, such as are available in the optimal control literature; for instance, this approach is implemented in [25]. In the current paper, however, we instead restrict our attention to near-equilibrium relaxations, which allows us to linearize the optimization theory around equilibrium and thereby obtain a linear irreversible closure. Under that approximation the Hamilton-Jacobi equation becomes a matrix Riccati equation, whose solution may be identified with the matrix of transport coefficients that govern the dissipative reduced dynamics. Rather than develop this approximation in the abstract, we return in the next section to point vortex systems and elaborate the optimal closure in that setting.

Nonequilibrium Mean-Field Theory of Vortex Systems
We coarse-grain the N-vortex dynamics, for large N, in terms of finitely-many spatial moments of the vorticity defined by some chosen functions A k (x), k = 1 . . . , m. The macrostate is then the ensemble mean of the vector observable A N ∈ R m having components (Abusing the notation of the preceding section, henceforth A N k denotes the observable on phase space, while A k denotes the spatial moment on physical space.) For instance, when we model the axisymmetrization of a distorted vortex ensemble centered at the origin (so that the first-order moments vanish and are able to be ignored), we take A 1 = x 2 − y 2 , A 2 = 2xy, along with A 3 = x 2 + y 2 , which determines the angular impulse invariant. Heuristically, the macrostate, a = A ∈ R 3 , describes the effective "size", "eccentricity" and "inclination" of the evolving vortex ensemble conceived as an elliptical structure.
In this section we include the angular impulse in the resolved vector, for the sake of clarity while elaborating the mean-field theory. In the later sections, we treat the angular impulse as an invariant, like the energy, and include only non-conserved resolved variables in A.

Mean-Field Ansatz
Motivated by the established fact that the equilibrium theory of vortex systems yields an asymptotically exact mean-field theory in the appropriate continuum limit, we base our nonequilibrium theory on the following trial densities on R 2N : whereζ(x; β, ξ) is the probability density on R 2 defined by the mean-field equatioñ That is, our statistical model of the N-vortex system assigns independent random locations, x i , to each of the point vortices, distributed according toζ(x). This spatial density is parameterized by β, conjugate to the energy invariant, E, and ξ 1 , . . . , ξ m conjugate to the moments, A 1 (x), . . . , A m (x) that define the coarse-graining; µ = µ(β, ξ) normalizesζ. In the limit as N → ∞, the law of large numbers implies that the empirical density ζ N (x), defined in (4), tends toζ(x) in the weak sense of measures; in turn, the smoothing properties of the Poisson equation [41] guarantee that the corresponding streamfunction ψ N (x), defined in (5), tends toψ(x), as do its first derivatives and hence the corresponding velocity field.
The trial densities (20) are constructed from solutionsψ(x) of a nonlinear elliptic partial differential Equation (21), which depends parametrically on β and ξ. Alternatively, the marginal density,ζ(x), is the solution of the maximum entropy problem (10) in which the single constraint on angular impulse is replaced by the family of constraints: The constraint values, a k and E, then parameterize the maximizersζ(x) rather than the associated multipliers −ξ k and β. Holding the energy constant, β is slaved to ξ, and hence ξ is determined by the macrostate vector a. The correspondence between ξ and a is one-to-one in a neighborhood of equilibrium, as our subsequent analysis will show. We submit this mean-field trial density to the Liouville equation governed by the Hamiltonian H N displayed in (6) (with γ i = 1/N), and the Poisson bracket (7). The scaled residual is This calculation makes use of the desingularized streamfunction, In (22), and in the analysis to follow, · denotes expectation with respect to the densityζ. We notice that the mean-field ansatz produces two terms that do not appear in the residual of quasi-canonical trial densities outlined in the preceding section. The term, β∂ t (ψ − ψ ), accounts for the implicit dependence of the streamfunctionψ on the parameters ξ(t), β(t). The term, β[ψ N i ,ψ], vanishes in the continuum limit. Both these terms reflect the relative simplicity of the mean-field ansatz, which for finite N imposes the product form (20) that the quasi-canonical density achieves only in the limit as N → ∞.
It is evident from (22) that, in the continuum limit as N → ∞, the appropriate continuum residual on which to base a mean-field optimal closure is noticing that ψ = 2E, which is constant in time. This mean-field version of the Liouville residual is derived entirely fromζ. In fact, it may be viewed as simply the residual in the Euler equations of the ansatz (21), that is, In this light, the derivation ofR from the mean-field ansatz may be viewed as a justification on information-theoretic grounds for using L = R 2 /2 as the cost function for an optimal closure formulated in physical space in terms of quasi-equilibrium vorticity fields (21). The reduced model is therefore an optimal closure with respect to the Euler equations themselves, in which the underlying point vortex dynamics and its associated entropy functional are used to deduce the trial densities and the lack-of-fit metric appropriate to the continuum limit.

Near-Equilibrium Linearization
Next we linearize the optimal closure theory formulated in the preceding subsection around a given equilibrium state, ζ eq , with corresponding streamfunction, ψ eq . Relaxation of an initial disturbed macrostate toward this equilibrium macrostate conserves the energy, E, and angular impulse, L 2 , and thus the near-equilibrium linearization of the optimal closure is required to respect those invariants up to linear order in the perturbation. The mean-field macrostate is and by assumption the parameter vector, ξ ∈ R m , is near the origin. (The angular impulse term is no longer included in the resolved vector, since it is an invariant.) Denoting perturbations from equilibrium by primes, so that α = α eq − α , β = β eq − β andζ = ζ eq + ζ ,ψ = ψ eq + ψ , we have Consequently the perturbation streamfunction satisfies This elliptic equation along with the boundary condition, lim |x|→∞ ψ (x) = 0 , determines the solution ψ , which depends linearly on ξ, α , β . In turn, α , β are determined as linear functions of the vector ξ by solving the linearized energy and angular impulse constraints, In summary, the linearized mean-field ansatz together with the linearized invariants yields the linear expressions in which the sensitivities, denoted by α k , β k , ψ k , to the perturbation ξ k are calculable from the given equilibrium state ζ eq . We refrain from displaying explicitly the coefficient matrices involved; full details are given in [45]. The Liouville residual is, to leading order, where we define the vectors U and V by Squaring and taking the expectation we obtain the quadratic lack-of-fit Lagrangian, whose matrices are computed from the given equilibrium state ζ eq . The optimal closure thus becomes a quadratic programming problem and its Hamilton-Jacobi Equation (16) becomes a Riccati matrix equation for the m × m matrix, M(t), that defines the quadratic value function, v(ξ, t) = ξ T M(t)ξ/2 . Specifically, the relations (15) become where D = K − JC −1 J T . Substituting π = −Mξ into the Hamilton-Jacobi equation then yields the following initial value problem for the matrix-valued function M(t): Known properties of Riccati equations ensure that M(t) is symmetric semi-definite, because C is symmetric positive definite and D is symmetric semi-definite [43]. The optimal parameter path ξ(t) satisfies the linear system which is the linearization of (17). Since the macrostate vector a and the parameter vector ξ are related by the reduced Equation (33) is equivalently a relaxation equation for the macrostate. This completes the optimal closure. We emphasize that this linearized, near-equilibrium relaxation equation is constructed entirely from the given equilibrium mean-field state, ζ eq , and is free of any adjustable constants, such as relaxation rates. Moreover, the reduced dynamics is memoryless in the macrostate a(t), although it is non-autonomous due to the time dependence of M(t). The remainder of the paper applies this optimal closure to two simplified models arising in geophysical fluid dynamics. These models are straightforward generalizations of the two-dimensional Euler equations, and the analysis given above extends to them without any fundamental changes. We therefore concentrate on displaying and interpreting the predictions of the closure, and testing those predictions against benchmarks computed by direct numerical simulations of the many-vortex systems.

Axisymmetrization in Barotropic Quasi-Geostrophic Dynamics
The single-layer quasi-geostrophic dynamics is governed by the transport equation for the potential vorticity, denoted by q(x, t): We refer the reader to the literature on geophysical fluid dynamics for the meaning of this system, and its derivation as an asymptotic model for a rotating shallow fluid layer in the limit of small Rossby number [46][47][48]. This model involves a length scale, R d , the Rossby deformation radius, which characterizes the scale at which kinetic and potential energies balance. An ensemble composed of N concentrations of potential vorticity, namely, interacts with a screened potential; specifically, the Green function for (34b) is the Bessel function (1/2π)K 0 (r/R d ). The Euler equation is recovered in the limit as R d → ∞.
The presence of a length scale in the interaction potential for the vortex system allows us to investigate how the relaxation of an ensemble of vortices depends on the ratio between R d and the average "radius" of the ensemble, which thanks to the angular impulse constraint is L for an ensemble centered at the origin. To do so we study the evolution of ensembles of vortices that are initially distorted away from a given axisymmetric equilibrium state. We measure this distortion by the second moments of the vorticity field. Besides simplicity, the justification for this choice is that a centered Gaussian distribution is completely characterized by it second moments; and equilibrium macrostates for β = 0 are Gaussians. The moment of |x| 2 = x 2 + y 2 being a conserved quantity, we declare the observables to be the resolved variables, noticing that A 1 (x), A 2 (x) and |x| 2 form an orthogonal basis for the (centered) quadratics in x with respect to any axisymmetric density, and hence any equilibrium macrostate. The ensemble mean values, a 1 = A 1 , a 2 = A 2 , relax to zero as the ensemble equilibrates. We normalize the spatial scale by setting L 2 = 2, so that a symmetric Gaussian density has unit variance in both x and y. The ensemble evolution is initialized by a centered Gaussian density with given a 1 , a 2 .
In qualitative terms, an elliptically distorted initial state develops spiral arms, which are entrained and sheared by the large-scale rotation and eventually relax toward the axisymmetric equilibrium state. Our severe coarse-graining does not resolve the spiral arm structure, but instead fits trial densities to it that track its "shape", quantified by the mean observables, A 1 , A 2 along with the angular impulse |x| 2 , and its "concentration", controlled by the energy ψ /2. Figure 2 plots the temporal profiles of the two mean resolved variables for Euler dynamics (R d = ∞). The initial distortion is relatively large in this case, with a 1 = 0, a 2 = 0.8. Nonetheless, a good agreement with Ensemble Direct Numerical Simulation (EDNS) of 1000 point vortices is exhibited. In the later stages of the evolution some departures are apparent, but these can be mainly attributed to finite-N fluctuations. Figure 3 shows the analogous plots for R d = 4, now with a 1 = 0, a 2 = 0.4. Comparison to the full simulation of the vortex ensemble shows that the optimal closure is less accurate as R d is decreased. Nonetheless, it does capture the time scale of relaxation, which is longer for larger R d . This dependence of the rate of relaxation on R d is attributed to the effect of screening the vortex-to-vortex interactions.  The dependences of the rate of equilibration on the inverse temperature β eq and on the Rossby radius R d are displayed in Figures 4 and 5, respectively. The rate plotted in both of these figures is the double eigenvalue of the 2 × 2 matrix M(+∞), the stationary limit of M(t) for large time t. That M(t) is a scalar matrix (diagonal with equal entries) is a consequence of a rotational symmetry in this reduced model: expressed in polar coordinates the two resolved variables satisfy, A 2 (r, θ) = A 1 (r, θ + π/4). This symmetry implies that the coefficient matrices in the Riccati Equation (32) are special: C and D are scalar, and J is antisymmetric. (We omit the elementary calculations needed to check these properties.) It follows that the solution M(t) of (32) is necessarily a scalar matrix.
The scalar M(t) has a definite interpretation as the dissipation rate for the reduced equations. To justify this claim we consider the relative entropy D KL (ζ eq ζ ), which is the information distance between the nonequilibrium and equilibrium states. Under the near-equilibrium linearization, D KL (ζ eq ζ ) ≈ ξ T Cξ/2 . The closed reduced dynamics (33) then yields d dt Predicted symmetrization rate versus inverse temperature β for three values of the deformation radius. As β → +∞, the equilibration rate tends to zero, indicating that symmetrization is suppressed in the deterministic limit. Thus the information distance decays at a rate that equals twice the value function, and hence we naturally identify M(t) as the rate of equilibration. This rate is itself time-dependent, since M(t) evolves from M(0) = 0 to its stationary value M(+∞), which solves the associated algebraic Riccati equation. We therefore choose the limiting value of M(t) as t → +∞ to define the dissipation rate; this stationary value is plotted in Figures 4 and 5. Figure 4 shows that the equilibriation rate declines to zero as β eq → +∞. This result has an interesting interpretation. In this zero-temperature limit, the equilibrium macrostates of the mean-field theory become deterministic vortex patches (flows having constant potential vorticity inside a closed boundary curve, and zero outside); this limiting property is easily demonstrated from the mean-field equation. For the Euler equations (R d = ∞) an initially elliptical vortex patch does not axisymmetrize, being instead a rotating steady state known as a Kirchhoff ellipse (see §159 of [27]). Thus, our nonequilibrium mean-field theory is compatible with this classical deterministic result in the zero-temperature limit. The fact that the coherent structures observed in two-dimensional turbulence show a strong tendency to axisymmetrize therefore leads us to conclude that typical end states of freely decaying turbulence resemble mean-field equilibria having β < 0, not large positive β. Figure 5, in which β eq = 0, displays the increasing rate of equilibration with increasing R d . This dependence is quite strong when R d is comparable to the size of the vortex ensemble, that is, when R d ∼ 1.
These results illustrate the usefulness of a severely coarse-grained model designed to omit the details of the reorganizing vorticity field, but to capture key features of its nonequilibrium behavior. In particular, the simplicity of this optimal closure allows us to define a unique rate for the axisymmetrization phenomenon, and to predict its dependence on both the temperature of the equilibrium state and the deformation radius.
Taking a broader view, the main output of the optimal closure is the matrix M(t), which controls the dissipative structure of the reduced equations. The stationary matrix M(+∞) plays the role of the transport matrix familiar from classical linear irreversible thermodynamics, in the sense that after the transcient stage it relates the thermodynamic "forces" ξ to the thermodynamic "fluxes" π [44]. From this perspective, the reduced model tested in this section may be regarded as one instantiation of a nonequilibrium thermodynamics of isolated coherent structures.

Barotropization in Two-Layer Quasi-Geostrophic Dynamics
Finally we apply our optimal closure to a simple model in geophysical fluid dynamics that includes a vertical stratification. Specifically, we consider vortex ensembles governed by the two-layer quasi-geostrophic equations [47,48]. For simplicity, we assume that the two layers have equal depth, and are bounded by a flat bottom and a flat top; the fluid in the lower layer is denser than the upper layer. These shallow layers are separated by a free surface, and their interaction is controlled by a single parameter, R d , the internal Rossby deformation radius. In the appropriate quasi-geostrophic limit, the continuum equations for this system reduce to two coupled transport equations for the potential vorticity q(x, t) in each layer; the upper layer is indexed by 1, the lower layer by 2: Direct numerical simulations of rotating stratified fluids [11] reveal a tendency for coherent vorticity structures to approach purely barotropic end states, for which q 1 ≈ q 2 . This "barotropization" reflects a preference for minimal potential energy states. As such it is similar to baroclinic instability, in that it is a mechanism to convert available potential energy into kinetic energy [47,48].
In light of the physical significance of this tendency toward barotropic states, we now employ our optimal closure theory to predict the rate at which initially baroclinic perturbations return to stable barotropic states. Our analysis may be viewed as complementary to baroclinic instability theory, in that we predict the rate of conversion of available potential energy into kinetic energy during the end stages of a baroclinic evolution whereas linear stability analysis treats its initial stages.
We examine only the simplest formulation of this general question. Namely, we consider two like-signed vortical structures in the two layers, and demand that they have equal circulation, Γ 1 = Γ 2 = 1. A barotropic state is one in which the two vortical structures are vertically aligned. We therefore create a baroclinic initial state by horizontally displacing the centers of potential vorticity in the two layers. The mutual advection of these tilted structures as they realign into a barotropic state may be viewed as a three-dimensional analogue to the axisymmetrization of a single-layer vortical structure from an elliptically distorted state, examined in the preceding section.
We take the resolved vector for this problem to be the separation of the centers of the vortex ensembles in the two layers. In the notation of the previous sections, the ensemble-averaged macrostate a has the components Thus, the macrostate consists of the first-order spatial moments of the baroclinic part of q, while the barotropic first-order moments, x(q 1 + q 2 ) dx = 0, are conserved. The conserved total energy and angular impulse are, respectively, Unlike our single-layer reduced models, the second-order moments apart from the L 2 are not included in the resolved macrostate. Accordingly, our model is the severest reduction of the two-layer dynamics: a mean-field theory that includes only first-order spatial moments of potential vorticity along with exact invariants.
The lack-of-fit Lagrangian is derived from a mean-field ansatz that treats the ensembles of vortices in each layer as independent. We omit the straightforward but lengthy calculations needed to formulate this optimal closure for the two-layer dynamics, preferring instead to present some predictions and tests of the ensuing reduced model; details are available in [45]. Figure 6 compares the optimal closure for initial conditions a 1 = 0.6, a 2 = 0.0 with R d = 1.0 against EDNS of a potential vortex system with 1000 vortices in each layer. The initial state consists of Gaussian densities in each layer with L 4 = 4, chosen so that a Gaussian barotropic state has x and y variances in each layer normalized to unity. As the horizontal separation between the center of potential vorticity in each layer is increased in the initial state, the x and y variances are correspondingly decreased to maintain the total angular impulse. Since the size of the vortex ensembles is near to R d = 1, there is strong interaction between the layers and the evolving structure reorganizes into a barotropic equilibrium state relatively quickly. Even a moderate increase to R d = 2, however, results in a slower equilibration and poorer agreement between the EDNS and the optimal closure. The source of this disagreement may be the crudeness of the closure based only on spatial first moments, or the linearization of the closure around equilibrium. It may also be related to memory effects, which are absent from the mean-field closure used throughout this paper. Figure 6. EDNS compared to optimal closure for two-layer quasi-geostrophic dynamics for R d = 1.0. (left): a 1 (t) = x 1 − x 2 ; (right): a 2 (t) = y 1 − y 2 . The initial conditions are a 1 = 0.6 and a 2 = 0.0. Figure 7 displays the dependence of the rate of energy transfer from potential to kinetic energy on the inverse temperature β eq of the barotropic equilibrium state, and on the internal deformation radius R d . The available potential energy is expressible as a quadratic form in ξ ∈ R 2 , namely, where P is the matrix with elements, we recall the notation introduced in (27). Under the closed reduced dynamics, therefore, the rate of conversion of energy is given by As in the discussion of the single-layer equilibration rate, the 2 × 2 matrices involved in this equation are special, due to the spatial symmetries of our reduced model; namely, C, P and M are scalar matrices, while J is antisymmetric. The factor −PC −1 M therefore controls the rate of conversion of APE, and accordingly this scalar factor is displayed in Figure 7.
The APE conversion factor is fastest for β < 0; these barotropic end states are concentrated coherent structures, and for increasing negative β their potential vorticity increasingly concentrates at the origin. For β > 0, the APE conversion factor is noticeably smaller, becoming vanishingly small for large positive β. For each fixed β, the fastest conversion occurs when R d nearly equals the horizontal extent of the vortex ensemble, which is normalized to near unity in our tests. This behavior of the reduced model is reminiscent of linear stability analysis, which identifies the most active modes as those with length scales close to the deformation scale [47,48]. In this regime our optimal closure successfully follows the relative displacement of potential vorticity in the two layers and captures the associated energy transfer. Outside of this regime the barotropization process is slower and the reduced model is less accurate. Presumably this model is too severely coarse-grained to represent the saturation of a baroclinic perturbation accurately over a wide range of deformation scales. Nonetheless, it is the simplicity of the model that allows us define a characteristic rate of energy conversion and thus to identify the conditions under which barotropization proceeds most efficiently.

Conclusions
The optimal closure method used in this paper to coarse-grain point vortex dynamics is a systematic approach to model reduction. From a deterministic microscopic dynamics and a chosen macroscopic description, it invokes an information-theoretic metric to characterize that macroscopic evolution which best fits the underlying microdynamics. The metric quantifies the rate of information loss due to statistical reduction onto the macroscopic observables. The closed reduced equations derived from this intrinsic criterion rely on no intermediate stochastic modeling or adjustable closure parameters. Practical implementation of the optimal closure, however, may be computationally burdensome, because it requires dynamical optimization over paths of macrostates.
In the present application we have restricted our attention to relaxation from near-equilibrium perturbations, which allows us to linearize the optimal closure around equilibrium states, thereby making it computationally efficient. Motivated by the self-organization of coherent vortex structures observed in two-dimensional turbulence, we have derived closed reduced equations for the late stage of this organization. Our closure retains only a few spatial moments of the vorticity, making it a severe coarse-graining of the vortical structure. Comparisons against fully-resolved computations of systems containing 1000 point vortices have shown that the optimal closure approximates the temporal behavior of the few observables retained in the coarse-grained description, and that it furnishes a good estimate of the rate of relaxation toward equilibrium.
The dependence of the relaxation rate on physical parameters is more striking when the fluid dynamics is extended from the two-dimensional Euler equation to the quasi-geostrophic equations for either the single-layer or two-layer shallow water models. In these geophysical models, the Rossby radius of deformation, R d (external or internal depending on the model), is the key length scale. We have therefore tested how the reduced models of these quasi-geostrophic vortex systems compare against full simulations over a range of deformation scales. The axisymmetrization of single-layer structures has been shown to proceed at rates that decrease with decreasing R d . The reduced model based on second-order spatial moments approximates the relaxation toward axisymmetry most accurately for large R d , showing less accuracy as R d decreases and the vortex-vortex interactions weaken. In the two-layer situation, the reduced model, which resolves only the displacement of the centers of vorticity between the two layers, predicts the decay of the baroclinic part of the mean flow during relaxation toward a stable barotropic mean flow. We have found that this reduced model performs best when the spatial scale of the vortical structure is near to R d , which is the regime in which the transfer of potential energy into kinetic energy is the fastest. Even though this result pertains to the late stage of relaxation toward a stable structure, it harmonizes with the intuition drawn from classical linear analysis of baroclinic modes in the early stage of an instability, in which the fastest growing modes are those whose scale is comparable to R d .
The reduced models implemented in this paper have been designed to represent the gross spatial features of isolated vortical structures, and have therefore been based on a severe coarse-graining that uses only low-order spatial moments of the vorticity field. Nonetheless, we have presented the nonequilibrium mean-field theory of vortex systems for an arbitrary spatial coarse-graining. This general derivation naturally suggests that the theory could be applied to more refined reduced models having more resolved variables, A 1 (x), . . . , A m (x). For instance, one could consider vortex systems whose evolving macroscopic states consist of more than one "patch" of vorticity, and design a set of basis functions, A k (x), that coarsely represent the location of those patches. In particular, a nonequilibrium model of the merger (or coalescence) of two such patches, which is observed as a typical phenomenon in two-dimensional turbulence [12,49], could be attempted. Implementation of the optimal closure under these conditions, however, would be beyond the reach of linearization around equilibrium and would therefore require numerical optimization procedures, such as those used in optimal control. Moreover, a memoryless closure in terms of mean-field trial densities might be too crude to capture such far-from-equilibrium behavior. The development of optimal closures having wider scope than those developed in this paper, therefore, presents challenges for future research.