Entropy minimizing curves with application to flight path design and clustering

Air traffic management (ATM) aims at providing companies with a safe and ideally optimal aircraft trajectory planning. Air traffic controllers act on flight paths in such a way that no pair of aircraft come closer than the regulatory separation norms. With the increase of traffic, it is expected that the system will reach its limits in the near future: a paradigm change in ATM is planned with the introduction of trajectory based operations. In this context, sets of well separated flight paths are computed in advance, tremendously reducing the number of unsafe situations that must be dealt with by controllers. Unfortunately, automated tools used to generate such plannings generally issue trajectories not complying with operational practices or even flight dynamics. In this paper, a means of producing realistic air routes from the output of an automated trajectory design tool is investigated. For that purpose, the entropy of a system of curves is first defined and a means of iteratively minimizing it is presented. The resulting curves form a route network that is suitable for use in a semi-automated ATM system with humans in the loop.


Introduction
Based on recent studies [3], traffic in Europe is expected to grow at an average yearly rate of 2.6%, yielding a net increase of 2 million flights per year at the 2020 horizon. Long term forecasts give a twofold increase in 2050 over the current traffic, pointing out the need for a paradigm change in the way flights are managed. Two major framework programs, SESAR (Single European Sky Air traffic management Research) in Europe and Nextgen in the US, have been launched in order to first investigate potential solutions and then to deploy them in a second phase. One of the main changes that the air traffic management (ATM) system will undergo is a switch from airspace based to trajectory based operations, with a delegation of the separation task to the crews. Within this framework, trajectories become the basic object of ATM, changing the way air traffic controllers will be working. In order to alleviate the workload of controllers, trajectories will be planned several weeks in advance in such a way that close encounters are minimized and ideally removed. For that purpose, several tools are currently being developed, most of them coming from the field of robotics [10]. Unfortunately, flight paths issued by these algorithms are not tractable for a human controller and need to be simplified. The purpose of the present work is to introduce an automated procedure that takes as input a set of trajectories and outputs a simplified one that can be used in an operational context. Using an entropy associated with a curve system, a gradient descent is performed in order to reduce it so as to straighten trajectories while avoiding areas with low aircraft density, thus enforcing route-like behavior. This effect is related to the fact that entropy minimizing distributions favor concentration.

Motivation
As previously mentioned, air traffic management of the future will make intensive use of 4D trajectories as a basic object. Full automation is a far reaching concept that will probably not be implemented before 2040-2050, and even in such a situation, it will be necessary to keep humans in the loop so as to gain a wide societal acceptance of the concept. Starting from SESAR or Nextgen initial deployment and aiming towards this ultimate objective, a transition phase with human-system cooperation will take place. Since ATC controllers are used to a well structured network of routes, it is advisable to post-process the 4D trajectories issued by automated systems in order to make them as close as possible to line segments connecting beacons. To perform this task in an automatic way, flight paths will be deformed so as to minimize an entropy criterion that enforces avoidance of low density areas and at the same time penalizes length. Compared to already available bundling algorithms [5] that tend to move curves to high density areas, this new procedure generates geometrically correct curves, without excess curvature.
Let a set γ_1, . . ., γ_N of smooth curves be given, that will be aircraft flight paths for the air traffic application. It will be assumed in the sequel that all curves are smooth mappings from [0, 1] to a domain Ω of R^q with everywhere non vanishing derivatives in ]0, 1[. This last condition allows viewing them as smooth immersions with boundaries and is sound from the application point of view, as aircraft velocities are bounded below by efficiency considerations and ultimately by the stall, and therefore cannot vanish except at the endpoints. In air traffic applications, the dimension of the state space is generally 2, and sometimes 3 when the evolution of aircraft in the vertical plane is of interest.
The approach taken in this work is first to get a sound definition of spatial density associated with a curve system, then to derive from it an entropy that will be minimized.

Spatial density of a system of curves
Due to the fact that aircraft positions are acquired through radar measurements, a trajectory is known only at discrete sampling times. In the operational context, the sampling period ranges from 4 to 10 seconds, which corresponds roughly to a 100-250 m traveled distance. Derived from that, a classical performance indicator used in ATM is the aircraft density [4], obtained from the sampled positions γ_i(t_j), j = 1, . . ., n_i on each flight path γ_i, i = 1, . . ., N. It is constructed from a partition U_k, k = 1, . . ., P of Ω by counting the number of samples occurring in a given U_k, then dividing by the total number of samples n = ∑_{i=1}^N n_i. More formally, the density d_k in the subset U_k of Ω is expressed as:

d_k = (1/n) ∑_{i=1}^{N} ∑_{j=1}^{n_i} 1_{U_k}(γ_i(t_j)),

with 1_{U_k} the characteristic function of the set U_k. It seems natural to extend the density obtained from samples to another one based on the trajectories themselves, using an integral form:

d_k = (1/λ) ∑_{i=1}^{N} ∫_0^1 1_{U_k}(γ_i(t)) dt,

where the normalizing constant λ is chosen so that (d_k) is a discrete probability distribution: ∑_{k=1}^{P} d_k = 1. Since U_k, k = 1, . . ., P is a partition, ∑_{k=1}^{P} ∫_0^1 1_{U_k}(γ_i(t)) dt = 1 for each i, so that λ = N.
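As a minimal illustrative sketch (not the authors' code), the histogram density d_k over a rectangular partition can be computed as follows; `histogram_density` and its arguments are hypothetical names:

```python
import numpy as np

def histogram_density(samples, grid_edges):
    """Empirical density d_k over a partition of a planar domain.

    `samples` is a list of (n_i, 2) arrays of sampled positions, one per
    flight path; the partition U_k is the set of cells of a rectangular
    grid, and d_k is the fraction of the n = sum(n_i) samples in cell k.
    """
    pts = np.vstack(samples)
    counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=grid_edges)
    return counts / pts.shape[0]
```

By construction the returned values are non-negative and sum to one, as required for a discrete probability distribution.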

Density can be viewed as an empirical probability distribution, with the U_k considered as bins in a histogram. It is thus natural to extend the above computation so as to give rise to a continuous distribution on Ω. For that purpose, local weighting techniques such as kernel density estimation methods are well-known in nonparametric statistics, because they are a useful data driven way to yield a continuous density estimation. Many references may be found in the literature, as in [11,12]. Given observations, the resulting estimation will be the sum of weights taking into account the distance between the observations and the location x at which the density has to be estimated; the closer an observation is to x, the greater is the weighting. The weights are defined by selecting a summable function centered on the observations, called a kernel, usually denoted by K : R → R_+ in the univariate case, and a smoothed version of the Parzen-Rosenblatt density estimator [8,9] is used. Standard choices for the K function are the ones used for nonparametric kernel estimation, like the Epanechnikov function [2]:

K(u) = (3/4)(1 − u²) for |u| ≤ 1, and 0 otherwise.

There exists a large variety of kernel functions, and any function satisfying the normalization condition can be considered, so that the estimation is a probability density. Moreover, the kernel function is a symmetric positive function, with first moment equal to zero and a finite second order moment. In the multivariate case, a multivariate kernel function K : R^q → R_+ is selected, that can be expressed by means of a real kernel K associated with a norm, denoted by ‖·‖, in R^q as follows: K(x) = K(‖x‖), x ∈ R^q.
The normalization condition becomes:

∫_{R^q} K(‖x‖) dx = 1.

A kernel version of the density is then defined as a mapping d from Ω to [0, 1]:

d(x) = (1/N) ∑_{i=1}^{N} ∫_0^1 K(‖x − γ_i(t)‖) dt.

Normalizing the kernel is not mandatory, as the normalization occurs with the definition of d. It is nevertheless easier to consider this kind of kernel, as it is done in nonparametric density estimation.
Note that when K is compactly supported, which is the case for the Epanechnikov function and all its relatives, it follows that:

∫_Ω d(x) dx = 1,

provided that Ω contains the set:

{x ∈ R^q : ∃i, ∃t ∈ [0, 1], ‖x − γ_i(t)‖ ≤ A},

where the interval [−A, A] contains the support of K. The case of kernels with unbounded support, like Gaussian functions, may be dealt with provided Ω = R^q. In the application considered, only compactly supported kernels are used, mainly to allow a fast machine implementation of the density computation.
Using the polar coordinates (ρ, θ) and the rotation invariance of the integrand, the normalization condition becomes:

∫_{R²} K(‖x‖) dx = 2π ∫_0^{+∞} K(ρ) ρ dρ = 1,

which yields a normalizing constant of 2/π for the Epanechnikov function in dimension 2, instead of the usual 3/4 in the real case. When the normalization condition is fulfilled, the expression of the density simplifies to:

d(x) = (1/N) ∑_{i=1}^{N} ∫_0^1 K(‖x − γ_i(t)‖) dt.

The normalizing constant is the same as in 2.
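As an illustration of this normalization, the planar Epanechnikov kernel with the 2/π constant can be checked numerically; the function names below are illustrative, not from the paper:

```python
import numpy as np

def epanechnikov_1d(u):
    """Univariate Epanechnikov kernel K(u) = (3/4)(1 - u^2) on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def epanechnikov_2d(x):
    """Radial kernel (2/pi)(1 - ||x||^2) for ||x|| <= 1: the same profile
    renormalized so that it integrates to one over the plane."""
    r2 = np.sum(np.atleast_2d(x) ** 2, axis=-1)
    return np.where(r2 <= 1.0, (2.0 / np.pi) * (1.0 - r2), 0.0)
```

A Riemann sum over a grid covering the unit disk confirms that the 2/π constant makes the planar kernel integrate to one.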
As an example, one day of traffic over France is considered and pictured on Figure 1, with the corresponding density map computed on an evenly spaced grid with a normalized Epanechnikov kernel. Unfortunately, the density computed this way suffers a severe flaw for the ATM application: it is related not only to the shape of the trajectories but also to their time behavior. Formally, it is defined on the set Imm([0, 1], R^q) of smooth immersions from [0, 1] to R^q, while the space of primary interest is the quotient by smooth diffeomorphisms of the interval [0, 1], Imm([0, 1], R^q)/Diff([0, 1]). Invariance of the density under the action of Diff([0, 1]) is obtained as in [6] by adding a term related to velocity in the integrals. The new definition of d becomes:

d(x) = (1/λ) ∑_{i=1}^{N} ∫_0^1 K(‖x − γ_i(t)‖) ‖γ_i′(t)‖ dt.

Assuming again a normalized kernel and letting l_i be the length of the curve γ_i, the expression of the density simplifies to:

d(x) = (∑_{i=1}^{N} l_i)^{−1} ∑_{i=1}^{N} ∫_0^1 K(‖x − γ_i(t)‖) ‖γ_i′(t)‖ dt.

The new Diff-invariant density is pictured on Figure 2 along with the standard density. While the overall aspect of the plot is similar, one can observe that routes are more apparent in the right picture and that the density peak located above the Paris area is less important and less symmetric: this is due to the fact that near airports, aircraft slow down, and this effect exaggerates the density with the non-invariant definition. The extension of the two dimensional density defined that way to the general case of curves in an arbitrary space R^q is straightforward.
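A minimal sketch of this Diff-invariant density for sampled (polyline) curves might look as follows; all names are illustrative, the midpoint rule stands in for the integral, and the speed factor is approximated by segment lengths:

```python
import numpy as np

def curve_density(x, curves, nu=1.0):
    """Diff([0,1])-invariant kernel density of a planar curve system.

    Each curve is an (m, 2) array of points; the integral of
    K(||x - gamma_i(t)|| / nu) ||gamma_i'(t)|| dt is approximated by a
    midpoint sum over segments, normalized by the total length sum_i l_i.
    The kernel is the scaled planar Epanechnikov kernel.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    num = np.zeros(len(x))
    total_len = 0.0
    for g in curves:
        seg = np.diff(g, axis=0)
        w = np.linalg.norm(seg, axis=1)        # ||gamma'|| dt per segment
        mid = 0.5 * (g[:-1] + g[1:])           # midpoint rule nodes
        total_len += w.sum()
        r2 = np.sum((x[:, None, :] - mid[None, :, :]) ** 2, axis=-1) / nu ** 2
        k = np.where(r2 <= 1.0, (2.0 / np.pi) * (1.0 - r2) / nu ** 2, 0.0)
        num += k @ w
    return num / total_len
```

Because each segment contributes proportionally to its length, re-sampling the same geometric curve at a different rate leaves the result essentially unchanged, which is the point of the Diff-invariance.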

Further properties of the density
In this section, the curves considered are smooth mappings from the closed interval [0, 1] to R^q, with a non-vanishing derivative in ]0, 1[. All multivariate kernels K will be assumed smooth, positive, with a unit integral, and of the form x ↦ K(‖x‖). However, it is not required that they be compactly supported unless explicitly stated. All results are presented for the whole space R^q, but apply almost verbatim to an open subset.
Definition 1. Let f be a smooth summable mapping from R to R. The scaling f_ν of f is defined, for each ν > 0, to be the mapping:

f_ν : x ↦ ν^{−1} f(ν^{−1} x).

It is clear that the L¹-norm of the original mapping is preserved by the scaling. Given a summable kernel function K from R to R_+, it defines a multivariate kernel K on R^q that maps x to K(‖x‖). One may derive from it a parametrized family of kernels in R^q by mapping each ν in ]0, 1] to the scaled kernel K_ν : x ↦ ν^{−q} K(ν^{−1}‖x‖). If the original K is of unit integral, so are all the K_ν.
Proposition 2. Let γ : [0, 1] → R^q be a smooth path with non-vanishing derivative in ]0, 1[. Let K_ν, ν > 0, be a parametrized family of unit integral kernels. The family of Borel measures μ_ν defined for any Borel set A by:

μ_ν(A) = ∫_0^1 ∫_A K_ν(x − γ(t)) dx ‖γ′(t)‖ dt

is tight and converges narrowly to the Lebesgue measure on γ([0, 1]).
Proof. Let ε > 0 be given. By the summability of K, there exists a positive real number r such that:

∫_{R^q \ B(0,r)} K(‖x‖) dx < ε,

with B(0, r) the open ball of radius r centered at the origin. Since B(0, r) ⊂ B(0, rν^{−1}) for ν ∈ ]0, 1], the same holds for all the family K_ν. Let B(0, M) be an open ball containing γ([0, 1]). Then:

μ_ν(R^q \ B(0, M + r)) ≤ ε l(γ),

where l(γ) denotes the length of γ. This proves the tightness of the family (μ_ν).
Let f : R^q → R be a bounded continuous mapping. It follows that:

∫_{R^q} f dμ_ν = ∫_0^1 ∫_{R^q} f(γ(t) + νu) K(‖u‖) du ‖γ′(t)‖ dt,

and since f is bounded, the dominated convergence theorem shows that:

∫_{R^q} f dμ_ν → ∫_0^1 f(γ(t)) ‖γ′(t)‖ dt as ν → 0,

proving the second part of the claim.
The density in 7 is, for a single curve, of the form d(x) = l(γ)^{−1} ∫_0^1 K(‖x − γ(t)‖) ‖γ′(t)‖ dt, with l(γ) the length of the curve γ. It is invariant under change of parameter, and can be written in a more concise way as:

d(x) = ∫_0^1 K(‖x − γ(η)‖) dη,   (13)

where η is the arclength times l(γ)^{−1}.
This form allows a simple probabilistic interpretation of the density d: if a point u is drawn on the curve γ according to a uniform distribution and, independently, a vector v in R^q with density K (the multivariate kernel corresponding to K), then the density of x = u + v is given by equation 13.
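This interpretation can be sketched directly: the hypothetical helper below draws x = u + v for a polyline, with u uniform in arclength and v obtained by rejection sampling from the planar Epanechnikov density, scaled by an assumed bandwidth ν:

```python
import numpy as np

def sample_curve_plus_noise(gamma, n, nu, rng):
    """Draw x = u + v: u uniform in arclength on the polyline `gamma`,
    v an independent perturbation with the (scaled) planar Epanechnikov
    density, obtained here by rejection sampling on the unit disk."""
    seg = np.diff(gamma, axis=0)
    lens = np.linalg.norm(seg, axis=1)
    cum = np.concatenate([[0.0], np.cumsum(lens)])
    s = rng.uniform(0.0, cum[-1], size=n)              # uniform arclength
    idx = np.clip(np.searchsorted(cum, s) - 1, 0, len(lens) - 1)
    u = gamma[idx] + ((s - cum[idx]) / lens[idx])[:, None] * seg[idx]
    v = np.empty((0, 2))
    while len(v) < n:                                  # rejection sampling
        cand = rng.uniform(-1.0, 1.0, size=(4 * n, 2))
        h = rng.uniform(0.0, 2.0 / np.pi, size=4 * n)
        r2 = np.sum(cand ** 2, axis=1)
        ok = (r2 <= 1.0) & (h < (2.0 / np.pi) * (1.0 - r2))
        v = np.vstack([v, cand[ok]])
    return u + nu * v[:n]
```

A histogram of many such draws approximates the smoothed density d of equation 13 for the chosen bandwidth.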
Proposition 3. If the multivariate kernel K has a finite second moment, that is, the univariate kernel K is such that:

M = ∫_0^{+∞} ρ^{q+1} K(ρ) dρ < +∞,

then the Wasserstein distance between the densities d_1, d_2 associated to smooth curves γ_1, γ_2 is bounded by:

W_2(d_1, d_2)² ≤ D(γ_1, γ_2)² + 2 Vol(S^{q−1}) M,

with:

D(γ_1, γ_2)² = ∫_0^1 ‖γ_1(η) − γ_2(η)‖² dη,

where each curve is parametrized by the scaled arclength as in 13.

Proof. Let us consider the plan [1] given by the density:

π(x, y) = ∫_0^1 K(‖x − γ_1(η)‖) K(‖y − γ_2(η)‖) dη,

where each curve is parametrized by the scaled arclength. Its marginals are d_1 and d_2, so that the squared Wasserstein distance is bounded by the associated transport cost:

C = ∫_{R^q} ∫_{R^q} ‖x − y‖² π(x, y) dx dy.

Letting u = x − γ_1(η), v = y − γ_2(η) and using Fubini, the inner term can be written as:

‖γ_1(η) − γ_2(η)‖² + 2⟨γ_1(η) − γ_2(η), u − v⟩ + ‖u − v‖².   (15)

The integral ∫_{R^q} u K(‖u‖) du is 0, so the middle term vanishes upon integration, and, using spherical coordinates:

∫_{R^q} ‖u‖² K(‖u‖) du = Vol(S^{q−1}) ∫_0^{+∞} ρ^{q+1} K(ρ) dρ = Vol(S^{q−1}) M.

Putting back this value in the expression of the cost, the term ‖u − v‖² contributes 2 Vol(S^{q−1}) M, while the first term integrates to D(γ_1, γ_2)², so that:

C = D(γ_1, γ_2)² + 2 Vol(S^{q−1}) M.

This result indicates that the densities associated to curves γ_1, γ_2 using the smoothing process described above cannot be too far (with respect to the Wasserstein distance) from each other if the geometric L² distance D(γ_1, γ_2) is small. In fact, the upper bound in Proposition 3 can be interpreted as the cost of moving the smoothed density around γ_1 to the uniform distribution on the curve, then moving γ_1 to γ_2 while keeping points with equal scaled arclength in correspondence, and finally moving the uniform distribution on γ_2 to the smoothed density.
Having a density at hand, the entropy of the system of curves γ_1, . . ., γ_N is defined the usual way as:

E(γ_1, . . ., γ_N) = −∫_Ω d(x) log d(x) dx.

The entropy depends on the particular choice of the kernel K. As mentioned before, it is common practice in the field of non-parametric statistics to introduce a tuning parameter ν > 0 in the kernel, called the bandwidth, so that it is expressed as a scaled version K = f_ν of a given function f : R_+ → R_+. The value of ν is the most influential parameter in the estimation of the density and must be selected carefully. For curve clustering applications, it is defined by the desired interaction length: if ν tends to 0, the curves will behave as independent objects, while on the other end of the scale a very high bandwidth will tend to remove the influence of the curves themselves. For the moment, no automated means of finding an optimal ν has been used, although this will be part of future work.
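On a discrete grid, the entropy can be approximated by a Riemann sum; this small sketch (illustrative names) shows why concentration lowers it:

```python
import numpy as np

def entropy(density, cell_area):
    """Entropy E = -integral of d log d, approximated by a Riemann sum
    over an evenly spaced grid carrying the density values; empty cells
    contribute zero (the limit of d log d as d -> 0)."""
    d = density[density > 0.0]
    return float(-np.sum(d * np.log(d)) * cell_area)
```

Concentrating the same mass on fewer cells yields a lower (more negative) entropy, which is the behavior the minimization exploits.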

Minimizing the entropy
In order to fulfill the initial requirement of finding bundles of curve segments as straight as possible, one seeks the system of curves minimizing the entropy E(γ_1, . . ., γ_N), or equivalently maximizing:

−E(γ_1, . . ., γ_N) = ∫_Ω d(x) log d(x) dx.

The reason why this criterion gives the expected behavior will become more apparent after the derivation of its gradient at the end of this part. Nevertheless, when considering a single trajectory, it is intuitive that the most concentrated density distribution is obtained with a straight segment connecting the endpoints: this point will be made rigorous later.
Letting ε be a perturbation of the curve γ_j such that ε(0) = ε(1) = 0, the first order expansion of −E(γ_1, . . ., γ_N) will be computed in order to get a maximizing displacement field, analogous to a gradient ascent in the finite dimensional setting. The notation ∂F/∂γ_j will be used in the sequel to denote the derivative of a function F of the curve γ_j, in the sense that for a perturbation ε:

F(γ_j + ε) = F(γ_j) + (∂F/∂γ_j)(ε) + o(ε).

First of all, please note that since d has integral 1 over the domain Ω:

∫_Ω (∂d/∂γ_j)(x) dx = 0.

Starting from the expression of d given in equation 7, the first order expansion of d with respect to the perturbation of γ_j is obtained as the sum of a term coming from the numerator and a second one coming from the length of γ_j in the denominator. This last term is obtained from the usual first order variation formula of a curve length:

(∂l_j/∂γ_j)(ε) = ∫_0^1 ⟨ε′(t), γ_j′(t)/‖γ_j′(t)‖⟩ dt.
Using an integration by parts, this first order term can be written in terms of the normal component of the perturbation along γ_j. Please note that when dealing with planar curves (i.e. with values in R²), the relevant quantity is κ_j(t)N_j(t), with κ_j (resp. N_j) the curvature (resp. the unit normal vector) of γ_j. The integral in 23 can be expanded in a similar fashion. Using as above the notation (·)_N for normal components, the first order term is obtained accordingly. From the expressions in 26 and 24, the first order variation of the entropy follows. As expected, only moves normal to the trajectory change the value of the criterion at first order: the displacement of the curve γ_j will thus be performed at t in the normal bundle to γ_j and is given, up to the (∑_{i=1}^N l_i)^{−1} term, by the expression in 31. The first term in this expression favors moves towards areas of high density, while the second and third ones act along the normal vector and straighten the trajectory. This last point can be enlightened by considering the case of a single planar curve with fixed endpoints. Starting with the expression 31, it is clear that the second and third terms occurring in the formula vanish, as the second derivative of γ is 0. Let u be the unit normal vector to γ. Any point x in R² can be written as x = a + θv + ξu, with the segment parametrized as γ : t ↦ a + tv. The density d for the curve γ, expressed in (ξ, θ) coordinates, is an even function in ξ. The same is true for K(‖γ(t) − x‖). Finally, the remaining factor of the integrand is odd in ξ for a fixed θ, so that the whole integrand is odd as a function of ξ. By the Fubini theorem, integrating first in ξ therefore yields a vanishing integral, proving the assertion.
The result still holds in R^q, the only different aspect being that x is now expanded as x = a + θv + ∑_{i=1}^{q−1} ξ_i u_i, with u_i, i = 1, . . ., q − 1, an orthonormal basis of the orthogonal complement of Rv in R^q; the same parity argument can be applied to any of the components ξ_i, i = 1, . . ., q − 1, showing that the integral vanishes.
The effect of curve straightening is present when minimizing the entropy of a whole curve system, but is counterbalanced by the gathering effect. Depending on the choice of the kernel bandwidth, one or the other effect is dominant: straightening is preeminent for low values, being the only remaining effect in the limit, while gathering dominates at high bandwidths. For the air traffic application, a rule of thumb is to take 2-3 times the separation norm as an effective support for the kernel. Using an adaptive bandwidth may also be of some interest: starting with medium to high values favors curve gathering, then gradually reducing the bandwidth straightens the trajectories.
Using the scaled arclength in the entropy gives an equivalent but somewhat easier to interpret result. Starting with the expression 7, the density takes in this case the form:

d(x) = (∑_{i=1}^{N} l_i)^{−1} ∑_{i=1}^{N} l_i ∫_0^1 K(‖x − γ_i(η)‖) dη.

Let i ∈ {1, . . ., N} be fixed. An admissible variation of the curve γ_i is a smooth mapping φ from ]−a, a[ × [0, 1] to R^q, with a > 0, satisfying the following properties: (a) φ(0, η) = γ_i(η) for all η in [0, 1], with φ(t, 0) = γ_i(0) and φ(t, 1) = γ_i(1) for all t; (b) for each t, the curve η ↦ φ(t, η) is parametrized by the scaled arclength, so that ‖∂_η φ(t, η)‖ does not depend on η.
Taking the derivative with respect to t at 0 of equation (b) yields:

⟨∂_t ∂_η φ(0, η), ∂_η φ(0, η)⟩ / ‖∂_η φ(0, η)‖ independent of η.

Letting T(η) be the unit tangent vector to γ_i at η, and noting that ∂_η φ(0, η) = l_i T(η), it follows that:

⟨∂_η ∂_t φ(0, η), T(η)⟩ is constant in η.

This relation puts a constraint on the variation of the tangential component of the curve derivative and shows that it has to be constant in η.
Proposition 5. Let D be the mapping from ]−a, a[ × R^q to R_+ defined by:

D(t, x) = (∑_{j=1}^{N} l_j(t))^{−1} ∑_{j=1}^{N} l_j(t) ∫_0^1 K(‖x − φ_j(t, η)‖) dη,

where η refers collectively to the scaled arclength parameter for each curve and only the i-th curve actually varies with t. The partial derivative ∂_t D(0, x) is obtained by differentiation under the integral sign; the proof is straightforward and is omitted. From Proposition 5, the variation of the entropy is derived. This relation is equivalent to 28, as can be seen by splitting the terms into a normal and a tangential component. For the tangential part, an integration by parts on the first integral of the right hand side, followed by a gathering of terms, recovers the expression 28. As expected, only the normal components enter the relation, but it has to be noted that the tangential component of ∂_t φ(0, η) is not arbitrary and can be deduced from 37. The gradient with respect to the i-th curve is obtained from the expression of the entropy variation and can be written in its simplest form in terms of the estimated spatial density d. One must keep in mind the constraint on ∂_t φ(0, η) that is hidden within the apparent simplicity of the expression.

Numerical implementation
The two formulations 31 and 46 of the gradient may both be used. The first one is more complicated but does not require any additional constraint to be taken into account. The second one cannot be applied readily, as the tangential component must comply with relation 37. In both cases, a spatial integral must be evaluated, which may lead to prohibitive computational times, especially in high dimensions. In the air traffic application, only planar or 3D curves are considered, greatly simplifying the problem. Nevertheless, the performance of the algorithms is still a concern, and the choice made was to replace the spatial integral by a discrete sum over an evenly spaced grid. From now on, it is assumed that all curves are planar, so that the ambient space for the spatial density d is R². Going back to the expression of d given by 7, a first step is to replace the integral over t by a discrete sum. In practice, curves are described by a sequence of sampled points γ_i(t_ij), where the sampling times t_ij will be assumed to be identical for all curves. This assumption is not satisfied in the air traffic application, so a pre-processing step must be taken before the actual entropy minimization stage. It will not be described here, as any standard interpolation procedure can be applied with negligible differences in the final result. To obtain the results presented here, a cubic spline smoother was used. Since the sampling times are assumed to be the same for all trajectories, the double subscript will be dropped, so that the samples on each trajectory will be denoted as γ_ij = γ_i(t_j). It is further assumed that the derivative γ′_ij = γ′_i(t_j) is available, most of the time through a numerical approximation. Given a quadrature formula on [0, 1] with points t_j, j = 1, . . ., m and weights w_j, j = 1, . . ., m, the density may be approximated at x ∈ R² by:

d(x) ≈ (∑_{i=1}^{N} l_i)^{−1} ∑_{i=1}^{N} ∑_{j=1}^{m} w_j K(‖x − γ_ij‖) ‖γ′_ij‖,   (47)

where the lengths l_i, i = 1, . . ., N are also obtained with the same quadrature rule:

l_i ≈ ∑_{j=1}^{m} w_j ‖γ′_ij‖.

When γ′_ij is computed in a numerical way, it may be expressed as:

γ′_ij = ∑_{k} w̃_jk γ_ik,

where the weights w̃_jk are often obtained through the application of the Lagrange interpolation formula, to ensure exactness on polynomials up to a given degree. In matrix form, it can be written more compactly with the matrix W whose entries are the weights w̃_jk. The cost of evaluating d at a single point is in O(Nm), with the kernel evaluation being dominant. When dealing with points in R² or R³ and compactly supported kernels, a simple trick greatly reduces the time needed to compute d. First of all, the domain of interest is discretized on an evenly spaced grid, so that the points of evaluation of the density d are its vertices x_ij, i = 1, . . ., n_x, j = 1, . . ., n_y. The grid step δ_x (resp. δ_y) in the first (resp. second) coordinate is the difference between any two adjacent vertices, δ_x = x_{i+1,j} − x_{i,j} (resp. δ_y = x_{i,j+1} − x_{i,j}); most of the time, δ_x = δ_y. Since the expression 47 is linear, the computation can be performed by accumulating the values K(‖x_kl − γ_ij‖) ‖γ′_ij‖ for a fixed couple (i, j), where only the points x_kl close enough to γ_ij are considered. In fact, the evaluation can be written as a 2D discrete convolution of the kernel samples with the weights accumulated on the grid. When the support of K is small compared to the overall spatial domain, a lot of computation is saved using this procedure. Furthermore, it can be thought of as 2D filtering, so that highly efficient algorithms coming from the field of image processing can be applied: in particular, computing the density on a graphics processing unit (GPU) is straightforward and decreases the computational time by at least a factor of ten. When dealing with the scaled arclength, the derivative term is not present, and a factor of l_i appears in front of the integral. The discrete version becomes:

d(x) ≈ (∑_{i=1}^{N} l_i)^{−1} ∑_{i=1}^{N} l_i ∑_{j=1}^{m} w_j K(‖x − γ_ij‖),

where γ_ij = γ_i(η_j), η_j being in correspondence with t_j. Please note that the quadrature weights must be adapted to the abscissas η_j, j = 1, . . ., m, and not to the t_j, j = 1, . . ., m. Therefore, it is advisable to resample the curves so that the points η_j, j = 1, . . ., m are, for example, evenly spaced or of the Gauss-Lobatto form. The former was chosen for the experiments due to its ease of implementation, although the second form is probably more efficient from a numerical point of view and will be investigated in a second stage.
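The accumulation trick described above can be sketched as follows for polyline curves and the scaled planar Epanechnikov kernel; the function and its signature are illustrative, and a midpoint rule stands in for the quadrature formula:

```python
import numpy as np

def density_on_grid(curves, x0, y0, step, nx, ny, nu):
    """Discrete Diff-invariant density accumulated on an evenly spaced grid.

    Each sample (midpoint of a polyline segment, weighted by the segment
    length) adds scaled Epanechnikov kernel values only at the grid
    vertices lying inside the kernel support of radius `nu`, which is the
    trick that keeps the computation cheap for small supports.
    """
    d = np.zeros((nx, ny))
    reach = int(np.ceil(nu / step))      # support radius in grid units
    total = 0.0
    for g in curves:
        seg = np.diff(g, axis=0)
        w = np.linalg.norm(seg, axis=1)
        mid = 0.5 * (g[:-1] + g[1:])
        total += w.sum()
        for (px, py), wk in zip(mid, w):
            i0 = int(round((px - x0) / step))
            j0 = int(round((py - y0) / step))
            for i in range(max(i0 - reach, 0), min(i0 + reach + 1, nx)):
                for j in range(max(j0 - reach, 0), min(j0 + reach + 1, ny)):
                    r2 = (((x0 + i * step) - px) ** 2
                          + ((y0 + j * step) - py) ** 2) / nu ** 2
                    if r2 <= 1.0:
                        d[i, j] += wk * (2.0 / np.pi) * (1.0 - r2) / nu ** 2
    return d / total
```

Only the grid vertices within the kernel support around each sample are visited, so the cost grows with the support size rather than with the full grid, and the inner accumulation is exactly the 2D convolution structure mentioned above.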
Having the density at hand, the gradient of the entropy with respect to the points γ_ij, i = 1, . . ., N, j = 1, . . ., m can be easily computed using a straightforward application of formula 31. When dealing with planar curves, a simplification occurs for the second derivative term, since for a smooth curve γ_i the normal component of γ_i″(t) is κ(t)‖γ_i′(t)‖² N(t), where κ is the curvature and N the unit normal vector. These quantities may be computed using numerical differentiation, but a coarse approximation based on the rotation rate of the vectors γ_{i,j+1} − γ_{i,j}, γ_{i,j+2} − γ_{i,j+1} works well in many cases.
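The rotation-rate approximation of κN mentioned above can be sketched as follows (illustrative names; the endpoints are simply set to zero):

```python
import numpy as np

def curvature_normal(points):
    """Coarse kappa * N at interior points of a planar polyline, from the
    turning rate of consecutive chords; endpoints are assigned zero."""
    v = np.diff(points, axis=0)
    out = np.zeros_like(points, dtype=float)
    for k in range(1, len(points) - 1):
        a, b = v[k - 1], v[k]
        ang = np.arctan2(a[0] * b[1] - a[1] * b[0], a @ b)  # signed turn
        h = 0.5 * (np.linalg.norm(a) + np.linalg.norm(b))   # mean chord
        t = a + b
        t = t / np.linalg.norm(t)                           # unit tangent
        n = np.array([-t[1], t[0]])                         # left normal
        out[k] = (ang / h) * n
    return out
```

On a sampled circle of radius R the returned magnitude is close to 1/R, and it vanishes on a straight polyline, as expected from the exact formula.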
The case of the scaled arclength parametrization needs some extra attention, due to the condition on the tangential component. The simplest approach is to move the points γ_ij according to an unconstrained gradient, then to re-sample the obtained curve so as to get adjusted γ_ij that correspond to the abscissas η_j, j = 1, . . ., m.
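The re-sampling step can be sketched with a simple linear interpolation in arclength (illustrative names):

```python
import numpy as np

def resample_arclength(points, m):
    """Resample a polyline at m points evenly spaced in arclength; used
    after an unconstrained gradient step to restore the scaled-arclength
    abscissas eta_j."""
    seg = np.diff(points, axis=0)
    cum = np.concatenate([[0.0], np.cumsum(np.linalg.norm(seg, axis=1))])
    s = np.linspace(0.0, cum[-1], m)
    out = np.empty((m, points.shape[1]))
    for d in range(points.shape[1]):
        out[:, d] = np.interp(s, cum, points[:, d])
    return out
```

The endpoints are preserved exactly, and the interior points are redistributed at equal arclength spacing.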
In a numerical implementation, the scaling factor in front of the whole expression may be dropped, due to the fact that all gradient-based algorithms use an automatically tuned step length. As usual with gradient algorithms, one must carefully select the step taken in the maximizing direction in order to avoid divergence. A simple fixed step strategy was first applied and gives satisfactory results on small datasets. A safer approach is to adapt the step size so as to ensure a sufficient decrease of the entropy. Due to the potentially huge dimension of the search space, this procedure has to be simple enough. An approximate quadratic search [13] was used in the final implementation.
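A sufficient-decrease step selection can be sketched as follows; this backtracking rule is a simple stand-in for the approximate quadratic search of [13], and all names are illustrative:

```python
import numpy as np

def backtracking_step(x, grad, f, step0=1.0, shrink=0.5, c=1e-4, max_iter=30):
    """Shrink the step until f decreases by at least c * step * ||grad||^2,
    a standard sufficient-decrease (Armijo-type) criterion."""
    f0 = f(x)
    g2 = float(np.sum(np.asarray(grad) ** 2))
    step = step0
    for _ in range(max_iter):
        if f(x - step * grad) <= f0 - c * step * g2:
            return step
        step *= shrink
    return step
```

Cheap rules of this kind matter here because each entropy evaluation involves the full density computation over the grid.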
The procedure applied to one day of traffic over France yields the picture of Figure 3. As expected, a route-like network emerges. In such a case, since the traffic comes from an already organized situation, the recovered network is indeed a subset of the route network in the French airspace. Please note that there is a trade-off between density concentration and minimal curvature of the recovered trajectories, as already mentioned. The kernel bandwidth was chosen empirically in the example presented, with the aid of visual interaction.
In the second example, shown in Figure 4, the problem of automatic conflict solving is addressed. In the initial situation, aircraft converge to a single point, which is unsafe. Air traffic controllers proceed in such a case by diverting aircraft from their initial flight paths so that they avoid each other, using only very simple maneuvers. An automated tool will make full use of the available airspace, and the resulting set of trajectories may fail to be manageable by a human: in the event of a system failure, no backup can be provided by controllers. The entropy minimization procedure was added to an automated conflict solver in order to end up with flight paths still tractable by humans. The final result is shown on the right part of Figure 4, where encounters no longer exist but aircraft are bound to simple trajectories, with a merging and a splitting point. Note that since the automated planner acts on velocity, all aircraft are separated in time on the inner part.

Conclusion and future work
Algorithms coming from the field of shape spaces emerge as a valuable tool for applications in ATM. In this work, the foundations of a post-processing procedure that may be applied after an automated flight path planner are presented. Entropy minimization makes straight segment bundles emerge, which fulfills the operational requirements. Computational efficiency has to be improved in order to release a usable building block for future ATM systems. One way to address this issue is to compute kernel density estimators on GPUs, which excel at this kind of task, very similar to texture manipulations. Furthermore, statistical properties such as the optimal choice of the bandwidth parameter in the kernel estimation should be explored in more detail in the next step of this work.
Another important point that must be addressed in future work deals with flight paths that have very similar shapes but are oriented in opposite directions. As the spatial density is not sensitive to directional information, the entropy based procedure presented in this paper will tend to aggregate flight paths that should be sufficiently separated in order to prevent hazardous encounters. In [7], a notion of density based on position and velocity is developed. This work relies on a Lie group modeling as a unifying state representation that takes into account the direction and the position of the curves. The curve system entropy has been extended to this setting.

Figure 1. Traffic over France and associated density.

Proposition 4. Let a, b be fixed points in R² and K be a kernel as in 7. The segment [a, b] is a critical point for the entropy associated with the curve system in R² consisting of a single smooth path with endpoints a, b.

Proof. Let the segment [a, b] be parametrized as γ : t ∈ [0, 1] ↦ a + tv, with v the vector b − a.

Figure 4. Initial flight plans and final ones.