Computation of the Hausdorff Distance between Two Compact Convex Sets

The Hausdorff distance between two closed sets has important theoretical and practical applications. Yet apart from finite point clouds, there appear to be no generic algorithms for computing this quantity. Because many infinite sets are defined by algebraic equalities and inequalities, this is a huge gap. The current paper constructs Frank–Wolfe and projected gradient ascent algorithms for computing the Hausdorff distance between two compact convex sets. Although these algorithms are guaranteed to go uphill, they can become trapped by local maxima. To avoid this defect, we investigate a homotopy method that gradually deforms two balls into the two target sets. The Frank–Wolfe and projected gradient algorithms are tested on two pairs (A, B) of compact convex sets, where: (1) A is the box [−1, 1]^p translated by the vector 1 and B is the intersection of the unit ball and the non-negative orthant; and (2) A is the probability simplex and B is the ℓ1 unit ball translated by 1. For problem (2), we find the Hausdorff distance analytically. Projected gradient ascent is more reliable than the Frank–Wolfe algorithm and finds the exact solution of problem (2). Homotopy improves the performance of both algorithms when the exact solution is unknown or unattained.


Introduction
The Hausdorff distance [1,2] between two compact sets A and B in a Euclidean space ℝ^p is defined as d_H(A, B) = max{d(A, B), d(B, A)}, where d(A, B) = max_{x ∈ A} dist(x, B), dist(x, B) = min_{y ∈ B} ‖x − y‖, and ‖·‖ denotes the standard Euclidean norm in ℝ^p. The Blaschke formula d_H(A, B) = max_x |dist(x, A) − dist(x, B)| serves as an alternative definition of Hausdorff distance [3]. Wikipedia has a helpful entry on Hausdorff distance with a two-dimensional illustration. The theoretical value of Hausdorff distance stems from the fact that it turns the collection of compact sets into a complete separable metric space. In general, Hausdorff distance is challenging to compute.
Hausdorff distance has many applications. For instance, it is instrumental in defining continuity, compactness, and completeness for integral operators, differential operators, and Fourier transforms in functional analysis [4,5]. These concepts are in turn relevant to the analysis of the existence, uniqueness, and stability of solutions to various equations in mathematics and physics [6]. In computer vision, Hausdorff distance enables object recognition [7] and allows one to quantify the difference between two different representations of the same object [8]. Edge detection and pixelization are usually necessary preprocessing steps. Other applications include robotics [9], the fractal modeling of biological structures [10], and the numerical computation of attractors in dynamical systems [11].
The current paper derives and tests two new algorithms for computing the distance d_H(A, B) between two compact convex sets. The previous work on this intrinsically interesting problem is mostly limited to finite point clouds, usually in two and three dimensions [12-16]. The formulas d(A, B) = max_i min_j d_ij and d(B, A) = max_j min_i d_ij are easily implemented for finite sets A and B. The naive implementation benefits from fast software such as the Julia Distances package, which exploits matrix multiplication to find all Euclidean distances between the column vectors of two matrices. The ImageDistances.jl package (github.com/JuliaImages/Images.jl) appears to rely on the naive method [12] for computing Hausdorff distances. Once the distances d_ij = ‖x_i − y_j‖ are computed, the computational complexity of the naive method is O(mn), where m and n equal the number of points x_i ∈ A and y_j ∈ B, respectively. This complexity can be reduced by various devices, as suggested in the cited references.
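The naive method is easy to sketch in NumPy. The function below is our own minimal illustration (not the code of the Julia packages cited above); it forms all pairwise distances with a single matrix multiplication via the expansion ‖x − y‖² = ‖x‖² − 2x⟙y + ‖y‖².

```python
import numpy as np

def hausdorff_point_clouds(X, Y):
    """Naive Hausdorff distance between two finite point clouds.

    X is m x p and Y is n x p, with points stored as rows.  Returns
    max(max_i min_j d_ij, max_j min_i d_ij) for d_ij = ||x_i - y_j||.
    """
    # All m*n squared distances in one matrix product.
    sq = (np.sum(X**2, axis=1)[:, None]
          - 2.0 * (X @ Y.T)
          + np.sum(Y**2, axis=1)[None, :])
    D = np.sqrt(np.maximum(sq, 0.0))   # clip tiny negatives from roundoff
    d_AB = D.min(axis=1).max()         # max_i min_j d_ij
    d_BA = D.min(axis=0).max()         # max_j min_i d_ij
    return max(d_AB, d_BA)
```

For X = {(0, 0), (1, 0)} and Y = {(0, 0)}, the one-sided distances are d(A, B) = 1 and d(B, A) = 0, so the Hausdorff distance is 1.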
The algorithms for computing the Hausdorff distance between two polygons [17], two curves [18] in the plane, and a curve and a surface [19] represent exceptions to the discrete point-cloud method. More complicated sets defined by algebraic formulas can be attacked by pixelating the sets or sampling them at a dense set of random points. The methods of continuous optimization offer an attractive alternative to the various current methods.
Although the calculation of d_H(A, B) takes us outside the comfortable realm of convex optimization, the tools of convex calculus are highly pertinent. To their credit, these tools perform well in higher dimensions. It remains to be seen whether the Hausdorff distance will have practical value in shape recognition in this regime. It would be prudent to keep the possibility in mind.
To calculate d_H(A, B), it clearly suffices to calculate d(A, B) and d(B, A) separately and take the maximum. For many sets B, dist(x, B) is explicitly known or can be computed by an efficient algorithm [20,21]. The Euclidean distance can be expressed as dist(x, B) = ‖x − P_B(x)‖, where P_B(x) is the projection of x onto B. When B is convex, the projection operator P_B(x) is single-valued. For closed nonconvex sets, the projection is multiple-valued for some x, but these points are very rare; indeed, they are of Lebesgue measure 0 [22]. The scaled squared distance (1/2)dist(x, B)² is differentiable with gradient x − P_B(x) at all points x where P_B(x) is single-valued [23].
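To illustrate how the formula dist(x, B) = ‖x − P_B(x)‖ is evaluated in practice, here is a minimal sketch with two of the classical closed-form projections; the function names are our own.

```python
import numpy as np

def project_ball(x, center, radius):
    """Project x onto the closed Euclidean ball with the given center and radius."""
    d = x - center
    norm = np.linalg.norm(d)
    if norm <= radius:
        return x.copy()          # x already lies in the ball
    return center + (radius / norm) * d

def project_box(x, lo, hi):
    """Project x onto the box [lo, hi] coordinate by coordinate."""
    return np.clip(x, lo, hi)

def dist_to_set(x, project):
    """dist(x, B) = ||x - P_B(x)|| for any single-valued projection operator."""
    return np.linalg.norm(x - project(x))
```

For example, the point (3, 4) projects onto the unit ball at (0.6, 0.8) and lies at distance 4 from it.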
The support function σ_A(v) = max_{x ∈ A} v⟙x and the corresponding support set supp_A(v) = argmax_{x ∈ A} v⟙x also play key roles in our algorithm development. The maximum of d(A, B) exists and necessarily occurs on the boundary of A. In fact, Bauer's maximum principle [24,25] implies that the maximum is achieved at an extreme point x of A. The point of B corresponding to x occurs on the boundary of B, but not necessarily at an extreme point of B. The supporting set supp_A(v) is a singleton if and only if σ_A(v) is differentiable at v.
Our first algorithm for computing d(A, B) is a Frank–Wolfe algorithm [26,27]. Our second algorithm is a projected gradient ascent algorithm.

Materials and Methods
As a prelude to the derivation of the two algorithms, it would be helpful to clarify our notation and make a few remarks about MM algorithms, support functions, and supporting sets. For projected gradient ascent and its homotopy modification, we provide algorithm flowcharts.

Notation
Here are the notational conventions used throughout this article. All vectors appear in boldface. All entries of the vectors 0 and 1 equal 0 or 1, respectively. The ⟙ superscript indicates a vector transpose. The Euclidean norm of a vector x is denoted by ‖x‖. For a smooth real-valued function f(x), we write its gradient (column vector of partial derivatives) as ∇f(x) and its first differential (row vector of partial derivatives) as df(x) = ∇f(x)⟙. Finally, we denote the directional derivative of f(x) in the direction v by d_v f(x) = df(x)v.

MM Algorithms
The algorithms explored here are minorization-maximization (MM) algorithms [23,30]. They depend on surrogate functions g(x | x_n) that minorize the original objective f(x) around the current iterate x_n in the sense of satisfying the tangency condition g(x_n | x_n) = f(x_n) and the domination condition g(x | x_n) ≤ f(x) for all x. The surrogate balances the two goals of hugging the objective tightly and simplifying maximization. Maximizing the surrogate drives the objective uphill because f(x_{n+1}) ≥ g(x_{n+1} | x_n) ≥ g(x_n | x_n) = f(x_n). In minimization, the surrogate majorizes the objective and is instead minimized. The tangency condition remains the same, but the domination condition g(x | x_n) ≥ f(x) is now reversed. The celebrated EM (expectation-maximization) principle for maximum likelihood estimation with missing data [31] is a special case of minorization-maximization. In the EM setting, Jensen's inequality supplies the surrogate as the expectation of the complete data log-likelihood conditional on the observed data.

Support Functions and Supporting Sets
The set of supporting points supp_S(v) = argmax_{x ∈ S} v⟙x determines the support function σ_S(v).
For instance, the ℓ1 unit ball has supp_S(v) equal to the convex hull of the vertices sign(v_i)e_i taken over the indices i where |v_i| is largest. For the unit simplex, supp_S(v) equals the convex hull of the vertices e_i where v_i is largest. For a Minkowski sum A + B, supp_{A+B}(v) = supp_A(v) + supp_B(v). If S is either a convex cone or a compact convex set that is symmetric about the origin with a non-empty interior, then its support function σ_S(y) has a special form. In the former case, σ_S(y) is the indicator of the dual cone, and in the latter case, σ_S(y) is a norm. The support function of a Cartesian product separates into the sum of the support functions of the factors. For instance, the support function of the rectangle [a, b] reduces to the one-dimensional case, where the ith coordinate of supp_{[a,b]}(v) is a_i when v_i < 0, b_i when v_i > 0, and all of [a_i, b_i] when v_i = 0. There are many other known support functions. For instance, one-sided penalties such as c max{y, 0} and asymmetric penalties such as σ_{[−a,b]}(y) are covered by the current theory. Indeed, the former is the support function generated by the interval [0, c]. The latter is the tilted absolute value equal to by for y ≥ 0 and to −ay for y < 0. The support function of a singleton {a} is the linear function a⟙y.
More generally, the support function of the convex hull of the union of two sets A and B is the maximum max{σ_A(y), σ_B(y)}. For example, the support function of the line segment from −a to a is the absolute value |a⟙y|. Adding a constant vector a to a set S produces the support function σ_S(y) + a⟙y. It is trivial to project onto S + a if one can project onto S. For any non-negative scalar c, the set cS has support function cσ_S(y). Again, it is trivial to project onto cS if one can project onto S.
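These rules can be sketched concretely. The short functions below (our own illustrative code) return the support function value and one supporting point for three of the sets just discussed.

```python
import numpy as np

def support_l1_ball(v):
    """Support function and one supporting point of the unit l1 ball.

    sigma(v) = ||v||_inf, attained at sign(v_i) e_i for an index i
    with |v_i| largest.
    """
    i = int(np.argmax(np.abs(v)))
    x = np.zeros_like(v, dtype=float)
    x[i] = np.sign(v[i]) if v[i] != 0 else 1.0
    return float(np.abs(v).max()), x

def support_simplex(v):
    """Support function and one supporting point of the probability simplex.

    sigma(v) = max_i v_i, attained at the vertex e_i with v_i largest.
    """
    i = int(np.argmax(v))
    x = np.zeros_like(v, dtype=float)
    x[i] = 1.0
    return float(v.max()), x

def support_box(v, a, b):
    """Support function and supporting point of the box [a, b].

    Coordinate-wise: pick b_i when v_i > 0 and a_i otherwise.
    """
    x = np.where(v > 0, b, a)
    return float(v @ x), x
```

For v = (1, −3, 2), the ℓ1 ball yields σ = 3 at −e_2, the simplex yields σ = 2 at e_3, and the box [−1, 1]³ yields σ = 6 at (1, −1, 1).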

Derivation of the Algorithms
When B is convex, the supporting hyperplane inequality for v_n = x_n − P_B(x_n) generates our first algorithm. Maximizing this minorization over x ∈ A is equivalent to calculating the support function σ_A(v_n). If supp_A(v_n) denotes the set of points in A where σ_A(v_n) is attained, then the Frank–Wolfe algorithm just described can be phrased as x_{n+1} ∈ supp_A(v_n). The MM principle guarantees that the next iterate x_{n+1} will tend to increase the objective (2) unless v_n = 0. This exception occurs when x_n ∈ B. Fortunately, when the iterates begin in A\B, they remain in A\B. Indeed, if x_n ∈ A\B but x_{n+1} ∈ B, then the obtuse angle condition [23] yields an inequality contradicting the optimality of x_{n+1}. To achieve the requirement x_0 ∈ A\B of the Frank–Wolfe method, we put x_0 = P_A(r) for a random vector r and then check that x_0 ∉ B. If B is closed and convex, then the gradient of the function (2) is Lipschitz with constant 1 [23]. This fact plus the outcome of completing the square entails the minorization f(x) ≥ f(x_n) + df(x_n)(x − x_n) − (1/2)‖x − x_n‖². Hence, the MM principle implies that defining x_{n+1} = P_A(x_n + v_n) = P_A(2x_n − P_B(x_n)) also increases (2). This second of our two algorithms is a special case of projected gradient ascent. Its flowchart (Algorithm 1) summarizes this straightforward strategy started from many random points x_0.
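A minimal sketch of the Frank–Wolfe iteration x_{n+1} ∈ supp_A(v_n) follows; the oracle arguments supp_A and proj_B and all names are our own illustrative choices.

```python
import numpy as np

def frank_wolfe_hausdorff(x0, supp_A, proj_B, max_iter=100, tol=1e-10):
    """Maximize dist(x, B) over x in A by moving to a supporting point
    of A along the ascent direction v_n = x_n - P_B(x_n)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        v = x - proj_B(x)
        if np.linalg.norm(v) == 0.0:   # x lies in B: no ascent direction
            break
        x_new = supp_A(v)              # a point of supp_A(v_n)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x, np.linalg.norm(x - proj_B(x))
```

As a usage example, take the first test pair of the Results section with p = 2: A = [0, 2]² and B the intersection of the unit ball and the non-negative orthant. Here supp_A(v) selects the corner with coordinate 2 where v_i > 0 and 0 otherwise, and projection onto B composes the orthant projection with the ball projection (valid because the orthant is a cone). Starting from x_0 = (2, 1), the iteration reaches the corner (2, 2), where dist(x, B) = 2√2 − 1.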

Algorithm 1 Computation of d(A, B) by Projected Gradient Ascent
Require: Projection operators P_A(x) and P_B(y), initial point x_0, maximum iterations n, and convergence criterion ϵ > 0.
1: x = x_0
2: for iter = 1 : n do
3:   x_new = P_A(2x − P_B(x))
4:   if ‖x_new − x‖ < ϵ then
5:     break
6:   else
7:     x = x_new
8:   end if
9: end for
10: Return x_new
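Algorithm 1 translates almost line for line into code. The sketch below is our own minimal implementation, with the function names chosen for illustration.

```python
import numpy as np

def projected_gradient_ascent(x0, proj_A, proj_B, max_iter=100, tol=1e-10):
    """Algorithm 1: iterate x_new = P_A(2x - P_B(x)) until the step
    length drops below tol, then return the final iterate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = proj_A(2.0 * x - proj_B(x))
        step = np.linalg.norm(x_new - x)
        x = x_new
        if step < tol:
            break
    return x
```

On the first test pair of the Results section with p = 2 (A = [0, 2]², B the unit ball intersected with the non-negative orthant), the iteration started at (2, 1) converges to the corner (2, 2), where dist(x, B) = 2√2 − 1.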

A Homotopy Method
Although both algorithms are guaranteed to increase the objective, they both suffer from the danger of being trapped by local maxima. One obvious remedy is to launch the algorithms from different random points. A more systematic alternative is to exploit homotopy. The idea is to gradually deform both sets from the unit ball U at the origin, where d_H(U, U) = 0 is known, into the target sets A and B. In practice, we follow the solution path along the family of set pairs (tA + (1 − t)U, tB + (1 − t)U) from t = 0 to t = 1. This strategy is viable for projected gradient ascent because we can project points onto the Minkowski convex combination tC + (1 − t)D by three devices. First, it is well known that when A and B are balls with radii r_A and r_B and centers c_A and c_B, respectively, the distance dist(x, B) is maximized over x ∈ A by taking x = c_A − r_A(c_B − c_A)/‖c_B − c_A‖, with maximum value ‖c_A − c_B‖ + r_A − r_B, unless A ⊂ B, in which case the maximum is 0 [32]. For the convenience of the reader, Proposition A1 of Appendix A proves this assertion.
Second, one can exploit the projection identity P_{tS}(z) = tP_S(t^{−1}z) for any t > 0. Third, there is an effective algorithm for projecting onto a Minkowski sum C + D [33]. The idea is to alternate the minimization of ‖z − c − d‖ with respect to c ∈ C and d ∈ D. The iteration scheme c_{n+1} = P_C(z − d_n) and d_{n+1} = P_D(z − c_{n+1}) is guaranteed to converge at a linear rate when either set is strongly convex. Recall that a convex set K is strongly convex if there exists an r > 0 such that αx + (1 − α)y + rα(1 − α)‖x − y‖²z ∈ K for all x and y in K, α ∈ [0, 1], and unit vectors z [34]. In particular, (1 − t)U is strongly convex when t ∈ [0, 1). Furthermore, the Cartesian product of two strongly convex sets is strongly convex [35]. The homotopy method succeeds because the early sets are more rounded and the objective generates fewer local maxima. The price for better performance is iterations within iterations and an overall slower algorithm. Algorithms 2 and 3 summarize our homotopy strategy for projected gradient ascent. For the Frank–Wolfe algorithm, similar homotopy tactics apply. For a Minkowski sum A + B, supp_{A+B}(v) = supp_A(v) + supp_B(v). This fact plus the identity supp_{tC}(v) = t supp_C(v) for t ≥ 0 makes it possible to carry out the homotopy method.
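The alternating projection scheme for a Minkowski sum, combined with the scaling identity P_{tS}(z) = tP_S(t^{−1}z), can be sketched as follows; the function names are our own.

```python
import numpy as np

def project_minkowski_sum(z, proj_C, proj_D, max_iter=1000, tol=1e-12):
    """Project z onto C + D by alternating minimization of ||z - c - d||
    over c in C and d in D."""
    z = np.asarray(z, dtype=float)
    c = proj_C(z)
    d = proj_D(z - c)
    for _ in range(max_iter):
        c_new = proj_C(z - d)          # c_{n+1} = P_C(z - d_n)
        d_new = proj_D(z - c_new)      # d_{n+1} = P_D(z - c_{n+1})
        done = np.linalg.norm(c_new - c) + np.linalg.norm(d_new - d) < tol
        c, d = c_new, d_new
        if done:
            break
    return c + d

def scale_projection(proj_S, t):
    """Projection onto tS via P_{tS}(z) = t * P_S(z / t) for t > 0."""
    return lambda z: t * proj_S(z / t)
```

For example, with C = t[0, 2]² and D = (1 − t)U at t = 1/2, the point (3, 0) projects onto (1.5, 0). The strong convexity of (1 − t)U for t < 1 supplies the linear convergence rate noted above.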

Convergence
Because this topic has been covered in previous studies [21,36-39], we give an abbreviated treatment here. Each algorithm is summarized by a closed algorithm map x_{n+1} ∈ M(x_n) that increases the objective f(x). The limit points of the map occur among the stationary points of f(x). By definition, a stationary point x satisfies d_v f(x) = df(x)v ≤ 0 for all tangent vectors v at x. The set of tangent vectors v is the closure of the set of points c(y − x) with y ∈ C and c > 0. This is a place where the convexity of C comes into play. Hence, x is a stationary point if and only if df(x)x ≥ df(x)y for all y ∈ C. With these facts in mind, we state our basic theoretical findings for the Frank–Wolfe method. Homotopy is omitted in these considerations.
Proposition 1. The limit points of the Frank–Wolfe iterates (2) are stationary points of the objective f(x) = min_{y ∈ B} ‖x − y‖ on A. Furthermore, an explicit bound on the rate of convergence holds. Thus, the stationarity condition max_{y ∈ A} df(x)(y − x) ≤ 0 is reasonable to expect at a limit point x of the Frank–Wolfe algorithm.
Here is the corresponding finding for projected gradient ascent.
Proposition 2. The limit points of the projected gradient ascent iterates (3) are also stationary points of the objective f(x) = min_{y ∈ B} ‖x − y‖ on A. Furthermore, an explicit bound on the rate of convergence holds. Although the convergence rate O(1/n) of projected gradient ascent, like the corresponding slow rate of the Frank–Wolfe method, is discouraging in theory, in practice both algorithms usually converge in fewer than 100 iterations. In the case of the Frank–Wolfe algorithm, each iterate is an extreme point. Many convex sets possess only a finite number of extreme points, and convergence to one of them is guaranteed. Unfortunately, the converged point often provides just a local maximum.

Results
We tested the Frank–Wolfe and projected gradient ascent algorithms on the two pairs (A, B) of compact convex sets described in the Abstract. The random point-cloud method was not remotely competitive with projected gradient ascent in either accuracy or speed on these sample problems. It did produce approximate distances that confirmed the best results of the iterative methods. As measured by the quality of its solution, projected gradient ascent also outperformed the Frank–Wolfe algorithm. The Frank–Wolfe method was probably too aggressive, perhaps because it moved directly to an extreme point of A in computing d(A, B). On the second problem, projected gradient ascent attained the global maximum across all trials. When the standard deviation of the converged values is positive, it follows that some trials were trapped by inferior local maxima. Both iterative algorithms benefit from the homotopy heuristic, which is fully deterministic. Accordingly, the standard deviations under homotopy equaled 0. Homotopy increased the computation times by less than an order of magnitude for 11 evenly spaced homotopy points. Finally, as p increased, both problems appeared easier to solve by the iterative methods. This behavior was particularly evident when p = 1000. In contrast, the random point-cloud solutions deteriorated as p increased.
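The second test problem can be probed numerically with standard projection routines. The sketch below (our own illustrative code; the simplex and ℓ1-ball projections are the classical sort-based algorithms) evaluates dist(x, B) = ‖x − P_B(x)‖ at two candidate extreme points for p = 3: the vertex e_1 of the simplex A and the point 1 + e_1 of the translated ℓ1 ball B.

```python
import numpy as np

def project_simplex(v):
    """Project v onto the probability simplex (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    ind = np.arange(1, v.size + 1)
    cond = u - css / ind > 0
    rho = ind[cond][-1]
    theta = css[cond][-1] / rho
    return np.maximum(v - theta, 0.0)

def project_l1_ball(v):
    """Project v onto the unit l1 ball via the simplex projection."""
    if np.abs(v).sum() <= 1.0:
        return v.copy()
    return np.sign(v) * project_simplex(np.abs(v))

p = 3
one = np.ones(p)
e1 = np.eye(p)[0]

# distance from the B-point 1 + e1 to the simplex A
x = one + e1
dist_to_A = np.linalg.norm(x - project_simplex(x))

# distance from the A-vertex e1 to B = (unit l1 ball) + 1; shift B to the origin
y = e1 - one
dist_to_B = np.linalg.norm(y - project_l1_ball(y))
```

For p = 3 the two distances evaluate to √3 ≈ 1.732 and 1/√2 ≈ 0.707, respectively, matching the analytic treatment of Appendix A.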

Discussion
The Hausdorff distance problem is intrinsically interesting, with theoretical applications throughout mathematics and practical applications in image processing. Given the nonconvexity of the problem, it has not received nearly as much attention in the mathematical literature as the closest point problem. Exact values of d_H(A, B) are available in a few special cases, such as the two highlighted in our appendix. Research on fast algorithms tends to be limited to random point clouds. Infinite sets defined by mathematical formulas have been largely ignored. The current paper partially rectifies this omission and demonstrates the value of continuous optimization techniques. The Frank–Wolfe and projected gradient ascent algorithms are relatively easy to code and extremely fast, even in high dimensions.
Our preliminary experiments tilt toward projected gradient ascent as the more reliable of the two options. A naive implementation of the point-cloud method is not remotely competitive with projected gradient ascent. More exhaustive studies are warranted beyond the proof-of-principle examples presented here.
The standard convergence arguments covered in Section 2.6 guarantee that all limit points of the two algorithm classes are stationary points. In practice, convergence appears much faster than the slow rates mentioned in Propositions 1 and 2. We suspect, but have not proved, that full convergence to a stationary point always occurs. This exercise would require a foray into the difficult terrain of real algebraic geometry [40]. In any event, convergence to a global maximum is not guaranteed. Fortunately, safeguards can be put in place to improve the chances of successful convergence. The homotopy method capitalizes on the exact distance between two balls. Minkowski set rounding smooths the boundary of the target sets and steers iterates in a productive direction.
The computation of the Hausdorff distance d_H(A, B) is apt to be much more challenging when either A or B is non-convex. Many sets can be represented as finite unions of compact convex sets. If A = ∪_i A_i and B = ∪_j B_j, then the computation of d_H(A, B) reduces to the computation of d(A_i, B) for each index i and d(B_j, A) for each index j. The identities d(A, B) = max_i d(A_i, B) and d(B, A) = max_j d(B_j, A) make this claim obvious. The further identity dist(x, B) = min_j dist(x, B_j) implies that d(A_i, B) = max_{x ∈ A_i} min_j dist(x, B_j). In general, saddlepoint problems of this sort are hard to solve. One possible line of attack is to minorize min_j dist(x, B_j) and then maximize the minorization.
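The identities above can be illustrated in one dimension, where each convex piece is a closed interval and the inner minimization is explicit. On each piece A_i the function dist(x, B) is piecewise linear, so its maximum occurs at an endpoint of A_i or at the midpoint of a gap between consecutive B-pieces, clipped into A_i. The code is our own simple construction.

```python
def dist_point_to_intervals(x, intervals):
    """dist(x, B) = min_j dist(x, B_j) for B a union of closed intervals."""
    return min(max(lo - x, 0.0, x - hi) for lo, hi in intervals)

def d_one_sided(A_intervals, B_intervals):
    """d(A, B) = max_i max_{x in A_i} min_j dist(x, B_j) in one dimension."""
    B = sorted(B_intervals)
    # candidate maximizers: endpoints of each A_i and clipped gap midpoints of B
    gaps = [(B[j][1] + B[j + 1][0]) / 2.0 for j in range(len(B) - 1)]
    best = 0.0
    for lo, hi in A_intervals:
        cands = [lo, hi] + [min(max(g, lo), hi) for g in gaps]
        best = max(best, max(dist_point_to_intervals(c, B) for c in cands))
    return best

def hausdorff_union(A_intervals, B_intervals):
    """d_H(A, B) = max{d(A, B), d(B, A)} for unions of intervals."""
    return max(d_one_sided(A_intervals, B_intervals),
               d_one_sided(B_intervals, A_intervals))
```

For A = [0, 1] ∪ [4, 5] and B = [0, 2], the farthest point of A from B is 5 at distance 3, while no point of B is farther than 1 from A, so d_H(A, B) = 3.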
The validation and implementation of this strategy will require substantial effort beyond the introduction to the problem pursued in the current paper. Let us merely add that the fast implementation of a more general Hausdorff distance algorithm will depend on the nature of the candidate sets A_i and B_j. In the plane, triangles are appealing [41,42]. It is straightforward to project onto a triangle, and a triangle by definition possesses exactly three extreme points. Furthermore, a great deal of research under the heading of finite elements has identified good algorithms for triangulating complicated regions of the plane. The software Triangle for generating two-dimensional meshes and Delaunay triangulations is surely pertinent [43]. The triangulation of surfaces forms part of the MESH software for Hausdorff distance estimation [44].
We hope this paper will provoke a greater focus on the Hausdorff distance problem. As a prototype non-convex problem, it is worthy of far more attention. Continuous optimization tools can be brought to bear on the problem and may ultimately generate more efficient algorithms than the discrete algorithms designed for finite point clouds. The fact that our algorithms scale well in higher dimensions is a plus. We would also like to highlight the illumination that the MM principle brings to the construction of new high-dimensional optimization algorithms, including those considered here. Although neglected in the past, MM may well be the single most unifying principle of algorithm construction in continuous optimization.
Appendix A

Algorithm 2 Minkowski Set Projection
Require: Projection operators P_A(x) and P_B(y), external point y, convexity constant c, maximum iterations n, and convergence criterion ϵ > 0.
1: a = P_A(y)
2: b = P_B(y − a)
3: for iter = 1 : n do
4:   a_new = P_A(y − b)
5:   b_new = P_B(y − a_new)
6:   if ‖a_new − a‖ + ‖b_new − b‖ < ϵ then
7:     break
8:   else
9:     a = a_new and b = b_new
10:  end if
11: end for
12: Return a_new and b_new

Algorithm 3 Homotopy Modification of Projected Gradient Ascent
Require: Projection operators P_A(x) and P_B(x), centers c_A and c_B, and homotopy points h.
1: Set d = (c_B − c_A)/‖c_B − c_A‖ and x = c_A − d ▷ maximizer for two balls of radius 1
2: Let P_U be projection onto the unit ball
3: for each homotopy point c in h do
4:   Put P_MA = Minkowski sum projection for P_A and P_U and constant c
5:   Put P_MB = Minkowski sum projection for P_B and P_U and constant c
6:   Perform projected gradient ascent with P_MA, P_MB, and initial point x
7:   Let x be the outcome
8: end for
9: Perform projected gradient ascent with P_A and P_B and initial point x
10: Return converged value of x ▷ output of final phase of homotopy

Proposition A1. Suppose A and B are balls with centers c_A and c_B and radii r_A and r_B, respectively. Then, unless A ⊂ B, in which case the maximum is 0, the maximum of dist(x, B) over x ∈ A equals ‖c_A − c_B‖ + r_A − r_B.
Proof. Geometrically, max_{x ∈ A} dist(x, B) = ‖c_A − c_B‖ + r_A − r_B should hold, so x = c_A − r_A(c_B − c_A)/‖c_B − c_A‖ gives the correct sign. ◻

Proposition A2. If A is the probability simplex in ℝ^p and B is the ℓ1 ball U translated by 1, then d_H(A, B) = √p.
Proof. Consider first d(A, B). The maximum of dist(x, B) occurs at an extreme point of A, say x = (1, 0, …, 0)⟙ = e_1 by symmetry. On U the convex function ‖e_1 − 1 − y‖² achieves its minimum value when all y_i are equal for i > 1. The common value z should satisfy z ≤ 0, while y_1 can have either sign. For y_1 ∈ [0, 1], we accordingly minimize y_1² + (p − 1)(1 + z)² subject to y_1 − (p − 1)z ≤ 1. The minimum occurs at y_1 = 0 and z = −1/(p − 1), giving dist(e_1, B) = (p − 2)/√(p − 1). Consider next d(B, A). The maximum of the convex function dist(x, A) over B occurs at an extreme point 1 ± e_i of B. The point 1 + e_1 projects onto A at e_1 and lies at distance ‖1‖ = √p from A, while 1 − e_1 lies at the smaller distance (p − 2)/√(p − 1). Since √p also dominates d(A, B), it follows that d_H(A, B) = √p. ◻

Table 1. Computation of d_H(A, B) by various methods.