The Averaged Hausdorff Distances in 
Multi-Objective Optimization: A Review

A brief but comprehensive review of the averaged Hausdorff distances that have recently been introduced as quality indicators in multi-objective optimization problems (MOPs) is presented. First, we introduce all the necessary preliminaries, definitions, and known properties of these distances in order to provide a state-of-the-art overview of their behavior from a theoretical point of view. The presentation treats separately the definitions of the (p, q)-distances GD p,q , IGD p,q , and ∆ p,q for finite sets and their generalization for arbitrary measurable sets, which covers as an important example the case of continuous sets. Among the presented results, we highlight the rigorous consideration of the metric properties of these definitions, including a proof of the triangle inequality for distances between disjoint subsets when p, q ≥ 1, and the study of the behavior of the associated indicators with respect to the notion of compliance to Pareto optimality. Illustrations of these results in particular situations are also provided. Finally, we discuss a collection of examples and numerical results obtained for the discrete and continuous incarnations of these distances that allow for an evaluation of their usefulness in concrete situations and for some interesting conclusions at the end, justifying their use and further study.


Introduction
In many real-world applications, the problem of concurrent or simultaneous optimization of several objectives is an essential task known as a multi-objective optimization problem (MOP). One important problem in multi-objective optimization is to compute a suitable finite size approximation of the solution set of a given MOP, the so-called Pareto set and its image, the Pareto front.
The Hausdorff distance d H (e.g., Reference [1]) measures how far two subsets of a metric space are from each other. Due to its properties, it is frequently used in many research areas such as computer vision [2][3][4], fractal geometry [5], the numerical computation of attractors in dynamical systems [6][7][8], or convergence of multi-objective algorithms to the Pareto set/front of a given multi-objective optimization problem [9][10][11][12][13][14][15]. One possible drawback of the classical Hausdorff distance, however, is that it punishes single outliers which leads to inequitable performance evaluations in some cases. As one example, we mention here multi-objective evolutionary algorithms. On the one hand, such algorithms are known to be very effective in the (global) approximation of the Pareto set/front. On the other hand, it is also known that the final approximations (populations) may contain some outliers (e.g., Reference [16]). For such cases, the Hausdorff distance may indicate a "bad" match of population and Pareto set/front, while the approximation quality may be indeed "good". To avoid exactly this problem, Schütze et al. introduced the averaged Hausdorff distance ∆ p in Reference [16], but the initial definition only works for finite approximations of the solution set and does not behave as a proper metric in the formal mathematical sense. In Reference [17], the indicator ∆ p,q has been proposed by the first two authors of this paper. ∆ p,q is an averaged Hausdorff distance that fixes the metric behavior of ∆ p . Later, in Reference [18], a broader definition was given on metric measure spaces, suitable for the consideration of continuous approximations of the solution set. Moreover, this generalized indicator ∆ p,q preserves the nice metric properties of the initial finite case and reduces to it when using the standard discrete measure.
While the averaged Hausdorff distance has so far mostly been used for performance assessment of multi-objective evolutionary algorithms (using benchmark functions), it has also been used on MOPs coming from real-world problems including the multi-objective software next release problem [19], arc routing problems [20], power flow problems [21], engineering design problems [22], foreground detection [23], and contract design [24]. Several other indicators have also been proposed in the literature, like the hypervolume indicator or R indicators, each one with its own advantages and drawbacks, but their consideration is beyond the scope of this work. Information concerning other indicators can be found, e.g., in References [25][26][27].
The material reviewed in this work is based on recently published works [17,18,28]. The remainder of the document is organized as follows: in Section 2, we will briefly state the required background for MOPs and power means. In Section 3, we review the p-averaged Hausdorff distance ∆ p . In Section 4, we will discuss its generalization, the (p, q)-averaged Hausdorff distance ∆ p,q , explaining individually the finite and continuous cases. In Section 5, we will consider some aspects of the metric properties of ∆ p and ∆ p,q . In Section 6, we study the Pareto compliance of the performance indicators related to ∆ p and ∆ p,q . In Section 7, we will present some examples and numerical experiments. Finally, in Section 8, we will draw our conclusions and will discuss possible paths for future research in this direction.

Preliminaries
In this review, we introduce tools from a metric perspective that deal with two related contexts: distances between finite subsets of a metric space and distances between general measurable subsets of a metric measure space. The second context actually contains the first, but we deal separately with both of them, starting with the simpler setting of finite subsets before passing to the more general situation of arbitrary measurable sets that also contains the important special case of continuous sets. To emphasize each context, we use the convention that general sets will be denoted by X, Y, and Z, but when they are finite, the labels A, B, and C will be used.

Multi-Objective Optimization
First, we briefly present some basic aspects of multi-objective optimization problems (MOPs) required for the understanding of this paper. For a more thorough discussion, we refer the interested reader, e.g., to References [12,29,30].
A continuous MOP is rigorously formalized as the minimization of an appropriate vector-valued function:

min x∈Q F(x),

where Q denotes the domain of feasible solutions (the decision space) and F denotes a vector-valued function with components f i : Q → R, for i = 1, . . . , k, called objective functions. Explicitly, x −→ F(x) := ( f 1 (x), . . . , f k (x)).
The optimality of a candidate solution to a MOP depends on a dominance relation [31] given in terms of the partial order introduced below.

Definition 1. For x, y ∈ Q, the partial Pareto ordering associated with the MOP determined by F is defined as x ⪯ y if and only if f i (x) ≤ f i (y), for all i = 1, . . . , k.
For x, y, z ∈ Q and X, Y ⊂ Q, the following notions of dominance (≺) and non-dominance (⊀) are standard in this context: x is dominated by y, written y ≺ x, if y ⪯ x and F(x) ≠ F(y). z is dominated by X, written X ≺ z, if x ≺ z for some x ∈ X; otherwise, X ⊀ z.
X is dominated by Y, written Y ⪯ X, if ∀x ∈ X ∃y ∈ Y such that y ⪯ x; otherwise, we write Y ⋠ X.
In addition, x ∈ Q is called a Pareto-optimal point if it is nondominated, i.e., there is no y ∈ Q with y ≺ x. Finally, the Pareto set P ⊂ Q consists of all Pareto-optimal points, and the Pareto front is defined as its image F(P) ⊂ R k .
MOPs commonly possess the important characteristic that, when mild smoothness conditions are fulfilled, the solution (or Pareto) set P and its image, the Pareto front F(P) ⊂ R k , consist of d-dimensional subsets with d = k − 1 (or even less) when the problem involves k objective functions [32].
As an example, let us describe a simple unconstrained MOP [33,34] given by the minimization of the quadratic objectives

f i (x) := ‖x − a i ‖ 2 , for i = 1, . . . , k,

where a i = (a i 1 , . . . , a i n ) ∈ R n and i = 1, . . . , k. The a i 's correspond to the minimizers of each quadratic objective f i , and the Pareto set of this problem consists of the (k − 1)-simplex containing all the a i 's as vertices, i.e., simp k−1 := simp(a 1 , . . . , a k ). In the particular case when n = 1, k = 2, a 1 = 0, and a 2 = 2, the problem becomes the simultaneous minimization of

f 1 (x) = x 2 and f 2 (x) = (x − 2) 2 .    (1)

This is the so-called Schaffer problem [35]. Figure 1 illustrates the objectives f 1 and f 2 and the Pareto front F(P) for this MOP. In this case, the Pareto set corresponds to P = [0, 2], and the Pareto front is a continuous convex curve in R 2 joining (0, 4) with (4, 0). In many real-world applications, MOPs arise naturally. As one example, in almost all scheduling problems (e.g., References [36][37][38][39][40][41]), the total execution time (make-span) is of primary interest. However, the consideration of this objective is in many cases not enough since other quantities such as the tardiness or the energy consumption also play an important role and can consequently, according to the given problem, add further objectives to the resulting multi-objective problem.
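To make this example concrete, the Schaffer problem of Equation (1) can be sketched in a few lines of Python; the function and variable names below are illustrative choices of ours, not part of any reference implementation:

```python
import numpy as np

def schaffer(x):
    # Objectives of the bi-objective problem (1): f1(x) = x^2, f2(x) = (x - 2)^2.
    return np.array([x**2, (x - 2.0)**2])

# The Pareto set is the interval [0, 2]; its image is the Pareto front.
pareto_set = np.linspace(0.0, 2.0, 101)
pareto_front = np.array([schaffer(x) for x in pareto_set])
```

Sampling the front this way reproduces the convex curve joining (0, 4) with (4, 0) described above.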
For the numerical treatment of MOPs, there exist already many established approaches. For instance, mathematical programming techniques [29,42] are point-wise iterative methods that are capable of detecting single local solutions of a given MOP. By solving a clever sequence of the resulting scalar objective optimization problems, a suitable finite size approximation of the entire Pareto front can be computed in certain cases [43][44][45][46]. Multi-objective continuation methods take advantage of the fact that the Pareto set at least locally forms a manifold [47][48][49][50][51][52]. Starting with an initial (local) solution, further candidates are computed along the Pareto set of the given MOP. All of these methods typically yield high convergence rates but are, in turn, of local nature. A possible alternative is given by set oriented methods such as subdivision and cell mapping techniques [53,54] and evolutionary algorithms [55][56][57][58][59] that are of global nature and are capable of computing a finite size approximation of the Pareto front in one single run.

Figure 1. Left: the objectives f 1 (x) = x 2 and f 2 (x) = (x − 2) 2 from a multi-objective optimization problem (MOP; Equation (1)). Right: the corresponding Pareto set over the interval [0, 2].

Finite Power Means
A comprehensive reference on the theory and properties of means is given in Reference [60], where proofs of the statements presented here for finite power means and for integral power means in the following subsection can be found (see also Reference [18] for integral means).
For a finite set A ⊂ [0, ∞) and a nonzero real p, the p-average or p power mean of A is given by

M p (A) := ( (1/|A|) ∑ a∈A a p ) 1/p ,

and we write mean(A) := M 1 (A) to denote the arithmetic mean of the elements of a finite set A ⊂ [0, ∞). It is well known that limit cases of power means recover familiar quantities, for example,

lim p→∞ M p (A) = max(A), lim p→−∞ M p (A) = min(A), lim p→0 M p (A) = geo(A),

where geo(A) denotes the geometric mean and M −1 (A) = harm(A) is the harmonic mean.

Proposition 1. Let A and B be finite subsets of [0, ∞) and p, q ∈ R be arbitrary constants. Then, finite power means satisfy, among others, the following properties: they are monotone both in the parameter p and in their arguments; for a matrix of nonnegative elements D = (d a,b ) with a ∈ A and b ∈ B, the iterated averages over a and over b can be exchanged and compared according to the relative order of p and q; and, for the harmonic mean, harm(A) ≤ |A| min(A).
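As a quick illustration, finite power means and their limit cases can be computed as follows (a minimal sketch; `power_mean` is an ad hoc helper of ours, not a library function):

```python
import numpy as np

def power_mean(values, p):
    # p power mean M_p of a finite multiset of nonnegative numbers.
    a = np.asarray(values, dtype=float)
    if p == 0:                  # limit p -> 0: geometric mean
        return float(np.exp(np.mean(np.log(a))))
    if np.isposinf(p):          # limit p -> +inf: maximum
        return float(a.max())
    if np.isneginf(p):          # limit p -> -inf: minimum
        return float(a.min())
    return float(np.mean(a ** p) ** (1.0 / p))
```

Here p = 1 gives the arithmetic mean and p = −1 the harmonic mean, and M p is nondecreasing in p, in agreement with Proposition 1.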

Integral Power Means in Measure Spaces
In order to present this part with sufficient generality, let us denote by (S, µ) a measure space. Let M(S) be the σ-algebra of measurable subsets of S and M <∞ (S) be the collection of those subsets with finite measure. Now, we recall some fundamental properties of integral power means in this setting needed for the forthcoming sections. For p ∈ R\{0} and a measurable function f : X ⊂ S → [0, ∞) defined on a subset X ∈ M <∞ (S), the p power mean or p-average of f over X is given by

M p ( f (X)) := ( (1/|X|) ∫ X f (x) p dµ ) 1/p ,    (2)

where |X| := µ(X) refers in this context to the measure of X and not to its cardinality as in the finite case. For brevity, when the measure µ employed is clear, dµ will be abbreviated by dx to highlight the variable being integrated. The shorthand M p ( f (X)) := M p x∈X ( f (x)) will also be employed.
For p ≥ 1, the integral p mean corresponds to M p ( f (X)) = |X| −1/p ‖ f ‖ p , where ‖·‖ p is the standard p norm of the Lebesgue space L p (X, µ). The cases p = ±∞ can also be included by taking the limits p → ±∞. In fact, the essential supremum of the function f on X is ‖ f ‖ ∞ = ess sup x∈X f (x) and, when f is not identically zero, its essential infimum is precisely 1/‖ f −1 ‖ ∞ = ess inf x∈X f (x); by calculating the limits, we obtain that

lim p→∞ M p ( f (X)) = ess sup x∈X f (x),

and similarly,

lim p→−∞ M p ( f (X)) = ess inf x∈X f (x).

Note that ‖·‖ ∞ corresponds to the norm of the space L ∞ (X, µ). For p = 0, it is possible to define M p as the integral generalization of the notion of geometric mean, and it is given explicitly by

M 0 ( f (X)) := exp( (1/|X|) ∫ X ln f (x) dx ).
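Numerically, the integral p mean of Equation (2) can be approximated by a Riemann sum. The sketch below assumes the Lebesgue measure on an interval, and the helper name `integral_power_mean` is our own choice for illustration:

```python
import numpy as np

def integral_power_mean(f, a, b, p, n=200_000):
    # Approximates M_p(f(X)) = ((1/|X|) * integral of f(x)^p over X)^(1/p)
    # on X = [a, b] with the Lebesgue measure, using a midpoint rule.
    h = (b - a) / n
    x = a + h * (np.arange(n) + 0.5)
    vals = f(x)
    if p == 0:   # geometric-mean limit
        return float(np.exp(np.mean(np.log(vals))))
    return float(np.mean(vals ** p) ** (1.0 / p))
```

For instance, for f (x) = x on [0, 1], one gets M 1 = 1/2 and M 2 = (1/3) 1/2 , matching the closed-form integrals.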

The p-Averaged Hausdorff Distance
When trying to measure the distance between subsets of Euclidean space or even an arbitrary metric space, a natural choice is the well-known Hausdorff distance d H that is extensively employed in many different contexts. However, its use is of limited practical value for measuring the distance to the Pareto set/front of typical MOPs when the approximations are produced by stochastic search methods such as evolutionary algorithms. This is due to the fact that these algorithms may produce a set of outliers that can be heavily punished by d H . As a partial remedy, the use of an averaged Hausdorff distance ∆ p was first proposed in Reference [16] to replace d H .
Let d : S × S → [0, ∞) denote a distance function on a metric space S for which the standard properties of the identity of indiscernibles, nonnegativity, symmetry, and subadditivity (more commonly known as the triangle inequality) are satisfied. For simplicity, throughout the text, the metric d can be assumed to be the standard Euclidean distance d(x, y) := ‖x − y‖ induced on some S ⊂ R k by the Euclidean 2 norm of R k , but the theory carries over to any general metric space (S, d). The indicators GD p and IGD p in Definition 3 correspond to simple adjustments to the definitions of the generational distance [61] and the inverted generational distance [62].
The standard Hausdorff distance is recoverable from ∆ p by taking the limit lim p→∞ ∆ p = d H , but for any finite value of p, the distance ∆ p is obtained from standard p power means of all the distances employed to calculate the supremum in part 2 of Definition 2, which is needed to define d H .
The advantage of using ∆ p as an indicator is that, contrary to d H , it does not immediately disqualify a candidate set containing a few outliers and that, among the possible configurations of (finite) candidate solutions to a MOP, it assigns lesser distances to the Pareto front to those solutions appearing evenly spread along its whole domain (see, e.g., Reference [63]). The behavior of ∆ p as a quality indicator is studied, e.g., in References [16,28], and it corresponds to the particular case q → −∞ of the results for general (p, q)-indicators presented in Section 6.
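The outlier effect can be checked directly. The sketch below implements the standard formulas of GD p (averaged minimal distances) and ∆ p = max{GD p (A, B), GD p (B, A)} from Reference [16], together with the classical Hausdorff distance; the helper names are ours:

```python
import numpy as np

def pairwise(A, B):
    # Matrix of Euclidean distances d(a, b) for a in A, b in B.
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def gd_p(A, B, p):
    # GD_p(A, B) = ((1/|A|) sum_a d(a, B)^p)^(1/p), with d(a, B) = min_b d(a, b).
    return float(np.mean(pairwise(A, B).min(axis=1) ** p) ** (1.0 / p))

def delta_p(A, B, p):
    # Averaged Hausdorff distance of Reference [16].
    return max(gd_p(A, B, p), gd_p(B, A, p))

def hausdorff(A, B):
    # Classical d_H, i.e., the limit p -> infinity of Delta_p.
    d = pairwise(A, B)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

# A fine sample of a "front" and an approximation containing a single outlier:
front = np.column_stack([np.linspace(0, 1, 50), 1 - np.linspace(0, 1, 50)])
approx = np.vstack([front, [10.0, 10.0]])   # one far-away outlier
```

Here d H is dominated entirely by the outlier, while ∆ 1 stays small, which is exactly the averaging effect described above.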
Concerning its metric properties, ∆ p has the drawback of not being a proper metric in the usual sense because for any non-unit set A ⊂ S the distance ∆ p (A, A) > 0. This problem will be fixed in the following section with a simple modification. Nevertheless, independently from that, for a positive number p, the distance ∆ p does not satisfy the triangle inequality but only a weaker version of it. Indeed, as a consequence of Corollary 3, we have that

∆ p (A, C) ≤ N α (∆ p (A, B) + ∆ p (B, C)),

where N = max{|A|, |B|, |C|} ≥ 1 and α = 1/p.
For further details concerning ∆ p , its properties, and its relation to other indicators, the reader can consult, e.g., References [16,63].

The (p, q)-Averaged Hausdorff Distance

To better evaluate the optimality of a certain candidate set to approximate the Pareto set/front of a MOP, several generalizations of the averaged Hausdorff distance ∆ p have been recently introduced.

(p, q)-Distances between Finite Sets
Definition 4. For p, q ∈ R\{0}, the generational (p, q)-distance GD p,q (A, B) between two finite subsets A, B ⊂ S is given by

GD p,q (A, B) := ( (1/|A|) ∑ a∈A ( (1/|B|) ∑ b∈B d(a, b) q ) p/q ) 1/p .

The distance GD p,q (A, B) can be extended for values of p = 0 or q = 0 by taking the limits p → 0 or q → 0, respectively. In such cases, properties of finite power means suggest the corresponding definitions in terms of geometric means. We can also calculate GD p,q when p → ±∞ or q → ±∞ by changing the corresponding averaged sum with a minimum or a maximum according to the case. In particular, we have the nice relation

lim q→−∞ GD p,q (A, B) = GD p (A, B).    (3)

Note that the definition of GD p,q has two drawbacks, namely GD p,q (A, A) does not necessarily vanish and, in general, GD p,q (A, B) ≠ GD p,q (B, A); hence, it does not define a proper metric. In order to get one, a slight modification is needed.

Definition 5. For p, q ∈ R\{0}, the (p, q)-averaged Hausdorff distance between two finite subsets A, B ⊂ S is given by

∆ p,q (A, B) := max{ GD p,q (A, B\A), GD p,q (B, A\B) }.

From Equation (3) and Definition 5, we easily obtain lim q→−∞ ∆ p,q (A, B) = ∆ p (A, B) for disjoint sets. In this way, for finite and disjoint sets, the indicator ∆ p,q is a generalization of ∆ p . Similarly to the relation between d(a, b) and the entries of the distance matrix for a ∈ A and b ∈ B, we also have the following relation between the (p, q)-generational distance GD p,q (A, B) and the matrix p,q norm ‖D AB ‖ p,q , where the definition of the latter is precisely that of GD p,q but replacing all the normalized (averaged) sums by standard ones (see, e.g., Reference [64]):

GD p,q (A, B) = |A| −1/p |B| −1/q ‖D AB ‖ p,q .

A useful property of the distance ∆ p,q is that the parameters can be adjusted independently: a desired spread of the archives can be achieved by choosing an appropriate q, and the archives can be located with custom closeness to the Pareto front of a MOP by an adequate choice of p.
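Definitions 4 and 5 translate directly into code. The following sketch (with helper names of our own) computes GD p,q and, for the disjoint case, ∆ p,q = max{GD p,q (A, B), GD p,q (B, A)}:

```python
import numpy as np

def gd_pq(A, B, p, q):
    # GD_{p,q}(A, B): p-average over a in A of the q-averaged distances
    # delta_q(a, B) = ((1/|B|) sum_b d(a, b)^q)^(1/q).
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    inner = np.mean(d ** q, axis=1) ** (1.0 / q)
    return float(np.mean(inner ** p) ** (1.0 / p))

def delta_pq(A, B, p, q):
    # (p, q)-averaged Hausdorff distance between disjoint finite sets.
    return max(gd_pq(A, B, p, q), gd_pq(B, A, p, q))
```

For A = {0} and B = {1, 3} on the real line and p = q = 2, both directed distances equal √5, so ∆ 2,2 (A, B) = √5; moreover, for very negative q, the inner average approaches the minimal distance and GD p,q approaches GD p , as in Equation (3).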

(p, q)-Distances between Measurable Sets
With the aid of Proposition 2, the results of the previous section can be generalized to subsets of a metric space (S, d) endowed with an appropriate measure µ. For concreteness, S can be taken to be a subset of R k carrying the metric induced from the Euclidean metric of R k and endowed with an appropriate non-null measure µ. Notice that, in our intended applications, µ will not be the restriction of the standard Lebesgue measure of R k to S for the simple reason that it can easily vanish, as it happens on any hypersurface or lower-dimensional subset of R k . In this case, a lower-dimensional measure is needed, and alternatives like the Hausdorff measure on S can be used, since it gives rise to the standard notion of d-dimensional volume for d-submanifolds of R k . When these submanifolds are parametrized by functions from subsets of R d , the same volume will be obtained by a change of variables formula from the standard Lebesgue measure on those subsets of R d .
A very important observation in this context is that any set-theoretic relation obtained from measure-related calculations needs to be understood to hold almost everywhere (a.e.). Therefore, for X, Y ∈ M <∞ (S), the statements X = Y or X ⊂ Y mean that the relations hold a.e., i.e., µ(X △ Y) = 0 or µ(X \ Y) = 0, respectively. In other words, in this setting, we will always identify X ∈ M <∞ (S) with its equivalence class [X] := {Y | X = Y, a.e.}. This means that those classes will be regarded as the elements of M <∞ (S), removing the need to carry the abbreviation a.e. all the time. Henceforth, to simplify complicated formulae, d(x, y) will be shortened to d x,y .

Definition 6. Let p, q ∈ R\{0}. For finite-measure subsets X, Y ∈ M <∞ (S), their generational (p, q)-distance is given by

GD p,q (X, Y) := ( (1/|X|) ∫ X ( (1/|Y|) ∫ Y d x,y q dy ) p/q dx ) 1/p .

The cases p < 0 or q < 0 are well defined only if X and Y are disjoint subsets.
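For continuous sets, the double integral in Definition 6 can be approximated by Monte Carlo sampling. The sketch below assumes uniform (normalized) measures on two disjoint intervals of the real line, with helper names of our own; for X = [0, 1], Y = [2, 3], and p = q = 1, the exact value is the double integral of (y − x), namely 2:

```python
import numpy as np

def gd_pq_mc(sample_X, sample_Y, p, q, n=2_000, seed=1):
    # Monte Carlo estimate of the integral GD_{p,q} of Definition 6 for
    # normalized (probability) measures on X and Y.
    rng = np.random.default_rng(seed)
    x = sample_X(rng, n)                   # n points distributed on X
    y = sample_Y(rng, n)                   # n points distributed on Y
    d = np.abs(x[:, None] - y[None, :])    # pairwise distances
    inner = np.mean(d ** q, axis=1) ** (1.0 / q)
    return float(np.mean(inner ** p) ** (1.0 / p))

estimate = gd_pq_mc(lambda rng, n: rng.uniform(0.0, 1.0, n),
                    lambda rng, n: rng.uniform(2.0, 3.0, n), p=1, q=1)
```

The estimate converges to the exact value as the sample size grows, which makes this a convenient sanity check for continuous incarnations of the distances.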
Similarly to the finite case, GD p,q can be extended to values of p, q ∈ R, but there are two drawbacks: GD p,q (X, X) = 0 only if X is a unit-set or singleton, and GD p,q (X, Y) can differ from GD p,q (Y, X). To fix this undesirable behavior, we repeat the strategy used in the finite case as follows.

Definition 7. For p, q ∈ R\{0} and X, Y ∈ M <∞ (S), the generalized (p, q)-averaged Hausdorff distance is given by

∆ p,q (X, Y) := max{ GD p,q (X, Y\X), GD p,q (Y, X\Y) }.

Remark 1.
In general, the (p, q)-distances are maps from M <∞ (S) × M <∞ (S) to [0, ∞]. On the collection of finite subsets of S, the standard counting measure can be taken as the underlying one needed for these measure-theoretic notions of GD p,q and ∆ p,q , and in this case, these distances become precisely the finite-case distances given in Definitions 4 and 5.

Remark 2.
For disjoint subsets X and Y, Definition 5 in the finite case and Definition 7 above in the measurable case reduce to the simpler form

∆ p,q (X, Y) = max{ GD p,q (X, Y), GD p,q (Y, X) },

which is the one we will actually use in most situations. The more general definition for non-disjoint subsets is given with the purpose that the distance so-defined changes continuously as one set approaches the other until their distance vanishes. In other words, the general definition allows the distance to become a continuous function with respect to the metric topology that it determines. Nevertheless, for practical purposes dealing with applications and for most of the results presented below, the simpler definition between disjoint subsets suffices.

Metric Properties
To explain some of the terminology used in this section, we recall to the reader that the standard triangle inequality for a distance function d : S × S → [0, ∞) is usually weakened in two different but related ways by postulating the existence of a constant C > 0 such that, for any points x, y, z ∈ S, one of the following conditions holds: the C relaxed triangle inequality d(x, z) ≤ C (d(x, y) + d(y, z)) or the C inframetric inequality d(x, z) ≤ C max{d(x, y), d(y, z)}. Since the second condition implies the first one by using the very same constant C > 0 and, reciprocally, the C relaxed triangle inequality implies the 2C inframetric one, both conditions are equivalent for an appropriate choice of constants. A semimetric satisfying any one of these conditions will be simply called an inframetric.
For arbitrary measurable sets in S, the following results summarize the metric properties of GD p,q and ∆ p,q . Using the counting measure, these properties also apply to finite sets. For more details, see Reference [17] in the finite case and Reference [18] in the generalized measure-theoretic context.

Theorem 1. For p, q ∈ [1, ∞], the generational (p, q)-distance GD p,q is subadditive in M <∞ (S), i.e., for any X, Y, Z ∈ M <∞ (S), the triangle inequality holds true:

GD p,q (X, Z) ≤ GD p,q (X, Y) + GD p,q (Y, Z).

Proof. The proof follows easily by simple steps using the properties in Proposition 2. We start from the standard triangle inequality for d(·, ·), namely d x,z ≤ d x,y + d y,z , taking at both sides the q-average over Z and using 1-3 of Proposition 2 to arrive at

M q z∈Z (d x,z ) ≤ d x,y + M q z∈Z (d y,z ).    (4)

Now, there are two independent cases for the parameters p, q ∈ [1, ∞). We explain here only the case p ≤ q, but the case q < p follows by similar arguments; see Thm. 2 in Reference [18]. Calculating the p-average over X at both sides of Equation (4) and using 1, 3, and 5 of Proposition 2, we get

M p x∈X M q z∈Z (d x,z ) ≤ M p x∈X (d x,y ) + M q z∈Z (d y,z ).    (5)

Since the LHS of Equation (5) is GD p,q (X, Z), after a further p-average over Y at both sides of Equation (5) and parts 1, 3, and 5 of Proposition 2, we obtain

GD p,q (X, Z) ≤ M p y∈Y M p x∈X (d x,y ) + GD p,q (Y, Z).

But from 2, 4, and 5 of Proposition 2, the first term at the RHS above satisfies

M p y∈Y M p x∈X (d x,y ) ≤ M p x∈X M q y∈Y (d x,y ) = GD p,q (X, Y),

which completes the proof.
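Theorem 1 can also be stress-tested numerically on random finite sets (counting measure); the sketch below re-implements the finite GD p,q with helper names of our own and counts violations of subadditivity for several exponent pairs with p, q ≥ 1:

```python
import numpy as np

def gd_pq(A, B, p, q):
    # Finite-case GD_{p,q}(A, B) with the counting measure.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return float(np.mean(np.mean(d ** q, axis=1) ** (p / q)) ** (1.0 / p))

rng = np.random.default_rng(0)
violations = 0
for _ in range(200):
    X, Y, Z = (rng.random((int(rng.integers(2, 6)), 2)) for _ in range(3))
    for p, q in [(1, 1), (2, 3), (5, 2), (1, 4)]:
        lhs = gd_pq(X, Z, p, q)
        rhs = gd_pq(X, Y, p, q) + gd_pq(Y, Z, p, q)
        violations += lhs > rhs + 1e-12
```

No violations occur, in agreement with the theorem.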

Corollary 1.
If p, q ∈ R\{0}, the (p, q)-averaged Hausdorff distance ∆ p,q is a semimetric on the space M <∞ (S) of finite-measure subsets of S. Furthermore, if p, q ∈ [1, ∞), the distance ∆ p,q behaves as a proper metric when it is restricted to disjoint subsets of M <∞ (S).
Proof. From Definition 7, we obtain the relations ∆ p,q (·, ·) ≥ 0 and ∆ p,q (X, Y) = ∆ p,q (Y, X) for any X, Y ∈ M <∞ (S) and all p, q ∈ R\{0}. Moreover, from Definition 6, it follows that GD p,q (X, Y\X) = 0 if and only if X = ∅ or Y ⊆ X (hence, Y\X = ∅). Therefore, for X, Y ≠ ∅, we obtain ∆ p,q (X, Y) = 0 if and only if X = Y, i.e., ∆ p,q is a semimetric on the collection of finite-measure subsets M <∞ (S). Finally, for disjoint X and Y, it is clear that GD p,q (X, Y\X) = GD p,q (X, Y); thus, by Theorem 1, the triangle inequality holds for both arguments inside the maximum that defines ∆ p,q when p, q ∈ [1, ∞). This implies that the triangle inequality is also valid for ∆ p,q .
Theorem 2. Let X, Y, Z ∈ M <∞ (S) be subsets admitting positive constants r < R such that r ≤ d u,v ≤ R for any u ∈ X ∪ Y and v ∈ Y ∪ Z. Then, for all p, q ∈ R\{0} with |p|, |q| ≥ 1 and at least one of them negative, a relaxed triangle inequality holds for GD p,q , namely

GD p,q (X, Z) ≤ (R 2 /r 2 ) (GD p,q (X, Y) + GD p,q (Y, Z)).

Proof.
Step 1: Let p ∈ R\{0}, and suppose that q < 0. Since r ≤ d x,y ≤ R for all x ∈ X and y, z ∈ Y, items 1 and 4 of Proposition 2 yield

GD p,|q| (X, Y) ≤ (R/r) GD p,q (X, Y),

and the same bound holds for the pair (Y, Z). Step 2: Now, for q ∈ R\{0} and p < 0, we will prove the analogous estimate GD |p|,q (X, Y) ≤ (R/r) GD p,q (X, Y). By assumption, we have r/R ≤ d x,y /d u,y ≤ R/r for any y ∈ Y and all x, u ∈ X. Similarly as before and using 1 and 4 of Proposition 2, we conclude the claimed bound from the RHS. Step 3: The previous steps can be summarized in the expression

GD |p|,|q| (W, V) ≤ (R 2 /r 2 ) GD p,q (W, V), for (W, V) ∈ {(X, Y), (Y, Z)}.    (8)

Using again 4 of Proposition 2 and Definition 6, we get GD p,q (X, Z) ≤ GD |p|,|q| (X, Z). From this, the subadditivity for GD |p|,|q| (Theorem 1), and Equation (8), we conclude

GD p,q (X, Z) ≤ GD |p|,|q| (X, Y) + GD |p|,|q| (Y, Z) ≤ (R 2 /r 2 ) (GD p,q (X, Y) + GD p,q (Y, Z)).

Remark 3.
For parameters (p, q) ∈ R 2 that lie in the orange or blue sectors in Figure 2, the distance GD p,q fulfills a C relaxed triangle inequality for the constant C = R 2 /r 2 , provided that the condition r ≤ d u,v ≤ R holds for all u ∈ X ∪ Y and v ∈ Y ∪ Z. On bounded and topologically separated sets (i.e., not having common limit points), this condition always holds, and on them, ∆ p,q becomes an inframetric as explained below.

Figure 2. According to Corollary 1, when acting on disjoint subsets, ∆ p,q behaves as a proper metric if (p, q) lies in the blue sector, and according to Corollary 2, it behaves like an inframetric if (p, q) lies in the orange sectors.

Corollary 2.
Under the same hypotheses of Theorem 2, the (p, q)-averaged Hausdorff distance ∆ p,q satisfies

∆ p,q (X, Z) ≤ (R 2 /r 2 ) (∆ p,q (X, Y) + ∆ p,q (Y, Z)).

Proof. It is immediate using Theorem 2 and Definition 7.
When the involved sets are finite, a generally sharper inframetric relation holds. For emphasis, we employ in this context the notation A, B, C for those subsets of S.

Theorem 3. If p, q ∈ R and |p|, |q| > 1, the (p, q)-distance GD p,q satisfies the relaxed triangle inequality

GD p,q (A, C) ≤ N α (GD p,q (A, B) + GD p,q (B, C)),

where N = max{|A|, |B|, |C|} and α > 0 is a suitable exponent depending only on p and q. If N α does not need to be sharp, α can always be chosen to take the larger value |p| −1 + |q| −1 .

Proof. For arbitrary p ≠ 0, one may assume that q < 0, so that |q| = −q, and proceed as in the proof of Theorem 2, now using the finite cardinalities instead of the bounds r and R; see Reference [17] for the details.

Corollary 3. Under the same hypotheses of Theorem 3, the distance ∆ p,q satisfies the relaxed triangle inequality ∆ p,q (A, C) ≤ N α (∆ p,q (A, B) + ∆ p,q (B, C)).
Proof. The corollary follows immediately from Theorem 3 and Definition 5.
To conclude this section, we return to the general setting of arbitrary measurable sets to explain the behavior of ∆ p,q when changing the value of its parameters p and q.

Theorem 4. Let X, Y ∈ M <∞ (S) and suppose that p, p′, q, q′ ∈ R satisfy p ≤ p′ and q ≤ q′. Then,

∆ p,q (X, Y) ≤ ∆ p′,q′ (X, Y).

Proof. It follows easily from part 5 of Proposition 2 and Definition 7.
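Theorem 4's monotonicity in the parameters is easy to observe numerically for disjoint finite sets, using the simplified form of ∆ p,q from Remark 2 (helper names are ours):

```python
import numpy as np

def gd_pq(A, B, p, q):
    # Finite-case GD_{p,q}(A, B) with the counting measure.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return float(np.mean(np.mean(d ** q, axis=1) ** (p / q)) ** (1.0 / p))

def delta_pq(A, B, p, q):
    # Simplified Delta_{p,q} for disjoint finite sets (Remark 2).
    return max(gd_pq(A, B, p, q), gd_pq(B, A, p, q))

rng = np.random.default_rng(3)
A = rng.random((6, 2))
B = rng.random((4, 2)) + 2.0          # shifted, hence disjoint from A
values = [delta_pq(A, B, p, q) for (p, q) in [(1, 1), (2, 1), (2, 3), (4, 5)]]
```

Increasing p and q along the list produces a nondecreasing sequence of distances, as the theorem predicts.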

(p, q)-Distances as Quality Indicators
Let Q be a decision space and F : Q → R k be a multi-objective function on it, of which the associated MOP consists in the simultaneous minimization of its k component functions f 1 , . . . , f k . A candidate solution to this problem is Pareto-optimal if all elements of its image in F(Q) ⊂ R k are nondominated in the sense of Pareto [31]; see Definition 1. For the forthcoming discussion, let us introduce the following abbreviated and useful notation. For X ⊂ Q and any z ∈ Q, we define the subset X z := {x ∈ X | x ⪯ z} of points of X that dominate (or equal) z. From these definitions, it follows that, for arbitrary z ∈ Q and X, Y ⊂ Q, there are partitions such as X = X z ⊔ (X \ X z ), where ⊔ stands for the disjoint union of subsets. A similar notation with the subindices ≺, ⪯, and ⊀ can also be used in an analogous way. Let us recall that an archive X ⊂ Q is, by definition, a subset of mutually non-dominated points; therefore, for any x, x′ ∈ X, the condition x ⪯ x′ implies x = x′. This basic property implies that F : Q → R k is a bijection when restricted to any archive X ⊂ Q and, therefore, the points in F(X) ⊂ F(Q) can be univocally labeled by the elements of X. Moreover, for a finite archive A ⊂ Q, both sets have the same number of elements |A| = |F(A)|. Now, we introduce a couple of strengthened notions of dominance between sets (archives) that are required for the validity of most of the results in this section.

Definition 8.
An archive X is well-dominated by an archive Y if 1. X is dominated by Y, written Y ⪯ X, i.e., ∀x ∈ X, ∃y ∈ Y s.t. y ⪯ x, and 2. Y consists only of dominating points of X, i.e., ∀y ∈ Y, ∃x ∈ X s.t. y ⪯ x.
Moreover, X is said to be strictly well dominated by Y if additionally 3. ∃y ∈ Y\X, ∃x ∈ X\Y such that y ≺ x.
For an archive X ⊂ Q, the GD p,q , IGD p,q , and ∆ p,q quality (or performance) indicators assigned to it will be defined as the distance of its image F(X) to the Pareto front F(P), i.e., I GD p,q (X) := GD p,q (F(X), F(P)), I IGD p,q (X) := IGD p,q (F(X), F(P)), and I ∆ p,q (X) := ∆ p,q (F(X), F(P)).
In this section, we study the behavior of I GD p,q , I IGD p,q , and I ∆ p,q as performance indicators. An example of a weakly Pareto-compliant performance indicator is the Degree of Approximation (DOA; see Reference [10]).
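As a concrete illustration of how these indicators operate, consider again the Schaffer problem of Equation (1): approximating the Pareto front by a fine finite sample, I ∆ p,q can be evaluated on archives given in decision space. This is a sketch with helper names of our own, in which the dense front sample is only a stand-in for F(P):

```python
import numpy as np

F = lambda x: np.column_stack([x**2, (x - 2.0)**2])   # objectives of Equation (1)
front = F(np.linspace(0.0, 2.0, 400))                  # dense stand-in for F(P)

def gd_pq(U, V, p, q):
    # Finite-case GD_{p,q}(U, V) with the counting measure.
    d = np.linalg.norm(U[:, None, :] - V[None, :, :], axis=-1)
    return float(np.mean(np.mean(d ** q, axis=1) ** (p / q)) ** (1.0 / p))

def indicator_delta(archive, p, q):
    # I_Delta_{p,q}(A) = Delta_{p,q}(F(A), F(P)), disjoint-case form.
    img = F(archive)
    return max(gd_pq(img, front, p, q), gd_pq(front, img, p, q))

good = np.linspace(0.05, 1.95, 10)    # archive lying on the Pareto set
bad = good + 3.0                      # dominated archive, away from [0, 2]
```

With a negative q (so that the inner average emphasizes nearest points), the archive on the Pareto set obtains a much better (smaller) indicator value than the dominated one, as expected.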

Pareto Compliance of (p, q)-Indicators in the Finite Case
In order to obtain general conclusions on the features of the averaged Hausdorff distance ∆ p,q as a quality indicator, we consider first the behavior of GD p,q . For additional details on the material presented in this section and other related results in the context of the p-averaged Hausdorff Distance ∆ p , the reader is referred to Reference [28].
For the following statements, we will abbreviate δ q (a, B) := ( (1/|B|) ∑ b∈B d(a, b) q ) 1/q . Clearly, with this notation, I GD p,q (A) = ( (1/|A|) ∑ a∈A δ q (F(a), F(P)) p ) 1/p , where, in the averaged sum, we are labeling the points in F(A) by the elements of the archive A, taking advantage of the fact that |A| = |F(A)|, as it also will be done with all the averages in this section.
Theorem 5. Let A, B ⊂ Q be finite archives with A strictly well dominated by B, and suppose that, for all a ∈ A and b ∈ B, b ⪯ a implies δ q (F(b), F(P)) ≤ δ q (F(a), F(P)); then, I GD p,q (B) < I GD p,q (A).
Proof. By condition 1, for all a ∈ A and b ∈ B a , the inequality δ q (F(b), F(P)) p ≤ δ q (F(a), F(P)) p holds true. After averaging over all b ∈ B a at both sides, we have (1/|B a |) ∑ b∈B a δ q (F(b), F(P)) p ≤ δ q (F(a), F(P)) p , and averaging once again over all a ∈ A produces

(1/|A|) ∑ a∈A (1/|B a |) ∑ b∈B a δ q (F(b), F(P)) p ≤ I GD p,q (A) p .    (9)

From property 2 and noticing that each b ∈ B appears |A b | times in the initial sum, the LHS can be bounded from below by I GD p,q (B) p . Returning to Equation (9), we conclude that I GD p,q (B) ≤ I GD p,q (A). Lastly, part 3 of Definition 8 for strictly well-dominated sets guarantees that this is a strict inequality, proving the assertion.

For the inverted generational distance IGD p,q in the finite case, we provide here two useful results without explicit proofs. The necessary steps are similar to the arguments used to prove the analogous statements for IGD p in Prop. 3.8 and Thm. 3.9 of Reference [28]. Those statements correspond here to the limit q → −∞, and the main difference in the proofs is that the Euclidean distance d(a, B) between a point and a set needs to be changed everywhere by the q-average δ q (a, B), as it was done above for the proof of Theorem 5 that generalizes the proof of Thm. 3.4 in Reference [28]. The reader can also find there additional remarks on similar hypotheses to the ones needed for Theorem 6 below.

Proposition 3. Let A, B ⊂ Q be finite and strictly well-dominated archives with B ⪯ A such that, for all a ∈ A, b ∈ B, and x ∈ P, b ≺ a implies δ q (F(b), F(x)) < δ q (F(a), F(x)); then, I IGD p,q (B) < I IGD p,q (A).

Figure 4. Two situations where IGD p,q (B) is better (smaller) than IGD p,q (A) for sufficiently negative q: Here, the hypotheses of Proposition 3 hold true.

Theorem 6. Let A, B ⊂ Q be finite and strictly well-dominated archives such that B ⪯ A. If at least one of the following conditions is satisfied, 1. ∀a ∈ A, ∀b ∈ B: b ≺ a implies IGD p,q (F(b), F(P B )) < IGD p,q (F(a), F(P B )); 2. ∃a 0 ∈ A such that ∀x ∈ P B : a 0 ∈ arg min a∈A δ q (F(x), F(a)); 3. ∀x ∈ P B : δ q (F(A), F(x)) = δ A ; then, I IGD p,q (B) < I IGD p,q (A).

Finally, a general statement on the Pareto compliance of the finite case of the (p, q)-averaged Hausdorff distance ∆ p,q follows as a consequence of Theorems 5 and 6.

Theorem 7. Let A, B ⊂ Q be finite and well-dominated archives such that B ⪯ A. If, for all a ∈ A and b ∈ B, b ⪯ a implies δ q (F(b), F(P)) ≤ δ q (F(a), F(P)), and at least one of the following conditions is satisfied: 1. ∀a ∈ A, ∀b ∈ B, ∀x ∈ P: b ≺ a implies δ q (F(b), F(x)) < δ q (F(a), F(x)); 2. ∃a 0 ∈ A such that ∀x ∈ P B : a 0 ∈ arg min a∈A δ q (F(x), F(a)); 3. ∀x ∈ P B : δ q (F(A), F(x)) = δ A ; then, I ∆ p,q (B) < I ∆ p,q (A).

Figure 5 illustrates four situations where Theorems 6 and 7 apply for q sufficiently negative. In the first row, the left diagram is a modification of the second case in Figure 4 where condition 1 holds. In the right diagram, the diamond lying at the lower left corner of F(A) represents the image F(a 0 ) of a point a 0 satisfying condition 2. Finally, both diagrams in the second row exhibit cases where the points of F(P) are equidistant to corresponding points in F(A), making condition 3 valid, with δ A being this distance.

Figure 5. Four examples where IGD p,q (B) is smaller (better) than IGD p,q (A) for sufficiently negative q: In each case, at least one of the requirements of Theorem 6 is satisfied.

Pareto Compliance of (p, q)-Indicators in the General Case
We consider now the behavior of the generalized GD p,q distance with respect to Pareto compliance, concentrating on its most important characteristics and using hypotheses similar to those needed in the previous section for ∆ p,q in the finite case.
Here, we will continue to assume that the decision space Q with objective function F : Q → R k defining the MOP under consideration has a Pareto set P ⊂ Q with corresponding Pareto front F(P) ⊂ F(Q). Also, we assume that the objective space F(Q) ⊂ R k carries a metric d that, for simplicity, can be taken to be the one inherited from the Euclidean distance d(·, ·) in R k . In addition, to define the (p, q)-indicators on MOPs that require general non-finite sets, we need a measure space (S, µ), that here will be taken to be S := F(Q) endowed with a non-null measure µ according to the comments at the beginning of Section 4.2. In this context X, Y ⊂ Q will denote arbitrary subsets such that F(X), F(Y) ⊂ F(Q) are measurable with non-null and finite measures.

Remark 5.
Recall that, here, |F(X)| = µ(F(X)) denotes the measure of F(X) ⊂ S. In this context, Q will not be asked to carry a measure, and the notation |X| will have no a priori meaning for X ⊂ Q. Nevertheless, it is possible to induce a measure on those subsets of Q where F is bijective by taking the pullback µ * of µ to them, making the identity |X| = µ * (X) := µ(F(X)) = |F(X)| trivially true. This can be done for all archives but not for subsets where F is not bijective. When Q itself carries a measure, a push-forward measure can always be defined on its image F(Q), making this identity true for all sets. This was implicitly assumed in the presentation provided in Reference [18] (Section 3.4). For clarity, we avoid this identification here and state everything under the assumption that the measure µ is defined only on S := F(Q).
Before stating the complete result, let us recall that a partition of a set X is a collection of disjoint and non-empty subsets of X whose union is the whole of X and that a partition of an archive X ⊂ Q induces a partition of F(X) ⊂ F(Q) by the bijectivity of F restricted to X. For convenience, we abbreviate the measure-theoretic q-averaged distance from a point F(x) ∈ F(Q) to a set F(Z) ⊂ F(Q) by δ q (F(x), F(Z)).

Theorem 8. For p, q ∈ R, let X, Y ⊂ Q denote archives of which the images F(X) and F(Y) are of non-null finite measure in F(Q). Moreover, assume that
1. there exist finite partitions X = X 1 ∪ · · · ∪ X m and Y = Y 1 ∪ · · · ∪ Y m such that ∀i ∈ {1, . . . , m}:
(a) F(X i ) ⊂ F(X) and F(Y i ) ⊂ F(Y) are subsets of non-null finite measure in F(Q);
(b) ∀x ∈ X i , ∀y ∈ Y i : x ⪯ y;
2. ∀x ∈ X, ∀y ∈ Y: x ⪯ y implies δ q (F(x), F(P)) ≤ δ q (F(y), F(P));
then, I GD p,q (X) ≤ I GD p,q (Y).
Proof. By 1(a) of Theorem 8, the sets X and Y can be subdivided into the same number m of subsets, and by 1(b), if x ∈ X i and y ∈ Y i for any i ∈ {1, . . . , m}, then δ q (F(x), F(P)) ≤ δ q (F(y), F(P)). Therefore, we can take successive integral p-averages over F(X i ) and, afterwards, over F(Y i ) on both sides of this inequality to find that the corresponding averaged inequality holds for each i. For those i ∈ {1, . . . , m} violating this inequality, we subdivide X i into a sufficiently large partition of m i subsets X i,1 , X i,2 , . . . , X i,m i , with images by F of non-null finite measure, so as to guarantee the required inequality for all j ∈ {1, . . . , m i }. Notice that this is indeed possible because each F(X i ) is of non-null finite measure. Since ∀x ∈ X i,j , ∀y ∈ Y i we have x ⪯ y, an inequality similar to Equation (10) also holds for them. Therefore, with the notation of Equation (11), a simple calculation shows that the double sum of the w i,j over i and j equals ∑ i w i = 1, implying that the w i,j and w i are normalized weights suitable for weighted averages. Since 0 ≤ a i,j ≤ b i and 0 ≤ w i,j ≤ w i ≤ 1, simple properties of the discrete weighted power mean imply the desired inequality I GD p,q (X) ≤ I GD p,q (Y).

Remark 6. From condition 1 of Theorem 8, there follow the simpler (and somewhat weaker) dominance conditions: (a′) X ⪯ Y (i.e., ∀y ∈ Y, ∃x ∈ X such that x ⪯ y), and (b′) ∀x ∈ X, ∃y ∈ Y such that x ⪯ y.
For simple situations where (a′) and (b′) are valid, the partitions needed for part 1 of Theorem 8 are not difficult to find; however, this is not always possible, as the right side of Figure 6 indicates. Indeed, Figure 6 presents some examples where (a′) and (b′) hold true but I GD p,q (X) ≤ I GD p,q (Y) can be both true (left side) and false (right side). Furthermore, it is possible to show that X and Y comply (left side) and do not comply (right side) with condition 1 of Theorem 8, respectively.

Remark 7.
An important advantage of using GD p,q over GD p is that condition 2 of Theorem 8 provides the possibility of choosing an appropriate q ∈ R for which the condition δ q (F(x), F(P)) ≤ δ q (F(y), F(P)) holds when x ⪯ y, ensuring in this way the compliance to Pareto optimality for GD p,q . This freedom is lacking for GD p because, in the limit q → −∞, the distance δ q (F(x), F(P)) becomes the standard distance d(F(x), F(P)), which does not allow for any choice.
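The role of q described in Remark 7 can be seen numerically: δ q is a power mean of distances and is therefore non-decreasing in q, interpolating between the minimum distance (q → −∞) and the maximum distance (q → +∞). The short sketch below (our own toy data, not from the paper) prints δ q for a fixed point against a small front sample as q varies.

```python
import math

def delta_q(point, S, q):
    """q-power mean of the Euclidean distances from `point` to the set S."""
    dists = [math.dist(point, s) for s in S]
    return (sum(d ** q for d in dists) / len(dists)) ** (1.0 / q)

P = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]   # toy Pareto-front sample
x = (1.0, 1.0)
for q in (-50, -2, 1, 2, 50):
    print(q, round(delta_q(x, P, q), 4))   # non-decreasing in q
```

For very negative q, the value is close to the ordinary point-to-set distance d(x, P); increasing q shifts the weight toward the farther front points, which is the tuning freedom that GD p lacks.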

Examples and Numerical Experiments
In this section, we present some numerical experiments involving finite sets first, and afterwards, we study the case of continuous sets.

Working with ∆ p,q over Finite Sets
Let us take a hypothetical Pareto front P given by the line segment from (0, 1) to (1, 0) in R 2 , i.e., the set of all points (t, 1 − t) ∈ R 2 for 0 ≤ t ≤ 1. This is the same example considered in Reference [16] p. 506 and enables a comparison with values of ∆ p . In order to use the finite version of ∆ p,q , we discretize P by taking 11 uniformly distributed points; we call this set P′. We consider two archives: X 1 is obtained from P′ by replacing (0, 1) with (0, 10), thus including an outlier, and by adding 1/10 to the remaining ordinates; X 2 is obtained from P′ by adding 5 to each ordinate. See Figure 7.

Figure 7. A hypothetical Pareto front discretization P′ (black circles) and two different archives: X 1 (blue dots) and X 2 (orange squares).
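The construction above is easy to replicate. The sketch below (our own helper code; the exact numbers depend on the discretization, so only the orderings matter) builds P′, X 1 , and X 2 and evaluates the finite ∆ p,q for two parameter choices, illustrating that small p does not heavily punish the outlier while large p reverses the ranking of the two archives.

```python
import math

def delta_q(a, S, q):
    ds = [math.dist(a, s) for s in S]
    return (sum(d ** q for d in ds) / len(ds)) ** (1.0 / q)

def avg_hausdorff(X, Y, p, q):
    """Finite Delta_{p,q}: max of the two (p,q)-averaged one-sided terms."""
    gd  = (sum(delta_q(x, Y, q) ** p for x in X) / len(X)) ** (1.0 / p)
    igd = (sum(delta_q(y, X, q) ** p for y in Y) / len(Y)) ** (1.0 / p)
    return max(gd, igd)

P = [(i / 10, 1 - i / 10) for i in range(11)]          # discretized front P'
X1 = [(0.0, 10.0)] + [(t, y + 0.1) for t, y in P[1:]]  # archive with one outlier
X2 = [(t, y + 5.0) for t, y in P]                      # uniformly shifted archive

print(avg_hausdorff(P, X1, 1, 1), avg_hausdorff(P, X2, 1, 1))  # X1 ranked closer
print(avg_hausdorff(P, X1, 5, 1), avg_hausdorff(P, X2, 5, 1))  # ordering flips
```

With p = q = 1 the single outlier in X 1 is averaged away and X 1 is correctly judged closer to P′ than X 2 ; with p = 5 the outlier dominates and the ranking reverses, matching the discussion of Tables 1 and 2.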
As explained in Section 3, the limiting case of ∆ p,q coincides with the standard Hausdorff distance d H , and according to Theorem 4 and Reference [16] p. 512, the computed values must increase as p increases. Tables 1 and 2 show that we can find values of p and q such that the (p, q)-averaged distance does not heavily punish the outlier, for example, p = q = 1 or p = 1 and q = −1. We remark that the values of ∆ p,q (P′, X 1 ) do not present a significant change under variations of q ≤ 1 for a fixed p. Thus, it is possible to work with q = 1, in which case ∆ p,q is a metric according to Corollary 1, and still obtain values close to the ones given by the inframetric ∆ p with the same p ≥ 1. For large values of p, the behavior of ∆ p,q presents the same disadvantages as ∆ p or the standard Hausdorff distance. For example, in Tables 1 and 2, it can be observed that all distances for p ≥ 5 are useless because they imply that the distance from the discrete Pareto front P′ to the archive X 1 is larger than its distance to the archive X 2 ; Figure 7 suggests that this is an undesirable outcome. Table 3 shows that ∆ p,q is close to a metric when q ≥ −1 and p ≥ 1. The percentage of triangle inequality violations decreases as p increases or q decreases.

Figure 8. Optimal ∆ 1,−1 archive A with 10 elements (blue circles) for the connected Pareto front P 2 given by Equation (12); at the right are the respective archive coordinates and the ∆ 1,−1 distance.
To numerically find the optimal ∆ p,q archive of size M, we discretized the Pareto front with 1000 equidistant points (an acceptable discretization according to Reference [63] p. 603) and randomly chose an initial archive of size M. Then, we used a random-walk (hill-climbing) evolutionary algorithm, moving one point at a time. Finally, we refined the optimal archive with the "evenly spaced" construction suggested by Reference [63] p. 607.
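The random-walk procedure just described can be sketched as follows. This is our own minimal reconstruction, not the authors' implementation: the step size, iteration budget, and acceptance rule are illustrative assumptions, and the front is discretized coarsely to keep the sketch fast.

```python
import math, random

def delta_q(a, S, q):
    ds = [math.dist(a, s) for s in S]
    return (sum(d ** q for d in ds) / len(ds)) ** (1.0 / q)

def avg_hausdorff(X, Y, p, q):
    gd  = (sum(delta_q(x, Y, q) ** p for x in X) / len(X)) ** (1.0 / p)
    igd = (sum(delta_q(y, X, q) ** p for y in Y) / len(Y)) ** (1.0 / p)
    return max(gd, igd)

def optimize_archive(front, M, p, q, iters=1500, step=0.05, seed=1):
    """Hill climber: perturb one archive point at a time, keep improving moves."""
    rng = random.Random(seed)
    archive = [list(rng.choice(front)) for _ in range(M)]
    best = avg_hausdorff([tuple(a) for a in archive], front, p, q)
    for _ in range(iters):
        i = rng.randrange(M)                      # move one point at a time
        old = archive[i][:]
        archive[i] = [c + rng.uniform(-step, step) for c in old]
        val = avg_hausdorff([tuple(a) for a in archive], front, p, q)
        if val < best:
            best = val                            # accept improving move
        else:
            archive[i] = old                      # otherwise revert
    return [tuple(a) for a in archive], best

front = [(i / 20, 1 - i / 20) for i in range(21)]  # coarse front discretization
A, val = optimize_archive(front, M=3, p=1, q=1)
```

A final "evenly spaced" refinement pass, as in Reference [63], would then redistribute the surviving points along the front; that step is omitted here.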
When finding optimal ∆ p,q archives, our numerical experiments suggest a clear geometrical influence of the parameters p and q. For values of p in (−∞, −1), the optimal archive sets are basically the same. As q ∈ [−1, 1] increases, the optimal archive tends to lose dispersion, converging to one point. When q ≥ 1, the optimal archive collapses to one point, and when q ∈ (−∞, −1], the corresponding optimal archives are basically the same (see Figure 10). As p ≥ −1 increases, the optimal archive moves away from the Pareto set (see Figure 11).
Figures 10 and 11 show certain "optimal" archives A for the Pareto front P 1 in Equation (12), where optimality means that the distance ∆ p,q (X, P 1 ) is minimal when X = A. Because of the choice of the parameters p and q, this solution is clearly different from the one shown in Figure 8.

Figure 11. Optimal ∆ p,−1 one-point archives A for the connected Pareto front P 1 given by Equation (12) with q = −1 and different values of p: In all cases, the archives lie on the line x = y.

Optimal Archives for Disconnected and Discretized Pareto Sets
In this section, we present the optimal ∆ p,q archives for a disconnected step Pareto front, where s is the number of steps, γ > 0 is a small constant responsible for the steps' twist, and ⌊·⌋ stands for the integer part function. Figure 12 shows numerical optimal ∆ 1,−1 archives of size 20. The archive coordinates reveal that the optimal archive points do not lie on the Pareto front, but they are so close to it that this is hardly noticeable. It is also evident that the archives are evenly distributed along the Pareto front.

General Example for Continuous Sets
In this first example, we construct simple and illustrative continuous sets A and B. Let A be the straight segment in R 2 from a = (−1, 0) to b = (1, 0). For a small ε > 0 and a variable δ > 0, let B δ ⊂ R 2 be the union of the straight segments joining, in this order, the points c = (−1, ε), d δ = (−δ, ε), e δ = (−δ, 1), f δ = (δ, 1), g δ = (δ, ε), and h = (1, ε). We can regard the set B δ as a continuous approximation of A, where the central segment e δ f δ can be seen as the outlier. Figure 13 shows the sets A and B δ for ε = 0.10 and δ = 0.10, 0.20. According to Table 4, as δ decreases, the ∆ p,q distance between the approximation B δ and the set A also decreases.
We remark that the classical Hausdorff distance between A and B δ produces the value 1 for any δ > 0. Thus, by working with the (p, q)-distance instead of d H , we can detect "better" approximations.

Figure 13. The red segment is the set A from Equation (14), and the blue piecewise map is the respective approximation given by the set B δ from Equation (15) for two values of δ and ε = 0.10.

Table 4. ∆ p,q results between the sets A and B δ in Equations (14) and (15) for ε = 0.10 and some parameter values of p, q, and δ.
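The contrast between d H and the averaged distance is easy to reproduce by discretizing both sets, sampling each segment of B δ proportionally to its length (mimicking the length measure of the continuous definition). The sketch below is our own approximation: the sampling density and the choice p = q = 1 are illustrative, and the printed values only approximate the continuous ones.

```python
import math

def seg(a, b, n):
    """n points uniformly on the segment from a to b (endpoints included)."""
    return [(a[0] + (b[0] - a[0]) * t / (n - 1),
             a[1] + (b[1] - a[1]) * t / (n - 1)) for t in range(n)]

def sample_B(delta, eps, density=50):
    """Sample B_delta, giving each segment a point count proportional to length."""
    c, d, e = (-1, eps), (-delta, eps), (-delta, 1)
    f, g, h = (delta, 1), (delta, eps), (1, eps)
    pts = []
    for u, v in ((c, d), (d, e), (e, f), (f, g), (g, h)):
        n = max(2, round(math.dist(u, v) * density))
        pts += seg(u, v, n)
    return pts

def mean_dist(a, S):               # delta_1: plain average distance (q = 1)
    return sum(math.dist(a, s) for s in S) / len(S)

def delta_11(X, Y):                # Delta_{1,1}: max of the two averaged terms
    gd  = sum(mean_dist(x, Y) for x in X) / len(X)
    igd = sum(mean_dist(y, X) for y in Y) / len(Y)
    return max(gd, igd)

def hausdorff(X, Y):
    dxy = max(min(math.dist(x, y) for y in Y) for x in X)
    dyx = max(min(math.dist(x, y) for x in X) for y in Y)
    return max(dxy, dyx)

A = seg((-1, 0), (1, 0), 100)
for delta in (0.20, 0.05):
    B = sample_B(delta, 0.10)
    print(delta, round(hausdorff(A, B), 3), round(delta_11(A, B), 3))
```

The Hausdorff value stays pinned near 1 (the height of the outlier segment) for every δ, while the averaged distance shrinks as δ does, which is exactly the behavior reported in Table 4.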

Approximating Pareto Set and Front of a MOP
Finally, we address the problem of approximating the Pareto sets and fronts of given MOPs. As an example, we consider the bi-objective Lamé super-sphere problem [32], defined via an objective map F(x) = ( f 1 (x), f 2 (x)), where x ∈ R n and γ ∈ R. For n = 2, the Pareto sets and fronts of this problem are shown in Figures 14 and 15 for γ = 2 and γ = 1/2, respectively.

Figure 15. The same as in Figure 14 but for γ = 1/2.
In a first step, we discuss the principal difference between discrete and continuous archives when approximating the Pareto set/front on a hypothetical example. For this, we assume that we are given a 5-element archive A = {x 1 , . . . , x 5 } ⊂ R 2 . Hence, we can see A as a 5-element approximation of the Pareto set and its image F(A) as a 5-element approximation of the Pareto front. Now, instead of A, we may use the polygonal curve B defined by connecting the elements of A. In the following, we will call A a discrete archive, while we call the polygonal approximation B the continuous archive. Figures 16 and 17 show the approximations A and B as well as their images F(A) and F(B). Apparently, the approximation qualities are much better for the linear interpolates. This impression is confirmed by the values of ∆ p,q for this problem shown in Table 5. We can observe the following two behaviors: (i) the distances are much better for the continuous archives, and the differences are even larger in objective space, and (ii) the distances decrease with decreasing q (which is in accordance with the result of Theorem 4).

Figure 17. The same as in Figure 16 but for γ = 1/2.

In a next step, we consider discrete archives that have been generated by multi-objective evolutionary algorithms together with their resulting continuous archives. As multi-objective evolutionary algorithms, we have chosen the widely used methods NSGA-II [65] and MOEA/D [66]. We stress, however, that any other MOEA could be chosen and that the conclusions we draw from our results apply in principle to any other such algorithm. Table 6 shows the parameter setting we have used for our studies. Figures 18 and 19 and Table 7 show the results of NSGA-II, where we have used 500 generations and a population size of 12. Figures 20 and 21 and Table 8 show the respective results for MOEA/D, where we have also used 500 generations and population size 12.
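The advantage of the continuous archive can be sketched concretely. Below, the 5-element archive and the linear Pareto front are our own hypothetical stand-ins (the paper's archive coordinates are not reproduced here): the archive points sit slightly off a unit-segment front, and the continuous archive is obtained by densely sampling the polygonal curve through them. A negative q is used so that δ q behaves like a near-minimum distance, which is where interpolation pays off.

```python
import math

def delta_q(a, S, q):
    ds = [math.dist(a, s) for s in S]
    return (sum(d ** q for d in ds) / len(ds)) ** (1.0 / q)

def avg_hausdorff(X, Y, p, q):
    gd  = (sum(delta_q(x, Y, q) ** p for x in X) / len(X)) ** (1.0 / p)
    igd = (sum(delta_q(y, X, q) ** p for y in Y) / len(Y)) ** (1.0 / p)
    return max(gd, igd)

def densify(polyline, per_edge=20):
    """Sample the polygonal curve through the archive points (the 'continuous' archive)."""
    out = []
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        out += [(x0 + (x1 - x0) * t / per_edge,
                 y0 + (y1 - y0) * t / per_edge) for t in range(per_edge)]
    return out + [polyline[-1]]

front = [(t / 100, 1 - t / 100) for t in range(101)]  # discretized front
A5 = [(t / 4, 1 - t / 4 + 0.02) for t in range(5)]    # hypothetical discrete archive
B = densify(A5)                                       # its continuous counterpart

print(avg_hausdorff(front, A5, 1, -2), avg_hausdorff(front, B, 1, -2))
```

The interpolated archive fills the gaps between the five points, so the inverted (front-to-archive) term drops sharply and the ∆ p,q value improves, mirroring the effect seen in Table 5 for the Lamé problem.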
We can see that, for both algorithms, the ∆ p,q values are significantly better for the continuous archives. We can also make another observation: the ∆ p,q values oscillate for the results of the dominance-based algorithm NSGA-II, which is indeed typical. For the continuous archives, these oscillations are less pronounced, which indicates that the use of continuous archives may have a smoothing effect on the approximations, which is highly desirable.

Figure 19. The same as in Figure 18 but for γ = 1/2.

Figure 21. The same as in Figure 20 but for γ = 1/2.

We want to investigate the last statement further. To this end, we consider the convex bi-objective problem given by Equation (17). Figure 22 shows the Pareto set and front of MOP (Equation (17)). The Pareto set is given by the straight segment joining (0, 0, 0) and (1, 0, 0). The values of ∆ p,q obtained by NSGA-II for the discrete archives (using population size 20) as well as for the respective continuous archives can be seen in Figure 23 and Table 9. Also for this example, the values for the continuous archives are much better, and the oscillations are significantly reduced compared to the discrete archives.

Figure 23. The black curve is the ∆ p,q value for the discrete approximation, and the blue one is the respective curve for the continuous approximation of NSGA-II for MOP (Equation (17)).

The remaining figures show the results of both kinds of archives after 300, 400, and 500 generations, which confirm this observation. The results show that NSGA-II is indeed capable of computing points near the Pareto front, while the distribution of the points varies. This is a known fact since there exists no "limit archive" for this algorithm (as it is, e.g., not based on the averaged Hausdorff distance or any other performance indicator). When considering the respective results of the continuous archives, however, NSGA-II computes (at least visually) nearly perfect approximations of the Pareto front.
The ∆ p,q values reflect this.

Table 9. ∆ p,q results between the Pareto front and its respective discrete and continuous approximations of NSGA-II for MOP (Equation (17)): The data shown are averaged over the 20 independent runs mentioned above.

Conclusions and Perspectives
In this paper, we have presented a comprehensive overview of the averaged Hausdorff distances that have recently appeared in connection with the study of MOPs.
Among the averaged Hausdorff distances studied here, the generalized ∆ p,q as defined for arbitrary measurable sets was shown to provide a general and robust definition for applications that carries good metric properties, is adequate for use with continuous approximations of the Pareto set of a MOP, and even reduces to the previously introduced definition for discrete approximations.
Concerning the appearance of the additional parameter q in the definition of ∆ p,q , which could give the impression of an overly complicated expression, it is important to highlight, as observed in Remark 7, that it provides the possibility to choose a suitable value of q in order to make GD p,q as Pareto compliant as possible for the MOP under consideration. This is an argument in favor of the flexibility provided by the generalized version GD p,q , which is not available for the GD p distance, and this particular aspect warrants further investigation.
Nevertheless, since the freedom provided by the two parameters p and q may appear excessive and perhaps undesirable in many applications, it remains to find a practical recipe to determine and fix these parameters according to the characteristics of the problem under consideration. Certainly, the desired spread of the optimal archives, the distance of an approximation to the Pareto front, and the convexity of these fronts need to be taken into account in order to determine an appropriate set of preferred values for these parameters depending on the situation.
To achieve these aims, more theoretical as well as numerical studies of optimal solutions associated with Pareto fronts of different convexities must be carried out, and experiments evaluating how Pareto compliance can be enhanced in each situation by the choice of parameters need to be performed.
Finally, we stress that the results presented in Section 7 demonstrate the advantage of a performance indicator that is able to assess the quality of a continuous approximation of the solution set. Continuous approximations, e.g., of the Pareto set/front of multi-objective optimization problems have not been considered so far, even though both the Pareto set and front typically form continuous sets when the objectives are continuous. The examples have indicated that the consideration of continuous archives (via interpolation of the populations generated by the evolutionary algorithms) could allow a reduction in population size and, hence, a significant reduction of the computational effort of the evolutionary algorithms. This is because the time complexity of all existing multi-objective evolutionary algorithms is quadratic in the population size in each generation. To verify this statement, more computations are needed, which is left for future work.