A (p, q)-Averaged Hausdorff Distance for Arbitrary Measurable Sets



Introduction
The Hausdorff distance $d_H$ (see [1]) is an established and widely used tool to measure the proximity of different sets. It is used in several research fields, among them image matching (e.g., [2][3][4]), the approximation of manifolds in dynamical systems ([5][6][7]), fractal geometry ([8]), and convergence analysis in multi-objective optimization ([9][10][11][12][13]). One major reason for the use of $d_H$ is that it defines a metric on the set of all nonempty bounded closed sets in a metric space. However, the Hausdorff distance heavily punishes single outliers, which is a severe drawback in many cases. For instance, stochastic search algorithms are known to be generally quite effective in the (global) approximation of certain objects, but their approximations may come with a few outliers (e.g., [14]). In those cases, the approximation quality is not reflected by the value of the Hausdorff distance.
As a remedy, in the context of evolutionary multi-objective optimization, Schütze et al. [14] made a first effort by proposing the averaged Hausdorff distance $\Delta_p$. As opposed to $d_H$, this indicator averages the distances involved in the proximity measure of the given sets and is hence much more suitable in the context of stochastic search, since single (or few) outliers in a candidate solution set are no longer punished hard. On the other hand, compared to $d_H$, $\Delta_p$ has two shortcomings: (i) it only defines an inframetric instead of a metric; and (ii) it is only defined for finite approximations of the solution set. In the particular context of continuous multi-objective optimization, it is known that the solution set, the so-called Pareto set, and its image, the Pareto front, form manifolds of certain dimensions. Hence, it is natural that the candidate solution set (i.e., the set computed by a given solver) is not restricted to finitely many points, but may also form a continuous set. This is in fact already the case for set-based optimization techniques such as cell-to-cell mappings ([15][16][17]) and subdivision techniques ([10,18,19]). In evolutionary multi-objective optimization, typically a finite set of candidate solutions (a population) is generated ([20][21][22][23]). However, here too it is rather natural to construct a continuous set out of the final population using, e.g., interpolation techniques (see [24,25]).
In [26], a modification of the $\Delta_p$ indicator called the (p, q)-averaged Hausdorff distance $\Delta_{p,q}$ has been introduced by the first two authors. This indicator generalizes the averaged Hausdorff distance $\Delta_p$, is strongly related to the Hausdorff distance $d_H$, and admits an expression in terms of the matrix norm $\|\cdot\|_{p,q}$. Moreover, when $1 \le p, q < \infty$ it is a proper metric, while for the remaining cases where $|p|, |q| \ge 1$ it is still an inframetric. In addition, when finding optimal archives the parameters p and q play crucial geometrical roles. More precisely, in the context of evolutionary multi-objective optimization (EMO), p handles the closeness to the Pareto front and q handles the dispersion. The indicator, however, is restricted to finite sets.
In this work, we propose a more general version of the $\Delta_{p,q}$ indicator that can be applied to general measurable subsets and that preserves the useful advantages of the finite case. Consideration is also given to the Pareto-compliance of an intermediate indicator $GD_{p,q}$ that is employed to define $\Delta_{p,q}$. The indicator is hence the first one that can be used in the context of multi-objective optimization using continuous approximations of the Pareto set/front as described above. Numerical results on two well-known evolutionary algorithms will show the benefit of such continuous archives compared to discrete ones, which have been used so far for lack of a suitable performance indicator.
This paper is organized as follows: In Section 2, we briefly state the background required for the understanding of this work. In Section 3, we introduce the extended version of the $GD_{p,q}$ and $\Delta_{p,q}$ indicators, discussing their properties and providing some sufficient criteria for the Pareto compliance of the first one. In Section 4, we present some numerical results that show the applicability and the benefit of the novel indicator, in particular in the context of multi-objective optimization. Finally, we draw our conclusions and present possible paths for future research in Section 5.

Preliminaries
In this section, we briefly present the required background on integral power means and multi-objective optimization that will be needed for our purposes. Throughout the document we employ the notation $\mathbb{R}^{\times} := \mathbb{R} \setminus \{0\}$ and $\overline{\mathbb{R}} := [-\infty, \infty]$ for simplicity.

Integral Power Means
The theory can be presented in the general setting of metric measure spaces, briefly outlined below, but for simplicity the reader may assume that the specific context of our interest is that of well-behaved bounded subsets of the n-dimensional Euclidean space $\mathbb{R}^n$ endowed with its standard Lebesgue measure, which gives rise to the conventional notion of volume (when it is defined). For a quick review of measure spaces see [27] (Section 1.4), and for a simple explanation of the Lebesgue measure see [28] (Chapter 2). Integral means appear already in [29] (Chapter 6). A comprehensive account of the properties of means can be found in [30].
For greater generality, we recall that $(\Sigma, d, \mu)$ is called a metric measure space if $(\Sigma, d)$ is a metric space with a measure $\mu$ defined on its Borel σ-algebra $\mathcal{M}(\Sigma)$, i.e., the smallest σ-algebra containing all the open subsets of the metric topology of $(\Sigma, d)$. A measure $\mu$ is said to be finite if $\mu(\Sigma) < \infty$, and in this case $\Sigma$ is called a finite-measure space. Now, given $p \in \mathbb{R}^{\times}$ and any measurable function $f : X \subset \Sigma \to [0, \infty)$ over a finite-measure set X, we can define the p-average of f over X (or the p-power mean of f over X) by
$$M_p(f(X)) := \left( \frac{1}{\mu(X)} \int_X f^p \, d\mu \right)^{1/p}. \qquad (1)$$
Henceforth, the integral at the RHS will be abbreviated as
$$⨍_X f^p \, d\mu := \frac{1}{\mu(X)} \int_X f^p \, d\mu.$$
If necessary, when the measure $\mu$ is clear from the context, the element $d\mu$ will be written as $dx$ to emphasize the variable of integration x. In addition, the notation $M_p(f(X)) \equiv M_p^{x \in X}(f(x))$ and $|X| \equiv \mu(X)$ will also be employed to simplify expressions whenever appropriate.
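Let us note that for $p \ge 1$ we have $M_p(f(X)) = \mu(X)^{-1/p} \|f\|_p$, where $\|\cdot\|_p$ denotes the p-norm of the Lebesgue space $L^p(X, \mu)$. Furthermore, it is not difficult to show, with the aid of L'Hôpital's rule, that the integral power mean $M_p$ can be extended to the cases $p = \pm\infty$. Indeed, if $f \not\equiv 0$, denoting the essential supremum of f on X by $\|f\|_\infty := \operatorname{ess\,sup}_{x \in X} f(x)$ and its essential infimum by $\operatorname{ess\,inf}_{x \in X} f(x)$, we have
$$M_\infty(f(X)) := \lim_{p \to \infty} M_p(f(X)) = \|f\|_\infty \lim_{p \to \infty} \left( \frac{1}{|X|} \int_X \left( \frac{f}{\|f\|_\infty} \right)^p d\mu \right)^{1/p} = \|f\|_\infty,$$
because the last integrand is at most 1 and the limit is 1. Similarly,
$$M_{-\infty}(f(X)) := \lim_{p \to -\infty} M_p(f(X)) = \operatorname{ess\,inf}_{x \in X} f(x).$$
We recall that $\|\cdot\|_\infty$ is precisely the norm of the Lebesgue space $L^\infty(X, \mu)$. We can also define $M_p$ when $p = 0$ as follows:
$$M_0(f(X)) := \lim_{p \to 0} M_p(f(X)) = \exp\left( \frac{1}{|X|} \int_X \log f \, d\mu \right).$$
It can be considered the integral generalization of the notion of geometric mean for finitely many elements.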
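As a small numerical illustration of these means, the following sketch evaluates a discrete stand-in for $M_p$ on finitely many sample values with normalized weights, including the limit cases $p = 0$ and $p = \pm\infty$. The function name `power_mean` and the convention of returning 0 when a sample vanishes and $p \le 0$ are assumptions made for this example only.

```python
import numpy as np

def power_mean(values, p, weights=None):
    """Weighted p-power mean of non-negative samples (discrete stand-in for M_p)."""
    v = np.asarray(values, dtype=float)
    if weights is None:
        w = np.full(v.shape, 1.0 / v.size)      # uniform (counting) measure
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                         # normalize to total mass 1
    if p == np.inf:
        return float(v.max())                   # limit p -> +inf: largest sample
    if p == -np.inf:
        return float(v.min())                   # limit p -> -inf: smallest sample
    if p <= 0 and np.any(v == 0):
        return 0.0                              # assumed convention for vanishing samples
    if p == 0:
        return float(np.exp(np.sum(w * np.log(v))))   # geometric mean
    return float(np.sum(w * v**p) ** (1.0 / p))

if __name__ == "__main__":
    samples = [1.0, 2.0, 4.0]
    for p in (-np.inf, -1, 0, 1, 2, np.inf):
        print(p, power_mean(samples, p))
```

The printed values increase with p, in line with the monotonicity of power means in their exponent.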

Multi-objective Optimization
As an application of the (p, q)-distances, we will consider in this work continuous multi-objective optimization problems (MOPs). Problems of this kind can be expressed mathematically as
$$\min_{x \in Q} F(x), \qquad (2)$$
where the function F is defined as a vector of objective functions
$$F : Q \to \mathbb{R}^k, \qquad F(x) = (f_1(x), \ldots, f_k(x)).$$
We will assume here that all objectives $f_i : Q \to \mathbb{R}$, for $i \in \{1, \ldots, k\}$, are continuous. The optimality of MOPs is typically defined via the concept of dominance (see [31]).
Definition 1. In the context of MOPs the following are standard notions:
(i) Let $v, w \in \mathbb{R}^k$. Then the vector v is less than w (denoted $v <_P w$) if $v_i < w_i$ for all $i \in \{1, \ldots, k\}$. The relation $\le_P$ is defined analogously.
(ii) A vector $y \in Q$ is dominated by a vector $x \in Q$ (in short: $x \prec y$) with respect to (2) if $F(x) \le_P F(y)$ and $F(x) \ne F(y)$, i.e., there exists a $j \in \{1, \ldots, k\}$ such that $f_j(x) < f_j(y)$.
(iii) A point $x \in Q$ is called Pareto optimal or a Pareto point if there is no $y \in Q$ which dominates x.
(iv) The set of all Pareto optimal solutions is called the Pareto set, denoted by $P_Q$.
(v) The image of the Pareto set, $F(P_Q)$, is called the Pareto front.
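The dominance relation of Definition 1 can be sketched for a finite set of objective vectors as follows. This is only an illustration for minimization problems; the helper names `dominates` and `pareto_filter` are hypothetical and do not come from the original text.

```python
import numpy as np

def dominates(fx, fy):
    """True if objective vector fx dominates fy: fx <= fy componentwise and fx != fy."""
    fx, fy = np.asarray(fx, dtype=float), np.asarray(fy, dtype=float)
    return bool(np.all(fx <= fy) and np.any(fx < fy))

def pareto_filter(F):
    """Indices of the non-dominated rows of F (one objective vector per row)."""
    F = np.asarray(F, dtype=float)
    n = len(F)
    return [i for i in range(n)
            if not any(dominates(F[j], F[i]) for j in range(n) if j != i)]

if __name__ == "__main__":
    F = [[1, 2], [2, 1], [2, 2], [3, 3]]
    print(pareto_filter(F))   # rows that no other row dominates
```

The quadratic pairwise comparison is chosen for readability; it is adequate for small archives such as the populations considered later.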
It is known that under certain mild smoothness assumptions the Pareto set and the Pareto front define (k − 1)-dimensional objects [32]. Hence, for set-oriented solvers such as cell mapping, subdivision techniques, and evolutionary algorithms, the question naturally arises as to how to measure the approximation quality of the obtained solution set with respect to the Pareto set/front. To accomplish this task, several performance indicators have been proposed in the specialized literature. There exist, for instance, the hypervolume indicator [21,33], the R2 indicator [34], the IGD+ [35], and the DOA [36]. Moreover, in the context of multi-criteria decision-making processes, the properties of some distance measures, such as the Hamming, Euclidean, and Hausdorff metrics, are investigated in [37,38]. In this work, we will focus on a new variant of the Hausdorff distance [6]. For the convenience of the reader, we recall in the following the most important definitions.
Definition 2. Let $u, v \in \mathbb{R}^n$, arbitrary $A, B \subset \mathbb{R}^n$, and $\|\cdot\|$ be a vector norm. The Hausdorff distance $d_H(\cdot, \cdot)$ is defined as follows:
(i) $\operatorname{dist}(u, A) := \inf_{v \in A} \|u - v\|$,
(ii) $\operatorname{dist}(B, A) := \sup_{u \in B} \operatorname{dist}(u, A)$,
(iii) $d_H(A, B) := \max\{\operatorname{dist}(A, B), \operatorname{dist}(B, A)\}$.
The Hausdorff distance $d_H$ is widely used in many fields. It is, however, of limited practical use when measuring the distance of the outcome of a stochastic search method such as an evolutionary algorithm to the Pareto set/front. The main reason for this is that evolutionary algorithms may generate outliers that are punished too strongly by $d_H$. As a remedy, the averaged Hausdorff distance has been proposed in [14]. In this study the vector norm is the 2-norm, i.e., the Euclidean norm. The indicator $\Delta_p$ can be viewed as a composition of slight variations of the Generational Distance (GD, see [39]) and the Inverted Generational Distance (IGD, see [40]). It holds that $\Delta_\infty = d_H$, but for finite values of p the indicator $\Delta_p$ averages the distances considered in $d_H$. More precisely, the larger the value of p, the harder single outliers will be punished by $\Delta_p$. Hence, as opposed to $d_H$, the distance $\Delta_p$ does not punish single (or few) outliers in a candidate set. For more discussion about $\Delta_p$ and its relation to other indicators we refer to [14,41].
Definition 4 ([26]). For $p, q \in \mathbb{R}^{\times}$ and finite sets $A, B \subset \mathbb{R}^n$, the value
$$\Delta_{p,q}(A, B) := \max\{GD_{p,q}(A, B \setminus A),\ GD_{p,q}(B, A \setminus B)\},$$
where
$$GD_{p,q}(A, B) := \left( \frac{1}{|A|} \sum_{a \in A} \left( \frac{1}{|B|} \sum_{b \in B} \|a - b\|^q \right)^{p/q} \right)^{1/p},$$
is called the (p, q)-averaged Hausdorff distance between A and B.
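For finite sets, the (p, q)-averaged Hausdorff distance can be sketched directly as a nested power mean of pairwise Euclidean distances. The code below is a minimal illustration for finite nonzero p and q; the function names are hypothetical, and the value 0 assigned when a set difference is empty is an assumed convention for this example.

```python
import numpy as np

def _mean(v, p, axis):
    """Unweighted p-power mean along one axis (finite nonzero p)."""
    return np.mean(v**p, axis=axis) ** (1.0 / p)

def gd_pq(A, B, p, q):
    """Generational (p, q)-distance between finite point sets (rows are points)."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)   # pairwise distances
    inner = _mean(D, q, axis=1)        # q-average over B for each point of A
    return float(_mean(inner, p, axis=0))

def set_minus(A, B):
    """Rows of A that do not occur in B."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    in_B = (A[:, None, :] == B[None, :, :]).all(axis=2).any(axis=1)
    return A[~in_B]

def delta_pq(A, B, p, q):
    """(p, q)-averaged Hausdorff distance; an empty difference contributes 0 (assumed)."""
    B_minus_A, A_minus_B = set_minus(B, A), set_minus(A, B)
    g1 = gd_pq(A, B_minus_A, p, q) if len(B_minus_A) else 0.0
    g2 = gd_pq(B, A_minus_B, p, q) if len(A_minus_B) else 0.0
    return max(g1, g2)

if __name__ == "__main__":
    A = [[0.0, 0.0], [1.0, 0.0]]
    B = [[0.0, 1.0], [1.0, 1.0]]
    print(delta_pq(A, B, p=1, q=1), delta_pq(A, B, p=1, q=-200))
```

For strongly negative q the inner average approaches the nearest-neighbor distance, so the second printed value is close to the corresponding $\Delta_p$ value of these two disjoint sets.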
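For finite sets, the indicator $\Delta_{p,q}$ introduced in [26] was also defined for $p = 0$ or $q = 0$, and even for $p = \pm\infty$ or $q = \pm\infty$. It is a generalization of $\Delta_p$ in the sense that between disjoint subsets we have $\Delta_{p,-\infty}(A, B) = \Delta_p(A, B)$, because the q-average of the distances from a point to a finite set tends to the minimal distance as $q \to -\infty$. The parameters p and q can be modified independently in order to produce archives with a customized spread (depending on q) located at a customized closeness (depending on p) to the Pareto front.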
Finally, let us recall that one characteristic of a performance indicator is Pareto compliance: for two subsets A and B we say that $A \preceq B$ if for every $b \in B$ there exists an element $a \in A$ such that $a \preceq b$. If this does not hold, we write $A \not\preceq B$. We say that a performance indicator I is Pareto compliant if for any two sets A and B with $A \preceq B$ and $B \not\preceq A$ it follows that $I(A) \le I(B)$. We refer to [42] for details.

Properties of Integral Power Means
We start by summarizing some fundamental properties of integral power means that we will need for our subsequent calculations.
Theorem 1.Let X and Y denote finite-measure spaces, f , g : X → [0, ∞) non-negative measurable functions, and d : X × Y → [0, ∞) a measurable function with respect to the product measure on X × Y.The integral power mean M satisfies the following properties: ).
(v) For $p, q \in \mathbb{R}$ with $0 < p \le q$, we have that Proof. The proofs of (i) and (ii) are straightforward. To prove (iii) we only need the Minkowski inequality, ).
The proof of (iv) is also straightforward from the definitions, and a simple proof of (v) can be given as a particular case of [43] (Theorem 3), which we recall here for completeness. For a positive real v, consider the function Since the function $t^u$, with t a positive constant and $u \ge 0$, is increasing with respect to u, we easily get $\omega_r(v) \le \omega_s(v)$ for $0 \le r \le s$ and every $v \ge 0$. Consider the following linear integral operator Assume first that $p \ne 0$; then for any $x \in X$, Similarly, if $p = 0$ we have

Definition of ∆ p,q for Measurable Sets
With the aid of Theorem 1 we generalize the results of [26] (Section 3). For easy reference, we provide here slightly abbreviated but complete proofs. Given a metric measure space $(\Sigma, d, \mu)$, let $\mathcal{M}(\Sigma)$ denote the σ-algebra of all measurable subsets of $\Sigma$ and let $\mathcal{M}_{<\infty}(\Sigma)$ refer to those elements of $\mathcal{M}(\Sigma)$ having finite measure. As should be expected from the context, any set relation obtained from calculations involving an underlying measure $\mu$ should be understood to hold in a measure-theoretic sense, i.e., almost everywhere (a.e.). For example, for $X, Y \in \mathcal{M}_{<\infty}(\Sigma)$, a result saying $X = Y$ or $X \subset Y$ actually holds almost everywhere, which means that $\mu(X \,\triangle\, Y) = 0$ or $\mu(X \setminus Y) = 0$, respectively. Thus, it is convenient in this setting to identify a set $X \in \mathcal{M}_{<\infty}(\Sigma)$ with the whole equivalence class $[X] := \{Y \mid X = Y \text{ a.e.}\}$, and to think of these classes as the elements of $\mathcal{M}_{<\infty}(\Sigma)$, which removes the need for the a.e. abbreviation. Also, to avoid an overload of parentheses in the forthcoming expressions, the distance $d(x, y)$ between $x, y \in \Sigma$ will be abbreviated by $d_{x,y}$.
Definition 5. For $p, q \in \mathbb{R}^{\times}$, the generational (p, q)-distance $GD_{p,q}(X, Y)$ between two sets $X, Y \in \mathcal{M}_{<\infty}(\Sigma)$ is given by
$$GD_{p,q}(X, Y) := M_p^{x \in X}\left( M_q^{y \in Y}(d_{x,y}) \right) = \left( \frac{1}{|X|} \int_X \left( \frac{1}{|Y|} \int_Y d_{x,y}^{\,q} \, dy \right)^{p/q} dx \right)^{1/p},$$
where the sets X and Y are implicitly assumed to be disjoint when $p < 0$ or $q < 0$.
As in the finite case, the definition of $GD_{p,q}$ can be easily extended to $p, q \in \overline{\mathbb{R}}$, but it still has two undesirable drawbacks: first, $GD_{p,q}(X, X)$ can be different from zero, and second, in general the values of $GD_{p,q}(X, Y)$ and $GD_{p,q}(Y, X)$ can be different. Thus this indicator does not define a metric. To obtain a proper metric we introduce the following modification.
Definition 6. The (p, q)-averaged Hausdorff distance is the map $\Delta_{p,q} : \mathcal{M}_{<\infty}(\Sigma) \times \mathcal{M}_{<\infty}(\Sigma) \to [0, \infty)$ given by
$$\Delta_{p,q}(X, Y) := \max\{GD_{p,q}(X, Y \setminus X),\ GD_{p,q}(Y, X \setminus Y)\}.$$
Remark 1. For finite subsets $X, Y \subset \mathbb{R}^n$ endowed with the standard counting measure $\mu$, the previous notions of $GD_{p,q}$ and $\Delta_{p,q}$ coincide with the ones in Definition 4.
Figure 1 illustrates how the shape of the $\Delta_{p,q}$-metric balls $B_\varepsilon := \{x \in \mathbb{R}^2 : \Delta_{p,q}(A, x) \le \varepsilon\}$ around a discrete set A of ten points (that approximates a segment of negative slope in the plane) varies as p and q take several different values. Notice that for negative values of p and q the shape of the balls resembles the shape of A and encloses all of its points.
Figure 1. $\varepsilon$-neighborhoods of increasing radius around a discrete set of ten equidistant points along the line $y = -x$ in $\mathbb{R}^2$, showing how their shape changes for different values of p and q.

Metric Properties
The extension of $\Delta_{p,q}$ to measurable sets given in Definition 6 preserves the nice metric properties of the finite version considered in [26] (Section 3). In particular, Theorem 1 enables us to show a result analogous to [26] (Theorem 3.3).
Theorem 2. Suppose that $1 \le p, q < \infty$. Then the generational (p, q)-distance $GD_{p,q}$ satisfies the triangle inequality, namely $GD_{p,q}(X, Z) \le GD_{p,q}(X, Y) + GD_{p,q}(Y, Z)$ for any sets $X, Y, Z \in \mathcal{M}_{<\infty}(\Sigma)$.
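Proof. From the triangle inequality for the metric $d(\cdot, \cdot)$ we have $d_{x,z} \le d_{x,y} + d_{y,z}$ for all $x \in X$, $y \in Y$, and $z \in Z$. Taking the q-average over Z at both sides and using Theorem 1 (i)-(iii) yields
$$M_q^{z \in Z}(d_{x,z}) \le d_{x,y} + M_q^{z \in Z}(d_{y,z}). \qquad (3)$$
Now, we consider two cases for the parameters $1 \le p, q < \infty$, independently.
Case $p \le q$: Taking at both sides of (3) the p-average over X and using Theorem 1 (i), (iii), and (iv), we get
$$M_p^{x \in X}\left( M_q^{z \in Z}(d_{x,z}) \right) \le M_p^{x \in X}(d_{x,y}) + M_q^{z \in Z}(d_{y,z}). \qquad (4)$$
In this expression, the LHS is precisely $GD_{p,q}(X, Z)$, which does not depend on Y. We now take the p-average over Y at both sides of (4) and use Theorem 1 (i), (iii), and (iv), to obtain
$$GD_{p,q}(X, Z) \le M_p^{y \in Y}\left( M_p^{x \in X}(d_{x,y}) \right) + GD_{p,q}(Y, Z).$$
To finish this case note that from Theorem 1 (ii), (iv), and (v), we have that
$$M_p^{y \in Y}\left( M_p^{x \in X}(d_{x,y}) \right) \le M_p^{x \in X}\left( M_q^{y \in Y}(d_{x,y}) \right) = GD_{p,q}(X, Y),$$
which proves the claim.
Case $q \le p$: Here, we note that the LHS of (3) does not depend on Y, and take at both sides of (3) the q-average over Y. Hence, Theorem 1 (i), (iii)-(v) yield
$$M_q^{z \in Z}(d_{x,z}) \le M_q^{y \in Y}(d_{x,y}) + M_q^{y \in Y}\left( M_q^{z \in Z}(d_{y,z}) \right) \le M_q^{y \in Y}(d_{x,y}) + GD_{p,q}(Y, Z).$$
Lastly, we take the p-average over X and use Theorem 1 (ii)-(iv), to obtain
$$GD_{p,q}(X, Z) \le GD_{p,q}(X, Y) + GD_{p,q}(Y, Z),$$
which is the required result.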
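The triangle inequality of Theorem 2 can be checked numerically on finite point sets with the counting measure, which form a special case of the measurable setting. The following sketch draws random sets and random exponents in [1, 6]; the names `gd_pq` and `triangle_gap`, the random seed, and the sampling ranges are choices made for this illustration only, and such a check is of course no substitute for the proof.

```python
import numpy as np

def gd_pq(X, Y, p, q):
    """Generational (p, q)-distance between finite point sets, uniform weights."""
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    inner = np.mean(D**q, axis=1) ** (1.0 / q)     # q-average over Y
    return float(np.mean(inner**p) ** (1.0 / p))   # p-average over X

def triangle_gap(X, Y, Z, p, q):
    """GD(X,Y) + GD(Y,Z) - GD(X,Z); non-negative when the triangle inequality holds."""
    return gd_pq(X, Y, p, q) + gd_pq(Y, Z, p, q) - gd_pq(X, Z, p, q)

rng = np.random.default_rng(0)
worst = np.inf
for _ in range(200):
    # three random planar point sets with between 2 and 7 points each
    X, Y, Z = (rng.normal(size=(rng.integers(2, 8), 2)) for _ in range(3))
    p, q = rng.uniform(1, 6, size=2)
    worst = min(worst, triangle_gap(X, Y, Z, p, q))
print("smallest gap over all trials:", worst)
```

A non-negative smallest gap (up to rounding) is what Theorem 2 predicts for $1 \le p, q < \infty$.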
Corollary 1. For $p, q \in \mathbb{R}^{\times}$ the (p, q)-averaged Hausdorff distance $\Delta_{p,q}$ is a semimetric on the collection $\mathcal{M}_{<\infty}(\Sigma)$ of all measurable subsets of $\Sigma$ with finite measure. Moreover, between disjoint sets, $\Delta_{p,q}$ is a proper metric on $\mathcal{M}_{<\infty}(\Sigma)$ for $1 \le p, q < \infty$.
Proof. Definition 6 easily implies that $\Delta_{p,q}(\cdot, \cdot) \ge 0$ as well as $\Delta_{p,q}(X, Y) = \Delta_{p,q}(Y, X)$, for every pair $X, Y \in \mathcal{M}_{<\infty}(\Sigma)$ and all $p, q \in \mathbb{R}^{\times}$. From Definition 5 we can see that $GD_{p,q}(X, Y \setminus X) = 0$ if and only if $X = \emptyset$ or $Y \subseteq X$ (and hence $Y \setminus X = \emptyset$). We thus find, for $X, Y \ne \emptyset$, that $\Delta_{p,q}(X, Y) = 0$ if and only if $X = Y$.
We have shown that $\Delta_{p,q}$ is a semimetric on $\mathcal{M}_{<\infty}(\Sigma)$, and since the maximum of two functions satisfying the triangle inequality also satisfies it, Theorem 2 shows that $\Delta_{p,q}$ satisfies the triangle inequality when $1 \le p, q < \infty$.
Theorem 3. Suppose that for the sets $X, Y, Z \in \mathcal{M}_{<\infty}(\Sigma)$ there exist constants $0 < r < R$ such that $r \le d_{u,v} \le R$ holds for all pairs $(u, v)$ in $X \times Y$, $X \times Z$, or $Y \times Z$. Then, for all non-simultaneously positive $p, q \in \mathbb{R}^{\times}$ with $|p|, |q| \ge 1$, the generational (p, q)-distance $GD_{p,q}$ satisfies the following relaxed triangle inequality:
$$GD_{p,q}(X, Z) \le \frac{R^2}{r^2} \left( GD_{p,q}(X, Y) + GD_{p,q}(Y, Z) \right).$$
Proof. We prove the theorem in three steps.
Step 1: Take $p \in \mathbb{R}^{\times}$ and assume that $q < 0$; we will show that For any $x \in X$ and all $y_1, y_2 \in Y$ we have
Step 2: Now, take $q \in \mathbb{R}^{\times}$ and assume that $p < 0$; we will show that By hypothesis, for any $y \in Y$ and all $x_1, x_2 \in X$ we have Therefore, proceeding as before and applying again Theorem 1 (i) and (iv) we conclude that .
Using (1), the previous inequality can be written as which, by Definition 5, is precisely (6).
Step 3: From the previous two steps we easily obtain Theorem 1 (iv) and Definition 5 imply that $GD_{p,q}(X, Z) \le GD_{|p|,|q|}(X, Z)$. Finally, the triangle inequality for $GD_{|p|,|q|}$ (Theorem 2) and (7) produce the desired relation
Remark 2. When the pair (p, q) lies in the light-gray or violet regions of Figure 2, the distance $GD_{p,q}$ satisfies a relaxed triangle inequality, with the drawback that the constant $R^2/r^2$ depends on the condition that $r \le d_{u,v} \le R$ for all pairs $(u, v) \in X \times Y$, $X \times Z$, or $Y \times Z$. For bounded and separated sets this condition always holds, and on those sets the associated (p, q)-averaged Hausdorff distance $\Delta_{p,q}$ becomes an inframetric, as the following corollary implies.

Corollary 2. Under the same hypothesis of Theorem 3 we have
$$\Delta_{p,q}(X, Z) \le \frac{R^2}{r^2} \left( \Delta_{p,q}(X, Y) + \Delta_{p,q}(Y, Z) \right).$$
Proof. It follows immediately from Theorem 3 and Definition 6.
Proof. It follows easily from Theorem 1 (v) and Definition 6.
Figure 2. Representation of key regions on the (p, q)-plane. Corollary 1 shows that $\Delta_{p,q}$ is a proper metric in the violet region and Corollary 2 shows that it is an inframetric in the orange and light-gray ones. Numerical evidence suggests that $\Delta_{p,q}$ is still a proper metric in the orange regions.

Pareto-Compliance
We return now to the setting of MOPs to consider the behavior of the generalized $GD_{p,q}$ and $\Delta_{p,q}$ distances as performance indicators by studying their Pareto-compliance. A discussion of the Pareto-compliance for the indicators $GD_p$ and $\Delta_p$ appeared in [14] (Section 3). Similar observations are valid for these new (p, q)-indicators, but a complete account is part of ongoing research and will appear elsewhere. Here, as a first approach to the compliance question, we present a basic result that describes the behavior of the indicator $GD_{p,q}$ under stronger assumptions than the compliance notion mentioned at the end of Section 2.2.
Let us assume that, given a decision space $Q \subset \mathbb{R}^n$, a MOP has an associated objective function $F : Q \to \mathbb{R}^k$, with objective space $F(Q) \subset \mathbb{R}^k$ endowed with the Euclidean distance $d(\cdot, \cdot)$ and the inherited Lebesgue measure $\mu$. Furthermore, let $P_Q$ denote the Pareto set and $F(P_Q) \subset \mathbb{R}^k$ the corresponding Pareto front. If $X \subset Q$ denotes an approximating subset (or archive), the explicit $GD_{p,q}$-performance indicator assigned to X is given by $I_{GD_{p,q}}(X) := GD_{p,q}(F(X), F(P_Q))$.
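For the following statement, let us recall here that a partition of a set X is a collection of disjoint and non-empty subsets of X whose union is the whole of X. Furthermore, for any $q \in \overline{\mathbb{R}}$ we abbreviate the q-averaged distance of $F(u) \in F(Q)$ to the Pareto front $F(P_Q)$ by $\delta_q(u) := M_q^{v \in F(P_Q)}\left( d_{F(u),v} \right)$.
Theorem 5. Suppose that for fixed $p, q \in \overline{\mathbb{R}}$ a pair of measurable archives $X, Y \subset Q$ satisfy that:
(i) X and Y admit finite partitions $X = \bigcup_{i=1}^{m} X_i$ and $Y = \bigcup_{i=1}^{m} Y_i$ such that for each $i \in \{1, \ldots, m\}$: (a) $X_i \subset X$ and $Y_i \subset Y$ are subsets of non-null finite measure; (b) $x \preceq y$ for all $x \in X_i$ and all $y \in Y_i$.
(ii) $\delta_q(x) \le \delta_q(y)$ holds whenever $x \preceq y$.
Then $I_{GD_{p,q}}(X) \le I_{GD_{p,q}}(Y)$.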
Proof. From condition (i) the archives X and Y admit partitions into the same number m of subsets, and from (ii) it is clear that for any $i \in \{1, \ldots, m\}$, if $x \in X_i$ and $y \in Y_i$ then $\delta_q(x) \le \delta_q(y)$. Hence, taking integral p-averages over $X_i$, and then over $Y_i$, of the quantities at both sides of this inequality we obtain for each i that a Now, for each $i \in \{1, \ldots, m\}$ for which the inequality does not hold, we can further subdivide $X_i$ into a sufficiently large partition of $m_i$ non-null finite measure subsets $X_{i,1}, X_{i,2}, \ldots, X_{i,m_i}$, so that for all $j \in \{1, \ldots, m_i\}$ we can guarantee that Please note that this should be possible due to the assumption that $X_i$ has non-null finite measure. Since part (b) of condition (i) still holds for these subsets (i.e., $\forall x \in X_{i,j}, \forall y \in Y_i : x \preceq y$), a relation analogous to Inequality (8) is valid for them. Explicitly, for each $i \in \{1, \ldots, m\}$ and all $j \in \{1, \ldots, m_i\}$ we have a p i,j := Due to the chosen partitions of X and Y, it is clear that Therefore, with the notation of (9) it follows which implies that the quantities $w_{i,j}$ and $w_i$ can be regarded as normalized weights appropriate for taking weighted averages. Using that $0 \le a_{i,j} \le b_i$ and $0 \le w_{i,j} \le w_i \le 1$, simple properties of (discrete) weighted power means ensure that ∑ proving the statement.
Remark 3. Condition (i) of Theorem 5 implies the simpler (and weaker) dominance conditions: (a') $X \preceq Y$ (i.e., $\forall y \in Y$, $\exists x \in X$ such that $x \preceq y$), and (b') $\forall x \in X$, $\exists y \in Y$ such that $x \preceq y$.
In many simple examples for which (a') and (b') hold, it is not difficult to find the partitions needed for Theorem 5 (i). However, this is not always possible, and the question of when such partitions exist will not be considered here. Figure 3 shows examples where (a') and (b') hold and the inequality $I_{GD_{p,q}}(X) \le I_{GD_{p,q}}(Y)$ is true (left) and false (right). In these cases it can be shown that X and Y satisfy (left) and do not satisfy (right) the requirements of Theorem 5 (i), respectively.
Remark 4. Another important observation is that condition (ii) of Theorem 5 allows for some freedom in the choice of an appropriate $q \in \overline{\mathbb{R}}$ such that the inequality $\delta_q(x) \le \delta_q(y)$ holds for $x \preceq y$, ensuring the compliance with Pareto optimality. This freedom is not available for the indicator $GD_p$ because in that case $\delta_q(x)$ should be replaced by the corresponding quantity when $q \to -\infty$, which is the standard distance from a point to a set, $d(F(x), F(P_Q))$. The possibility to choose a value of q according to the problem is clearly an advantage, and provides an argument in favor of the generalized version $GD_{p,q}$.
Figure 3. (Left) Example of a Pareto front $F(P_Q)$ with two archives satisfying condition (i) of Theorem 5 for which $I_{GD_{p,q}}(X) \le I_{GD_{p,q}}(Y)$. (Right) Example of a Pareto front $F(P_Q)$ with two archives satisfying conditions (a') and (b') of Remark 3 but for which $I_{GD_{p,q}}(X) \not\le I_{GD_{p,q}}(Y)$. In this case partitions of the archives satisfying Theorem 5 (i) do not exist.

Numerical Examples
In this section, we demonstrate the applicability and usefulness of the new distance measure on two examples.

General Example
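As a first example we consider the following sets within the Euclidean plane $\mathbb{R}^2$: the first set, A, is a line segment connecting two points $a = (-1, 0)$ and $b = (1, 0)$, i.e.,
$$A := \overline{ab} = \{(x, 0) : x \in [-1, 1]\}. \qquad (10)$$
Next, for some given $\delta > 0$ and any fixed value of $\varepsilon > 0$ we consider sets $B_\delta$ defined as the union of line segments
$$B_\delta := \overline{c\,d_\delta} \cup \overline{d_\delta e_\delta} \cup \overline{e_\delta f_\delta} \cup \overline{f_\delta g_\delta} \cup \overline{g_\delta h}, \qquad (11)$$
where $c = (-1, \varepsilon)$, $d_\delta = (-\delta, \varepsilon)$, $e_\delta = (-\delta, 1)$, $f_\delta = (\delta, 1)$, $g_\delta = (\delta, \varepsilon)$, and $h = (1, \varepsilon)$ are the segment end-points in $\mathbb{R}^2$. Hereby, a set $B_\delta$ can be seen as a certain approximation of A, where the segment $\overline{e_\delta f_\delta}$ can be considered to be the outlier in the approximation.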
Figure 4 shows the sets A and $B_\delta$ for the values $\delta \in \{0.05, 0.10, 0.20, 0.40\}$ and $\varepsilon = 0.10$. Evidently, for smaller $\delta$ the outlier region gets smaller, and hence the approximation $B_\delta$ of A gets "better". This is reflected by the values of the (p, q)-distance in Table 1. On the other hand, if the classical Hausdorff distance is chosen, all values of $d_H(A, B_\delta)$ are equal to 1, regardless of the choice of $\delta > 0$. Hence, the (p, q)-distance is more appropriate in this example to identify "better" approximations.
Table 1. $\Delta_{p,q}$ values for A and $B_\delta$ in (10) and (11), for different values of p, q, and $\delta$, with fixed $\varepsilon = 0.10$.
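A rough way to approximate such distances between unions of line segments is to replace each set by midpoint samples weighted by arc length. The sketch below makes several assumptions that are not taken from the original text: the segments carry the one-dimensional arc-length measure, the step size is `h = 0.005`, the exponents are `p = 1` and `q = -1`, and all function names are hypothetical. Since A and $B_\delta$ are disjoint for $\varepsilon > 0$, no set differences are needed.

```python
import numpy as np

def sample_polyline(vertices, h=0.005):
    """Midpoint samples along a polyline and their normalized arc-length weights."""
    V = np.asarray(vertices, dtype=float)
    pts, wts = [], []
    for a, b in zip(V[:-1], V[1:]):
        length = np.linalg.norm(b - a)
        n = max(1, int(np.ceil(length / h)))
        t = (np.arange(n) + 0.5) / n              # midpoints of n equal pieces
        pts.append(a + t[:, None] * (b - a))
        wts.append(np.full(n, length / n))
    w = np.concatenate(wts)
    return np.vstack(pts), w / w.sum()

def wmean(v, w, p, axis):
    """Weighted p-power mean along one axis (finite nonzero p)."""
    return np.sum(w * v**p, axis=axis) ** (1.0 / p)

def gd_pq(X, wx, Y, wy, p, q):
    """Quadrature approximation of GD_{p,q}(X, Y) for disjoint sampled sets."""
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    inner = wmean(D, wy[None, :], q, axis=1)      # q-average over Y
    return float(wmean(inner, wx, p, axis=0))     # p-average over X

def delta_pq(X, wx, Y, wy, p, q):
    return max(gd_pq(X, wx, Y, wy, p, q), gd_pq(Y, wy, X, wx, p, q))

def B_delta(delta, eps=0.10):
    """Vertices c, d, e, f, g, h of the polyline B_delta."""
    return [(-1, eps), (-delta, eps), (-delta, 1), (delta, 1), (delta, eps), (1, eps)]

A, wA = sample_polyline([(-1, 0), (1, 0)])
for delta in (0.05, 0.10, 0.20, 0.40):
    B, wB = sample_polyline(B_delta(delta))
    print(delta, delta_pq(A, wA, B, wB, p=1, q=-1))
```

The printed numbers are quadrature approximations under the stated assumptions and are not meant to reproduce the entries of Table 1.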

Approximation of Pareto Sets/Fronts
As a second example we consider the approximation of the Pareto set and front of a given MOP. For this, we use the bi-objective problem known as the Lamé super-sphere function [32], referred to as MOP (12) below. In a first step, we consider a simple hypothetical example to illustrate the concept of continuous archives in the context of evolutionary multi-objective optimization. To this end, assume we are given the discrete archive $A = \{x_1, \ldots, x_5\} \subset \mathbb{R}^2$, where $x_1 = (-0.0129, -0.0421)$, $x_2 = (0.2525, 0.2912)$, $x_3 = (0.4903, 0.4035)$, $x_4 = (0.6258, 0.6912)$, $x_5 = (1.0212, 0.9930)$.
The set A hence consists of only five candidate solutions. Analogously, the image $F(A)$ of A can be considered as an approximation of the Pareto front that also consists of five candidate solutions. Now, in order to improve the quality of the approximation, instead of A one can consider the polygon that is defined by the elements of A, namely
$$B := \bigcup_{i=1}^{4} \overline{x_i x_{i+1}}.$$
In what follows, we will call this polygon the continuous archive. The approximations A, B, $F(A)$, and $F(B)$ can be seen in Figures 7 and 8. By visual inspection, the approximation quality increases significantly when using the linear interpolation, in particular in objective space. This is reflected by the (p, q)-distances shown in Table 2, where we can observe the following general behavior: first, the distances within decision and objective spaces decrease from finite to continuous archives, and this phenomenon is stronger in the objective space; and second, following the result of Theorem 4, the distances decrease as q decreases.
In a next step, we consider discrete and continuous archives resulting from two of the most famous EMO algorithms, NSGA-II [44] and MOEA/D [45]; see Table 3 for the parameter setting of these algorithms. To this end, we first consider the result of NSGA-II with a population size of 12 after 500 generations; see Figures 9 and 10 and Table 4 for the numerical results. Finally, we consider the MOEA/D generational algorithm to get 500 finite archives of 12 elements each; see Figures 11 and 12 and Table 5 for the numerical results.
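The passage from the discrete archive A to its continuous counterpart can be sketched by linear interpolation along the polygon through the five points, parametrized by arc length. The function name `continuous_archive`, the number of samples `m`, and the use of equally spaced arc-length midpoints are assumptions made for this illustration; the points are taken in the order in which they are listed.

```python
import numpy as np

# Discrete archive from the example (five candidate solutions in decision space).
A = np.array([[-0.0129, -0.0421], [0.2525, 0.2912], [0.4903, 0.4035],
              [0.6258, 0.6912], [1.0212, 0.9930]])

def continuous_archive(points, m=200):
    """Sample the polygon through `points` at m equally spaced arc-length midpoints.

    Returns the samples (each carrying equal weight 1/m) and the total length.
    """
    P = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(P, axis=0), axis=1)   # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])        # arc length at each vertex
    t = (np.arange(m) + 0.5) * s[-1] / m               # midpoints of m equal arcs
    B = np.column_stack([np.interp(t, s, P[:, k]) for k in range(P.shape[1])])
    return B, float(s[-1])

if __name__ == "__main__":
    B, length = continuous_archive(A)
    print(B.shape, length)
```

Because the samples are equally spaced in arc length, uniform weights on them approximate the arc-length measure on the polygon, so they can be fed to any finite-set implementation of $GD_{p,q}$ to approximate the value for the continuous archive.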
For both EMO algorithms, it can be observed that the indicator values for the continuous archives are significantly better than for the respective discrete archives. Next, note that the $\Delta_{p,q}$ values oscillate for NSGA-II, which is typical behavior for this dominance-based algorithm. These oscillations, however, are less distinct for the continuous archives. To further investigate the last statement, we finally consider the convex bi-objective problem (14) with objectives $f_1, f_2 : \mathbb{R}^3 \to \mathbb{R}$, where $x = (x_1, x_2, x_3)$. The Pareto set of MOP (14) is the line segment connecting the points (0, 0, 0) and (1, 0, 0), and the Pareto front is as shown in Figure 13. Figure 14 and Table 6 show the $\Delta_{p,q}$ values for both the discrete and continuous archives obtained by NSGA-II using a population size of 20. As can be seen, again the continuous archives achieve much better indicator values, and the amplitudes of the oscillations are significantly smaller compared to the discrete archives. This is confirmed by Figures 15-17, which show the results of the discrete and continuous archives after 300, 400, and 500 generations, respectively. As can be seen, NSGA-II is able to compute solutions along the Pareto front, however with varying distribution along this set (in fact, it is known that there is no "limit archive" for NSGA-II since this algorithm is not indicator-based). In turn, for each of the results of NSGA-II, all of the continuous archives represent, at least by visual inspection, perfect approximations of the Pareto front, which is reflected by the good $\Delta_{p,q}$ values.
In conclusion, the results presented in this section strongly indicate the convenience of the new indicator, which is able to assess the performance of continuous archives. Though in principle other indicators can also be extended to continuous sets, this has not been done so far and is not a straightforward task. Hence, no comparisons to other indicators can be considered here. The presented results further indicate the benefit of using continuous archives instead of the discrete ones that are used classically. This would, among other things, allow for the usage of smaller population sizes, which would in turn reduce the computational burden of the evolutionary algorithms (note that the time complexity for all MOEAs in each generation is quadratic in the population size). The verification of this statement, however, is left for future work as it goes beyond the scope of this study.
Figure 14. $\Delta_{p,q}$ values for the discrete (black curve) and the continuous archives (blue curve) of NSGA-II for MOP (14).
Table 6. $\Delta_{p,q}$ values for the discrete and continuous archives of NSGA-II for MOP (14). The results are averaged over 20 independent runs.

Conclusions and Future Work
In this work, we have proposed extensions of the existing $GD_{p,q}$ and $\Delta_{p,q}$ performance indicators that make it possible to compute the distance between two general measurable sets. In particular, this is a natural setting in multi-objective optimization because the solution of such a problem typically forms a set of a certain dimension (and is thus not given by finitely many points). We have shown that the extended indicators keep the nice metric properties of their finite-version predecessors (see [14,26]). Moreover, for $GD_{p,q}$, sufficient conditions have been provided that guarantee a certain compliance of this indicator with Pareto optimality. Further study is needed to determine the precise relation between these conditions and other ones appearing in the literature.
We have demonstrated the applicability and usefulness of the novel indicator on examples related to evolutionary multi-objective optimization.
As part of future work, we intend to further investigate the use of $\Delta_{p,q}$ within evolutionary multi-objective optimization. For instance, it might be interesting to integrate this performance indicator within an evolutionary multi-objective optimization algorithm, as was done, e.g., in [46] for its predecessor $\Delta_p$. Although it is clear that the individual roles of p and q are related to the convexity of the metric neighborhoods of points and sets, further research is needed to elucidate more precisely useful ways to take advantage of their joint behavior in concrete situations. Additionally, to understand the behavior of $\Delta_{p,q}$ in relation to Pareto compliance and to complete the partial results that have been established in Section 3.4 for $GD_{p,q}$, consideration should be given to the inverted generational indicator $IGD_{p,q}$. Finally, one interesting aspect is to see whether the indicator can be used as a proximity measure in other research fields.

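Definition 3 (Schütze et al. [14]). For $p \in \mathbb{N}$ and finite sets $A, B \subset \mathbb{R}^n$, the value $\Delta_p(A, B) := \max\{GD_p(A, B), IGD_p(A, B)\}$, where $GD_p(A, B) := \left( \frac{1}{|A|} \sum_{a \in A} \operatorname{dist}(a, B)^p \right)^{1/p}$ and $IGD_p(A, B) := GD_p(B, A)$, is called the averaged Hausdorff distance between A and B.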
that q = −|q|, we get − ,y ).Calculating the p-average M p x∈X of both sides, and from Theorem 1 (i) and (iv), we finally get M p x∈X ,y ) , which, by Definition 5, is precisely(5).

Figure 4. Example of four approximations of A (black horizontal segment) with B δ (blue piecewise function) for four different values of δ and fixed ε = 0.10.

Figure 7. Left: Approximations A (blue dots) and B (blue polygon line) of the Pareto set (green thick line) of MOP (12) for n = 2. Right: corresponding approximations F(A) and F(B) of the Pareto front, for γ = 2.

Figure 9. Left: approximations A (blue dots) corresponding to the 500th generation of the NSGA-II algorithm, and B (blue polygon line) of the Pareto set (green thick line) of MOP (12) for n = 2. Right: respective approximations F(A) and F(B) of the Pareto front for γ = 2.

Figure 12. Left: approximations A (blue dots) corresponding to the 500th generation of the MOEA/D algorithm, and B (blue polygon line) of the Pareto set (green thick line) of MOP (12) for n = 2. Right: respective approximations F(A) and F(B) of the Pareto front for γ = 1/2.

Figure 14 and Table 6 show the ∆ p,q values for both the discrete and continuous archives obtained by NSGA-II using a population size of 20. Again, the continuous archives achieve much better indicator values, and the amplitudes of the oscillations are significantly smaller than for the discrete archives. This is confirmed by Figures 15 to 17, which show the results of the discrete and continuous archives after 300, 400, and 500 generations, respectively. NSGA-II is able to compute solutions along the Pareto front, however with a varying distribution along this set (in fact, it is known that there is no "limit archive" for NSGA-II since this algorithm is not indicator-based). In turn, for each of the NSGA-II results, the continuous archives represent, at least by visual inspection, perfect approximations of the Pareto front, which is reflected by the good ∆ p,q values.

In conclusion, the results presented in this section strongly indicate the usefulness of the new indicator, which is able to assess the performance of continuous archives. Although other indicators could in principle also be extended to continuous sets, this has not been done so far and is not a straightforward task. Hence, no comparisons to other indicators can be considered here. The results further indicate the benefit of using continuous archives instead of the discrete ones that are classically used. Among other things, this would allow for smaller population sizes, which would in turn reduce the computational burden of the evolutionary algorithms (note that the time complexity for all MOEAs in each generation is quadratic in the population size). The verification of this statement, however, is left for future work, as it goes beyond the scope of this study.
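The effect described above can be mimicked with a rough numerical sketch. The code below is not the measure-theoretic indicator of this work: it assumes a hypothetical linear front, replaces the continuous archive by a dense sample of the polygonal line through the archive points, and uses the finite-set ∆ p of Definition 3 (Euclidean norm, p = 1) instead of ∆ p,q . All names and data are illustrative.

```python
import math

def gd_p(A, B, p):
    """p-th power mean over a in A of the distance from a to the nearest point of B."""
    return (sum(min(math.dist(a, b) for b in B) ** p for a in A) / len(A)) ** (1.0 / p)

def delta_p(A, B, p):
    """Finite-set averaged Hausdorff distance (maximum of GD_p and IGD_p)."""
    return max(gd_p(A, B, p), gd_p(B, A, p))

def sample_polyline(vertices, per_segment):
    """Return evenly spaced points along the polygonal line through `vertices`."""
    points = []
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        for k in range(per_segment):
            t = k / per_segment
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    points.append(vertices[-1])
    return points

# Hypothetical linear front from (0, 1) to (1, 0), densely sampled as reference.
reference = [(i / 200, 1 - i / 200) for i in range(201)]
# Discrete archive: five points lying on the front.
discrete = [(i / 4, 1 - i / 4) for i in range(5)]
# Sampled stand-in for the continuous archive: polygonal line through the same points.
continuous = sample_polyline(discrete, per_segment=50)

d_discrete = delta_p(discrete, reference, 1)
d_continuous = delta_p(continuous, reference, 1)
print(d_discrete, d_continuous)
```

In this toy setting the polygonal line coincides with the front, so its sampled indicator value is essentially zero, while the five-point archive is penalized through the IGD part for the gaps it leaves between its points.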

Figure 15. Left: Approximations A (blue dots) and B (blue continuous polygon line) of the Pareto set of MOP (14) in the 300th generation. Right: corresponding approximations F(A) and F(B) of the Pareto front.

Figure 16.Left: Approximations A (blue dots) and B (blue continuous polygon line) of the Pareto set of MOP (14) in the 400th generation.Right: corresponding approximations F(A) and F(B) of the Pareto front.

Figure 17.Left: Approximations A (blue dots) and B (blue continuous polygon line) of the Pareto set of MOP (14) in the 500th generation.Right: corresponding approximations F(A) and F(B) of the Pareto front.

Table 3. Parameter settings for NSGA-II and MOEA/D.

Table 4. ∆ p,q values for the Pareto front approximations for MOP (12) using the NSGA-II archives and with p = 1, q = −10.

Left: approximations A (blue dots) corresponding to the 500th generation of the MOEA/D algorithm, and B (blue polygon line) of the Pareto set (green thick line) of MOP (12) for n = 2. Right: respective approximations F(A) and F(B) of the Pareto front for γ = 2.

Table 5. ∆ p,q values for the Pareto front approximations for MOP (12) using the MOEA/D archives and with p = 1, q = −10.