The Duality of Similarity and Metric Spaces

We introduce a new mathematical basis for similarity space. For the first time, we describe the relationship between distance and similarity from the standpoint of set theory. We then derive generally valid relations for the conversion between similarity and a metric and vice versa. We present a general solution for the normalization of a given similarity space or metric space. The derived solutions lead to many already used similarity and distance functions and combine them into a unified theory. The Jaccard coefficient, Tanimoto coefficient, Steinhaus distance, Ruzicka similarity, Gaussian similarity, edit distance and edit similarity satisfy this relationship, which verifies our fundamental theory.


Introduction
Mathematical spaces have been studied for centuries and belong to the basic mathematical theories, which are used in various real-world applications [1]. In general, a mathematical space is a set of mathematical objects with an associated structure. This structure can be specified by a number of operations on the objects of the set. These operations must satisfy certain axioms of the mathematical space. The mathematical constructions of metric space and similarity space are based on topological space, and topological space is in turn based on set theory [2]. Nowadays, many research groups all over the world deal with similarity spaces in different research fields, e.g., [3,4].
For readability and to reach a broad audience, we do not treat all the mathematical circumstances and conditions in detail. Rather, we present the main concept and a way to a solution. It would take too much time to grasp all the current theories, and consequently there would be no time left for innovations. We refer readers to [3,5-9] for the fundamental concepts and properties of topological spaces and metric spaces (convergence, continuity, completeness, separability, connectedness, compactness, etc.).
Similarity and dissimilarity functions are widely used in many research areas: information retrieval, data mining, machine learning, cluster analysis and applications in database search, protein sequence comparison and many more. When a dissimilarity function is used, a distance metric is normally required. On the other hand, although similarity functions are widely used, there is no formally accepted definition of this concept [4]. Therefore, in this article we introduce a formal generalized mathematical theory, with all proofs. The organization of the paper is as follows: in Section 1 we briefly introduce the topic. In Sections 2 and 3, the background of metric space and partial metric space is presented. Sections 4 and 5, on the revised similarity space and the duality theory, focus on the authors' contribution to the reformulation and replenishment of the metrics. Section 6, Applications of Similarity Spaces, presents the connection between the revised metrics and well-known similarity coefficients.

Partial Metric Space

The concept of a partial metric space is a generalization of metric space. We think that it should be implemented as a superset of the axioms for a metric space, as well as of the definition of a similarity space. We can see both as special cases of a partial metric space.
Definition 2 (Partial Metric Space). Given a non-empty set X, a function p : X × X → R is a partial metric if for all x, y, z ∈ X it satisfies the following conditions: (P1) p(x, y) = p(y, x) (symmetry), (P2) p(x, z) + p(y, y) ≤ p(x, y) + p(y, z) (triangle inequality), (P3) p(x, x) = p(x, y) = p(y, y) ⇐⇒ x = y (identity of indiscernibles), (P4) p(x, x) ≤ p(x, y) (small self-distances), (P5) p(x, y) ≥ 0 (non-negativity).
A partial metric space is an ordered pair (X, p).
Our introduced partial metric space comes from the original definition [10,11]. Note that it is possible to have p(x, x) ≠ p(y, y). The definition of a partial metric allows for the possibility that the self-distance is non-zero. Thus, a metric space can be defined as a partial metric space in which each self-distance is zero, so p(x, x) = p(y, y) = 0, and hence the term p(y, y) disappears in P2. The reason for allowing non-zero self-distances first came with the definition of a similarity metric.

Similarity Space Revisited
Especially in the last two decades, research into a formal definition of a similarity metric (or similarity space) has intensified. New applications, and the purpose of this article, call for a general consensus and a search for well-defined axiomatic systems and theoretical foundations instead of a non-intuitive duality with distance. Based on [1,4], we introduce a modified axiomatic system for a similarity metric, which is in agreement with the current notions but simplified so as to be a minimal axiomatic system.

Definition 3 (Similarity Space). Given a non-empty set X, a function s : X × X → R is a similarity metric if for all x, y, z ∈ X it satisfies the following conditions: (S1) s(x, y) = s(y, x) (symmetry), (S2) s(x, z) + s(y, y) ≥ s(x, y) + s(y, z) (triangle inequality), (S3) s(x, x) = s(x, y) = s(y, y) ⇐⇒ x = y (identity of indiscernibles), (S4) s(x, y) ≥ 0 (non-negativity).
A similarity space is an ordered pair (X, s).
Compared to the original system, we have removed the axiom of bounded self-similarity, which can be derived from the remaining axioms.
A few issues require attention. The name 'similarity metric' is an already proposed convention. Calling it a 'metric' should be understood in the sense of a monotonically decreasing convex transformation of a partial metric or a distance metric, as will be shown in the next section. In this way, we avoid misunderstanding.
Unlike D3, d(x, x) = 0, the similarity metric has an upper bound of (s(x, x) + s(y, y))/2 and allows s(x, x) ≠ s(y, y). At first sight, this may seem counter-intuitive: x is more (or less) similar to itself than y. In spatial considerations of dissimilarity and distance, this does not arise, since d(x, x) = 0 for all objects. Similarity depends on the set of common features, and the result is the possibility of non-identical self-similarities. If we interpret such common features as 'description lengths' or 'complexities', unequal self-similarities become quite natural, and if x has more features than y, we have s(x, x) > s(y, y) [1]. For instance, given a German word x = 'Einkommensteuererklärung' (income tax return) and y = 'Steuer' (tax), then s(x, x) ≥ s(y, y) when counting common characters or q-grams.
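For illustration, a minimal sketch in Python (our own simplification, counting common characters as a multiset intersection rather than formal q-grams):

from collections import Counter

def char_similarity(x: str, y: str) -> int:
    """Count common characters (multiset intersection), a simple feature-based similarity."""
    cx, cy = Counter(x), Counter(y)
    return sum((cx & cy).values())

x = "Einkommensteuererklärung"
y = "Steuer"
print(char_similarity(x, x))  # s(x, x) = |x| = 24
print(char_similarity(y, y))  # s(y, y) = |y| = 6
print(char_similarity(x, y))  # characters shared between x and y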
We additionally suggest non-negativity in S4 because a similarity metric has no direction; in contrast to a vector, it is a scalar value, and so it makes no sense to assign it a negative sign, in analogy with non-negativity in a metric space. The same principle should be valid for a similarity metric as a requirement for 'symmetric measurement': the distance between objects remains the same when we measure it from the other direction. The second reason follows from measure theory (see Appendix A.6), where there is a non-negativity condition µ(X) ≥ 0 for a measure µ on the set X.
At first glance, it is not clear how the axiom of the triangle inequality was formed; we refer the reader to the related Theorem 6.

Theorem 1 (Linear Transformation). Let s be a similarity metric on X. Then s̄(x, y) = αs(x, y) + β, with α > 0 and β ≥ 0, is again a similarity metric.

Proof. Appendix A.2.
This theorem allows us to apply any linear standardization or re-scaling without violating the axioms. In statistics, a standard score s̄(x, y) = (s(x, y) − µ_s)/σ_s is very often used, where µ_s is the mean and σ_s is the standard deviation. Another example is min-max feature scaling, s̄(x, y) = a + (s(x, y) − X_min)(b − a)/(X_max − X_min), where X_min denotes the minimum value and X_max the maximum value. All values are re-scaled (normalized) to lie within the range [a, b]. When the parameters a = 0, b = 1 are chosen, this is a unity-based normalization.
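As a brief sketch of the min-max case (unity-based normalization with a = 0, b = 1; the function name is ours):

def min_max_rescale(values, a=0.0, b=1.0):
    """Linearly re-scale similarity values into the range [a, b]; assumes max > min."""
    lo, hi = min(values), max(values)
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]

print(min_max_rescale([2.0, 5.0, 8.0]))  # [0.0, 0.5, 1.0]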
For instance, it should be clear that two errors in a comparison of short strings are more critical than in a comparison of long strings. Therefore, it is sometimes necessary to normalize the similarity metric. Until the beginning of this century, no such normalization preserving the metric axioms was known for the edit distance metric. The first normalized edit distance metric, with the range [0, 1], was developed in [12]. It is obvious that for any normalized distance metric d_n(x, y) there is also a normalized similarity metric s_n(x, y) = 1 − d_n(x, y) satisfying Definition 3. Because this axiomatic system is too general, being valid for any unnormalized similarity metric, we introduce for our case a new specific axiomatic system for a normalized similarity metric in the range [0, 1].
Definition 4 (Normalized Similarity Space). Given a non-empty set X, a function s_n : X × X → [0, 1] is a normalized similarity metric if for all x, y, z ∈ X it satisfies the following conditions: (N1) s_n(x, y) = s_n(y, x) (symmetry), (N2) s_n(x, z) + s_n(y, y) ≥ s_n(x, y) + s_n(y, z) (triangle inequality), (N3) s_n(x, y) = 1 ⇐⇒ x = y (identity of indiscernibles), (N4) s_n(x, y) ≥ 0 (non-negativity).
A normalized similarity space is an ordered pair (X, s_n).
We do not relax any axiom compared to Definition 3; rather, we have created a stricter, meaningful special case of that definition by substituting s_n(x, x) = 1, which is also the least upper bound of s_n(x, y). Due to this normalization, the self-similarity is always bounded by the same number, s_n(x, x) = s_n(y, y) = 1. The total dissimilarity likewise defines the greatest lower bound, s_n(x, y) = 0. The requirements of both limit conditions N3 and N4 thus stretch the similarity metric to its boundaries.

Theorem 2 (Boundedness of Normalized Similarity). Every normalized similarity metric is bounded: 0 ≤ s_n(x, y) ≤ 1.

Proof. Appendix A.3.
These properties also connect us with probability theory, where we want to ensure that the probability of the similarity satisfies 0 ≤ P(x, y) ≤ 1, just as 0 ≤ s_n(x, y) ≤ 1.

Theorem 4 (Convex Combinations). A convex combination of normalized similarity metrics s_n,1, ..., s_n,k is again a normalized similarity metric:

s_C(x, y) = Σ_{i=1}^{k} λ_i s_n,i(x, y), where λ_i ≥ 0 and Σ_{i=1}^{k} λ_i = 1.

Proof. Appendix A.4.

This property of convex combinations allows us to assemble different normalized similarity metrics together and obtain again a normalized similarity metric.
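A short sketch of such an assembly (the two component similarities below are illustrative placeholders, not metrics proposed by this paper):

def convex_combination(similarities, weights):
    """Combine normalized similarity metrics with non-negative weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9 and all(w >= 0 for w in weights)
    def s_combined(x, y):
        return sum(w * s(x, y) for s, w in zip(similarities, weights))
    return s_combined

s1 = lambda x, y: len(set(x) & set(y)) / len(set(x) | set(y))  # Jaccard on character sets
s2 = lambda x, y: 1.0 if x == y else 0.0                       # discrete similarity
s = convex_combination([s1, s2], [0.7, 0.3])
print(s("abc", "abd"))  # 0.7 * (2/4) + 0.3 * 0 = 0.35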

Duality of Similarity and Metric Space
The relationship between distance and similarity is not obvious, as distance derives from spatial considerations and similarity relations derive from considering common and non-common features [1]. In many cases, distance is used to measure similarity, although this is far from intuitive and it is often a non-trivial task to find such a dual notion. Let us present a transformation of distance into similarity.
Theorem 5 (Duality of Distance and Similarity [13]). Generally, if a function f : [0, ∞) → [a, b] with a ≥ 0 is monotonically decreasing and convex, then s(x, y) = f(d(x, y)) is a similarity metric dual to the distance metric d(x, y).

Proof. We refer to the related proof of Corollary 4.
The range [a, b] does not have to be a closed set. We might simply denote a = inf{s(x, y)} and b = sup{s(x, y)}. The condition a ≥ 0 is introduced in order to preserve symmetry with the non-negativity of the values of the distance metric d(x, y). Once we allow a convex distortion of a metric space into a similarity space, the transformation is not necessarily isomorphic (isometric), and so the distances between points do not have to be preserved. Most importantly, the relative 'distances' (in the sense of the inverse of the partial order) between the points are preserved in accordance with geometric terminology; for instance, d(x, z) ≤ d(x, y) =⇒ s(x, z) ≥ s(x, y) for any x, y, z ∈ X.
Theorem 6 (Triangle Inequality of Similarity). Any monotonically decreasing convex transformation f of the triangle inequality of the metric d forms a triangle inequality of the similarity s = f(d):

f(d(x, z)) + f(d(y, y)) ≥ f(d(x, y)) + f(d(y, z)).

Proof. Appendix A.5.
Hence the condition of a monotonically decreasing function preserves the triangle inequality. Because we measure similarities between objects, and not the distance, it is quite arguable that such a distortion is more suitable. Moreover, many similarity metrics are not related to distance at all; conversely, distance is in many cases derived from similarity, for example, the passage from the Jaccard similarity to the Jaccard distance [14].
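To illustrate the order-reversing duality, a sketch with the decreasing convex choice f(d) = 1/(1 + d) (the specific f is our example; the theory only requires a monotonically decreasing convex function):

def similarity_from_distance(d: float) -> float:
    """Dual similarity via the decreasing convex transform f(d) = 1 / (1 + d)."""
    return 1.0 / (1.0 + d)

# Order reversal: a smaller distance maps to a larger similarity.
d_xy, d_xz = 2.0, 0.5
assert d_xz <= d_xy
assert similarity_from_distance(d_xz) >= similarity_from_distance(d_xy)
print(similarity_from_distance(d_xz), similarity_from_distance(d_xy))  # 0.666..., 0.333...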
The measure µ (see Appendix A.6) of the symmetric difference of two sets can be considered as a distance between sets, well known as the Fréchet-Nikodym-Aronszajn distance. This distance is a particular case of the distance in the space of Lebesgue integrable functions. In fact, the distance between sets may be treated as the distance between the characteristic functions χ_x and χ_y. These characteristic functions are defined on a set X and indicate membership of an element in the subsets x and y, respectively. In classical set theory, the value is 1 for all elements of x and 0 for all elements of X not in x. Employing fuzzy set theory, we can give an uncertainty to the membership in the range of real values χ ∈ [0, 1].
Theorem 7 (Distance between Two Objects). Let x, y be subsets of a set X. The symmetric difference between two objects is a distance metric:

d(x, y) = µ(x △ y).

Proof. Appendix A.7.
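A minimal check with the counting measure on finite sets (so µ is cardinality):

def symmetric_difference_distance(x: set, y: set) -> int:
    """Fréchet-Nikodym-Aronszajn distance with the counting measure: d(x, y) = |x △ y|."""
    return len(x ^ y)

x, y, z = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
d = symmetric_difference_distance
assert d(x, z) <= d(x, y) + d(y, z)  # triangle inequality
print(d(x, y), d(y, z), d(x, z))  # 2 2 4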
For better illustration, let us suppose two Lebesgue measurable sets A, B. Let us imagine that these sets are described by non-negative real-valued functions f, g in the Cartesian system R^1 or R^2 (A corresponds to f and B corresponds to g) [15].
The shaded gray area at the top of Figure 1 essentially shows the distance between objects. We can calculate the area between the functions f and g, which corresponds to the area between the sets A and B:

d(A, B) = ∫ | f(t) − g(t)| dt.

Conversely, the shaded gray area at the bottom of Figure 1, within the overlapping regions of A and B and under both graphs f and g, represents the similarity between those objects. Analogously, we may deduce the calculation

s(A, B) = ∫ min( f(t), g(t)) dt.

This fundamental observation allows us to create a bridge between set theory and topology, such as the theories of metric spaces and similarity spaces. From the definition of the similarity s(x, y), it can be deduced that the number of features shared between two objects x and y is given by their intersection µ(x ∩ y). The idea behind the definition is very simple, direct and intuitive, assuming that a similarity metric is a measure: s(x, y) = µ(x ∩ y).
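A numerical sketch of the two areas, assuming the integral forms above (the Gaussian membership functions are our arbitrary choice):

import numpy as np

t = np.linspace(-3.0, 3.0, 1001)
dt = t[1] - t[0]
f = np.exp(-(t + 0.5) ** 2)  # describes set A
g = np.exp(-(t - 0.5) ** 2)  # describes set B

d_AB = np.sum(np.abs(f - g)) * dt     # area between the curves: d(A, B)
s_AB = np.sum(np.minimum(f, g)) * dt  # overlap area under both curves: s(A, B)
print(d_AB, s_AB)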
Theorem 8 (Similarity of Two Objects). The intersection of two objects represented by subsets x and y is a similarity metric:

s(x, y) = µ(x ∩ y).

Proof. Appendix A.8.
Now we can generalize our knowledge using the similarity axioms.
Corollary 1 (Similarity of Two Objects using Duality). The similarity metric of two objects given by subsets x, y ∈ X is expressed as

s(x, y) = µ(x ∩ y) = (s(x, x) + s(y, y) − d(x, y))/2.

Proof. Appendix A.9.
As a result of the proof, the self-similarity is equivalent to the measure of the set, µ(x); e.g., for the cardinality of a countable set, s(x, x) = |x| and, respectively, s(y, y) = |y|. We can also go back from the similarity metric to the distance metric.
Corollary 2 (Distance between Two Objects using Duality). The distance metric applied to two objects defined by subsets x, y ∈ X is given by

d(x, y) = s(x, x) + s(y, y) − 2s(x, y).

Proof. Expressing d(x, y) from Corollary 1.
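A quick check of this duality with the counting measure on feature sets (a sketch):

def s(a: set, b: set) -> int:
    """Similarity as the measure of the intersection (Theorem 8)."""
    return len(a & b)

def d(a: set, b: set) -> int:
    """Distance via duality (Corollary 2)."""
    return s(a, a) + s(b, b) - 2 * s(a, b)

x, y = {"red", "round", "small"}, {"red", "square"}
print(d(x, y))     # 3: features not shared
print(len(x ^ y))  # 3: equals the symmetric difference measure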
Corollary 3 (Total Dissimilarity using Duality). The total dissimilarity between two objects is given by

s(x, y) = 0 ⇐⇒ d(x, y) = s(x, x) + s(y, y).

Proof. Appendix A.10.
Total dissimilarity should mean that there are no features shared between the two objects. In set theory, this is equivalent to being a pair of disjoint sets.
Corollary 4 (Duality of Axiomatic Systems). Consider a similarity space (X, s) and a metric space (X, d). We can define a similarity s on X, dual to the metric d, and vice versa a distance metric d on X, dual to the similarity s, as follows:

s(x, y) = f(d(x, y)) = (s(x, x) + s(y, y) − d(x, y))/2,  d(x, y) = f^{-1}(s(x, y)) = s(x, x) + s(y, y) − 2s(x, y).

Proof. Appendix A.11.
Comparing the similarity axiom system with the partial metrics from Definition 2, we can see the relation p(x, y) = f^{-1}(s(x, y)) = s(x, x) + s(y, y) − 2s(x, y), in dependence on Corollaries 2 and 4, which differs from the source [16].
As has been proved, this similarity forms the bridge between the Jaccard similarity (see Theorem 13) and similarity metrics derived from distances. From this equation one can deduce that

µ(x ∪ y) = (s(x, x) + s(y, y) + d(x, y))/2.
We can also return from a normalized similarity metric to a normalized distance metric.
Theorem 11 (Generalized Rozinek Normalized Distance). The generalized Rozinek normalized distance is the following normalized distance metric:

R_GDn(x, y) = 2d(x, y) / (s(x, x) + s(y, y) + d(x, y)).

Proof. Appendix A.14.
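A sketch of this conversion for feature sets with the counting measure, where s(x, x) = |x| and d is the symmetric difference measure:

def rozinek_normalized_distance(x: set, y: set) -> float:
    """Generalized Rozinek normalized distance from set similarity and distance."""
    s_xx, s_yy = len(x), len(y)
    d_xy = len(x ^ y)
    return 2 * d_xy / (s_xx + s_yy + d_xy)

x, y = {1, 2, 3}, {2, 3, 4}
print(rozinek_normalized_distance(x, y))  # 2*2 / (3 + 3 + 2) = 0.5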

Applications of Similarity Spaces
We will show how this is connected to some already well-known coefficients from previous developments of similarity space theory.
The Jaccard similarity, J_S(x, y) = µ(x ∩ y)/µ(x ∪ y), is a fundamental similarity measure on sets. Wherever it is used, it is mainly called an index or a coefficient, but it is never called a proper similarity metric. Note that the lack of a mathematical foundation for similarity metrics imposes the necessity of transforming the Jaccard index into the Jaccard distance J_D(x, y) = 1 − J_S(x, y) and then verifying the triangle inequality J_D(x, z) ≤ J_D(x, y) + J_D(y, z) for that distance [14].
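For completeness, a minimal implementation of both quantities on finite sets:

def jaccard_similarity(x: set, y: set) -> float:
    """J_S(x, y) = |x ∩ y| / |x ∪ y| for sets with non-empty union."""
    return len(x & y) / len(x | y)

def jaccard_distance(x: set, y: set) -> float:
    """J_D(x, y) = 1 - J_S(x, y)."""
    return 1.0 - jaccard_similarity(x, y)

x, y, z = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
assert jaccard_distance(x, z) <= jaccard_distance(x, y) + jaccard_distance(y, z)
print(jaccard_similarity(x, y))  # 2/4 = 0.5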
The distance derived from the Ruzicka similarity, d_n(x, y) = 1 − J_G(x, y), is known under the name 'Soergel distance' [3].
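A sketch of the Ruzicka (generalized Jaccard) similarity in its common vector form (Σ min / Σ max on non-negative feature vectors) together with the Soergel distance; the vector formulation is the widely used one, and the variable names are ours:

def ruzicka_similarity(u, v):
    """Generalized Jaccard for non-negative vectors: sum(min) / sum(max)."""
    num = sum(min(a, b) for a, b in zip(u, v))
    den = sum(max(a, b) for a, b in zip(u, v))
    return num / den

def soergel_distance(u, v):
    return 1.0 - ruzicka_similarity(u, v)

u, v = [1.0, 2.0, 0.0], [0.5, 2.0, 1.0]
print(ruzicka_similarity(u, v), soergel_distance(u, v))  # 2.5/4.0 = 0.625, 0.375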
Gaussian similarity is relevant to the natural human and animal perception of similarity, based on psychological research [19], where it is shown that the response to a stimulus decays exponentially with the distance. Numerous experiments have provided empirical observations of learned responses to some measure of different stimuli. As the independent variable, a physical measure of the difference between two stimuli has been chosen: for example, the difference in wavelengths of light, frequencies of tones or angular orientations of shapes.

Theorem 19 (Rozinek Natural Distance). The Rozinek natural distance is a distance metric:

R_ND(x, y) = √(−ln s_n(x, y)).
Proof. Expressing d(x, y) from Theorem 18.
This distance is derived from the Gaussian similarity and describes the inverse problem of how human and animal perception treats a distance depending on the similarity. In addition, there is a limit lim_{s(x,y)→0+} R_ND(x, y) = +∞.
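A round-trip sketch, assuming the simplest Gaussian form s(x, y) = exp(−d(x, y)²); the exact parametrization may include a scale factor:

import math

def gaussian_similarity(d: float) -> float:
    """Similarity decaying exponentially with the squared distance."""
    return math.exp(-d * d)

def natural_distance(s: float) -> float:
    """Inverse transform: d = sqrt(-ln s); diverges as s -> 0+."""
    return math.sqrt(-math.log(s))

d = 1.7
s = gaussian_similarity(d)
print(s, natural_distance(s))   # recovers d = 1.7
print(natural_distance(1e-12))  # large distance for near-total dissimilarity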
In cases where the similarity measurement depends only on the distance and Jaccard-like similarities cannot be used directly, for example, for an edit distance (also called the k difference problem [20]), our similarity metric is very appropriate. We can see an analogy between the k difference problem and the symmetric difference x △ y in set theory.
Theorem 20 (Normalized Edit Similarity). The normalized edit similarity is a Rozinek similarity over the alphabet Σ:

R(x, y) = (|x| + |y| − d(x, y)) / (|x| + |y| + d(x, y)),

where d(x, y) is an edit distance.
The normalized edit similarity is suitable for conversion from the Levenshtein distance or for the normalization of the longest common subsequence (LCS). The procedure is shown in the proofs.
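A sketch combining a standard Levenshtein implementation (unit costs) with the conversion of Theorem 20:

def levenshtein(x: str, y: str) -> int:
    """Classic dynamic-programming edit distance (unit costs)."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, 1):
        curr = [i]
        for j, cy in enumerate(y, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cx != cy)))   # substitution
        prev = curr
    return prev[-1]

def normalized_edit_similarity(x: str, y: str) -> float:
    """Rozinek similarity from Theorem 20 with the Levenshtein distance."""
    d = levenshtein(x, y)
    return (len(x) + len(y) - d) / (len(x) + len(y) + d)

print(normalized_edit_similarity("kitten", "sitting"))  # d = 3 -> 10/16 = 0.625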

Definition 5 (Disjoint Strings). Let Σ be a finite alphabet, and let Σ* denote the set of all finite strings over Σ. Given any two strings x and x_d in Σ*, we say that they are disjoint strings if they have no character or symbol in common.

From the meaning of similarity, we should measure the number of features shared by two objects. If they have no common features, they are disjoint from each other in their features. Indeed, between two strings there are no common features if they have no common symbols or letters; hence this is the maximum possible dissimilarity (or, dually, the minimum possible similarity).

Theorem 21 (Total Dissimilarity of Strings). The total dissimilarity of strings x and y over the alphabet Σ occurs if and only if x and y are disjoint strings, in which case s(x, y) = 0.
Proof. Trivial.
The derived property of total dissimilarity results from the property of the alphabet set Σ, which contains distinguishably different objects, namely its letters or symbols. We should always have a textual similarity of zero if two strings x and y have no character or symbol in common. Take, for example, x = "abc" and y = "def"; it does not make much sense for them to have a positive string similarity s(x, y) > 0.
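A tiny check of this property under the character-set view (a simplification of Definition 5):

def are_disjoint(x: str, y: str) -> bool:
    """Strings are disjoint if they share no symbol."""
    return not (set(x) & set(y))

def shared_symbols(x: str, y: str) -> int:
    """Count of shared symbols, zero exactly for disjoint strings."""
    return len(set(x) & set(y))

print(are_disjoint("abc", "def"), shared_symbols("abc", "def"))  # True 0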

Results
We have proposed a formal generalized mathematical theory of similarity space and similarity functions. General relations for converting a metric to a similarity were derived, and general solutions for the normalization of a given similarity space or metric space were introduced. All proofs are attached as appendices. The highlights of the presented concepts are as follows:
- Development of a new revised theory of similarity space.
- The main contribution is a direct explanation and unified theory of the duality between a similarity space (similarity coefficients) and a metric space. Similarity spaces are as important as metric spaces and should be used wherever similarity measurements are made, avoiding the confused notion of a dual to the distance.
- New Rozinek similarities and distances, using the duality between similarity spaces and metric spaces, have been derived on the basis of set theory. In principle, they are equivalent to a measure of set intersection or Jaccard similarity. This point of view has a general application in transforming any distance metric into a similarity metric, and back into a distance metric.

Conclusions
Similarity functions are used in different areas of research, from data mining to protein sequence comparison. This paper introduced a generalized mathematical theory of similarity space, which leads to many already used similarity and distance functions.
The main novelty of the approach is a unified theory of the duality between similarity and metric spaces. From this unified theory, it is possible to derive all the widely used functions, such as the Jaccard coefficient, Tanimoto coefficient, Steinhaus distance, Ruzicka similarity, Soergel distance, Gaussian similarity, edit distance and edit similarity.
Moreover, we introduced new Rozinek similarity metrics and distance metrics based on set intersection, Jaccard-like coefficients, the Gaussian function and edit similarity. The novelty and benefit of Rozinek metrics is an easy way to transform any distance metric into a similarity metric, and vice versa.

Future Work
In our future work, we will mainly focus on:
- Development of a novel definition of a space based on elementary mathematical particles from set theory. It is possible to imagine them as 'basic' particles that are indivisible in the space, with the space built on them. Therefore, we should be able to describe some relationships with a wider area of applications.

Appendix A.2. Proof of Theorem 1

Let s̄(x, y) be a positive linear transformation of s(x, y) such that s̄(x, y) = αs(x, y) + β for α > 0 and β ≥ 0.
S1. By symmetry, multiplying by α and adding β:
s(x, y) = s(y, x) =⇒ αs(x, y) + β = αs(y, x) + β =⇒ s̄(x, y) = s̄(y, x).
S2. By the triangle inequality, we obtain
s(x, z) + s(y, y) ≥ s(x, y) + s(y, z),
α(s(x, z) + s(y, y)) + 2β ≥ α(s(x, y) + s(y, z)) + 2β,
s̄(x, z) + s̄(y, y) ≥ s̄(x, y) + s̄(y, z).
Multiplying by α and adding 2β completes this step. We proceed similarly in cases S3 and S4.

Appendix A.3. Proof of Theorem 2

The normalization is a positive linear transformation of the similarity metric, and Theorem 1 implies that the similarity axioms are preserved; the bounds follow from the limit conditions N3 and N4.

Appendix A.4. Proof of Theorem 4
We continue by proving each axiom, assuming s_C(x, y) = Σ_i λ_i s_n,i(x, y) with λ_i ≥ 0 and Σ_i λ_i = 1. N1 follows from the symmetry of each s_n,i; similarly, N2, N3 and N4 are trivial.

Appendix A.5. Proof of Theorem 6
A real-valued function f(d) is said to be convex over the interval [a, b] ⊂ R if for any d_1, d_2 ∈ [a, b] and any λ ∈ [0, 1] we have

f(λd_1 + (1 − λ)d_2) ≤ λ f(d_1) + (1 − λ) f(d_2).

The validity of the triangle inequality s(x, z) + s(y, y) ≥ s(x, y) + s(y, z) can be proven from the dual notion of distance s(x, y) = f(d(x, y)) by applying d(x, z) ≤ d(x, y) + d(y, z) and considering the possible cases as follows [13].
Case 1: d(x, z) ≤ d(x, y). Since f is monotonically decreasing, f(d(x, z)) ≥ f(d(x, y)) and f(d(y, y)) = f(0) ≥ f(d(y, z)), so the inequality holds.
Case 2: d(x, z) ≤ d(y, z). The reasoning is analogous to the above, just flipping x and z.
Case 3: d(x, z) > max(d(x, y), d(y, z)). As a metric is assumed, d(x, z) ≤ d(x, y) + d(y, z). Hence 1 − d(y, z)/d(x, z) ≤ d(x, y)/d(x, z). Let us pick any λ such that 1 − d(y, z)/d(x, z) ≤ λ ≤ d(x, y)/d(x, z); obviously 0 ≤ λ ≤ 1. We see immediately that λd(x, z) ≤ d(x, y) and (1 − λ)d(x, z) ≤ d(y, z). From the definition of convexity, we have

f(d(x, y)) ≤ f(λd(x, z)) = f(λd(x, z) + (1 − λ) · 0) ≤ λ f(d(x, z)) + (1 − λ) f(0),

with the first inequality being due to the fact that f is monotonically decreasing. Similarly,

f(d(y, z)) ≤ f((1 − λ)d(x, z)) ≤ (1 − λ) f(d(x, z)) + λ f(0).

By summing both inequalities, we get

f(d(x, y)) + f(d(y, z)) ≤ f(d(x, z)) + f(0) = s(x, z) + s(y, y),

so the triangle inequality holds here too.

Appendix A.6. Measure Space Definition
A measurable space is a set X together with a σ-ring S of subsets of X with the property that ⋃S = X. A measure is an extended real-valued, non-negative, countably additive set function µ, defined on a σ-ring S, such that µ(∅) = 0. An ordered triple (X, S, µ) is called a measure space.
The meaning of this definition lies in the abstraction of measurement on a countable set (given by cardinality) or on a Lebesgue measurable set. For more details, we refer the reader to [21].
Appendix A.7. Proof of Theorem 7

We must show that the distance equal to the symmetric difference of two sets, d(x, y) = µ(x △ y), is a metric [21-23]. If µ is a σ-finite measure on a σ-ring S, this function is a pseudometric on S (D1 and D2 must be satisfied): for x, y, z ∈ S, the inclusion x △ z ⊆ (x △ y) ∪ (y △ z) yields

d(x, z) = µ(x △ z) ≤ µ(x △ y) + µ(y △ z) = d(x, y) + d(y, z).

The relation (D3) x ∼ y ⇐⇒ d(x, y) = 0 is an equivalence relation on S, so d becomes a metric on the set of equivalence classes. We also need to prove sequential continuity with a Cauchy sequence {x_n}, n ∈ N_0, i.e., lim_{m,n→∞} µ(x_m △ x_n) = 0. This implies that the sequence converges to a limit set x with lim_{n→∞} µ(x_n △ x) = 0. We call d the symmetric difference metric. The symmetric difference between two sets can be considered a measure of how 'far apart' they are.
Appendix A.8. Proof of Theorem 8

First, we prove the relation for the intersection of the two objects. The conditions S1, S3 and S4 are trivial. We show only S2. Since y ⊇ (x ∩ y) ∪ (z ∩ y), we have

µ(y) ≥ µ((x ∩ y) ∪ (z ∩ y)) = µ(x ∩ y) + µ(z ∩ y) − µ(x ∩ y ∩ z),

and, consequently, since x ∩ y ∩ z ⊆ x ∩ z implies µ(x ∩ y ∩ z) ≤ µ(x ∩ z),

µ(x ∩ z) + µ(y) ≥ µ(x ∩ y) + µ(y ∩ z).

This yields the desired triangle inequality.
Appendix A.9. Proof of Corollary 1

The self-similarity can be derived from Theorem 8: s(x, x) = µ(x ∩ x) = µ(x). Similarly, we obtain s(y, y) = µ(y). We substitute these terms, together with d(x, y) = µ(x △ y) = µ(x) + µ(y) − 2µ(x ∩ y), into s(x, y) = µ(x ∩ y) and obtain s(x, y) = (s(x, x) + s(y, y) − d(x, y))/2.

Appendix A.13. Proof of Theorem 10

First, we prove that R(x, y) is a normalized similarity metric in Appendix A.12; then we substitute it into the equation of Theorem 9.

Appendix A.14. Proof of Theorem 11

We express a direct relationship between the Jaccard distance (Theorem 14) and the generalized Rozinek normalized distance:

2d(x, y) / (s(x, x) + s(y, y) + d(x, y)) = R_GDn(x, y).

Obviously, conditions D1 and D3 are satisfied. We refer to [14] for D2.
In the last step, we discretized the continuous functions, where δ may be chosen as a sampling length. The relation of J_S(x, y) to R_GS(x, y) is derived in Appendix A.13.

Case 2: Longest Common Subsequence (LCS)
We obtain the same results when we normalize the LCS. Let l be the length of the LCS [25]:

l(x, y) = (|x| + |y| − d_LCS(x, y))/2, (A39)

where l(x, y) satisfies the similarity axioms from Definition 3 and d_LCS denotes the edit distance based on unit insertion and deletion costs [1]. Now we turn our attention to normalizing the similarity by evaluating a generalized Tanimoto coefficient [1,26]:

S(x, y) = s(x, y) / (s(x, x) + s(y, y) − s(x, y)).

Here s(x, y) is interpreted as a count of common features, while S(x, y) expresses this count as a fraction of the total number of features of x and y. We set s(x, y) = l(x, y) and hence obtain

S(x, y) = l(x, y) / (|x| + |y| − l(x, y)).

Since l(x, x) = |x| and l(y, y) = |y|, we elaborate the above expressions to

S(x, y) = (|x| + |y| − d_OM(x, y)) / (|x| + |y| + d_OM(x, y)),

where d_OM is an edit distance (for details, see [1]). Hence we have proved that also S(x, y) = R(x, y).
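A sketch of this LCS-based normalization, with a standard dynamic-programming LCS; the printed value matches the Tanimoto-style fraction derived above:

def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, 1):
            curr.append(prev[j - 1] + 1 if cx == cy else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_normalized_similarity(x: str, y: str) -> float:
    """Generalized Tanimoto coefficient with s(x, y) = l(x, y)."""
    l = lcs_length(x, y)
    return l / (len(x) + len(y) - l)

print(lcs_normalized_similarity("banana", "ananas"))  # l = 5 -> 5/7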