Distribution of Distances between Elements in a Compact Set

In this article, we review studies evaluating the distribution of distances between elements of a random set independently and uniformly distributed over a region of space in a normed R-vector space (for example, point events generated by a homogeneous Poisson process in a compact set). The distribution of distances between individuals arises in many situations where interaction depends on distance, and it concerns many disciplines, such as statistical physics, biology, ecology, geography, networking, etc. After reviewing the solutions proposed in the literature, we present a modern, general and unified resolution method using convolution of random vectors. We apply this method to typical compact sets: segments, rectangles, disks, spheres and hyperspheres. We show, for example, that in a hypersphere the distribution of distances has a typical shape and is polynomial for odd dimensions. We also present various applications of these results and we show, for example, that the variance of distances in a hypersphere tends to zero when the space dimension increases.


Introduction
The distribution of distances between elements in a set of points arises in many problems, particularly in spatial analysis, and in various fields of application: ecology, epidemiology, forestry, biology, astronomy, economics, particle physics, network applications, etc. [1]. For example, given two points randomly selected in a set of points independently and uniformly distributed in space, we want to know how likely the distance between these two points is among the distances between all pairs of points (Figure 1). This question is important when trying to evaluate or model spatial interactions between elements, such as clustering of objects, spatial autocorrelation of a variable across a set of locations, or neighbor relationships and connectivity [2]. Indeed, many problems in nature involve distance-based interactions between events or elements. For example, most methods used to measure spatial autocorrelation or to model spatial interactions are based on a weighted average of a variable between pairs of elements in a disk [2][3][4][5]. If such an index measures a phenomenon related to the distance between the elements, the index may favor the pairs of elements at the most likely distances. To avoid such bias, it is necessary to know the distribution of distances and to consider the relationship between the distance and the phenomenon independently of the distribution of distances between all pairs of elements [2].
Figure 1. Distance between two randomly chosen points in a set of points independently and uniformly distributed in a disk (as generated by a homogeneous Poisson process). Among all possible distances between points, how likely would this distance be? Would it be below or above average?
The distribution of distances between two randomly selected points in a compact set has been studied for a long time. However, the results are fragmented because they are presented in different articles, with different methods of resolution, depending on the dimension of the space and the type of compact set studied. In this article, we first present a literature review of these results. We then propose a unified method of resolution which uses only standard mathematical objects, and which is generalizable to any type of compact set in any dimension. We describe this general approach and use it to calculate distributions of Euclidean distances between two randomly chosen points for compact sets such as lines, rectangles, disks and cubes, and to derive specific results for hyperspheres of any dimension.

Literature Review
The task of distance distribution estimation between points is related to stochastic geometry [6].

Rayleigh Distribution
The Nobel Prize-winning physicist Lord Rayleigh (1842-1919) solved a slightly simpler problem than the one studied in this article: he modelled the distribution (known as the Rayleigh distribution) of Euclidean distances between a central point and a set of points normally distributed around this central point in a real vector space of dimension two [7].
With the central point placed at the origin, the problem addressed by Rayleigh was to evaluate the distribution of $\lVert \vec{v} \rVert$, i.e., $\sqrt{x_1^2 + x_2^2}$ in Cartesian coordinates for the Euclidean distance, where the vector $\vec{v}$ is the realization of two real random variables $x_1$ and $x_2$ that are independent but generated by the same density function:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right),$$
where σ, the standard deviation of this normal distribution, sets the concentration of points around the center.
The distribution of distances to the center can then be easily obtained by using the independence of the two random variables and by using polar coordinates:
$$f_D(r) = \frac{r}{\sigma^2} \exp\left(-\frac{r^2}{2\sigma^2}\right), \quad r \ge 0.$$
Many phenomena in various fields such as image processing, signal processing, particle physics, etc., follow a Rayleigh distribution.
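As a quick numerical illustration of Rayleigh's setting, the following minimal sketch draws points whose two coordinates are independent centered normal variables and compares the observed distances to the origin with the standard Rayleigh mean $\sigma\sqrt{\pi/2}$ and distribution function $1 - \exp(-r^2 / (2\sigma^2))$. The function names, the value of `sigma`, the sample size and the seed are illustrative assumptions, not taken from the article.

```python
import math
import random

def sample_distances(sigma, n, seed=0):
    """Distances to the origin of n points with independent N(0, sigma^2) coordinates."""
    rng = random.Random(seed)
    return [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]

def rayleigh_cdf(r, sigma):
    """Distribution function of the Rayleigh distribution with parameter sigma."""
    return 1.0 - math.exp(-r * r / (2.0 * sigma * sigma))

sigma = 2.0  # illustrative value
distances = sample_distances(sigma, 100_000)

empirical_mean = sum(distances) / len(distances)
theoretical_mean = sigma * math.sqrt(math.pi / 2.0)

# Share of simulated distances below sigma, to be compared with the Rayleigh CDF.
empirical_cdf_at_sigma = sum(d <= sigma for d in distances) / len(distances)

print(empirical_mean, theoretical_mean)
print(empirical_cdf_at_sigma, rayleigh_cdf(sigma, sigma))
```

The two printed pairs should agree up to sampling noise; the comparison is only a sanity check, not a proof.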

Distance Distribution between Two Random Points Iud in a Region of $\mathbb{R}^n$
More recently, in the 20th century, spatial point processes in one or two dimensions and related spatial properties, such as void or contact distributions or the Euclidean distance distribution between k-neighbors, started to receive special attention [8]. Nevertheless, "the research on distributions of distances in point processes of dimensions higher than one have never been an issue of systematic research and have been performed in rather ad hoc way in the past" ([1], p. 2). The problem was not addressed holistically, but case by case, depending on the field of application and the geometric form considered, in two or three dimensions, essentially circles and spheres or rectangles and cubes.
For example, in $\mathbb{R}^2$, for two random points $\vec{U}$ and $\vec{V}$ in a circle of radius R, the geometric resolution of the distribution of $\lVert \vec{U} - \vec{V} \rVert$ described in [1] uses Crofton's fixed-point theorem and the mean value theorem [23], and the result has been known since the end of the 19th century:
$$f_D(t) = \frac{4t}{\pi R^2} \arccos\left(\frac{t}{2R}\right) - \frac{2t^2}{\pi R^3} \sqrt{1 - \frac{t^2}{4R^2}}, \quad 0 \le t \le 2R.$$
Again in $\mathbb{R}^2$, the distribution of Euclidean distances between two random points in a rectangle has long been addressed [24], and an analytical resolution is presented in [25]. More recently, other studies have focused on polygons [26].
The distribution of Euclidean distances for a cube has also been addressed [27][28][29][30], but without a general formulation for any dimension. The distribution function of this random variable seems not to have been known before 1978 [27]. Robbins's constant [31] was defined as the mean Euclidean distance between two random points in a unit cube.
As this is one of the many random quantities studied in geometric probability, the results were extended to dimensions four and five [33], but for higher dimensions the growing algebraic complexity of the derivations was a strong limiting factor. These results can have practical applications in multidimensional analysis and data mining.

A Unified Method for the Evaluation of Euclidian Distance Distributions between Two Randomly Chosen Points
We now present an original approach using a unified method generalizable to any type of compact set in any dimension. The mathematical formalization and resolution use only well-known objects and methods such as random variables and density functions, convolution, marginal distributions, and some standard functions (Gamma, Beta). We will use this approach to calculate distributions of Euclidean distances between two randomly chosen points for hyperspheres and hypercubes of any dimension, and thereby confirm the results already known in the literature and mentioned above.

Mathematical Formalization
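In this article, the Euclidean norm will be considered: for $\vec{v} = (v_1, \dots, v_n) \in E$, $\lVert \vec{v} \rVert = \sqrt{v_1^2 + \dots + v_n^2}$. $B(\vec{u}, r)$ is the closed ball of center $\vec{u}$ and radius r, which corresponds to all the vectors $\vec{v}$ of E whose distance to $\vec{u}$ is less than or equal to r: $B(\vec{u}, r) = \{\vec{v} \in E : \lVert \vec{v} - \vec{u} \rVert \le r\}$. D represents the distance between two vectors $\vec{U}$ and $\vec{V}$ obtained randomly in E, i.e., $D = \lVert \vec{U} - \vec{V} \rVert$. Our problem is to determine the probability density of D from the process H, i.e., to determine the function $f_D$ such that
$$p(D \le r) = \int_0^r f_D(t)\, dt.$$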

Using Convolution of Density Functions
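Considering that $p(D \le r) = p(\vec{U} - \vec{V} \in B(\vec{0}, r))$, the problem reduces to finding the density of $\vec{U} - \vec{V}$. When E is one-dimensional, $\vec{U}$ and $\vec{V}$ are simply independent real random variables U and V. Convolution can be used to find the density $f_{U+V}$ from the density functions $f_U$ and $f_V$ [34],
$$f_{U+V}(x) = (f_U * f_V)(x) = \int_{\mathbb{R}} f_U(y)\, f_V(x - y)\, dy,$$
which remains true in higher dimensions.
We have $f_{-\vec{V}}(\vec{x}) = f_{\vec{V}}(-\vec{x})$. Therefore,
$$f_{\vec{U}-\vec{V}}(\vec{x}) = (f_{\vec{U}} * f_{-\vec{V}})(\vec{x}) = \int_E f_{\vec{U}}(\vec{y})\, f_{\vec{V}}(\vec{y} - \vec{x})\, d\vec{y}.$$
As a consequence, $f_{\vec{U}-\vec{V}}$ is known as soon as $f_{\vec{U}}$ and $f_{\vec{V}}$ are known. As such, we can obtain the distribution of D:
$$p(D \le r) = \int_{B(\vec{0}, r)} f_{\vec{U}-\vec{V}}(\vec{x})\, d\vec{x}.$$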
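To make the convolution step concrete, here is a minimal one-dimensional sketch. It approximates the density of U − V through the standard identity $f_{U-V}(x) = \int f_U(y)\, f_V(y - x)\, dy$ with a midpoint rule. The helper names, the integration bounds and the choice of two uniform variables on [0, 1] are illustrative assumptions; for that choice the exact result is the triangular density 1 − |x| on [−1, 1].

```python
def density_of_difference(f_u, f_v, x, lo, hi, steps=2000):
    """Midpoint-rule approximation of f_{U-V}(x) = integral of f_u(y) * f_v(y - x) dy on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += f_u(y) * f_v(y - x)
    return total * h

def uniform01(y):
    """Density of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= y <= 1.0 else 0.0

# The integration window [-1, 2] covers the support of both factors for |x| <= 1.
value_at_0 = density_of_difference(uniform01, uniform01, 0.0, -1.0, 2.0)
value_at_half = density_of_difference(uniform01, uniform01, 0.5, -1.0, 2.0)

print(value_at_0, value_at_half)  # close to 1 - |x|, i.e. about 1.0 and 0.5
```

The same identity holds in higher dimensions, where the integral runs over E instead of the real line; only the quadrature becomes more expensive.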

Distribution with Random Set of Points Iud in a Compact Set
Compact sets are convenient to model any type of spatial region of any shape and with finite size (which is always the case in reality). We assume in the following that the set F is a set of elements iud with uniform density ρ in a compact K, corresponding to a homogeneous Poisson process of density ρ on K. So, we have:
$$f_{\vec{U}}(\vec{x}) = f_{\vec{V}}(\vec{x}) = \frac{1_K(\vec{x})}{\lambda(K)},$$
where $1_K$ is the indicator function of K, and λ the Lebesgue measure in E.
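Using the previously presented tools,
$$f_{\vec{U}-\vec{V}}(\vec{x}) = \frac{\lambda(K \cap (K + \vec{x}))}{\lambda(K)^2}, \qquad p(D \le r) = \frac{1}{\lambda(K)^2} \int_{B(\vec{0}, r)} \lambda(K \cap (K + \vec{x}))\, d\vec{x}.$$
Because λ is a translation-invariant measure, p(D ≤ r) is not affected by the position of K inside E but only by the "shape" of K and of $K \cap (K + \vec{x})$. This translational invariance is very intuitive; distances inside a spatial area are never affected by the global position of the area: for any $\vec{a} \in E$,
$$\lambda\big((K + \vec{a}) \cap (K + \vec{a} + \vec{x})\big) = \lambda(K \cap (K + \vec{x})).$$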
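The translational invariance can be checked empirically. The sketch below estimates p(D ≤ r) by Monte Carlo for a compact set described only by an indicator function and a bounding box (points are drawn by rejection sampling), then compares a unit square with a translated copy. All names, the shift, the sample size and the seed are illustrative assumptions; the reference value π − 2 − 1/6 is the share of distances below the side length that the article reports for a square.

```python
import math
import random

def sample_point(indicator, bbox, rng):
    """Draw one point uniformly in K by rejection sampling inside the bounding box."""
    (xmin, xmax), (ymin, ymax) = bbox
    while True:
        p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if indicator(p):
            return p

def estimate_cdf(indicator, bbox, r, n_pairs=50_000, seed=1):
    """Monte Carlo estimate of p(D <= r) for two iud points in K."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        u = sample_point(indicator, bbox, rng)
        v = sample_point(indicator, bbox, rng)
        if math.dist(u, v) <= r:
            hits += 1
    return hits / n_pairs

def square(shift):
    """Indicator function and bounding box of the unit square translated by `shift`."""
    sx, sy = shift
    indicator = lambda p: sx <= p[0] <= sx + 1.0 and sy <= p[1] <= sy + 1.0
    return indicator, ((sx, sx + 1.0), (sy, sy + 1.0))

p_origin = estimate_cdf(*square((0.0, 0.0)), r=1.0)
p_shifted = estimate_cdf(*square((5.0, -3.0)), r=1.0)

print(p_origin, p_shifted, math.pi - 2.0 - 1.0 / 6.0)
```

Because only the indicator function is needed, the same estimator can be reused for any bounded planar shape, at the cost of more rejected draws when the shape fills little of its bounding box.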

Using Equation (3) for Typical Compact Sets
We will apply this general resolution formula for typical compact sets, especially to hypercubes and hyperspheres of any dimension.

K is a Segment in a 1-Dimension Space
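In dimension 1, the compact K is a segment, $K = [\alpha, \alpha + a]$ with $a > 0$. We can easily determine the well-known density function [9] (Figure 2). We have
$$\lambda(K \cap (K + x)) = \max(a - |x|, 0).$$
Then,
$$f_D(t) = \frac{2(a - t)}{a^2}, \quad t \in [0, a].$$
As the density is linear, it is easy to calculate the mean, variance and median (called m):
$$E(D) = \frac{a}{3}, \qquad \mathrm{Var}(D) = \frac{a^2}{18}, \qquad m = a\left(1 - \frac{1}{\sqrt{2}}\right).$$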
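The one-dimensional case is simple enough to check numerically. The sketch below assumes a segment of length `a` (a name chosen here for illustration) and uses the standard linear density $2(a - t)/a^2$ on [0, a], with mean a/3, variance a²/18 and median $a(1 - 1/\sqrt{2})$; it then compares these values with a simulation. Sample size and seed are arbitrary.

```python
import math
import random

def segment_density(t, a):
    """Density of the distance between two iud points on a segment of length a."""
    return 2.0 * (a - t) / (a * a) if 0.0 <= t <= a else 0.0

def segment_cdf(t, a):
    """Distribution function obtained by integrating segment_density from 0 to t."""
    t = min(max(t, 0.0), a)
    return (2.0 * a * t - t * t) / (a * a)

a = 3.0  # illustrative segment length
mean = a / 3.0
variance = a * a / 18.0
median = a * (1.0 - 1.0 / math.sqrt(2.0))

# Simulation: distance between two independent uniform points on [0, a].
rng = random.Random(42)
samples = [abs(rng.uniform(0.0, a) - rng.uniform(0.0, a)) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((d - sample_mean) ** 2 for d in samples) / len(samples)

print(mean, sample_mean)
print(variance, sample_var)
print(segment_cdf(median, a))  # should be 0.5 by definition of the median
```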

K Is a Rectangle in a 2-Dimensional Space
The two-dimensional rectangle displays a highly convenient property, namely that the x and y coordinates are statistically independent.
Let us introduce parameters for our rectangle:
$$K = [\alpha, \alpha + a] \times [\beta, \beta + b],$$
where $(a, b) \in \mathbb{R}_+^2$ and $(\alpha, \beta) \in \mathbb{R}^2$. Thanks to the remark previously made about translational invariance, (α, β) can be replaced by (0, 0) without changing the final result. Here, Equation (1) gives
$$f_{\vec{U}-\vec{V}}(x, y) = \frac{(a - |x|)(b - |y|)}{a^2 b^2}, \quad |x| \le a,\ |y| \le b.$$
Noticing that the longest distance inside K is $\sqrt{a^2 + b^2}$, the distribution function becomes, for $t \in [0, \sqrt{a^2 + b^2}]$,
$$p(D \le t) = \frac{1}{a^2 b^2} \iint_{x^2 + y^2 \le t^2,\ |x| \le a,\ |y| \le b} (a - |x|)(b - |y|)\, dx\, dy.$$
Clearly, we need to separate different cases:
• When t ∈ [0; min(a, b)], the calculation results in a polynomial density function for D: $f_D(t) = \frac{2t}{a^2 b^2}\left(\pi a b - 2(a + b)\, t + t^2\right)$.
• It is possible to calculate p(D ≤ t) explicitly for t > min(a, b). Nevertheless, the expression is no longer polynomial and ends up being much less simple than the integral form.
• If a < b, the proportion of the sample that follows a polynomial distribution is given by $p(D \le a) = \frac{a}{b^2}\left(\left(\pi - \frac{4}{3}\right) b - \frac{5}{6}\, a\right)$.
In the particular case where a = b (i.e., K is a square), the polynomial function describes the first $\pi - 2 - \frac{1}{6}$ (∼ 97%) of all distances. As an example, Figure 3 shows the graph of the density on a square (a) and inside some rectangles (b).
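The polynomial part of the rectangle result can be sketched as follows. The density used here, $f_D(t) = \frac{2t}{a^2 b^2}(\pi a b - 2(a + b)t + t^2)$ for t ≤ min(a, b), is the standard closed form obtained by integrating (a − |x|)(b − |y|) over a circle of radius t; it is stated as an assumption about the elided formula, and it does reproduce the value π − 2 − 1/6 quoted for the square. Function names, the 1 × 2 test rectangle, sample size and seed are illustrative.

```python
import math
import random

def rect_density_small(t, a, b):
    """Polynomial density of D, valid only for 0 <= t <= min(a, b)."""
    return 2.0 * t * (math.pi * a * b - 2.0 * (a + b) * t + t * t) / (a * a * b * b)

def rect_cdf_small(t, a, b):
    """Integral of rect_density_small from 0 to t (same validity range)."""
    return (math.pi * a * b * t**2 - 4.0 * (a + b) * t**3 / 3.0 + t**4 / 2.0) / (a * a * b * b)

def simulate_cdf(t, a, b, n_pairs=100_000, seed=7):
    """Monte Carlo estimate of p(D <= t) for two iud points in an a-by-b rectangle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        dx = rng.uniform(0.0, a) - rng.uniform(0.0, a)
        dy = rng.uniform(0.0, b) - rng.uniform(0.0, b)
        if math.hypot(dx, dy) <= t:
            hits += 1
    return hits / n_pairs

# Share of distances covered by the polynomial branch: t = min(a, b).
square_share = rect_cdf_small(1.0, 1.0, 1.0)  # unit square
rect_share = rect_cdf_small(1.0, 1.0, 2.0)    # 1 x 2 rectangle
rect_share_mc = simulate_cdf(1.0, 1.0, 2.0)

print(square_share, math.pi - 2.0 - 1.0 / 6.0)
print(rect_share, rect_share_mc)
```

For the elongated rectangle the polynomial branch covers a noticeably smaller share of the distances than for the square, which is why the non-polynomial branch matters more as the sides become unequal.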

K Is a Disk in a 2-Dimensional Space
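K is a disk, $K = B(\vec{c}, R)$, where $\vec{c}$ is the center of the circle and R its radius. It is obviously possible to limit our study to the case $\vec{c} = \vec{0}$. To calculate $f_D$ we need to integrate the area $S = \lambda(K \cap (K + \vec{x}))$ over all angles θ ∈ ]−π; π], where $\vec{x}$ has polar coordinates (t, θ). We can demonstrate the isotropy of S (the area remains the same for any value of θ). Let M(θ) be the rotation operator,
$$M(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
Since M(θ) leaves both K and the Lebesgue measure unchanged, S depends only on t. To calculate the surface S, let l(u) be the length of the chord for the u coordinate on the x-axis [35]:
$$l(u) = 2\sqrt{R^2 - u^2}.$$
This finally leads to the already-mentioned result above,
$$f_D(t) = \frac{4t}{\pi R^2} \arccos\left(\frac{t}{2R}\right) - \frac{2t^2}{\pi R^3} \sqrt{1 - \frac{t^2}{4R^2}}, \quad 0 \le t \le 2R,$$
2R being the longest distance possible inside a circle. Figure 4 shows the graph of the distribution of distances inside a circle. We can calculate the mean and variance of this distribution:
$$E(D) = \frac{128 R}{45\pi} \approx 0.905\, R, \qquad \mathrm{Var}(D) = R^2 - \left(\frac{128 R}{45\pi}\right)^2 \approx 0.180\, R^2.$$
If m stands for the median, the equation $\int_0^m f_D(t)\, dt = \frac{1}{2}$ cannot be solved analytically for m. Nevertheless, we can show that the median is only very slightly lower than the mean µ of the distribution.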
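Since the median equation has no analytical solution, a numerical treatment is natural. The sketch below assumes the classical disk density $f_D(t) = \frac{4t}{\pi R^2}\arccos\frac{t}{2R} - \frac{2t^2}{\pi R^3}\sqrt{1 - \frac{t^2}{4R^2}}$, integrates it with a midpoint rule to recover the mean 128R/(45π), and locates the median by bisection. Function names, step counts and the unit radius are illustrative choices.

```python
import math

def disk_density(t, R=1.0):
    """Density of the distance between two iud points in a disk of radius R."""
    if t < 0.0 or t > 2.0 * R:
        return 0.0
    x = t / (2.0 * R)
    return ((4.0 * t / (math.pi * R * R)) * math.acos(x)
            - (2.0 * t * t / (math.pi * R**3)) * math.sqrt(1.0 - x * x))

def integrate(f, lo, hi, steps=4000):
    """Midpoint-rule quadrature of f on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def disk_median(R=1.0, iterations=40):
    """Bisection on the numerically integrated distribution function."""
    lo, hi = 0.0, 2.0 * R
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if integrate(lambda t: disk_density(t, R), 0.0, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

R = 1.0
total = integrate(lambda t: disk_density(t, R), 0.0, 2.0 * R)
mean = integrate(lambda t: t * disk_density(t, R), 0.0, 2.0 * R)
second_moment = integrate(lambda t: t * t * disk_density(t, R), 0.0, 2.0 * R)
variance = second_moment - mean**2
median = disk_median(R)

print(total, mean, 128.0 * R / (45.0 * math.pi))
print(variance, median)
```

The bisection recomputes the integral at every step, which is wasteful but keeps the sketch short; a cumulative sum over a fixed grid would be the usual optimization.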

K Is a Sphere in a Three-Dimensional Space
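When using Formula (2), it is clear that when the dimension of the space is equal to three or more, the issue is simply to calculate a volume (or hypervolume). Calculations are quite similar to those used for the circle. Firstly, let us give the density of the random vectors $\vec{U}$ and $\vec{V}$:
$$f_{\vec{U}}(\vec{x}) = f_{\vec{V}}(\vec{x}) = \frac{1_K(\vec{x})}{\frac{4}{3}\pi R^3},$$
where $K = B(\vec{c}, R)$ is the sphere of center $\vec{c}$ with a radius R. Just like previously, replacing $\vec{c}$ with $\vec{0}$ will not change any results. We have:
$$p(D \le t) = \frac{9}{16\pi^2 R^6} \int_{B(\vec{0}, t)} \lambda(K \cap (K + \vec{x}))\, d\vec{x}.$$
Using spherical coordinates (r, θ, φ) gives us, ∀t ∈ R+,
$$p(D \le t) = \frac{9}{16\pi^2 R^6} \int_0^t \int_0^{\pi} \int_0^{2\pi} \lambda\big(K \cap (K + \vec{x}(r, \theta, \varphi))\big)\, r^2 \sin\theta\, d\varphi\, d\theta\, dr.$$
The determinant of a rotation matrix is still equal to 1. Therefore the angles (θ, φ) will not change the value of the volume: $\lambda(K \cap (K + \vec{x})) = V(r)$ depends only on $r = \lVert \vec{x} \rVert$. The volume of the intersection can be calculated similarly to the previous two-dimensional area. The difference is that, instead of a chord of length l(y), there is a whole disc for every y. Therefore:
$$V(r) = 2\int_{r/2}^{R} \pi (R^2 - y^2)\, dy = \frac{\pi}{12} (4R + r)(2R - r)^2.$$
Finally,
$$f_D(t) = \frac{3t^2}{R^3} - \frac{9t^3}{4R^4} + \frac{3t^5}{16R^6}, \quad 0 \le t \le 2R.$$
The density is polynomial in a three-dimensional space. Clearly, the third dimension favors the presence of longer distances:
$$E(D) = \frac{36}{35} R \approx 1.029\, R,$$
which is greater than the expected value calculated in a two-dimensional space. Calculating the median implies solving for m a polynomial equation of degree 6:
$$\frac{m^3}{R^3} - \frac{9 m^4}{16 R^4} + \frac{m^6}{32 R^6} = \frac{1}{2}.$$
Unfortunately, it is impossible to give a general solution. Nevertheless, $m_{app} = 1.033 \cdot R$ gives a very good approximation of m. This estimation shows that, this time, the median is greater than the mean. It is remarkable that, contrary to the 2D case, 3D distances are slightly more likely to be longer than average.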
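The three-dimensional case lends itself to an exact check, because everything is polynomial. The sketch below assumes the classical density $f_D(t) = 3t^2/R^3 - 9t^3/(4R^4) + 3t^5/(16R^6)$ on [0, 2R] for two iud points in a sphere of radius R, integrates it term by term to obtain the mean, and solves the degree-6 median equation by bisection. Function names and the unit radius are illustrative.

```python
def ball_cdf(t, R=1.0):
    """Distribution function of D: term-by-term integral of the polynomial density."""
    t = min(max(t, 0.0), 2.0 * R)
    return t**3 / R**3 - 9.0 * t**4 / (16.0 * R**4) + t**6 / (32.0 * R**6)

def ball_moment(k, R=1.0):
    """Exact k-th moment: integral of t**k * f_D(t) over [0, 2R], computed term by term."""
    d = 2.0 * R
    return (3.0 * d**(k + 3) / ((k + 3) * R**3)
            - 9.0 * d**(k + 4) / (4.0 * (k + 4) * R**4)
            + 3.0 * d**(k + 6) / (16.0 * (k + 6) * R**6))

def ball_median(R=1.0, iterations=60):
    """Bisection on the increasing function ball_cdf to solve ball_cdf(m) = 1/2."""
    lo, hi = 0.0, 2.0 * R
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if ball_cdf(mid, R) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

R = 1.0
normalization = ball_moment(0, R)  # should be 1
mean = ball_moment(1, R)           # should be 36 R / 35
median = ball_median(R)            # close to the approximation 1.033 R

print(normalization, mean, median)
```

Bisection is sufficient here because the distribution function is monotone on [0, 2R]; no polynomial root finder is needed.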
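Finally, let us calculate the variance:
$$\mathrm{Var}(D) = \frac{6R^2}{5} - \left(\frac{36R}{35}\right)^2 = \frac{174}{1225} R^2 \approx 0.142\, R^2,$$
which is, as expected, significantly lower than the two-dimensional variance.
In an n-dimensional space, the same reasoning gives, for a hypersphere of radius 1,
$$f_D(t) = \gamma_n\, t^{n-1} \int_{t/2}^{1} \mathrm{Vol}_{n-1}\left(\sqrt{1 - y^2}\right) dy,$$
where $\gamma_n$ is a constant and $\mathrm{Vol}_n$ the volume of the hypersphere in n-dimensional space:
$$\mathrm{Vol}_n(R) = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)} R^n.$$
Therefore, absorbing the constant factor $\mathrm{Vol}_{n-1}(1)$ into $\gamma_n$, we have
$$f_D(t) = \gamma_n\, t^{n-1} \int_{t/2}^{1} (1 - y^2)^{\frac{n-1}{2}}\, dy,$$
where $\gamma_n$ can be evaluated as the normalizing constant. The generalized binomial theorem [36] gives us the way to evaluate the integral,
$$(1 + x)^{\nu} = \sum_{k=0}^{\infty} \binom{\nu}{k} x^k, \qquad \binom{\nu}{k} = \frac{\nu(\nu - 1)\cdots(\nu - k + 1)}{k!}, \quad |x| < 1,$$
with k ∈ N and ν ∈ R.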
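The general-dimension density can be explored numerically. The sketch below assumes a unit-radius hypersphere and the normalized form $f_D(t) = \frac{2n\, t^{n-1}}{B\left(\frac{n+1}{2}, \frac{1}{2}\right)} \int_{t/2}^{1} (1 - y^2)^{\frac{n-1}{2}}\, dy$, where B is the Beta function. This explicit constant is an assumption made here for illustration, not a formula quoted from the text. As a consistency check, the sketch compares n = 3 with the classical polynomial density of the sphere and n = 2 with the classical disk density.

```python
import math

def hypersphere_density(t, n, steps=2000):
    """Density of D for two iud points in a unit-radius n-ball (assumed normalized form)."""
    if t <= 0.0 or t >= 2.0:
        return 0.0
    # Beta((n + 1) / 2, 1 / 2) written with Gamma functions.
    beta = math.gamma((n + 1) / 2.0) * math.gamma(0.5) / math.gamma(n / 2.0 + 1.0)
    lo, hi = t / 2.0, 1.0
    h = (hi - lo) / steps
    # Midpoint rule for the integral of (1 - y^2)^((n - 1) / 2) on [t / 2, 1].
    integral = h * sum((1.0 - (lo + (i + 0.5) * h) ** 2) ** ((n - 1) / 2.0)
                       for i in range(steps))
    return n * t ** (n - 1) * 2.0 * integral / beta

def ball3_polynomial(t):
    """Classical density for the unit sphere in dimension 3."""
    return 3.0 * t**2 - 9.0 * t**3 / 4.0 + 3.0 * t**5 / 16.0

def disk_closed_form(t):
    """Classical density for the unit disk in dimension 2."""
    return ((4.0 * t / math.pi) * math.acos(t / 2.0)
            - (2.0 * t * t / math.pi) * math.sqrt(1.0 - t * t / 4.0))

check_points = [0.3, 0.7, 1.2, 1.8]
max_gap_n3 = max(abs(hypersphere_density(t, 3) - ball3_polynomial(t)) for t in check_points)
max_gap_n2 = max(abs(hypersphere_density(t, 2) - disk_closed_form(t)) for t in check_points)

print(max_gap_n3, max_gap_n2)
```

For odd n the exponent (n − 1)/2 is an integer, so the integrand is a polynomial in y and the density is polynomial in t, which is consistent with the statement in the abstract.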
As we can see with Equation (7), $\lim_{n \to \infty} \mathrm{Vol}_n(R) = 0$, and it is well known that hyperspheres become "hollow" when the dimension is high enough [38]: most points in a hypersphere tend to agglomerate towards its hypersurface. This has a consequence on distances that is quite intuitive and explains why the variance of distances tends to zero when the dimension increases, i.e., the diversity of geometric configurations is increasingly limited as the dimension of the space increases. Our result shows how fast this phenomenon impacts the distances between points inside the hypersphere when the dimension increases.
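The shrinking variance can be observed directly by simulation. The sketch below draws points uniformly in a unit n-dimensional hypersphere with a standard technique (a normalized Gaussian vector for the direction and $U^{1/n}$ for the radius, U being uniform on [0, 1]) and estimates the variance of the distance between two such points for a few dimensions. The function names, the dimensions tested, the sample size and the seed are illustrative assumptions.

```python
import math
import random

def random_point_in_ball(n, rng):
    """Uniform point in the unit n-ball: Gaussian direction, radius U**(1/n)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    radius = rng.random() ** (1.0 / n)
    return [radius * x / norm for x in g]

def distance_variance(n, n_pairs=10_000, seed=3):
    """Monte Carlo estimate of Var(D) for two iud points in the unit n-ball."""
    rng = random.Random(seed)
    distances = [
        math.dist(random_point_in_ball(n, rng), random_point_in_ball(n, rng))
        for _ in range(n_pairs)
    ]
    mean = sum(distances) / n_pairs
    return sum((d - mean) ** 2 for d in distances) / n_pairs

variances = {n: distance_variance(n) for n in (2, 3, 10, 50)}

for n, v in variances.items():
    print(n, round(v, 4))
```

The estimates should decrease with n, starting near the two- and three-dimensional values (about 0.180 and 0.142 for unit radius). The simulation only illustrates the trend; it does not give the rate of convergence.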

Conclusions
We have developed a general and unified method to obtain the distribution of distances between two points randomly selected in an iud cloud of points in a geometric figure. These distributions are useful, especially in spatial statistics, to know the statistical representativeness (the weight) of a distance between two points. In the case of an iud set of random points in a hypersphere, the expression of the density is given for any dimension, and the variance of these distributions converges to zero when the dimension increases. This result also opens new perspectives in multidimensional analysis and data mining.