Article

Distribution of Distances between Elements in a Compact Set

1 UFR Physique, Université Paris-Diderot, 75013 Paris, France
2 UMR Unité des Virus Emergents (UVE Aix-Marseille Univ-IRD 190-Inserm 1207-IHU Méditerranée Infection), 13005 Marseille, France
* Author to whom correspondence should be addressed.
Stats 2020, 3(1), 1-15; https://doi.org/10.3390/stats3010001
Submission received: 11 September 2019 / Revised: 18 December 2019 / Accepted: 21 December 2019 / Published: 26 December 2019

Abstract

In this article, we review studies evaluating the distribution of distances between elements of a random set independently and uniformly distributed over a region of space in a normed $\mathbb{R}$-vector space (for example, point events generated by a homogeneous Poisson process in a compact set). The distribution of distances between individuals arises in many situations where interaction depends on distance, and it concerns many disciplines, such as statistical physics, biology, ecology, geography, networking, etc. After reviewing the solutions proposed in the literature, we present a modern, general and unified resolution method using convolution of random vectors. We apply this method to typical compact sets: segments, rectangles, disks, spheres and hyperspheres. We show, for example, that in a hypersphere the distribution of distances has a typical shape and is polynomial for odd dimensions. We also present various applications of these results and we show, for example, that the variance of distances in a hypersphere tends to zero when the space dimension increases.

1. Introduction

The distribution of distances between elements in a set of points arises in many problems, particularly in spatial analysis, and in various fields of application: ecology, epidemiology, forestry, biology, astronomy, economics, particle physics, network applications, etc. [1]. For example, given two points randomly selected in a set of points independently and uniformly distributed in space, we want to know how likely the distance between these two points is among the distances between all pairs of points (Figure 1).
This question is important when trying to evaluate or model spatial interactions between elements, such as clustering of objects, spatial autocorrelation of a variable across a set of locations, or neighbor relationships and connectivity [2]. Indeed, in nature many problems involve distance-based interactions between events or elements. For example, most methods used to measure spatial autocorrelation or to model spatial interactions are based on a weighted average of a variable between pairs of elements in a disk [2,3,4,5]. If such an index measures a phenomenon related to the distance between the elements, the index may favor the pairs of elements of the most likely distances. In order to avoid such bias, it is necessary to know the distribution of distances and consider the relationship between the distance and the phenomenon independently of the distribution of distances between all pairs of elements [2].
The distribution of distances between two randomly selected points in a compact set has been studied for a long time. However, the results are fragmented because they are presented in different articles, with different methods of resolution, depending on the dimension of the space and the type of compact set studied. In this article, we first present a literature review of these results. We then propose a unified method of resolution which uses only standard mathematical objects, and which is generalizable to any type of compact set in any dimension. We describe this general approach and use it to calculate distributions of Euclidean distances between two randomly chosen points for compact sets such as lines, rectangles, disks and cubes, with specific results for hyperspheres of any dimension.

2. Literature Review

The task of distance distribution estimation between points is related to stochastic geometry [6].

2.1. Rayleigh Distribution

Lord Rayleigh (1842–1919), the Nobel Prize-winning physicist, solved a slightly simpler problem than the one studied in this article: he modelled the distribution (known as the Rayleigh distribution) of Euclidean distances between a central point and a set of points normally distributed around this central point in a real vector space of dimension two [7].
By positioning the central point at the origin, the problem addressed by Rayleigh was to evaluate the distribution of $\|v\|$, i.e., $\sqrt{x_1^2 + x_2^2}$ in Cartesian coordinates for the Euclidean distance, the vector $v$ corresponding to the realization of two real random variables $x_1$ and $x_2$, independent but generated by the same density function:
$$\forall t \in \mathbb{R}, \quad f_{X_i}(t) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{t^2}{2\sigma^2}\right), \quad i = 1, 2$$
where σ , the standard deviation of this normal distribution, allows one to set the concentration of points around the center.
The distribution of distances to the center can then be easily obtained by using the independence of the two random variables and by using polar coordinates:
$$\begin{aligned}
\forall r \in \mathbb{R}^+, \quad p(\|V\| \le r) &= p(V \in B(0,r)) && \text{where } B(0,r) \text{ is the closed ball of radius } r \\
&= \iint_{B(0,r)} f_X(x,y)\, d(x,y) \\
&= \iint_{B(0,r)} f_{X_1}(x)\, f_{X_2}(y)\, dx\, dy && \text{from coordinates independence} \\
&= \frac{1}{2\pi\sigma^2} \iint_{B(0,r)} \exp\left(-\frac{x^2+y^2}{2\sigma^2}\right) dx\, dy && \text{from normal distribution} \\
&= \frac{1}{2\pi\sigma^2} \int_0^{2\pi}\!\!\int_0^r t \exp\left(-\frac{t^2}{2\sigma^2}\right) dt\, d\theta && \text{with polar coordinates} \\
&= \int_0^r \frac{t}{\sigma^2} \exp\left(-\frac{t^2}{2\sigma^2}\right) dt
\end{aligned}$$
Many phenomena in various fields such as image processing, signal processing, particle physics, etc., follow a Rayleigh distribution.
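The derivation above can be checked numerically. The following is a minimal sketch, not taken from the article: it assumes the closed form $1 - \exp(-r^2 / 2\sigma^2)$ obtained by evaluating the last integral, and compares it with the fraction of simulated Gaussian points falling within distance $r$ of the origin. The function names, sample size and seed are illustrative choices.

```python
import math
import random


def rayleigh_cdf(r, sigma):
    """Closed form of the last integral: 1 - exp(-r^2 / (2 sigma^2))."""
    return 1.0 - math.exp(-r * r / (2.0 * sigma * sigma))


def empirical_cdf(r, sigma, n_points=100_000, seed=0):
    """Fraction of simulated points (x1, x2) whose distance to the origin is <= r.

    x1 and x2 are drawn independently from N(0, sigma^2), as in the text.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x1 = rng.gauss(0.0, sigma)
        x2 = rng.gauss(0.0, sigma)
        if math.hypot(x1, x2) <= r:
            inside += 1
    return inside / n_points


if __name__ == "__main__":
    sigma = 1.5  # illustrative value
    for r in (0.5, 1.5, 3.0):
        print(r, empirical_cdf(r, sigma), rayleigh_cdf(r, sigma))
```

The two columns printed should agree up to Monte Carlo noise, which shrinks as `n_points` grows.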

2.2. Distance Distribution between Two Random Points Independently and Uniformly Distributed (iud) in a Region of $\mathbb{R}^n$

More recently, in the 20th century, spatial point processes in one or two dimensions and related spatial properties, such as the void or contact distribution or the Euclidean distance distribution between k-neighbors, started to receive special attention [8]. Nevertheless, “the research on distributions of distances in point processes of dimensions higher than one have never been an issue of systematic research and have been performed in rather ad hoc way in the past” ([1], p. 2). The problem was not addressed holistically, but depending on the field of application and the geometric form considered, in two or three dimensions, essentially circles and spheres or rectangles and cubes.
The distribution of Euclidean distances between two random points for an iud set of points in a circle, sphere or hypersphere has been addressed many times in the literature, in different fields and with different techniques (geometry, differential equations): in mathematics and statistics [9,10,11,12,13], in chromosome analysis [14], in geography [15], in demography [16,17,18,19,20], in network analysis [1], and in physics [21,22].
For example, in $\mathbb{R}^2$, for two random points $U$ and $V$ in a circle of radius $R$, the geometric resolution of the distribution of $\|U - V\|$ described in [1] uses Crofton’s fixed-point theorem and the mean value theorem [23], and the result has been known since the end of the 19th century:
$$\forall t \in \,]0; 2R], \quad f_{\|U-V\|}(t) = \frac{4t}{\pi R^2} \left( \arccos\left(\frac{t}{2R}\right) - \frac{t}{2R} \sqrt{1 - \left(\frac{t}{2R}\right)^2} \right)$$
Again in $\mathbb{R}^2$, the distribution of Euclidean distances between two random points in a rectangle has long been addressed [24], and an analytical resolution is presented in [25]. More recently, other studies have focused on polygons [26].
The distribution of Euclidean distances for a cube has also been addressed [27,28,29,30], but without a general formulation for any dimension. The distribution function for this random variable seems not to have been known before 1978 [27]. Robbins’s constant [31] was defined as the mean Euclidean distance between two random points in a unit cube.
Results for cubes have been compiled by EW Weisstein and presented at http://mathworld.wolfram.com/CubeLinePicking.html [32].
As one of the many random quantities studied in Geometric Probability, results were extended to the 4th and 5th dimensions [33] but for higher dimensions the increase of algebraic complexity associated with derivation procedures was a strong limiting factor. These results can have practical applications in multidimensional analysis and data mining.

3. A Unified Method for the Evaluation of Euclidian Distance Distributions between Two Randomly Chosen Points

We now present an original approach using a unified method generalizable to any type of compact set in any dimension. The mathematical formalization and resolution use only well-known objects and methods such as random variables and density functions, convolution, marginal distributions, and some standard functions (Gamma, Beta). We use this approach to calculate distributions of Euclidean distances between two randomly chosen points for hyperspheres and hypercubes of any dimension, and thereby confirm the results already known in the literature mentioned above.

3.1. Mathematical Formalization

Let $u$ be a vector in a normed $\mathbb{R}$-vector space $E$ of dimension $n$. Its coordinates are noted $(x_1, x_2, \dots, x_n)$ in an orthonormal coordinate system, $\|u\|$ is the norm of $u$, and $d$ is the distance associated with the norm: $d(u, v) = \|u - v\|$. In this article, the Euclidean norm will be considered: $\|u\| = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$. $B(u, r)$ is the closed ball of center $u$ and radius $r$, which corresponds to all the vectors $v$ of $E$ whose distance to $u$ is less than or equal to $r$:
$$\forall r \in \mathbb{R}^+, \ \forall u \in E, \quad v \in B(u, r) \iff \|u - v\| \le r$$
Let $U$ and $V$ denote two random vectors in $E$, independent and identically distributed, with the same probability distribution corresponding to a homogeneous Poisson point process $H$ (a completely random spatial process). Random vectors $U$ and $V$ allow us to simulate the pairs $(u, v)$ of elements of a subset $F$ of $E$, and to evaluate the distribution of their distances $d = \|u - v\|$. Let $D$ be the random variable in $\mathbb{R}^+$ such that $D = \|U - V\|$. $D$ represents the distance between two vectors obtained randomly in $E$. Our problem is to determine the probability density of $D$ from the process $H$, i.e., to determine the function $f_D$ such that
$$\forall r \in \mathbb{R}^+, \quad p(D \le r) = \int_0^r f_D(t)\, dt$$

3.2. Using Convolution of Density Functions

Considering that $p(\|U - V\| \le t) = p(U - V \in B(0, t))$, finding the density function $f_{U-V}$ using $f_U$ and $f_V$ would lead to the expected distribution of $D$.
When $E$ is uni-dimensional, $U$ and $V$ are simply independent real-valued random variables. Convolution can be used to find the density $f_{U+V}$ from the density functions $f_U$ and $f_V$ [34],
$$\forall x \in \mathbb{R}, \quad f_{U+V}(x) = (f_U * f_V)(x) = \int_{\mathbb{R}} f_U(y)\, f_V(x - y)\, dy$$
Therefore, $f_{U-V}$ can be seen as the convolution of $f_U$ and $f_{-V}$,
$$\forall x \in \mathbb{R}, \quad f_{U-V}(x) = \int_{\mathbb{R}} f_U(y)\, f_V(y - x)\, dy$$
given that $\forall x \in \mathbb{R},\ f_{-V}(x) = f_V(-x)$, which stays true even in higher dimensions.
When $U$ and $V$ are two independent $n$-dimensional random vectors in a vector space $E$, $(U, V)$ is a $2n$-dimensional vector. Let $A$ be the $(2n \times 2n)$ matrix such that $A \begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} U + V \\ V \end{pmatrix}$,
$$A = \begin{pmatrix} I_n & I_n \\ 0 & I_n \end{pmatrix}$$
where $I_n$ is the identity matrix of $E$.
We have
$$A^{-1} = \begin{pmatrix} I_n & -I_n \\ 0 & I_n \end{pmatrix} \quad \text{and} \quad \det(A) = \det(A^{-1}) = 1.$$
Therefore,
$$\forall (x, y) \in E^2, \quad f_{(U+V,\,V)}(x, y) = \frac{1}{|\det(A)|}\, f_{(U,V)}\!\left(A^{-1} \begin{pmatrix} x \\ y \end{pmatrix}\right) = f_{(U,V)}(x - y,\, y) = f_U(x - y)\, f_V(y) \quad \text{(from independence of } U \text{ and } V\text{)}.$$
Consequently, $f_{U+V}$ is the marginal distribution of $f_{(U+V,\,V)}$:
$$\forall x \in E, \quad f_{U+V}(x) = \int_E f_U(y)\, f_V(x - y)\, dy$$
Given that $\forall x \in E,\ f_{-V}(x) = f_V(-x)$, this leads to
$$\forall x \in E, \quad f_{U-V}(x) = \int_E f_U(y)\, f_V(y - x)\, dy$$
As such, we can obtain the distribution of $U - V$ from the distributions of $U$ and $V$ only.
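As an illustration of the last formula in dimension one, the sketch below approximates $f_{U-V}(x) = \int f_U(y) f_V(y - x)\, dy$ with a midpoint rule. It is only a numerical check under assumptions of our own: `density_of_difference` and `uniform01` are hypothetical names, and the uniform density on $[0, 1]$ is chosen because the result is known to be the triangular function $1 - |x|$ on $[-1, 1]$.

```python
def density_of_difference(f_u, f_v, x, lo, hi, n_steps=20_000):
    """Midpoint-rule approximation of the integral of f_u(y) * f_v(y - x) over [lo, hi].

    [lo, hi] must contain the support of f_u for the result to approximate f_{U-V}(x).
    """
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps):
        y = lo + (i + 0.5) * h
        total += f_u(y) * f_v(y - x)
    return total * h


def uniform01(t):
    """Density of the uniform distribution on [0, 1] (illustrative choice)."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0


if __name__ == "__main__":
    for x in (-0.8, -0.4, 0.0, 0.25, 0.9, 1.5):
        approx = density_of_difference(uniform01, uniform01, x, -1.0, 2.0)
        exact = max(0.0, 1.0 - abs(x))
        print(x, approx, exact)
```

Any other pair of one-dimensional densities can be passed in place of `uniform01`, provided the integration bounds cover the support of the first one.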

3.3. Distribution with Random Set of Points Iud in a Compact Set

Compact sets are convenient to model any type of spatial region of any shape and with finite size (which is always the case in reality). We assume in the following that the set F is a set of elements iud with uniform density ρ in a compact K, corresponding to a homogeneous Poisson process of density ρ on K. So, we have:
$$f_U = f_V = \frac{1}{\lambda(K)}\, \mathbb{1}_K \tag{1}$$
where 1 K is the indicator function of K , and λ the Lebesgue measure in E.
Using previously presented tools,
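$$\begin{aligned}
\forall x \in E, \quad f_{U-V}(x) &= \int_E f_U(y)\, f_V(y - x)\, dy \\
&= \int_E \frac{1}{\lambda(K)^2}\, \mathbb{1}_K(y) \cdot \mathbb{1}_K(y - x)\, dy \\
&= \frac{1}{\lambda(K)^2} \int_E \mathbb{1}_K(y) \cdot \mathbb{1}_{K+x}(y)\, dy \\
&= \frac{1}{\lambda(K)^2} \int_E \mathbb{1}_{K \cap (K+x)}(y)\, dy \\
&= \frac{\lambda(K \cap (K + x))}{\lambda(K)^2}
\end{aligned} \tag{2}$$
where $K + x = \{ k + x \mid k \in K \}$.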
The distribution of D = U V follows:
$$\forall r \in \mathbb{R}^+, \quad p(D \le r) = p(U - V \in B(0, r)) = \frac{1}{\lambda(K)^2} \int_{B(0,r)} \lambda(K \cap (K + x))\, dx \tag{3}$$
Because $\lambda$ is defined as a measure with translational invariance, one can be sure that $p(D \le r)$ is not affected by the position of $K$ inside $E$ but only by the “shape” of $K$ and $K \cap (K + x)$. This translational invariance is very intuitive; distances inside a spatial area are never affected by the global position of the area:
$$\forall (x, u) \in E^2, \quad \lambda([K + u] \cap [K + x + u]) = \lambda([K \cap (K + x)] + u) = \lambda(K \cap (K + x))$$
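The quantity $\lambda(K \cap (K + x))$ rarely has to be computed by hand to get a first idea of the density. The sketch below is a hedged illustration, not the authors' method: it estimates the overlap measure by Monte Carlo sampling in a bounding box, for a compact set described only by an indicator function. The names `overlap_measure`, `density_u_minus_v`, `unit_square` and `unit_disk`, the bounding boxes and the sample sizes are all assumptions made for the example.

```python
import random


def overlap_measure(indicator, shift, box, n_samples=100_000, seed=0):
    """Monte Carlo estimate of lambda(K ∩ (K + shift)).

    `indicator(point)` returns True when the point belongs to K.
    `box` is a list of (low, high) pairs, one per coordinate, that contains K.
    """
    rng = random.Random(seed)
    box_volume = 1.0
    for low, high in box:
        box_volume *= high - low
    hits = 0
    for _ in range(n_samples):
        y = [rng.uniform(low, high) for low, high in box]
        y_back = [yi - si for yi, si in zip(y, shift)]
        # y lies in K ∩ (K + shift) exactly when y and y - shift both lie in K.
        if indicator(y) and indicator(y_back):
            hits += 1
    return box_volume * hits / n_samples


def density_u_minus_v(indicator, shift, box, n_samples=100_000, seed=0):
    """Estimate of f_{U-V}(shift) = lambda(K ∩ (K + shift)) / lambda(K)^2."""
    zero = [0.0] * len(box)
    measure_k = overlap_measure(indicator, zero, box, n_samples, seed)
    overlap = overlap_measure(indicator, shift, box, n_samples, seed)
    return overlap / measure_k ** 2


def unit_square(p):
    return all(0.0 <= c <= 1.0 for c in p)


def unit_disk(p):
    return p[0] ** 2 + p[1] ** 2 <= 1.0


if __name__ == "__main__":
    print(density_u_minus_v(unit_square, [0.3, 0.5], [(0.0, 1.0), (0.0, 1.0)]))
    print(overlap_measure(unit_disk, [1.0, 0.0], [(-1.0, 1.0), (-1.0, 1.0)]))
```

For the unit square, the estimate can be compared with the product $(1 - |x_1|)(1 - |x_2|)$; for two unit disks whose centers are at distance 1, with the lens area $2\arccos(1/2) - \sqrt{3}/2$.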

4. Using Equation (3) for Typical Compact Sets

We will apply this general resolution formula for typical compact sets, especially to hypercubes and hyperspheres of any dimension.

4.1. K Is a Segment in a 1-Dimensional Space

In dimension 1, the compact set $K$ is a segment $[a, b]$ $(b > a)$ in $\mathbb{R}$. The set $F$ stands for a set of random values uniformly distributed in $[a, b]$. $U$ and $V$ are simply independent real-valued random variables. We can easily determine the well-known density function [9] (Figure 2). We have
$$f_U = f_V = \frac{1}{b - a}\, \mathbb{1}_{[a,b]}$$
Due to translational invariance, $[a, b]$ can be replaced with $[0, \ell]$ with $\ell = b - a$.
From (3) we have:
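$$\forall x \in \mathbb{R}, \quad f_{U-V}(x) = \frac{1}{\ell^2}\, \lambda([0, \ell] \cap [x, x + \ell])$$
Therefore, if $|x| \ge \ell$, $f_{U-V}(x) = 0$. Otherwise, $f_{U-V}(x) = \frac{1}{\ell}\left(1 - \frac{|x|}{\ell}\right)$. Hence,
$$\forall x \in \mathbb{R}, \quad f_{U-V}(x) = \frac{1}{\ell}\left(1 - \frac{|x|}{\ell}\right) \mathbb{1}_{[-\ell,\ell]}(x)$$
Then,
$$p(|U - V| \le t) = \frac{1}{\ell} \int_{-t}^{t} \left(1 - \frac{|x|}{\ell}\right) \mathbb{1}_{[-\ell,\ell]}(x)\, dx = \frac{2}{\ell} \int_0^t \left(1 - \frac{x}{\ell}\right) \mathbb{1}_{[0,\ell]}(x)\, dx$$
Hence,
$$\forall t \in \mathbb{R}^+, \quad f_{|U-V|}(t) = \frac{2}{\ell}\left(1 - \frac{t}{\ell}\right) \mathbb{1}_{[0,\ell]}(t)$$
As the density is linear, it is easy to calculate the mean, variance and median (called $m$):
$$E(|U - V|) = \frac{\ell}{3}$$
$$\mathrm{Var}(|U - V|) = \frac{\ell^2}{18}$$
$$m = \ell \left(1 - \frac{\sqrt{2}}{2}\right) \quad (\approx 0.293\, \ell)$$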
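These three summary values are easy to confirm by simulation. The sketch below is an illustration under our own assumptions: it draws independent pairs of uniform values on $[a, b]$ (rather than all pairs from one Poisson realization, as in Figure 2) and compares the sample mean, variance and median with $\ell/3$, $\ell^2/18$ and $\ell(1 - \sqrt{2}/2)$. Function names and the segment $[2, 5]$ are illustrative.

```python
import random
import statistics


def simulate_segment_distances(a, b, n_pairs=200_000, seed=0):
    """Distances |U - V| for independent U, V uniform on [a, b]."""
    rng = random.Random(seed)
    return [abs(rng.uniform(a, b) - rng.uniform(a, b)) for _ in range(n_pairs)]


def segment_summary(a, b):
    """Closed-form mean, variance and median of |U - V| on a segment of length b - a."""
    ell = b - a
    return {
        "mean": ell / 3.0,
        "variance": ell ** 2 / 18.0,
        "median": ell * (1.0 - 2.0 ** 0.5 / 2.0),
    }


if __name__ == "__main__":
    distances = simulate_segment_distances(2.0, 5.0)
    print("simulated:", statistics.fmean(distances),
          statistics.pvariance(distances), statistics.median(distances))
    print("closed form:", segment_summary(2.0, 5.0))
```

Because of translational invariance, only the length $b - a$ matters: shifting the segment leaves all three values unchanged.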

4.2. K Is a Rectangle in a 2-Dimensional Space

The two-dimensional rectangle displays a highly convenient property, namely that the coordinates along the x-axis and the y-axis are statistically independent.
Let us introduce parameters for our rectangle: $K = [\alpha; \alpha + a] \times [\beta; \beta + b]$ where $(a, b) \in (\mathbb{R}^+)^2$ and $(\alpha, \beta) \in \mathbb{R}^2$. Thanks to the remark previously made about translational invariance, $(\alpha, \beta)$ can be replaced by $(0, 0)$ without changing the final result. Here, Equation (1) gives
$$\forall (x, y) \in \mathbb{R}^2, \quad f_U(x, y) = f_V(x, y) = \frac{1}{ab}\, \mathbb{1}_{[0;a] \times [0;b]}(x, y) = \frac{1}{ab}\, \mathbb{1}_{[0;a]}(x) \cdot \mathbb{1}_{[0;b]}(y)$$
Then, we have
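$$\begin{aligned}
\forall (x_1, x_2) \in \mathbb{R}^2, \quad f_{U-V}(x_1, x_2) &= \frac{1}{(ab)^2} \iint_{\mathbb{R}^2} \mathbb{1}_{[0;a]}(y_1) \cdot \mathbb{1}_{[0;b]}(y_2) \cdot \mathbb{1}_{[x_1; a + x_1]}(y_1) \cdot \mathbb{1}_{[x_2; b + x_2]}(y_2)\, d(y_1, y_2) \\
&= \frac{1}{ab} \iint_{\mathbb{R}^2} \mathbb{1}_{[0;1] \cap [\frac{x_1}{a}; 1 + \frac{x_1}{a}]}(y_1) \cdot \mathbb{1}_{[0;1] \cap [\frac{x_2}{b}; 1 + \frac{x_2}{b}]}(y_2)\, d(y_1, y_2) \\
&= \frac{1}{ab}\, \mathbb{1}_{[-a;a]}(x_1) \cdot \mathbb{1}_{[-b;b]}(x_2) \int_{\max(0; \frac{x_1}{a})}^{\min(1; 1 + \frac{x_1}{a})} dy_1 \cdot \int_{\max(0; \frac{x_2}{b})}^{\min(1; 1 + \frac{x_2}{b})} dy_2 \\
&= \frac{1}{ab}\, \mathbb{1}_{[-a;a]}(x_1) \cdot \mathbb{1}_{[-b;b]}(x_2) \left( \min\left(1; 1 + \tfrac{x_1}{a}\right) - \max\left(0; \tfrac{x_1}{a}\right) \right) \cdot \left( \min\left(1; 1 + \tfrac{x_2}{b}\right) - \max\left(0; \tfrac{x_2}{b}\right) \right) \\
&= \frac{1}{ab} \cdot \left(1 - \frac{|x_1|}{a}\right) \cdot \left(1 - \frac{|x_2|}{b}\right)
\end{aligned}$$
$f_{U-V}$ must be integrated on $B(0, t)$ in order to get the distribution $p(\|U - V\| \le t)$ for $t \in \mathbb{R}^+$. By noticing that the longest distance inside $K$ is $\sqrt{a^2 + b^2}$, the distribution function becomes
$$\begin{aligned}
\forall t \in [0; \sqrt{a^2 + b^2}], \quad p(\|U - V\| \le t) &= \frac{1}{ab} \iint_{B(0,t)} \left(1 - \frac{|x_1|}{a}\right) \cdot \left(1 - \frac{|x_2|}{b}\right) d(x_1, x_2) \\
&= \frac{1}{ab} \int_{-t}^{t} \mathbb{1}_{[-b;b]}(x_2) \left(1 - \frac{|x_2|}{b}\right) \int_{-\sqrt{t^2 - x_2^2}}^{\sqrt{t^2 - x_2^2}} \mathbb{1}_{[-a;a]}(x_1) \left(1 - \frac{|x_1|}{a}\right) dx_1\, dx_2 \\
&= \frac{4}{ab} \int_0^t \mathbb{1}_{[0;b]}(x_2) \left(1 - \frac{|x_2|}{b}\right) \int_0^{\sqrt{t^2 - x_2^2}} \mathbb{1}_{[0;a]}(x_1) \left(1 - \frac{|x_1|}{a}\right) dx_1\, dx_2 \\
&= \frac{4}{ab} \int_0^{\min(b;t)} \left(1 - \frac{|x_2|}{b}\right) \int_0^{\min(a; \sqrt{t^2 - x_2^2})} \left(1 - \frac{|x_1|}{a}\right) dx_1\, dx_2 \\
&= \frac{4}{ab} \int_0^{\min(b;t)} \left(1 - \frac{x}{b}\right) \cdot \left(1 - \frac{1}{2a} \min\left(a; \sqrt{t^2 - x^2}\right)\right) \min\left(a; \sqrt{t^2 - x^2}\right) dx
\end{aligned}$$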
Clearly, we need to separate different cases:
  • When $t \in [0; \min(a, b)]$, the calculation results in a polynomial density function for $\|U - V\|$:
    $$\forall x \in [0; \min(a; b)], \quad f_{\|U-V\|}(x) = \frac{2x}{(ab)^2} \left( x^2 - 2(a + b)x + \pi ab \right) \tag{5}$$
  • It is possible to calculate explicitly $p(\|U - V\| \le t)$ for values of $t$ in $[\min(a; b); \sqrt{a^2 + b^2}]$. Nevertheless, the expression is not polynomial anymore and ends up being much less simple than the integral form.
  • If $a < b$, the proportion of the sample that follows a polynomial distribution is given by
    $$\frac{a}{b} \cdot \left( \pi - \frac{4}{3} - \frac{5a}{6b} \right)$$
In the particular case where $a = b$ (i.e., $K$ is a square), the polynomial function describes the first $\pi - 2 - \frac{1}{6}$ ($\approx 97\%$) of all distances.
As an example, Figure 3 shows the graph for density on a square (a) and inside some rectangles (b).
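The share of distances covered by the polynomial branch can be checked with a short simulation. The sketch below is illustrative only: it draws independent pairs of points uniformly in an $a \times b$ rectangle and compares the fraction of distances not exceeding $\min(a, b)$ with the closed-form proportion given above. The function names and the $1 \times 2$ rectangle are assumptions made for the example.

```python
import math
import random


def polynomial_density(x, a, b):
    """Density of the distance for 0 <= x <= min(a, b) in an a-by-b rectangle."""
    return 2.0 * x / (a * b) ** 2 * (x * x - 2.0 * (a + b) * x + math.pi * a * b)


def polynomial_share(a, b):
    """Probability that the distance is <= min(a, b), i.e. the integral of the
    polynomial density over [0, min(a, b)]."""
    a, b = min(a, b), max(a, b)
    return a / b * (math.pi - 4.0 / 3.0 - 5.0 * a / (6.0 * b))


def simulated_share(a, b, n_pairs=200_000, seed=0):
    """Fraction of simulated pairs whose distance is <= min(a, b)."""
    rng = random.Random(seed)
    threshold = min(a, b)
    count = 0
    for _ in range(n_pairs):
        dx = rng.uniform(0.0, a) - rng.uniform(0.0, a)
        dy = rng.uniform(0.0, b) - rng.uniform(0.0, b)
        if math.hypot(dx, dy) <= threshold:
            count += 1
    return count / n_pairs


if __name__ == "__main__":
    print("square:", polynomial_share(1.0, 1.0), simulated_share(1.0, 1.0))
    print("1 x 2 rectangle:", polynomial_share(1.0, 2.0), simulated_share(1.0, 2.0))
```

The more elongated the rectangle, the smaller the share of distances described by the polynomial branch, which is why the square is the most favorable case.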

4.3. K Is a Disk in a 2-Dimensional Space

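$K$ is a disk with radius $R$,
$$\forall x \in \mathbb{R}^2, \quad f_U(x) = f_V(x) = \frac{1}{\pi R^2}\, \mathbb{1}_{B(c,R)}(x)$$
where $c$ is the center of the disk and $R$ its radius. It is obviously possible to limit our study to the case where $c = 0$.
From (3) we have
$$\begin{aligned}
\forall x \in \mathbb{R}^2, \quad f_{U-V}(x) &= \left(\frac{1}{\pi R^2}\right)^2 \iint_{\mathbb{R}^2} \mathbb{1}_{B(0,R)}(y)\, \mathbb{1}_{B(0,R)}(y - x)\, dy \\
&= \left(\frac{1}{\pi R^2}\right)^2 \iint_{\mathbb{R}^2} \mathbb{1}_{B(0,R)}(y)\, \mathbb{1}_{B(x,R)}(y)\, dy \\
&= \frac{R^2}{\pi^2 R^4} \iint_{\mathbb{R}^2} \mathbb{1}_{B(0,1)}(y)\, \mathbb{1}_{B(\frac{x}{R},1)}(y)\, dy \\
&= \frac{1}{(\pi R)^2}\, \lambda\left(B(0,1) \cap B\left(\tfrac{x}{R}, 1\right)\right)
\end{aligned}$$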
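To calculate $f_D$ we need to integrate the area $S = \lambda\left(B(0,1) \cap B\left(\tfrac{x(r,\theta)}{R}, 1\right)\right)$ over all angles $\theta \in \,]-\pi; \pi]$.
We can demonstrate the isotropy of $S$ (the area remains the same for any value of $\theta$).
Let $M(\theta)$ be the rotation operator,
$$\begin{aligned}
\forall \theta \in \,]-\pi; \pi], \quad \lambda\left(B(0,1) \cap B\left(\tfrac{1}{R} x(r,\theta), 1\right)\right) &= \lambda\left[ M(\theta) \cdot \left(B(0,1) \cap B\left(\tfrac{1}{R} x(r,0), 1\right)\right) \right] \\
&= |\det(M(\theta))| \cdot \lambda\left(B(0,1) \cap B\left(\tfrac{1}{R} x(r,0), 1\right)\right) \\
&= \lambda\left(B(0,1) \cap B\left(\tfrac{1}{R} x(r,0), 1\right)\right)
\end{aligned}$$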
Thus,
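$$\forall t \in \mathbb{R}^+, \quad f_{\|U-V\|}(t) = \frac{t}{\lambda(K)^2} \int_0^{2\pi} \lambda\left(K \cap [K + x(t,\theta)]\right) d\theta = \frac{2\pi t}{(\pi R)^2}\, \lambda\left(B(0,1) \cap B\left(\tfrac{1}{R} x(t,0), 1\right)\right) = \frac{2\pi t}{(\pi R)^2}\, S$$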
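To calculate the surface $S$, let $l(u) = \sqrt{1 - u^2}$ be the half-length of the chord of the unit disk at the coordinate $u$ on the x-axis [35]. The intersection is made of two symmetric circular segments, each of area $2 \int_{r/2R}^{1} l(u)\, du$:
$$\forall r \in \,]0; 2R], \quad S = 4 \int_{\frac{r}{2R}}^{1} l(u)\, du = 4 \int_{\frac{r}{2R}}^{1} \sqrt{1 - u^2}\, du = 4 \int_0^{\arccos\frac{r}{2R}} \sin^2\theta\, d\theta = 2 \left[ \arccos\left(\frac{r}{2R}\right) - \frac{r}{2R} \sqrt{1 - \left(\frac{r}{2R}\right)^2} \right]$$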
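This finally leads to the result already mentioned above,
$$\forall t \in \,]0; 2R], \quad f_{\|U-V\|}(t) = \frac{4t}{\pi R^2} \left( \arccos\left(\frac{t}{2R}\right) - \frac{t}{2R} \sqrt{1 - \left(\frac{t}{2R}\right)^2} \right) \tag{6}$$
$2R$ being the longest distance possible inside a circle.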
Figure 4 shows the graph of distribution of distances inside a circle.
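We can calculate the mean and variance of this distribution. With the change of variable $t = 2R\cos\theta$,
$$\forall R \in \mathbb{R}^{+*}, \quad E(\|U - V\|) = \int_0^{2R} t\, f_{\|U-V\|}(t)\, dt = \frac{2^5}{\pi} R \int_0^{\frac{\pi}{2}} \sin(\theta) \cos^2(\theta) \left( \theta - \cos(\theta)\sin(\theta) \right) d\theta = \frac{2^7}{45\pi} R \quad (\approx 0.905 \cdot R)$$
If $m$ stands for the median, the equation $\int_0^m f_{\|U-V\|}(t)\, dt = \frac{1}{2}$ cannot be solved analytically for $m$. Nevertheless, we can show that
$$\forall R \in \mathbb{R}^{+*}, \quad \int_0^{\mu} f_{\|U-V\|}(t)\, dt \approx 0.511$$
where $\mu$ is the mean of the distribution. The median is only very slightly lower than the mean.
To calculate the variance, we first calculate $E(\|U - V\|^2)$,
$$\forall R \in \mathbb{R}^{+*}, \quad E(\|U - V\|^2) = \int_0^{2R} t^2 f_{\|U-V\|}(t)\, dt = \frac{2^6}{\pi} R^2 \int_0^{\frac{\pi}{2}} \sin(\theta) \cos^3(\theta) \left( \theta - \cos(\theta)\sin(\theta) \right) d\theta = R^2$$
which is quite remarkable. Therefore,
$$\mathrm{Var}(\|U - V\|) = \left( 1 - \left( \frac{2^7}{45\pi} \right)^2 \right) \cdot R^2 \approx 0.180 \cdot R^2$$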
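Let us check, as an exercise, that the density is indeed normalized:
$$\begin{aligned}
\forall R \in \mathbb{R}^{+*}, \quad \frac{4}{\pi R^2} \int_0^{2R} t \left( \arccos\left(\frac{t}{2R}\right) - \frac{t}{2R} \sqrt{1 - \left(\frac{t}{2R}\right)^2} \right) dt &= \frac{16}{\pi} \int_0^{\frac{\pi}{2}} \sin(\theta)\cos(\theta) \left( \theta - \cos(\theta)\sin(\theta) \right) d\theta \\
&= \frac{16}{\pi} \left( \int_0^{\frac{\pi}{2}} \theta \cdot \frac{\partial}{\partial\theta}\left( \frac{1}{2}\sin^2(\theta) \right) d\theta - \int_0^{\frac{\pi}{2}} \frac{1}{4}\sin^2(2\theta)\, d\theta \right) \\
&= \frac{16}{\pi} \left( \frac{\pi}{4} - \frac{1}{4} \int_0^{\frac{\pi}{2}} (1 - \cos(2\theta))\, d\theta - \frac{1}{8} \int_0^{\frac{\pi}{2}} (1 - \cos(4\theta))\, d\theta \right) \\
&= \frac{16}{\pi} \left( \frac{\pi}{4} - \frac{\pi}{8} - \frac{\pi}{16} \right) = 1
\end{aligned}$$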
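The normalization, the two moments and the value of the distribution function at the mean can all be confirmed by direct numerical integration of the density. The sketch below is a minimal check using a midpoint rule; the helper names (`disk_density`, `integrate`, `disk_moments`) and the number of integration steps are our own choices.

```python
import math


def disk_density(t, radius):
    """Density of the distance between two uniform points in a disk of given radius."""
    if not 0.0 < t <= 2.0 * radius:
        return 0.0
    s = t / (2.0 * radius)
    return 4.0 * t / (math.pi * radius ** 2) * (math.acos(s) - s * math.sqrt(1.0 - s * s))


def integrate(g, lo, hi, n_steps=20_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n_steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n_steps))


def disk_moments(radius):
    """Return (total mass, mean, second moment, CDF evaluated at the mean)."""
    top = 2.0 * radius
    total = integrate(lambda t: disk_density(t, radius), 0.0, top)
    mean = integrate(lambda t: t * disk_density(t, radius), 0.0, top)
    second = integrate(lambda t: t * t * disk_density(t, radius), 0.0, top)
    cdf_at_mean = integrate(lambda t: disk_density(t, radius), 0.0, mean)
    return total, mean, second, cdf_at_mean


if __name__ == "__main__":
    total, mean, second, cdf_at_mean = disk_moments(1.0)
    print("total mass:", total)
    print("mean:", mean, "closed form:", 128.0 / (45.0 * math.pi))
    print("second moment:", second)
    print("variance:", second - mean ** 2)
    print("CDF at the mean:", cdf_at_mean)
```

Since the density scales with $R$, running the same check with another radius should multiply the mean by $R$ and the second moment by $R^2$ while leaving the value of the distribution function at the mean unchanged.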

4.4. K Is a Sphere in a Three-Dimensional Space

When using Formula (2), it is clear that when the dimension of the space is equal to three or more, the issue is simply to calculate a volume (or hypervolume). Calculations are quite similar to those used for the circle. Firstly, let us give the density of the random vectors $U$ and $V$,
$$\forall x \in \mathbb{R}^3, \quad f_U(x) = f_V(x) = \frac{1}{\frac{4}{3}\pi R^3}\, \mathbb{1}_{B(c,R)}(x)$$
where $B(c, R)$ is the 3D sphere centered at $c$ with a radius $R$. Just like previously, replacing $c$ with $0$ will not change any results. We have:
$$\begin{aligned}
\forall x \in \mathbb{R}^3, \quad f_{U-V}(x) &= \left(\frac{1}{\frac{4}{3}\pi R^3}\right)^2 \iiint_{\mathbb{R}^3} \mathbb{1}_{B(0,R)}(y)\, \mathbb{1}_{B(0,R)}(y - x)\, dy \\
&= \left(\frac{1}{\frac{4}{3}\pi R^3}\right)^2 \iiint_{\mathbb{R}^3} \mathbb{1}_{B(0,R)}(y)\, \mathbb{1}_{B(x,R)}(y)\, dy \\
&= \left(\frac{3}{4\pi R^3}\right)^2 R^3\, \lambda\left(B(0,1) \cap B\left(\tfrac{x}{R}, 1\right)\right)
\end{aligned}$$
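Using spherical coordinates $(s, \theta, \phi)$ gives us
$$\begin{aligned}
\forall t \in \mathbb{R}^+, \quad P(\|U - V\| \le t) &= \frac{9}{16\pi^2 R^3} \iiint_{B(0,t)} \lambda\left(B(0,1) \cap B\left(\tfrac{x}{R}, 1\right)\right) dx \\
&= \frac{9}{16\pi^2 R^3} \int_0^t s^2 \left( \int_0^{2\pi}\!\!\int_0^{\pi} \sin(\phi)\, \lambda\left(B(0,1) \cap B\left(\tfrac{x(s,\theta,\phi)}{R}, 1\right)\right) d\phi\, d\theta \right) ds
\end{aligned}$$
which, once differentiated with respect to $t$, gives us directly the density $f_{\|U-V\|}$. The determinant of a rotation matrix is still equal to 1. Therefore, the angles $(\theta, \phi)$ do not change the value of the volume:
$$\forall t \in [0; 2R], \quad f_{\|U-V\|}(t) = \frac{9}{16\pi^2 R^3} \cdot (4\pi t^2) \cdot \lambda\left(B(0,1) \cap B\left(\tfrac{x(t,0,0)}{R}, 1\right)\right)$$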
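The volume of the intersection can be calculated similarly to the previous two-dimensional area. The difference is that each coordinate $u$ on the x-axis now contributes a whole disc of radius $l(u) = \sqrt{1 - u^2}$ instead of a chord. Therefore:
$$\lambda\left(B(0,1) \cap B\left(\tfrac{x(r,0,0)}{R}, 1\right)\right) = 2 \int_{\frac{r}{2R}}^{1} \pi\, l(u)^2\, du = 2 \int_{\frac{r}{2R}}^{1} \pi (1 - u^2)\, du = 2\pi \left( \frac{1}{3}\left(\frac{r}{2R}\right)^3 - \frac{r}{2R} + \frac{2}{3} \right)$$
Finally,
$$\forall t \in [0; 2R], \quad f_{\|U-V\|}(t) = \frac{3}{2R^3}\, t^2 \left( \left(\frac{t}{2R}\right)^3 - 3\left(\frac{t}{2R}\right) + 2 \right)$$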
The density is polynomial in a three-dimensional space. Clearly, the third dimension favors the presence of longer distances:
$$\forall R \in \mathbb{R}^{+*}, \quad E(\|U - V\|) = \left(1 + \frac{1}{35}\right) R \quad (\approx 1.029\, R)$$
which is greater than the expected value calculated in a two-dimensional space. Calculating the median implies solving for $m$ a polynomial equation of degree 6:
$$2\left(\frac{m}{2R}\right)^6 - 9\left(\frac{m}{2R}\right)^4 + 8\left(\frac{m}{2R}\right)^3 = \frac{1}{2}$$
Unfortunately, it is impossible to give a general solution. Nevertheless, $m_{app} = 1.033 \cdot R$ gives a very good approximation of $m$. This estimation shows that this time, the median is greater than the mean. It is remarkable that, contrary to the 2D case, 3D distances are slightly more likely to be longer than average.
Finally, let us calculate the variance
$$\forall R \in \mathbb{R}^{+*}, \quad \mathrm{Var}(\|U - V\|) = \left(1 - \frac{1}{175}\right) \frac{R^2}{7} \quad (\approx 0.142 \cdot R^2)$$
which is, as expected, significantly lower than the two-dimensional variance.
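Although the degree-6 equation for the median has no general closed-form solution, it is straightforward to solve numerically. The sketch below is an illustration under our own assumptions: it uses bisection on the distribution function $2s^6 - 9s^4 + 8s^3$ with $s = t/2R$ (the left-hand side of the median equation) and a midpoint rule for the mean and variance. The function names and step counts are hypothetical.

```python
def ball_density(t, radius):
    """Polynomial density of the distance between two uniform points in a 3D ball."""
    if not 0.0 <= t <= 2.0 * radius:
        return 0.0
    s = t / (2.0 * radius)
    return 3.0 / (2.0 * radius ** 3) * t * t * (s ** 3 - 3.0 * s + 2.0)


def ball_cdf(t, radius):
    """Distribution function: 2 s^6 - 9 s^4 + 8 s^3 with s = t / (2 R), clamped to [0, 1]."""
    s = min(max(t / (2.0 * radius), 0.0), 1.0)
    return 2.0 * s ** 6 - 9.0 * s ** 4 + 8.0 * s ** 3


def ball_median(radius, n_iterations=100):
    """Solve ball_cdf(m) = 1/2 by bisection on [0, 2 R]."""
    lo, hi = 0.0, 2.0 * radius
    for _ in range(n_iterations):
        mid = 0.5 * (lo + hi)
        if ball_cdf(mid, radius) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def integrate(g, lo, hi, n_steps=20_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n_steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n_steps))


def ball_mean_and_variance(radius):
    top = 2.0 * radius
    mean = integrate(lambda t: t * ball_density(t, radius), 0.0, top)
    second = integrate(lambda t: t * t * ball_density(t, radius), 0.0, top)
    return mean, second - mean ** 2


if __name__ == "__main__":
    mean, variance = ball_mean_and_variance(1.0)
    print("mean:", mean, "variance:", variance, "median:", ball_median(1.0))
```

Bisection is used here because the distribution function is monotone on $[0, 2R]$, so a single root is guaranteed; any other root finder would serve equally well.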

4.5. K Is a Hypersphere in Higher Dimensions (n > 3)

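Random vectors $U$ and $V$ are now considered independently and uniformly distributed in $B(c, R) \subset \mathbb{R}^n$ where $n \in \mathbb{N}$, $n \ge 4$. A generalization of previous methods allows us to expect $f_{\|U-V\|}$ to take the following form:
$$\forall t \in [0; 2R], \quad f_{\|U-V\|}(t) = \gamma_n\, t^{n-1} \int_{\frac{t}{2R}}^{1} \mathrm{Vol}_{n-1}\left(\sqrt{1 - y^2}\right) dy$$
where $\gamma_n$ is a constant and $\mathrm{Vol}_n$ the volume of the hypersphere in $n$-dimensional space:
$$\mathrm{Vol}_n(R) = \frac{2\pi^{\frac{n}{2}}}{n\, \Gamma\left(\frac{n}{2}\right)}\, R^n \tag{7}$$
Therefore, absorbing the constant factor of $\mathrm{Vol}_{n-1}$ into $\gamma_n$, we have
$$\forall t \in [0; 2R], \quad f_{\|U-V\|}(t) = \gamma_n\, t^{n-1} \int_{\frac{t}{2R}}^{1} (1 - y^2)^{\frac{n-1}{2}}\, dy$$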
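where $\gamma_n$ can be evaluated as the normalizing constant. The generalized binomial theorem [36] gives us the way to evaluate the integral,
$$\int_{\frac{t}{2R}}^{1} (1 - y^2)^{\frac{n-1}{2}}\, dy = \sum_{k=0}^{+\infty} \binom{\frac{n-1}{2}}{k} \frac{(-1)^k}{2k+1} \left( 1 - \left(\frac{t}{2R}\right)^{2k+1} \right)$$
where $\binom{\frac{n-1}{2}}{k}$ is the generalized binomial factor $\binom{\nu}{k} = \frac{\nu(\nu-1)\cdots(\nu-k+1)}{k!}$ with $k \in \mathbb{N}$ and $\nu \in \mathbb{R}$.
Using the fact that $\int_0^{2R} f_{\|U-V\|}(t)\, dt = 1$, we have
$$\begin{aligned}
\frac{1}{\gamma_n} &= (2R)^n \sum_{k=0}^{+\infty} \binom{\frac{n-1}{2}}{k} \frac{(-1)^k}{2k+1} \int_0^1 y^{n-1} \left(1 - y^{2k+1}\right) dy \\
&= \frac{(2R)^n}{n} \int_0^1 y^n \sum_{k=0}^{+\infty} \binom{\frac{n-1}{2}}{k} (-1)^k y^{2k}\, dy \\
&= \frac{(2R)^n}{2n} \int_0^1 x^{\frac{n-1}{2}} (1 - x)^{\frac{n-1}{2}}\, dx \\
&= \frac{(2R)^n}{2n}\, \beta\left(\frac{n+1}{2}, \frac{n+1}{2}\right)
\end{aligned}$$
where $\beta$ is Euler’s Beta function [37], and the factor $\frac{1}{2}$ comes from the change of variable $x = y^2$.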
This gives the explicit form of the density function $f_{\|U-V\|}$ of the random variable $D_n = \|U - V\|$ in dimension $n$ (Figure 5a).
Therefore, the mean of $D_n$ can be calculated using
$$E(D_n) = \frac{\gamma_n}{2} \frac{(2R)^{n+1}}{n+1}\, \beta\left(\frac{n+2}{2}, \frac{n+1}{2}\right) = 2R\, \frac{n}{n+1}\, \frac{\Gamma\left(\frac{n+2}{2}\right)}{\Gamma\left(\frac{n+1}{2}\right)}\, \frac{\Gamma(n+1)}{\Gamma\left(n + \frac{3}{2}\right)}$$
where $\Gamma$ is Euler’s Gamma function.
Using Stirling’s approximation [35], we have $\lim_{n \to +\infty} E(D_n) = R\sqrt{2}$.
On the other hand,
$$E(D_n^2) = \frac{\gamma_n}{2} \frac{(2R)^{n+2}}{n+2}\, \beta\left(\frac{n+3}{2}, \frac{n+1}{2}\right) = 2R^2 \left(1 - \frac{2}{n+2}\right)$$
We then have $\lim_{n \to +\infty} E(D_n^2) = \lim_{n \to +\infty} \left(E(D_n)\right)^2 = 2R^2$, which implies that $\lim_{n \to +\infty} \mathrm{Var}(D_n) = 0$ (Figure 5b).
As we can see with Equation (7), $\lim_{n \to +\infty} \mathrm{Vol}_n(R) = 0$, and it is well known that hyperspheres become “hollow” when the dimension is high enough [38]: most points in a hypersphere tend to agglomerate towards its hypersurface. This has a consequence on distances that is quite intuitive and explains why the variance of distances tends to zero when the dimension increases, i.e., the diversity of geometric configurations is increasingly limited as the dimension of the space increases. Our result shows how fast this phenomenon impacts the distances between points inside the hypersphere when the dimension increases.
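The behaviour of the mean and variance as $n$ grows can be tabulated directly from the Gamma-function expressions. The sketch below is a hedged illustration: it evaluates $E(D_n)$ and $E(D_n^2)$ with `math.lgamma` to avoid overflow, and evaluates the density by numerical integration with the normalizing constant taken as $\gamma_n = 2n / \left((2R)^n \beta\left(\frac{n+1}{2}, \frac{n+1}{2}\right)\right)$, a value we assume because it reproduces the disk and 3D-ball densities for $n = 2$ and $n = 3$. Function names and step counts are our own.

```python
import math


def log_beta(p, q):
    """Logarithm of Euler's Beta function."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def mean_distance(n, radius=1.0):
    """E(D_n) from the Gamma-function expression, computed in log space."""
    log_ratio = (math.lgamma((n + 2) / 2.0) + math.lgamma(n + 1.0)
                 - math.lgamma((n + 1) / 2.0) - math.lgamma(n + 1.5))
    return 2.0 * radius * n / (n + 1.0) * math.exp(log_ratio)


def second_moment(n, radius=1.0):
    """E(D_n^2) = 2 R^2 (1 - 2 / (n + 2))."""
    return 2.0 * radius ** 2 * (1.0 - 2.0 / (n + 2.0))


def variance(n, radius=1.0):
    return second_moment(n, radius) - mean_distance(n, radius) ** 2


def density(t, n, radius=1.0, n_steps=2_000):
    """f(t) = gamma_n t^(n-1) * integral from t/(2R) to 1 of (1 - y^2)^((n-1)/2) dy.

    gamma_n is the normalizing constant assumed in the lead-in; the integral is
    approximated with a midpoint rule.
    """
    if not 0.0 <= t <= 2.0 * radius:
        return 0.0
    s = t / (2.0 * radius)
    half = (n + 1) / 2.0
    gamma_n = 2.0 * n / ((2.0 * radius) ** n * math.exp(log_beta(half, half)))
    h = (1.0 - s) / n_steps
    integral = h * sum((1.0 - (s + (i + 0.5) * h) ** 2) ** ((n - 1) / 2.0)
                       for i in range(n_steps))
    return gamma_n * t ** (n - 1) * integral


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 50, 500):
        print(n, mean_distance(n), variance(n))
```

For $n = 1$ the formula returns $2R/3$, which matches the segment of length $\ell = 2R$ treated in Section 4.1; this is a useful consistency check even though the section only states the result for $n \ge 4$.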

5. Conclusions

We have developed a general and unified method to obtain the distribution of distances between two points randomly selected in an iud cloud of points in a geometric figure. These distributions are useful, especially in spatial statistics, to know the statistical representativeness (the weight) of a distance between two points. In the case of an iud set of random points in a hypersphere, the expression of the density is given for any dimension, and the variance of these distributions converges to zero when the dimension increases. This result also opens new perspectives in multidimensional analysis and data mining.

Author Contributions

Authors contribute equally to this work (conceptualization: M.S.; Formal analysis: S.L. and M.S.; Methodology: S.L.; Supervision: M.S.). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Moltchanov, D. Distance distribution in random networks. Ad Hoc Netw. 2012, 10, 1146–1166.
  2. Souris, M.; Demoraes, F. Improvement of Spatial Autocorrelation, Kernel Estimation, and Modeling Methods by Spatial Standardization on Distance. ISPRS Int. J. Geo-Inf. 2019, 8, 199.
  3. Fotheringham, S.; Rogerson, P.A. (Eds.) The Sage Handbook of Spatial Analysis; Sage: London, UK, 2009.
  4. Schabenberger, O.; Gotway, C. Statistical Methods for Spatial Data Analysis; Chapman & Hall, CRC Press: Boca Raton, FL, USA, 2005.
  5. Souris, M. Epidemiology and Geography. Principles, Methods and Tools of Spatial Analysis; Wiley-ISTE: Fort Wayne, IN, USA, 2019.
  6. Stoyan, D.; Kendall, W.; Mecke, J. Stochastic Geometry and Its Applications, 2nd ed.; Wiley: Hoboken, NJ, USA, 1995.
  7. Pearson, K. The problem of the Random Walk. Nature 1905, 72, 318.
  8. Thompson, H. Distribution of distance to nth neighbour in a population of randomly distributed individuals. Ecology 1956, 37, 391–394.
  9. Borel, E. Principes et Formules Classiques du Calcul des Probabilités; Gauthier-Villars: Paris, France, 1924.
  10. Garwood, F. The variance of the overlap of geometrical figures with reference to a bombing problem. Biometrika 1947, 34, 1–17.
  11. Hammersley, J.M. The distribution of Distance in a hypersphere. Ann. Math. Stat. 1950, 21, 447–452.
  12. Lord, R.D. The Distribution of Distance in a Hypersphere. Ann. Math. Stat. 1954, 25, 794–798.
  13. Alagar, V.S. The distribution of the distance between random points. J. Appl. Probab. 1976, 13, 558–566.
  14. Barton, D.; David, E.; Fix, F. Random points in a circle and the analysis of chromosome patterns. Biometrika 1963, 50, 23–29.
  15. Kuiper, J.H.; Paelinck, J.H. Frequency distribution of distances and related concepts. Geogr. Anal. 1982, 14, 253–259.
  16. Thanh, L.M. Distribution Théorique des Distances Entre deux Points Répartis Uniformément sur le Territoire. In Les Déplacements Humains, Entretiens de Monaco en Sciences Humaines; Sutter, J., Ed.; Hachette: Paris, France, 1962; pp. 173–184.
  17. Courgeau, D. Migrations et découpages du territoire. Population 1973, 28, 511–537.
  18. Courgeau, D.; Baccaïni, B. Migrations et distances. Population 1989, 42, 57–82.
  19. Rogerson, P. Buffon’s needle and the estimation of migration distances. Math. Popul. Stud. 1990, 2, 229–238.
  20. Cohen, J.; Courgeau, D. Modeling distances between humans using Taylor’s law and geometric probability. Math. Popul. Stud. 2017, 24, 197–218.
  21. Tu, S.-J.; Fischbach, E. Random distance distribution for spherical objects: general theory and applications to physics. J. Phys. A Math. Gener. 2002, 35, 6557–6570.
  22. Parry, M.; Fischbach, E. Probability distribution of distance in a uniform ellipsoid: theory and applications to physics. J. Math. Phys. 2000, 41, 2417–2433.
  23. Crofton, M. Probability, in Encyclopaedia Britannica, 9th ed.; Britannica Inc.: Chicago, IL, USA, 1885.
  24. Miller, L. Distribution of link distances in a wireless network. J. Res. Natl. Inst. Stand Technol. 2001, 106, 401–412.
  25. Kostin, A. Probability distribution of distance between pairs of nearest stations in wireless network. Electron. Lett. 2010, 46, 1299–1300.
  26. Hsu, A. The Expected Distance between Two Random Points in a Polygon. Master’s Thesis, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, 1990.
  27. Robbins, D.P.; Bolis, T.S. Solution to problem E2629: Average Distance between Two Points in a Box. Am. Math. Month. 1978, 85, 277–278.
  28. Mathai, A.M.; Moschopoulos, P.; Pederzoli, G. Random points associated with rectangles. Rendiconti del Circolo Matematico Di Palermo 1999, 48, 162–190.
  29. Mathai, A.; Moschopoulos, P.; Pederzoli, G. Distance between random points in a cube. Statistica 1999, 59, 61–81.
  30. Philip, J. The Probability Distribution of the Distance between Two Random Points in a Box; Department of Mathematics, Royal Institute of Technology: Stockholm, Sweden, 2007.
  31. Le Lionnais, F. Les Nombres Remarquables; Hermann: Paris, France, 1983.
  32. Weisstein, E.W. “Cube Line Picking” From MathWorld--A Wolfram Web Resource. Available online: http://mathworld.wolfram.com/CubeLinePicking.html (accessed on 1 April 2019).
  33. Philip, J. The Distance between Two Random Points in a 4- and 5-Cube; Department of Mathematics, Royal Institute of Technology: Stockholm, Sweden, 2008.
  34. Garet, O.; Kurtzmann, A. De L’intégration Aux Probabilités; Ellipses: Paris, France, 2011.
  35. Harris, J.W.; Stocker, H. Spherical Zone (Spherical Layer). In Handbook of Mathematics and Computational Science; Springer-Verlag: New York, NY, USA, 1998.
  36. Graham, R.L.; Knuth, D.E.; Patashnik, O. Concrete Mathematics: A Foundation for Computer Science, 2nd ed.; Addison-Wesley: Reading, MA, USA, 1994.
  37. Andrews, G.E.; Askey, R.; Roy, R. Special Functions; Cambridge University Press: London, UK, 1999.
  38. Berger, M.; Gostiaux, B. Géométrie Différentielle; Armand Colin: Paris, France, 1972.
Figure 1. Distance between two randomly chosen points in a set of points independently and uniformly distributed in a disk (as generated by a homogeneous Poisson process). Among all possible distances between points, how likely would this distance be? Would it be below or above average?
Figure 2. Distribution of distances from values generated by a homogeneous Poisson process in [0,1] with density $\rho = 1500$. In blue is the distribution of observed distances and in red is the theoretical distribution $f_D(t) = 2(1 - t)$ in [0,1].
Figure 3. (a) The distribution of distance on a square. The graph for density is in red when polynomial (5). The light blue bars show a simulation from values generated by a homogeneous Poisson process in $[0, 1] \times [0, 1]$ with density $\rho = 1500$. (b) Densities in rectangles verifying a 2 + b 2 = 5 (this condition allows the curves to share the same domain of definition). It appears that the more symmetrical the rectangle is, the rarer the longest and shortest distances are.
Figure 4. Distribution of distances inside a circle (R = 1). In red the graph for the density (6), in light blue a simulated distribution from values generated by a homogeneous Poisson process with density $\rho = 1500$ inside the circle.
Figure 5. (a) Distribution of distances inside a hypersphere ($R = 1$) for different values of space dimension $n$. (b) Mean value of distance tends to $R\sqrt{2}$ while variance tends to 0 when $n \to +\infty$ (n = 50 in the graphs).

Share and Cite

MDPI and ACS Style

Lellouche, S.; Souris, M. Distribution of Distances between Elements in a Compact Set. Stats 2020, 3, 1-15. https://doi.org/10.3390/stats3010001

Chicago/Turabian Style

Lellouche, Solal, and Marc Souris. 2020. "Distribution of Distances between Elements in a Compact Set" Stats 3, no. 1: 1-15. https://doi.org/10.3390/stats3010001
