
Linear and Fisher Separability of Random Points in the d-Dimensional Spherical Layer and Inside the d-Dimensional Cube

Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, 603950 Nizhni Novgorod, Russia
*
Author to whom correspondence should be addressed.
Entropy 2020, 22(11), 1281; https://doi.org/10.3390/e22111281
Submission received: 21 October 2020 / Revised: 8 November 2020 / Accepted: 10 November 2020 / Published: 12 November 2020

Abstract

Stochastic separation theorems play important roles in high-dimensional data analysis and machine learning. It turns out that in a high-dimensional space, any point of a random set of points can be separated from the other points by a hyperplane with high probability, even if the number of points is exponential in the dimension. This and similar facts can be used for constructing correctors for artificial intelligence systems, for determining the intrinsic dimensionality of data and for explaining various natural intelligence phenomena. In this paper, we refine the estimates for the number of points and for the probability in stochastic separation theorems, thereby strengthening some results obtained earlier. We propose bounds for linear and Fisher separability, when the points are drawn randomly, independently and uniformly from a d-dimensional spherical layer and from the unit cube. These results allow us to better outline the applicability limits of the stochastic separation theorems in applications.

1. Introduction

It is generally accepted that the modern information world is the world of big data. However, some of the implications of the advent of the big data era remain poorly understood. In his “millennium lecture”, D. L. Donoho [1] described the post-classical world in which the number of features d is much greater than the sample size n: d ≫ n. It turns out that many phenomena of the post-classical world are already observed if d ≫ log n, or, more precisely, when ID ≫ log n, where ID is the intrinsic dimensionality of the data [2]. Classical methods of data analysis and machine learning become of little use in such a situation, because they usually require huge amounts of data. Such an unlimited appetite of classical approaches for data is usually considered a manifestation of the “curse of dimensionality”. However, the properties d ≫ n or ID ≫ log n are themselves neither a curse nor a blessing, and can be beneficial.
One of the “post-classical” phenomena is stochastic separability [3,4,5]. If the dimensionality of the data is high, then under broad assumptions any sample of the data set can be separated from the rest by a hyperplane (or even by a Fisher discriminant, as a special case) with probability close to 1, even if the number of samples is exponential in the dimension. Thus, high-dimensional datasets exhibit fairly simple geometric properties.
Recently, stochastic separation theorems have been widely used in machine learning: for constructing correctors and ensembles of correctors of artificial intelligence systems [6,7], for determining the intrinsic dimensionality of data sets [8,9], and for explaining various natural intelligence phenomena, such as the grandmother neuron [10,11].
In its usual form, a stochastic separation theorem is formulated as follows. A random n-element set in R^d is linearly separable with probability p > 1 − ϑ if n < a·e^{b·d}. The exact form of the exponential function depends on the probability distribution that determines how the random set is drawn and on the constant ϑ (0 < ϑ < 1). In particular, uniform distributions with different supports are considered in [5,12,13,14]. Wider classes of distributions (including non-i.i.d. ones) are considered in [7]. Roughly speaking, these classes consist of distributions without sharp peaks in sets with exponentially small volume. Estimates for product distributions in the cube and for the standard normal distribution are obtained in [15]. General stochastic separation theorems with optimal bounds for important classes of distributions (log-concave distributions, their convex combinations and product distributions) are proposed in [2].
We note that there are many algorithms for constructing a functional separating a point from all other points in a data set (Fisher linear discriminant, linear programming algorithm, support vector machine, Rosenblatt perceptron, etc.). Among all these methods the computationally cheapest is Fisher discriminant analysis [6]. Other advantages of the Fisher discriminant analysis are its simplicity and the robustness.
The papers [5,6,7,12] deal only with Fisher separability, whereas [13,14] considered the (more general) linear separability. A comparison of the estimates for linear and Fisher separability allows us to clarify the applicability boundary of these methods; namely, to answer the question of for which d and n it suffices to use only the Fisher separability, so that there is no need to search for a more sophisticated linear discriminant.
In [13,14], estimates were obtained for the cardinality of a set of points that guarantee its linear separability when the points are drawn randomly, independently and uniformly from a d-dimensional spherical layer and from the unit cube. These results give more accurate estimates than the bounds obtained in [5,12] for Fisher separability.
Our interest in the study of linear separability in spherical layers is explained, among other reasons, by the possibility of applying our results to determining the intrinsic dimension of data. After applying PCA to the data points for the selection of the major components and subsequent whitening, we can map them to a spherical layer of a given thickness. If the intrinsic dimensionality of the initial set of n points is ID, then we expect the separability properties of the resulting set of points to be similar to those of n points distributed uniformly in dimension d = ID. In particular, we can use the theoretical estimates for the separation probability to estimate ID (cf. [8,9]).
Here we give even more precise estimates for the number of points in the spherical layer that guarantee their linear separability. We also consider the case of linear separability of random points inside a cube in more detail than was done in [13]. In particular, we give estimates for the probability of separability of one point. We also report the results of computational experiments comparing the theoretical estimates for the probability of linear and Fisher separability with the corresponding experimental frequencies, and discuss them.

2. Definitions

A point X ∈ R^d is linearly separable from a set M ⊂ R^d if there exists a hyperplane separating X from M; i.e., there exists A_X ∈ R^d such that (A_X, X) > (A_X, Y) for all Y ∈ M.
A point X ∈ R^d is Fisher separable from the set M ⊂ R^d if (X, Y) < (X, X) for all Y ∈ M [6,7].
A set of points {X_1, …, X_n} ⊂ R^d is called linearly separable [5] or 1-convex [3] if any point X_i is linearly separable from all other points in the set; in other words, the set of vertices of their convex hull, conv(X_1, …, X_n), coincides with {X_1, …, X_n}. The set {X_1, …, X_n} is called Fisher separable if (X_i, X_j) < (X_i, X_i) for all i, j such that i ≠ j [6,7].
Fisher separability implies linear separability, but not vice versa (even if the set is centered and normalized to unit variance). Thus, if M ⊂ R^d is a random set of points from a certain probability distribution, then the probability that M is linearly separable is not less than the probability that M is Fisher separable.
Denote by B^d = {X ∈ R^d : ‖X‖ ≤ 1} the d-dimensional unit ball centered at the origin (‖X‖ means the Euclidean norm), by rB^d the d-dimensional ball of radius r < 1 centered at the origin, and by Q^d = [0, 1]^d the d-dimensional unit cube.
Let M_n = {X_1, …, X_n} be a set of points chosen randomly, independently, according to the uniform distribution on the (1 − r)-thick spherical layer B^d \ rB^d, i.e., on the unit ball with a spherical cavity of radius r. Denote by P(d, r, n) the probability that M_n is linearly separable and by P_F(d, r, n) the probability that M_n is Fisher separable. Denote by P_1(d, r, n) the probability that a random point chosen according to the uniform distribution on B^d \ rB^d is linearly separable from M_n and by P_1^F(d, r, n) the probability that such a point is Fisher separable from M_n.
Now let M_n = {X_1, …, X_n} be a set of points chosen randomly, independently, according to the uniform distribution on the cube Q^d. Let P(d, n) and P_F(d, n) denote the probabilities that M_n is linearly separable and Fisher separable, respectively. Let P_1(d, n) and P_1^F(d, n) denote the probabilities that a random point chosen according to the uniform distribution on Q^d is linearly separable and Fisher separable from M_n, respectively.
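These definitions translate directly into checks that can be run on a finite point set. The sketch below is our own illustration (function names are ours, not from the paper): it tests Fisher separability via the Gram matrix and linear separability (1-convexity) by checking, with a linear-programming feasibility problem, that no point lies in the convex hull of the others. It assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy.optimize import linprog

def is_fisher_separable(X):
    """Check (X_i, X_j) < (X_i, X_i) for all i != j via the Gram matrix."""
    G = X @ X.T
    n = len(X)
    return all(G[i, j] < G[i, i] for i in range(n) for j in range(n) if i != j)

def in_convex_hull(p, Q):
    """LP feasibility: is p a convex combination of the rows of Q?"""
    m = len(Q)
    res = linprog(c=np.zeros(m),
                  A_eq=np.vstack([Q.T, np.ones(m)]),  # sum_j l_j Q_j = p, sum_j l_j = 1
                  b_eq=np.append(p, 1.0),
                  bounds=[(0, None)] * m, method="highs")
    return res.status == 0                            # 0 = feasible optimum found

def is_linearly_separable(X):
    """1-convexity: every X_i lies outside the convex hull of the others."""
    return all(not in_convex_hull(X[i], np.delete(X, i, axis=0))
               for i in range(len(X)))

# The vertices of a square are both Fisher and linearly separable;
# adding the center destroys both properties.
square = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
assert is_fisher_separable(square) and is_linearly_separable(square)
with_center = np.vstack([square, [0., 0.]])
assert not is_fisher_separable(with_center)
assert not is_linearly_separable(with_center)
```

The Fisher test costs one matrix product, which is the computational advantage mentioned in the introduction; the LP-based hull test shows what general linear separability costs.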

3. Previous Results

3.1. Random Points in a Spherical Layer

In [5] it was shown (among other results) that for all r, ϑ, n and d, where 0 < r < 1, 0 < ϑ < 1, d ∈ N, if
n < r 1 r 2 d 1 + 2 ϑ ( 1 r 2 ) d / 2 r 2 d 1 ,
then n points chosen randomly, independently, according to the uniform distribution on B^d \ rB^d are Fisher separable with probability greater than 1 − ϑ, i.e., P_F(d, r, n) > 1 − ϑ.
The following statements concerning the Fisher separability of random points in the spherical layer are proved in [12].
  • For all r, where 0 < r < 1, and for any d ∈ N,
    P_1^F(d, r, n) > (1 − r^d)·(1 − (1 − r²)^{d/2}/2)^n.   (3)
  • For all r, ϑ, where 0 < r < 1, 0 < ϑ < 1, and for sufficiently large d, if
    n < ϑ·(1 − r²)^{−d/2},
    then P_1^F(d, r, n) > 1 − ϑ.
  • For all r, where 0 < r < 1, and for any d ∈ N,
    P_F(d, r, n) > ((1 − r^d)·(1 − (n − 1)(1 − r²)^{d/2}/2))^n.   (4)
  • For all r, ϑ, where 0 < r < 1, 0 < ϑ < 1, and for sufficiently large d, if
    n < √ϑ·(1 − r²)^{−d/4},   (5)
    then P_F(d, r, n) > 1 − ϑ.
The authors of [5,12] formulate their results for linearly separable sets of points, but in fact their proofs use only the Fisher separability of the sets.
Note that all estimates (1)–(5) require 0 < r < 1 with strict inequality. This means that they are inapplicable to the (maybe the most interesting) case r = 0, i.e., to the unit ball with no cavity.
A reviewer of the original version of this article drew our attention to the fact that for r = 0 better results are obtained in [6,15]. Specifically,
    P_1^F(d, 0, n) ≥ 1 − n/2^{d+1},   (6)
    P_F(d, 0, n) ≥ 1 − n(n − 1)/2^{d+1} > 1 − n²/2^{d+1},   (7)
and P_1^F(d, 0, n) > 1 − ϑ provided that n < ϑ·2^{d+1}. See details in Section 4.4.
Both estimates (1) and (5) depend exponentially on d for fixed r, ϑ, and estimate (1) is weaker than (5).
The following results concerning the linear separability of random points in the spherical layer were obtained in [14]:
  • For all r, where 0 ≤ r < 1, and for any d ∈ N,
    P_1(d, r, n) > 1 − n/2^d.   (8)
  • For all r, ϑ, where 0 ≤ r < 1, 0 < ϑ < 1, and for any d ∈ N, if
    n < ϑ·2^d,   (9)
    then P_1(d, r, n) > 1 − ϑ.
  • For all r, where 0 ≤ r < 1, and for any d ∈ N,
    P(d, r, n) > 1 − n(n − 1)/2^d.   (10)
  • For all r, ϑ, where 0 ≤ r < 1, 0 < ϑ < 1, and for any d, if
    n < √(ϑ·2^d),   (11)
    then P(d, r, n) > 1 − ϑ.
We note that the bounds (8)–(11) do not depend on r. In this paper we remove this drawback by giving more accurate estimates (see Theorems 1 and 3 and Corollaries 1 and 2).
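As a quick numerical illustration (our own sketch, not from [14]): bound (10) lets one compute, for given d and ϑ, the largest n with n(n − 1)/2^d < ϑ, which is slightly larger than the closed-form threshold (11).

```python
import math

def max_points_linear(d, theta):
    """Largest n with n*(n-1)/2^d < theta, so that P(d, r, n) > 1 - theta
    by bound (10); starts from the closed-form threshold (11)."""
    n = int(math.sqrt(theta * 2 ** d))   # bound (11): n < sqrt(theta * 2^d)
    while n * (n + 1) / 2 ** d < theta:  # push up while the next n still works
        n += 1
    return n

for d in (20, 40, 60):
    n = max_points_linear(d, 0.01)
    assert n * (n - 1) / 2 ** d < 0.01 <= n * (n + 1) / 2 ** d
```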

3.2. Random Points Inside a Cube

In [5], a product distribution on the cube Q^d is considered. Let the coordinates of a random point X = (x_1, …, x_d) ∈ Q^d be independent random variables with variances σ_i² > σ_0² > 0 (i = 1, …, d). In [5], it is shown that for all ϑ and n, where 0 < ϑ < 1, if
    n < (ϑ/3)·e^{0.5·d·σ_0⁴},   (12)
then M_n is Fisher separable with probability greater than 1 − ϑ. As above, the authors of [5] formulate their result for the linearly separable case, but in fact they used only the Fisher separability.
If all the random variables x_1, …, x_d have the uniform distribution on the segment [0, 1], then σ_0² = 1/12. Thus, inequality (12) takes the form
    n < (ϑ/3)·e^{d/288}.   (13)
We obtain that if n satisfies (13), then P_F(d, n) > 1 − ϑ.
In [13], it was shown that if we want to guarantee only the linear separability, then the bound (13) can be increased. Namely, if
    n < √(ϑ·c^d/(d + 1)),   c = 1.18858…,
then P(d, n) > 1 − ϑ. Here we give related estimates, including ones for the linear separability of one point (see Theorems 5 and 6 and Corollary 3).
We note that better (and in fact asymptotically optimal) estimates for the Fisher separability in the unit cube are derived in [15]. The papers [13,15] were submitted to the same conference, so these results were derived in parallel and independently. Corollary 7 in [15] states that n points are Fisher separable with probability greater than 1 − ϑ provided only that n < √ϑ·e^{γ·d} for γ = 0.23319…. See details in Section 5.
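To see the gap between the Fisher bound (13) and the linear-separability bound from [13] concretely, one can tabulate both thresholds. This is our own sketch (using ϑ = 0.01 and the set-separability forms as rendered above):

```python
import math

c = 1.18858          # constant from [13,20] for linear separability in the cube

def n_fisher(d, theta):
    """Bound (13): Fisher separability of a set in Q^d with prob > 1 - theta."""
    return theta / 3 * math.exp(d / 288)

def n_linear(d, theta):
    """Linear-separability bound from [13]: n < sqrt(theta * c^d / (d + 1))."""
    return math.sqrt(theta * c ** d / (d + 1))

for d in (50, 200, 800):
    print(d, n_fisher(d, 0.01), n_linear(d, 0.01))

# The linear bound grows at rate sqrt(c) ~ 1.0902 per dimension,
# the Fisher bound (13) only at e^(1/288) ~ 1.0035.
assert all(n_linear(d, 0.01) > n_fisher(d, 0.01) for d in (50, 200, 800))
```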

4. Random Points in a Spherical Layer

4.1. The Separability of One Point

The theorem below gives a lower bound for the probability of the linear separability of a random point from a random n-element set M_n = {X_1, …, X_n} in B^d \ rB^d. The proof develops an approach borrowed from [3,16].
The regularized incomplete beta function is defined as I_x(a, b) = B(x; a, b)/B(a, b), where
    B(a, b) = ∫₀¹ t^{a−1}(1 − t)^{b−1} dt,   B(x; a, b) = ∫₀^x t^{a−1}(1 − t)^{b−1} dt
are the beta function and the incomplete beta function, respectively (see [17]).
Theorem 1.
Let 0 ≤ r < 1, α = 4r²(1 − r²), β = 1 − r², d ∈ N. Then
(1)
for 0 ≤ r ≤ 1/√2,
    P_1(d, r, n) > 1 − n·(1 − 0.5·(I_α((d+1)/2, 1/2) + (2r)^d·I_β((d+1)/2, 1/2)))/(2^d·(1 − r^d));   (14)
(2)
for 1/√2 ≤ r < 1,
    P_1(d, r, n) > 1 − n·0.5·(I_α((d+1)/2, 1/2) − (2r)^d·I_β((d+1)/2, 1/2))/(2^d·(1 − r^d)).   (15)
Proof. 
A random point Y is linearly separable from M_n = {X_1, …, X_n} if and only if Y ∉ conv(M_n). Denote this event by C; thus, P_1(d, r, n) = P(C). Let us find an upper bound for the probability of the complementary event C̄, which means that the point Y belongs to the convex hull of M_n. Since the points in M_n have the uniform distribution, the probability of C̄ is
    P(C̄) = Vol(conv(M_n) \ (conv(M_n) ∩ rB^d)) / (Vol(B^d) − Vol(rB^d)).
First, we estimate the numerator of this fraction. Denote by S_i the ball of diameter 1 such that one endpoint of its diameter is at the origin and the point X_i lies on this diameter (see Figure 1). Then
    conv(M_n) \ (conv(M_n) ∩ rB^d) ⊆ ⋃_{i=1}^n (S_i \ (S_i ∩ rB^d)) = W
and
    Vol(conv(M_n) \ (conv(M_n) ∩ rB^d)) ≤ Vol(W) ≤ Σ_{i=1}^n Vol(S_i \ (S_i ∩ rB^d))
    = Σ_{i=1}^n (Vol(S_i) − Vol(S_i ∩ rB^d)) = n·(Vol(S_1) − Vol(S_1 ∩ rB^d))
    = n·(γ_d·(1/2)^d − Vol(S_1 ∩ rB^d)),
where γ_d is the volume of the ball of radius 1. Hence
    P(C̄) ≤ n·(γ_d·(1/2)^d − Vol(S_1 ∩ rB^d)) / (γ_d·(1 − r^d)).
Now we find Vol(S_1 ∩ rB^d). Clearly, Vol(S_1 ∩ rB^d) is equal to the sum of the volumes of two spherical caps. Denote by Cap(R, H) the volume of a spherical cap of height H of a ball of radius R. It is known [18] that
    Cap(R, H) = (1/2)·γ_d·R^d·I_{(2RH − H²)/R²}((d+1)/2, 1/2)
if 0 ≤ H ≤ R.
Consider two cases: 0 ≤ r ≤ 1/√2 and 1/√2 ≤ r < 1 (see Figure 2).
Case 1. If 0 ≤ r ≤ 1/√2, then the centers of the balls S_1, S_2, …, S_n are inside the spherical caps of height h of the ball rB^d (see the left picture in Figure 2). Therefore, the following equalities are true:
    r² − (r − h)² = (1/2)² − (1/2 − (r − h))²,
    r² − (r − h)² = (r − h) − (r − h)²,
    h = r − r²,
    V_1 = Cap(1/2, r − h) = Cap(1/2, r²),   V_2 = Cap(r, h) = Cap(r, r − r²).
If R = 1/2 and H = r², then (2RH − H²)/R² = 4r²(1 − r²) = α; hence
    V_1 = (1/2)·γ_d·(1/2)^d·I_α((d+1)/2, 1/2).
If R = r and H = r − r², then (2RH − H²)/R² = 2H/R − (H/R)² = 2(1 − r) − (1 − r)² = 1 − r² = β; hence
    V_2 = (1/2)·γ_d·r^d·I_β((d+1)/2, 1/2).
Thus,
    Vol(S_1 ∩ rB^d) = V_1 + V_2 = γ_d·((1/2)·(1/2)^d·I_α((d+1)/2, 1/2) + (1/2)·r^d·I_β((d+1)/2, 1/2)).
Hence
    P(C) = 1 − P(C̄) ≥ 1 − n·(γ_d·(1/2)^d − Vol(S_1 ∩ rB^d))/(γ_d·(1 − r^d))
    = 1 − n·(1 − 0.5·(I_α((d+1)/2, 1/2) + (2r)^d·I_β((d+1)/2, 1/2)))/(2^d·(1 − r^d)).
Case 2. If 1/√2 ≤ r < 1, then the centers of the balls S_1, S_2, …, S_n are outside the spherical caps of height h of the ball rB^d (see the right picture in Figure 2). Therefore, the following equalities are true:
    r² − (r − h)² = (1/2)² − (1/2 − (r − h))²,
    r² − (r − h)² = (r − h) − (r − h)²,
    h = r − r²,
    V_1 = Vol((1/2)B^d) − Cap(1/2, 1 − (r − h)) = Vol((1/2)B^d) − Cap(1/2, 1 − r²).
If R = 1/2 and H = 1 − r², then (2RH − H²)/R² = 4r²(1 − r²) = α; hence
    V_1 = γ_d·(1/2)^d − (1/2)·γ_d·(1/2)^d·I_α((d+1)/2, 1/2).
Further,
    V_2 = Cap(r, h) = Cap(r, r − r²) = (1/2)·γ_d·r^d·I_β((d+1)/2, 1/2),
where β = 1 − r². Thus,
    Vol(S_1 ∩ rB^d) = V_1 + V_2 = γ_d·((1/2)^d − (1/2)·(1/2)^d·I_α((d+1)/2, 1/2) + (1/2)·r^d·I_β((d+1)/2, 1/2)).
Hence
    P(C) = 1 − P(C̄) ≥ 1 − n·(γ_d·(1/2)^d − Vol(S_1 ∩ rB^d))/(γ_d·(1 − r^d)) = 1 − n·0.5·(I_α((d+1)/2, 1/2) − (2r)^d·I_β((d+1)/2, 1/2))/(2^d·(1 − r^d)). □
The estimates (14) and (15) for P_1(d, r, n) are monotonically increasing in both d and r and decreasing in n, which corresponds to the behavior of the probability P_1(d, r, n) itself (see Figure 3 and Figure 4). On the contrary, the estimate (3) for the probability P_1^F(d, r, n) is nonmonotonic in r (see Figure 5).
Note that the estimates (14) and (15) obtained in Theorem 1 are quite accurate (in the sense that they are close to the empirical values), as illustrated in Figure 4. The experiment also shows that the probabilities P_1(d, r, n) and P_1^F(d, r, n) (more precisely, the corresponding frequencies) are quite close to each other, although there is a certain gap between them.
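The spherical-cap formula used in the proof is easy to validate numerically. The sketch below is ours; it assumes SciPy's scipy.special.betainc, which computes the regularized incomplete beta function I_x(a, b), and checks Cap(R, H) against the elementary three-dimensional formula πH²(3R − H)/3 and against the hemisphere case.

```python
import math
from scipy.special import betainc   # betainc(a, b, x) = I_x(a, b)

def gamma_d(d):
    """Volume of the d-dimensional unit ball."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def cap_volume(d, R, H):
    """Cap(R, H) = (1/2) * gamma_d * R^d * I_{(2RH - H^2)/R^2}((d+1)/2, 1/2),
    valid for 0 <= H <= R (the formula [18] used in the proof of Theorem 1)."""
    x = (2 * R * H - H * H) / (R * R)
    return 0.5 * gamma_d(d) * R ** d * betainc((d + 1) / 2, 0.5, x)

# d = 3: agrees with the elementary formula pi * H^2 * (3R - H) / 3
R, H = 1.0, 0.3
assert abs(cap_volume(3, R, H) - math.pi * H * H * (3 * R - H) / 3) < 1e-9
# H = R gives a hemisphere in any dimension
assert abs(cap_volume(5, 1.0, 1.0) - 0.5 * gamma_d(5)) < 1e-9
```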
The following corollary gives an estimate for the number of points n guaranteeing the linear separability of a random point from a random n-element set M n in B d \ r B d with probability close to 1.
Corollary 1.
Let 0 < ϑ < 1, α = 4r²(1 − r²), β = 1 − r², d ∈ N. If
(1)
    n < N_1(d, r, ϑ) = ϑ·2^d·(1 − r^d)/(1 − 0.5·(I_α((d+1)/2, 1/2) + (2r)^d·I_β((d+1)/2, 1/2))),   0 ≤ r ≤ 1/√2,
or
(2)
    n < N_2(d, r, ϑ) = ϑ·2^d·(1 − r^d)/(0.5·(I_α((d+1)/2, 1/2) − (2r)^d·I_β((d+1)/2, 1/2))),   1/√2 ≤ r < 1,
then P_1(d, r, n) > 1 − ϑ.
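Corollary 1 is directly computable. Below is a small sketch of ours evaluating N_1 and N_2 with SciPy; at r = 0 it reproduces the threshold ϑ·2^d of Theorem 2, and the threshold grows quickly with r.

```python
from scipy.special import betainc   # betainc(a, b, x) = I_x(a, b)

def N_linear(d, r, theta):
    """Thresholds N_1 (r <= 1/sqrt(2)) and N_2 (r >= 1/sqrt(2)) of Corollary 1."""
    a = (d + 1) / 2
    alpha = 4 * r * r * (1 - r * r)
    beta = 1 - r * r
    Ia, Ib = betainc(a, 0.5, alpha), betainc(a, 0.5, beta)
    if r <= 0.5 ** 0.5:
        denom = 1 - 0.5 * (Ia + (2 * r) ** d * Ib)
    else:
        denom = 0.5 * (Ia - (2 * r) ** d * Ib)
    return theta * 2 ** d * (1 - r ** d) / denom

assert abs(N_linear(30, 0.0, 0.01) - 0.01 * 2 ** 30) < 1e-3   # Theorem 2, r = 0
assert N_linear(30, 0.9, 0.01) > N_linear(30, 0.1, 0.01)      # grows with r
```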
The theorem below establishes asymptotic estimates.
Theorem 2.
(1)
If 0 ≤ r < 1/√2, then
    N_1(d, r, ϑ) ≈ ϑ·2^d.
(2)
If r = 1/√2, then
    N_1(d, r, ϑ) = N_2(d, r, ϑ) ≈ ϑ·2^{d+1}.
(3)
If 1/√2 < r < 1, then
    N_2(d, r, ϑ) ≈ ϑ·√(2π)·(r·(2r² − 1)/√(1 − r²))·√(d + 1)·(1/(r·√(1 − r²)))^d.
(All asymptotics are for d → ∞.)
Proof. 
The paper [19] gives the following asymptotic expansion for the incomplete beta function:
    B(x; a, b) ≈ (x^a/a)·Σ_{k=0}^∞ f_k(b, x)·a^{−k}   for 0 ≤ x < 1, a → ∞,
where
    f_k(b, x) = [d^k/dw^k (1 − x·e^w)^{b−1}]_{w=0}.
Since f_0(b, x) = (1 − x)^{b−1}, we have
    B(x; a, b) = (x^a/a)·(1 − x)^{b−1} + (x^a/a)·Σ_{k=1}^∞ f_k(b, x)·a^{−k} ≈ (x^a/a)·(1 − x)^{b−1}   for b, x fixed, a → ∞.
Since B(a, b) ≈ Γ(b)·a^{−b} for b fixed and a → ∞, we get
    I_x(a, b) = B(x; a, b)/B(a, b) ≈ x^a·(1 − x)^{b−1}·a^{b−1}/Γ(b)
for b, x fixed and a → ∞.
We have x = α = 4r²(1 − r²) or x = β = 1 − r², and a = (d+1)/2, b = 1/2; hence,
    I_α((d+1)/2, 1/2) ≈ √2·α^{(d+1)/2}/(√π·√(d+1)·√(1 − 4r² + 4r⁴)) = √(2/π)·(1/|1 − 2r²|)·α^{(d+1)/2}/√(d + 1),
    (2r)^d·I_β((d+1)/2, 1/2) ≈ (2r)^d·√2·(1 − r²)^{(d+1)/2}/(r·√π·√(d+1)) = √(2/π)·(1/(2r²))·α^{(d+1)/2}/√(d + 1).
If r = 0, then α = 0, β = 1; hence, N_1(d, r, ϑ) ≈ ϑ·2^d.
If 0 < r < 1/√2, then 0 < α < 1; hence,
    I_α((d+1)/2, 1/2) + (2r)^d·I_β((d+1)/2, 1/2) → 0
and
    N_1(d, r, ϑ) ≈ ϑ·2^d.
If r = 1/√2, then α = 1, β = 1/2; hence,
    N_1(d, r, ϑ) = N_2(d, r, ϑ) ≈ ϑ·2^d·(1 − r^d)/(0.5·(1 − √(2/π)·1/√(d + 1))) ≈ ϑ·2^{d+1}.
If 1/√2 < r < 1, then 0 < α < 1; hence,
    I_α((d+1)/2, 1/2) − (2r)^d·I_β((d+1)/2, 1/2) ≈ √(2/π)·(1/(2r²(2r² − 1)))·α^{(d+1)/2}/√(d + 1) = √(2/π)·(√(1 − r²)/(r·(2r² − 1)))·2^d·(r·√(1 − r²))^d/√(d + 1)
and
    N_2(d, r, ϑ) = ϑ·2^d·(1 − r^d)/(0.5·(I_α((d+1)/2, 1/2) − (2r)^d·I_β((d+1)/2, 1/2))) ≈ ϑ·2^d/(0.5·√(2/π)·(√(1 − r²)/(r·(2r² − 1)))·2^d·(r·√(1 − r²))^d/√(d + 1))
    = ϑ·√(2π)·(r·(2r² − 1)/√(1 − r²))·√(d + 1)·(1/(r·√(1 − r²)))^d.  □

4.2. Separability of a Set of Points

The theorem below gives a lower bound for the probability of the linear separability of a random n-element set M_n in B^d \ rB^d.
Theorem 3.
Let 0 ≤ r < 1, α = 4r²(1 − r²), β = 1 − r², and d, n ∈ N. Then
(1)
for 0 ≤ r ≤ 1/√2,
    P(d, r, n) > 1 − n(n − 1)·(1 − 0.5·(I_α((d+1)/2, 1/2) + (2r)^d·I_β((d+1)/2, 1/2)))/(2^d·(1 − r^d));   (16)
(2)
for 1/√2 ≤ r < 1,
    P(d, r, n) > 1 − n(n − 1)·0.5·(I_α((d+1)/2, 1/2) − (2r)^d·I_β((d+1)/2, 1/2))/(2^d·(1 − r^d)).   (17)
Proof. 
Denote by A_n the event that M_n is linearly separable and by C_i the event that X_i ∉ conv(M_n \ {X_i}) (i = 1, …, n). Thus, P(d, r, n) = P(A_n). Clearly, A_n = C_1 ∩ … ∩ C_n and
    P(A_n) = P(C_1 ∩ … ∩ C_n) = 1 − P(C̄_1 ∪ … ∪ C̄_n) ≥ 1 − Σ_{i=1}^n P(C̄_i).
Let us find an upper bound for the probability of the event C̄_i, which means that the point X_i belongs to the convex hull of the remaining points, i.e., X_i ∈ conv(M_n \ {X_i}). In the proof of the previous theorem, it was shown that if 0 ≤ r ≤ 1/√2, then
    P(C̄_i) ≤ (n − 1)·(1 − 0.5·(I_α((d+1)/2, 1/2) + (2r)^d·I_β((d+1)/2, 1/2)))/(2^d·(1 − r^d))   (i = 1, …, n),
and if 1/√2 ≤ r < 1, then
    P(C̄_i) ≤ (n − 1)·0.5·(I_α((d+1)/2, 1/2) − (2r)^d·I_β((d+1)/2, 1/2))/(2^d·(1 − r^d))   (i = 1, …, n).
Therefore, using the inequality
    P(A_n) ≥ 1 − Σ_{i=1}^n P(C̄_i),
we obtain what is required. □
The graphs of the estimates (16) and (17) and the corresponding frequencies in 60 trials for n = 1000 and n = 10,000 points are shown in Figure 6 and Figure 7, respectively. The experiment shows that our estimates are quite accurate and close to the corresponding frequencies.
Another important conclusion from the experiment is as follows. Although the estimates for both probabilities P_F(d, r, n) and P(d, r, n), as well as the corresponding frequencies, are close to 1 for sufficiently large d, the threshold values of d at which this happens differ greatly. In other words, the blessing of dimensionality when using linear discriminants comes noticeably earlier than if we use only Fisher discriminants. This is achieved at the cost of constructing a general linear discriminant, which is computationally more expensive than the Fisher one.
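The kind of experiment described above is easy to reproduce at small scale for the Fisher case (checking linear separability of large sets requires solving many linear programs, so we sketch only the cheaper test; the sampler and the comparison with estimate (4) as rendered in Section 3.1 are our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_layer(n, d, r):
    """n i.i.d. points uniform in the spherical layer B^d \\ rB^d."""
    v = rng.standard_normal((n, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)        # uniform directions
    u = rng.random(n)
    rho = (r ** d + u * (1 - r ** d)) ** (1 / d)         # radial CDF inversion
    return v * rho[:, None]

def fisher_separable(X):
    G = X @ X.T
    D = np.diag(G)[:, None] - G       # D[i, j] = (X_i, X_i) - (X_i, X_j)
    np.fill_diagonal(D, 1.0)          # ignore i == j
    return bool(np.all(D > 0))

d, r, n, trials = 200, 0.3, 50, 100
freq = np.mean([fisher_separable(sample_layer(n, d, r)) for _ in range(trials)])
v = (1 - r * r) ** (d / 2)
bound = ((1 - r ** d) * (1 - (n - 1) * v / 2)) ** n      # estimate (4) from [12]
assert 0 < bound < 1
assert freq >= bound - 0.1            # observed frequency respects the bound
```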
The following corollary gives an estimate for the number of points n guaranteeing the linear separability of a random n-element set M n in B d \ r B d with probability close to 1.
Corollary 2.
Let 0 < ϑ < 1, α = 4r²(1 − r²), β = 1 − r². If
(1)
    0 ≤ r ≤ 1/√2 and n < N_1(d, r, ϑ) = √(ϑ·2^d·(1 − r^d)/(1 − 0.5·(I_α((d+1)/2, 1/2) + (2r)^d·I_β((d+1)/2, 1/2)))),
or
(2)
    1/√2 ≤ r < 1 and n < N_2(d, r, ϑ) = √(ϑ·2^d·(1 − r^d)/(0.5·(I_α((d+1)/2, 1/2) − (2r)^d·I_β((d+1)/2, 1/2)))),
then P(d, r, n) > 1 − ϑ.
The theorem below establishes asymptotic estimates for the number of points guaranteeing the linear separability with probability greater than 1 − ϑ.
Theorem 4.
(1)
If 0 ≤ r < 1/√2, then
    N_1(d, r, ϑ) ≈ √ϑ·2^{d/2}.
(2)
If r = 1/√2, then
    N_1(d, r, ϑ) = N_2(d, r, ϑ) ≈ √ϑ·2^{(d+1)/2}.
(3)
If 1/√2 < r < 1, then
    N_2(d, r, ϑ) ≈ √ϑ·(2π)^{1/4}·√(r·(2r² − 1)/√(1 − r²))·(d + 1)^{1/4}·(1/(r·√(1 − r²)))^{d/2}.
(Here N_1 and N_2 are those of Corollary 2; the asymptotics are for d → ∞.)

4.3. Comparison of the Results

Let us show that the new estimates (16) and (17) for linear separability tend to 1 faster than the estimate (4) from [12] for Fisher separability.
Statement 1.
Let 0 < r < 1, α = 4r²(1 − r²), β = 1 − r², d, n ∈ N, and
    f_1 = n(n − 1)·(1 − 0.5·(I_α((d+1)/2, 1/2) + (2r)^d·I_β((d+1)/2, 1/2)))/(2^d·(1 − r^d)),
    f_2 = n(n − 1)·0.5·(I_α((d+1)/2, 1/2) − (2r)^d·I_β((d+1)/2, 1/2))/(2^d·(1 − r^d)),
    g = 1 − ((1 − r^d)·(1 − (n − 1)(1 − r²)^{d/2}/2))^n.
For r and n fixed (d → ∞):
(1)
if 0 < r < 1/√2, then
    g/f_1 ≈ (1/2)·(4 − 4r²)^{d/2};
(2)
if r = 1/√2, then
    g/f_1 = g/f_2 ≈ ((n + 1)/(n − 1))·2^{d/2};
(3)
if 1/√2 < r < 1, then
    g/f_2 ≈ √(2π)·(r·(2r² − 1)/((n − 1)·√(1 − r²)))·√(d + 1)·(1/(1 − r²))^{d/2}.
Proof. 
If 0 < r < 1/√2, then g ≈ (n(n − 1)/2)·(1 − r²)^{d/2} and f_1 ≈ n(n − 1)/2^d (see the proof of Theorem 2); hence,
    g/f_1 ≈ ((n(n − 1)/2)·(1 − r²)^{d/2})/(n(n − 1)/2^d) = (1/2)·(4 − 4r²)^{d/2} → ∞, since 4 − 4r² > 2.
If r = 1/√2, then g ≈ (n(n + 1)/2)·(1/2)^{d/2} and f_1 = f_2 ≈ n(n − 1)/2^{d+1} (see the proof of Theorem 2); hence,
    g/f_1 = g/f_2 ≈ ((n(n + 1)/2)·(1/2)^{d/2})/(n(n − 1)/2^{d+1}) = ((n + 1)/(n − 1))·2^{d/2}.
If 1/√2 < r < 1, then g ≈ n·r^d and f_2 ≈ n(n − 1)/(√(2π)·(r·(2r² − 1)/√(1 − r²))·√(d + 1)·(1/(r·√(1 − r²)))^d) (see the proof of Theorem 2); hence,
    g/f_2 ≈ n·r^d·√(2π)·(r·(2r² − 1)/√(1 − r²))·√(d + 1)·(1/(r·√(1 − r²)))^d/(n(n − 1)) = √(2π)·(r·(2r² − 1)/((n − 1)·√(1 − r²)))·√(d + 1)·(1/(1 − r²))^{d/2}. □
Now let us compare the estimates for the number of points that guarantee the linear and the Fisher separability of random points in the spherical layer, obtained in Corollary 2 and in [12], respectively. The estimate in Corollary 2 for the number of points guaranteeing the linear separability tends to ∞ faster than the estimate (5) guaranteeing the Fisher separability, for all 0 < r < 1.
Statement 2.
Let f_1 = N_1(d, r, ϑ), f_2 = N_2(d, r, ϑ), g = √ϑ·(1 − r²)^{−d/4}, 0 < r < 1, 0 < ϑ < 1, d ∈ N. For r and ϑ fixed (d → ∞):
(1)
if 0 < r < 1/√2, then
    f_1/g ≈ (2·√(1 − r²))^{d/2};
(2)
if r = 1/√2, then
    f_1/g = f_2/g ≈ 2^{(d+2)/4};
(3)
if 1/√2 < r < 1, then f_2/g ≈ (2π)^{1/4}·√(r·(2r² − 1)/√(1 − r²))·(d + 1)^{1/4}·(1/r)^{d/2}.
Proof. 
If 0 < r < 1/√2, then f_1/g ≈ (√ϑ·2^{d/2}·(1 − r²)^{d/4})/√ϑ = (2·√(1 − r²))^{d/2}.
If r = 1/√2, then f_1 = f_2 ≈ √ϑ·2^{(d+1)/2} and g = √ϑ·2^{d/4}; hence, f_1/g = f_2/g ≈ (√ϑ·2^{(d+1)/2})/(√ϑ·2^{d/4}) = 2^{(d+2)/4}.
If 1/√2 < r < 1, then f_2 ≈ √ϑ·(2π)^{1/4}·√(r·(2r² − 1)/√(1 − r²))·(d + 1)^{1/4}·(1/(r·√(1 − r²)))^{d/2}; hence,
    f_2/g ≈ (2π)^{1/4}·√(r·(2r² − 1)/√(1 − r²))·(d + 1)^{1/4}·(1/(r·√(1 − r²)))^{d/2}·(1 − r²)^{d/4} = (2π)^{1/4}·√(r·(2r² − 1)/√(1 − r²))·(d + 1)^{1/4}·(1/r)^{d/2}. □

4.4. A Note about Random Points Inside the Ball (r = 0)

A reviewer of the original version of this article drew our attention to the fact that for the uniform distribution inside the ball (the case r = 0), better results are known. Specifically, let p̄^F_{xy} be the probability that i.i.d. points x, y inside the ball are not Fisher separable, and let I_{xy} be the indicator function of this event. Then
    p̄^F_{xy} = E[I_{xy}] = E[E[I_{xy} | y]] = E[p̄_y],
where p̄_y denotes the probability that x is not Fisher separable from a given point y. In [6] (also discussed in [15]), it is proved that E[p̄_y] = 1/2^{d+1}. In the notation of our paper, this implies that
    P_1^F(d, 0, n) ≥ 1 − n/2^{d+1},   P_F(d, 0, n) ≥ 1 − n(n − 1)/2^{d+1} > 1 − n²/2^{d+1},
and P_1^F(d, 0, n) > 1 − ϑ provided that n < ϑ·2^{d+1}. This improves the estimate of Theorem 2 for the case r = 0 by a factor of two. Note that the same estimate n < ϑ·2^{d+1} was derived for r = 1/√2 (see Theorem 2). The reviewer conjectured that the estimate n < ϑ·2^d derived in this paper could be improved by a factor of two for the whole range r ∈ [0, 1/√2]. The experimental results support this hypothesis (see Figure 4, Figure 5, Figure 6 and Figure 7).
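The identity E[p̄_y] = 1/2^{d+1} is easy to check by simulation (our own sketch; for d = 2 the probability that x is not Fisher separable from y is 1/8):

```python
import numpy as np

rng = np.random.default_rng(1)

def uniform_ball(m, d):
    """m i.i.d. points uniform in the unit ball B^d."""
    v = rng.standard_normal((m, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return v * rng.random((m, 1)) ** (1 / d)

d, m = 2, 200_000
x, y = uniform_ball(m, d), uniform_ball(m, d)
# x is NOT Fisher separable from y when (x, y) >= (x, x)
not_separable = np.einsum('ij,ij->i', x, y) >= np.einsum('ij,ij->i', x, x)
assert abs(not_separable.mean() - 1 / 2 ** (d + 1)) < 0.01
```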

5. Random Points Inside a Cube

Consider a set of points M_n = {X_1, …, X_n} chosen randomly, independently, according to the uniform distribution on the d-dimensional unit cube Q^d.
Theorem 5.
Let d, n ∈ N. Then
    P_1(d, n) > 1 − n·(d + 1)/c^d,   c = 1.18858….   (18)
Proof. 
A random point Y is linearly separable from M_n = {X_1, …, X_n} if and only if Y ∉ conv(M_n). Denote this event by C; thus, P_1(d, n) = P(C). Let us find an upper bound for the probability of the complementary event C̄, which means that the point Y belongs to the convex hull of M_n. Since the points in M_n have the uniform distribution, the probability of C̄ is
    P(C̄) = Vol(conv(M_n))/Vol(Q^d) = Vol(conv(M_n)).
In [20] it is proved that the maximal volume of the convex hull of k points placed in Q^d is at most k·(d + 1)/c^d, where c = 1.18858…. Thus, Vol(conv(Y_1, …, Y_k)) < k·(d + 1)/c^d, so
    P(C̄) = Vol(conv(M_n)) < n·(d + 1)/c^d
and
    P_1(d, n) = P(C) = 1 − P(C̄) > 1 − n·(d + 1)/c^d.
 □
Corollary 3.
Let 0 < ϑ < 1 and
    n < ϑ·c^d/(d + 1),   c = 1.18858….   (19)
Then P_1(d, n) > 1 − ϑ.
Theorem 6.
Let d, n ∈ N. Then
    P(d, n) > 1 − n(n − 1)·(d + 1)/c^d,   c = 1.18858….   (20)
Proof. 
Denote by A_n the event that M_n is linearly separable and by C_i the event that X_i ∉ conv(M_n \ {X_i}) (i = 1, …, n). Thus, P(d, n) = P(A_n). Clearly, A_n = C_1 ∩ … ∩ C_n and
    P(A_n) = P(C_1 ∩ … ∩ C_n) = 1 − P(C̄_1 ∪ … ∪ C̄_n) ≥ 1 − Σ_{i=1}^n P(C̄_i).
The event C̄_i means that the point X_i belongs to the convex hull of the remaining points, i.e., X_i ∈ conv(M_n \ {X_i}). In the proof of the previous theorem, it was shown that
    P(C̄_i) ≤ (n − 1)·(d + 1)/c^d,   c = 1.18858…   (i = 1, …, n).
Hence
    P(A_n) ≥ 1 − Σ_{i=1}^n P(C̄_i) ≥ 1 − n(n − 1)·(d + 1)/c^d.
 □
Corollary 4 ([13]).
Let 0 < ϑ < 1 and
    n < √(ϑ·c^d/(d + 1)),   c = 1.18858….   (21)
Then P(d, n) > 1 − ϑ.
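Corollary 3 also answers the inverse question: how large must the dimension be before a given number of points is, with high probability, linearly separable? A small sketch of ours, scanning for the smallest such d under estimate (18):

```python
import math

c = 1.18858   # constant from [20]

def min_dim(n, theta):
    """Smallest d with n*(d+1)/c^d < theta, so that P_1(d, n) > 1 - theta
    by Theorem 5 (estimate (18))."""
    d = 1
    while n * (d + 1) / c ** d >= theta:
        d += 1
    return d

d = min_dim(10_000, 0.01)
assert 10_000 * (d + 1) / c ** d < 0.01      # bound satisfied at d
assert 10_000 * d / c ** (d - 1) >= 0.01     # ...but not at d - 1
```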
We note that the estimate (21) for the number of points guaranteeing the linear separability tends to ∞ faster than the estimate (13) guaranteeing the Fisher separability. Indeed, already for the square of the bound (21) we have
    (ϑ·c^d/(d + 1))/((ϑ/3)·e^{d/288}) = (3/(d + 1))·(c/e^{1/288})^d → ∞, as d → ∞,
since c/e^{1/288} ≈ 1.18446 > 1.
However, better (and in fact asymptotically optimal) estimates for the Fisher separability in the unit cube are derived in [15]. Corollary 7 in [15] states that n points are Fisher separable with probability greater than 1 − ϑ provided only that n < √ϑ·e^{γ·d} for γ = 0.23319…. This can be written as n < √ϑ·c^{d/2} for c = e^{2γ} = 1.59421…. Thus,
    P_1^F(d, n) > 1 − n·exp(−2γ·d) = 1 − n/c^d,
    P_F(d, n) > 1 − n²/c^d.
Theorem 6 and Corollary 4 of our paper state the same results with c = 1.18858, for linear separability instead of Fisher separability. However, [13,15] were submitted to the same conference, so these results were derived in parallel and independently.
The bounds (18) and (20) for the probabilities and corresponding frequencies are presented in Figure 8 and Figure 9.

6. Subsequent Work

In a recent paper [2], explicit and asymptotically optimal estimates of Fisher-separation probabilities for spherically invariant distributions (e.g., the standard normal and the uniform distributions) were obtained. Theorem 14 in [2] generalizes the results presented here. Since [2] was submitted to arXiv later, we did not compare the results of that article with ours.

7. Conclusions

In this paper we refined the estimates for the number of points and for the probability in stochastic separation theorems. We gave new bounds for linear separability, when the points are drawn randomly, independently and uniformly from a d-dimensional spherical layer or from the unit cube. These results refine some results obtained in [5,12,13,14] and allow us to better understand the applicability limits of the stochastic separation theorems for high-dimensional data mining and machine learning problems.
The strongest progress was in the estimation of the number of random points in a (1 − r)-thick spherical layer B^d \ rB^d that are linearly separable with high probability. If
    n ≲ √ϑ·2^{d/2} (for 0 ≤ r < 1/√2),   or   n ≲ √ϑ·2^{(d+1)/2} (for r = 1/√2),
or
    n ≲ √ϑ·(2π)^{1/4}·√(r·(2r² − 1)/√(1 − r²))·(d + 1)^{1/4}·(1/(r·√(1 − r²)))^{d/2} (for 1/√2 < r < 1),
then n i.i.d. random points inside the spherical layer B^d \ rB^d are linearly separable with probability at least 1 − ϑ (the asymptotic inequalities are for d → ∞).
One of the main results of the experiment comparing linear and Fisher separability is as follows. The blessing of dimensionality when using linear discriminants can come noticeably earlier (for smaller values of d) than if we use only Fisher discriminants. This is achieved at the cost of constructing a general linear discriminant, which is computationally more expensive than the Fisher one.

Author Contributions

Conceptualization, S.S. and N.Z.; methodology, S.S. and N.Z.; software, S.S. and N.Z.; validation, S.S. and N.Z.; formal analysis, S.S. and N.Z.; investigation, S.S. and N.Z.; resources, S.S. and N.Z.; data curation, S.S. and N.Z.; writing—original draft preparation, S.S. and N.Z.; writing—review and editing, S.S. and N.Z.; visualization, S.S. and N.Z.; supervision, S.S. and N.Z.; project administration, S.S. and N.Z.; funding acquisition, S.S. and N.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The work is supported by the Ministry of Science and Higher Education of the Russian Federation (agreement number 075-15-2020-808).

Acknowledgments

The authors are grateful to anonymous reviewers for valuable comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Donoho, D.L. High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality. Invited Lecture at Mathematical Challenges of the 21st Century. In Proceedings of the AMS National Meeting, Los Angeles, CA, USA, 6–12 August 2000. Available online: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.329.3392 (accessed on 9 November 2020).
  2. Grechuk, B.; Gorban, A.N.; Tyukin, I.Y. General stochastic separation theorems with optimal bounds. arXiv 2020, arXiv:2010.05241.
  3. Bárány, I.; Füredi, Z. On the shape of the convex hull of random points. Probab. Theory Relat. Fields 1988, 77, 231–240.
  4. Donoho, D.; Tanner, J. Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing. Philos. Trans. R. Soc. A 2009, 367, 4273–4293.
  5. Gorban, A.N.; Tyukin, I.Y. Stochastic separation theorems. Neural Netw. 2017, 94, 255–259.
  6. Gorban, A.N.; Golubkov, A.; Grechuk, B.; Mirkes, E.M.; Tyukin, I.Y. Correction of AI systems by linear discriminants: Probabilistic foundations. Inf. Sci. 2018, 466, 303–322.
  7. Gorban, A.N.; Grechuk, B.; Tyukin, I.Y. Augmented artificial intelligence: A conceptual framework. arXiv 2018, arXiv:1802.02172v3.
  8. Albergante, L.; Bac, J.; Zinovyev, A. Estimating the effective dimension of large biological datasets using Fisher separability analysis. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019.
  9. Bac, J.; Zinovyev, A. Lizard brain: Tackling locally low-dimensional yet globally complex organization of multi-dimensional datasets. Front. Neurorobot. 2020, 13, 110.
  10. Gorban, A.N.; Makarov, V.A.; Tyukin, I.Y. The unreasonable effectiveness of small neural ensembles in high-dimensional brain. Phys. Life Rev. 2019, 29, 55–88.
  11. Gorban, A.N.; Makarov, V.A.; Tyukin, I.Y. High-Dimensional Brain in a High-Dimensional World: Blessing of Dimensionality. Entropy 2020, 22, 82.
  12. Gorban, A.N.; Burton, R.; Romanenko, I.; Tyukin, I.Y. One-trial correction of legacy AI systems and stochastic separation theorems. Inf. Sci. 2019, 484, 237–254.
  13. Sidorov, S.V.; Zolotykh, N.Y. On the Linear Separability of Random Points in the d-dimensional Spherical Layer and in the d-dimensional Cube. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–4.
  14. Sidorov, S.V.; Zolotykh, N.Y. Linear and Fisher Separability of Random Points in the d-dimensional Spherical Layer. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–6.
  15. Grechuk, B. Practical stochastic separation theorems for product distributions. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8.
  16. Elekes, G. A geometric inequality and the complexity of computing volume. Discret. Comput. Geom. 1986, 1, 289–292.
  17. Paris, R.B. Incomplete beta functions. In NIST Handbook of Mathematical Functions; Olver, F.W., Lozier, D.W., Boisvert, R.F., Clark, C.W., Eds.; Cambridge University Press: Cambridge, UK, 2010.
  18. Li, S. Concise Formulas for the Area and Volume of a Hyperspherical Cap. Asian J. Math. Stat. 2011, 4, 66–70.
  19. López, J.L.; Sesma, J. Asymptotic expansion of the incomplete beta function for large values of the first parameter. Integral Transform. Spec. Funct. 1999, 8, 233–236.
  20. Dyer, M.E.; Füredi, Z.; McDiarmid, C. Random points in the n-cube. DIMACS Ser. Discret. Math. Theor. Comput. Sci. 1990, 1, 33–38.
Figure 1. Illustration to the proof of Theorem 1.
Figure 2. Illustration to the proof of Theorem 1: case 1 (left); case 2 (right).
Figure 3. The graphs of the right-hand sides of the estimates (14), (15) for the probability P 1 ( d , r , n ) that a random point is linearly separable from a set of n = 1000 (left) and n = 10,000 (right) random points in the layer B d \ r B d .
Figure 4. The graphs of the estimates for the probabilities P 1 ( d , r , n ) ( P 1 F ( d , r , n ) ) that a random point is linearly (and respectively, Fisher) separable from a set of n = 10,000 random points in the layer B d \ r B d . The solid lines correspond to the theoretical bounds (14) and (15) for the linear separability. The dash-dotted lines represent the theoretical bounds (2) and (6) for the Fisher separability. The crosses (circles) correspond to the empirical frequencies for linear (and respectively Fisher) separability obtained in 60 trials for each dimension d.
Figure 5. The graphs of the right-hand side of the estimate (3) for the probability P 1 F ( d , r , n ) that a random point is Fisher separable from a set of n = 1000 (left) and n = 10,000 (right) random points in the layer B d \ r B d .
Figure 6. The graphs of the estimates for the probabilities P ( d , r , n ) ( P F ( d , r , n ) ) that a random set of n = 1000 points in B d \ r B d is linearly (and respectively Fisher) separable. The solid lines correspond to the theoretical bounds (16) and (17) for the linear separability. The dash-dotted lines represent the theoretical bound (4) and (7) for the Fisher separability. The crosses (circles) correspond to the empirical frequencies for linear (and respectively, Fisher) separability obtained in 60 trials for each dimension d.
Figure 7. The graphs of the estimates for the probabilities P ( d , r , n ) ( P F ( d , r , n ) ) that a random set of n = 10,000 points in B d \ r B d is linearly (and respectively, Fisher) separable. The notation is the same as in Figure 6.
Figure 8. The graphs of the estimates for the probabilities P 1 ( d , n ) and P 1 F ( d , n ) that a random point is linearly (Fisher) separable from a set of n = 10,000 random points inside the cube Q d . The solid red and blue lines correspond to the theoretical bounds (18) and (22), respectively. Red crosses (blue circles) correspond to the empirical frequencies for linear (and respectively, Fisher) separability obtained in 60 trials for each dimension d.
Figure 9. The graphs of the estimates (20) and (23) for the probabilities P ( d , n ) and P F ( d , n ) that a set of n = 10,000 random points inside the unit cube Q d is linearly and Fisher separable, respectively. The notation is the same as in Figure 8.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Sidorov, S.; Zolotykh, N. Linear and Fisher Separability of Random Points in the d-Dimensional Spherical Layer and Inside the d-Dimensional Cube. Entropy 2020, 22, 1281. https://doi.org/10.3390/e22111281
