An Algorithm Based on Compute Unified Device Architecture for Estimating Covering Functionals of Convex Bodies

Abstract: In Chuanming Zong's program to attack Hadwiger's covering conjecture, a longstanding open problem in Convex and Discrete Geometry, it is essential to estimate covering functionals of convex bodies effectively. Recently, He et al. and Yu et al. provided two deterministic global optimization algorithms with high computational complexity for this purpose. Since satisfactory estimations of covering functionals suffice for Zong's program, we propose a stochastic global optimization algorithm based on CUDA and provide an error estimation for the algorithm. The accuracy of our algorithm is tested by comparing numerical and exact values of covering functionals of convex bodies including the Euclidean unit disc, the three-dimensional Euclidean unit ball, the regular tetrahedron, and the regular octahedron. We also present estimations of covering functionals for the regular dodecahedron and the regular icosahedron.


Introduction
A compact convex subset K of R^n having interior points is called a convex body; its boundary and interior are denoted by bd K and int K, respectively. Let K^n be the set of convex bodies in R^n. For each K ∈ K^n, let c(K) be the least number of translates of int K needed to cover K. Regarding the least upper bound of c(K) over K^n, there is a long-standing conjecture:

Conjecture 1 (Hadwiger's covering conjecture). For each K ∈ K^n, we have c(K) ≤ 2^n; equality holds if and only if K is a parallelotope.
Let K ∈ K^n. A set of the form λK + x, where λ ∈ (0, 1) and x ∈ R^n, is called a smaller homothetic copy of K. According to Theorem 34.3 in [1], c(K) equals the least number of smaller homothetic copies of K required to cover K. Clearly, c(K) ≤ p for some p ∈ Z^+ if and only if Γ_p(K) < 1, where

Γ_p(K) = min{γ > 0 : there exists C ⊆ R^n with |C| = p such that K ⊆ ∪_{x∈C} (γK + x)}.

For each p ∈ Z^+, the map K ↦ Γ_p(K) is an affine invariant and is called the covering functional with respect to p. Let K ∈ K^n and p ∈ Z^+. A set C of p points satisfying K ⊆ ∪_{x∈C} (Γ_p(K)K + x) is called a p-optimal configuration of K.

[Axioms 2024, 13, 132. https://doi.org/10.3390/axioms13020132]
In [8], Chuanming Zong proposed the first computer-based program to tackle Conjecture 1 via estimating covering functionals. Two different algorithms have been designed for this purpose. The first was introduced by Chan He et al. (cf. [9]) and is based on the geometric branch-and-bound method (cf. [10]). The algorithm is implemented in two parts. The first part uses geometric branch-and-bound methods to estimate Γ(K, C), where

Γ(K, C) = min{γ > 0 : K ⊆ ∪_{x∈C} (γK + x)}.

The second part also uses geometric branch-and-bound methods to estimate Γ_p(K).
When n ≥ 3, computing Γ(K, C) and Γ_p(K) in this way exhibits significantly high computational complexity. The second algorithm was introduced by Man Yu et al. (cf. [11]) and is based on a relaxation algorithm. Let S ⊆ K be a discretization of K, let P_K be a set containing a p-optimal configuration of K, and let V ⊆ P_K. They transformed the problem of covering S by smaller homothetic copies of K into a vertex p-center problem and showed that the solution of the corresponding vertex p-center problem is a good approximation of Γ_p(K), with the approximation error controlled by two positive numbers ε_1 and ε_2 determined by the fineness of the discretizations S and V. Clearly, finer discretizations of K are required to obtain more accurate estimates of Γ_p(K), which leads to higher computational complexity.
In this paper, we propose an algorithm that utilizes the Compute Unified Device Architecture (CUDA) and stochastic global optimization methods to accelerate the estimation of Γ_p(K). CUDA is a parallel computing platform particularly well-suited to large-scale computational tasks, since it performs many computations in parallel (cf. [12]). When discretizing convex bodies, CUDA provides a natural discretization method and enables parallel computation over all discretized points, thereby accelerating the execution of algorithms. As shown in Section 2, when calculating Γ(S, C) for some C ⊆ R^n, we need to obtain the maximum dissimilarity between a point in S and its closest point in C. The reduction technique provided by CUDA, which performs a specific operation on all elements of an array (summation, finding the maximum, finding the minimum, etc., cf. [13]), enables the efficient computation of Γ(S, C). When facing large-scale optimization problems, stochastic algorithms are capable of producing high-quality solutions in a short amount of time.
In Section 2, the problem of estimating Γ_p(K) is transformed into a minimization problem, and an error estimation is provided. Using ideas from [9], an algorithm based on CUDA for Γ_p(S) is designed in Section 3. Results of computational experiments showing the effectiveness of our algorithm are presented in Section 4.

Covering Functional and Error Estimation
As in [9], we put

α(K) = sup{α > 0 : αQ^n ⊆ T(K) ⊆ Q^n for some T ∈ A_n},

where A_n is the set of all non-singular affine transformations on R^n and Q^n = [−1, 1]^n. We can apply an appropriate affine transformation, if necessary, to ensure that α(K)Q^n ⊆ K ⊆ Q^n.

Remark 1. In general, it is not easy to calculate α(K). If K is symmetric about the origin, then 1/α(K) is the Banach–Mazur distance between K and Q^n. By Proposition 37.6 in [14], when K is the n-dimensional Euclidean unit ball B^n_2, we have 1/α(K) = √n. For our purpose, it is sufficient to choose an α, as large as possible, such that αQ^n ⊆ K ⊆ Q^n.

Definition 1 (cf. [11]). Let K ∈ K^n. For k, c ∈ R^n, the number

d_K(k, c) = min{λ ≥ 0 : k ∈ λK + c}

is called the dissimilarity of k and c.
If K = B^n_2, the dissimilarity between any two points is precisely the Euclidean distance between them. In general, the dissimilarity is not symmetric. If K is an n-dimensional convex polytope, the dissimilarity between any two points can be computed by Lemma 1.
Lemma 1. Suppose that K has the form

K = {x ∈ R^n : Ax ≤ B},    (2)

where A is an m-by-n matrix and B is the m-dimensional column vector whose entries are all 1. For any u, c ∈ R^n, we have

d_K(u, c) = max({p_i : i ∈ [m]} ∪ {0}),

where (p_i)_{m×1} = A(u − c).
Remark 2. Clearly, each convex polytope K containing the origin 0 in its interior can be represented in the form (2).
Let i ≥ 2 and j ∈ [n] be two integers. We denote by p_j(x) the j-th coordinate of a point x ∈ R^n.

Proof. Suppose that x = (β_1, . . ., β_n) ∈ Q^n. Let y ∈ R^n be the point satisfying

Otherwise, we have

Proof. By Lemma 2, we have

It follows that

There exists a point p ∈ S_i such that

Therefore, we have

Let α > 0, let i ≥ 2 be an integer, and let K be a convex body satisfying

and

Proposition 1. Let K, α, i, S be as above, and let p ∈ Z^+. Then,

Proof. By Theorem 1, we have

which completes the proof.

An Algorithm Based on CUDA for Γ p (S)
Let K ∈ K^n, p ∈ Z^+, and let C be a set of p points. First, we use CUDA to obtain S, defined by (6), and compute the minimum dissimilarity from each point in S to C. Then, we employ a CUDA-based reduction algorithm to obtain Γ(S, C). Finally, we use different stochastic global optimization algorithms to estimate Γ_p(S) and select an appropriate optimization algorithm through comparison. Figure 1 shows the overall framework of the algorithm.

Figure 1. The overall framework of the algorithm: choose a stochastic global optimization algorithm and determine the search space; then repeat Algorithm 1 (obtain S using CUDA, compute the minimum dissimilarity from each point in S to C, and employ a CUDA-based reduction algorithm for Γ(S, C)) until the termination condition is satisfied.

An Algorithm Based on CUDA for Γ(S, C)
CUDA organizes threads into a hierarchical structure consisting of grids, blocks, and threads. The grid is the highest-level organization of threads in CUDA; a grid represents a collection of blocks. A block, identified by a unique block index within its grid, is a group of threads that can cooperate and share data using shared memory. Threads are organized within blocks, and each thread is identified by a unique thread index within its block. The numbers of blocks and of threads per block can be specified when launching a CUDA kernel. The grid and block dimensions can be one-, two-, or three-dimensional, depending on the problem to be addressed. For more information about CUDA, we refer to [15–18].
The organization of threads within blocks and grids provides a natural way to discretize Q^3. Let T be the set of all threads invoked by CUDA. Then, the cardinality of T is gridDim.x × gridDim.y × blockDim.x.
Each thread t ∈ T corresponds to a point p_t, where α is a positive number satisfying αQ^3 ⊆ K ⊆ Q^3. For each t ∈ S, denote by d(p_t, C) the minimum dissimilarity from the point p_t to C, i.e., d(p_t, C) = min{d_K(p_t, c) : c ∈ C}. If t ∉ S, we set d(p_t, C) = 0. The CUDA thread corresponding to t ∈ T computes d(p_t, C). Then, a CUDA-based reduction algorithm is invoked to obtain max_{t∈T} d(p_t, C).
The idea of the CUDA-based reduction algorithm is to divide the original data into multiple blocks, perform a local reduction on each block to obtain a local result, and finally perform a global reduction on the local results to obtain the final value (cf. [19]).
Algorithm 1, with parameters K, C, p, blockDim.x, gridDim.x, gridDim.y, and α, calculates Γ(S, C). It is more efficient than the geometric branch-and-bound approach proposed in [9]. For example, take K = B^3_2; the comparison in Table 1 was performed on a computer with a 12-core processor and the NVIDIA A4000 graphics processor. For Algorithm 1, we take gridDim.x = gridDim.y = blockDim.x = 1024, and the accuracy is given by Proposition 1. For the geometric branch-and-bound algorithm, we set the relative accuracy ε to 0.0001 (cf. [9] for the usage of the relative accuracy). The execution time of the geometric branch-and-bound approach exhibits substantial variability across different choices of C, whereas the algorithm based on CUDA shows relatively consistent execution times across various cases.

Different Stochastic Global Optimization Algorithms for Γ p (S)
We choose to employ stochastic global optimization algorithms for several reasons. In the program proposed by Chuanming Zong (cf. [8]), after appropriately selecting a positive real number β and constructing a β-net N for K^n endowed with the Banach–Mazur metric, we only need to verify that Γ_{2^n}(K) ≤ c_n holds for each K ∈ N, where c_n is a reasonably accurate estimate of the least upper bound of Γ_{2^n}(K). For this purpose, we do not need to determine exact values of covering functionals of convex bodies in N. Stochastic global optimization algorithms have low time complexity and high efficiency. Moreover, based on the results presented in Table 2, it is evident that stochastic global optimization algorithms provide satisfactory estimates of covering functionals. The NLopt (Non-Linear Optimization) library is a rich collection of optimization routines and algorithms, which provides a platform-independent interface for global and local optimization problems (cf. [20]). Algorithms in the NLopt library are partitioned into four categories: non-derivative-based global algorithms, derivative-based global algorithms, non-derivative-based local algorithms, and derivative-based local algorithms. We use several non-derivative-based stochastic global algorithms here. All global optimization algorithms require bound constraints to be specified as optimization parameters (cf., e.g., ref. [21]).
The following is the framework of a stochastic optimization algorithm based on NLopt.
Remark 3. By Proposition 2, when K ⊆ Q^n, we only need to search 2Q^n for points in a p-optimal configuration of K.
Algorithm 2 A stochastic optimization algorithm for Γ p (S) based on NLopt.
Require: K, C, p, blockDim.x, gridDim.x, gridDim.y, an estimation Γ_p(K) ≤ γ, and lower and upper bounds LB and UB of the search domain.
Ensure: min as an estimation of Γ_p(S).

Table 3 shows a comparison between three stochastic algorithms. It can be seen that GN_CRS2_LM performs better than the others.

Computational Experiments
All algorithms were implemented with CUDA 12.2 and compiled with gcc 13.2.1. The computer's graphics card is an NVIDIA RTX A4000.

Covering Functional of the Euclidean Unit Disc
Let B^2_2 be the Euclidean unit disc. In this situation, d(x, y) is the Euclidean distance between x and y. The discretization of Q^2 is performed using threadIdx.x for the x-axis and blockIdx.x for the y-axis. For a positive integer i, let i = blockDim.x = gridDim.x; then the values of threadIdx.x and blockIdx.x range from 0 to i − 1. By Theorem 1, and since there is a p-optimal configuration of B^2_2 contained in B^2_2, we can take LB = −1 and UB = 1. Proposition 1 gives the accuracy of the resulting estimates. Numerical estimations of Γ_p(B^2_2) are summarized in Table 4.

Covering Functional of the Regular Tetrahedron
Let T be the regular tetrahedron considered here; T is affinely equivalent to the standard regular tetrahedron. By Lemma 1, the dissimilarity d(x, c) between x and c is computed from the facet representation (2) of T. It can be verified that (1/5)Q^3 ⊆ T ⊆ Q^3; see Figure 3. By Remark 3, we take LB = −2 and UB = 2, and (8) gives the accuracy of our estimates. Numerical estimations of Γ_p(T) for p ∈ {4, 5, 6, 7, 8} are summarized in Table 5.
Covering Functional of the Regular Octahedron
Let B^3_1 = {x ∈ R^3 : |x_1| + |x_2| + |x_3| ≤ 1}. Then B^3_1 is a regular octahedron. By Lemma 1, the dissimilarity between x and c with respect to B^3_1 is given by d(x, c) = ‖x − c‖_1. It can be verified that (1/3)Q^3 ⊆ B^3_1 ⊆ Q^3; see Figure 4. By Remark 3, we can set LB = −2 and UB = 2, and (8) gives the accuracy of our estimates. Numerical estimations of Γ_p(B^3_1) for p ∈ {6, 7, 8} are summarized in Table 6.

Covering Functional of the Regular Dodecahedron
Let P be the regular dodecahedron, represented in the form (2). By Lemma 1, the dissimilarity between x and c with respect to P can be computed from this representation. It can be verified that αQ^3 ⊆ P ⊆ Q^3 for a suitable α > 0; see Figure 5. By Remark 3, we can set LB = −2 and UB = 2, and (8) gives the accuracy of our estimates. Numerical estimations of Γ_p(P) for p ∈ {5, 6, 7, 8} are summarized in Table 7.
The estimate of Γ 8 (P) is much better than that given in [29].

Conclusions
This paper proposes an algorithm based on CUDA for estimating covering functionals of three-dimensional convex polytopes, which is more efficient than the geometric branch-and-bound method presented in [9]. The theoretical accuracy of our algorithm is given by (8). Restrictions on the dimensions of each grid make the implementation of our algorithm in higher dimensions difficult.

Figure 3 .
Figure 3. Inscribed cube and circumscribed cube of the regular tetrahedron.

Figure 4 .
Figure 4. Circumscribed and inscribed cubes of the regular octahedron.

Figure 5 .
Figure 5. Inscribed cube and circumscribed cube of the regular dodecahedron.

Figure 6.
Figure 6. Inscribed cube and circumscribed cube of the regular icosahedron.

Table 1 .
Comparison of results between the algorithm based on CUDA and the geometric branch-and-bound approach.

Table 3 .
Comparison between different stochastic algorithms.