NP-Hardness of the Problem of Optimal Box Positioning

We consider the problem of finding a position of a $d$-dimensional box with given edge lengths that maximizes the number of enclosed points of a given finite set $P \subset \mathbb{R}^d$, i.e., the problem of optimal box positioning. We prove that while this problem is polynomial for fixed values of $d$, it is NP-hard in the general case. The proof is based on a polynomial reduction of the 3-CNF satisfiability problem to the considered problem.


Introduction
We consider the problem of optimal box positioning, that is, finding a position of a $d$-dimensional box with given edge lengths that maximizes the number of enclosed points of a given $n$-element set $P \subset \mathbb{R}^d$. In this paper, we prove that this problem is NP-hard when the integers $n$ and $d$ are not fixed but are treated as parameters of the problem.
The problem of optimal box positioning has wide applications in computational geometry, data mining, and pattern recognition (e.g., see [1][2][3]). In [4], the authors presented a clustering approach based on a greedy algorithm that finds an approximate solution of the optimal box positioning problem. The algorithm was inspired by the apparatus of maximum interval pattern concepts (see, e.g., [4,5]), a technique that allows one to select patterns from fuzzy contexts. This approach was successfully applied to the dataset of tactile images registered by the Medical Tactile Endosurgical Complex [6][7][8], which allows intraoperative tactile examination of tissues. A comparison of the proposed clustering approach with conventional k-means clustering showed a statistically significant advantage of the proposed method in clustering quality. Note that the result proved in the present paper justifies developing algorithms for an approximate version of the optimal box positioning problem rather than for the exact one.
The rest of the paper is organized as follows. In Section 2, we describe some known results. In Section 3, we introduce formal definitions and formulate the problem of optimal integer box positioning and the auxiliary problem of the existence of an integer $m$-box. In Section 4, we prove the NP-hardness of the problem of optimal box positioning. In Section 5, we summarize the results.

Previous Results
Although, to the best of our knowledge, no proof of the NP-hardness of the optimal box positioning problem has been available so far, some results are known for related problems. For example, in [3], Eckstein et al. considered a generalization of the problem of optimal box positioning: given two finite sets $P^+, P^-$ in $\mathbb{R}^d$, find a box $B$ with arbitrary edge lengths such that (1) $B$ does not intersect $P^-$, and (2) the cardinality of $B \cap P^+$ is maximal over all boxes that satisfy the first condition.
The authors proved the NP-hardness of the problem by applying a polynomial reduction of the classical NP-hard problem of finding a maximum independent set of vertices in a graph (e.g., see [9]) to the considered problem.
Barbay et al. considered a weighted generalization of the problem: given a finite set $P \subset \mathbb{R}^d$ and a function $w : P \to \mathbb{R} \cup \{-\infty\}$, find a box $B$ with arbitrary edge lengths that maximizes the sum $\sum_{p \in B \cap P} w(p)$ [10]. This problem is also NP-hard since it generalizes the previous problem, in which points from $P^+$ have weight $+1$ and points from $P^-$ have weight $-\infty$.
De Figueiredo and da Fonseca considered the weighted problem of optimal unit ball positioning with a non-negative weight function [11]. They obtained a lower bound $\Omega(n^d)$ under an additional restriction: an algorithm decides which operation to apply to a point based only on the coordinates of the point, ignoring its weight, so it processes input points in an order that does not depend on the weight function. Under this restriction, an algorithm must calculate the weight for each ball that is optimal for some weight function. Note that this restriction is not met in the unweighted version of the problem since we use the fact that the weight of each point is equal to $+1$. Also note that a unit box is a unit ball in the $\ell_\infty$ metric.

Formal Definitions
Furthermore, we consider only boxes with integer edge lengths and vertex coordinates, i.e., boxes $B = [a_1, b_1] \times \dots \times [a_d, b_d]$ with edge lengths $\delta_i = b_i - a_i$ such that $\delta_i, a_i, b_i \in \mathbb{Z}$ ($i \in \{1, \dots, d\}$). We call such boxes integer boxes.
Definition 2. The problem of optimal integer box positioning is defined as follows: find an integer box with given edge lengths that maximizes the number of enclosed points of a set $P = \{p_i\}_{i=1}^{n} \subset \mathbb{Z}^d$.
In Section 4, we obtain NP-hardness of the problem of optimal integer box positioning as a corollary of the theorem about NP-completeness of the problem of the existence of an integer $m$-box.
Definition 3. The problem of the existence of an integer $m$-box is the problem of the existence of an integer box with given edge lengths that contains at least $m$ points from a set $P = \{p_i\}_{i=1}^{n} \subset \mathbb{Z}^d$.
In the general case, the parameters of both problems are the integers $n, d, \delta_1, \delta_2, \dots, \delta_d$ and the set $P$. The number $m$ is considered as a function of $n, d$ or as a constant.
It is easy to see that both problems belong to the P complexity class if the parameter $d$ is fixed. Indeed, without loss of generality, we can consider only boxes for which each $a_i$ is equal to the $i$-th coordinate of some point $p_{k_i}$ from the set $P$. So to solve the problem, we can count the number of points in at most $n^d$ boxes. Since each count can be performed in $O(nd)$ operations, the total number of operations for solving the problem is $O(dn^{d+1})$, which is polynomial in $n$.
Definition 4. The 3-CNF satisfiability problem is the problem of the existence of an assignment $(s_1, \dots, s_d) \in \{0, 1\}^d$ to the Boolean variables $x_1, \dots, x_d$ that turns the formula $\bigwedge_{i=1}^{n} (l_{i,1} \vee l_{i,2} \vee l_{i,3})$ in conjunctive normal form to 1 (here, $l_{i,j}$ denotes literals over variables from the set $\{x_1, \dots, x_d\}$). For further details, see e.g., [9].
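The enumeration argument for fixed $d$ given above can be made concrete with a short sketch. The Python code below is illustrative only and is not taken from the paper: the function names `count_enclosed` and `best_box_bruteforce` are hypothetical, boxes are assumed to be closed, and a box is represented by its lower corner together with its edge lengths. The code tries every lower corner whose $i$-th coordinate coincides with the $i$-th coordinate of some point of $P$ (at most $n^d$ candidates) and counts the enclosed points for each candidate with $O(nd)$ comparisons.

```python
from itertools import product


def count_enclosed(points, lower, deltas):
    """Count points p with lower[i] <= p[i] <= lower[i] + deltas[i] for every i."""
    return sum(
        all(a <= x <= a + dlt for x, a, dlt in zip(p, lower, deltas))
        for p in points
    )


def best_box_bruteforce(points, deltas):
    """Return (count, lower_corner) of a best box among the candidate corners.

    Candidate i-th coordinates are the i-th coordinates of the input points,
    so at most n^d lower corners are examined.
    """
    d = len(deltas)
    candidates = [sorted({p[i] for p in points}) for i in range(d)]
    best_count, best_lower = 0, None
    for lower in product(*candidates):
        c = count_enclosed(points, lower, deltas)
        if c > best_count:
            best_count, best_lower = c, lower
    return best_count, best_lower


# Small hypothetical instance in the plane with a unit square.
points = [(0, 0), (1, 1), (2, 2), (5, 5)]
best_count, best_lower = best_box_bruteforce(points, deltas=(1, 1))
```

The running time grows as $n^d$, so the sketch is practical only for small fixed $d$, in line with the bound $O(dn^{d+1})$.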
Without loss of generality, assume that the variables of every clause are distinct. Indeed, otherwise a clause is either identically equal to 1 (if it contains both a variable and its negation) or can be replaced with at most four clauses with the required property such that the conjunction of these clauses is identically equal to the initial clause.
Cook's theorem [12] states that the 3-CNF satisfiability problem is NP-complete. This fact underlies our proof of NP-hardness of the problem of the existence of an integer $m$-box.

NP-Hardness of the Problem of Optimal Box Positioning
Theorem 1. The problem of the existence of an integer $m$-box belongs to the NP complexity class.
Proof. Suppose we have a certificate: a box $B$ that encloses at least $m$ points from the set $P$. Then the certificate can be validated by computing the cardinality of $B \cap P$, which can be done by iterating over the set $P$ and checking whether the current point lies in the box $B$. Since $P$ contains $n$ elements and each check can be done with $O(d)$ comparisons, computing the cardinality of $B \cap P$ takes $O(dn)$ operations, which is polynomial in the parameters $n, d$.
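The certificate check from the proof can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the name `verify_certificate` is hypothetical, the certificate is assumed to be given by the lower and upper corners of a closed box, and the sketch additionally checks that the box has the prescribed edge lengths.

```python
def verify_certificate(points, lower, upper, deltas, m):
    """Return True if the box [lower, upper] has edge lengths `deltas`
    and encloses at least m of the given points."""
    # The certificate must describe a box with the required edge lengths.
    if any(b - a != dlt for a, b, dlt in zip(lower, upper, deltas)):
        return False
    enclosed = 0
    for p in points:  # n iterations
        # O(d) comparisons per point
        if all(a <= x <= b for x, a, b in zip(p, lower, upper)):
            enclosed += 1
    return enclosed >= m


# Hypothetical instance: a unit cube in dimension 3 and four points.
points = [(0, 1, 2), (1, 1, 1), (2, 1, 0), (2, 2, 2)]
accepted = verify_certificate(points, (0, 0, 1), (1, 1, 2), (1, 1, 1), m=2)
rejected = verify_certificate(points, (0, 0, 1), (1, 1, 2), (1, 1, 1), m=3)
```

The loop performs $n$ membership checks of $O(d)$ comparisons each, matching the $O(dn)$ bound used in the proof.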
Theorem 2. The problem of the existence of an integer $m$-box is NP-hard.
Proof. We will prove this theorem by employing a polynomial reduction of the 3-CNF satisfiability problem (which is NP-hard [12]) to the problem of the existence of an integer $m$-box. Consider an arbitrary formula $F$ in conjunctive normal form with $d$ variables $x_1, \dots, x_d$ and $n$ disjunctive clauses $D_1, \dots, D_n$, each containing exactly 3 literals: $F = \bigwedge_{i=1}^{n} D_i$, where $D_i = l_{i,1} \vee l_{i,2} \vee l_{i,3}$; $l_{i,j}$ denotes a literal over one of the variables $x_1, \dots, x_d$.
We construct the set $P = \{p_i\} \subset \mathbb{Z}^d$ by the following procedure. Consider the disjunctive clause $D_i$ with variables $x_{i,1}, x_{i,2}, x_{i,3}$ and the set of its satisfying assignments $S_i = \{S_{i,j}\}$ over the variable set $\{x_{i,1}, x_{i,2}, x_{i,3}\}$. Since each disjunctive clause contains exactly 3 literals corresponding to distinct variables, it holds that $|S_i| = 7$. We map the pair $(D_i, S_{i,j})$ to the point $z_{i,j} \in \mathbb{Z}^d$ with coordinates $(z_1, \dots, z_d)$ by the following rule:
$$z_l = \begin{cases} 0, & \text{if } x_l \in \{x_{i,1}, x_{i,2}, x_{i,3}\} \text{ and the value of } x_l \text{ in } S_{i,j} \text{ is } 0; \\ 1, & \text{if } x_l \notin \{x_{i,1}, x_{i,2}, x_{i,3}\}; \\ 2, & \text{if } x_l \in \{x_{i,1}, x_{i,2}, x_{i,3}\} \text{ and the value of } x_l \text{ in } S_{i,j} \text{ is } 1. \end{cases}$$
We define the set $P$ as the image of this map over all clauses $D_1, \dots, D_n$ and their sets of satisfying assignments $S_1, \dots, S_n$, so $|P| \le 7n$. For further convenience, we also introduce the sets $Z_i = \{z_{i,j}\}$, $i \in \{1, \dots, n\}$, as the subsets of $P$ that consist of all points associated with $D_i$.
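The construction of the point set can be illustrated with a short Python sketch. Several choices here are assumptions made for illustration and do not come from the paper: a clause is encoded as a triple of nonzero integers (the literal `k` stands for $x_k$ and `-k` for its negation), the names `clause_points` and `build_point_set` are hypothetical, and the points are kept in a list so that points arising from different clauses remain separate entries.

```python
from itertools import product


def clause_points(clause, d):
    """Return the 7 points z_{i,j} for one clause over d variables.

    `clause` is a triple of nonzero ints over distinct variables
    (assumed encoding: k means x_k, -k means NOT x_k).
    """
    points = []
    for values in product((0, 1), repeat=3):
        # Keep only local assignments that satisfy at least one literal.
        if any((lit > 0) == bool(v) for lit, v in zip(clause, values)):
            z = [1] * d  # coordinate 1 for variables absent from the clause
            for lit, v in zip(clause, values):
                z[abs(lit) - 1] = 2 * v  # 0 if the variable is 0, 2 if it is 1
            points.append(tuple(z))
    return points


def build_point_set(clauses, d):
    """Collect the points of all clauses, one block Z_i per clause."""
    P = []
    for clause in clauses:
        P.extend(clause_points(clause, d))
    return P


# Hypothetical formula (x1 or x2 or x3) and (not x1 or x2 or x4), d = 4.
clauses = [(1, 2, 3), (-1, 2, 4)]
P = build_point_set(clauses, 4)
```

Each clause contributes exactly 7 points, so the construction takes time polynomial in $n$ and $d$.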
To complete the proof of the theorem, we prove the following lemmas.
Lemma 1. In the above notation, for an arbitrary unit cube $C \subset \mathbb{Z}^d$ and for all $i \in \{1, \dots, n\}$, the intersection $C \cap Z_i$ contains zero points or one point.

Proof. Consider an arbitrary $i \in \{1, \dots, n\}$ and the points $\{z_{i,j}\}_{j=1}^{7}$ associated with $D_i$. Since for any $j, k \in \{1, \dots, 7\}$, $j \neq k$, the satisfying assignments $S_{i,j}$ and $S_{i,k}$ are different, there exists $l \in \{1, \dots, d\}$ such that the values of the variable $x_l \in \{x_{i,1}, x_{i,2}, x_{i,3}\}$ in $S_{i,j}$ and $S_{i,k}$ are opposite. Hence, the $l$-th coordinates of $z_{i,j}$ and $z_{i,k}$ differ by 2 (one of these coordinates equals 0, and the other equals 2). Thus, the points $z_{i,j}$ and $z_{i,k}$ cannot belong to the same unit cube.
Lemma 2. In the above notation, a formula $F$ is satisfiable if and only if there exists a unit cube $C \subset \mathbb{Z}^d$ such that $|C \cap P| = n$.
Proof. Let us first prove that if $F$ is satisfiable, then a cube $C$ with $|C \cap P| = n$ exists. Let $S = (s_1, \dots, s_d)$ be a satisfying assignment for $F$. We construct a subset $\tilde{P} \subset P$ consisting of the points that correspond to the satisfying assignments $S_{i,j}$ matching the satisfying assignment $S$. Since for each $i \in \{1, \dots, n\}$ there exists exactly one satisfying assignment $S_{i,j} \in S_i$ that matches $S$, we have $|\tilde{P}| = n$. Let $z = (z_1, \dots, z_d)$ be an arbitrary point in $\tilde{P}$ and $l \in \{1, \dots, d\}$. If $x_l$ does not occur in the respective clause, the value of $z_l$ is equal to 1. Otherwise, the value of $z_l$ is equal to $2 \cdot s_l$. This means that if $s_l = 0$, the value of $z_l$ lies in the interval $[0, 1]$, and otherwise in the interval $[1, 2]$. Thus, the cube $C = [a_1, a_1 + 1] \times \dots \times [a_d, a_d + 1]$, where $a_l = s_l$ for $l \in \{1, \dots, d\}$, covers the $n$-element set $\tilde{P} \subset P$. Note that $\tilde{P}$ contains exactly one point corresponding to each clause, so according to Lemma 1, the cube $C$ has no common points with $P \setminus \tilde{P}$. Thus $|C \cap P| = n$.
Now we prove that if a unit cube with $|C \cap P| = n$ exists, then $F$ is satisfiable. Let $C$ be the specified unit cube. By Lemma 1, we conclude that $C \cap P$ contains exactly one point corresponding to each clause. Since each edge length of $C$ is equal to 1 and the cube vertex coordinates are integers, the list of $l$-th coordinates of the points from $C \cap P$ (for fixed $l \in \{1, \dots, d\}$) contains exactly one value from the set $\{0, 2\}$, and we denote this value by $2 \cdot s_l$. From the procedure of construction of the set $P$, we conclude that $S = (s_1, \dots, s_d)$ is a satisfying assignment for $F$.
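The equivalence stated in Lemma 2 can be checked exhaustively on tiny formulas. The sketch below is an illustration under assumptions that are not part of the paper: clauses are encoded as triples of nonzero integers (`k` for $x_k$, `-k` for its negation), all function names are hypothetical, and points produced by different clauses are kept as separate list entries, so that each clause contributes its own point to the count. Since all coordinates lie in $\{0, 1, 2\}$, only the $2^d$ unit cubes with lower corners in $\{0, 1\}^d$ need to be examined.

```python
from itertools import product


def is_satisfiable(clauses, d):
    """Exhaustive search over all 2^d assignments (tiny examples only)."""
    return any(
        all(any((lit > 0) == bool(s[abs(lit) - 1]) for lit in clause)
            for clause in clauses)
        for s in product((0, 1), repeat=d)
    )


def build_points(clauses, d):
    """One point per (clause, satisfying local assignment), kept as a list."""
    points = []
    for clause in clauses:
        for values in product((0, 1), repeat=3):
            if any((lit > 0) == bool(v) for lit, v in zip(clause, values)):
                z = [1] * d
                for lit, v in zip(clause, values):
                    z[abs(lit) - 1] = 2 * v
                points.append(tuple(z))
    return points


def max_in_unit_cube(points, d):
    """Largest number of points enclosed by a closed unit cube [a, a + 1]^d."""
    return max(
        sum(all(a <= x <= a + 1 for x, a in zip(p, lower)) for p in points)
        for lower in product((0, 1), repeat=d)
    )


# A satisfiable formula with n = 3 clauses over d = 4 variables.
sat_formula = [(1, 2, 3), (-1, 2, 4), (-2, -3, -4)]
# All 8 sign patterns over x1, x2, x3: an unsatisfiable formula with n = 8.
unsat_formula = [tuple(s * v for s, v in zip(signs, (1, 2, 3)))
                 for signs in product((1, -1), repeat=3)]

sat_result = (is_satisfiable(sat_formula, 4),
              max_in_unit_cube(build_points(sat_formula, 4), 4))
unsat_result = (is_satisfiable(unsat_formula, 3),
                max_in_unit_cube(build_points(unsat_formula, 3), 3))
```

For the first formula, some unit cube encloses $3 = n$ points, while for the second no unit cube encloses more than 7 of the 8 points that would be required.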
Lemmas 1 and 2 directly imply the following assertion.
Lemma 3. In the above notation, a formula $F$ is satisfiable if and only if there exists a unit $m$-cube for $m = n$ and the set $P$.
To complete the proof of Theorem 2, we consider the problem of the existence of an integer m-box (with m equal to n) in d-dimensional space for a box with all edge lengths equal to 1 (i.e., for the unit cube) and the constructed set P. Lemma 3 states that F is satisfiable if and only if there exists a unit cube that encloses n points.This statement in combination with the fact that set P can be constructed in time polynomial in n, d completes the proof of the theorem.
Since the class of NP-complete problems is the intersection of the class NP and the class of NP-hard problems, Theorems 1 and 2 immediately lead to the following theorem.
Theorem 3. The problem of the existence of an integer $m$-box is NP-complete.
Now we are ready to prove the main theorem.
Theorem 4. The problem of optimal integer box positioning is NP-hard.
Proof. This theorem is a trivial corollary of Theorem 3. Consider a set $P = \{p_i\}_{i=1}^{n} \subset \mathbb{Z}^d$. Then, finding the optimal position of an integer box $B$ with edge lengths $\delta_1, \delta_2, \dots, \delta_d$ immediately leads to an answer to the problem of the existence of an integer $m$-box (by simply counting the number of points in the found box in $O(nd)$ operations and comparing it with $m$), which is proved to be NP-complete. Thus, we have obtained a polynomial reduction of the problem of the existence of an integer $m$-box to the problem of optimal integer box positioning.
Note that the above proofs actually lead to stronger results, namely to the NP-completeness of the problem of the existence of an integer unit $m$-cube and the NP-hardness of the problem of optimal integer unit cube positioning.
Corollary 1. The problem of optimal integer box positioning with a set of prohibited points $P^-$ (i.e., the box should have an empty intersection with it) is NP-hard.
Proof. This statement immediately follows from the NP-hardness of the problem of optimal integer box positioning since the latter is a particular case of the considered problem with $P^- = \emptyset$.
Corollary 2. The weighted problem of optimal integer box positioning with the range of the weight function in $\mathbb{R} \cup \{-\infty\}$ is NP-hard.
Proof. This is also a corollary of the NP-hardness of the problem of optimal integer box positioning since we obtain the unweighted version of the problem by setting the weight function to $+1$ for all points.

Conclusions
The problem of optimal box positioning finds its applications in computer science, pattern recognition, and data analysis [1][2][3][4]. In this paper, we have proved that this problem is NP-hard.
On the one hand, this result means that algorithms based on optimal box positioning are in general inefficient for the analysis of high-dimensional data, so it makes sense to develop algorithms that look for an approximately optimal box position. An example of such an algorithm used for data clustering can be found in [4].
On the other hand, NP-hardness does not necessarily imply average-case hardness. For example, the canonical NP-complete problem of CNF satisfiability (the one used in the proof of Cook's theorem about the existence of NP-complete problems, [12]) can be solved using an algorithm with polynomial average time [13]. Thus, the problem of estimating the average complexity of finding an optimal box position remains an interesting open challenge.