Abstract
We consider the problem of finding a position of a $d$-dimensional box with given edge lengths that maximizes the number of enclosed points of a given finite set $P \subset \mathbb{R}^d$, i.e., the problem of optimal box positioning. We prove that while this problem is polynomial for fixed values of $d$, it is NP-hard in the general case. The proof is based on a polynomial reduction of the 3-CNF satisfiability problem to the considered problem.
1. Introduction
We consider the problem of optimal box positioning, that is, finding a position of a $d$-dimensional box with given edge lengths that maximizes the number of enclosed points of a given $n$-element set $P \subset \mathbb{R}^d$. In this paper, we prove that this problem is NP-hard when the integers $n$ and $d$ are not fixed and are treated as parameters of the problem.
The problem of optimal box positioning has wide applications in computational geometry, data mining, and pattern recognition (e.g., see [1,2,3]). In [4], the authors presented a clustering approach based on a greedy algorithm that finds an approximate solution to the optimal box positioning problem. The algorithm was inspired by the apparatus of maximum interval pattern concepts (see, e.g., [4,5]), a technique that allows one to select patterns from fuzzy contexts. This approach was successfully applied to a dataset of tactile images registered by the Medical Tactile Endosurgical Complex [6,7,8], which allows intraoperative tactile examination of tissues. A comparison of the proposed clustering approach with conventional k-means clustering showed a statistically significant advantage of the proposed method in clustering quality. Note that the result proved in the present paper justifies developing algorithms that solve an approximate version of the optimal box positioning problem rather than the exact one.
The rest of the paper is organized as follows. In Section 2, we describe some known results. In Section 3, we introduce formal definitions and formulate the problem of optimal integer box positioning and the auxiliary problem of the existence of an integer m-box. In Section 4, we prove the NP-hardness of the problem of optimal box positioning. In Section 5, we summarize the results.
2. Previous Results
Although, to the best of our knowledge, no proof of the NP-hardness of the optimal box positioning problem is available so far, some results are known for related problems. For example, in [3], Eckstein et al. considered a generalization of the problem of optimal box positioning: given two finite sets $P^+, P^- \subset \mathbb{R}^d$, find a box $B$ with arbitrary edge lengths such that
- $B$ does not intersect with $P^-$, and
- the cardinality of $B \cap P^+$ is maximal over all boxes that satisfy the first condition.
The authors proved the NP-hardness of the problem by applying a polynomial reduction of the classical NP-hard problem of finding a maximum independent set of vertices in a graph (e.g., see [9]) to the considered problem.
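Barbay et al. [10] considered a weighted generalization of the problem: given a finite set $P \subset \mathbb{R}^d$ and a weight function $w\colon P \to \mathbb{R}$, find a box $B$ with arbitrary edge lengths that maximizes the sum $\sum_{p \in B \cap P} w(p)$. This problem is also NP-hard since it generalizes the previous problem: it suffices to take $P = P^+ \cup P^-$ and to assign weight $1$ to the points of $P^+$ and a sufficiently large negative weight, e.g., $-(|P^+| + 1)$, to the points of $P^-$.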
De Figueiredo and da Fonseca [11] considered the weighted problem of optimal unit ball positioning with a non-negative weight function. They obtained a lower bound under an additional restriction: an algorithm decides which operation to apply to a point based only on the coordinates of the point, ignoring its weight, so it processes the input points in an order that does not depend on the weight function. Under this restriction, an algorithm must calculate the weight of every ball that is optimal for some weight function. Note that this restriction is not met in the unweighted version of the problem, since an algorithm for that version may use the fact that the weight of each point is equal to $1$. Also note that a cube is a ball in the $\ell_\infty$ metric: a cube with edge length $a$ is an $\ell_\infty$-ball of radius $a/2$.
3. Formal Definitions
Definition 1.
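A $d$-dimensional box with edge lengths $a_1, \ldots, a_d > 0$ is a Cartesian product of the closed intervals $[x_1, x_1 + a_1] \times \cdots \times [x_d, x_d + a_d]$, where $x_i \in \mathbb{R}$ ($i = 1, \ldots, d$).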
In what follows, we consider only boxes with integer edge lengths and integer vertex coordinates, i.e., $a_i \in \mathbb{N}$ and $x_i \in \mathbb{Z}$ for all $i$. We call such boxes integer boxes.
Definition 2.
The problem of optimal integer box positioning is defined as follows: find an integer box with given edge lengths $a_1, \ldots, a_d \in \mathbb{N}$ that maximizes the number of enclosed points of a set $P \subset \mathbb{Z}^d$, $|P| = n$.
In Section 4, we obtain the NP-hardness of the problem of optimal integer box positioning as a corollary of the theorem on the NP-completeness of the problem of the existence of an integer $m$-box.
Definition 3.
The problem of the existence of an integer $m$-box is the problem of deciding whether there exists an integer box with given edge lengths $a_1, \ldots, a_d \in \mathbb{N}$ that contains at least $m$ points of a set $P \subset \mathbb{Z}^d$, $|P| = n$.
In the general case, the parameters of both problems are the integers $n$, $d$, $a_1, \ldots, a_d$ and the set $P$. The number $m$ is considered either as a function of these parameters or as a constant.
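It is easy to see that both problems belong to the complexity class P if the parameter $d$ is fixed. Indeed, without loss of generality, we can consider only boxes for which each $x_i$ is equal to the $i$-th coordinate of some point from the set $P$: for a box that encloses at least one point, increasing $x_i$ to the minimum of the $i$-th coordinates of the enclosed points does not remove any point from the box. So to solve the problem, it suffices to count the number of enclosed points in at most $n^d$ boxes. Since each count can be performed in $O(nd)$ operations, the total number of operations for solving the problem is $O(d\,n^{d+1})$, which is polynomial in $n$ for fixed $d$.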
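The counting argument above can be illustrated by a short Python sketch. It is not code from the paper: the tuple representation of points, the description of a box by its lower corner $(x_1, \ldots, x_d)$ with closed intervals $[x_i, x_i + a_i]$, and the function names `count_enclosed` and `optimal_box_bruteforce` are our own assumptions.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

Point = Tuple[int, ...]


def count_enclosed(corner: Sequence[int], edges: Sequence[int],
                   points: Sequence[Point]) -> int:
    """Number of points p with corner[i] <= p[i] <= corner[i] + edges[i] for all i."""
    return sum(
        all(c <= x <= c + a for c, a, x in zip(corner, edges, p))
        for p in points
    )


def optimal_box_bruteforce(points: Sequence[Point], edges: Sequence[int]
                           ) -> Tuple[int, Optional[Point]]:
    """Exhaustive search over at most n**d candidate corners.

    The i-th corner coordinate is taken from the i-th coordinates of the
    input points; each count costs O(n*d), so the total running time is
    O(d * n**(d + 1)), which is polynomial in n when d is fixed.
    """
    if not points:
        return 0, None
    d = len(edges)
    candidates = [sorted({p[i] for p in points}) for i in range(d)]
    best_count, best_corner = -1, None
    for corner in product(*candidates):
        k = count_enclosed(corner, edges, points)
        if k > best_count:  # keep the first corner with the maximal count
            best_count, best_corner = k, corner
    return best_count, best_corner


pts = [(0, 0), (1, 1), (2, 2), (5, 5)]
print(optimal_box_bruteforce(pts, (2, 2)))  # (3, (0, 0))
```

The number of candidate corners grows as $n^d$, so this search is only practical when $d$ is small.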
Definition 4.
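The 3-CNF satisfiability problem is the problem of deciding whether there exists an assignment to the Boolean variables $x_1, \ldots, x_d$ that turns the formula $F = \bigwedge_{j=1}^{n} (\ell_{j1} \vee \ell_{j2} \vee \ell_{j3})$ in conjunctive normal form into 1 (here, $\ell_{jk}$ denote literals over variables from the set $\{x_1, \ldots, x_d\}$). For further details, see, e.g., [9].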
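To experiment with the reduction below, a 3-CNF formula can be stored as a list of clauses whose literals are signed integers ($k$ for $x_k$, $-k$ for its negation), following the DIMACS CNF convention. The exhaustive checker `brute_force_sat` is a hypothetical reference implementation for tiny inputs only; it tries all $2^d$ assignments.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

# Literal k > 0 stands for x_k, literal -k for the negation of x_k.
Clause = Tuple[int, int, int]


def literal_value(lit: int, assignment: Sequence[int]) -> int:
    """Value (0 or 1) of a literal; assignment[0] is the value of x_1."""
    v = assignment[abs(lit) - 1]
    return v if lit > 0 else 1 - v


def evaluate(clauses: Sequence[Clause], assignment: Sequence[int]) -> int:
    """Value of F: the conjunction of the disjunctions of the literals."""
    return int(all(any(literal_value(lit, assignment) for lit in c)
                   for c in clauses))


def brute_force_sat(clauses: Sequence[Clause], d: int) -> Optional[Tuple[int, ...]]:
    """Return some assignment of x_1..x_d that turns F into 1, or None."""
    for assignment in product((0, 1), repeat=d):
        if evaluate(clauses, assignment):
            return assignment
    return None


F = [(1, 2, 3), (-1, -2, -3)]
print(brute_force_sat(F, 3))  # (0, 0, 1)
```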
Without loss of generality, we assume that the variables of every disjunctive clause are distinct. Indeed, otherwise a clause is either identically equal to 1 (if it contains both a variable and its negation), in which case it can be removed, or it can be replaced with at most four clauses with the required property whose conjunction is equivalent to the initial clause. For example, $x_1 \vee x_1 \vee x_2$ is equivalent to $(x_1 \vee x_2 \vee x_3) \wedge (x_1 \vee x_2 \vee \neg x_3)$.
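The normalization of clauses with repeated variables can be sketched as follows, using a signed-integer encoding of literals ($k$ for $x_k$, $-k$ for its negation), which is our assumption. The hypothetical helper `normalize_clause` drops tautological clauses and pads a clause with $k < 3$ distinct literals by $3 - k$ unused variables in all $2^{3-k}$ sign combinations, which yields at most four clauses. Taking the smallest unused indices for padding is an arbitrary choice, and $d \ge 3$ is assumed.

```python
from itertools import product
from typing import List, Sequence, Tuple


def normalize_clause(clause: Sequence[int], d: int) -> List[Tuple[int, ...]]:
    """Rewrite a 3-literal clause as clauses over three distinct variables.

    Returns [] if the clause is a tautology (contains x and not x). Otherwise
    the conjunction of the returned clauses (at most four) is equivalent to
    the input clause. Assumes the variables are x_1..x_d with d >= 3.
    """
    lits = sorted(set(clause), key=abs)  # remove repeated literals
    if any(-lit in lits for lit in lits):
        return []  # identically equal to 1, so the clause can be dropped
    used = {abs(lit) for lit in lits}
    # Pad with variables that do not occur in the clause yet.
    pad = [v for v in range(1, d + 1) if v not in used][: 3 - len(lits)]
    # C is equivalent to the AND over all sign patterns of (C or +-y_1 or ...).
    return [
        tuple(lits) + tuple(s * v for s, v in zip(signs, pad))
        for signs in product((1, -1), repeat=len(pad))
    ]


print(normalize_clause((1, 1, 2), 3))  # [(1, 2, 3), (1, 2, -3)]
```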
Cook’s theorem [12] states that the 3-CNF satisfiability problem is NP-complete. This fact is the basis of our proof of the NP-hardness of the problem of the existence of an integer $m$-box.
4. NP-Hardness of the Problem of Optimal Box Positioning
Theorem 1.
The problem of the existence of an integer m-box belongs to the NP complexity class.
Proof.
Suppose we have a certificate: an integer box $B$ with the given edge lengths that encloses at least $m$ points from the set $P$. Then the certificate can be validated by computing the cardinality of $B \cap P$, which can be done by iterating over the set $P$ and checking whether the current point lies in the box $B$. Since $P$ contains $n$ elements and each check can be done with $2d$ comparisons, computing the cardinality of $B \cap P$ takes $O(nd)$ operations, which is polynomial in the parameters $n$ and $d$. □
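The certificate check from the proof of Theorem 1 amounts to a single pass over $P$. In the sketch below (not the authors' code), the certificate is assumed to be the integer lower corner of the box $B$, and `verify_m_box` is a hypothetical name.

```python
from typing import Sequence, Tuple

Point = Tuple[int, ...]


def verify_m_box(corner: Sequence[int], edges: Sequence[int],
                 points: Sequence[Point], m: int) -> bool:
    """Polynomial-time check of a certificate for the integer m-box problem.

    The certificate is the lower corner of the box, which is the product of
    the closed intervals [corner[i], corner[i] + edges[i]].
    """
    # The certificate must describe an integer box of the right dimension.
    if len(corner) != len(edges) or not all(isinstance(c, int) for c in corner):
        return False
    enclosed = 0
    for p in points:                      # n iterations
        if all(c <= x <= c + a            # 2*d comparisons per point
               for c, a, x in zip(corner, edges, p)):
            enclosed += 1
    return enclosed >= m                  # O(n*d) operations in total


pts = [(0, 0), (1, 1), (3, 3)]
print(verify_m_box((0, 0), (1, 1), pts, 2))  # True
```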
Theorem 2.
The problem of the existence of an integer m-box is NP-hard.
Proof.
We prove this theorem by constructing a polynomial reduction of the 3-CNF satisfiability problem (which is NP-hard [12]) to the problem of the existence of an integer $m$-box. Consider an arbitrary formula $F$ in conjunctive normal form with $d$ variables and $n$ disjunctive clauses $D_1, \ldots, D_n$, each containing exactly 3 literals: $F = D_1 \wedge \cdots \wedge D_n$, where $D_j = \ell_{j1} \vee \ell_{j2} \vee \ell_{j3}$ and each $\ell_{jk}$ denotes a literal over one of the variables $x_1, \ldots, x_d$.
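We construct the set $P \subset \mathbb{Z}^d$ by the following procedure. Consider the disjunctive clause $D_j$ with variables $x_{i_1}$, $x_{i_2}$, $x_{i_3}$ and the set $S_j$ of its satisfying assignments over the variable set $\{x_{i_1}, x_{i_2}, x_{i_3}\}$. Since each disjunctive clause contains exactly 3 literals corresponding to distinct variables, it holds that $|S_j| = 2^3 - 1 = 7$. We map each pair $(D_j, \sigma)$, $\sigma \in S_j$, to the point $p = (p_1, \ldots, p_d)$ with coordinates defined by the following rule:
$$p_l = \begin{cases} 2\sigma(x_l), & l \in \{i_1, i_2, i_3\}, \\ 1, & \text{otherwise}, \end{cases} \qquad l = 1, \ldots, d.$$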
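We define the set $P$ as the image of this map over all clauses $D_1, \ldots, D_n$ and their sets of satisfying assignments $S_1, \ldots, S_n$, so $|P| = 7n$; here, points obtained from different pairs are counted separately, i.e., if two pairs are mapped to the same point (which may happen for two clauses over the same three variables), $P$ contains this point with multiplicity. For further convenience, we also introduce the sets $P_j$, $j = 1, \ldots, n$, as the subsets of $P$ that consist of all points associated with the clause $D_j$.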
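The construction of $P$ can be sketched in Python, encoding literals as signed integers ($k$ for $x_k$, $-k$ for its negation), which is an assumption of this illustration. The hypothetical function `build_point_set` returns the groups $P_j$ keyed by the clause index $j$ (0-based in the code), so points that coincide for different clauses are kept as separate entries.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Clause = Tuple[int, int, int]  # signed literals: k -> x_k, -k -> not x_k
Point = Tuple[int, ...]


def build_point_set(clauses: Sequence[Clause], d: int) -> Dict[int, List[Point]]:
    """Return P_j for every clause index j (P is their union with multiplicity).

    For each satisfying assignment sigma of D_j over its three variables, the
    point has coordinate 2*sigma(x_l) if x_l occurs in D_j and 1 otherwise.
    """
    groups: Dict[int, List[Point]] = {}
    for j, clause in enumerate(clauses):
        variables = [abs(lit) for lit in clause]
        assert len(set(variables)) == 3, "variables of a clause must be distinct"
        groups[j] = []
        for values in product((0, 1), repeat=3):
            sigma = dict(zip(variables, values))
            # sigma satisfies the clause iff some literal evaluates to 1
            if any(sigma[abs(lit)] == (1 if lit > 0 else 0) for lit in clause):
                coords = [1] * d
                for var, val in sigma.items():
                    coords[var - 1] = 2 * val
                groups[j].append(tuple(coords))
    return groups


groups = build_point_set([(1, 2, -3)], 4)
print(len(groups[0]))  # 7
```

The construction writes $7n$ points with $d$ coordinates each, so it runs in time polynomial in $n$ and $d$.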
To complete the proof of the theorem, we prove the following lemmas.
Lemma 1.
In the above notation, for an arbitrary unit cube $C \subset \mathbb{R}^d$ and for all $j = 1, \ldots, n$, the intersection $C \cap P_j$ contains at most one point.
Proof.
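Consider an arbitrary $j$ and two points $p, p' \in P_j$ associated with pairs $(D_j, \sigma)$ and $(D_j, \sigma')$, where $\sigma \neq \sigma'$. Since the satisfying assignments $\sigma, \sigma' \in S_j$ are different, there exists a variable $x_l$ of the clause $D_j$ whose values in $\sigma$ and $\sigma'$ are opposite. Hence, the $l$-th coordinates of $p$ and $p'$ differ by 2 (one of these coordinates equals 0, and the other equals 2). Since any two points of a unit cube differ by at most 1 in each coordinate, the points $p$ and $p'$ cannot belong to the same unit cube. □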
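Lemma 1 can be sanity-checked numerically: two points fit into a common unit cube if and only if their $\ell_\infty$ distance is at most 1, so it suffices to confirm that distinct points of each $P_j$ are at $\ell_\infty$ distance 2. The helpers below are hypothetical and assume the signed-integer encoding of literals ($k$ for $x_k$, $-k$ for its negation); a check on examples does not replace the proof.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def clause_points(clause: Sequence[int], d: int) -> List[Point]:
    """P_j: coordinate 2*sigma(x_l) on the clause's variables and 1 elsewhere."""
    variables = [abs(lit) for lit in clause]
    points = []
    for values in product((0, 1), repeat=3):
        sigma = dict(zip(variables, values))
        if any(sigma[abs(lit)] == (1 if lit > 0 else 0) for lit in clause):
            coords = [1] * d
            for var, val in sigma.items():
                coords[var - 1] = 2 * val
            points.append(tuple(coords))
    return points


def chebyshev(p: Point, q: Point) -> int:
    """l_inf distance; two points fit into a common unit cube iff it is <= 1."""
    return max(abs(a - b) for a, b in zip(p, q))


def min_pairwise_distance(points: Sequence[Point]) -> int:
    """Smallest l_inf distance between two entries of the list."""
    return min(chebyshev(p, q) for p, q in combinations(points, 2))


print(min_pairwise_distance(clause_points((1, -2, 4), 5)))  # 2
```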
Lemma 2.
In the above notation, a formula $F$ is satisfiable if and only if there exists an integer unit cube $C$ such that $|C \cap P| = n$.
Proof.
Let us first prove that if $F$ is satisfiable, then a cube $C$ with $|C \cap P| = n$ exists. Let $S = (s_1, \ldots, s_d)$ be a satisfying assignment for $F$. We construct a subset $Q \subseteq P$ consisting of the points associated with the pairs $(D_j, \sigma)$ in which $\sigma$ matches $S$, i.e., $\sigma$ is the restriction of $S$ to the variables of $D_j$. Since $S$ satisfies every clause, for each $j$ there exists exactly one satisfying assignment $\sigma \in S_j$ that matches $S$, so $|Q| = n$. Let $p$ be an arbitrary point in $Q$ and $l \in \{1, \ldots, d\}$. If $x_l$ does not occur in the respective clause, then $p_l = 1$. Otherwise, $p_l = 2s_l$. This means that if $s_l = 0$, then $p_l$ lies in the interval $[0, 1]$, and otherwise in the interval $[1, 2]$. Thus, the integer unit cube $C = [c_1, c_1 + 1] \times \cdots \times [c_d, c_d + 1]$, where
$$c_l = s_l, \qquad l = 1, \ldots, d,$$
covers the $n$-element set $Q$. Note that $Q$ contains exactly one point corresponding to each clause, so according to Lemma 1, the cube $C$ has no common points with $P \setminus Q$. Thus, $|C \cap P| = n$.
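The first half of the proof of Lemma 2 can be traced on a small example. In this sketch (our own, with hypothetical helper names and literals encoded as signed integers), the cube is given by its lower corner $c$ with $c_l = s_l$, and $P$ is kept as a list with multiplicity. For the satisfying assignment $S = (1, 0, 0)$ the cube encloses exactly $n = 3$ points, while for the non-satisfying assignment $(0, 0, 0)$ it encloses fewer.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def clause_points(clause: Sequence[int], d: int) -> List[Point]:
    """P_j: coordinate 2*sigma(x_l) on the clause's variables and 1 elsewhere."""
    variables = [abs(lit) for lit in clause]
    points = []
    for values in product((0, 1), repeat=3):
        sigma = dict(zip(variables, values))
        if any(sigma[abs(lit)] == (1 if lit > 0 else 0) for lit in clause):
            coords = [1] * d
            for var, val in sigma.items():
                coords[var - 1] = 2 * val
            points.append(tuple(coords))
    return points


def points_of_formula(clauses: Sequence[Sequence[int]], d: int) -> List[Point]:
    """P with multiplicity: the concatenation of P_1, ..., P_n."""
    return [p for clause in clauses for p in clause_points(clause, d)]


def count_in_unit_cube(corner: Sequence[int], points: Sequence[Point]) -> int:
    """Number of points of P in C = [c_1, c_1 + 1] x ... x [c_d, c_d + 1]."""
    return sum(all(c <= x <= c + 1 for c, x in zip(corner, p)) for p in points)


F = [(1, 2, 3), (-1, -2, 3), (1, -2, -3)]
P = points_of_formula(F, 3)
S = (1, 0, 0)                       # satisfies every clause of F
print(count_in_unit_cube(S, P))     # 3
```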
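Now we prove that if an integer unit cube $C$ with $|C \cap P| = n$ exists, then $F$ is satisfiable. Let $C$ be such a cube. Since $|C \cap P| = n$ and, by Lemma 1, each $C \cap P_j$ contains at most one point, $C \cap P$ contains exactly one point corresponding to each clause. Since each edge of $C$ has length 1, no edge interval contains both 0 and 2; hence, for each fixed $l \in \{1, \ldots, d\}$, the $l$-th coordinates of the points from $C \cap P$ take at most one value from the set $\{0, 2\}$, and we denote this value by $2s_l$ (if no such value occurs, i.e., $p_l = 1$ for every point of $C \cap P$, we set $s_l = 0$). For each clause $D_j$, the point of $C \cap P_j$ is associated with a pair $(D_j, \sigma)$ with $\sigma \in S_j$, and by construction $\sigma(x_l) = p_l / 2 = s_l$ for every variable $x_l$ of $D_j$. Hence, $S = (s_1, \ldots, s_d)$ agrees with a satisfying assignment of every clause, i.e., $S$ is a satisfying assignment for $F$. □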
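The second half of the proof, which reads a satisfying assignment off an integer unit cube, can be sketched in the same hypothetical setting (literals encoded as signed integers, $P$ kept with multiplicity). Here `assignment_from_cube` sets $s_l = p_l / 2$ whenever some enclosed point has $p_l \in \{0, 2\}$ and, as an arbitrary default, $s_l = 0$ for variables whose coordinate is always 1.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def clause_points(clause: Sequence[int], d: int) -> List[Point]:
    """P_j: coordinate 2*sigma(x_l) on the clause's variables and 1 elsewhere."""
    variables = [abs(lit) for lit in clause]
    points = []
    for values in product((0, 1), repeat=3):
        sigma = dict(zip(variables, values))
        if any(sigma[abs(lit)] == (1 if lit > 0 else 0) for lit in clause):
            coords = [1] * d
            for var, val in sigma.items():
                coords[var - 1] = 2 * val
            points.append(tuple(coords))
    return points


def evaluate(clauses: Sequence[Sequence[int]], s: Sequence[int]) -> int:
    """Value of F under the assignment s (s[0] is the value of x_1)."""
    return int(all(any(s[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in c)
                   for c in clauses))


def assignment_from_cube(corner: Sequence[int],
                         points: Sequence[Point]) -> Tuple[int, ...]:
    """Read S off the points enclosed by C = [c_1, c_1 + 1] x ... x [c_d, c_d + 1]."""
    inside = [p for p in points
              if all(c <= x <= c + 1 for c, x in zip(corner, p))]
    s = []
    for l in range(len(corner)):
        # An interval of length 1 cannot contain both 0 and 2, so at most one
        # of these values occurs; if neither occurs, s_l is set to 0.
        values = {p[l] for p in inside} & {0, 2}
        s.append(values.pop() // 2 if values else 0)
    return tuple(s)


F = [(1, 2, 3), (-1, -2, 3), (1, -2, -3)]   # d = 4, x_4 does not occur
P = [p for clause in F for p in clause_points(clause, 4)]
S = assignment_from_cube((1, 0, 0, 1), P)
print(S, evaluate(F, S))  # (1, 0, 0, 0) 1
```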
Lemmas 1 and 2 directly imply the following assertion.
Lemma 3.
In the above notation, a formula $F$ is satisfiable if and only if there exists an integer unit $m$-cube for $m = n$ and the set $P$.
To complete the proof of Theorem 2, we consider the problem of the existence of an integer $m$-box with $m = n$ in $d$-dimensional space for a box with all edge lengths equal to 1 (i.e., for the unit cube) and the constructed set $P$. Lemma 3 states that $F$ is satisfiable if and only if there exists an integer unit cube that encloses at least $n$ points of $P$. This statement, combined with the fact that the set $P$ (at most $7n$ points with coordinates in $\{0, 1, 2\}$) can be constructed in time polynomial in $n$ and $d$, completes the proof of the theorem. □
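Lemma 3 can also be checked empirically on small random formulas by comparing an exhaustive satisfiability test with an exhaustive search for an integer unit cube containing $n$ points of $P$. This is only a sanity check of the construction, exponential in $d$, and not part of the reduction; the random generator, the signed-integer encoding of literals, and all function names are our assumptions. Since all coordinates of $P$ lie in $\{0, 1, 2\}$, it suffices to try lower corners in $\{0, 1\}^d$: a corner coordinate $-1$ or $2$ can be replaced by $0$ or $1$, respectively, without losing points, and other integer values enclose no points.

```python
import random
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]
Clause = Tuple[int, ...]


def clause_points(clause: Sequence[int], d: int) -> List[Point]:
    """P_j: coordinate 2*sigma(x_l) on the clause's variables and 1 elsewhere."""
    variables = [abs(lit) for lit in clause]
    points = []
    for values in product((0, 1), repeat=3):
        sigma = dict(zip(variables, values))
        if any(sigma[abs(lit)] == (1 if lit > 0 else 0) for lit in clause):
            coords = [1] * d
            for var, val in sigma.items():
                coords[var - 1] = 2 * val
            points.append(tuple(coords))
    return points


def satisfiable(clauses: Sequence[Clause], d: int) -> bool:
    """Exhaustive 3-CNF satisfiability check over all 2**d assignments."""
    return any(
        all(any(a[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)
            for clause in clauses)
        for a in product((0, 1), repeat=d)
    )


def has_unit_n_cube(clauses: Sequence[Clause], d: int) -> bool:
    """Is there an integer unit cube with at least n points of P (with multiplicity)?"""
    points = [p for clause in clauses for p in clause_points(clause, d)]
    n = len(clauses)
    return any(
        sum(all(c <= x <= c + 1 for c, x in zip(corner, p)) for p in points) >= n
        for corner in product((0, 1), repeat=d)  # corners in {0, 1}^d suffice
    )


def random_3cnf(n: int, d: int, rng: random.Random) -> List[Clause]:
    """n random clauses, each over three distinct variables among x_1..x_d."""
    return [
        tuple(v * rng.choice((1, -1)) for v in rng.sample(range(1, d + 1), 3))
        for _ in range(n)
    ]


rng = random.Random(0)
for _ in range(200):
    d, n = rng.randint(3, 5), rng.randint(1, 12)
    F = random_3cnf(n, d, rng)
    assert satisfiable(F, d) == has_unit_n_cube(F, d)
```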
Since the class of NP-complete problems is the intersection of the class NP and the class of NP-hard problems, Theorems 1 and 2 immediately yield the following theorem.
Theorem 3.
The problem of the existence of an integer m-box is NP-complete.
Now we are ready to prove the main theorem.
Theorem 4.
The problem of optimal integer box positioning is NP-hard.
Proof.
This theorem is a direct corollary of Theorem 3. Consider a set $P \subset \mathbb{Z}^d$, edge lengths $a_1, \ldots, a_d \in \mathbb{N}$, and a number $m$. Finding the optimal position of an integer box $B$ with edge lengths $a_1, \ldots, a_d$ immediately yields an answer to the problem of the existence of an integer $m$-box: it suffices to count the points of $P$ in the found box in $O(nd)$ operations and to compare this number with $m$. The latter problem is NP-complete by Theorem 3. Thus, we have constructed a polynomial reduction of the problem of the existence of an integer $m$-box to the problem of optimal integer box positioning. □
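The reduction in the proof of Theorem 4 uses an optimization solver as a black box. The sketch below makes this explicit: the hypothetical function `exists_m_box` calls a solver once and then counts enclosed points in $O(nd)$ time. The function `bruteforce_solver` is only an exponential-in-$d$ stand-in used for the example, and it assumes that $P$ is non-empty.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Point = Tuple[int, ...]
# Any solver for optimal integer box positioning: (points, edges) -> lower corner.
OptimalSolver = Callable[[Sequence[Point], Sequence[int]], Tuple[int, ...]]


def count_enclosed(corner: Sequence[int], edges: Sequence[int],
                   points: Sequence[Point]) -> int:
    """Number of points inside the box with the given lower corner (O(n*d))."""
    return sum(all(c <= x <= c + a for c, a, x in zip(corner, edges, p))
               for p in points)


def exists_m_box(points: Sequence[Point], edges: Sequence[int], m: int,
                 solve: OptimalSolver) -> bool:
    """Decide the integer m-box problem with one call to an optimization solver."""
    corner = solve(points, edges)
    return count_enclosed(corner, edges, points) >= m


def bruteforce_solver(points: Sequence[Point], edges: Sequence[int]) -> Tuple[int, ...]:
    """Stand-in optimal solver: exhaustive search over candidate corners."""
    candidates = [sorted({p[i] for p in points}) for i in range(len(edges))]
    return max(product(*candidates), key=lambda c: count_enclosed(c, edges, points))


pts = [(0, 0), (1, 1), (2, 2), (5, 5)]
print(exists_m_box(pts, (2, 2), 3, bruteforce_solver))  # True
```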
Note that the above proofs actually lead to stronger results, namely, to the NP-completeness of the problem of the existence of an integer unit $m$-cube and the NP-hardness of the problem of optimal integer unit cube positioning.
Corollary 1.
The problem of optimal integer box positioning with a set $N \subset \mathbb{Z}^d$ of prohibited points (i.e., the box must have an empty intersection with $N$) is NP-hard.
Proof.
This statement immediately follows from the NP-hardness of the problem of optimal integer box positioning, since the latter is a particular case of the considered problem with $N = \emptyset$. □
Corollary 2.
The weighted problem of optimal integer box positioning with the range of the weight function in $[0, +\infty)$ is NP-hard.
Proof.
This is also a corollary of the NP-hardness of the problem of optimal integer box positioning, since we obtain the unweighted version of the problem by setting $w(p) = 1$ for all $p \in P$. □
5. Conclusions
The problem of optimal box positioning finds its applications in computer science, pattern recognition, and data analysis [1,2,3,4]. In this paper, we have proved that this problem is NP-hard.
On the one hand, this result means that, unless P = NP, no algorithm solves the optimal box positioning problem exactly in time polynomial in both $n$ and $d$. Hence, exact algorithms based on optimal box positioning are in general impractical for the analysis of high-dimensional data, and it makes sense to develop algorithms that look for an approximately optimal box position. An example of such an algorithm used for data clustering can be found in [4].
On the other hand, NP-hardness does not necessarily imply average-case hardness. For example, the canonical NP-complete problem of CNF satisfiability (the one used in the proof of Cook’s theorem on the existence of NP-complete problems [12]) can be solved by an algorithm with polynomial average running time under a suitable probability distribution on input formulas [13]. Thus, estimating the average-case complexity of finding an optimal box position remains an interesting open problem.
Author Contributions
All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.
Funding
The research was supported by the Russian Science Foundation (project 16-11-00058 “The development of methods and algorithms for automated analysis of medical tactile information and classification of tactile images”).
Acknowledgments
The authors thank Vladimir V. Galatenko for valuable comments and discussions.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Agarwal, P.K.; Hagerup, T.; Ray, R.; Sharir, M.; Smid, M.H.M.; Welzl, E. Translating a planar object to maximize point containment. In Algorithms—ESA 2002; Möhring, R., Raman, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2002; pp. 42–53.
- Lamdan, Y.; Schwartz, J.T.; Wolfson, H.J. Object recognition by affine invariant matching. In Proceedings of the CVPR ’88: The Computer Society Conference on Computer Vision and Pattern Recognition, Ann Arbor, MI, USA, 5–9 June 1988; IEEE: Ann Arbor, MI, USA, 1988; pp. 335–344.
- Eckstein, J.; Hammer, P.L.; Liu, Y.; Nediak, M.; Simeone, B. The maximum box problem and its application to data analysis. Comput. Optim. Appl. 2002, 23, 285–298.
- Nersisyan, S.A.; Pankratieva, V.V.; Staroverov, V.M.; Podolskii, V.E. A greedy clustering algorithm based on interval pattern concepts and the problem of optimal box positioning. J. Appl. Math. 2017.
- Ganter, B.; Kuznetsov, S.O. Pattern Structures and Their Projections. In Conceptual Structures: Broadening the Base. ICCS 2001; Delugach, H.S., Stumme, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2001; pp. 129–142.
- Barmin, V.; Sadovnichy, V.; Sokolov, M.; Pikin, O.; Amiraliev, A. An original device for intraoperative detection of small indeterminate nodules. Eur. J. Cardiothorac. Surg. 2014, 46, 1027–1031.
- Solodova, R.F.; Galatenko, V.V.; Nakashidze, E.R.; Andreytsev, I.L.; Galatenko, A.V.; Senchik, D.K.; Staroverov, V.M.; Podolskii, V.E.; Sokolov, M.E.; Sadovnichy, V.A. Instrumental tactile diagnostics in robot-assisted surgery. Med. Dev. 2016, 9, 377–382.
- Solodova, R.F.; Galatenko, V.V.; Nakashidze, E.R.; Shapovalyants, S.G.; Andreytsev, I.L.; Sokolov, M.E.; Podolskii, V.E. Instrumental mechanoreceptoric palpation in gastrointestinal surgery. Minim. Invasive Surg. 2017.
- Garey, M.R.; Johnson, D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness; W.H. Freeman & Co.: New York, NY, USA, 1997.
- Barbay, J.; Chan, T.M.; Navarro, G.; Pérez-Lantero, P. Maximum-weight planar boxes in O(n^2) time (and better). Inf. Process. Lett. 2014, 114, 437–445.
- De Figueiredo, C.M.; da Fonseca, G.D. Enclosing weighted points with an almost-unit ball. Inf. Process. Lett. 2009, 109, 1216–1221.
- Cook, S. The complexity of theorem-proving procedures. In STOC ’71 Proceedings of the Third Annual ACM Symposium on Theory of Computing; ACM: New York, NY, USA, 1971; pp. 151–158.
- Iwama, K. CNF satisfiability test by counting and polynomial average time. SIAM J. Comput. 1989, 18, 385–391.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).