Abstract
Given a set of n points in the plane, the General Position Subset Selection problem is that of finding a maximum-size subset of points in general position, i.e., with no three points collinear. The problem is known to be hard computationally, and the best approximation ratio known is $\Omega(\mathrm{OPT}^{-1/2}) = \Omega(n^{-1/2})$. Here, we obtain better approximations in three special cases: (I) a constant-factor approximation for the case where the input set P consists of lattice points and is dense, which means that the ratio between the maximum and the minimum distance in P is of the order of $\Theta(\sqrt{n})$; (II) an $\Omega((\log n)^{-1/2})$-approximation for the case where the input set is the set of vertices of a generic n-line arrangement, i.e., one with $\Omega(n^2)$ vertices; and (III) an $\Omega((\log n)^{-1/2})$-approximation for the case where the input set has at most $O(\sqrt{n})$ points collinear and can be covered by $O(\sqrt{n})$ lines. The scenario in (I) is a special case of that in (II). Our approximations rely on probabilistic methods and results from incidence geometry.
1. Introduction
A set of points in the plane is said to be in general position if no three points are collinear. The problem of selecting a large subset of points from an $n \times n$ grid of integer points with no three points collinear (i.e., in general position) goes back more than 100 years; see, for example, [,] (Ch. 10). In particular, it is not known if one can always select $2n$ points from the grid so that no three points are collinear [].
A key quantity in the process of selecting a large subset from an input set of points is the number of collinear triples in the set. Payne and Wood [] obtained the following upper bound on this number. The proof relies on the classical point–line incidence bound due to Szemerédi and Trotter []; see also [] (Chap. 10) and [] for a modern approach to this topic. The special case in the lemma can also be found in [] or in [] (p. 313).
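The definition of general position can be checked directly. The following is a minimal brute-force sketch, not an algorithm from the paper; the function names `collinear` and `in_general_position` are chosen here for illustration, and exact arithmetic is assumed (integer or rational coordinates).

```python
from itertools import combinations

def collinear(a, b, c):
    """True if a, b, c lie on one line (exact for integer or Fraction coordinates)."""
    # The cross product of (b - a) and (c - a) vanishes exactly when the points are collinear.
    return (b[0] - a[0]) * (c[1] - a[1]) == (b[1] - a[1]) * (c[0] - a[0])

def in_general_position(points):
    """True if no three of the given distinct points are collinear; O(n^3) brute force."""
    return not any(collinear(a, b, c) for a, b, c in combinations(points, 3))

# Five points (x, x^2 mod 5): no three of them are collinear.
parabola = [(x, x * x % 5) for x in range(5)]
# The 3 x 3 grid contains full rows, so it is not in general position.
grid = [(x, y) for x in range(3) for y in range(3)]
```

Here `in_general_position(parabola)` holds while `in_general_position(grid)` does not; the cubic running time is acceptable only for small inputs.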
Lemma 1
([]). Let P be a set of n points in the plane with at most ℓ collinear. Then, the number of collinear triples in P is $T = O(n^2 \log \ell + \ell^2 n)$. In particular, if $\ell = O(\sqrt{n})$, then $T = O(n^2 \log n)$.
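To get a feel for the two quantities in the lemma, the number of collinear triples and the maximum number ℓ of collinear points can be computed for small integer point sets. The sketch below is an illustration only (the function name `collinear_stats` is hypothetical): for each point it groups the other points by normalized direction, so every collinear triple is counted once from each of its three points.

```python
from collections import Counter
from math import gcd

def collinear_stats(points):
    """Return (T, ell) for distinct integer points:
    T = number of collinear triples, ell = maximum number of collinear points."""
    total = 0
    ell = min(len(points), 2)
    for p in points:
        directions = Counter()
        for q in points:
            if q == p:
                continue
            dx, dy = q[0] - p[0], q[1] - p[1]
            g = gcd(dx, dy)                      # positive, since p != q
            dx, dy = dx // g, dy // g
            if dx < 0 or (dx == 0 and dy < 0):   # identify opposite directions
                dx, dy = -dx, -dy
            directions[(dx, dy)] += 1
        for count in directions.values():
            total += count * (count - 1) // 2    # triples on this line that contain p
            ell = max(ell, count + 1)
    return total // 3, ell                       # each triple was counted three times

grid3 = [(x, y) for x in range(3) for y in range(3)]
print(collinear_stats(grid3))  # (8, 3): three rows, three columns, two diagonals
```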
By applying the above lemma together with a lower bound on the independence number of a hypergraph obtained by Spencer [], Payne and Wood [] obtained the following result.
Theorem 1
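([]). Let P be a set of n points in the plane with at most ℓ collinear. Then, P contains a subset of $\Omega\left(n / \sqrt{n \log \ell + \ell^2}\right)$ points in general position. In particular, if $\ell = O(\sqrt{n})$, then P contains a subset of $\Omega\left(\sqrt{n / \log n}\right)$ points in general position.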
A cursory examination of the above lower bound shows that if the input set has a large subset of points in general position (of, say, nearly linear size), then the above guarantee is roughly a factor of smaller than the truth.
Given a set P of points in the plane, the General Position Subset Selection problem (GPSS, for short), is that of finding a maximum-size subset of points in general position. The problem is known to be NP-complete and APX-hard, and the current best approximation for the problem is a greedy algorithm due to Cao [] (Chap. 3), achieving a ratio of $\Omega(\mathrm{OPT}^{-1/2})$; here, $\mathrm{OPT}$ is the cardinality of an optimum solution to GPSS. See also [] (Chap. 9) and [] for further aspects of this problem and [] for basic notions in complexity and approximation. Throughout this paper, $\mathrm{ALG}$ denotes the cardinality of the solution to GPSS computed by an algorithm under consideration.
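For orientation, a simple greedy heuristic for GPSS scans the points and keeps each one that does not complete a collinear triple with two points already kept. The sketch below is an assumption about what such a greedy procedure may look like; it is not claimed to be the exact algorithm from the cited thesis, and the name `greedy_general_position` is hypothetical.

```python
from itertools import combinations

def collinear(a, b, c):
    """Exact collinearity test for integer or Fraction coordinates."""
    return (b[0] - a[0]) * (c[1] - a[1]) == (b[1] - a[1]) * (c[0] - a[0])

def greedy_general_position(points):
    """Return a maximal (not necessarily maximum) subset with no three points collinear."""
    chosen = []
    for p in points:
        # Keep p only if it lies on no line spanned by two points kept so far.
        if all(not collinear(a, b, p) for a, b in combinations(chosen, 2)):
            chosen.append(p)
    return chosen

grid = [(x, y) for x in range(3) for y in range(3)]
print(greedy_general_position(grid))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The output is maximal: every rejected point lies on a line through two selected points. It need not be optimal in general, which is why the approximation ratio matters.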
As in [], here, we follow the convention that the approximation ratio of an algorithm for a maximization problem is less than 1. Throughout this paper, all logarithms are in base 2.
- Our results.
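(I) For a set P of n points in the plane, consider the ratio (sometimes also called spread)
$$\Delta(P) = \frac{\max\{\mathrm{dist}(a,b) : a, b \in P,\ a \neq b\}}{\min\{\mathrm{dist}(a,b) : a, b \in P,\ a \neq b\}},$$
where $\mathrm{dist}(a,b)$ is the Euclidean distance between points a and b. Since the general position property is unaffected by scaling, we may assume, without loss of generality, that the minimum pairwise distance in P is 1. In this case, $\Delta(P)$ is the diameter of P. A standard disk packing argument shows that if $|P| = n$, then $\Delta(P) \geq \alpha_0 \sqrt{n}$, where
$$\alpha_0 = \sqrt{2\sqrt{3}/\pi} \approx 1.05,$$
provided that n is large enough; see [] (Prop. 4.10). On the other hand, a section of the integer lattice shows that this bound is tight up to a constant factor. An n-element point set P satisfying the condition $\Delta(P) \leq \alpha \sqrt{n}$, for some constant $\alpha \geq \alpha_0$, is said to be $\alpha$-dense; see, for instance, [] and Figure 1 for an illustration.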
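The spread of a small point set is easy to compute directly. The sketch below assumes that the spread is the ratio of the maximum to the minimum pairwise distance, as described in the abstract, and that a set of n points is α-dense when its spread is at most α√n; the helper names `spread` and `density` are introduced here for illustration only.

```python
from itertools import combinations
from math import dist, sqrt

def spread(points):
    """Ratio of the largest to the smallest pairwise Euclidean distance."""
    distances = [dist(a, b) for a, b in combinations(points, 2)]
    return max(distances) / min(distances)

def density(points):
    """Smallest alpha for which the set is alpha-dense, i.e. spread / sqrt(n)."""
    return spread(points) / sqrt(len(points))

# A 4 x 4 section of the integer lattice: spread 3*sqrt(2), n = 16.
grid = [(x, y) for x in range(4) for y in range(4)]
alpha = density(grid)  # 3*sqrt(2)/4, roughly 1.06
```

For a k × k grid section the value is √2·(k − 1)/k, which stays below √2 for every k, in line with the remark that lattice sections are dense.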
Figure 1.
Left: A 2-dense set of points in the grid (). Center: ; and ; points in may lie outside the grid, e.g., lies 2 units above it. Right: The approximation algorithm returns (or ); here, .
We obtain a constant-factor approximation for GPSS in dense lattice point sets; its approximation ratio depends on the spread ratio of the input.
Theorem 2.
Given an α-dense set of n lattice points, an -approximation for the General Position Subset Selection problem can be computed in polynomial time.
(II) According to a classical result of Beck [], if P is a set of n points in the plane with at most ℓ points collinear, then P determines $\Omega(n(n-\ell))$ distinct lines. In particular, if $\ell \leq cn$, where $c < 1$ is a constant, then P determines $\Omega(n^2)$ distinct lines. By duality (see, e.g., []), for a set L of n lines in the plane, where at most ℓ lines of L are concurrent, the number of vertices of the corresponding arrangement is $\Omega(n(n-\ell))$. We say that an arrangement of n lines is generic if it has $\Omega(n^2)$ vertices.
We obtain an $\Omega((\log n)^{-1/2})$-approximation for GPSS in the special case where the input set is the set of vertices of an n-line arrangement under the relatively mild genericity assumption.
Theorem 3.
Given n lines in the plane that make a generic line arrangement, an $\Omega((\log n)^{-1/2})$-approximation for the General Position Subset Selection problem for the set of vertices of the corresponding line arrangement can be computed using a randomized algorithm in expected polynomial time.
Essentially, the same proof (of this theorem in Section 3) yields the following.
Corollary 1.
Given n lines in the plane, let V denote the vertex set of the corresponding line arrangement. If $V' \subseteq V$ is a subset of vertices with $|V'| = \Omega(n^2)$, an $\Omega((\log n)^{-1/2})$-approximation for the General Position Subset Selection problem in $V'$ can be computed using a randomized algorithm in expected polynomial time.
It is worth noting that the scenario in Theorem 2 is a special case of the scenario in Theorem 3. Indeed, any finite set P of lattice points in the integer grid is a subset of the vertices of an arrangement of axis-parallel lines induced by H and V, the sets of horizontal and vertical lines incident to points in P, respectively. Moreover, if P is a dense set, then is generic, since
Thus, has vertices. Observe that not all vertices in the arrangement are necessarily in P.
(III) Given a set P of points in the plane, a line cover of P is a set of lines that cover all points in P. Computing a line cover of minimum size is APX-hard, and the best approximation ratio known is [].
We obtain an $\Omega((\log n)^{-1/2})$-approximation for GPSS in the special case where the input set has at most $O(\sqrt{n})$ points collinear and can be covered by $O(\sqrt{n})$ lines.
Theorem 4.
Given a set P of n points in the plane with at most $O(\sqrt{n})$ collinear and with a line cover of size $O(\sqrt{n})$, an $\Omega((\log n)^{-1/2})$-approximation for the General Position Subset Selection problem can be computed using a randomized algorithm in expected polynomial time.
2. Subset Selection in Dense Lattice Point Sets
In this section, we prove Theorem 2. First, we introduce the setup.
For a positive integer m, the grid is the set of points in the plane . We have . Following [], let denote the minimum number of colors in a coloring of the points in such that no three collinear points are monochromatic. Since no three points in a single row or column can receive the same color, we have . It was shown by Wood [] that for any , for every . Among other facts, this is based on the fact that for any , there exists a prime between m and , for every . A non-asymptotic result, more convenient for small m, is that there exists a prime between m and , for every []. (And by the well-known Bertrand–Chebyshev theorem, there exists a prime between m and , for every m.)
We briefly recall Wood’s argument [] relying on a construction of Erdős []. Let be a prime; in practice we may pick the smallest prime at least m or a prime at least m that is close to m. For any integer i, let
Note that , and that may not be contained in ; refer to Figure 1. A Vandermonde determinant calculation shows that is in general position for every . Moreover, whenever . Each point is in , where . Since
the union of the sets over consists of color classes, where each class is in general position. That is, can be covered by m-element point sets in general position. As mentioned in the previous paragraph, for every , or for every , as needed.
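The flavor of this construction can be checked on a small prime. The sketch below uses one common form of the Erdős-type construction, namely the classes {(x, (x² + i) mod p) : 0 ≤ x < p} for i = 0, …, p − 1; this exact indexing is an assumption made here for illustration and may differ from the definition used above, and the function names are hypothetical.

```python
from itertools import combinations

def parabola_class(p, i):
    """Points (x, (x^2 + i) mod p) for x = 0..p-1, where p is a prime."""
    return [(x, (x * x + i) % p) for x in range(p)]

def in_general_position(points):
    """Brute-force check that no three points are collinear."""
    return not any(
        (b[0] - a[0]) * (c[1] - a[1]) == (b[1] - a[1]) * (c[0] - a[0])
        for a, b, c in combinations(points, 3)
    )

p = 7
classes = [parabola_class(p, i) for i in range(p)]
covered = sorted(pt for cls in classes for pt in cls)
full_grid = sorted((x, y) for x in range(p) for y in range(p))
```

For p = 7, each of the seven classes is in general position and together they cover every point of the 7 × 7 grid exactly once, so they form a proper coloring with p colors in the sense discussed above.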
Proof of Theorem 2.
Since P is -dense, we may assume that , where , after a suitable translation with an integer vector. The algorithm chooses a prime close to m. Recall that primality testing can be realized in polynomial time, that is, in time; see, e.g., [] or [] (Ch. 14.6). By the Prime Number Theorem, finding a prime close to m involves the primality testing of odd integers in the interval [].
As argued above, is covered by m-element point sets in general position. By monotonicity (i.e., every subset of a set in general position is likewise in general position), the sets are in general position, and the algorithm outputs the largest one, with at least elements.
Since each of the m rows of contains at most two elements of an optimal solution, we have . Consequently, the ratio is bounded from below as
The approximation ratio in the formula is roughly , for a small . Clearly, the resulting algorithm runs in polynomial time: choosing the prime p and generating the sets takes time. Selecting one with a maximal size can be achieved within the same time, and so the overall running time is . □
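The steps of the proof can be sketched in code as follows. This is an illustration under stated assumptions rather than the author's implementation: the input is translated so that its bounding box starts at the origin, m is taken to be the side length of that box, the prime is the smallest prime at least m found by trial division, and the color of a point (x, y) is (y − x²) mod p. All function names are hypothetical.

```python
from collections import defaultdict

def is_prime(k):
    """Trial division; adequate for the small values used here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def next_prime_at_least(m):
    while not is_prime(m):
        m += 1
    return m

def largest_color_class(points):
    """Partition lattice points into classes in general position; return the largest."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    shifted = [(x - min_x, y - min_y) for x, y in points]
    m = max(max(x, y) for x, y in shifted) + 1   # points now lie in {0..m-1}^2
    p = next_prime_at_least(max(m, 2))
    classes = defaultdict(list)
    for (x, y), original in zip(shifted, points):
        classes[(y - x * x) % p].append(original)
    return max(classes.values(), key=len)

grid = [(x, y) for x in range(4) for y in range(4)]
best = largest_color_class(grid)
```

On the 4 × 4 grid the prime is 5, so the 16 points fall into at most five classes and the largest class has four points, against the upper bound of two points per row, i.e., eight, for an optimal solution.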
3. Subset Selection in Generic Line Arrangements
In this section, we prove Theorem 3. Recall that an arrangement of n lines is generic if it has $\Omega(n^2)$ vertices. See Figure 2.
Figure 2.
Left: A generic line arrangement consisting of three bundles of nearly parallel lines each. Right: A non-generic line arrangement consisting of 3 parallel lines and concurrent lines.
Lemma 2.
Let V be the set of vertices of an n line arrangement , where . Then, .
Proof.
We distinguish two cases. Let be a set of collinear vertices, say on a line h. If , then obviously . If , scan h, say, from left to right, and observe that each vertex in uses up at least two “new” lines from L; thus, . See Figure 3. □
Figure 3.
Collinear vertices in a line arrangement, incident to an induced dotted line.
Let V be the set of vertices of a generic n line arrangement . Let ; we also have by the assumption. By Lemma 2, ; thus, by Lemma 1, the number of collinear triples is bounded as
Proof of Theorem 3.
The subset selection algorithm consists of two steps: (i) random sampling from V and (ii) the application of the deletion method. See, e.g., [] (Chap. 3) and a similar application in []. In the first step, a random subset is chosen by selecting points independently with probability , for a suitable k to be determined. Note that .
To select a subset in general position, one needs to avoid one type of obstacle, collinear triples. One obstacle can be eliminated by deleting one point from each surviving collinear triple in the second step. In particular, it suffices to choose k so that
since this implies that the expected number of remaining points is at least . Let denote the constant hidden in (2). With the previous upper bound on T, and recalling that , it suffices to choose k so that
Setting for a sufficiently small satisfies the inequality, proving the lower bound on the size of the output. In addition, this setting is valid for ensuring that .
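The calculation behind this choice of k can be written out explicitly. The following is a reconstruction under stated assumptions and not a copy of the displayed formulas: each vertex is sampled independently with probability p = k/N, where N = |V|; T denotes the number of collinear triples in V, with T ≤ c n⁴ log n for a constant c (this follows from Lemma 1 when at most n vertices are collinear and N ≤ n²); and N ≥ c′n² by genericity. The symbols p, c, c′, and c″ are introduced here for illustration.

```latex
\begin{align*}
\mathbb{E}[|X|] &= pN = k, \\
\mathbb{E}[\#\,\text{collinear triples in } X] &= p^3 T = \frac{k^3 T}{N^3}, \\
\mathbb{E}[\#\,\text{points left after deletion}] &\ge k - \frac{k^3 T}{N^3} \ge \frac{k}{2}
  \quad \text{whenever} \quad k^2 \le \frac{N^3}{2T}, \\
\frac{N^3}{2T} &\ge \frac{(c' n^2)^3}{2c\, n^4 \log n}
  = \frac{c'^3}{2c} \cdot \frac{n^2}{\log n}.
\end{align*}
```

Under these assumptions, any k = c″ n / √(log n) with c″² ≤ c′³/(2c) satisfies the condition, and k ≤ N holds once c″ ≤ c′.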
In summary, the expected number of points in general position found using the algorithm is . To analyze the approximation ratio, it suffices to notice that . Indeed, for each line , , at most two vertices in V can appear in any solution; thus, . It follows that the approximation ratio is
as claimed.
Since repeated random samplings are independent, the probability that the resulting randomized algorithm fails to find the required number of points can be made arbitrarily small by repetition, in standard fashion. Clearly, the resulting algorithm runs in expected polynomial time; we give a few details below.
Computing the set of vertices V and performing the random sampling takes time. The expected sample size is . Using the point–line duality, the algorithm computes the line arrangement dual to the point sample in time [] (Ch. 8). It repeatedly removes lines passing through vertices incident to at least three lines, each in time, and updates the line arrangement. This step is repeated until the arrangement is simple, and thus, the points left in the sample are in general position. The expected running time is . □
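A small end-to-end sketch of the two steps (random sampling, then deletion) is given below. It is a simplified stand-in and not the procedure described above: instead of maintaining the dual line arrangement, it scans all triples of the sample by brute force, which is slower but easier to verify. The line representation (a, b, c) for a·x + b·y = c, the function names, and the example input are all assumptions made for illustration.

```python
import random
from fractions import Fraction
from itertools import combinations

def arrangement_vertices(lines):
    """Intersection points of lines given as integer triples (a, b, c): a*x + b*y = c."""
    vertices = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if det != 0:                       # skip parallel pairs
            x = Fraction(c1 * b2 - c2 * b1, det)
            y = Fraction(a1 * c2 - a2 * c1, det)
            vertices.add((x, y))
    return vertices

def collinear(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) == (b[1] - a[1]) * (c[0] - a[0])

def sample_and_delete(points, k, rng):
    """Keep each point with probability k/N, then remove one point per surviving collinear triple."""
    pts = sorted(points)
    prob = min(1.0, k / len(pts))
    sample = [q for q in pts if rng.random() < prob]
    alive = set(sample)
    for a, b, c in combinations(sample, 3):
        if a in alive and b in alive and c in alive and collinear(a, b, c):
            alive.discard(c)
    return alive

# Hypothetical input: the 8 lines y = m*x + m^2, written as -m*x + y = m^2, for m = 0..7.
lines = [(-m, 1, m * m) for m in range(8)]
vertices = arrangement_vertices(lines)
subset = sample_and_delete(vertices, k=12, rng=random.Random(0))
```

Every collinear triple of the sample is inspected once, and a point is deleted whenever all three of its points are still present, so the returned set is always in general position regardless of the random choices.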
Proof of Corollary 1.
We perform random sampling from , let , and proceed as in the proof of Theorem 3. Inequality (3) is again satisfied due to the assumption . □
4. Another Application
In this section, we restrict ourselves to General Position Subset Selection in grid-like sets, that is, sets with at most $O(\sqrt{n})$ points collinear and with a line cover of size $O(\sqrt{n})$. The approximation is obtained with an approach similar to that in the proof of Theorem 3.
Proof of Theorem 4.
By the first assumption and Lemma 1, the number of collinear triples in P is
Since an optimal solution can contain at most two points of P from each line in an optimal line cover of P, we have . The subset selection algorithm consists of two steps: (i) random sampling from P and (ii) the application of the deletion method. In the first step, a random subset is chosen by selecting points independently with probability , for a suitable k to be determined. Note that .
As in the proof of Theorem 3, it suffices to choose k so that
Let denote the constant hidden in (4). Using the upper bound on T in (4), it suffices to choose k so that
Setting for a sufficiently small satisfies the inequality, proving the lower bound on the size of the output. In addition, this setting is valid for ensuring that .
The expected number of points in general position found using the algorithm is . By recalling that , it follows that the approximation ratio is
as claimed. □
5. Concluding Remarks
We conclude with some problems for further investigation:
- Can the approximation factor in Theorem 3 be improved? Perhaps using ideas from [] (Section 2)?
- Is there a constant-factor approximation for GPSS among the vertices of an n-line arrangement?
- Can a constant-factor approximation for GPSS on point sets satisfying and be obtained? Note that one can select points in general position from the grid of n points; see, e.g., refs. [] (Ch. 3.3) or [] (Ch. 10).
- Can better approximation factors for the general position subset selection problem be obtained in other interesting scenarios?
Funding
This research received no external funding.
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Conflicts of Interest
Author Adrian Dumitrescu was employed by the company Algoresearch L.L.C. The author declares no conflicts of interest.
References
- Dudeney, H. A puzzle with pawns. In Amusements in Mathematics; Nelson: Edinburgh, UK, 1917; p. 94.
- Braß, P.; Moser, W.; Pach, J. Research Problems in Discrete Geometry; Springer: New York, NY, USA, 2005.
- Payne, M.; Wood, D. On the general position subset selection problem. SIAM J. Discret. Math. 2013, 27, 1727–1733.
- Szemerédi, E.; Trotter, W.T. Extremal problems in discrete geometry. Combinatorica 1983, 3, 381–392.
- Pach, J.; Agarwal, P. Combinatorial Geometry; Wiley-Interscience: New York, NY, USA, 1995.
- Székely, L. Crossing numbers and hard Erdős problems in discrete geometry. Comb. Probab. Comput. 1997, 6, 353–358.
- Lefmann, H. Distributions of points in the unit square and large k-gons. Eur. J. Comb. 2008, 29, 946–965.
- Tao, T.; Vu, V. Additive Combinatorics; Cambridge Studies in Advanced Mathematics, 105; Cambridge University Press: Cambridge, UK, 2006.
- Spencer, J. Turán’s theorem for k-graphs. Discret. Math. 1972, 2, 183–186.
- Cao, C. Study on Two Optimization Problems: Line Cover and Maximum Genus Embedding. Master’s Thesis, Texas A&M University, College Station, TX, USA, 2012.
- Eppstein, D. Forbidden Configurations in Discrete Geometry; Cambridge University Press: Cambridge, UK, 2018.
- Froese, V.; Kanj, I.; Nichterlein, A.; Niedermeier, R. Finding points in general position. Int. J. Comput. Geom. Appl. 2017, 27, 277–296.
- Williamson, D.; Shmoys, D. The Design of Approximation Algorithms; Cambridge University Press: Cambridge, UK, 2011.
- Valtr, P. Convex independent sets and 7-holes in restricted planar point sets. Discret. Comput. Geom. 1992, 7, 135–152.
- Kovács, I.; Tóth, G. Dense point sets with many halving lines. Discret. Comput. Geom. 2020, 64, 965–984.
- Beck, J. On the lattice property of the plane and some problems of Dirac, Motzkin and Erdős in combinatorial geometry. Combinatorica 1983, 3, 281–297.
- Matoušek, J. Lectures on Discrete Geometry; Springer: New York, NY, USA, 2002.
- Dumitrescu, A.; Jiang, M. On the approximability of covering points by lines and related problems. Comput. Geom. Theory Appl. 2015, 48, 703–717.
- Wood, D. A note on colouring the plane grid. Geombinatorics 2004, 13, 193–196.
- Nagura, J. On the interval containing at least one prime number. Proc. Jpn. Acad. 1952, 28, 177–181.
- Erdős, P. Appendix, in Klaus F. Roth, On a problem of Heilbronn. J. Lond. Math. Soc. 1951, 1, 198–204.
- Agrawal, M.; Kayal, N.; Saxena, N. PRIMES is in P. Ann. Math. 2004, 160, 781–793.
- Motwani, R.; Raghavan, P. Randomized Algorithms; Cambridge University Press: Cambridge, UK, 1995.
- Baker, R.; Harman, G.; Pintz, J. The difference between consecutive primes, II. Proc. Lond. Math. Soc. 2001, 83, 532–562.
- Alon, N.; Spencer, J. The Probabilistic Method, 4th ed.; Wiley: New York, NY, USA, 2016.
- Zhang, Z. A note on arrays of dots with distinct slopes. Combinatorica 1993, 13, 127–128.
- de Berg, M.; Cheong, O.; van Kreveld, M.; Overmars, M. Computational Geometry: Algorithms and Applications, 3rd ed.; Springer: New York, NY, USA, 2008.
- Balogh, J.; Clemen, F.C.; Dumitrescu, A.; Liu, D. Subset selection problems in planar point sets. arXiv 2024, arXiv:2412.14287.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).