Article

General Position Subset Selection in Line Arrangements †

by
Adrian Dumitrescu
Algoresearch L.L.C., Milwaukee, WI 53217, USA
† This paper is an extended version of our paper published in the Proceedings of the 17th International Conference on Algorithms and Complexity.
Algorithms 2025, 18(6), 315; https://doi.org/10.3390/a18060315
Submission received: 28 March 2025 / Revised: 27 April 2025 / Accepted: 16 May 2025 / Published: 27 May 2025

Abstract

Given a set of $n$ points in the plane, the General Position Subset Selection problem is that of finding a maximum-size subset of points in general position, i.e., with no three points collinear. The problem is known to be computationally hard, and the best approximation ratio known is $\Omega(n^{-1/2})$. Here, we obtain better approximations in three special cases: (I) a constant-factor approximation for the case where the input set $P$ consists of lattice points and is dense, which means that the ratio between the maximum and the minimum distance in $P$ is of the order of $\Theta(n^{1/2})$; (II) an $\Omega((\log n)^{-1/2})$-approximation for the case where the input set is the set of vertices of a generic $n$-line arrangement, i.e., one with $\Omega(n^2)$ vertices; and (III) an $\Omega((\log n)^{-1/2})$-approximation for the case where the input set has at most $O(n^{1/2})$ points collinear and can be covered by $O(n^{1/2})$ lines. The scenario in (I) is a special case of that in (II). Our approximations rely on probabilistic methods and results from incidence geometry.

1. Introduction

A set of points in the plane is said to be in general position if no three points are collinear. The problem of selecting a large subset of points in general position from a $k \times k$ grid of integer points goes back more than 100 years; see, for example, [1,2] (Ch. 10). In particular, it is not known whether one can always select $2k$ points from the $k \times k$ grid so that no three points are collinear [1].
A key quantity in the process of selecting a large subset from an input set of points is the number of collinear triples in the set. Payne and Wood [3] obtained the following upper bound on this number. Its proof relies on the classical point–line incidence bound due to Szemerédi and Trotter [4]; see also [5] (Chap. 10) and [6] for a modern approach to this topic. The special case $\ell = O(n^{1/2})$ of the lemma can also be found in [7] or in [8] (p. 313).
Lemma 1 ([3]).
Let $P$ be a set of $n$ points in the plane with at most $\ell$ collinear. Then, the number of collinear triples in $P$ is $T = O(n^2 \log \ell + \ell^2 n)$. In particular, if $\ell = O(n^{1/2})$, then $T = O(n^2 \log \ell)$.
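For small experiments, the quantity $T$ in Lemma 1 can simply be counted by brute force. The Python sketch below (function names are ours, for illustration only) uses the integer orientation determinant, so lattice inputs are handled exactly; it runs in $O(n^3)$ time and is not part of the algorithms in this paper.

```python
from itertools import combinations

def orient(a, b, c):
    """Twice the signed area of triangle abc; it is 0 exactly when a, b, c are collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def count_collinear_triples(points):
    """Brute-force O(n^3) count of collinear triples among distinct points."""
    return sum(1 for a, b, c in combinations(points, 3) if orient(a, b, c) == 0)

# The 3 x 3 grid has 8 collinear triples: 3 rows, 3 columns, and 2 diagonals.
grid3 = [(x, y) for x in range(3) for y in range(3)]
print(count_collinear_triples(grid3))  # 8
```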
By applying the above lemma together with a lower bound on the independence number of a hypergraph obtained by Spencer [9], Payne and Wood [3] obtained the following result.
Theorem 1 ([3]).
Let $P$ be a set of $n$ points in the plane with at most $\ell$ collinear. Then, $P$ contains a subset of $\Omega\big(n / \sqrt{n \log \ell + \ell^2}\big)$ points in general position. In particular, if $\ell = O(n^{1/2})$, then $P$ contains a subset of $\Omega\big((n / \log \ell)^{1/2}\big)$ points in general position.
A cursory examination of the above lower bound shows that if the input set has a large subset of points in general position (of, say, nearly linear size), then the above guarantee can be roughly a factor of $n^{1/2}$ smaller than the truth.
Given a set $P$ of points in the plane, the General Position Subset Selection problem (GPSS, for short) is that of finding a maximum-size subset of points in general position. The problem is known to be NP-complete and APX-hard, and the current best approximation for the problem is a greedy algorithm due to Cao [10] (Chap. 3), achieving a ratio of $\Omega(\mathrm{OPT}^{-1/2})$, and hence $\Omega(n^{-1/2})$; here, OPT is the cardinality of an optimum solution to GPSS. See also [11] (Chap. 9) and [12] for further aspects of this problem and [13] for basic notions in complexity and approximation. Throughout this paper, ALG denotes the cardinality of the solution to GPSS computed by the algorithm under consideration.
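To make the definitions concrete, the Python sketch below (all names are ours) tests general position exactly with an integer orientation determinant and computes an inclusion-maximal subset in general position by a single greedy pass. It is an illustration only; we do not claim that it coincides with Cao's algorithm [10].

```python
from itertools import combinations

def orient(a, b, c):
    """Zero exactly when a, b, c are collinear (integer arithmetic, no rounding)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_general_position(points):
    """True if no three of the given points are collinear."""
    return all(orient(a, b, c) != 0 for a, b, c in combinations(points, 3))

def greedy_maximal_subset(points):
    """One pass over the input: keep a point unless it is collinear with two kept points.

    The result is in general position and inclusion-maximal, but not necessarily maximum.
    """
    chosen = []
    for q in points:
        if all(orient(a, b, q) != 0 for a, b in combinations(chosen, 2)):
            chosen.append(q)
    return chosen

grid3 = [(x, y) for x in range(3) for y in range(3)]
S = greedy_maximal_subset(grid3)
print(S)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Maximality alone already gives a ratio of order $\mathrm{OPT}^{-1/2}$: if $S$ is maximal and $|S| \ge 2$, every point of $P$ lies on one of the $\binom{|S|}{2}$ lines spanned by $S$, and an optimal solution has at most two points on each of them, so $\mathrm{OPT} \le |S|(|S|-1) < |S|^2$.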
As in [13], we follow the convention that the approximation ratio of an algorithm for a maximization problem is at most 1. Throughout this paper, all logarithms are in base 2.
  • Our results.
(I) For a set $P$ of $n$ points in the plane, consider the ratio (sometimes also called the spread)
$$D(P) = \frac{\max\{|ab| : a, b \in P,\ a \neq b\}}{\min\{|ab| : a, b \in P,\ a \neq b\}},$$
where $|ab|$ is the Euclidean distance between points $a$ and $b$. Since the general position property is unaffected by scaling, we may assume, without loss of generality, that $\min\{|ab| : a, b \in P,\ a \neq b\} = 1$. In this case, $D(P) = \max\{|ab| : a, b \in P,\ a \neq b\}$ is the diameter of $P$. A standard disk packing argument shows that if $|P| = n$, then $D(P) \geq \alpha_0 n^{1/2}$, where
$$\alpha_0 := \frac{2^{1/2}\, 3^{1/4}}{\pi^{1/2}} \approx 1.05,$$
provided that $n$ is large enough; see [14] (Prop. 4.10). On the other hand, an $n^{1/2} \times n^{1/2}$ section of the integer lattice shows that this bound is tight up to a constant factor. An $n$-element point set $P$ satisfying the condition $D(P) \leq \alpha n^{1/2}$, for some constant $\alpha \geq \alpha_0$, is said to be $\alpha$-dense; see, for instance, [15], and see Figure 1 for an illustration.
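The spread and the density condition are easy to evaluate by brute force. A minimal Python sketch follows (function names are ours; `math.dist` requires Python 3.8 or later); it computes $\alpha_0$ from the formula above and checks $D(P) \le \alpha n^{1/2}$ directly.

```python
import math
from itertools import combinations

# alpha_0 = 2^{1/2} 3^{1/4} / pi^{1/2}, the constant in the packing bound above
ALPHA_0 = math.sqrt(2) * 3 ** 0.25 / math.sqrt(math.pi)

def spread(points):
    """D(P): ratio of the largest to the smallest pairwise distance (needs >= 2 distinct points)."""
    dists = [math.dist(a, b) for a, b in combinations(points, 2)]
    return max(dists) / min(dists)

def is_alpha_dense(points, alpha):
    """Check the condition D(P) <= alpha * sqrt(n) from the definition of alpha-density."""
    return spread(points) <= alpha * math.sqrt(len(points))

grid5 = [(x, y) for x in range(5) for y in range(5)]  # n = 25, D = 4 * sqrt(2) ~ 5.66
print(round(ALPHA_0, 3))               # 1.05
print(is_alpha_dense(grid5, 2))        # True  (5.66 <= 10)
print(is_alpha_dense(grid5, ALPHA_0))  # False (5.66 > ~5.25)
```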
We obtain a constant-factor approximation for GPSS in dense lattice point sets; its approximation ratio depends on the density parameter $\alpha$ of the input.
Theorem 2.
Given an $\alpha$-dense set of $n$ lattice points, an $\Omega(\alpha^{-2})$-approximation for the General Position Subset Selection problem can be computed in polynomial time.
(II) According to a classical result of Beck [16], if $P$ is a set of $n$ points in the plane with at most $\ell$ points collinear, then $P$ determines $\Omega(n(n-\ell))$ distinct lines. In particular, if $\ell \leq cn$, where $c < 1$ is a constant, then $P$ determines $\Omega(n^2)$ distinct lines. By duality (see, e.g., [17]), for a set $L$ of $n$ lines in the plane, where at most $\ell$ lines of $L$ are concurrent, the number of vertices of the corresponding arrangement is $\Omega(n(n-\ell))$. We say that an arrangement of $n$ lines is generic if it has $\Omega(n^2)$ vertices.
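Whether a concrete arrangement is "generic" can be explored numerically by counting its vertices. The Python sketch below is our own illustration: it assumes lines are given as integer triples $(a, b, c)$ for $ax + by = c$, and uses exact rational arithmetic (`fractions.Fraction`) so that coinciding intersection points are merged correctly. The two test families (tangent lines $y = ix + i^2$, and the pencil-plus-parallels layout of Figure 2, right) are arbitrary examples.

```python
from fractions import Fraction
from itertools import combinations

def arrangement_vertices(lines):
    """Distinct intersection points of lines a*x + b*y = c, given as integer triples (a, b, c).

    Exact rational arithmetic (Cramer's rule) makes coinciding vertices compare equal.
    """
    vertices = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:  # parallel lines: no vertex
            continue
        vertices.add((Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det)))
    return vertices

n = 8
# Tangent lines y = i*x + i^2, i.e. i*x - y = -i^2: lines i, j meet at (-(i+j), -i*j),
# so all binom(n, 2) vertices are distinct.
tangents = [(i, -1, -i * i) for i in range(n)]
# Figure 2 (right) style: 3 parallel lines plus n - 3 concurrent lines through the origin.
pencil = [(0, 1, 1), (0, 1, 2), (0, 1, 3), (1, 0, 0), (1, 1, 0), (1, -1, 0), (2, 1, 0), (1, 2, 0)]
print(len(arrangement_vertices(tangents)))  # 28
print(len(arrangement_vertices(pencil)))    # 16
```

As $n$ grows, the tangent family keeps all $\binom{n}{2}$ vertices, whereas the pencil layout has only $3(n-3) + 1$, which is why it is not generic.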
We obtain an $\Omega((\log n)^{-1/2})$-approximation for GPSS in the special case where the input set is the set of vertices of an $n$-line arrangement, under the relatively mild assumption that the arrangement is generic.
Theorem 3.
Given $n$ lines in the plane that form a generic line arrangement, an $\Omega((\log n)^{-1/2})$-approximation for the General Position Subset Selection problem on the vertex set of this arrangement can be computed by a randomized algorithm in expected polynomial time.
Essentially the same proof (of this theorem, in Section 3) yields the following.
Corollary 1.
Given $n$ lines in the plane, let $V$ denote the vertex set of the corresponding line arrangement. If $V' \subseteq V$ is a subset of vertices with $|V'| = \Omega(n^2)$, an $\Omega((\log n)^{-1/2})$-approximation for the General Position Subset Selection problem on $V'$ can be computed by a randomized algorithm in expected polynomial time.
It is worth noting that the scenario in Theorem 2 is a special case of the scenario in Theorem 3. Indeed, any finite set $P$ of lattice points is a subset of the vertices of the arrangement $\mathcal{A}^+$ of axis-parallel lines induced by $H$ and $V$, the sets of horizontal and vertical lines incident to points in $P$, respectively. Moreover, if $P$ is dense, then $\mathcal{A}^+$ is generic: since all pairwise distances in $P$ lie between the minimum distance and $\alpha n^{1/2}$ times it, each line of $H \cup V$ contains at most $\alpha n^{1/2} + 1$ points of $P$, and therefore
$$|H|,\ |V| = \Omega\big(|P| / (\alpha n^{1/2})\big) = \Omega(n^{1/2}).$$
Thus, $\mathcal{A}^+$ has $|H| \cdot |V| = \Omega(n)$ vertices. Observe that not all vertices of the arrangement $\mathcal{A}^+$ are necessarily in $P$.
(III) Given a set $P$ of points in the plane, a line cover of $P$ is a set of lines that together cover all points in $P$. Computing a line cover of minimum size is APX-hard, and the best approximation ratio known is $O(\log n)$ [18].
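An $O(\log n)$ guarantee of this kind can be obtained, for example, by the standard greedy set-cover heuristic, whose ratio is at most $H_n \le 1 + \ln n$. The Python sketch below illustrates that heuristic over lines spanned by the input; it is our own illustration, not necessarily the algorithm of [18], and all names are hypothetical.

```python
from itertools import combinations

def orient(a, b, c):
    """Zero exactly when a, b, c are collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def greedy_line_cover(points):
    """Greedy set cover with lines: repeatedly pick a line covering the most uncovered points.

    A line is returned as a pair of points on it. Lines spanned by two uncovered points suffice
    as candidates whenever some line covers >= 2 uncovered points.
    """
    uncovered = set(points)
    cover = []
    while uncovered:
        pts = sorted(uncovered)  # fixed order makes tie-breaking deterministic
        if len(pts) == 1:
            x, y = pts[0]
            cover.append(((x, y), (x + 1, y)))  # any line through the last point works
            break
        best_line, best_hit = None, []
        for a, b in combinations(pts, 2):
            hit = [c for c in pts if orient(a, b, c) == 0]
            if len(hit) > len(best_hit):
                best_line, best_hit = (a, b), hit
        cover.append(best_line)
        uncovered.difference_update(best_hit)
    return cover

grid3 = [(x, y) for x in range(3) for y in range(3)]
print(len(greedy_line_cover(grid3)))  # 3 (the three columns)
```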
We obtain an $\Omega((\log n)^{-1/2})$-approximation for GPSS in the special case where the input set has at most $O(n^{1/2})$ points collinear and can be covered by $O(n^{1/2})$ lines.
Theorem 4.
Given a set $P$ of $n$ points in the plane with at most $O(n^{1/2})$ collinear and with a line cover of size $O(n^{1/2})$, an $\Omega((\log n)^{-1/2})$-approximation for the General Position Subset Selection problem can be computed by a randomized algorithm in expected polynomial time.

2. Subset Selection in Dense Lattice Point Sets

In this section, we prove Theorem 2. First, we introduce the setup.
For a positive integer $m$, the $m \times m$ grid is the set of points $G_m = \{(x, y) : x, y \in \{0, 1, \ldots, m-1\}\}$ in the plane. We have $|G_m| = m^2$. Following [19], let $k(m)$ denote the minimum number of colors in a coloring of the points in $G_m$ such that no three collinear points are monochromatic. Since no three points in a single row or column can receive the same color, we have $k(m) \geq m/2$. Wood [19] showed that for any $\varepsilon > 0$, $k(m) \leq (2+\varepsilon)m$ for every $m \geq M(\varepsilon)$. Among other facts, this relies on the fact that for any $\varepsilon > 0$, there exists a prime between $m$ and $(1+\varepsilon)m$ for every $m \geq M(\varepsilon)$. A non-asymptotic result, more convenient for small $m$, is that there exists a prime between $m$ and $6m/5$ for every $m \geq 25$ [20]. (And by the well-known Bertrand–Chebyshev theorem, there exists a prime between $m$ and $2m$ for every $m$.)
We briefly recall Wood's argument [19], which relies on a construction of Erdős [21]. Let $p \geq m$ be a prime; in practice, we may pick the smallest prime that is at least $m$, or any prime at least $m$ that is close to $m$. For any integer $i$, let
$$V_i = \{(x, (x^2 \bmod p) + i) : x = 0, 1, \ldots, m-1\}.$$
Note that $|V_i| = m$ and that $V_i$ may not be contained in $G_m$; refer to Figure 1. A Vandermonde determinant calculation (modulo the prime $p$) shows that $V_i$ is in general position for every $i \in \mathbb{Z}$. Moreover, $V_i \cap V_j = \emptyset$ whenever $i \neq j$. Each point $(x, y) \in G_m$ is in $V_i$, where $i = y - (x^2 \bmod p)$. Since
$$1 - p = 0 - (p - 1) \leq y - (x^2 \bmod p) \leq m - 1,$$
the $m + p - 1$ sets $V_i$ with $i = 1 - p, \ldots, m - 1$ cover $G_m$; assigning color $i$ to the points of $V_i \cap G_m$ yields $m + p - 1$ color classes, each in general position. That is, $G_m$ can be covered by $m + p - 1$ $m$-element point sets in general position. As mentioned in the previous paragraph, $m + p - 1 \leq (2+\varepsilon)m$ for every $m \geq M(\varepsilon)$, or $m + p - 1 \leq 11m/5$ for every $m \geq 25$, as needed.
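The covering argument can be checked mechanically for small $m$. In the Python sketch below (helper names are ours), `is_valid_cover` verifies both properties used above: every point of $G_m$ lies in exactly one $V_i$, namely the one with $i = y - (x^2 \bmod p)$, and each $V_i$ is in general position. With a non-prime modulus the second property can fail, as the last line illustrates.

```python
from itertools import combinations

def orient(a, b, c):
    """Zero exactly when a, b, c are collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_general_position(points):
    return all(orient(a, b, c) != 0 for a, b, c in combinations(points, 3))

def parabola_set(i, m, p):
    """V_i = {(x, (x^2 mod p) + i) : x = 0, ..., m-1}; it may stick out of G_m."""
    return [(x, (x * x) % p + i) for x in range(m)]

def is_valid_cover(m, p):
    """Check that the m + p - 1 sets V_i, i = 1-p, ..., m-1, cover G_m with each grid point
    in exactly one of them (the one with i = y - (x^2 mod p)), and that each is in general position."""
    classes = {i: set(parabola_set(i, m, p)) for i in range(1 - p, m)}
    for x in range(m):
        for y in range(m):
            owners = [i for i, pts in classes.items() if (x, y) in pts]
            if owners != [y - (x * x) % p]:
                return False
    return all(in_general_position(sorted(pts)) for pts in classes.values())

print(len(range(1 - 11, 8)))  # 18 = m + p - 1 for m = 8, p = 11
print(is_valid_cover(8, 11))  # True  (p prime, as in Figure 1)
print(is_valid_cover(8, 9))   # False (p = 9 is not prime: (0,0), (3,0), (6,0) lie in V_0)
```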
Proof of Theorem 2.
Since $P$ is $\alpha$-dense, we may assume that $P \subseteq G_m$, where $m = \lfloor \alpha n^{1/2} \rfloor + 1$, after a suitable translation by an integer vector. The algorithm chooses a prime $p \geq m$ close to $m$. Recall that primality testing can be realized in polynomial time, that is, in $O(\log^{O(1)} m)$ time; see, e.g., [22] or [23] (Ch. 14.6). By the result of Baker, Harman, and Pintz on gaps between consecutive primes [24], the interval $[m, m + 2m^{0.525}]$ contains a prime for every sufficiently large $m$; hence, finding a prime $p \geq m$ close to $m$ involves the primality testing of $O(m^{0.525}) = O(n^{0.263})$ odd integers in this interval.
As argued above, $G_m$ is covered by $m + p - 1 \leq (2+\varepsilon)m$ point sets $V_i$ in general position, each with $m$ elements. By monotonicity (i.e., every subset of a set in general position is likewise in general position), the sets $P \cap V_i$ are in general position, and the algorithm outputs the largest one; by the pigeonhole principle, it has at least $n/((2+\varepsilon)m)$ elements.
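Since each of the $m$ rows of $G_m$ contains at most two elements of an optimal solution, we have $\mathrm{OPT} \leq 2m$. Consequently, the ratio $\mathrm{ALG}/\mathrm{OPT}$ is bounded from below as
$$\frac{\mathrm{ALG}}{\mathrm{OPT}} \geq \frac{n}{2 \cdot (2+\varepsilon)\, m^2} = \Omega\left(\frac{1}{\alpha^2}\right).$$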
For a small $\varepsilon$, the approximation ratio in this formula is roughly $1/(4\alpha^2)$. Clearly, the resulting algorithm runs in polynomial time: choosing the prime $p$ and generating the $m + p - 1 = O(m)$ sets $V_i$, each with $m$ elements, takes $O(m^2) = O(n)$ time. Selecting a set $V_i$ that maximizes $|P \cap V_i|$ can be achieved within the same time bound, and so the overall running time is $O(n)$. □
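The algorithm from this proof fits in a few lines. The Python sketch below (names are ours) differs from the proof in two inessential ways: it takes $m$ to be the side of the smallest grid containing the translated input rather than the bound derived from $\alpha$, and trial division stands in for the polynomial-time primality tests cited above. Classes are indexed by $i = y - (x^2 \bmod p)$, exactly the rule that assigns grid points to the sets $V_i$.

```python
import math
from collections import defaultdict
from itertools import combinations

def next_prime(m):
    """Smallest prime >= m, by trial division (m is O(sqrt(n)) here)."""
    q = max(m, 2)
    while any(q % d == 0 for d in range(2, math.isqrt(q) + 1)):
        q += 1
    return q

def in_general_position(points):
    """True if no three points are collinear (exact integer orientation test)."""
    return all((b[0] - a[0]) * (c[1] - a[1]) != (b[1] - a[1]) * (c[0] - a[0])
               for a, b, c in combinations(points, 3))

def dense_lattice_subset(points):
    """Largest class P ∩ V_i, where the grid point (x, y) is assigned to i = y - (x^2 mod p)."""
    x0 = min(x for x, _ in points)
    y0 = min(y for _, y in points)
    shifted = [(x - x0, y - y0) for x, y in points]  # integer translation into G_m
    m = 1 + max(max(x, y) for x, y in shifted)        # side of the smallest such grid
    p = next_prime(m)
    classes = defaultdict(list)
    for (x, y), original in zip(shifted, points):
        classes[y - (x * x) % p].append(original)
    return max(classes.values(), key=len)

P = [(x + 3, y - 2) for x in range(5) for y in range(5)]  # a translated 5 x 5 block, n = 25
S = dense_lattice_subset(P)
print(S)                       # [(3, -2), (4, -1), (5, 2), (6, 2), (7, -1)]
print(in_general_position(S))  # True
```

Here $m = p = 5$, so each class has at most one point per column; the class $i = 0$ is the only one with all five, and $\mathrm{OPT} \le 2m = 10$.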

3. Subset Selection in Generic Line Arrangements

In this section, we prove Theorem 3. Recall that an arrangement of $n$ lines is generic if it has $\Omega(n^2)$ vertices. See Figure 2.
Lemma 2.
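Let $V$ be the set of vertices of an $n$-line arrangement $\mathcal{A} = \mathcal{A}(L)$, where $L = \{\ell_1, \ldots, \ell_n\}$. Then, $\ell(V) \leq n - 1$, where $\ell(V)$ denotes the maximum number of collinear points in $V$.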
Proof. 
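We distinguish two cases. Let $V' \subseteq V$ be a set of collinear vertices, say, on a line $h$. If $h \in L$, then obviously $|V'| \leq n - 1$, since every vertex on $h$ is the intersection of $h$ with another line of $L$. If $h \notin L$, scan $h$, say, from left to right, and observe that each vertex in $V'$ uses up at least two "new" lines from $L$, because a line of $L$ meets $h$ in at most one point; thus, $|V'| \leq n/2$. See Figure 3. □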
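Lemma 2 can be sanity-checked on small arrangements. The Python sketch below (names are ours; the tangent-line family $y = ix + i^2$ is an arbitrary test case, not taken from the text) computes the vertex set exactly with rational arithmetic and then $\ell(V)$ by brute force. On this family every line of $L$ carries $n - 1$ vertices, so the bound of the lemma is attained.

```python
from fractions import Fraction
from itertools import combinations

def arrangement_vertices(lines):
    """Distinct intersection points of lines a*x + b*y = c given as integer triples (a, b, c)."""
    verts = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if det != 0:  # skip parallel pairs
            verts.add((Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det)))
    return verts

def max_collinear(points):
    """ell(V): the largest number of collinear points in the set (brute force)."""
    pts = list(points)
    best = min(len(pts), 2)
    for a, b in combinations(pts, 2):
        on_line = sum(1 for c in pts
                      if (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]) == 0)
        best = max(best, on_line)
    return best

n = 6
lines = [(i, -1, -i * i) for i in range(n)]  # tangent lines y = i*x + i^2
V = arrangement_vertices(lines)
print(len(V), max_collinear(V))  # 15 5, i.e. ell(V) = n - 1
```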
Let $V$ be the set of vertices of a generic $n$-line arrangement $\mathcal{A} = \mathcal{A}(L)$. Let $N = |V| \leq n^2$; by the genericity assumption, we also have $N \geq c n^2$ for some constant $c > 0$. By Lemma 2, $\ell := \ell(V) \leq n - 1$; thus, by Lemma 1, the number of collinear triples is bounded as
$$T = O(N^2 \log \ell + \ell^2 N) = O(N^2 \log n + n^2 N). \tag{2}$$
Proof of Theorem 3.
The subset selection algorithm consists of two steps: (i) random sampling from $V$ and (ii) the application of the deletion method; see, e.g., [25] (Chap. 3) and a similar application in [26]. In the first step, a random subset $X \subseteq V$ is chosen by selecting points independently with probability $p = k/N$, for a suitable $k$ to be determined. Note that $\mathbb{E}[|X|] = N \cdot k/N = k$.
To select a subset in general position, one needs to avoid only one type of obstacle: collinear triples. In the second step, each collinear triple that survives in $X$ is eliminated by deleting one of its points. Since each of the $T$ collinear triples survives in $X$ with probability $p^3$, it suffices to choose $k$ so that
$$T p^3 \leq k/2,$$
because then the expected number of remaining points is at least $k - k/2 = k/2$. Let $c_1$ denote the constant hidden in (2). With the above upper bound on $T$, and recalling that $N \geq c n^2$, it suffices to choose $k$ so that
$$2 c_1 k^2 (N \log n + n^2) \leq N^2. \tag{3}$$
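For the reader who wants the intermediate steps, here is the substitution behind (3), together with one admissible constant. We write $c$ for the genericity constant ($N \ge cn^2$) and $c_2$ for the constant in $k$ (our labels), and assume $n \ge 2$ so that $\log n \ge 1$; this is a sketch of the arithmetic, not an additional claim of the paper.

```latex
\begin{align*}
T p^3 &\le c_1 \bigl(N^2 \log n + n^2 N\bigr) \frac{k^3}{N^3}
  && \text{by (2), with } p = k/N,\\
c_1 \bigl(N^2 \log n + n^2 N\bigr) \frac{k^3}{N^3} \le \frac{k}{2}
  &\iff 2 c_1 k^2 \bigl(N \log n + n^2\bigr) \le N^2
  && \text{which is (3)},\\
2 c_1 k^2 \bigl(N \log n + n^2\bigr)
  &\le 2 c_1 c_2^2 \frac{n^2}{\log n}\, N \log n \Bigl(1 + \frac{1}{c}\Bigr)
  && k = c_2 n/\sqrt{\log n},\ n^2 \le N/c,\ \log n \ge 1,\\
  &\le \frac{2 c_1 c_2^2 (c+1)}{c^2}\, N^2 \le N^2
  && \text{if } c_2^2 \le \frac{c^2}{2 c_1 (c+1)}.
\end{align*}
```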
Setting $k = c_2\, n / \sqrt{\log n}$ for a sufficiently small constant $c_2 > 0$ satisfies (3); hence, the expected size of the output is at least $k/2 = \Omega(n / \sqrt{\log n})$. In addition, this setting ensures that $p = k/N < 1$.
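In summary, the expected number of points in general position found by the algorithm is $\Omega(n / \sqrt{\log n})$. To analyze the approximation ratio, it suffices to notice that $\mathrm{OPT} \leq 2n$: for each line $\ell_i \in L$, $i = 1, \ldots, n$, at most two vertices of $V$ on $\ell_i$ can appear in any solution, and every vertex lies on some line of $L$. It follows that the approximation ratio is
$$\frac{\mathrm{ALG}}{\mathrm{OPT}} \geq \frac{1}{2n} \cdot \Omega\left(\frac{n}{\sqrt{\log n}}\right) = \Omega\left(\frac{1}{\sqrt{\log n}}\right),$$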
as claimed.
Since repeated random samplings are independent, the probability that the randomized algorithm fails to find the required number of points can be made arbitrarily small by repetition, in standard fashion. Clearly, the resulting algorithm runs in expected polynomial time; we give a few details below.
Computing the vertex set $V$ and performing the random sampling takes $O(n^2)$ time. The expected sample size is $O(k) = O(n)$. Using point–line duality, the algorithm computes the line arrangement dual to the point sample in $O(n^2)$ time [27] (Ch. 8). It then repeatedly removes a line passing through a vertex incident to at least three lines, each removal taking $O(n)$ time, and updates the arrangement. This step is repeated until no three lines of the arrangement are concurrent, at which point the points left in the sample are in general position. The expected running time is $O(n^2)$. □
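A direct, unoptimized rendering of the two steps of this proof in Python follows (all names are ours). It finds surviving collinear triples by brute force over the sample rather than via the dual arrangement used for the running-time analysis, and both the test arrangement and the constant 0.5 in $k$ are arbitrary choices for illustration. Rerunning with independent seeds and keeping the largest output is the standard repetition step.

```python
import math
import random
from itertools import combinations

def orient(a, b, c):
    """Zero exactly when a, b, c are collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def sample_and_delete(points, k, rng=None):
    """Keep each point independently with probability p = k/N, then delete one point
    from every collinear triple that survives; the result is in general position."""
    rng = rng or random.Random()
    p = min(1.0, k / len(points))
    sample = [q for q in points if rng.random() < p]
    alive = set(sample)
    for a, b, c in combinations(sample, 3):
        if a in alive and b in alive and c in alive and orient(a, b, c) == 0:
            alive.discard(c)  # one deletion destroys this triple
    return [q for q in sample if q in alive]

# Hypothetical test input: the vertices of the generic arrangement of lines y = i*x + i^2,
# i = 0..n-1; lines i < j meet at the integer point (-(i + j), -i*j), so N = n(n-1)/2.
n = 40
V = [(-(i + j), -i * j) for i, j in combinations(range(n), 2)]
k = 0.5 * n / math.sqrt(math.log2(n))  # k = c * n / sqrt(log n) with an arbitrary c = 0.5
S = sample_and_delete(V, k, random.Random(1))
print(all(orient(a, b, c) != 0 for a, b, c in combinations(S, 3)))  # True
```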
Proof of Corollary 1.
We perform random sampling from $V'$, let $N = |V'|$, and proceed as in the proof of Theorem 3. Inequality (3) is again satisfied, due to the assumption $|V'| = \Omega(n^2)$. □
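The cleanup step in the proof of Theorem 3 works in the dual. The sketch below assumes the point–line duality of [27] (Ch. 8), which maps a point $(p_x, p_y)$ to the line $y = p_x x - p_y$; under it, points on a common non-vertical line map to lines through a common point, so collinear triples in the sample correspond to vertices of the dual arrangement incident to three or more lines. Points on a vertical line map to parallel lines, a degenerate case the sketch simply reports as not concurrent. Names are ours.

```python
from fractions import Fraction
from itertools import combinations

def dual_line(point):
    """Point (px, py) -> line y = px * x - py, stored as (slope, intercept)."""
    px, py = point
    return (px, -py)

def meet(l1, l2):
    """Intersection point of two lines y = m*x + c, or None if they are parallel."""
    (m1, c1), (m2, c2) = l1, l2
    if m1 == m2:
        return None
    x = Fraction(c2 - c1, m1 - m2)
    return (x, m1 * x + c1)

def concurrent(l1, l2, l3):
    """True if the three lines pass through a common point."""
    q = meet(l1, l2)
    return q is not None and l3[0] * q[0] + l3[1] == q[1]

collinear = [(0, 1), (1, 3), (2, 5)]           # on the line y = 2x + 1
print(concurrent(*map(dual_line, collinear)))  # True: the dual lines meet at (2, -1)
print(concurrent(*map(dual_line, [(0, 0), (1, 0), (0, 1)])))  # False
```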

4. Another Application

In this section, we consider General Position Subset Selection in grid-like sets, that is, sets with at most $\ell = O(n^{1/2})$ points collinear and with a line cover of size $\kappa = O(n^{1/2})$. The approximation is obtained with an approach similar to that in the proof of Theorem 3.
Proof of Theorem 4.
By the first assumption and Lemma 1, the number of collinear triples in $P$ is
$$T = O(n^2 \log \ell) = O(n^2 \log n). \tag{4}$$
Since an optimal solution can contain at most two points of $P$ from each line of the given line cover of $P$, we have $\mathrm{OPT} \leq 2\kappa = O(n^{1/2})$. The subset selection algorithm consists of two steps: (i) random sampling from $P$ and (ii) the application of the deletion method. In the first step, a random subset $X \subseteq P$ is chosen by selecting points independently with probability $p = k/n$, for a suitable $k$ to be determined. Note that $\mathbb{E}[|X|] = n \cdot k/n = k$.
As in the proof of Theorem 3, it suffices to choose $k$ so that
$$T p^3 \leq k/2.$$
Let $c_1$ denote the constant hidden in (4). Using this upper bound on $T$, it suffices to choose $k$ so that
$$2 c_1 k^2 \log n \leq n.$$
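Spelled out, with $c_1$ the constant in (4) and $c$ the constant in $k$, the arithmetic is as follows (a sketch of the computation only):

```latex
\begin{align*}
T p^3 &\le c_1 n^2 \log n \cdot \frac{k^3}{n^3}
  && \text{by (4), with } p = k/n,\\
c_1 n^2 \log n \cdot \frac{k^3}{n^3} \le \frac{k}{2}
  &\iff 2 c_1 k^2 \log n \le n ,\\
k = c \sqrt{n/\log n}
  &\implies 2 c_1 k^2 \log n = 2 c_1 c^2\, n \le n
  && \text{whenever } c^2 \le \frac{1}{2 c_1}.
\end{align*}
```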
Setting $k = c \sqrt{n / \log n}$ for a sufficiently small constant $c > 0$ satisfies this inequality; hence, the expected size of the output is at least $k/2$. In addition, this setting ensures that $p = k/n < 1$.
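The expected number of points in general position found by the algorithm is $\Omega(\sqrt{n / \log n})$. Recalling that $\mathrm{OPT} \leq 2\kappa = O(n^{1/2})$, it follows that the approximation ratio is
$$\frac{\mathrm{ALG}}{\mathrm{OPT}} \geq \frac{1}{2\kappa} \cdot \Omega\left(\sqrt{\frac{n}{\log n}}\right) = \Omega\left(\frac{1}{\sqrt{\log n}}\right),$$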
as claimed. □

5. Concluding Remarks

We conclude with some problems for further investigation:
  • Can the approximation factor in Theorem 3 be improved? Perhaps using ideas from [28] (Section 2)?
  • Is there a constant-factor approximation for GPSS among the vertices of an n-line arrangement?
  • Can a constant-factor approximation for GPSS on point sets satisfying $\ell = O(n^{1/2})$ and $\kappa = O(n^{1/2})$ be obtained? Note that one can select $\Theta(n^{1/2})$ points in general position from the $n^{1/2} \times n^{1/2}$ grid of $n$ points; see, e.g., refs. [25] (Ch. 3.3) or [2] (Ch. 10).
  • Can better approximation factors for the general position subset selection problem be obtained in other interesting scenarios?

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

Author Adrian Dumitrescu was employed by the company Algoresearch L.L.C. The author declares no conflicts of interest.

References

  1. Dudeney, H. A puzzle with pawns. In Amusements in Mathematics; Nelson: Edinburgh, UK, 1917; p. 94.
  2. Braß, P.; Moser, W.; Pach, J. Research Problems in Discrete Geometry; Springer: New York, NY, USA, 2005.
  3. Payne, M.; Wood, D. On the general position subset selection problem. SIAM J. Discret. Math. 2013, 27, 1727–1733.
  4. Szemerédi, E.; Trotter, W.T. Extremal problems in discrete geometry. Combinatorica 1983, 3, 381–392.
  5. Pach, J.; Agarwal, P. Combinatorial Geometry; Wiley-Interscience: New York, NY, USA, 1995.
  6. Székely, L. Crossing numbers and hard Erdős problems in discrete geometry. Comb. Probab. Comput. 1997, 6, 353–358.
  7. Lefmann, H. Distributions of points in the unit square and large k-gons. Eur. J. Comb. 2008, 29, 946–965.
  8. Tao, T.; Vu, V. Additive Combinatorics; Cambridge Studies in Advanced Mathematics, 105; Cambridge University Press: Cambridge, UK, 2006.
  9. Spencer, J. Turán’s theorem for k-graphs. Discret. Math. 1972, 2, 183–186.
  10. Cao, C. Study on Two Optimization Problems: Line Cover and Maximum Genus Embedding. Master’s Thesis, Texas A&M University, College Station, TX, USA, 2012.
  11. Eppstein, D. Forbidden Configurations in Discrete Geometry; Cambridge University Press: Cambridge, UK, 2018.
  12. Froese, V.; Kanj, I.; Nichterlein, A.; Niedermeier, R. Finding points in general position. Int. J. Comput. Geom. Appl. 2017, 27, 277–296.
  13. Williamson, D.; Shmoys, D. The Design of Approximation Algorithms; Cambridge University Press: Cambridge, UK, 2011.
  14. Valtr, P. Convex independent sets and 7-holes in restricted planar point sets. Discret. Comput. Geom. 1992, 7, 135–152.
  15. Kovács, I.; Tóth, G. Dense point sets with many halving lines. Discret. Comput. Geom. 2020, 64, 965–984.
  16. Beck, J. On the lattice property of the plane and some problems of Dirac, Motzkin and Erdős in combinatorial geometry. Combinatorica 1983, 3, 281–297.
  17. Matoušek, J. Lectures on Discrete Geometry; Springer: New York, NY, USA, 2002.
  18. Dumitrescu, A.; Jiang, M. On the approximability of covering points by lines and related problems. Comput. Geom. Theory Appl. 2015, 48, 703–717.
  19. Wood, D. A note on colouring the plane grid. Geombinatorics 2004, 13, 193–196.
  20. Nagura, J. On the interval containing at least one prime number. Proc. Jpn. Acad. 1952, 28, 177–181.
  21. Erdős, P. Appendix, in Klaus F. Roth, On a problem of Heilbronn. J. Lond. Math. Soc. 1951, 1, 198–204.
  22. Agrawal, M.; Kayal, N.; Saxena, N. PRIMES is in P. Ann. Math. 2004, 160, 781–793.
  23. Motwani, R.; Raghavan, P. Randomized Algorithms; Cambridge University Press: Cambridge, UK, 1995.
  24. Baker, R.; Harman, G.; Pintz, J. The difference between consecutive primes, II. Proc. Lond. Math. Soc. 2001, 83, 532–562.
  25. Alon, N.; Spencer, J. The Probabilistic Method, 4th ed.; Wiley: New York, NY, USA, 2016.
  26. Zhang, Z. A note on arrays of dots with distinct slopes. Combinatorica 1993, 13, 127–128.
  27. de Berg, M.; Cheong, O.; van Kreveld, M.; Overmars, M. Computational Geometry: Algorithms and Applications, 3rd ed.; Springer: New York, NY, USA, 2008.
  28. Balogh, J.; Clemen, F.C.; Dumitrescu, A.; Liu, D. Subset selection problems in planar point sets. arXiv 2024, arXiv:2412.14287.
Figure 1. Left: A 2-dense set of $n = 25$ points in the $8 \times 8$ grid ($m = 8$). Center: $V_0$; $|V_0| = m = 8$ and $p = 11$; points in $V_i$ may lie outside the grid, e.g., $(3, 9)$ lies 2 units above it. Right: The approximation algorithm returns $P \cap V_0$ (or $P \cap V_3$); here, $|P \cap V_0| = |P \cap V_3| = 4$.
Figure 2. Left: A generic line arrangement consisting of three bundles of $n/3$ nearly parallel lines each. Right: A non-generic line arrangement consisting of 3 parallel lines and $n - 3$ concurrent lines.
Figure 3. Collinear vertices in a line arrangement, incident to an induced (dotted) line.