1. Introduction
Geometric optimization deals with optimization problems involving large sets of geometric objects. Bichromatic separability of point sets is a well-known topic in the field of geometric optimization. Typically, we are given a set of “red” points and a set of “blue” points in two or three dimensions, and the goal is to separate them using various geometric loci, such as lines, planes, circles, spheres, rectangles, or boxes.
The Maximum Bichromatic Separating Rectangle (MBSR) problem was introduced by Armaselu et al. in [1] (see also [2]) and is stated as follows. Given a red point set R and a blue point set B in the plane, with , compute the axis-aligned rectangle S satisfying the following:
- (1) S contains all points in R;
- (2) S contains the fewest points in B among all rectangles satisfying (1);
- (3) S has the largest area of all rectangles satisfying (1) and (2).
Such a rectangle is called a maximum bichromatic separating rectangle (MBSR) or simply a largest separating rectangle.
Denote the smallest axis-aligned rectangle enclosing R by .
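As a minimal illustration of the objects in this definition, the sketch below computes the smallest axis-aligned rectangle enclosing R and counts the blue points a given rectangle contains. The names `bounding_rectangle` and `count_inside`, the tuple layout of a rectangle, and the choice of closed (boundary-inclusive) containment are assumptions made for the example only.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
# Hypothetical rectangle layout: (x_min, y_min, x_max, y_max).
Rect = Tuple[float, float, float, float]


def bounding_rectangle(red: Iterable[Point]) -> Rect:
    """Smallest axis-aligned rectangle enclosing all red points (linear time)."""
    pts: List[Point] = list(red)
    if not pts:
        raise ValueError("the red point set must be non-empty")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def count_inside(rect: Rect, blue: Iterable[Point]) -> int:
    """Number of blue points in the rectangle (boundary counted as inside)."""
    x0, y0, x1, y1 = rect
    return sum(1 for x, y in blue if x0 <= x <= x1 and y0 <= y <= y1)
```

Any rectangle satisfying condition (1) must contain the rectangle returned by `bounding_rectangle`, so the blue points counted inside it are unavoidable.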
In this paper, we consider two extensions of the MBSR problem.
The first extension, introduced in [3], is called MBSR with outliers (MBSR-O) or simply the outliers version. It seeks to find the largest axis-aligned rectangle containing all red points and up to k blue points outside , where k is given as part of the input. That is, MBSR-O is a relaxation of condition (2) from the original MBSR problem. The running time of the algorithm in [3] is . However, when k is large (e.g., ), this running time bound can be unreasonably high. We will show how to improve this time bound to using a more refined sweep line-based approach.
We also introduce another extension of MBSR, called MBSR among circles (MBSR-C) or simply circles version, in which there are red points and blue unit circles, and the goal is to find the largest rectangle containing no point of any blue circle outside while containing all red points.
For both extensions, we assume that no solution is unbounded and that all points are in general position.
The outliers version can have applications in various domains. For instance, in VLSI or circuit design, one might seek to place a hardware component (e.g., cooler) on a board with minor fabrication defects (blue points), where up to k defects are tolerated for component placement. The red points may indicate “hot spots” that must be covered by the component on the board.
The circles version is motivated by problems involving “imprecise” data, such as probabilistic applications, machine learning applications, and tumor extraction with large or imprecise cells as red or blue points. For instance, the goal is to surgically remove a tumor using a rectangular tool, with tumor cells marked by red points while blue points denote healthy cells and osteoclasts that should not be removed or cut out.
Various other applications of bichromatic separation with imprecise points can be found in spatial databases and data science.
1.1. Related Work
Geometric separability of point sets, which deals with finding a geometric locus that separates two or more point sets whilst achieving a specific optimum criterion, is an important topic in computational geometry. Various approaches deal with finding a specific type of separator (e.g., hyperplane) when the points are guaranteed to be separable. However, this is not always the case, hence the need for results on weak separability, i.e., either minimizing the misclassifications or allowing up to a fixed number of them.
The problem of finding the smallest separating circle among red and blue points (i.e., containing all red points and the fewest blue points) was introduced by Bitner and Daescu et al. [4]. They provide two algorithms that find all optimal solutions: the first one runs in time and the second one runs in -time. The dynamic version of the problem, in which blue points may be dynamically inserted and deleted at run time, was later addressed by Armaselu and Daescu [5], who provided three data structures for this version. The first one is a unified data structure supporting both insertion and deletion queries in time, as well as time updates. The other two are deletion-specific (resp., insertion-specific) and allow (resp., ) query time, at the expense of update time.
Armaselu and Daescu were the first to address the MBSR problem. Their algorithm runs in time [1,2]. When the axis-alignment restriction on the MBSR is dropped, they have an time algorithm. They also come up with an -time algorithm to compute the maximum-volume separating box in three dimensions [2]. Later, this was improved to time [6].
Separability of imprecise points has also been considered. In such a setting, points are associated with a region of imprecision. For instance, blue unit circles (i.e., MBSR-C) can be thought of as imprecision regions, where the imprecise points are their centers. When the imprecisions are axis-aligned rectangles, de Berg et al. [7] come up with a linear-time algorithm to compute certain separators, i.e., separators that are 100% likely to separate the point sets. They also show how to compute possible separators (>0% likely to separate) in time.
It is worth mentioning that all these results on separators deal with blue points. However, there are also results for blue obstacles. In a more recent paper [3], Armaselu, Daescu, Fan, and Raichel give an algorithm to find a largest rectangle separating red points from blue axis-aligned rectangles in time.
Computing the largest empty (axis-aligned) rectangle is a popular and well-studied problem. Given a set of planar points P, the goal is to compute the largest axis-aligned rectangle that has a point on each of its sides but none inside it. For the axis-aligned version, Agarwal et al. [8] provided the best currently known time bound, , for computing one optimal solution, while Hsu et al. [9] got the best-known result for computing all optimal solutions, namely, in time, where M is the number of maximal empty rectangles. Mukhopadhyay et al. [10] solved the arbitrary orientation version and gave an time algorithm that outputs all optimal solutions. In addition, Chaudhuri et al. [11] proved a lower bound of optimal solutions in the worst case.
Nandy et al. considered the problem of finding the maximal empty axis-aligned rectangle among a given set of rectangles isothetic to a given bounding rectangle [12]. They show how to solve the problem in time, where R is the number of rectangles. Later, they solved the version where obstacles have arbitrary orientation using an algorithm that takes time [13]. Finally, they consider the problems of locating the largest empty rectangle inside a simple polygon, as well as among a set of simple polygons to be avoided [13].
2. Finding the Largest Separating Rectangle with k Outliers
The goal is to compute the largest axis-aligned rectangle enclosing R while containing no more than k blue points of B (“outliers”), for a given . We call this rectangle the maximum bichromatic separating rectangle with outliers (MBSR-O).
We first discard the blue points inside , as they cannot be avoided.
In [3], the given algorithm for MBSR-O operates as follows. We first compute the smallest rectangle enclosing R in linear time, and then partition the space outside into 8 regions, using the lines bounding . Specifically, there are 4 “corner” regions (also known as “quadrants”), and 4 “side” regions . In each such region Q, we consider the set of blue points inside Q.
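The partition into 8 regions can be sketched as follows. This is only an illustration: the compass labels (`"N"`, `"NE"`, and so on), the helper names `region_of` and `partition_blue`, and the rectangle layout `(x_min, y_min, x_max, y_max)` are assumptions introduced here, not notation taken from [3].

```python
from typing import Dict, Iterable, List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # hypothetical layout: (x_min, y_min, x_max, y_max)


def region_of(p: Point, rect: Rect) -> Optional[str]:
    """Label of the region containing p, or None if p lies in the rectangle."""
    x, y = p
    x0, y0, x1, y1 = rect
    if x0 <= x <= x1 and y0 <= y <= y1:
        return None  # inside the enclosing rectangle: cannot be avoided
    ns = "N" if y > y1 else "S" if y < y0 else ""
    ew = "E" if x > x1 else "W" if x < x0 else ""
    return ns + ew  # a side region ("N") or a corner region ("NE")


def partition_blue(blue: Iterable[Point], rect: Rect) -> Dict[str, List[Point]]:
    """Group the blue points outside the rectangle by region."""
    regions: Dict[str, List[Point]] = {
        name: [] for name in ("N", "S", "E", "W", "NE", "NW", "SE", "SW")
    }
    for p in blue:
        name = region_of(p, rect)
        if name is not None:
            regions[name].append(p)
    return regions
```

Each blue point is classified with a constant number of comparisons, so the whole partition takes linear time in the number of blue points.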
Definition 1. [3] A point dominates another point , if and . Similarly, a point dominates another point , if and , a point dominates another point , if and , and a point dominates another point , if and .
Definition 2. [3] For each and for any t such that , the t-th level staircase of is the rectilinear polygon formed by the blue points in that dominate exactly t blue points in .
Note that an optimal solution contains t points from if and only if it is bounded by the t-th level staircase of , shown in Figure 1.
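To make the staircase definition concrete, here is a brute-force sketch for a single quadrant. It assumes the north-east quadrant and assumes that dominance there means strictly larger X and Y coordinates (so a dominating point is farther from the enclosing rectangle). The names `dominates_ne` and `staircase_level` are hypothetical, and the quadratic running time is chosen for clarity, not efficiency.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates_ne(p: Point, q: Point) -> bool:
    """Assumed NE-quadrant dominance: p is strictly above and to the right of q."""
    return p[0] > q[0] and p[1] > q[1]


def staircase_level(points: Sequence[Point], t: int) -> List[Point]:
    """Points that dominate exactly t other points, sorted by X coordinate."""
    level = []
    for p in points:
        dominated = sum(1 for q in points if dominates_ne(p, q))
        if dominated == t:
            level.append(p)
    return sorted(level)
```

For example, among the points (1, 5), (2, 3), (4, 1), (3, 4), and (6, 6), the first three dominate nothing and form level 0, while (3, 4) dominates only (2, 3) and so lies on level 1.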
For each partition of k into 8 smaller natural numbers , that is, one natural number for each region, we do the following.
Consider the -th closest to blue point from each side region Q. Note that, by extending in each direction until reaching these points, one obtains a rectangle which definitely contains the target rectangle.
For each quadrant Q, compute the -level staircase of Q in time.
Solve a “staircase” problem on , and in time. That is, compute the largest rectangle enclosing , supported by points of the staircases, and contained in .
Since there are ways of partitioning the integer k into 8 smaller natural numbers, it follows that the running time of this approach is .
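The enumeration of partitions can be sketched as below, assuming that a part may be zero and that the order of the parts matters (one part per region). Under that assumption, a stars-and-bars argument gives exactly C(k+7, 7) partitions of k into 8 parts, which grows as k^7. The generator name `compositions` is hypothetical.

```python
from math import comb
from typing import Iterator, Tuple


def compositions(k: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """Yield every ordered way to write k as a sum of `parts` non-negative integers."""
    if parts == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, parts - 1):
            yield (first,) + rest


def count_compositions(k: int, parts: int) -> int:
    """Closed-form count via stars and bars: C(k + parts - 1, parts - 1)."""
    return comb(k + parts - 1, parts - 1)
```

Running the per-partition steps once for each tuple produced by `compositions(k, 8)` is what multiplies the cost of a single staircase problem by the number of partitions.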
2.2. A Closer Look at the Number of Candidate Partitions of k
In the previous section, we reduced the running time by a factor of . However, it seems hard to further improve this bound given the high number of partitions of k. Thus, in this subsection, we show how to reduce the number of candidate partitions of k, to further improve the running time bound.
To do that, we first compute all the t-level staircases as described in the previous section. We then consider the blue points in 4 pairs of adjacent regions, e.g., N and . That is, we suppose the total number of outliers coming from , denoted , is fixed. Similarly, we suppose are fixed. Let , for any quadrant Q. From now on, we focus on the N and regions and, for simplicity, we denote as simply and as simply .
We notice that even though any point of any t-th level staircase, , may be a corner of a candidate rectangle, most of these rectangles can be discarded as they are guaranteed to be smaller than the optimal rectangle.
Definition 3. For every pair of regions and every integer , denote by the set of pairs such that p is the top support and q is the right support for an optimal solution, among all rectangles containing t blue points from .
From now on, for simplicity, we are going to remove the superscript and simply write , e.g., instead of . For every t, we store as an array.
The goal is to compute . Refer to Figure 3 for an illustration. Suppose we have already computed all . Sweep with a horizontal line going upwards, starting at the -th lowest blue point in . For every blue point p encountered as top support, let be the highest point in below p, and be the lowest point in above p. For every , let be the leftmost point in to the right of p and below p. Let be the blue point count below p from . Furthermore, let t be the number of points dominated by p from .
First, assume and let be the number of points dominated by p from , i.e., . When sweeping the next blue point q, we consider the following cases.
Case 1. If q is to the right of p, then q is below but dominates p, the points dominated by p, and the points in (Figure 4). If , then we add to , where .
Case 2. If and q is to the left of p, then q is below (otherwise ), but dominates the points dominated by p, except the ones in (Figure 5). For each , if , then we add to , where is the index of s in in decreasing order of X coordinates. Finally, if , then we add to .
Case 3. If then, for each such that , we add to , where is the index of s in in decreasing order of X coordinates. Finally, if , then we add to (Figure 6).
Now assume and let be the largest such that all points in any are below p. Let be the leftmost point of below p. When sweeping the next blue point q, we consider the following cases.
Case 4. If q is to the right of then, if , we add to . For each such that , we also add to (Figure 7).
Case 5. If and q is to the left of , then, if , we add to for every (Figure 8).
Case 6. If , then, if , we add to for every (Figure 9).
The following lemma puts an upper bound on the storage required by .
Lemma 2. .
Proof. In case 1, we only add one pair to . In case 2, even though we consider points, we only add the pair such that . Similarly, in cases 3 and 4 we only add the pair , even though we consider (resp., ) points. In case 5, we add at most pairs if . However, note that for the subsequent point swept, we would have a larger number of blue points in dominated by . Thus, we only add at most pairs once. Similarly, in case 6 we only add pairs once. □
The following lemma states the running time of the aforementioned sweeping algorithm.
Lemma 3. For any , the horizontal line sweeping described above takes time.
Proof. We store the blue points in in two balanced binary search trees , indexed by X (resp., Y) coordinates. Thus, for each blue point p swept, we require time. We require an extra time to compute , and . In case 1, note that we can compute by finding the position of q in the X-sorted order of , and thus the number of blue points , in time, since is maintained as a binary search tree. Thus, we only require an extra time to handle case 1. In cases 2 and 4, note that we only need to add to if , so we query for s in X using time. Similarly, in case 3 we only add to if , so we query X for s in time. Now in cases 5 and 6 we spend time to traverse , since we store as an array for any t, but they only occur once, so this gives us total time. In every case, since is an array, adding a pair to takes time. Since we sweep blue points, the result follows. □
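The counting step used in this proof, locating a point in the X-sorted order to obtain a blue point count, can be illustrated with a binary search. The sketch below substitutes a static sorted list and Python's `bisect` module for the balanced binary search trees of the proof, which is an assumption that only holds when no insertions or deletions are needed. The function names are hypothetical.

```python
import bisect
from typing import Sequence


def count_left_of(xs_sorted: Sequence[float], x: float) -> int:
    """Number of stored X coordinates strictly smaller than x (logarithmic time)."""
    return bisect.bisect_left(xs_sorted, x)


def count_in_x_range(xs_sorted: Sequence[float], lo: float, hi: float) -> int:
    """Number of stored X coordinates in the closed interval [lo, hi]."""
    return bisect.bisect_right(xs_sorted, hi) - bisect.bisect_left(xs_sorted, lo)
```

A balanced tree augmented with subtree sizes answers the same rank queries in logarithmic time while also supporting updates.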
Corollary 1. For any pair of quadrants, we compute in time for all .
We reduce the number of candidate partitions of k from to as follows. By writing , we can deduce for every combination of . Therefore, there are such combinations.
Initially, we compute for every quadrant Q and . Then, for each combination , we set and solve the staircase problem in [2] with the pairs as pairs of supports. Each staircase problem takes time to solve, so we require time for all candidate partitions of k. Putting this together with the result in Lemma 1, we get the following result.
Theorem 2. Given two point sets, and , as well as an integer , the MBSR-O for R and B with k outliers can be computed in time.
3. Finding the Largest Axis Aligned Rectangle Enclosing R and Avoiding Unit Circles
In this extension, B consists of -disjoint unit circles, and the goal is to find the largest axis-aligned rectangle that avoids all circles while enclosing R. We call such a rectangle an MBSR among circles or MBSR-C.
Again, we discard blue circles intersecting from consideration, as they cannot be avoided.
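The discarding step needs a test for whether a unit circle intersects an axis-aligned rectangle. A minimal sketch follows, assuming circles are given by their centers, treated as disks of radius 1, and that mere tangency does not count as an intersection. The names `circle_intersects_rect` and `dist_sq_to_rect` and the rectangle layout are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # hypothetical layout: (x_min, y_min, x_max, y_max)


def dist_sq_to_rect(c: Point, rect: Rect) -> float:
    """Squared distance from point c to the closest point of the rectangle."""
    cx, cy = c
    x0, y0, x1, y1 = rect
    dx = max(x0 - cx, 0.0, cx - x1)  # zero when cx lies within [x0, x1]
    dy = max(y0 - cy, 0.0, cy - y1)  # zero when cy lies within [y0, y1]
    return dx * dx + dy * dy


def circle_intersects_rect(center: Point, rect: Rect, radius: float = 1.0) -> bool:
    """True if the open disk around `center` shares a point with the rectangle."""
    return dist_sq_to_rect(center, rect) < radius * radius
```

The same predicate can serve to check that a candidate rectangle avoids every remaining circle.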
One may wonder whether the reduction in [3] for finding the largest separating rectangle among axis-aligned rectangles can be tailored to MBSR-C. However, it can be shown that it does not always work. If we let be circles in the regions , pick any point p on the quadrant of that is the closest to , and add it to , then any rectangle enclosing R, avoiding , and top or right-bounded by p will intersect . See Figure 10 for a depiction of why this is the case.
A candidate separating rectangle (CSR) is a rectangle that encloses R and cannot be extended in any direction without intersecting some circle. Note that a CSR may touch a circle either at a corner or at an edge. If it is bounded at an edge, then that edge is fixed in terms of X or Y coordinate and the arc it touches at each endpoint of the edge is uniquely determined (Figure 11). On the other hand, if it is bounded at a corner, then the corner can be slid along the appropriate arc of the circle (Figure 12). Each position of the corner determines the X or Y coordinates of its two adjacent edges, and thus the arcs pinning the two adjacent corners, if any.
We say that an edge e of a CSR is pinned by a circle C if C touches the interior of e.
A horizontal (resp., vertical) edge e is said to be fixed by two circles in terms of Y (resp., X) coordinate, if:
- (1) the ends of e are on and , respectively, and
- (2) changing its Y (resp., X) coordinate would result in either e intersecting or or failing to touch both and .
3.1. A Description of All Cases in Which a CSR Can Be Found
We consider all the cases in which a CSR can be found, based on the number of edges pinned by circles.
Case 1. Three edges are pinned by circles (Figure 13). In this case, we extend the fourth edge outward from until it touches a circle. Hence, the CSR is uniquely determined.
Case 2. Two edges are pinned by two circles . In this case, we further distinguish the following subcases.
Case 2.1. Two adjacent edges are pinned by . Note that their common corner q is fixed. We extend one of the edges by moving its other end p away from q until it touches a circle , and then extend the third edge until it touches a circle at a point r (Figure 14). The resulting CSR is unique.
Case 2.2. Two adjacent edges are pinned by . We extend one of the edges by moving its other end p away from q, until the orthogonal line through p touches a circle at a point r (see Figure 15). While moving p, the point r can slide along one or more circles in the same quadrant, giving an infinite number of CSRs.
Case 2.3. Two opposite edges are pinned by . We slide the other two edges outward from until each of them touches some circle (see Figure 16). This gives us a unique CSR.
Case 3. One edge e is pinned by a circle . We have the following subcases.
Case 3.1. When e is extended in both directions, it touches two circles (Figure 17). We then slide the fourth edge outward from until it touches a circle and we have a unique CSR.
Case 3.2. When e is extended in both directions, the orthogonal line through one of the ends p touches a circle at point q (Figure 18). While moving p, q can slide along one or more circles in the same quadrant, yielding an infinite number of CSRs. After establishing the position of q, we slide the fourth edge away from until it touches a circle .
Case 4. No edge is pinned by any circle. In this case, all corners can slide along circles until one of the edges becomes pinned by some circle, giving an infinite number of CSRs. Suppose the position of a corner p along a circle is known. We consider the following subcases.
Case 4.1. While extending the CSR in the two directions away from p, the CSR touches a circle in or at some point q before touching any circle in (Figure 19). The other two corners are determined by sliding the edge opposite to outwards until it touches a circle at some point r. In this case, the CSR is uniquely defined.
Case 4.2. While extending the CSR in the two directions away from p, the first circle that the CSR touches, at a point q, is located in (Figure 20). This gives us an infinite number of CSRs.
3.2. Dominating Envelopes
Definition 4. The dominating envelope of a corner region is a curve satisfying the following:
- (1) fully contains ,
- (2) , the rectangle cornered at p and the closest corner of from p is empty, and
- (3) property (2) no longer holds if one extends away from .
Note that is a sequence of horizontal and vertical segments, circle arcs, as well as a horizontal and a vertical infinite ray (see Figure 21). We shall reveal the use of dominating envelopes later.
The dominating envelope changes direction at breakpoints, which can be between two consecutive arcs, segments, or an infinite ray. Every two consecutive breakpoints define a range of motion for a CSR corner.
A breakpoint p is said to be a corner breakpoint, if a CSR cornered at p cannot be extended away from in all directions without crossing some circle, even if its other corners are not located on any envelope.
To compute the dominating envelope of , we do the following. First, sort the circles by X coordinate of their centers. Let p be the current breakpoint (initially, the first breakpoint is the left endpoint l of the left-most circle, with a vertical infinite ray upwards from l). For each two adjacent circles , depending on the relative positions of , we do the following.
Case A. . In this case, we add the lower intersection between and as a new corner breakpoint q, along with the arc of .
Case B. and . We add a breakpoint q at the bottom of , the arc of , a corner breakpoint r at the intersection between the horizontal through q and , and the line segment .
Case C. and . We add a breakpoint r at the left endpoint of , a corner breakpoint q at the intersection between the vertical through r and , the line segment , and the arc of .
Case D. and . We add the breakpoints q at the bottom of , r at the left end of , the corner breakpoints at the intersection between the horizontal through q and the vertical through r, the arc of , and the segments and .
We then set p to the rightmost breakpoint added and repeat the process for the next pair of adjacent circles. Finally, for the last circle, we add an arc from p to its bottom b, the breakpoint b, and then a horizontal infinite ray emanating from b to the right. In each case, we say that a pair of circles defines breakpoints , if all of are added as breakpoints in the process described above. Similarly, we say defines arcs , if all of are added as arcs in the process. See Figure 22 for an illustration of this process.
Note that deciding the case a circle belongs to can be done in constant time.
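One geometric primitive used when building the envelope is the lower intersection point of two overlapping circles, which Case A adds as a corner breakpoint. The sketch below computes it for two unit circles given by their centers. The function name `lower_intersection` is hypothetical, and the code assumes the circles properly intersect, returning `None` for coincident, tangent, or disjoint circles.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def lower_intersection(c1: Point, c2: Point) -> Optional[Point]:
    """Lower of the two intersection points of unit circles centered at c1 and c2."""
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    d = math.hypot(dx, dy)
    if d == 0.0 or d >= 2.0:
        return None  # coincident, tangent, or disjoint circles
    mx, my = (c1[0] + c2[0]) / 2.0, (c1[1] + c2[1]) / 2.0  # midpoint of the centers
    h = math.sqrt(1.0 - (d / 2.0) ** 2)  # half the length of the common chord
    ux, uy = -dy / d, dx / d  # unit vector perpendicular to the line of centers
    a = (mx + h * ux, my + h * uy)
    b = (mx - h * ux, my - h * uy)
    return a if a[1] < b[1] else b
```

Since the computation involves only the two centers, each breakpoint of this kind costs constant time, consistent with the constant-time case decision noted above.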
3.3. Finding an Optimal Solution in Each Case
We first slide the edges of outward until each of them touches a circle. Since we assumed no unbounded solutions, we are guaranteed that every edge will eventually hit a circle. Denote by the resulting rectangle and discard the region outside from the dominating envelopes of all quadrants. The endpoints of the resulting envelopes are also counted as breakpoints. We also sort the blue circles by the X coordinate and then (to break ties) by the Y coordinate of their centers.
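The sliding step can be sketched for a single edge as follows. The example handles only the north edge and assumes that the other three edges stay in place while it slides, that circles are given by their centers, and that circles intersecting the starting rectangle have already been discarded. The name `north_stop` and the rectangle layout are hypothetical.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # hypothetical layout: (x_min, y_min, x_max, y_max)


def north_stop(rect: Rect, centers: Iterable[Point]) -> float:
    """Y coordinate at which the upward-sliding north edge first touches a unit circle.

    Returns math.inf when no circle blocks the edge (an unbounded solution).
    """
    x0, _, x1, y1 = rect
    best = math.inf
    for cx, cy in centers:
        # Horizontal distance from the center to the edge's X range [x0, x1].
        gap = max(x0 - cx, 0.0, cx - x1)
        if gap >= 1.0 or cy <= y1:
            continue  # the circle is too far sideways, or not above the edge
        # The edge meets the circle at its bottom (gap == 0) or at an edge endpoint.
        best = min(best, cy - math.sqrt(1.0 - gap * gap))
    return best
```

The other three edges can be handled symmetrically by mirroring or swapping coordinates.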
Now we give a specific algorithm to compute the MBSR-C in each of the cases listed in Section 3.1.
Case 1. We consider all corner breakpoints that are defined by pairs of adjacent circles in Case D, and add them to a set . We also consider all arcs of circles that are part of pairs in Case D (p to the west of q), the vertical line through p and the horizontal line through q, and add to . We then find the largest rectangle enclosing R and containing the fewest points in using the algorithm in [1] in time. It is easy to check that is an optimal solution for Case 1, since any circle containing points in intersects a circle in B. Thus, Case 1 can be done in time.
Case 2. Assume wlog that two circles pin the north and the west edges of a CSR. The cases where the two circles define a different pair of adjacent edges of a CSR can be handled in a similar fashion.
If we are in Case 2.1, we consider all corner breakpoints q defined by pairs of adjacent circles in Case D, as well as the intersection between the south horizontal tangent to the eastmost circle in and the east vertical tangent to the northmost circle in south of . For every such point, the north and west edges are fixed, and we either find the south edge by extending the west edge southwards until it hits a blue circle, or the east edge by extending the north edge eastwards until it hits a blue circle. In both approaches, the fourth edge is uniquely determined. For each circle in , we store pointers to the northmost circle in south of C and to the eastmost circle in west of C, as well as similar pointers for the other quadrants and directions. Thus, once the first two edges are fixed, we can find the third and fourth edges in time. Since there are circles in Case 2.1 and they all can be found in time, Case 2.1 can be solved in time.
For Case 2.2, we consider all points q as in Case 2.1. Having selected such point q defined by two circles and , we consider the dominating envelope of starting from the east tangent to or the east edge of , whichever is eastmost, and ending at the south tangent to or the south edge of , whichever is southmost. This gives us a range of motion for the corner r of the CSR spanning circle arcs. For each such arc, we find the optimal CSR in time as we shall prove in the next section. Since there are choices of q, we handle Case 2.2 in time.
As for Case 2.3, note that the pairs of circles defining the and the corners, respectively, must belong to a dominating envelope. We scan the dominating envelope of for pairs of circles in Cases B and C and, for each such pair, we scan the dominating envelope of for pairs of circles in Cases B and C, starting from the east tangent to or the east edge of , whichever is eastmost, and ending at the south tangent to or the south edge of , whichever is southmost. Once these pairs are established, the CSR is determined. Since scanning each dominating envelope takes time, we handle Case 2.3 in time.
Case 3. Assume wlog that a circle C pins the east edge of the CSR. The cases where the circle defines a different edge of the CSR can be handled in a similar fashion.
For Case 3.1, we scan the dominating envelope of (similarly, ) for pairs of circles in cases C and D (C is the rightmost circle of the pair). For every such pair of circles in , we consider all circles that are intersected by the west vertical tangent to C. It is possible that some of these circles were already considered for a previous pair of circles in , so we may have to consider triplets of circles . We also traverse the circles in in increasing X order of their centers. Denote by C the current circle. We consider the sequences of circles and that are intersected by the west tangent to C. Since these sequences may include circles already considered for a previous circle in , we may need to spend to find all such triplets for which there exists a vertical line intersecting both . Once a triplet is established, the east, north, and south edges are established, and the west edge can be determined in time by extending the north or the south edge until it hits a circle. Thus, Case 3.1 requires time.
For Case 3.2, we scan the dominating envelope of (similarly, ) for pairs of circles in cases C and D (C is the rightmost circle of the pair). We also traverse the circles in increasing X order and consider the sequences of circles that are intersected by the west tangent to C. For each pair , the east and south edges of the CSR are defined. This provides a range of eligible circles from the dominating envelope of such that the and corners of the CSR are not supported by any circle, and the corner slides along some circle arc. There are pairs and each of them gives circles from . Hence, Case 3.2 takes time.
Case 4. Consider all arcs defined by pairs of adjacent circles in one of the cases A, B, C, or D. Each such arc a establishes the range of motion for the appropriate corner p of a CSR, say in X order. Suppose a belongs to a circle in , which establishes the range of motion of the NE corner p of the CSR. This gives us a range of sliding motion for the north and the east edges of the CSR, which are supported by two rays shooting from p to the west and south, respectively. Since only the SW corner q may also slide along a circle, the west edge can be neither to the west of the first intersection between and a circle, nor to the west of the easternmost point in of a circle. Similarly, the south edge can be neither to the south of the first intersection between and a circle, nor to the south of the northernmost point in of a circle. This gives a range of motion for q, which may span multiple circle arcs with X coordinates within the range = , , , . In fact, there are arcs in the worst case, yielding pairs of arcs for all possible pairs . By computing the pointers , and for every p, from the dominating envelope, we can find each pair of arcs in time, as we take them in X order. That is, we consider all arcs defined by pairs of adjacent circles in X order and, for each such arc, we compute the points , , , and , and then consider the arcs in with X coordinates within .
Handling each pair of arcs in time will be detailed in the next subsection. Thus, computing the optimal solution takes time in Case 4.