Recognition of unipolar and generalised split graphs

A graph is unipolar if it can be partitioned into a clique and a disjoint union of cliques, and a graph is a generalised split graph if it or its complement is unipolar. A unipolar partition of a graph can be used to find efficiently the clique number, the stability number, the chromatic number, and to solve other problems that are hard for general graphs. We present the first $O(n^2)$ time algorithm for recognition of $n$-vertex unipolar and generalised split graphs, improving on previous $O(n^3)$ time algorithms.


Definition and motivation
A graph is unipolar if for some $k \ge 0$ its vertices admit a partition into $k + 1$ cliques $\{C_i\}_{i=0}^{k}$ so that there are no edges between $C_i$ and $C_j$ for $1 \le i < j \le k$. A graph $G$ is a generalised split graph if either $G$ or its complement $\overline{G}$ is unipolar. All generalised split graphs are perfect, and Prömel and Steger [PS92] show that almost all perfect graphs are generalised split graphs. Perfect graphs can be recognised in polynomial time [CLV03, CCL+05], and there are many NP-hard problems which are solvable in polynomial time for perfect graphs, including the stable set problem, the clique problem, the colouring problem, the clique covering problem and their weighted versions [GLS84]. If the input graph is restricted to be a generalised split graph, then there are much more efficient algorithms for the problems above [EW14]. In this paper we address the problem of efficiently recognising generalised split graphs, and finding a witnessing partition.
Previous recognition algorithms for unipolar graphs include [TC85], which achieves $O(n^3)$ running time, [CH12] with $O(n^2 m)$ time and [EW14] with $O(nm + nm')$ time, where $n$ and $m$ are respectively the number of vertices and edges of the input graph, and $m'$ is the number of edges added after a triangulation of the input graph. Note that almost all unipolar graphs and almost all generalised split graphs have $(1 + o(1))n^2/4$ edges [MY16]. Further, by testing whether $G$ or $\overline{G}$ is unipolar, each of the algorithms mentioned above recognises generalised split graphs in $O(n^3)$ time. The algorithm in this paper has running time $O(n^2)$.
This leads to polynomial-time algorithms for the problems mentioned above (stable set, clique, colouring and so on) which have $O(n^{2.5})$ expected running time for a random perfect graph $R_n$ and an exponentially small probability of exceeding this time bound. Here we assume that $R_n$ is sampled uniformly from the perfect graphs on vertex set $[n] = \{1, 2, \ldots, n\}$.

Notation
We use $V(G)$, $E(G)$, $v(G)$ and $e(G)$ to denote $V$, $E$, $|V|$ and $|E|$ for a graph $G = (V, E)$. We let $N(v)$ denote the neighbourhood of a vertex $v$, and let $N^+(v)$ denote $N(v) \cup \{v\}$, also called the closed neighbourhood of $v$. If $G = (V, E)$ and $S \subseteq V$, then $G[S]$ denotes the subgraph induced by $S$. Let $\mathcal{GS}^+$ be the set of all unipolar graphs and let $\mathcal{GS}$ be the set of all generalised split graphs. Fix $G = (V, E)$. If $V_0$ and $V_1$ partition $V$, where $V_0$ is a clique and $V_1$ is a disjoint union of cliques, then the ordered pair $(V_0, V_1)$ will be called a unipolar representation of $G$ or just a representation of $G$. For each unipolar representation $R = (V_0, V_1)$ we call $V_0$ the central clique of $R$, and we call the maximal cliques of $V_1$ the side cliques of $R$. A graph is unipolar iff it has a unipolar representation.
A partition $B$ of $V$ is a block decomposition of $G$ with respect to $R$ if the intersection of each part of $B$ with $V_1$ is either a side clique or $\emptyset$.
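The two definitions above can be checked directly on small graphs. The following is a minimal sketch, assuming the graph is given as a dictionary `adj` mapping each vertex to the set of its neighbours; the function names are illustrative and do not appear in the paper, and no attempt is made to match the running times discussed later.

```python
from itertools import combinations

def is_clique(adj, S):
    """True if every two distinct vertices of S are adjacent."""
    return all(v in adj[u] for u, v in combinations(S, 2))

def side_cliques(adj, V1):
    """Connected components of G[V1]; these are the side cliques
    whenever G[V1] is a disjoint union of cliques."""
    V1 = set(V1)
    seen, comps = set(), []
    for s in V1:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in adj[u] & V1:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps

def is_representation(adj, V0):
    """Check that (V0, V \\ V0) is a unipolar representation."""
    V0 = set(V0)
    V1 = set(adj) - V0
    return is_clique(adj, V0) and all(
        is_clique(adj, c) for c in side_cliques(adj, V1))

def is_block_decomposition(adj, V0, blocks):
    """Check that each block meets V1 in a side clique or not at all.
    Assumes (V0, V \\ V0) is a representation and blocks partition V."""
    V1 = set(adj) - set(V0)
    sides = side_cliques(adj, V1)
    return all(not (set(b) & V1) or (set(b) & V1) in sides for b in blocks)
```

For the path on vertices 0, 1, 2, 3, taking $V_0 = \{1, 2\}$ gives a representation with side cliques $\{0\}$ and $\{3\}$, and the partition $\{\{0, 1\}, \{2, 3\}\}$ is a block decomposition with respect to it.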

Plan of the paper
Assume that $G$ is an input graph throughout. The algorithm for recognising unipolar graphs has three stages. In the first stage a sufficiently large maximal independent set is found. The second stage constructs a partition $B$ of $V(G)$, such that $B$ is a block decomposition for some unipolar representation if $G \in \mathcal{GS}^+$. The third stage generates a 2-CNF formula which is satisfiable iff $B$ is a block decomposition for some unipolar representation. The formula is constructed in such a way that a satisfying assignment of the variables corresponds to a representation of $G$, and the algorithm either returns a representation of $G$ or reports that $G \notin \mathcal{GS}^+$. We describe the third stage first (in §2) as it is short and includes a natural transformation to 2-SAT. In §3 we discuss the first stage of finding large independent sets. In §4 we present the second stage, in which we seek to build a block decomposition. Finally, in §5, we briefly discuss random perfect graphs and algorithms for them using the algorithm described above.

Data Structures
The most commonly used data type for this algorithm is the set. We assume that the operation A ∩ B takes O(min(|A|, |B|)) time, A ∪ B takes O(|A| + |B|) time, A \ B takes O(|A|) time and a ∈ A takes O(1) time. These properties can be achieved by using hashtables to implement sets.
Functions will always be of the form $f : [m] \to A$ for some $m$, where $[m] = \{1, 2, \ldots, m\}$. Therefore functions can be implemented with simple arrays, hence the lookup and assignment operations are assumed to require $O(1)$ time.

Verification of Block Decomposition

2-SAT
Let $x_1, \ldots, x_n$ be $n$ Boolean variables. A 2-clause is an expression of the form $y_1 \lor y_2$, where each $y_j$ is a variable, $x_i$, or the negation of a variable, $\neg x_i$. There are $4n^2$ possible 2-clauses. The problem of deciding whether or not a formula of the form $\psi = \exists x_1 \exists x_2 \ldots \exists x_n (c_1 \land c_2 \land \ldots \land c_m)$, where each $c_j$ is a 2-clause, is satisfiable is called 2-SAT. The problem 2-SAT is solvable in $O(n + m)$ time [EIS76, APT79], where $n$ is the number of variables and $m$ is the number of clauses in the input formula.
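To make the linear-time claim concrete, here is a minimal sketch of a 2-SAT solver based on strongly connected components of the implication graph, using Kosaraju's two-pass method. The encoding of literals as signed integers and the name `solve_2sat` are assumptions made for this illustration; they are not taken from the cited papers.

```python
def solve_2sat(n, clauses):
    """Solve 2-SAT over variables x_1..x_n.

    clauses: list of pairs of non-zero ints; +i stands for x_i and
    -i for (not x_i). Returns a list a with a[i] the value of x_i
    (a[0] is unused), or None if the formula is unsatisfiable.
    """
    def node(lit):
        # x_i -> 2(i-1), not x_i -> 2(i-1)+1
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    N = 2 * n
    g = [[] for _ in range(N)]    # implication graph
    rg = [[] for _ in range(N)]   # reversed implication graph
    for a, b in clauses:
        # (a or b) is equivalent to (not a -> b) and (not b -> a)
        for s, t in ((node(-a), node(b)), (node(-b), node(a))):
            g[s].append(t)
            rg[t].append(s)

    # Pass 1: record vertices in order of DFS completion (iterative DFS).
    order, seen = [], [False] * N
    for s in range(N):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, 0)]
        while stack:
            u, i = stack.pop()
            if i < len(g[u]):
                stack.append((u, i + 1))
                w = g[u][i]
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, 0))
            else:
                order.append(u)

    # Pass 2: sweep the reversed graph in decreasing completion order;
    # components are then numbered in topological order.
    comp, c = [-1] * N, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            u = stack.pop()
            for w in rg[u]:
                if comp[w] == -1:
                    comp[w] = c
                    stack.append(w)
        c += 1

    assignment = [False] * (n + 1)
    for i in range(1, n + 1):
        p, q = comp[2 * (i - 1)], comp[2 * (i - 1) + 1]
        if p == q:
            return None           # x_i and (not x_i) are equivalent
        assignment[i] = p > q     # pick the literal later in topological order
    return assignment
```

Both passes visit each vertex and each implication a constant number of times, which matches the $O(n + m)$ bound quoted above.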

Transformation to 2-SAT
In this subsection we show how to test if a partition of $V$ is a block decomposition for some unipolar representation, in which case we must have $G \in \mathcal{GS}^+$. Let $B$ be the partition of $V$ we want to test. From each block of $B$, we seek to pick out some vertices to form the central clique $V_0$ of a representation, with the remaining vertices in the blocks forming the side cliques. Suppose that $|B| = m$, and $B$ is represented by a surjective function $f : V \to [m]$, so that $f(v)$ is the index of the block containing $v$. Let $\{x_v : v \in V\}$ be Boolean variables. We use the procedure verify to construct a formula $\psi(x_1, \ldots, x_n)$, so that each satisfying assignment of $\{x_v\}$ corresponds to a representation of $G$.
Procedure verify(G, f):

There is an exception: the first time a clause is added to $\psi$ it should be added without the preceding sign for conjunction. The following lemma is easy to check.
Lemma 2.1. The formula $\psi$ is satisfiable iff $B$ is a block decomposition for some representation. Indeed, an assignment $\Phi$ satisfies $\psi$ iff $(V_0, V_1)$ is a representation of $G$ for which $B$ is a block decomposition, where $V_0 = \{v : \Phi(x_v) = \text{True}\}$ and $V_1 = V \setminus V_0$.

Proof. Suppose $\Phi$ is a satisfying assignment and let $V_0$, $V_1$ be as above. If $u$ and $v$ are both in $V_0$, then $uv \in E$, since otherwise $\psi$ contains the clause $\neg x_u \lor \neg x_v$. If $u$ and $v$ are in $V_1$, then either $uv \in E$ and $f(u) = f(v)$, or $uv \notin E$ and $f(u) \ne f(v)$, because in the other two cases $\psi$ contains the clause $x_u \lor x_v$. This means that the vertices in $V_1$ are grouped into cliques by their value of $f$. For the other direction, it is sufficient to verify that each generated clause is satisfied, which is a routine check.
At most a constant number of operations are performed per pair $\{u, v\}$, so $O(n^2)$ time is spent preparing $\psi$. The formula $\psi$ can have at most 2 clauses per pair $\{u, v\}$, so the length of $\psi$ is also $O(n^2)$, and since 2-SAT can be solved in linear time, the total time for this step is $O(n^2)$.
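The clause generation can be sketched as follows. This is a reconstruction from the proof of Lemma 2.1, not the paper's own listing: a pair of non-adjacent vertices yields $\neg x_u \lor \neg x_v$, and a pair that is adjacent across blocks or non-adjacent within a block yields $x_u \lor x_v$. The representation of literals as `(vertex, polarity)` pairs, and the helper `brute_force_sat` (exponential, intended only for checking tiny examples), are assumptions of this sketch.

```python
from itertools import combinations, product

def verify_clauses(adj, f):
    """Build the 2-clauses for a candidate block partition.

    adj: dict vertex -> set of neighbours; f: dict vertex -> block index.
    A literal (v, True) stands for x_v ("v is in the central clique"),
    and (v, False) for its negation.
    """
    clauses = []
    for u, v in combinations(sorted(adj), 2):
        adjacent = v in adj[u]
        same_block = f[u] == f[v]
        if not adjacent:
            # two non-adjacent vertices cannot both be central
            clauses.append(((u, False), (v, False)))
        if adjacent != same_block:
            # adjacent in different blocks, or non-adjacent in one block:
            # the two vertices cannot both lie in side cliques
            clauses.append(((u, True), (v, True)))
    return clauses

def brute_force_sat(vertices, clauses):
    """Return a satisfying assignment (dict) or None; tiny inputs only."""
    vertices = sorted(vertices)
    for bits in product([False, True], repeat=len(vertices)):
        a = dict(zip(vertices, bits))
        if all(a[u] == su or a[v] == sv for (u, su), (v, sv) in clauses):
            return a
    return None
```

At most two clauses are produced per pair, in line with the $O(n^2)$ bound on the length of $\psi$. In a full implementation the clauses would be passed to a linear-time 2-SAT solver rather than to the brute-force check.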

Maximum Independent Set of a Unipolar Graph
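Let $\alpha(G)$ be the maximum size of an independent set in a graph $G$. Let $G \in \mathcal{GS}^+$. Observe that for any representation $R$ of $G$, the number $s(R, G)$ of side cliques satisfies $s(R, G) \le \alpha(G) \le s(R, G) + 1$: choosing one vertex from each side clique gives an independent set, and an independent set contains at most one vertex from each of the at most $s(R, G) + 1$ cliques of $R$. We deduce that for every two representations $R_1$ and $R_2$ of $G$ we have $|s(R_1, G) - s(R_2, G)| \le 1$.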
If, for example, $G$ is $K_n$ or its complement, then the number $s(R, G)$ depends on $R$. However, this is not necessarily the case for all graphs, see Figure 1. It can be shown that the number of $n$-vertex unipolar graphs with a unique representation is $(1 - e^{-\Theta(n)})|\mathcal{GS}^+_n|$, and that the number of $n$-vertex unipolar graphs $G$ with a unique representation $R$ and such that s(G,

Independent Set Algorithm
It is well known that calculating $\alpha(G)$ for a general graph is NP-hard. For $G \in \mathcal{GS}^+$ let $s(G) = \max_R s(R, G)$, where the maximum is over all representations $R$ of $G$. For $G \notin \mathcal{GS}^+$ set $s(G) = 0$. In this section we see how to find a maximal independent set $I$, such that if $G \in \mathcal{GS}^+$, then $|I| \ge s(G)$ ($\ge \alpha(G) - 1$).

Figure 1: For the graph $G_1$ on the left, $s(R, G_1) = \alpha(G_1)$ for all representations $R$, and for the graph $G_2$ on the right, $s(R, G_2) = \alpha(G_2) - 1$ for all representations $R$.
The idea is to start with $G$ and with $I = \emptyset$; and as long as the remaining graph has two non-adjacent vertices, say $v_1$ and $v_2$, pick $r = 1$ or $2$ of these vertices to add to $I$, and delete from $G$ the closed neighbourhood of the added vertices. We do this in such a way that a given representation $R$ of $G$ yields a representation with $r$ fewer side cliques, or (only when $r = 2$) with one fewer side clique and the central clique removed.
Procedure indep(G):

Observe that the main body of the indep(G) procedure is a while loop. An alternative way of seeing the algorithm is that instead of the loop there is a recursive call to indep($G[U]$) at the end of the iteration and the procedure returns the union of the vertices found during this iteration and the recursively retrieved set. A recursive interpretation is clearer to work with for inductive proofs.
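The loop can be sketched in Python as below. This is a reconstruction from the description above and from the case analysis in the correctness proof, not the paper's listing, so the exact selection rule is an assumption: for a non-adjacent pair $u$, $v$, if exactly one of $N^+(u) \setminus N^+(v)$ and $N^+(v) \setminus N^+(u)$ is a clique, only the corresponding vertex is added, and otherwise both are added. Completeness is tested naively here, so this version does not achieve the $O(n^2)$ bound.

```python
from itertools import combinations

def is_clique(adj, S):
    """True if every two distinct vertices of S are adjacent."""
    return all(v in adj[u] for u, v in combinations(S, 2))

def indep(adj):
    """Return a maximal independent set of the graph adj
    (dict vertex -> set of neighbours)."""
    U = set(adj)   # vertices not yet deleted
    I = set()
    while U:
        # look for two non-adjacent vertices of G[U] (naive search)
        pair = next(((u, v) for u, v in combinations(sorted(U), 2)
                     if v not in adj[u]), None)
        if pair is None:
            I.add(min(U))      # G[U] is complete: any one vertex will do
            break
        u, v = pair
        Nu = (adj[u] | {u}) & U    # closed neighbourhoods inside G[U]
        Nv = (adj[v] | {v}) & U
        e1 = is_clique(adj, Nu - Nv)
        e2 = is_clique(adj, Nv - Nu)
        if e1 and not e2:
            I.add(u)               # r = 1: keep u, delete N+(u)
            U -= Nu
        elif e2 and not e1:
            I.add(v)               # r = 1: keep v, delete N+(v)
            U -= Nv
        else:
            I |= {u, v}            # r = 2: keep both
            U -= Nu | Nv
    return I
```

On the path 0, 1, 2, 3 the first non-adjacent pair in sorted order is (0, 2), both difference sets are cliques, and the sketch returns $\{0, 2\}$.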

Correctness
Lemma 3.1. Procedure indep(G) always returns a maximal independent set I.
Proof. This is easy to see, since each vertex deleted from $U$ is adjacent to a vertex put in $I$.

Lemma 3.2. The set $I$ returned by indep(G) satisfies $|I| \ge s(G)$.

Proof. If $G \notin \mathcal{GS}^+$, then the statement holds, because $s(G) = 0$. From now on assume that $G \in \mathcal{GS}^+$. We argue by induction on $v(G)$. It is trivial to see that the lemma holds for $v(G) = 1$. Let $v(G) > 1$ and assume that the lemma holds for smaller graphs. If $G$ is complete, then $|I| = 1 = s(G)$.
Fix an arbitrary unipolar representation R of G. We show that |I| ≥ s(R, G). If G is not complete, then the procedure selects two non-adjacent vertices u and v. The vertices u and v are either in different side cliques or one of them is in the central clique and the other is in a side clique.
We start with the case when $u$ and $v$ are contained in side cliques. After inspecting their neighbourhoods, the algorithm removes from $U$ either one or both of them along with their neighbourhood. Suppose that it removes $r$ of them, where $r$ is 1 or 2. Let $G'$ and $R'$ be the graph and the representation induced by the remaining vertices. By the induction hypothesis, if $I'$ is the recursively retrieved set, then $|I'| \ge s(R', G')$. Let $I$ be the independent set returned at the end of the algorithm, so that $|I'| + r = |I|$. Both $u$ and $v$ see all the vertices in their corresponding side clique, and see no vertices from different side cliques, so after removing $r$ of them with their neighbours, the number of side cliques in the representation decreases by precisely $r$, and hence $|I| = |I'| + r \ge s(R', G') + r = s(R, G)$.

Now w.l.o.g. assume that $u$ belongs to a side clique and $v$ belongs to the central clique. Then $N^+(u)$ contains the side clique of $u$ and perhaps parts of the central clique. Therefore, $N^+(u) \setminus N^+(v)$ is a subset of the side clique of $u$, and hence it is a clique. If $N^+(v) \setminus N^+(u)$ is not a clique, then the algorithm continues recursively with $G[V \setminus N^+(u)]$; and using the same arguments as above with $r = 1$, we guarantee correct behaviour. Now assume that $N^+(v) \setminus N^+(u)$ is a clique. Then $N^+(v) \setminus N^+(u)$ can intersect at most one side clique, because the vertices in different side cliques are not adjacent. In this case $N^+(v) \cup N^+(u)$ completely covers the side clique of $u$, completely covers the central clique, and it may intersect one additional side clique. Hence $s(R, G) = s(R', G') + 1$ or $s(R, G) = s(R', G') + 2$, where $G'$ and $R'$ are the induced graph and representation after the removal of $N^+(v) \cup N^+(u)$. If $I'$ is the recursively obtained independent set, from the induction hypothesis we deduce that $|I| = |I'| + 2 \ge s(R', G') + 2 \ge s(R, G)$.

Time Complexity
In this form the algorithm takes more than $O(n^2)$ time, because checking whether an induced subgraph is complete is slow. However, we can maintain a set of vertices, $C$, which we have seen to induce a complete graph. We will create an efficient procedure to check if a subgraph is complete, and to return some additional information to be used for future calls if the subgraph is not complete.
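One way such a procedure might look is sketched below. It is a hypothetical reconstruction based only on the idea just described and on the later notion of a vertex being "absorbed" into the result set; the return convention (a non-adjacent pair or `None`, together with the enlarged clique) is an assumption of this sketch.

```python
def antiedge(adj, U, C):
    """Test whether G[U] is complete, reusing a known clique.

    adj: dict vertex -> set of neighbours; U: set of remaining vertices;
    C: set of vertices such that C & U is known to induce a clique.
    Returns (e, C2): e is a pair of non-adjacent vertices of U, or None
    if G[U] is complete; C2 is a clique inside U that contains C & U.
    """
    C2 = C & U
    for v in U - C2:
        # test v only against the vertices already known to form a clique
        non_nb = next((c for c in C2 if c not in adj[v]), None)
        if non_nb is not None:
            return (v, non_nb), C2
        C2.add(v)          # v is adjacent to all of C2: v is absorbed
    return None, C2
```

Passing the returned set `C2` back in as `C` on the next call avoids re-testing pairs already known to be adjacent, which is the saving that the amortised analysis below relies on.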
Procedure antiedge(G, U, C):

The following lemma summarises the behaviour of antiedge.

Proof. Easy checking.

Assume that $e = u_1u_2$; $u_1, u_2 \in U$. Let $U$ and $C$ be the sets stored in the respective variables at the beginning of an iteration of the main loop of the modified indep, and let $U'$ and $C'$ be the sets stored at the beginning of the next iteration, if the algorithm does not terminate meanwhile. The following loop invariants hold:

A loop invariant is a condition which is true at the beginning of each iteration of a loop.
Proof. Observe that the initial values of $U$ and $C$, which are $V$ and $\emptyset$ respectively, guarantee by Lemma 3.3 that the values after the call to antiedge satisfy condition (I1). Therefore (I1) holds for the first iteration. Concerning future iterations, observe that (I1) guarantees the precondition of Lemma 3.3, which in turn guarantees (I1) for the next iteration. We deduce that (I1) does indeed give a loop invariant. By proving this we have proved that the preconditions of Lemma 3.3 are always met; and so we can use Lemma 3.3 throughout.
If $e$ = False, then there is no next iteration, hence condition (I2) is automatically correct. Now assume that $e = u_1u_2$. Depending on $e_1$ and $e_2$ there are two cases for how many vertices are excluded.

Case 1: one vertex is excluded. W.l.o.g. assume that $u_1$ is excluded, so $U' = U \setminus N^+(u_1)$, $C \cap U_2 \subseteq C'$ and $C \subseteq N^+(u_1) \cup N^+(u_2)$. Then

Case 2: two vertices are excluded. Now $U' = U \setminus (N^+(u_1) \cup N^+(u_2))$, and

We have shown that condition (I2) holds at the start of the next iteration, and so it gives a loop invariant as claimed.
A vertex v is absorbed if it is processed during the loop of antiedge and then appended to the result set, C ′ .
Corollary 3.5. A vertex can be absorbed at most once.
Proof. Let $U$, $C$, $U'$ and $C'$ be as before. Observe that if, during the iteration, vertex $v$ is absorbed in a call to antiedge, then $v \in U \setminus C$ and $v \notin U' \setminus C'$. The corollary now follows from the second invariant in Lemma 3.4.

Assume that $v$ is processed in antiedge for the first time, but it is not absorbed and it is tested against a set $C_1$. Since $v$ is not absorbed, we may assume that antiedge has returned the pair of vertices $vu$. At least one of $u$ and $v$ is removed (along with its neighbourhood) from $U$ and moved to $I$. If $v$ is removed from $U$, then no more time can be spent on it in antiedge, hence the total time spent on $v$ in antiedge is $O(n)$. Now assume that $u$ is removed. We have that $C_1 \subseteq N^+(u)$ and each vertex in $N^+(u)$ is removed from $U$. Hence, if $v$ is processed again in antiedge, it will be tested against a set $C_2$ with $C_1 \cap C_2 = \emptyset$, and therefore $|C_1| + |C_2| = |C_1 \cup C_2| = O(n)$. As we saw before, if $v$ is absorbed or removed from $U$, then it cannot be processed again in antiedge; and thus the running time spent on $v$ is again $O(n)$. If $v$ is not removed from $U$, then $C_2$ is removed from $U$. Hence, if $v$ is processed again in antiedge, $v$ will be tested against a set $C_3$ with $|C_1| + |C_2| + |C_3| = |C_1 \cup C_2 \cup C_3| = O(n)$, and so on. Thus, we see that over all these tests, each vertex is tested at most once for adjacency to $v$, and so the total time spent on $v$ is $O(n)$.

Block Creation Algorithm
In this subsection we present a short algorithm for creating a partition of V (G) using an independent set I and then checking if this partition is a block decomposition using the procedure verify from Section 2.
Procedure test(G, I):

Proof. On each step of the main loop a vertex $i \in I$ is selected. Since $V_0 \cap I = \emptyset$, the vertex $i$ is a part of some side clique, say $C$. Now $C \cap N^+(j) = \emptyset$ for each $j \in I \setminus \{i\}$, so $C \subseteq U$. Also $C \subseteq N^+(i)$, and hence $C \subseteq N^+(i) \cap U$. Vertex $i$ does not see vertices from other side cliques, so $N^+(i) \cap U$ is correctly marked as a separate block.
Since $|I| \ge s(G) - 1$, at most one side clique is not represented in $I$. If there is an unrepresented side clique, say $C$, then none of the previously created blocks can claim any vertex from it, and hence $C \subseteq U$. We have shown that when the main loop ends, either $U \cap V_1 = \emptyset$ or $U \cap V_1$ is a side clique; so $U$ is correctly marked as a separate block. The set $U$ also contains all remaining vertices, so $f$ is a partition of $V$ into blocks, and hence verify(G, f) will return True.
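The block-building loop described in this proof can be sketched as follows. The function name `build_blocks`, the dictionary representation of $f$ and the 0-based block indices are assumptions of this illustration; the final call to verify is omitted.

```python
def build_blocks(adj, I):
    """Partition V into blocks using an independent set I.

    adj: dict vertex -> set of neighbours. Returns a dict f mapping each
    vertex to a block index: one block N+(i) & U per vertex i of I, and
    one last block holding whatever remains of U.
    """
    U = set(adj)      # vertices not yet assigned to a block
    f = {}
    block_index = 0
    for i in sorted(I):
        block = (adj[i] | {i}) & U     # N+(i) restricted to U
        for v in block:
            f[v] = block_index
        U -= block
        block_index += 1
    for v in U:                        # leftover vertices form the last block
        f[v] = block_index
    return f
```

For example, on a graph with central clique $\{0, 1\}$ and side cliques $\{2, 3\}$, $\{4\}$, $\{5, 6\}$, the independent set $I = \{3, 4, 5\}$ avoids the central clique, and each resulting block meets $V_1$ in exactly one side clique.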

Block Decomposition Algorithm
By Lemmas 3.1 and 3.2, indep(G) returns a maximal independent set $I$ of size at least $s(G)$. Thus, Lemma 4.1 suggests a naive algorithm for recognition of $\mathcal{GS}^+$: try test($G$, $I \setminus \{i\}$) for each $i \in I$ and return True if any attempt succeeds. The proposed algorithm is correct, since $|I \cap V_0| \le 1$. The running time is $O(|I| n^2) = O(n^3)$, while we aim for $O(n^2)$. However, with relatively little effort we can localise $I \cap V_0$ to at most 2 candidates from $I$.

Correctness
Lemma 4.2. The procedure recognise(G) returns True iff $G \in \mathcal{GS}^+$.
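Case 2: $V_0 \cap I = \{c\}$. The algorithm starts by calculating the set $C$, where $C = I$ if there is no $v \in V$ with $|N^+(v) \cap I| = 2$, and otherwise $C$ is the intersection of the sets $N^+(v) \cap I$ over all $v \in V$ with $|N^+(v) \cap I| = 2$. For $v \in V_1$, the set $N^+(v)$ can intersect at most one vertex from $I \cap V_1$ and at most one vertex from $I \cap V_0 = \{c\}$, and since $|N^+(v) \cap I| = 2$, we have $c \in N^+(v)$; for $v \in V_0$ we have $c \in N^+(v)$ because $V_0$ is a clique. Hence for each $v \in V$, if $|N^+(v) \cap I| = 2$, then $c \in N^+(v) \cap I$, so $c$ belongs to their intersection. If no $v \in V$ exists with $|N^+(v) \cap I| = 2$, then $C = I$, but $c \in I$, so again $c \in C$. We deduce that if $V_0 \cap I = \{c\}$, then $c \in C$ and $|C| > 0$.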
If $|C| = 1$ or $|C| = 2$ then test($G$, $I \setminus \{i\}$) is tested individually for each vertex $i \in C$, but $c \in C$ and test($G$, $I \setminus \{c\}$) = True by Lemma 4.1.
If $|C| > 2$, then there is no $v \in V$ with $|N^+(v) \cap I| = 2$. Either $|I| = s(R, G)$ or $|I| = s(R, G) + 1$, so either all side cliques are represented by vertices of $I$, or at most one is not represented, say $S$. We can handle both cases simultaneously by saying that $S = \emptyset$ in the former case. We have that $I$ is a maximal independent set, but no vertex of $I \setminus \{c\}$ can see a vertex of $S$, because they belong to different side cliques, so $c$ is connected to all vertices of $S$ and therefore $\{c\} \cup S$ is a clique. Let $T = N(c) \cap (V_1 \setminus S)$. Then $|N^+(v) \cap I| = 2$ for each $v \in T$, but no such vertex exists by assumption, so $T = \emptyset$. Now $N(c) \cap V_1 = S$, and $V_1$ is a union of disjoint cliques, so $V_1 \cup \{c\}$ is also a union of disjoint cliques.
Conversely, if $G \notin \mathcal{GS}^+$, then there is no representation for $G$, hence test cannot generate a block decomposition of $G$, and therefore test will return False.

Algorithms for random perfect graphs
Grötschel, Lovász, and Schrijver [GLS84] show that the stable set problem, the clique problem, the colouring problem, the clique covering problem and their weighted versions are computable in polynomial time for perfect graphs. The algorithms rely on the Lovász sandwich theorem, which states that for every graph $G$ we have $\omega(G) \le \vartheta(\overline{G}) \le \chi(G)$, where $\vartheta$ is the Lovász number. The Lovász number can be approximated via the ellipsoid method in polynomial time, and for perfect graphs we know that $\omega(G) = \chi(G)$, hence $\vartheta(\overline{G})$ is an integer and its precise value can be found. Therefore $\chi(G)$ and $\omega(G)$ can be found in polynomial time for perfect graphs, though these are NP-hard problems for general graphs. Further, $\alpha(G)$ and $\overline{\chi}(G)$ (the clique covering number) can be computed from the complement of $G$ (which is perfect). The weighted versions of these parameters can be found in a similar way using the weighted version of the Lovász number, $\vartheta_w(G)$.
These results tell us more about computational complexity than algorithm design in practice. On the other hand, the problems above are much more easily solvable for generalised split graphs. We know that the vast majority of the $n$-vertex perfect graphs are generalised split graphs [PS92]. One can first test if the input perfect graph is a generalised split graph using the algorithm in this paper and, if so, apply a more efficient solution.
Eschen and Wang [EW14] show that, given a generalised split graph $G$ with $n$ vertices together with a unipolar representation of $G$ or $\overline{G}$, we can efficiently solve each of the following four problems: find a maximum clique, find a maximum independent set, find a minimum colouring, and find a minimum clique cover.
It is sufficient to show that this is the case when $G$ is unipolar, as otherwise we can solve the complementary problem in the complement of $G$. Finding a maximum size stable set and minimum clique cover in a unipolar graph is equivalent to determining whether there exists a vertex in the central clique such that no side clique is contained in its neighbourhood, which is trivial and can be done very efficiently. Suppose there are $k$ side cliques. If there is such a vertex $v$, then a maximum size stable set (of size $k + 1$) consists of $v$ and from each side clique a vertex not adjacent to $v$, and a minimum size clique cover is formed by the central clique and the $k$ side cliques. If not, then a maximum size stable set (of size $k$) consists of a vertex from each side clique, and a minimum clique cover is formed by extending the $k$ side cliques to maximal cliques (which then cover $C_0$).
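The stable set part of this argument translates into a short routine. The sketch below assumes the representation is supplied as a central clique and a list of side cliques; the name `max_stable_set` and the input format are illustrative, and the scan is written for clarity, not for the best running time.

```python
def max_stable_set(adj, central, sides):
    """Maximum stable set of a unipolar graph with a given representation.

    adj: dict vertex -> set of neighbours; central: the central clique;
    sides: list of sets, the side cliques.
    """
    for v in sorted(central):
        # v can join a transversal iff every side clique contains
        # a vertex outside N(v)
        picks = [next((u for u in sorted(S) if u not in adj[v]), None)
                 for S in sides]
        if all(p is not None for p in picks):
            return {v, *picks}         # size k + 1
    return {min(S) for S in sides}     # size k
```

If the first branch succeeds the returned set has $k + 1$ vertices; otherwise every central vertex has some side clique inside its neighbourhood and the answer has $k$ vertices.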
Let us focus on finding a maximum clique and minimum colouring of a unipolar graph $G$ with a representation $R$. If $R$ contains $k$ side cliques, $C_1, \ldots, C_k$, then $\omega(G) = \max_{1 \le i \le k} \omega(G[C_0 \cup C_i])$ and $\chi(G) = \max_{1 \le i \le k} \chi(G[C_0 \cup C_i])$, where $C_0$ is the central clique. Therefore, in order to find a maximum clique or a minimum colouring, it is sufficient to solve the corresponding problem in each of the co-bipartite graphs induced by the central clique and a side clique. The vertices outside a clique in a co-bipartite graph form a cover in the complementary bipartite graph, and the vertices coloured with the same colour in a proper colouring of a co-bipartite graph form a matching in the complementary bipartite graph. By König's theorem it is easy to find a minimum cover using a given maximum matching, and therefore finding a maximum clique and a minimum colouring in a co-bipartite graph is equivalent to finding a maximum matching in the complementary bipartite graph. For colourings, we explicitly find a minimum colouring in each co-bipartite graph $G[C_0 \cup C_i]$, and such colourings can be fitted together using no more colours, since $C_0$ is a clique cutset.

The approach of Eschen and Wang [EW14] is very similar, and they give more details, but unfortunately there is a mistake in their analysis, and a corrected version of their analysis yields $O(n^{3.5} / \log n)$ time, instead of the claimed $O(n^{2.5} / \log n)$. In order to see the mistake consider the case when the input graph is a split graph with an equitable partition.
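The reduction to bipartite matching can be illustrated for the clique number. The sketch below assumes that $\omega(G)$ is the maximum of $\omega(G[C_0 \cup C_i])$ over the side cliques, and uses König's theorem in the form $\omega(G[C_0 \cup C_i]) = |C_0| + |C_i| - \nu_i$, where $\nu_i$ is the size of a maximum matching among the non-edges between $C_0$ and $C_i$. For brevity it uses the simple augmenting-path method (Kuhn's algorithm), which is slower than the matching algorithms needed for the time bounds discussed here; all names are illustrative.

```python
def max_matching(left, nbrs):
    """Size of a maximum matching in a bipartite graph.

    left: iterable of left vertices; nbrs: dict left vertex -> list of
    right vertices. Simple augmenting paths, not Hopcroft-Karp.
    """
    match = {}   # right vertex -> matched left vertex

    def try_augment(u, visited):
        for w in nbrs[u]:
            if w in visited:
                continue
            visited.add(w)
            if w not in match or try_augment(match[w], visited):
                match[w] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in left)

def clique_number(adj, central, sides):
    """Clique number of a unipolar graph from a representation.

    adj: dict vertex -> set of neighbours; central: the central clique;
    sides: list of sets, the side cliques.
    """
    best = len(central)
    for S in sides:
        # bipartite complement: non-edges between the side clique and C_0
        non_nbrs = {u: [w for w in central if w not in adj[u]] for u in S}
        nu = max_matching(S, non_nbrs)
        # Koenig: max independent set in the bipartite complement
        best = max(best, len(central) + len(S) - nu)
    return best
```

The same matchings also determine minimum colourings of the co-bipartite pieces, since each matched non-edge can share a colour.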
Given a random perfect graph $R_n$, we run our recognition algorithm in time $O(n^2)$. If we have a generalised split graph, with a representation, we solve each of our four optimisation problems in time $O(n^{2.5})$; if not, which happens with probability $e^{-\Omega(n)}$, we run the methods from [GLS84]. This simple idea yields a polynomial-time algorithm for each problem with low expected running time, and indeed the probability that the time bound is exceeded is exponentially small.