Article

Space-Efficient Fully Dynamic DFS in Undirected Graphs

by Kengo Nakamura *,‡ and Kunihiko Sadakane
Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-8656, Japan
* Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in WALCOM 2017, entitled “A Space-Efficient Algorithm for the Dynamic DFS Problem in Undirected Graphs”.
‡ Current address: NTT Communication Science Laboratories, Kyoto 619-0237, Japan.
Algorithms 2019, 12(3), 52; https://doi.org/10.3390/a12030052
Submission received: 30 November 2018 / Revised: 15 February 2019 / Accepted: 21 February 2019 / Published: 27 February 2019
(This article belongs to the Special Issue Efficient Data Structures)

Abstract

Depth-first search (DFS) is a well-known graph traversal algorithm and can be performed in $O(n + m)$ time for a graph with $n$ vertices and $m$ edges. We consider the dynamic DFS problem, that is, to maintain a DFS tree of an undirected graph $G$ while edges and vertices are gradually inserted into or deleted from $G$. We present an algorithm for this problem that takes worst-case $O(\sqrt{mn} \cdot \mathrm{polylog}(n))$ time per update and requires only $(3m + o(m)) \log n$ bits of space. This reduces the space usage of dynamic DFS to only 1.5 times that of the adjacency list of the graph. We also show applications of our dynamic DFS algorithm to dynamic connectivity, biconnectivity, and 2-edge-connectivity problems under vertex insertions and deletions.

1. Introduction

Depth-first search (DFS) is a fundamental algorithm for searching graphs. Performing DFS constructs a rooted tree (or forest, for disconnected graphs) that spans all vertices. This rooted tree (forest) is called a DFS tree (DFS forest) and is used as a tool in many graph algorithms, such as finding strongly connected components of digraphs and detecting articulation vertices or bridges of undirected graphs. For a graph with $n$ vertices and $m$ edges, DFS can be performed in $O(n + m)$ time, and a DFS tree (forest) can be constructed in the same time.
Graph structures that appear in the real world often change gradually over time. Therefore, we consider DFS on dynamic graphs rather than static graphs. This problem is called the dynamic DFS problem, and its goal is to design a data structure that can rebuild, for any on-line sequence of updates on $G$, a DFS tree (forest) for $G$ after each update. Here a single update on the graph is one of the following four operations: inserting a new edge, deleting an existing edge, inserting a new vertex together with its incident edges, and deleting an existing vertex together with its incident edges.
The problem of computing a DFS tree can be classified into two settings. For an undirected graph $G$, a DFS tree is generally not unique even if the root vertex is fixed. However, if the order in which adjacent vertices are visited is fixed for every vertex, the DFS tree is unique. The ordered DFS tree problem is to compute the order in which the vertices are visited in this setting. In contrast, the general DFS tree problem is, given an undirected graph $G$, to compute any one of its DFS trees. In this paper, we focus on the general DFS tree problem. Meanwhile, dynamic graph algorithms can be classified into three types. An algorithm that supports only insertions of edges is called incremental, one that supports only deletions of edges is called decremental, and one that supports both insertions and deletions is called fully dynamic. We consider the incremental and fully dynamic settings. Dynamic graph algorithms generally focus only on edge insertions and deletions; for the fully dynamic setting, however, we also consider vertex insertions and deletions.

1.1. Existing Results

All the works described in this section focus on the general DFS tree problem, not the ordered DFS tree problem. Until recently, there were few papers on the dynamic DFS problem, despite the simplicity of DFS in the static setting. For directed acyclic graphs, Franciosa et al. [1] proposed an incremental algorithm, and later Baswana and Choudhary [2] proposed a randomized decremental algorithm. For undirected graphs, Baswana and Khan [3] proposed an incremental algorithm. However, each of these algorithms supports only insertions or only deletions, and none supports vertex updates. Moreover, none of them achieves a worst-case time complexity of $o(m)$ per update, although their amortized update time is better than that of the static DFS algorithm. This means that, in the worst case, a single update takes as long as running the static algorithm.
In 2016, Baswana et al. [4] proposed a dynamic DFS algorithm for undirected graphs that overcomes these two problems. Their algorithm supports all four types of graph updates (edge/vertex insertions/deletions) and achieves worst-case $O(\sqrt{mn} \log^{2.5} n)$ time per update. They also proposed an incremental (supporting only edge insertions) dynamic DFS algorithm with worst-case $O(n \log^3 n)$ time per update. Later, Chen et al. [5] improved the fully dynamic worst-case update time by a $\mathrm{polylog}(n)$ factor. In the full version [6] of their paper, Baswana et al. also showed conditional lower bounds for fully dynamic DFS problems: $\Omega(n)$ time per update, under the strong exponential time hypothesis, for any fully dynamic DFS under vertex updates, and $\Omega(n)$ time per update, under the condition that the DFS tree is explicitly stored, for any fully dynamic DFS under edge updates. The recently proposed incremental dynamic DFS algorithm of Chen et al. [7] has $O(n)$ worst-case update time and thus meets the lower bound for the incremental setting.
Recently, Baswana et al. [8] conducted an experimental study of the incremental (not fully dynamic) DFS problem. In addition, Khan [9] proposed a parallel algorithm for fully dynamic DFS (including vertex updates), which can compute the DFS tree after each update in $O(\log^3 n)$ time using $m$ processors.
Please note that after the preliminary version [10] of this paper was published, Baswana et al. [11] proposed an improved algorithm for fully dynamic DFS in undirected graphs. This algorithm has worst-case $O(\sqrt{mn \log n})$ update time and requires $O(m \log n)$ bits of space.

1.2. Our Results

We develop algorithms for the incremental (i.e., under edge insertions) and fully dynamic (i.e., under edge/vertex insertions/deletions) DFS problems in undirected graphs, based on the algorithms proposed by Baswana et al. [4] (an overview of their algorithms is given in Section 3). The dynamic DFS algorithms of both Baswana et al. [4] and Chen et al. [5,7] require $O(m \log^2 n)$ bits of space, which is $O(\log n)$ times larger than the space usage of the adjacency list of the graph $G$ and thus does not seem to be optimal. We therefore seek to compress the space required for dynamic DFS.
In addition, we focus on relatively dense graphs, i.e., graphs with $n = o(m)$, because in sparse graphs, i.e., those with $m = O(n)$, DFS can be performed in $O(n)$ time, which meets the conditional lower bound suggested by Baswana et al. [6]. Note that they showed an example of a graph on which any dynamic DFS algorithm under edge updates takes $\Omega(n)$ time. Since this graph has only $O(n)$ edges, the (conditional) lower bound holds even for sparse graphs.
We develop two algorithms for the dynamic DFS problem, namely algorithms A and B. Algorithm A is a simple modification of the work of Baswana et al. [4], while algorithm B is designed to reduce the space usage further. Table 1 compares the required space and worst-case update time of these algorithms with those of Baswana et al. [4] and Chen et al. [5,7]. Both of our algorithms reduce the required space by a factor of $O(\log n)$ and improve the worst-case update time by a $\mathrm{polylog}(n)$ factor in the fully dynamic case (i.e., supporting all four types of updates). Even in the incremental case (i.e., supporting only edge insertions), the update time is improved over [4] and is close to that of [7]. Our main contribution is the space usage of algorithm B: asymptotically, it is only 1.5 times that of the adjacency list of $G$. Note that since $G$ is undirected, the adjacency list of $G$ has two elements for each edge of $G$ and thus requires $2m \log n$ bits of space. We also show that if amortized update time is permitted instead of worst-case update time, the required space of algorithm B can be reduced to only $(2m + o(m)) \log n$ bits.
Note that the new dynamic DFS algorithm of Baswana et al. [11] does not subsume algorithm B in terms of space usage. It does, however, subsume algorithm A, because its space usage is the same and its update time is faster. Even so, we describe the details of algorithm A in this paper because, as described below, algorithm A (as well as algorithm B) can be applied to dynamic biconnectivity and 2-edge-connectivity problems including vertex insertions and deletions.
Our work can be summarized as follows. First, we improve the way to solve a query that is frequently used in the algorithm of Baswana et al. [4] (Section 4), partially using the idea of Chen et al. [5]. With this improvement, we propose a linear space (i.e., requiring $O(m \log n)$ bits of space) algorithm, algorithm A, for the incremental and fully dynamic DFS problems (Theorem 1 in Section 5). Second, we further compress the data structures used in [4,5] using wavelet trees [12] (Section 6). As part of this contribution, we develop an efficient method for solving a kind of query on integer sequences, named the range leftmost (rightmost) value query, and give a space-efficient method for solving a variant of the orthogonal range search problem, which has been studied in the computational geometry community. These queries are of independent interest. Third, we consider a space-efficient method to implement the algorithm of Baswana et al. [4] (Section 7). By combining these results, we propose a more space-efficient algorithm with worst-case update time, algorithm B, for the incremental and fully dynamic DFS problems (Theorems 4 and 5). Results for the amortized update time algorithms are also obtained along the way (Theorems 2 and 3).

1.3. Applications

For static undirected graphs, connectivity, biconnectivity and 2-edge-connectivity queries can be answered by using a DFS tree (details of these queries are given in Section 8). The existing fully dynamic DFS algorithms [4,5] can be applied to solve these queries in fully dynamic graphs including vertex updates. Our algorithms can also be extended to answer these queries in the fully dynamic setting including vertex updates. Although our algorithms need some additional considerations (Section 8), the worst-case update time complexity and the required space can be kept the same as those of the dynamic DFS algorithms in Table 1 (Theorems 6 and 7). Moreover, like the existing fully dynamic DFS algorithms, our algorithms can answer these queries in worst-case $O(1)$ time.
For the dynamic connectivity problem under vertex updates, the dynamic subgraph connectivity problem [13,14] has been extensively studied. In this problem, given an undirected graph $G$, a binary status is associated with each vertex of $G$ and can be switched between “on” and “off”, and the query asks whether there is a path between two vertices in the subgraph of $G$ induced by the “on” vertices. Our dynamic setting, which includes vertex insertions and deletions, is a generalization of this dynamic subgraph setting. In the dynamic subgraph setting, the topology of $G$ cannot be changed, i.e., all edges and vertices of $G$ are fixed, while in our setting it can. In the generalized fully dynamic setting (i.e., our setting), we improve the deterministic worst-case update time bound of [4,5] (while keeping the query time $O(\mathrm{polylog}(n))$) by a $\mathrm{polylog}(n)$ factor. We also compress the space required by their algorithms.
For the dynamic biconnectivity and 2-edge-connectivity problems, the setting including vertex updates has not been well studied. In the fully dynamic setting including vertex updates, we improve the deterministic worst-case update time bound of [4,5] (while keeping the query time $O(\mathrm{polylog}(n))$) by a $\mathrm{polylog}(n)$ factor. We again compress the space required by their algorithms.

2. Preliminaries

Throughout this paper, $n$ denotes the number of vertices and $m$ denotes the number of edges. We assume that a graph is always simple, i.e., has no self-loops or parallel edges, since these are irrelevant to constructing a DFS tree. With this assumption we have $m = O(n^2)$ and $\log m = O(\log n)$. We also assume that $n = o(m)$. We write $\log(\cdot)$ for the base-2 logarithm. From now on, the term “fully dynamic setting” includes vertex updates as well as edge updates, while “incremental setting” includes only edge insertions.
Given an undirected graph $G$ and its DFS tree (forest) $T$, the parent vertex of a vertex $v$ in $T$ is denoted by $\mathit{par}(v)$. The subtree of $T$ rooted at $v$ is the subgraph of $T$ induced by $v$ and its descendants, and is denoted by $T(v)$. Two vertices $x$ and $y$ are said to have an ancestor-descendant relation iff $x = y$, $x$ is an ancestor of $y$, or $y$ is an ancestor of $x$. The path in $T$ connecting two vertices $x$ and $y$ is denoted by $\mathit{path}(x, y)$. A path $p$ in $T$ is called an ancestor-descendant path iff the endpoints of $p$ have an ancestor-descendant relation.
Given a connected undirected graph $G$ and a rooted spanning tree $T$ of $G$, the non-tree edges, i.e., the edges of $G$ that are not included in $T$, can be classified into two types. A non-tree edge is called a back edge if it connects two vertices that have an ancestor-descendant relation; otherwise it is called a cross edge. Then $T$ is a DFS tree of $G$ iff all non-tree edges are back edges. We call this the DFS property.
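The DFS property can be checked directly from this definition. The following is a minimal Python sketch for illustration only; the function name `is_dfs_tree` and the input representation (an edge list and a `parent` array with `parent[root] = None`) are assumptions and do not appear in the original algorithms.

```python
from collections import defaultdict


def is_dfs_tree(n, edges, parent, root):
    """Return True iff every edge joins two vertices in ancestor-descendant relation.

    Assumes `parent` describes a spanning tree T rooted at `root`.
    """
    children = defaultdict(list)
    for v in range(n):
        if parent[v] is not None:
            children[parent[v]].append(v)

    # Entry/exit times from an iterative traversal of T allow O(1) ancestor tests.
    tin, tout, clock = [0] * n, [0] * n, 0
    stack = [(root, False)]
    while stack:
        v, done = stack.pop()
        if done:
            tout[v] = clock
            continue
        tin[v] = clock
        clock += 1
        stack.append((v, True))
        for c in children[v]:
            stack.append((c, False))

    def is_ancestor(a, b):
        # True if a is an ancestor of b or a == b.
        return tin[a] <= tin[b] and tout[b] <= tout[a]

    for u, v in edges:
        if not (is_ancestor(u, v) or is_ancestor(v, u)):
            return False  # (u, v) is a cross edge
    return True
```

For a triangle on vertices 0, 1, 2, the path tree 0, 1, 2 satisfies the property, while the star rooted at 0 does not, because the edge (1, 2) becomes a cross edge.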
We can assume that the graph $G$ is always connected, in the following way. At the beginning, we add to $G$ a virtual vertex $r$ and edges $(r, v)$ for all vertices $v$ of $G$. During our algorithm, we maintain a DFS tree of this augmented graph rooted at $r$. The DFS tree (forest) of the original graph can be obtained by simply removing $r$ from the DFS tree of the augmented graph.
Bit Vectors. Let $B[1..l]$ be a 0,1-sequence of length $l$, and consider two queries on $B$: for $c = 0, 1$, $\mathrm{rank}_c(i, B)$ returns the number of occurrences of $c$ in $B[1..i]$, and $\mathrm{select}_c(i, B)$ returns the position of the $i$-th occurrence of $c$ in $B$ if it exists, or $\emptyset$ otherwise. There exists a data structure such that rank and select queries for $c = 0, 1$ can both be answered in $O(1)$ time and the required space is $l + O(l \log\log l / \log l) = l + o(l)$ bits [15,16]. Moreover, the space can be reduced to $l H_0(B) + o(l)$ bits while keeping $O(1)$ query time [17], where $H_0(B) \le 1$ is the zeroth-order empirical entropy of $B$. When 1 occurs $k$ times in $B$, $l H_0(B) = k \log\frac{l}{k} + (l - k) \log\frac{l}{l - k} \approx k \log\frac{el}{k}$.
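To make the semantics of rank and select concrete, here is a small Python sketch using 1-based positions as in the text. It is deliberately not succinct: it stores a full prefix-count array and answers select by binary search in $O(\log l)$ time, whereas the structures of [15,16,17] achieve $O(1)$ time in $l + o(l)$ bits. The class name `BitVector` is an assumption made for illustration.

```python
class BitVector:
    """Plain (non-succinct) bit vector with rank/select, 1-based positions."""

    def __init__(self, bits):
        self.bits = list(bits)
        # ones[i] = number of 1s in B[1..i]; ones[0] = 0.
        self.ones = [0]
        for b in self.bits:
            self.ones.append(self.ones[-1] + b)

    def rank(self, c, i):
        """Number of occurrences of c in B[1..i]."""
        return self.ones[i] if c == 1 else i - self.ones[i]

    def select(self, c, i):
        """Position of the i-th occurrence of c, or None if it does not exist."""
        if i < 1 or i > self.rank(c, len(self.bits)):
            return None
        lo, hi = 1, len(self.bits)
        while lo < hi:  # smallest position whose rank reaches i
            mid = (lo + hi) // 2
            if self.rank(c, mid) >= i:
                hi = mid
            else:
                lo = mid + 1
        return lo
```

For `B = [1, 0, 1, 1, 0]`, `rank(1, 3)` counts the two 1s among the first three bits, and `select(1, 3)` returns position 4.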
Wavelet Trees. Let $S[1..l]$ be an integer sequence of symbols in $[0, \sigma - 1]$. A wavelet tree [12] for $S$ is a complete binary tree with $\sigma$ leaves and $\sigma - 1$ internal nodes, each internal node of which has a bit vector [15,16]. Each node $v$ corresponds to an interval of symbols $[l_v, r_v] \subseteq [0, \sigma - 1]$; the root corresponds to $[0, \sigma - 1]$ and its left (right) child to $[0, \sigma/2]$ ($[\sigma/2 + 1, \sigma - 1]$), and these intervals are recursively divided down to the leaves, each of which corresponds to one symbol. The bit vector $B_v[1..L_v]$ corresponding to an internal node $v$ is defined as follows. Let $S_v[1..L_v]$ be the subsequence of $S$ consisting of the elements with symbols in $[l_v, r_v]$. If the symbol $S_v[i]$ belongs to the left child of $v$ then $B_v[i] = 0$; otherwise $B_v[i] = 1$. The wavelet tree requires $(l + o(l)) \log \sigma$ bits of space and can be built in $O(l \log \sigma / \sqrt{\log l})$ time [18]. Using the wavelet tree for $S$, each of the following queries can be answered in $O(\log \sigma)$ time: $\mathrm{access}(i, S)$ returns $S[i]$, $\mathrm{rank}_c(i, S)$ returns the number of occurrences of $c$ in $S[1..i]$, and $\mathrm{select}_c(i, S)$ returns the position of the $i$-th occurrence of $c$ in $S$ if it exists, or $\emptyset$ otherwise (here $c \in [0, \sigma - 1]$).
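The way access and rank descend through the tree can be sketched as follows. This is a pointer-based Python illustration under simplifying assumptions: each node keeps a plain prefix-count array instead of a succinct bit vector, and symbol intervals are split at the midpoint `(lo + hi) // 2`, which may differ from the split used in the paper. The class name `WaveletTree` is hypothetical.

```python
class WaveletTree:
    """Pointer-based wavelet tree over symbols in [lo, hi] (not succinct)."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = 0, max(seq)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi:
            return  # leaf: a single symbol
        mid = (lo + hi) // 2
        # zeros[i] = number of 0 bits (elements going left) among the first i.
        self.zeros = [0]
        for s in seq:
            self.zeros.append(self.zeros[-1] + (1 if s <= mid else 0))
        self.left = WaveletTree([s for s in seq if s <= mid], lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, hi)

    def access(self, i):
        """Return S[i] (1-based)."""
        node = self
        while node.lo != node.hi:
            z = node.zeros
            if z[i] - z[i - 1] == 1:  # bit 0: the element lives in the left child
                i, node = z[i], node.left
            else:                      # bit 1: the element lives in the right child
                i, node = i - z[i], node.right
        return node.lo

    def rank(self, c, i):
        """Number of occurrences of symbol c in S[1..i]."""
        node = self
        if c < node.lo or c > node.hi:
            return 0
        while node.lo != node.hi and i > 0:
            mid = (node.lo + node.hi) // 2
            if c <= mid:
                i, node = node.zeros[i], node.left
            else:
                i, node = i - node.zeros[i], node.right
        return i
```

Each query follows a single root-to-leaf path, mapping the position `i` into the child with one rank operation per level, which is where the $O(\log \sigma)$ bound comes from.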

3. Overview of the Algorithms of Baswana et al.

In this section, we give an overview of the DFS algorithms for the dynamic setting proposed by Baswana et al. [4], and state some lemmas used in this paper.

3.1. Fault Tolerant DFS Algorithm

We first describe the algorithm for the fault tolerant DFS problem. This problem is stated as follows: given an undirected graph $G$ and its DFS tree $T$, rebuild a DFS tree for the new graph obtained by deleting $k$ ($\le n$) edges and vertices (simultaneously) from $G$. In this part, $U$ denotes the set of vertices and edges to be deleted from $G$, and $G \setminus U$ denotes the new graph obtained by deleting the vertices and edges in $U$ from $G$.
Their algorithm uses a partitioning technique which divides a DFS tree into connected paths and subtrees. This partitioning is called disjoint tree partitioning (DTP).
Definition 1 ([4]).
A DFS tree $T$ of an undirected graph $G$ and a set $U$ of vertices and edges are given, and the forest $T \setminus U$ obtained by removing the vertices and edges in $U$ from $T$ is considered. Given a vertex subset $A$ of $T \setminus U$, a disjoint tree partitioning of $T \setminus U$ defined by $A$ is a partition of the subgraph of $T \setminus U$ induced by $A$ into a set $\mathcal{P}$ of paths with $|\mathcal{P}| \le |U|$ and a set $\mathcal{T}$ of trees. Here each $p \in \mathcal{P}$ is an ancestor-descendant path in $T$ and each $\tau \in \mathcal{T}$ is a subtree of $T$.
Using DTP, their algorithm can be summarized as follows. First, the DTP of $T \setminus U$ defined by $V \setminus \{r\}$ (where $V$ is the vertex set of $T \setminus U$ and $r$ is the virtual vertex from Section 2) is calculated. As a result, a set $\mathcal{P}$ of paths and a set $\mathcal{T}$ of subtrees are constructed. Let $T^*$ be the partially constructed DFS tree of $G \setminus U$; at first, $T^*$ contains only $r$. Their algorithm can then be viewed as performing a static DFS traversal (starting from $r$) on the graph whose vertex set is $\mathcal{P} \cup \mathcal{T}$. When a path or a subtree $x \in \mathcal{P} \cup \mathcal{T}$ is visited during the traversal, the algorithm extracts an ancestor-descendant path $p^*$ from $x$ and attaches it to $T^*$, which means that the vertices of $p^*$ are marked as visited. Thereafter, the DTP of $T \setminus U$ defined by the unvisited vertices is recalculated. This can be done by local operations around $x$. More specifically, if $x \in \mathcal{T}$, the remaining part $x \setminus p^*$ is divided into subtrees, which are stored in $\mathcal{T}$; otherwise $x \setminus p^*$ is an ancestor-descendant path and is pushed back to $\mathcal{P}$. Then the traversal continues. Once all vertices are visited, $T^*$ is a DFS tree of $G \setminus U$.
The key to reducing the computational complexity is that, by taking advantage of the partitioning, the number of edges accessed by this algorithm can be made smaller than $m$. It must then be ensured that the edges not accessed by the algorithm are not needed to construct the new DFS tree $T^*$. To achieve this, they use a reduced adjacency list $L$ and two kinds of queries $Q$ and $Q'$, which are defined as follows.
Definition 2 ([4]).
A connected undirected graph $G$, its DFS tree $T$, and a set $U$ of vertices and edges are given. Then for any three vertices $w, x, y$ in $G \setminus U$, the following queries are considered. Among all edges in $G \setminus U$ that directly connect a subtree $T(w)$ and an ancestor-descendant path $\mathit{path}(x, y)$, $Q(T(w), x, y)$ returns one of the edges whose endpoint on $\mathit{path}(x, y)$ is the nearest to $x$. Similarly, among all edges in $G \setminus U$ that directly connect a vertex $w$ and an ancestor-descendant path $\mathit{path}(x, y)$, $Q'(w, x, y)$ returns one of the edges whose endpoint on $\mathit{path}(x, y)$ is the nearest to $x$. If there are no connecting edges, these queries return $\emptyset$. Here we can assume that $T(w)$ (or $\{w\}$) and $\mathit{path}(x, y)$ have no common vertices, and contain no vertices or edges in $U$.
During the construction of $T^*$, the edges added to $L$ are chosen carefully by $Q$ and $Q'$ and, instead of the whole adjacency list of $G$, only $L$ is accessed. Please note that in these queries, the virtual vertex $r$ and its incident edges are not considered, i.e., there are no queries such that $T(w)$ or $\mathit{path}(x, y)$ contains $r$.
In fact, this fault tolerant DFS algorithm can be easily extended to handle insertions of vertices/edges as well as deletions [4]. We now consider the two situations in turn: fully dynamic and incremental. In the incremental case, the number of times the query $Q$ is called is bounded by $O(n)$, and $Q'$ is not used. In this case, the number of edges in $L$ is at most $O(n)$. In the fully dynamic case, the numbers of times $Q$ and $Q'$ are executed are bounded by $O(nk \log n)$ and $O(nk)$, respectively, and the number of edges in $L$ is at most $O(nk \log n)$. Solving the queries $Q$ and $Q'$ is the most time-consuming part of their algorithm. Therefore, the time complexity of their fault tolerant DFS algorithm can be summarized in the following lemma.
Lemma 1 ([4]).
An undirected graph $G$ and its DFS tree $T$ are given. Suppose that the query $Q$ can be solved in $O(f)$ time with a data structure constructed in $O(F)$ time in the incremental case. Then with $O(F + n)$ preprocessing time, a DFS tree for the graph obtained by applying any $k$ ($\le n$) edge insertions to $G$ can be built in $O(fn)$ time. Similarly, suppose that $Q$ and $Q'$ can be solved in $O(g)$ and $O(g')$ time, respectively, with a data structure built in $O(G)$ time in the fully dynamic case. Then with $O(G + n)$ preprocessing time, a DFS tree for the graph obtained by applying any $k$ ($\le n$) updates (vertex/edge insertions/deletions) to $G$ can be built in $O(k (g \log n + g') n)$ time.

3.2. Dynamic DFS Algorithm

Next we describe the algorithm for dynamic DFS. Baswana et al. [4] proposed an algorithm for this problem that uses the fault tolerant DFS algorithm as a subroutine. Their result can be summarized in the following lemma.
Lemma 2 ([4]).
Suppose that for any $k$ ($\le n$) updates on an undirected graph $G$, a new DFS tree can be built in $O(kg + h)$ time with a data structure constructed in $O(f)$ time (i.e., with $O(f)$ preprocessing time). Then for any on-line sequence of updates on the graph, a new DFS tree after each update can be built in amortized/worst-case $O(\sqrt{fg} + h)$ time per update, if $\sqrt{f/g} \le n$ holds.
First we describe the amortized (not worst-case) update time algorithm. The idea is to periodically rebuild the data structure $\mathcal{D}$ that is constructed during the preprocessing of the fault tolerant DFS algorithm. To explain this idea in detail, let $G_j$ be the graph obtained by applying the first $C_j := c_0 + \cdots + c_{j-1}$ updates to $G$ (the values $c_j$ are decided later), let $n_j$ and $m_j$ be the numbers of vertices and edges in $G_j$, and let $T_j$ be the DFS tree for $G_j$ reported by this algorithm. For the first $c_0$ updates, we use the data structure $\mathcal{D}_0$ constructed from the original graph $G$ and DFS tree $T$. That is, after each graph update arrives, we perform the fault tolerant DFS algorithm as if all previous updates had come simultaneously. After $c_0$ updates are processed, we build the data structure $\mathcal{D}_1$ from $G_1$ and $T_1$, and use $\mathcal{D}_1$ for the next $c_1$ updates. In other words, from the $(c_0 + 1)$-st to the $(c_0 + c_1)$-th update, we perform the fault tolerant DFS as if the updates from the $(c_0 + 1)$-st to the latest one had come simultaneously. Similarly, after $C_j$ updates are processed, the data structure $\mathcal{D}_j$ is built from $G_j$ and $T_j$, and $\mathcal{D}_j$ is used for the next $c_j$ updates. We call the period during which $\mathcal{D}_j$ is used phase $j$ of the amortized update time algorithm. In this way the construction time of the data structures is amortized over the $c_j$ updates in phase $j$. Now suppose that for any $k$ ($\le n$) updates on $G$, a new DFS tree can be built in $O(k g_j + h_j)$ time with $\mathcal{D}_j$ built in $O(f_j)$ time, where $f_j$, $g_j$ and $h_j$ are all functions of $n_j$ and $m_j$. Then the update time complexity becomes $O(f_j / c_j + g_j c_j + h_j) = O(\sqrt{f_j g_j} + h_j)$ by setting $c_j = \sqrt{f_j / g_j}$. Therefore, we can achieve the amortized update time bound in Lemma 2. Here, in phase $j$, the quantities $f$, $g$ and $h$ in Lemma 2 are $f_j$, $g_j$ and $h_j$, and are functions of $n_j$ and $m_j$.
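The choice of the phase length can be verified by a short calculation, written out here as a worked step. Within phase $j$, the rebuild cost $f_j$ is shared by $c_j$ updates, and each update runs the fault tolerant DFS on at most $c_j$ pending updates:

```latex
\[
  \underbrace{\frac{f_j}{c_j}}_{\text{amortized rebuild}}
  + \underbrace{g_j c_j + h_j}_{\text{fault tolerant DFS}}
  \quad\text{is minimized when}\quad
  \frac{f_j}{c_j} = g_j c_j
  \iff c_j = \sqrt{\frac{f_j}{g_j}},
\]
\[
  \text{which gives}\quad
  \frac{f_j}{c_j} + g_j c_j + h_j
  = 2\sqrt{f_j g_j} + h_j
  = O\!\left(\sqrt{f_j g_j} + h_j\right).
\]
```

Since a fault tolerant DFS handles at most $n$ updates at once, this choice is only valid when $c_j \le n$.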
Next we proceed to the worst-case update time algorithm. The idea described in [4] for achieving an efficient worst-case update time is to actually spread the construction of each data structure over $c_j$ updates. Here we assume that the number of edges does not change dramatically during each phase, i.e., $l \cdot m_j \le m_{j+1} \le h \cdot m_j$ holds for all $j = 1, 2, \ldots$ with some constants $l$ and $h$. With this assumption, $c_j$ and $c_{j+1}$ differ only by a constant factor (since the number of vertices does not change dramatically during each phase either). In our algorithms described in Section 5 and Section 7, this assumption always holds provided that $n = o(m)$, so we do not discuss it further.
Let us go into detail. For the first $c_0$ updates, use the data structure $\mathcal{D}_0$ built from the original graph $G$ and DFS tree $T$. For the next $c_1$ updates, use $\mathcal{D}_0$ again and build $\mathcal{D}_1$ gradually from $G_1$ and $T_1$. Similarly, from the $(C_j + 1)$-st to the $C_{j+1}$-th update on the graph ($C_j = c_0 + \cdots + c_{j-1}$), use $\mathcal{D}_{j-1}$ for the fault tolerant DFS and build $\mathcal{D}_j$ gradually from $G_j$ and $T_j$. In this way the construction time of the data structures is spread out, and the efficient worst-case update time in Lemma 2 is achieved.

4. Query Reduction to Orthogonal Range Search Problem

In this section, we show an efficient reduction from the queries $Q$ and $Q'$ to orthogonal range search queries. Generally speaking, given a set of points on a grid, an orthogonal range search problem (in a 2-dimensional plane) is to answer queries about the points within any rectangular region $R = [x_1, x_2] \times [y_1, y_2]$. Queries of this kind are extensively studied in the computational geometry community, e.g., counting the number of points (orthogonal range counting) or reporting all points (orthogonal range reporting) within $R$. We consider the following query.
Definition 3.
On grid points in a 2-dimensional plane, $k$ points are given. Then for any rectangular region $R = [x_1, x_2] \times [y_1, y_2]$, the orthogonal range successor (predecessor) query returns one of the points whose y-coordinate is the smallest (largest) within $R$. If there are no points within $R$, the query returns $\emptyset$. We abbreviate it as the ORS (ORP) query.
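As a reference for the semantics only, the ORS and ORP queries can be written as linear scans. This Python sketch is a brute-force baseline, not one of the efficient data structures discussed in this paper; the function names `ors` and `orp` are assumptions made for illustration, and `None` plays the role of $\emptyset$.

```python
def ors(points, x1, x2, y1, y2):
    """Orthogonal range successor: a point with the smallest y in [x1,x2] x [y1,y2]."""
    inside = [(x, y) for (x, y) in points if x1 <= x <= x2 and y1 <= y <= y2]
    return min(inside, key=lambda p: p[1]) if inside else None


def orp(points, x1, x2, y1, y2):
    """Orthogonal range predecessor: a point with the largest y in [x1,x2] x [y1,y2]."""
    inside = [(x, y) for (x, y) in points if x1 <= x <= x2 and y1 <= y <= y2]
    return max(inside, key=lambda p: p[1]) if inside else None
```

Both functions take $O(k)$ time per query for $k$ points, which is what the structures considered later are designed to avoid.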

4.1. Original Reduction

First we describe the original reduction of the queries $Q$ and $Q'$ proposed by Baswana et al. [4]. In their paper, the ORS (ORP) query is used only implicitly. Indeed, their method for answering $Q$ and $Q'$ is equivalent to solving $O(\log n)$ ORS (ORP) queries on the adjacency matrix of $G$; details are described below. We later use part of their ideas.
We now describe how a set of points is constructed from $G$. The high-level idea is quite simple: the vertices of $G$ are numbered from 0 to $n - 1$, and an adjacency matrix according to this numbering is constructed. Let us go into detail. First, a heavy-light (HL) decomposition [19] of $T$ is calculated. Then the order $\mathcal{L}$ of vertices is decided according to a pre-order traversal of $T$ such that, when a vertex $v$ is visited for the first time, the next vertex visited is the one connected to $v$ by a heavy edge of the HL decomposition. Next, the vertices of $G$ (except $r$) are numbered from 0 to $n - 1$ according to $\mathcal{L}$; the vertex id of $v$ is denoted by $f(v)$. Finally, on a 2-dimensional grid $\mathcal{G}$, for each edge $(i, j)$ of $G \setminus \{r\}$, we put two points at the coordinates $(f(i), f(j))$ and $(f(j), f(i))$ in $\mathcal{G}$. This is equivalent to considering the adjacency matrix of $G$, and thus $2m$ points are placed.
The order $\mathcal{L}$ has some good features. First, for any subtree $T(w)$ of $T$, the vertex ids of the vertices of $T(w)$ occupy a single consecutive interval $[t_b, t_e]$, since $\mathcal{L}$ is a pre-order traversal of $T$. Second, for any ancestor-descendant path $\mathit{path}(x, y)$ in $T$, the vertex ids of the vertices of $\mathit{path}(x, y)$ occupy at most $O(\log n)$ intervals $[a_1, b_1], \ldots, [a_k, b_k]$. This is because $\mathit{path}(x, y)$ contains at most $O(\log n)$ light edges (i.e., non-heavy edges) by the property of the HL decomposition [19]. Therefore, all edges of $G$ between $T(w)$ and $\mathit{path}(x, y)$ lie within the $O(\log n)$ rectangular regions $[t_b, t_e] \times [a_i, b_i]$ ($i = 1, \ldots, k$) on $\mathcal{G}$. Then if $U = \emptyset$ (e.g., the incremental case), the answer to $Q(T(w), x, y)$ can be obtained by searching them. More specifically, if $f(x) \le f(y)$, we solve ORS queries on $\mathcal{G}$ with $R = [t_b, t_e] \times [a_i, b_i]$ ($i = 1, \ldots, k$) and return the point with the smallest y-coordinate among the answers to the ORS queries. Otherwise, we solve ORP queries with the same rectangles and return the point with the largest y-coordinate among the answers to the ORP queries. If all ORS (ORP) queries return $\emptyset$, the answer to $Q(T(w), x, y)$ is also $\emptyset$. The same argument applies to $Q'(w, x, y)$, except that the rectangular regions are $[f(w), f(w)] \times [a_i, b_i]$ ($i = 1, \ldots, k$).
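The numbering itself can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the tree is given as child lists, the heavy child is taken to be the child with the largest subtree, and the root is numbered as well (whereas the paper excludes the virtual vertex $r$). The function name `heavy_first_preorder` is hypothetical.

```python
def heavy_first_preorder(children, root):
    """Return (f, size): pre-order ids visiting the heavy child first, and subtree sizes."""
    n = len(children)
    parent = {c: v for v in range(n) for c in children[v]}

    # Any pre-order lists parents before children, so its reverse is bottom-up.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    size = [1] * n
    for v in reversed(order):
        if v in parent:
            size[parent[v]] += size[v]

    # Number vertices; the largest child is pushed last so it is popped next.
    f, next_id, stack = [None] * n, 0, [root]
    while stack:
        v = stack.pop()
        f[v] = next_id
        next_id += 1
        stack.extend(sorted(children[v], key=lambda c: size[c]))
    return f, size
```

With this numbering, the vertices of $T(w)$ occupy exactly the interval `[f[w], f[w] + size[w] - 1]`, which is the interval $[t_b, t_e]$ used above.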
If $U \ne \emptyset$ (e.g., the fully dynamic case), deletion of points from $\mathcal{G}$ must be supported, since the edges in $U$ and the edges incident to the vertices in $U$ must be removed from $\mathcal{G}$ to prevent $Q$ and $Q'$ from reporting already deleted edges. To achieve this, Baswana et al. [4] use a kind of range tree data structure that supports deleting a point to solve $Q$ and $Q'$.

4.2. Efficient Reduction

Next we show the following: (a) the query $Q(T(w), x, y)$ can be converted to a single (not $O(\log n)$) ORS/ORP query for any $w, x, y$, and (b) deletion of points from $\mathcal{G}$ need not be supported. Note that the idea that partially achieves (a) was first proposed by Chen et al. [5], and we use a part of it. However, the solution of Chen et al. [5] for the query $Q$ deals only with the case where $T(w)$ hangs from $\mathit{path}(x, y)$ (that is, case (ii) in Figure 1, explained later). We therefore extend it to handle arbitrary cases. The goal is to prove the following lemma.
Lemma 3.
Suppose there exists a data structure $\mathcal{D}$ that can solve both ORS and ORP queries on $\mathcal{G}$ in $O(f)$ time each. Then for any three vertices $w, x, y$, the query $Q(T(w), x, y)$ can be solved in $O(f)$ time with $\mathcal{D}$. Similarly, for any $w, x, y$, the query $Q'(w, x, y)$ can be answered in $O(f \log n)$ time with $\mathcal{D}$. Please note that $\mathcal{D}$ need not support deletion of points from $\mathcal{G}$.
First we show (a) when $U = \emptyset$ (this assumption is removed later). We define some symbols for convenience: for two vertices $a$ and $b$ in $G$, $a \prec b$ means that $a$ is an ancestor of $b$ in $T$, $a \preceq b$ means $a \prec b$ or $a = b$, and $a \parallel b$ means that neither $a \preceq b$ nor $b \preceq a$ holds, i.e., $a$ and $b$ have no ancestor-descendant relation. Let $p = \mathit{par}(w)$. Note that $w$ always has a parent: if $w$ were the root of $T$, then $T(w) = T$ would span all vertices of $G$ and share vertices with any $\mathit{path}(x, y)$, which contradicts the assumption (see Definition 2). Now we assume $x \preceq y$ (the case $y \prec x$ is considered at the end). Then the configurations of $T(w)$ and $\mathit{path}(x, y)$ can be classified into five patterns in terms of $x$, $y$ and $p$, as drawn in Figure 1: (i) $p \prec x \preceq y$, (ii) $x \preceq p \prec y$, (iii) $x \preceq y \preceq p$, (iv) $x \parallel p$ and $y \parallel p$, and (v) $x \prec p$ and $y \parallel p$. We show the following.
Claim 1.
For all cases from (i) to (v), the answer to $Q(T(w), x, y)$ can be obtained by solving the ORS query on $\mathcal{G}$ with $R = [t_b, t_e] \times [f(x), f(\mathrm{LCA}(y, w))]$, where $[t_b, t_e]$ is the interval that the vertices of $T(w)$ occupy in the vertex numbering and $\mathrm{LCA}(y, w)$ is the lowest common ancestor of $y$ and $w$ in $T$.
Proof. 
First, for cases (i) and (iv), the answer for $Q(T(w), x, y)$ is $\emptyset$: if such an edge existed, it would be a cross edge and thus violate the DFS property. In these cases $LCA(y, w)$ lies above $x$, so $f(x) > f(LCA(y, w))$ and the ORS query also returns $\emptyset$, which correctly answers $Q(T(w), x, y)$. In the other cases ((ii), (iii) and (v)), $LCA(y, w)$ lies on $path(x, y)$. Here all edges between $T(w)$ and $path(x, y)$ are in fact between $T(w)$ and $path(x, LCA(y, w))$, due to the DFS property. The interval $[f(x), f(LCA(y, w))]$ may contain ids of vertices on branches forking from $path(x, LCA(y, w))$, which happens when these branches are traversed prior to $y$ and $w$ in the vertex ordering $\mathcal{L}$. This causes no trouble, because there are no edges between $T(w)$ and these branches, again due to the DFS property. Hence, viewing the grid $\mathcal{G}$ as the adjacency matrix of $G$, $R$ contains all edges between $T(w)$ and $path(x, y)$ and no other edges. Thus, reporting a point whose y-coordinate is the smallest within $R$ yields an answer for $Q(T(w), x, y)$. Note that an LCA query can be answered in $O(1)$ time with a data structure of $O(n \log n) = o(m) \log n$ bits built in $O(n)$ time [20]. Therefore Claim 1 holds. ☐
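From Claim 1 we obtain (a) under $U = \emptyset$ and the assumption on $x$ and $y$ made above. Lastly, in the opposite case where $y$ is an ancestor of $x$, we simply swap $x$ and $y$ and proceed almost as described above, except that we solve an ORP (not ORS) query on the grid $\mathcal{G}$.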
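The rectangle of Claim 1 can be illustrated with a small brute-force sketch. It assumes, purely for illustration, that vertices are already named by their pre-order ids (so $f(v) = v$), that the tree is given as a hypothetical `parent` array, and that the ORS query is answered by scanning all edges rather than by a real range-search structure.

```python
def query_subtree(parent, edges, w, x, y):
    """Brute-force Q(T(w), x, y) for x an ancestor of y, using the rectangle
    [t_b, t_e] x [f(x), f(LCA(y, w))] of Claim 1.
    Assumption: vertex names are pre-order ids, so f(v) = v."""
    n = len(parent)
    size = [1] * n
    for v in range(n - 1, 0, -1):        # children have larger ids than parents
        size[parent[v]] += size[v]
    depth = [0] * n
    for v in range(1, n):
        depth[v] = depth[parent[v]] + 1
    a, b = y, w                          # naive LCA by walking up the tree
    while a != b:
        if depth[a] < depth[b]:
            b = parent[b]
        else:
            a = parent[a]
    t_b, t_e = w, w + size[w] - 1        # ids occupied by the subtree T(w)
    lo, hi = x, a                        # [f(x), f(LCA(y, w))]
    points = [(u, v) for s, t in edges for u, v in ((s, t), (t, s))
              if t_b <= u <= t_e and lo <= v <= hi]
    # ORS: report a point with the smallest y-coordinate, or None if R is empty.
    return min(points, key=lambda pt: pt[1], default=None)


# Hypothetical DFS tree: 0-1, 1-2, 2-3, 1-4, 4-5, plus back edges (5,0), (3,0).
parent = [-1, 0, 1, 2, 1, 4]
edges = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5), (5, 0), (3, 0)]
print(query_subtree(parent, edges, w=4, x=0, y=3))
print(query_subtree(parent, edges, w=4, x=2, y=3))
```

In the first call $T(4)$ hangs from the path (case (ii)); in the second, $p = par(4)$ lies above $x$ (case (i)), so the y-interval is empty and no edge is reported.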
Next we show (b). First we assume that $U$ consists only of vertices and contains no edges. In this setting, Definition 2 guarantees that for the query $Q(T(w), x, y)$, neither $T(w)$ nor $path(x, y)$ contains a vertex of $U$. Thus, $[t_b, t_e]$ contains no ids of vertices in $U$. Moreover, even if $[f(x), f(LCA(y, w))]$ in Claim 1 contains ids of vertices in $U$, these vertices all lie on branches forking from $path(x, y)$ and cause no trouble. Therefore, Claim 1 still holds when $U$ contains some vertices. For the query $Q(w, x, y)$, we use the original reduction described in Section 4.1. Again by Definition 2, neither $\{w\}$ nor $path(x, y)$ contains a vertex of $U$. Thus, $[f(w), f(w)]$ and $[a_i, b_i]$ $(i = 1, \ldots, k)$ contain no ids of vertices in $U$.
Finally, we consider the case where $U$ contains some edges. In this case, it may seem that these edges must be deleted from the grid $\mathcal{G}$. However, deleting one edge $e = (u, v)$ can be simulated by one vertex deletion and one vertex insertion as follows: first record $u$'s incident edges $(u, w_i)$ $(i = 1, 2, \ldots)$ excluding $e = (u, v)$ itself, then delete $u$, and finally insert a vertex $u$ with the recorded incident edges $(u, w_i)$ $(i = 1, 2, \ldots)$. In the algorithm of Baswana et al. [4], vertex insertions are treated separately from the queries $Q(T(w), x, y)$ and $Q(w, x, y)$. In this way we avoid deleting $e$ from $\mathcal{G}$. This completes the proof of Lemma 3.
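The simulation of an edge deletion by a vertex deletion followed by a vertex insertion can be sketched on a plain adjacency-set representation. The helper names `delete_vertex`, `insert_vertex` and `delete_edge` are hypothetical and only illustrate the bookkeeping: the vertex $u$ is removed and then reinserted with all of its former edges except $(u, v)$.

```python
def delete_vertex(adj, u):
    """Remove u and all of its incident edges."""
    for w in adj.pop(u):
        adj[w].discard(u)


def insert_vertex(adj, u, neighbours):
    """Insert u together with its incident edges (u, w)."""
    adj[u] = set(neighbours)
    for w in neighbours:
        adj[w].add(u)


def delete_edge(adj, u, v):
    """Simulate deleting edge (u, v) by one vertex deletion and one insertion."""
    kept = [w for w in adj[u] if w != v]   # u's incident edges except (u, v)
    delete_vertex(adj, u)
    insert_vertex(adj, u, kept)


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
delete_edge(adj, 0, 1)
print(adj)
```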
Lastly, we briefly explain why $Q(w, x, y)$ cannot be converted to a single ORS (ORP) query in the same way as $Q(T(w), x, y)$. The five patterns of vertex configurations drawn in Figure 1 can also appear in $Q(w, x, y)$. However, there is a corner case that can appear in $Q(w, x, y)$ but not in $Q(T(w), x, y)$: $w$ is an ancestor of both $x$ and $y$, as drawn in Figure 2. This pattern cannot appear in $Q(T(w), x, y)$, since $T(w)$ would then overlap with $path(x, y)$. This corner case is relatively hard to convert to a single ORS/ORP query on the grid $\mathcal{G}$, since every vertex of a branch forking from $path(x, y)$ has an ancestor-descendant relation with $w$. Note that it does not matter if $Q(w, x, y)$ is solved $O(\log n)$ times slower than $Q(T(w), x, y)$ (see Lemma 1).

5. Linear Space Dynamic DFS

In this section, we propose a fast linear space dynamic DFS algorithm. In Section 4.2, we proved that there is no need to support deletion of points from the grid $\mathcal{G}$ even if there are deletion updates of edges and vertices. This means we can use a static data structure for the queries $Q(T(w), x, y)$ and $Q(w, x, y)$ rather than a dynamic one. Indeed, the data structure by Belazzougui and Puglisi [21] can solve the ORS query in rank space. A rank space with $k$ points is a $[1, k] \times [1, k]$ grid where all points differ in both x- and y-coordinates. For a rank space with $k$ points, their data structure can solve an ORS query in $O(\log^{\varepsilon} k)$ time for an arbitrary $0 < \varepsilon < 1$, can be constructed in $O(k \sqrt{\log k})$ time, and occupies $O(k \log k)$ bits of space. Though $\mathcal{G}$ is not a rank space, we can convert $\mathcal{G}$ to a rank space by using bit vectors. This kind of conversion is regularly employed for various orthogonal range search data structures (e.g., see [22]). So, in the proof below we do not show the conversion from $\mathcal{G}$ to a rank space, but show the conversion from the adjacency list of $G$ to a rank space directly.
Lemma 4.
There exists a data structure $D$ which can solve both ORS and ORP queries on the grid $\mathcal{G}$ in $O(\log^{\varepsilon} n)$ time each, for arbitrary $0 < \varepsilon < 1$. This data structure requires $O(m \log n)$ bits of space and can be built in $O(m \sqrt{\log n})$ time.
Proof. 
Let $d_i$ be the degree of the vertex $v$ in $G$ whose vertex id is $i$, and let $M_i = \sum_{j=0}^{i-1} d_j$ ($M_0 = 0$). We have $M_n = 2m$, since $G$ is an undirected graph. We construct a rank space $\mathcal{R}$ satisfying the following condition: for two vertices $v, w$ with $f(v) = i$ and $f(w) = j$, there exists an edge between $v$ and $w$ in $G$ iff there exists a point within $[M_i + 1, M_{i+1}] \times [M_j + 1, M_{j+1}]$ in $\mathcal{R}$. This can be done by the following procedure. First, prepare two arrays $A, B$ with $A[i] = B[i] = M_i$ for $i = 0, \ldots, n-1$. Then for each edge $e = (u, v)$ in $G$, increment $A[f(u)]$, $B[f(v)]$, $A[f(v)]$ and $B[f(u)]$ by one, and then place two points on the coordinates $(A[f(u)], B[f(v)])$ and $(A[f(v)], B[f(u)])$ in $\mathcal{R}$. When all edges are processed, $\mathcal{R}$ satisfies the above condition.
Besides $\mathcal{R}$, we construct a bit vector $B = 0^{d_0} 1 0^{d_1} 1 \cdots 0^{d_{n-1}} 1$. Here, for $i = 0, \ldots, n$, we have $\mathrm{rank}_0(\mathrm{select}_1(i, B), B) = M_i$, and $\mathrm{rank}_1(\mathrm{select}_0(j, B), B) = i$ for $M_i + 1 \le j \le M_{i+1}$ $(i < n)$. This means that the bit vector $B$ enables us to convert between the vertex id $f(\cdot)$ and the coordinate in $\mathcal{R}$ in $O(1)$ time. Therefore, given an ORS query with a rectangle $R$ on the grid $\mathcal{G}$, we can solve it as follows. First, convert the coordinates of $R$ into those in $\mathcal{R}$ using the bit vector $B$. Second, solve the converted ORS query on $\mathcal{R}$ with the data structure of Belazzougui and Puglisi [21]. Finally, if the answer is $\emptyset$ then the original answer is also $\emptyset$; otherwise the answer is converted back to a vertex id using $B$. The overall cost for a single ORS query is thus $O(\log^{\varepsilon} n)$. Note that the bit vector requires only $(n + 2m) H_0(B) + o(m) \le n \log \frac{e(n + 2m)}{n} + o(m) = o(m) \log n$ bits of space.
We mentioned only the ORS query above, but the data structure for the ORP query can be built in a similar way. Let $\mathcal{R}'$ be the rank space obtained by flipping $\mathcal{R}$ vertically; i.e., $\mathcal{R}'$ is constructed by putting a point on the coordinates $(i, 2m + 1 - j)$ for every point $(i, j)$ in $\mathcal{R}$. $\mathcal{R}'$ can also be built directly from the adjacency list of $G$, and the ORP query on the grid $\mathcal{G}$ can be converted to the ORS query on $\mathcal{R}'$ in the same manner as described above. ☐
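The construction of the rank space from an edge list can be sketched as follows. This is a minimal illustration under the assumption that vertices are already identified by their ids $0, \ldots, n-1$; the function names and the example graph are hypothetical and not taken from the paper.

```python
from itertools import accumulate


def build_rank_space(n, edges):
    """Return (M, points): prefix degree sums M_0..M_n and the 2m points.
    Assumes vertex ids 0..n-1, no self-loops, no duplicate edges."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    M = [0] + list(accumulate(deg))      # M[i] = d_0 + ... + d_{i-1}, M[n] = 2m
    A, B = M[:n], M[:n]                  # next free x- / y-coordinate per vertex
    points = []
    for u, v in edges:
        A[u] += 1; B[v] += 1; A[v] += 1; B[u] += 1
        points.append((A[u], B[v]))
        points.append((A[v], B[u]))
    return M, points


def has_edge(M, points, i, j):
    """Edge (i, j) exists iff a point lies in [M_i+1, M_{i+1}] x [M_j+1, M_{j+1}]."""
    return any(M[i] < x <= M[i + 1] and M[j] < y <= M[j + 1] for x, y in points)


n, edges = 4, [(0, 1), (0, 2), (0, 3), (1, 2)]
M, points = build_rank_space(n, edges)
print(M, points)
```

Every x-coordinate and every y-coordinate in $1, \ldots, 2m$ is used exactly once, which is what makes the result a rank space.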
Figure 3 is an example of an undirected graph G and its corresponding rank space R . In this example, the rectangles [ 1 , 3 ] × [ 4 , 5 ] and [ 4 , 5 ] × [ 1 , 3 ] have a point inside, which corresponds to the edge connecting 0 and 1. Conversely, the rectangles [ 1 , 3 ] × [ 6 , 7 ] and [ 6 , 7 ] × [ 1 , 3 ] do not have a point inside, which indicates the absence of the edge connecting 0 and 2.
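The conversion between vertex ids and rank-space coordinates through the bit vector $B$ can be sketched with naive linear-time rank and select (a succinct bit vector would answer both in $O(1)$ time). The degree sequence below is a hypothetical example, and positions are 1-indexed with $\mathrm{select}(\cdot, 0) = 0$ by convention.

```python
def build_bitvector(deg):
    """B = 0^{d_0} 1 0^{d_1} 1 ... 0^{d_{n-1}} 1 as a Python list."""
    bits = []
    for d in deg:
        bits.extend([0] * d + [1])
    return bits


def rank(bits, bit, pos):
    """Number of occurrences of `bit` among the first `pos` bits."""
    return bits[:pos].count(bit)


def select(bits, bit, k):
    """1-indexed position of the k-th occurrence of `bit` (0 if k == 0)."""
    if k == 0:
        return 0
    seen = 0
    for pos, b in enumerate(bits, 1):
        if b == bit:
            seen += 1
            if seen == k:
                return pos
    raise ValueError("fewer than k occurrences")


def vertex_to_coords(bits, i):
    """Coordinate range [M_i + 1, M_{i+1}] owned by vertex id i."""
    return (rank(bits, 0, select(bits, 1, i)) + 1,
            rank(bits, 0, select(bits, 1, i + 1)))


def coord_to_vertex(bits, j):
    """Vertex id owning rank-space coordinate j."""
    return rank(bits, 1, select(bits, 0, j))


bits = build_bitvector([3, 2, 2, 1])
print(vertex_to_coords(bits, 1), coord_to_vertex(bits, 6))
```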
Combining Lemma 4 with Lemma 3 implies the following corollary.
Corollary 1.
There exists a data structure of $O(m \log n)$ bits such that for any $w, x, y$, $Q(T(w), x, y)$ can be solved in $O(\log^{\varepsilon} n)$ time and $Q(w, x, y)$ in $O(\log^{1+\varepsilon} n)$ time. This data structure can be built in $O(m \sqrt{\log n})$ time.
Combined with Lemma 1, this corollary directly gives a fault tolerant DFS algorithm. We must also consider the space requirement of this algorithm, but this is simple. The information used in the algorithm, other than the data structure for solving $Q(T(w), x, y)$ and $Q(w, x, y)$ and the reduced adjacency list $L$, takes only $O(n \log n) = o(m) \log n$ bits: there are $O(n)$ words of information for the original DFS tree $T$ (including the data representing the disjoint tree partition), $O(1)$ words of information attached to each vertex and each edge in $T$, a stack with at most $O(n)$ elements, and a partially constructed DFS tree, and these sum up to only $O(n)$ words. Moreover, since the reduced adjacency list $L$ contains at most $O(m)$ edges, the space required for $L$ is bounded by $O(m \log n)$ bits. From Corollary 1, the data structure for solving the two queries occupies $O(m \log n)$ bits. Hence the following lemma holds.
Lemma 5.
An undirected graph $G$ and its DFS tree $T$ are given. Then there exists an algorithm such that with $O(m \sqrt{\log n})$ preprocessing time, a DFS tree for the graph obtained by applying any $k$ $(\le n)$ edge insertions to $G$ can be built in $O(n \log^{\varepsilon} n)$ time. Similarly, there exists an algorithm such that with $O(m \sqrt{\log n})$ preprocessing time, a DFS tree for the graph obtained by applying any $k$ $(\le n)$ updates (vertex/edge insertions/deletions) to $G$ can be built in $O(nk \log^{1+\varepsilon} n)$ time. These algorithms require $O(m \log n)$ bits of space.
Linear space dynamic DFS algorithms are obtained by combining Lemma 5 with Lemma 2. For the incremental case, we set $f = m \sqrt{\log n}$, $g = m \sqrt{\log n} / n^2$ and $h = n \log^{\varepsilon} n$ for Lemma 2 (here the condition $\sqrt{f/g} \le n$ must be satisfied). Then the update time is $O(\sqrt{fg} + h) = O(m/n \cdot \sqrt{\log n} + n \log^{\varepsilon} n)$. The update time is thus $O(n \log^{\varepsilon} n)$ if $m = O(n^2 / \sqrt{\log n})$, or $O(n \sqrt{\log n})$ otherwise. For the fully dynamic case, we set $f = m \sqrt{\log n}$, $g = n \log^{1+\varepsilon} n$ and $h = 0$, and, therefore, the update time is $O(\sqrt{mn} \log^{0.75+\varepsilon} n)$.
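As a check of the arithmetic, the two bounds can be expanded term by term. This worked step assumes that Lemma 2 gives an update time of $O(\sqrt{fg} + h)$ and reads the parameters as $f = m\sqrt{\log n}$ in both cases, with $g = m\sqrt{\log n}/n^2$ (incremental) and $g = n \log^{1+\varepsilon} n$ (fully dynamic); these readings are a reconstruction, not a quotation.

```latex
\begin{align*}
\text{incremental:}\quad
  \sqrt{fg} &= \sqrt{m\sqrt{\log n}\cdot\frac{m\sqrt{\log n}}{n^{2}}}
             = \frac{m}{n}\sqrt{\log n},
  &\sqrt{f/g} &= \sqrt{n^{2}} = n,\\[4pt]
\text{fully dynamic:}\quad
  \sqrt{fg} &= \sqrt{m\sqrt{\log n}\cdot n\log^{1+\varepsilon} n}
             = \sqrt{mn}\,\log^{0.75+\varepsilon/2} n
             = O\!\left(\sqrt{mn}\,\log^{0.75+\varepsilon} n\right).
\end{align*}
```

Since $\varepsilon$ is an arbitrary constant, the exponent $0.75 + \varepsilon/2$ is absorbed into $0.75 + \varepsilon$.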
Theorem 1.
There exists an algorithm such that given an undirected graph $G$ and its DFS tree $T$, for any on-line sequence of updates on $G$, a new DFS tree after each update can be built in worst-case $O(m/n \cdot \sqrt{\log n} + n \log^{\varepsilon} n)$ time per update in the incremental case, and $O(\sqrt{mn} \log^{0.75+\varepsilon} n)$ time per update in the fully dynamic case, where $0 < \varepsilon < 1/2$ is an arbitrary constant. This algorithm requires $O(m \log n)$ bits.

6. Compression of Data Structures

In this section, we show a way to solve $Q(T(w), x, y)$ and $Q(w, x, y)$ more space-efficiently, which is later used in the space-efficient dynamic DFS algorithm in Section 7.

6.1. Range Next Value Query

First, we show a data structure of ( 2 m + o ( m ) ) log n bits, which can be derived immediately from the existing results.
It is already known that the ORS (ORP) query on a $k \times \sigma$ grid $\mathcal{G}$ with $k$ points, where all points differ in x-coordinates, can be solved efficiently with a wavelet tree [23]. We now describe this method in detail. The method builds an integer array $S[1..k]$ where $S[i]$ $(1 \le i \le k)$ is the y-coordinate of the point whose x-coordinate is $i$. Then the ORS (ORP) query is converted to the following query on $S$.
Definition 4.
An integer sequence $S[1..l]$ is given. Then for any interval $[a, b] \subseteq [1, l]$ and integer $p$ $(q)$, the range next (previous) value query returns one of the smallest (largest) elements among those in $S[a..b]$ which are not less than $p$ (not more than $q$). If there is no such element, the query returns $\emptyset$. We abbreviate it as RN (RP) query.
These queries can be solved efficiently with the wavelet tree $W$ for $S$: if $S[i] \in [0, \sigma - 1]$ for all $i = 1, \ldots, l$, they can be solved with $O(\log \sigma)$ rank and select queries on $W$ [23]. The pseudocode for solving the RN query with the wavelet tree for $S$ is given in Algorithm 1, where $\mathrm{left}(v)$ and $\mathrm{right}(v)$ stand for the left and right child of the node $v$, respectively. Calling $\mathrm{RN}(root, p, [a, b], [0, \sigma - 1])$ yields the answer for the RN query with $[a, b]$ and $p$, returned as a pair of the position and the value of the element. If $(0, -1)$ is returned, the answer for the RN query is $\emptyset$. The RP queries on $S$ can be solved in a similar way using the same wavelet tree.
Algorithm 1 Range Next Value by Wavelet Tree
function RN(v, p, [a, b], [α, ω])
    if a > b or ω < p then return (0, −1)
    if α = ω then return (a, α)
    γ ← ⌊(α + ω)/2⌋
    if p ≤ γ then
        (pos, val) ← RN(left(v), p, [rank_0(B_v, a − 1) + 1, rank_0(B_v, b)], [α, γ])
        if pos ≠ 0 then return (select_0(B_v, pos), val)
    end if
    (pos, val) ← RN(right(v), p, [rank_1(B_v, a − 1) + 1, rank_1(B_v, b)], [γ + 1, ω])
    if pos ≠ 0 then return (select_1(B_v, pos), val)
    else return (0, −1)
end function
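A runnable counterpart of Algorithm 1 may help to see the recursion at work. The sketch below uses a simple pointer-based wavelet tree with explicit prefix counts and position lists instead of succinct bit vectors, so it illustrates the logic but not the space bound; the class and function names are hypothetical.

```python
class WaveletNode:
    """Pointer-based wavelet tree node over the symbol interval [lo, hi]."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return
        mid = (lo + hi) // 2
        self.zeros = [0]                 # zeros[i] = rank_0(B_v, i)
        self.pos0, self.pos1 = [], []    # pos0[k-1] = select_0(B_v, k), same for 1
        for idx, x in enumerate(seq, 1):
            bit = x > mid
            self.zeros.append(self.zeros[-1] + (0 if bit else 1))
            (self.pos1 if bit else self.pos0).append(idx)
        self.left = WaveletNode([x for x in seq if x <= mid], lo, mid)
        self.right = WaveletNode([x for x in seq if x > mid], mid + 1, hi)


NOT_FOUND = (0, -1)


def range_next(v, p, a, b):
    """Return (position, value) of a smallest element >= p in S_v[a..b]."""
    if a > b or v.hi < p:
        return NOT_FOUND
    if v.lo == v.hi:
        return (a, v.lo)
    mid = (v.lo + v.hi) // 2
    if p <= mid:                         # try the smaller symbols first
        pos, val = range_next(v.left, p, v.zeros[a - 1] + 1, v.zeros[b])
        if pos != 0:
            return (v.pos0[pos - 1], val)
    pos, val = range_next(v.right, p, a - v.zeros[a - 1], b - v.zeros[b])
    if pos != 0:
        return (v.pos1[pos - 1], val)
    return NOT_FOUND


S = [3, 1, 4, 1, 5, 2, 6]                # S[1..7], values in [0, 7]
root = WaveletNode(S, 0, 7)
print(range_next(root, 2, 2, 4))         # smallest value >= 2 within S[2..4]
```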
The ORS query on G with R = [ x 1 , x 2 ] × [ y 1 , y 2 ] can be answered by solving the RN query on S with [ a , b ] = [ x 1 , x 2 ] and p = y 1 . If the RN query returns ∅, the answer for the ORS query is also ∅. If the RN query returns an element, the answer is ∅ if the value of this element is larger than y 2 , or the point corresponding to this element otherwise. Similarly, the ORP query on G can be converted to the RP query on S.
In the grid $\mathcal{G}$ on which we want to solve ORS (ORP) queries, some points share the same x-coordinate; this is addressed by preparing a bit vector $B$ in the same way as in the proof of Lemma 4, which lets us convert x-coordinates. Here the integer sequence $S[1..2m]$ consists of the y-coordinates of the points on $\mathcal{G}$ sorted by the corresponding x-coordinates. The wavelet tree requires $(2m + o(m)) \log n$ bits, since there are $k = 2m$ points on $\mathcal{G}$ and the y-coordinates range over $[0, n-1]$. Hence we obtain a data structure of $(2m + o(m)) \log n$ bits to solve $Q(T(w), x, y)$ and $Q(w, x, y)$ (note that the bit vector requires only $o(m) \log n$ bits, as described in the proof of Lemma 4).

6.2. Halving Required Space

Next, we propose a data structure of $(m + o(m)) \log n$ bits to solve $Q(T(w), x, y)$ and $Q(w, x, y)$. The data structure shown in Section 6.1 holds information of both directions for each edge of $G$, since $G$ is an undirected graph. This is redundant, so we want to hold information of only one direction for each edge. In fact, the placement of points on the grid $\mathcal{G}$ is symmetric, since the adjacency matrix of $G$ is symmetric. So we now consider the upper triangular part $\mathcal{G}_u$ and the lower triangular part $\mathcal{G}_l$ of $\mathcal{G}$: the grid $\mathcal{G}_u$ inherits from $\mathcal{G}$ the points in the region where the x-coordinate is larger than the y-coordinate, and $\mathcal{G}_l$ is defined in the same manner. Since $G$ has no self-loops, $\mathcal{G}_u$ and $\mathcal{G}_l$ have $m$ points each. We now show that $\mathcal{G}_u$ can be used as a substitute for $\mathcal{G}$.
Let $S_u[1..m]$ be an integer sequence which contains the y-coordinates of the points on $\mathcal{G}_u$ sorted by the corresponding x-coordinates. Let $B_u$ be a bit vector $0^{d_{0,u}} 1 0^{d_{1,u}} 1 \cdots 0^{d_{n-1,u}} 1$, where $d_{i,u}$ is the number of points on $\mathcal{G}_u$ whose x-coordinate is $i$. $B_u$ enables us to convert the x-coordinate on $\mathcal{G}$ to the position in $S_u$ and vice versa in $O(1)$ time. Figure 4 is an example of the grid $\mathcal{G}$ and its upper triangular part $\mathcal{G}_u$. Observe that $\mathcal{G}_u$ has half as many points as $\mathcal{G}$, and thus $S_u$ is half as long as $S$.
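Building the sequence and the bit vector for the upper triangular part can be sketched as follows. The sketch assumes vertex ids $0, \ldots, n-1$ as coordinates and counts, for each x-coordinate, the points of the upper triangular part in that column; the helper `positions_of` is hypothetical and uses a linear scan where a succinct bit vector would use rank and select.

```python
def build_upper(n, edges):
    """Return (S_u, B_u) for the points (x, y) of the grid with x > y."""
    pts = sorted((max(u, v), min(u, v)) for u, v in edges)   # sorted by x
    S_u = [y for _, y in pts]
    cnt = [0] * n                        # cnt[x] = number of points in column x
    for x, _ in pts:
        cnt[x] += 1
    B_u = []
    for c in cnt:
        B_u += [0] * c + [1]
    return S_u, B_u


def positions_of(B_u, x1, x2):
    """1-indexed interval of positions in S_u for x-coordinates in [x1, x2]."""
    starts, zeros = [0], 0               # starts[x] = number of points with x' < x
    for bit in B_u:
        if bit == 0:
            zeros += 1
        else:
            starts.append(zeros)
    return starts[x1] + 1, starts[x2 + 1]


S_u, B_u = build_upper(4, [(0, 1), (0, 2), (0, 3), (1, 2)])
print(S_u, B_u, positions_of(B_u, 2, 2))
```

An empty interval is reported as a pair whose first component exceeds the second, as for column 0, which has no point below the diagonal.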
Lemma 6.
Any ORS (ORP) query which appears in the reduction of Lemma 3 can be answered by solving one RN (RP) query, or by solving one rank and one select query, on $S_u$.
Proof. 
We give a proof only for ORS queries, since the proof for ORP queries is almost the same. If the query rectangle of an ORS query lies inside the upper triangular part of the grid $\mathcal{G}$, this ORS query can be converted to an RN query on $S_u$ in the same way as in Section 6.1, since $\mathcal{G}_u$ contains all points inside the upper triangular part of $\mathcal{G}$. The claim below ensures that any ORS query related to $Q(T(w), x, y)$ can be converted to an RN query on $S_u$.
Claim 2.
The rectangles appearing in Claim 1 are all inside the upper triangular part of the grid $\mathcal{G}$, i.e., these rectangles are of the form $[x_1, x_2] \times [y_1, y_2]$ with $y_1 \le y_2 < x_1 \le x_2$.
Proof. 
It can be easily observed that $t_b$, the left endpoint of the interval $[t_b, t_e]$ corresponding to the vertices of a subtree $T(w)$, equals $f(w)$. Since $LCA(y, w)$ is either $p$ or an ancestor of $p$, and $p$ is the parent of $w$, we have $f(LCA(y, w)) \le f(p) < f(w)$, and thus $[t_b, t_e] \times [f(x), f(LCA(y, w))]$ is inside the upper triangular part. ☐
In solving $Q(w, x, y)$, there may be query rectangles inside the lower triangular part. However, these rectangles are of the form $[f(w), f(w)] \times [a, b]$ with $f(w) < a \le b$. Transposing such a rectangle places it inside the upper triangular part, and the problem becomes the following: for a rectangle $R = [a, b] \times [c, c]$ (with $c < a \le b$) on $\mathcal{G}_u$, find the point whose x-coordinate is the smallest within $R$. After converting the x-coordinates to positions in $S_u$ by $B_u$, this problem is equivalent to the following: find the leftmost element with value exactly $c$ among those in $S_u[a'..b']$, where $[a', b']$ is the interval of positions in $S_u$ corresponding to $[a, b]$.
Claim 3.
Finding the leftmost element with value exactly $c$ among those in $S_u[a..b]$ can be done by one rank and one select query on $S_u$, for any $a, b, c$.
Proof. 
Let $k = \mathrm{rank}_c(a - 1, S_u)$. The $(k+1)$-st occurrence of $c$ in $S_u$ is the answer if it exists and its position is in $[a, b]$. So, if $\mathrm{select}_c(k + 1, S_u)$ returns $\emptyset$ or a value larger than $b$, then no such element exists. Otherwise the result of this select query is itself the answer. ☐
This completes the proof of Lemma 6. ☐
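The rank-then-select step of Claim 3 can be sketched on a plain Python list. The naive `rank` and `select` below take linear time and merely stand in for the wavelet tree operations; positions are 1-indexed and the example sequence is hypothetical.

```python
def rank(S, c, i):
    """Number of occurrences of c in S[1..i]."""
    return S[:i].count(c)


def select(S, c, k):
    """Position of the k-th occurrence of c in S, or None if it does not exist."""
    seen = 0
    for pos, val in enumerate(S, 1):
        if val == c:
            seen += 1
            if seen == k:
                return pos
    return None


def leftmost_equal(S, a, b, c):
    """Leftmost position in [a, b] holding value c: one rank, one select."""
    k = rank(S, c, a - 1)
    pos = select(S, c, k + 1)
    return pos if pos is not None and pos <= b else None


S_example = [0, 0, 1, 0]
print(leftmost_equal(S_example, 2, 4, 0), leftmost_equal(S_example, 3, 3, 0))
```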
Since rank, select, RN and RP queries on $S_u$ can each be solved in $O(\log n)$ time with a wavelet tree for $S_u$, we immediately obtain a data structure of $(m + o(m)) \log n$ bits for the queries $Q(T(w), x, y)$ and $Q(w, x, y)$. Here the space required for $B_u$ is again $o(m) \log n$ bits and does not matter.
Corollary 2.
There exists a data structure of $(m + o(m)) \log n$ bits such that for any $w, x, y$, $Q(T(w), x, y)$ can be solved in $O(\log n)$ time and $Q(w, x, y)$ in $O(\log^2 n)$ time. This data structure can be built in $O(m \sqrt{\log n})$ time.
Please note that the above corollary does not mention working space for the construction of the data structure. Therefore, obtaining space-efficient dynamic DFS algorithm requires consideration of the building process, since in the algorithm of Baswana et al. we need to rebuild this data structure periodically. We further consider this in Section 7.

6.3. Range Leftmost Value Query

Next, we show another way to compress the data structure to $(m + o(m)) \log n$ bits. This is achieved by solving the range leftmost (rightmost) value query with a wavelet tree. This query appears in the preliminary version [10] of this paper, where it is used together with the RN (RP) query (Definition 4) to obtain a data structure of $(m + o(m)) \log n$ bits. Although we no longer need this query to halve the required space, as described in Section 6.2, it still gives us another way to compress the data structure. Moreover, this query is of independent interest since its applications go beyond the dynamic DFS algorithm, as described later.
The range leftmost (rightmost) value query is defined as follows.
Definition 5.
An integer sequence $S[1..l]$ is given. Then for any interval $[a, b] \subseteq [1, l]$ and two integers $p \le q$, the range leftmost (rightmost) value query returns the leftmost (rightmost) element among those in $S[a..b]$ with a value not less than $p$ and not more than $q$. If there is no such element, the query returns $\emptyset$. We abbreviate it as RL (RR) query.
This RL (RR) query is a generalization of the query of Claim 3 in the sense that the value of elements we focus varies between [ p , q ] instead of being fixed to just c. Moreover, the RL (RR) query is a generalization of a prevLess query [24], which is the RR query with a = 1 and p = 0 . It is already known that the prevLess query can be efficiently solved with the wavelet tree for S [24].
First we describe the idea of using RL (RR) queries for solving $Q(T(w), x, y)$ and $Q(w, x, y)$. Here we consider the symmetric variant of the ORS (ORP) query, which is the same as the ORS (ORP) query except that it returns one of the points whose x-coordinate is the smallest (largest) within $R$ (we call it the symmetric ORS (ORP) query). The motivation for considering symmetric ORS (ORP) queries comes from transposing the query rectangle of normal ORS (ORP) queries. Since the grid $\mathcal{G}$ is symmetric, the (normal) ORS query on $\mathcal{G}$ with rectangle $R = [x_1, x_2] \times [y_1, y_2]$ is equivalent to the symmetric ORS query on $\mathcal{G}$ with the transposed rectangle $R' = [y_1, y_2] \times [x_1, x_2]$. When transposing the rectangle, we can focus on the lower triangular part $\mathcal{G}_l$ instead of the upper triangular part $\mathcal{G}_u$, as described below.
Let $S_l$ and $B_l$ be an integer sequence and a bit vector constructed from $\mathcal{G}_l$ in the same manner as $S_u$ and $B_u$. $B_l$ enables us to convert between the x-coordinate on $\mathcal{G}_l$ and the position in $S_l$. Then the symmetric ORS query on $\mathcal{G}_l$ with $R = [x_1, x_2] \times [y_1, y_2]$ can be answered by solving the RL query on $S_l$ with $[a, b] = [x'_1, x'_2]$ and $[p, q] = [y_1, y_2]$, where $[x'_1, x'_2]$ is the interval of positions in $S_l$ corresponding to $[x_1, x_2]$. If the RL query returns $\emptyset$, the answer for the symmetric ORS query is also $\emptyset$. Otherwise the answer can be obtained by converting the returned position in $S_l$ to the x-coordinate on $\mathcal{G}_l$ by $B_l$. In this reduction “leftmost” means “the smallest x-coordinate”. Similarly, a symmetric ORP query on $\mathcal{G}_l$ can be converted to an RR query on $S_l$. With this conversion from the symmetric ORS (ORP) query to the RL (RR) query, we can prove the following lemma, which is a lower triangular version of Lemma 6.
Lemma 7.
Any ORS (ORP) query which appears in the reduction of Lemma 3 can be answered by solving one RL (RR) query, or by solving one rank and one select query, on $S_l$.
Proof. 
From Claim 2, the rectangles of the ORS queries corresponding to $Q(T(w), x, y)$ are all inside the upper triangular part. Therefore, when we transpose them, they are inside the lower triangular part. Since the symmetric ORS query can be converted into the RL query as described above, this suffices for solving $Q(T(w), x, y)$. In solving $Q(w, x, y)$, there may be query rectangles inside the upper triangular part. However, they are solved by one rank and one select query on $S_l$, in the same way as Claim 3. ☐
Next we describe how to solve RL (RR) queries with a wavelet tree. We prove the following lemma, by referring to the method to solve prevLess [24] by a wavelet tree.
Lemma 8.
An integer sequence $S[1..l]$ is given. Suppose that $S[i] \in [0, \sigma - 1]$ for all $i = 1, \ldots, l$. Then using the wavelet tree for $S$, both RL and RR queries can be answered in $O(\log \sigma)$ time each.
Proof. 
We show this with the pseudocode given in Algorithm 2. First we show the correctness. In the arguments of RL, $v$ is the node of the wavelet tree currently being visited, and $[\alpha, \omega]$ is the interval of symbols that $v$ corresponds to (i.e., $[\alpha, \omega] = [l_v, r_v]$). We claim the following, which ensures that the answer for the RL query can be obtained by calling $\mathrm{RL}(root, [p, q], [a, b], [0, \sigma - 1])$. ☐
Claim 4.
For any node $v$ of the wavelet tree and four integers $a, b, p, q$, calling $\mathrm{RL}(v, [p, q], [a, b], [l_v, r_v])$ yields the position of the answer for the RL query on $S_v$. If the answer for the RL query is $\emptyset$, this function returns 0.
Proof. 
We use induction on the depth of the node $v$. The base case is $[l_v, r_v] \cap [p, q] = \emptyset$ or $[l_v, r_v] \subseteq [p, q]$ (note that one of them must occur when $l_v = r_v$, so if $v$ is a leaf node the base case must happen). In the former case, the answer is obviously $\emptyset$. In the latter case, the answer is $a$, since all elements in $S_v[a..b]$ have values within $[l_v, r_v] \subseteq [p, q]$. In these cases, the answer is correctly returned by lines 2–3 of Algorithm 2.
Now we proceed to the general case. Let $l(v) = \mathrm{left}(v)$, $r(v) = \mathrm{right}(v)$ and $\gamma = \lfloor (l_v + r_v)/2 \rfloor$. Then $[l_{l(v)}, r_{l(v)}] = [l_v, \gamma]$ and $[l_{r(v)}, r_{r(v)}] = [\gamma + 1, r_v]$. Therefore, the elements in $S_v[a..b]$ with a value within $[l_v, \gamma]$ appear in the left child as $S_{l(v)}[a_l..b_l]$, and those with a value within $[\gamma + 1, r_v]$ appear in the right child as $S_{r(v)}[a_r..b_r]$. Here $a_l$, $b_l$, $a_r$ and $b_r$ can be calculated by rank queries on $B_v$: $[a_l, b_l] = [\mathrm{rank}_0(B_v, a - 1) + 1, \mathrm{rank}_0(B_v, b)]$ and $[a_r, b_r] = [\mathrm{rank}_1(B_v, a - 1) + 1, \mathrm{rank}_1(B_v, b)]$. Observe that the element $S_v[i]$ of $S_v$ corresponding to the leftmost element $S_{l(v)}[i']$ in $S_{l(v)}[a_l..b_l]$ with a value within $[p, q]$ is also the leftmost element in $S_v[a..b]$ with a value within $[p, q] \cap [l_v, \gamma]$. Here $i'$ can be obtained by $\mathrm{RL}(l(v), [p, q], [a_l, b_l], [l_v, \gamma])$ (induction hypothesis) and $i = \mathrm{select}_0(B_v, i')$. Similarly, the element $S_v[j]$ of $S_v$ corresponding to the leftmost element $S_{r(v)}[j']$ in $S_{r(v)}[a_r..b_r]$ with a value within $[p, q]$ is also the leftmost element in $S_v[a..b]$ with a value within $[p, q] \cap [\gamma + 1, r_v]$. Again $j'$ can be obtained by $\mathrm{RL}(r(v), [p, q], [a_r, b_r], [\gamma + 1, r_v])$ (induction hypothesis) and $j = \mathrm{select}_1(B_v, j')$. Hence $S_v[\min\{i, j\}]$ is the leftmost element we want. The answer is correctly returned by lines 4–10 of Algorithm 2: lines 7–10 handle the case where the RL function returns 0. ☐
Next we discuss the time complexity when calling RL ( v , [ p , q ] , [ a , b ] , [ l v , r v ] ) . We show that for each level (depth) of the wavelet tree the function visits at most four nodes. At the top level we visit only one node root . At level k, the general case occurs at most two times: it may occur when [ l v , r v ] contains the endpoints of [ p , q ] . Thus, at level k + 1 the RL function is called at most 2 × 2 = 4 times. Since the wavelet tree has depth log σ , RL function is called O ( log σ ) times. For each node, all calculation other than the recursion can be done in O ( 1 ) time, so the overall time complexity is  O ( log σ ) .
The same argument can be applied for the RR query. This completes the proof. ☐
Algorithm 2 Range Leftmost Value by Wavelet Tree
 1: function RL(v, [p, q], [a, b], [α, ω])
 2:    if a > b or [α, ω] ∩ [p, q] = ∅ then return 0
 3:    if [α, ω] ⊆ [p, q] then return a
 4:    γ ← ⌊(α + ω)/2⌋
 5:    i ← RL(left(v), [p, q], [rank_0(B_v, a − 1) + 1, rank_0(B_v, b)], [α, γ])
 6:    j ← RL(right(v), [p, q], [rank_1(B_v, a − 1) + 1, rank_1(B_v, b)], [γ + 1, ω])
 7:    if i ≠ 0 and j ≠ 0 then return min(select_0(B_v, i), select_1(B_v, j))
 8:    else if i ≠ 0 then return select_0(B_v, i)
 9:    else if j ≠ 0 then return select_1(B_v, j)
10:    else return 0
11: end function
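Algorithm 2 can likewise be turned into a small runnable sketch. As an illustrative simplification it uses a pointer-based wavelet tree with explicit prefix counts and position lists in place of succinct bit vectors, and all names are hypothetical.

```python
class WaveletNode:
    """Pointer-based wavelet tree node over the symbol interval [lo, hi]."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return
        mid = (lo + hi) // 2
        self.zeros = [0]                 # zeros[i] = rank_0(B_v, i)
        self.pos0, self.pos1 = [], []    # pos0[k-1] = select_0(B_v, k), same for 1
        for idx, x in enumerate(seq, 1):
            bit = x > mid
            self.zeros.append(self.zeros[-1] + (0 if bit else 1))
            (self.pos1 if bit else self.pos0).append(idx)
        self.left = WaveletNode([x for x in seq if x <= mid], lo, mid)
        self.right = WaveletNode([x for x in seq if x > mid], mid + 1, hi)


def range_leftmost(v, p, q, a, b):
    """Leftmost position in S_v[a..b] whose value lies in [p, q]; 0 if none."""
    if a > b or v.hi < p or q < v.lo:    # empty range or disjoint symbol interval
        return 0
    if p <= v.lo and v.hi <= q:          # every symbol of this node qualifies
        return a
    i = range_leftmost(v.left, p, q, v.zeros[a - 1] + 1, v.zeros[b])
    j = range_leftmost(v.right, p, q, a - v.zeros[a - 1], b - v.zeros[b])
    candidates = []
    if i != 0:
        candidates.append(v.pos0[i - 1])  # map back with select_0
    if j != 0:
        candidates.append(v.pos1[j - 1])  # map back with select_1
    return min(candidates, default=0)


S = [3, 1, 4, 1, 5, 2, 6]                # S[1..7], values in [0, 7]
root = WaveletNode(S, 0, 7)
print(range_leftmost(root, 2, 4, 1, 7), range_leftmost(root, 5, 6, 1, 4))
```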
Lemmas 7 and 8 imply another way to achieve Corollary 2.
Finally, we discuss the application of RL (RR) queries beyond the dynamic DFS algorithm. Let us consider ORS and ORP queries on an $l \times l$ grid $\mathcal{S}$ whose placement of points is symmetric. Suppose that there are $d$ points on the diagonal of $\mathcal{S}$ (we call them diagonal points) and $2k$ points in the rest of $\mathcal{S}$. In general, no assumption such as Claim 2 holds for the query rectangles, so with only RN (RP) queries we cannot remove the points within the lower triangular part of $\mathcal{S}$. However, using RL (RR) queries as well as RN (RP) queries, we need to consider only the upper triangular and diagonal parts of $\mathcal{S}$. The idea, which is very similar to the method in the preliminary version [10] of this paper, is as follows. First, for a query rectangle $R = [x_1, x_2] \times [y_1, y_2]$ of the ORS (ORP) query, we solve the corresponding RN (RP) query. Second, for the transposed query rectangle $R' = [y_1, y_2] \times [x_1, x_2]$, we solve the corresponding RL (RR) query, and “transpose” the answer. Finally, we combine these two answers: choose the one with the smaller (larger) y-coordinate. Let $S_{ud}$ and $B_{ud}$ be an integer sequence and a bit vector constructed from the upper triangular and diagonal parts of $\mathcal{S}$ in the same manner as $S_u$ and $B_u$. A wavelet tree for $S_{ud}$ can solve RN, RP, RL and RR queries on $S_{ud}$ and occupies $(k + d + o(k) + o(d)) \log l$ bits, since the diagonal part has $d$ points and the upper triangular part has $k$ points. Because the space required for $B_{ud}$ is negligible, we obtain a space-efficient data structure for solving ORS and ORP queries on $\mathcal{S}$. If $d = 0$, the required space is halved compared with using a wavelet tree directly ($(2k + d + o(k) + o(d)) \log l$ bits).
If $d$ is large, we can further compress the space from $(k + d + o(k) + o(d)) \log l$ bits by treating diagonal points separately. Let $B_d[1..l]$ be a bit vector such that $B_d[i] = 1$ if there exists a point on the coordinates $(i, i)$ and $B_d[i] = 0$ otherwise. For a query rectangle $R = [x_1, x_2] \times [y_1, y_2]$, let $[a, b] = [\max\{x_1, y_1\}, \min\{x_2, y_2\}]$. Then $R$ contains the diagonal cells from $(a, a)$ to $(b, b)$ if $a \le b$, and $R$ contains no diagonal points otherwise. Therefore, if $a \le b$, a diagonal point with the smallest (largest) y-coordinate within $R$ can be obtained by one rank and one select query on $B_d$. Since the wavelet tree no longer retains the diagonal points, the overall space required by the data structures is $(k + o(k)) \log l + l H_0(B_d) + o(l) \le (k + o(k)) \log l + d \log \frac{el}{d} + o(l)$ bits.

7. More Space-Efficient Dynamic DFS

In this section, we show algorithms that solve the dynamic DFS problem space-efficiently. Our algorithm is based on the algorithms of Baswana et al. [4], but compressing its working space requires careful consideration.

7.1. Fault Tolerant DFS

We begin with the algorithm for the fault tolerant DFS problem. Following the original algorithm described in Section 3.1, the important point is that once a data structure for answering $Q(T(w), x, y)$ and $Q(w, x, y)$ is built, the reduced adjacency list $L$ is used and the whole adjacency list of the original graph is no longer needed. Moreover, the information used in the algorithm other than the data structure and the reduced adjacency list takes only $O(n \log n) = o(m) \log n$ bits, as described in Section 5. Since we have already shown in Corollary 2 that the data structure takes $(m + o(m)) \log n$ bits, we have only to consider the size of the reduced adjacency list $L$. From Section 3.1, for any $k$ $(\le n)$ graph updates, the number of edges in $L$ is at most $O(n)$ in the incremental case, and $O(nk \log n)$ in the fully dynamic case. The time complexity of this algorithm follows from Lemma 1 and Corollary 2. The space required by this algorithm is $(m + o(m)) \log n$ bits plus the space required for $L$, that is, $O(m_L \log n)$ bits (with standard linked lists), where $m_L$ is the upper bound on the number of edges in $L$. To sum up, we obtain the following lemma.
Lemma 9.
An undirected graph $G$ and its DFS tree $T$ are given. Then there exists an algorithm such that with $O(m \sqrt{\log n})$ preprocessing time, a DFS tree for the graph obtained by applying any $k$ $(\le n)$ edge insertions to $G$ can be built in $O(n \log n)$ time. This algorithm requires $(m + o(m)) \log n$ bits once the preprocessing is finished. Similarly, there exists an algorithm such that with $O(m \sqrt{\log n})$ preprocessing time, a DFS tree for the graph obtained by applying any $k$ $(\le n)$ updates (vertex/edge insertions/deletions) to $G$ can be built in $O(nk \log^2 n)$ time. This algorithm requires $(m + o(m)) \log n + O(nk \log^2 n)$ bits once the preprocessing is finished.

7.2. Amortized Update Time Dynamic DFS

Next, we focus on the amortized update time dynamic DFS algorithm. In the “amortized time” algorithm described in Section 3.2, besides the fault tolerant DFS itself, we must rebuild the data structure $D$ used to solve the fault tolerant DFS problem, and store the information of up to the last $c$ updates. Here $D$ consists of the bit vector and the wavelet tree described in Section 6. Therefore, we must consider (a) how many edges the reduced adjacency list $L$ may have, (b) how much space is required to store the information of updates, and (c) how to rebuild the data structure space-efficiently. We analyze these issues one by one.
First we consider (a). Since we rebuild $D$ periodically, we solve the fault tolerant DFS problem with at most $c_j$ updates in phase $j$. Therefore, we can obtain an upper bound on the number of edges in $L$. In the incremental case, we can set $f = m\sqrt{\log n}$, $g = \sqrt{\log n}$ and $h = n \log n$ in Lemma 2 (here $\sqrt{f/g} = \sqrt{m} \le n$ holds), and the size of $L$ is bounded by $O(n)$, as described in Section 7.1. In the fully dynamic case, we can set $f = m\sqrt{\log n}$, $g = n \log^2 n$ and $h = 0$, thus the upper bound is $O(n \log n \cdot \sqrt{f/g}) = O(\sqrt{mn} \log^{0.25} n)$, which is $o(m)$ under the assumption $m = \omega(n \log^{0.5} n)$. Hence we can conclude that $L$ takes only $o(m) \log n$ bits in both cases.
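The fully dynamic bound can be expanded step by step. The following is only a worked check, assuming $f = m\sqrt{\log n}$ and $g = n \log^2 n$ are the values plugged into Lemma 2 and that a phase contains about $\sqrt{f/g}$ updates:

```latex
\begin{align*}
n \log n \cdot \sqrt{f/g}
  &= n \log n \cdot \sqrt{\frac{m \sqrt{\log n}}{n \log^{2} n}}
   = n \log n \cdot \sqrt{\frac{m}{n}} \cdot \log^{-3/4} n \\
  &= \sqrt{mn}\, \log^{1/4} n .
\end{align*}
```

This quantity is $o(m)$ exactly when $\sqrt{n} \log^{1/4} n = o(\sqrt{m})$, that is, when $m = \omega(n \log^{1/2} n)$.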
Next we consider (b), which is almost the same as (a). In the incremental case, the number of edges inserted during $c_j$ updates is up to $\sqrt{f/g} = \sqrt{m}$, which is $o(m)$. In the fully dynamic case, the number of edges inserted or deleted during $c_j$ updates is up to $O(n\sqrt{f/g}) = O(\sqrt{mn} / \log^{0.75} n)$, since the insertion or deletion of one vertex involves those of $O(n)$ incident edges. This is smaller than the maximum size of $L$, and thus it does not affect the bound. Please note that in the analysis of (a) and (b), we write $n_j, m_j, f_j, g_j, h_j$ of Section 3.2 as $n, m, f, g, h$ for simplicity.
Finally, we consider (c). Let $L_j$ be the order of vertices in $G_j$ defined by the pre-order traversal of $T_j$, where $G_j$ and $T_j$ are the graph and its reported DFS tree at the beginning of phase $j$ of the amortized update time algorithm, and let $\mathcal{G}_j$ be the grid (like $\mathcal{G}$ in Section 6) constructed from $G_j$ and $L_j$ (our algorithm does not retain $\mathcal{G}_j$, but it makes the description clearer). In addition, let $S_j[1..m_j]$ and $B_j$ be the integer sequence and the bit vector built from the upper (or lower) triangular part of $\mathcal{G}_j$ in the same manner as in Section 6 (where they are written as $S_u$ and $B_u$ (or $S_l$ and $B_l$)). Now we focus on the end of phase $j$, i.e., the time when $T_{j+1}$ is reported. Using these symbols, the rebuilding process at this moment is briefly described as follows. First, the order of vertices $L_{j+1}$ is decided according to the pre-order traversal of $T_{j+1}$, and the information attached to each vertex and edge of $T_{j+1}$ is initialized. Second, a grid $\mathcal{G}_{j+1}$ is considered, and $S_{j+1}$ and $B_{j+1}$ are built. Finally, a wavelet tree $W_{j+1}$ for $S_{j+1}$ is constructed.
The main difficulty in this rebuilding process is that the whole adjacency list is not retained. That is, the information of all edges in $G_{j+1}$ is not explicitly stored at this moment. It is stored in the wavelet tree $W_j$ for $S_j$, the bit vector $B_j$, and the information of the last $c_j$ updates. In our algorithm, we additionally retain the integer sequence $S_j$ during phase $j$. Then the main job at this moment is to construct $S_{j+1}$ and $B_{j+1}$ from $S_j$, $B_j$ and the information of the last $c_j$ updates.
To do this, we also retain during phase $j$ an integer sequence $M_j[0..n_j]$, where $M_j[i]$ is the number of points in the upper (or lower) triangular part of $\mathcal{G}_j$ whose x-coordinate is less than $i$. Then $0 = M_j[0] \le M_j[1] \le \cdots \le M_j[n_j] = m_j$ holds. The y-coordinates of the points whose x-coordinate is $i$ are stored in $S_j[M_j[i]+1..M_j[i+1]]$; we call this subsequence block $i$ of $S_j$. We impose the condition on $S_j$ that each block of $S_j$ is sorted in ascending order. Please note that $S_j$ takes $m_j \log n_j$ bits and $M_j$ takes $O(n_j \log m_j) = o(m_j) \log n_j$ bits.
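As a small illustration of this layout, the sketch below builds $S$ and $M$ for a toy graph and reads one block back. It is only a sketch under assumptions: the names `build_blocks` and `block` are hypothetical, plain Python lists stand in for the packed arrays, and index 0 of `S` is an unused placeholder so that positions match the 1-based indexing of the text.

```python
def build_blocks(n, edges):
    """Build S[1..m] and M[0..n] for the upper triangular part (x > y).

    Hypothetical helper: every undirected edge {u, v} becomes the point
    (x, y) = (max(u, v), min(u, v)); points are sorted by x, then by y.
    """
    points = sorted((max(u, v), min(u, v)) for u, v in edges)
    M = [0] * (n + 1)
    for x, _ in points:
        M[x + 1] += 1              # count the points whose x-coordinate is x
    for i in range(1, n + 1):
        M[i] += M[i - 1]           # now M[i] = number of points with x < i
    S = [0] + [y for _, y in points]   # S[0] is an unused placeholder
    return S, M


def block(S, M, i):
    """Return block i of S, i.e. S[M[i]+1 .. M[i+1]] in 1-based notation."""
    return S[M[i] + 1 : M[i + 1] + 1]
```

With the edges `(0, 1)`, `(0, 2)`, `(2, 3)` on four vertices, block 3 holds the single y-coordinate 2 and block 0 is empty, since vertex 0 has no neighbour with a smaller id.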
Now we describe how to rebuild the data structures space-efficiently at the end of phase $j$. First, destroy $B_j$ and $W_j$. Second, sort the information of the last $c_j$ updates, i.e., the edges inserted or deleted during the last $c_j$ updates, in the same order as $S_j$. This can be performed as follows: (i) convert this information to tuples $(x_i, y_i, d_i)$, where $(x_i, y_i)$ are the vertex ids (according to $L_j$) of the endpoints of the edge with $x_i > y_i$ (or $x_i < y_i$ in the lower triangular case), and $d_i$ is one bit of information that represents whether the edge is inserted or deleted (if there are inserted vertices, they are numbered from $n_j$); (ii) sort the tuples in ascending order of $x_i$, breaking ties in ascending order of $y_i$. Please note that since the number of edges inserted or deleted during $c_j$ updates is $o(m_j)$, the sorting can be performed in linear time (i.e., $o(m_j)$ time) and $o(m_j) \log n_j$ bits of working space by radix sort. Third, create new arrays $S'_{j+1}[1..m_{j+1}]$ and $M'_{j+1}[0..n_{j+1}]$, which retain the information of all edges in $G_{j+1}$ with the vertex numbering from $L_j$. This can be done by simply scanning the tuples and $S_j$ simultaneously and merging them, since they are sorted in the same order. Fourth, destroy $S_j$ and $M_j$, and decide the order $L_{j+1}$ of vertices in $G_{j+1}$ according to the pre-order traversal of $T_{j+1}$. Fifth, create $S_{j+1}[1..m_{j+1}]$ and $M_{j+1}[0..n_{j+1}]$ from $S'_{j+1}$ and $M'_{j+1}$, and destroy $S'_{j+1}$ and $M'_{j+1}$. The details of this step are described later. Finally, build $W_{j+1}$ and $B_{j+1}$ from $M_{j+1}$ and $S_{j+1}$. Then we are ready for phase $(j+1)$.
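The merging step (scanning the sorted update tuples and the old sequence simultaneously) can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the name `merge_updates` is hypothetical, `d = 1` is assumed to mean insertion and `d = 0` deletion, the updates are assumed to be net changes (no edge is both inserted and deleted in the list, and every deleted edge is present in `S`), and the vertex count is assumed not to shrink.

```python
def merge_updates(S, M, updates, n_next):
    """Merge sorted update tuples (x, y, d) into the blocks of S.

    S[1..m] and M[0..n] follow the block layout (S[0] is unused).
    updates must be sorted by (x, y); d = 1 inserts the edge, d = 0 deletes it.
    Returns the merged arrays, still in the old vertex numbering.
    """
    n = len(M) - 1
    S_out = [0]                       # index 0 unused, as in S
    M_out = [0] * (n_next + 1)
    u = 0                             # position in the update list
    for x in range(n_next):
        M_out[x] = len(S_out) - 1     # number of points with x-coordinate < x
        # Old block of x; it is empty for newly inserted vertices (x >= n).
        k, end = (M[x] + 1, M[x + 1]) if x < n else (1, 0)
        while k <= end or (u < len(updates) and updates[u][0] == x):
            has_upd = u < len(updates) and updates[u][0] == x
            if has_upd and (k > end or updates[u][1] <= S[k]):
                _, y, d = updates[u]
                u += 1
                if d == 1:
                    S_out.append(y)   # inserted edge goes to its sorted place
                else:
                    k += 1            # deleted edge: skip the matching S[k]
            else:
                S_out.append(S[k])    # unchanged edge is copied over
                k += 1
    M_out[n_next] = len(S_out) - 1
    return S_out, M_out
```

Because both inputs are sorted by x and then by y, a single pass suffices and every output block remains sorted in ascending order.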
The remaining part is creating $S_{j+1}$ and $M_{j+1}$ from $S'_{j+1}$ and $M'_{j+1}$, the merged arrays that still use the vertex numbering from $L_j$. For simplicity, let $t = j + 1$, and suppose that we focus on the upper triangular part of $\mathcal{G}_t$ as in Section 6.2 (if we focus on the lower triangular part of $\mathcal{G}_t$ as in Section 6.3, we can proceed in almost the same way). First, construct an old-to-new correspondence table $N$ of the vertex numbering: $N[i]$ is the vertex id in $L_t$ of the vertex whose vertex id in $L_j$ is $i$. Then the pseudocode for converting $S'_t$ and $M'_t$ to $S_t$ and $M_t$ is given in Algorithm 3, which uses intermediate arrays $S''_t$ and $M''_t$. Here inc($\cdot$) and dec($\cdot$) stand for the increment and decrement (resp.) of the variable by one. This may seem complicated, but it is equivalent to performing a counting sort twice. In the first part (lines 1–12 of Algorithm 3), we convert the old vertex ids to the new ones, move the points (i.e., edges in $G_t$) into the “lower” triangular part of $\mathcal{G}_t$, and sort them by their x-coordinates. In the second part (lines 14–21), we sort them by their y-coordinates and record their x-coordinates (see line 19) in $S_t$. In this way their x- and y-coordinates are swapped, and finally they are in the upper triangular part of $\mathcal{G}_t$. Things get complicated in these processes because the x-coordinates are stored implicitly in $M'_t$ and $M''_t$ while the y-coordinates are stored explicitly in $S'_t$ and $S''_t$, but the coordinate swapping lets us perform the two counting sorts.
Algorithm 3 Creating S_t[1..m_t] and M_t[0..n_t] from S'_t[1..m_t] and M'_t[0..n_t]
 1: i ← 0, create S''_t[1..m_t] and M''_t[0..n_t] with all elem. 0
 2: for k := 1 to m_t do
 3:    while M'_t[i + 1] < k do inc(i)
 4:    inc(M''_t[min{N[i], N[S'_t[k]]}])
 5: end for
 6: for l := 1 to n_t do M''_t[l] ← M''_t[l] + M''_t[l − 1]
 7: i ← n_t
 8: for k := m_t down to 1 do
 9:    while M'_t[i] ≥ k do dec(i)
10:    S''_t[M''_t[min{N[i], N[S'_t[k]]}]] ← max{N[i], N[S'_t[k]]}
11:    dec(M''_t[min{N[i], N[S'_t[k]]}])
12: end for
13: destroy S'_t and M'_t, create S_t[1..m_t] and M_t[0..n_t] with all elem. 0
14: for k := 1 to m_t do inc(M_t[S''_t[k]])
15: for l := 1 to n_t do M_t[l] ← M_t[l] + M_t[l − 1]
16: i ← n_t
17: for k := m_t down to 1 do
18:    while M''_t[i] ≥ k do dec(i)
19:    S_t[M_t[S''_t[k]]] ← i
20:    dec(M_t[S''_t[k]])
21: end for
22: destroy S''_t and M''_t
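The two counting-sort passes of Algorithm 3 can be traced in executable form. The following Python transcription is a sketch under assumptions: the function name `renumber` and the variable names are hypothetical, Python lists replace the packed arrays (so the space bounds of the text are not reproduced), and index 0 of every `S` array is unused so that indices match the pseudocode.

```python
def renumber(S_old, M_old, N, n_new):
    """Convert the block arrays from the old to the new vertex numbering.

    S_old[1..m], M_old[0..n_old]: merged arrays in the old numbering.
    N[i]: new id of the vertex whose old id is i.
    Returns (S_new, M_new) for the upper triangular part (x > y).
    """
    m, n_old = len(S_old) - 1, len(M_old) - 1

    # Part 1 (lines 1-12): translate ids, sort by x = min(new ids).
    S_mid, M_mid = [0] * (m + 1), [0] * (n_new + 1)
    i = 0
    for k in range(1, m + 1):
        while M_old[i + 1] < k:          # find the old x-coordinate of entry k
            i += 1
        M_mid[min(N[i], N[S_old[k]])] += 1
    for l in range(1, n_new + 1):
        M_mid[l] += M_mid[l - 1]
    i = n_old
    for k in range(m, 0, -1):
        while M_old[i] >= k:
            i -= 1
        a, b = N[i], N[S_old[k]]
        S_mid[M_mid[min(a, b)]] = max(a, b)
        M_mid[min(a, b)] -= 1

    # Part 2 (lines 13-22): sort by y = max(new ids), record x; swaps the axes.
    S_new, M_new = [0] * (m + 1), [0] * (n_new + 1)
    for k in range(1, m + 1):
        M_new[S_mid[k]] += 1
    for l in range(1, n_new + 1):
        M_new[l] += M_new[l - 1]
    i = n_new
    for k in range(m, 0, -1):
        while M_mid[i] >= k:             # find the x-coordinate of entry k
            i -= 1
        S_new[M_new[S_mid[k]]] = i
        M_new[S_mid[k]] -= 1
    return S_new, M_new
```

Both passes scan from the last entry to the first, which makes each counting sort stable; this is why every block of the result is again sorted in ascending order.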
Finally, we consider the time complexity and the required space of this rebuilding process. In this analysis we write $n_j$, $m_j$ simply as $n$, $m$, since from (b) the number of edges changes by only $o(m_j)$ during phase $j$ when $n_j = o(m_j)$. The whole process of rebuilding the data structures takes $O(m\sqrt{\log n})$ time, since building a single wavelet tree takes $O(m\sqrt{\log n})$ time and the other steps take only $O(m)$ time. In the whole process, the data structures or arrays that take $(m + o(m)) \log n$ bits are $W_j$, $S_j$, $S'_{j+1}$ (the merged sequence in the old numbering), $S''_{j+1}$ (the intermediate sequence of Algorithm 3), $S_{j+1}$ and $W_{j+1}$, and at any time the algorithm retains at most two of them. The working space for constructing a wavelet tree in $O(m\sqrt{\log n})$ time [18] is at most $O(m\sqrt{\log m})$ bits. This can be written as $O(m / \sqrt{\log m}) \cdot \log n = o(m) \log n$. Here the integer sequence $S$ has $m$ elements and $n$ symbols ($[0, n-1]$). Since we assume $n = o(m)$ as in Section 2, the pointers that form the complete binary tree shape of the wavelet tree for $S$ require only $O(n \log m) = o(m) \log n$ bits, which is negligible. All other data take only $o(m) \log n$ bits, and thus the space required by the algorithm is $(2m + o(m)) \log n$ bits. Combining these observations with Lemma 2 yields the following theorems. Theorem 2 gives the incremental DFS algorithm, while Theorem 3 gives the fully dynamic one.
Theorem 2.
There exists an algorithm such that, given an undirected graph $G$ and its DFS tree $T$, for any on-line sequence of edge insertions on $G$, a new DFS tree after each insertion can be built in amortized $O(n \log n)$ time per update, with $O(m\sqrt{\log n})$ preprocessing time. This algorithm requires only $(2m + o(m)) \log n$ bits once the data structures for the original graph are built.
Theorem 3.
There exists an algorithm such that, given an undirected graph $G$ and its DFS tree $T$, for any on-line sequence of graph updates on $G$, a new DFS tree after each update can be built in amortized $O(\sqrt{mn} \log^{1.25} n)$ time per update in the fully dynamic setting, with $O(m\sqrt{\log n})$ preprocessing time. This algorithm requires only $(2m + o(m)) \log n$ bits under $m = \omega(n \log^{0.5} n)$ once the data structures for the original graph are built.

7.3. Worst-Case Update Time Dynamic DFS

Finally, we consider the worst-case update time algorithm for the dynamic DFS, following the “worst-case time” algorithm described in Section 3.2. To implement it space-efficiently, we must again consider (a), (b) and (c) described in Section 7.2, but the arguments for (a) and (b) are almost the same as before. In the worst-case time algorithm, we solve the fault tolerant DFS problem with at most $c_{j-1} + c_j$ updates and thus store the information of the last $c_{j-1} + c_j$ updates. Therefore, the space required for the reduced adjacency list and for the information of updates is multiplied by some constant, which is absorbed in the big-O notation.
Therefore, we have only to consider (c). Let $D_j$ be the pair of the bit vector $B_j$ and the wavelet tree $W_j$. During phase 0, $D_0$ is used to perform fault tolerant DFS and no rebuilding of data structures is needed. During phase $j$ ($\ge 1$), $D_{j-1}$ is used and the following processes are done gradually: first destroy $D_{j-2}$ (this is not needed for phase 1), and then build $M_j$, $S_j$ and $D_j$ from $M_{j-1}$, $S_{j-1}$ in the same way as in Section 7.2. At the end of phase $j$ there exist $D_{j-1}$, $M_j$, $S_j$ and $D_j$, and we can continue to the next phase $(j+1)$.
Finally, we consider how much space is needed to implement this algorithm. In phase $j$, $D_{j-1}$ takes $(m + o(m)) \log n$ bits, and rebuilding the data structures requires at most $(2m + o(m)) \log n$ bits as described in Section 7.2. Therefore, the total required space is $(3m + o(m)) \log n$ bits. These results are summarized in the following theorems.
Theorem 4.
There exists an algorithm such that, given an undirected graph $G$ and its DFS tree $T$, for any on-line sequence of edge insertions on $G$, a new DFS tree after each insertion can be built in worst-case $O(n \log n)$ time per update, with $O(m\sqrt{\log n})$ preprocessing time. This algorithm requires only $(3m + o(m)) \log n$ bits once the data structures for the original graph are built.
Theorem 5.
There exists an algorithm such that, given an undirected graph $G$ and its DFS tree $T$, for any on-line sequence of graph updates on $G$, a new DFS tree after each update can be built in worst-case $O(\sqrt{mn} \log^{1.25} n)$ time per update in the fully dynamic setting, with $O(m\sqrt{\log n})$ preprocessing time. This algorithm requires only $(3m + o(m)) \log n$ bits under $m = \omega(n \log^{0.5} n)$ once the data structures for the original graph are built.

8. Applications

In this section, we show applications of our fully dynamic DFS algorithms to dynamic connectivity, dynamic biconnectivity, and dynamic 2-edge-connectivity. These applications are also described in the full version of the paper of Baswana et al. [6]. We basically follow their description, but we must additionally consider the space required to compute them. Moreover, for dynamic biconnectivity and 2-edge-connectivity, some additional considerations are needed to keep the update time the same as in Theorems 1 and 5.

8.1. Dynamic Connectivity

For the dynamic connectivity problem, we deal with an on-line sequence of graph updates and connectivity queries. A query takes two vertices as input and asks whether these two vertices are in the same connected component. This can be done easily in the following way. For each graph update, perform dynamic DFS and obtain a new DFS tree $T^*$ rooted at the virtual vertex $r$. By removing $r$ from $T^*$, $T^*$ becomes a forest, each tree of which is a spanning tree of one connected component. Then, by simply traversing $T^*$ from $r$, we can number the connected components of $G$ and attach to each vertex $v$ in $G$ the id of the connected component that $v$ belongs to. A query can be answered by simply comparing the connected component ids of the two vertices: they are connected if the ids are the same, and disconnected otherwise. Since traversing $T^*$ takes $O(n)$ time and the additional required space is only $O(n \log n) = o(m) \log n$ bits, these operations do not violate the update time and space complexity of the dynamic DFS algorithms. Since the initial DFS tree can be obtained in $O(m + n)$ time, which is absorbed in the preprocessing time, we obtain the following theorem.
Theorem 6.
Given an undirected graph $G$, there exists an algorithm such that, with $O(m\sqrt{\log n})$ preprocessing time, for any on-line sequence of graph updates (edge/vertex insertion/deletion) and connectivity queries, each update can be processed in worst-case $O(\sqrt{mn} \log^{0.75+\varepsilon} n)$ time ($O(\sqrt{mn} \log^{1.25} n)$ time) and each query can be answered in worst-case $O(1)$ time. This algorithm requires $O(m \log n)$ bits ($(3m + o(m)) \log n$ bits) once the preprocessing is finished.
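The component numbering behind the connectivity query can be sketched as follows. This is a minimal sketch, assuming the DFS tree $T^*$ is available as a parent map with a virtual root; the names `label_components` and `connected` are hypothetical and not part of the paper.

```python
def label_components(parent, r):
    """Attach a connected-component id to every vertex of the DFS forest.

    parent[v] is the parent of v in T*; the virtual root r has parent None.
    Each child subtree of r is a spanning tree of one connected component.
    """
    children = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)
    comp = {}
    for cid, top in enumerate(children[r]):
        stack = [top]                 # iterative traversal of one subtree
        while stack:
            v = stack.pop()
            comp[v] = cid
            stack.extend(children[v])
    return comp


def connected(comp, u, v):
    """Answer a connectivity query in O(1) time by comparing the two ids."""
    return comp[u] == comp[v]
```

The labels are recomputed in $O(n)$ time after every update, which is why a query reduces to a single comparison.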

8.2. Dynamic Biconnectivity/2-Edge-Connectivity

For the dynamic biconnectivity (2-edge-connectivity) problems, we first formally define the problem. A set S of vertices in an undirected graph G is called a biconnected component iff it is the maximal set such that the removal of any one vertex in S keeps S connected. Similarly, a set S of vertices is said to be a 2-edge-connected component iff it is the maximal set such that the removal of any one edge whose endpoints are both in S keeps S connected. The biconnectivity (2-edge-connectivity) query takes two vertices as an input and asks whether these two vertices are in the same biconnected (2-edge-connected) component or not. The goal of the dynamic biconnectivity (2-edge-connectivity) problem is to design an algorithm which can process any on-line sequence of graph updates and biconnectivity (2-edge-connectivity) queries.
The concepts related to biconnectivity and 2-edge-connectivity are articulation points and bridges. A vertex v (an edge e) in G is called an articulation point (a bridge) iff the removal of v (e) increases the number of connected components in G. Then we can say that for any DFS tree T of G, two vertices u and v are in the same biconnected component iff the path from u to v in T (excluding u and v themselves) includes no articulation points. Similarly, two vertices u and v are in the same 2-edge-connected component iff the path from u to v in T contains no bridges. Therefore, now we can reduce the dynamic biconnectivity (2-edge-connectivity) problem to the following problem: to design an algorithm which can enumerate, for any on-line sequence of updates on G, all articulation points and bridges after each update.
In the static setting, the articulation points and bridges can also be listed by DFS. Given a connected undirected graph $G$ and its DFS tree $T$, we first number the vertices from $0$ to $n - 1$ by the pre-order traversal of $T$; the id of a vertex $v$ is denoted by $g(v)$. Then the high number $h(v)$ of a vertex $v$ is defined by $\min\{g(w) \mid \text{there is at least one edge in } G \text{ between } w \text{ and } T(v)\}$. A vertex $v$ is an articulation point iff $v$ is the root of $T$ and has multiple children, or $v$ is a non-root vertex of $T$ and has at least one child $w$ with $g(v) = h(w)$. Similarly, a tree edge $e = (v, w)$ ($v$ is the parent of $w$) is a bridge iff $h(w) = g(v)$, and $h(x) = g(w)$ for all children $x$ of $w$. Therefore, if we can calculate $h(\cdot)$ for all vertices, we can detect all articulation points and bridges in $O(n)$ time by simply traversing $T$.
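The static criteria can be tried out on a small graph. The sketch below is only illustrative: it assumes a simple connected graph given as an adjacency dictionary, builds its own DFS tree recursively (so it is suited to small inputs only), and uses the hypothetical name `articulation_points_and_bridges`.

```python
def articulation_points_and_bridges(adj, root):
    """List articulation points and bridges from the high numbers h(v).

    adj maps each vertex to its neighbours (simple, connected graph).
    g(v) is the pre-order id; h(v) is the smallest g(w) over all w that
    are adjacent to some vertex of the subtree T(v).
    """
    g, parent, children, order = {}, {root: None}, {}, []

    def dfs(v):
        g[v] = len(order)
        order.append(v)
        children[v] = []
        for w in adj[v]:
            if w not in g:
                parent[w] = v
                children[v].append(w)
                dfs(w)

    dfs(root)

    h = {}
    for v in reversed(order):          # children are handled before parents
        best = min((g[w] for w in adj[v]), default=g[v])
        for c in children[v]:
            best = min(best, h[c])
        h[v] = best

    points, bridges = set(), set()
    for v in order:
        if parent[v] is None:
            if len(children[v]) > 1:   # root with multiple children
                points.add(v)
        elif any(h[w] == g[v] for w in children[v]):
            points.add(v)
        for w in children[v]:
            if h[w] == g[v] and all(h[x] == g[w] for x in children[w]):
                bridges.add((v, w))    # tree edge (parent, child)
    return points, bridges
```

On a path 0-1-2 with a triangle 2-3-4 attached at vertex 2, the two path edges are bridges and vertices 1 and 2 are articulation points.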
Now the problem we want to solve is to design an algorithm such that, given an undirected graph $G$, it can compute a new DFS tree and $h(\cdot)$ for the graph obtained by applying any $k$ ($\le n$) graph updates to $G$. Baswana et al. [6] proposed an efficient method for solving this by modifying the fault tolerant DFS algorithm. Before explaining it, let us recall the fault tolerant DFS algorithm. We stated in Section 3.1 that during the construction of a new DFS tree $T^*$, when a path or a subtree $x \in \mathcal{P} \cup \mathcal{T}$ (derived from the DTP) is visited, an ancestor-descendant path $p^*$ is extracted from $x$ and the remaining part $x \setminus p^*$ is pushed back to $\mathcal{P}$ if originally $x \in \mathcal{P}$, or to $\mathcal{T}$ otherwise. Let $P_t$ ($P_p$) be the set of these paths extracted from $\mathcal{T}$ ($\mathcal{P}$). The algorithm in Section 3.1 ensures that $|P_p| \le k \log n$. For an extracted path $p^* \in P_t \cup P_p$, let $u(p^*)$ and $l(p^*)$ be the endpoints of $p^*$, where $u(p^*)$ is an ancestor of $l(p^*)$ in the new DFS tree $T^*$.
We describe their method to calculate $h(\cdot)$. They use the query $Q(w, x, y)$ defined in Definition 2. First, compute the initial DTP in the same way as in fault tolerant DFS, and for each vertex $v$ in some subtree $\tau \in \mathcal{T}$, calculate the highest ancestor $z_v$ among the vertices $z \in \tau$ such that there is an edge $(v, z)$ in the updated graph. This is calculated by an ORS query on $\mathcal{G}$ with $R = [f(v), f(v)] \times [t_b, t_e]$, where $[t_b, t_e]$ is the interval that the vertices of $\tau$ occupy in the (old) vertex numbering $f(\cdot)$. Second, obtain a new DFS tree $T^*$ by the fault tolerant DFS algorithm. While constructing $T^*$, simultaneously number the vertices to get the (new) vertex numbering $g(\cdot)$. Third, for each vertex $v$ except $r$, attach an integer $H(v)$ which is initialized to some constant larger than any vertex number $g(\cdot)$, e.g., the number of vertices. Fourth, for each inserted edge $(v, w)$, update $H(v)$ by $g(w)$ and $H(w)$ by $g(v)$. Here for a vertex $v$ and an integer $k$, “update $H(v)$ by $k$” means substituting $\min\{H(v), k\}$ for $H(v)$. Finally, the following operations are performed.
  1. For each vertex $v$ and each path $p^* \in P_p$, solve $Q(v, u(p^*), l(p^*))$ to get an edge $(v, w)$ and update $H(v)$ by $g(w)$.
  2. For each vertex $v$ and each path $p^* \in P_p$, solve $Q(v, l(p^*), u(p^*))$ to get $(v, w)$ and update $H(w)$ by $g(v)$.
  3. For each vertex $v$ that is in some subtree $\tau \in \mathcal{T}$ initially (i.e., when the initial DTP is calculated), let $p_v^* \in P_t$ be the path which contains $v$. Then solve $Q(v, u(p_v^*), l(p_v^*))$ to get $(v, w)$ and update $H(v)$ by $g(w)$.
  4. For each vertex $v$ that is in some subtree $\tau \in \mathcal{T}$ initially, let $p_z^* \in P_t$ be the path which contains $z_v$. Then solve $Q(v, u(p_z^*), l(p_z^*))$ to get $(v, w)$ and update $H(v)$ by $g(w)$.
Baswana et al. [6] show that after performing these operations, $h(\cdot)$ is given by $h(v) = \min\{H(w) \mid w \text{ is } v \text{ itself or a descendant of } v\}$. Thus, after calculating $H(\cdot)$, $h(\cdot)$ can be computed in $O(n)$ time by simply traversing the new DFS tree $T^*$.
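The final aggregation from $H(\cdot)$ to $h(\cdot)$ is a subtree minimum and can be sketched in a few lines. The sketch assumes that the pre-order of $T^*$ and a parent map are at hand; the name `high_numbers` is hypothetical.

```python
def high_numbers(order, parent, H):
    """Compute h(v) = min of H(w) over v and all descendants w of v.

    order is the pre-order of the tree, parent[v] the parent of v
    (None for the root), and H the per-vertex values to aggregate.
    """
    h = dict(H)
    # In reverse pre-order every vertex is visited after all of its
    # descendants, so its value is final when it is pushed to the parent.
    for v in reversed(order):
        p = parent[v]
        if p is not None:
            h[p] = min(h[p], h[v])
    return h
```

Each vertex is touched once, which matches the $O(n)$ traversal time stated above.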
We now apply their method to our setting. First we consider the time complexity. Apart from operations 1 to 4, we issue $O(n)$ ORS queries, perform fault tolerant DFS, and scan all inserted edges. The most time-consuming part among them is fault tolerant DFS, which takes $O(f \cdot nk \log n)$ time when an ORS (ORP) query takes $O(f)$ time (Lemmas 1 and 3). Operations 3 and 4 solve $Q$ $n$ times, so they take $O(f \cdot n \log n)$ time, which is absorbed. However, operations 1 and 2 solve $Q$ $O(nk \log n)$ times since $|P_p| = O(k \log n)$, and therefore they seem to take $O(f \cdot nk \log^2 n)$ time, which is larger than the time for fault tolerant DFS. This is because the vertex query $Q(w, x, y)$ is solved $O(\log n)$ times slower than the subtree query $Q(T(w), x, y)$. Here we show the following lemma, which implies that they take only $O(f \cdot nk \log n)$ time.
Lemma 10.
Operations 1 and 2 can be performed by solving $O(nk \log n)$ ORS (ORP) queries in total.
Proof. 
In solving $Q(w, x, y)$, we solve ORS (ORP) queries with $R = [f(w), f(w)] \times [a_i, b_i]$ ($i = 1, \ldots, k$), where $[a_1, b_1], \ldots, [a_k, b_k]$ are the intervals that $path(x, y)$ occupies in the (old) vertex numbering $f(\cdot)$. For $p^* \in P_t \cup P_p$, let $k(p^*)$ be the number of intervals of vertex ids into which $p^*$ is divided. Then it suffices to show that $\sum_{p^* \in P_t \cup P_p} k(p^*) = O(k \log n)$, since for each vertex $v$, $Q(v, u(p^*), l(p^*))$ and $Q(v, l(p^*), u(p^*))$ are solved for all $p^* \in P_t \cup P_p$ by $\sum_{p^* \in P_t \cup P_p} k(p^*)$ ORS (ORP) queries. Let us recall that the union of all paths in $P_p$ equals the union of the paths in $\mathcal{P}$, the set of ancestor-descendant paths of the initial DTP. Let $S$ be the set of vertices of the union of all paths in $P_p$. Since $|\mathcal{P}| \le k$ (see Definition 1), $S$ occupies at most $N_I = O(k \log n)$ intervals. Thus, even if $S$ is divided into $|P_p| = O(k \log n)$ paths, the number of intervals to consider is at most $N_I + |P_p| = O(k \log n)$. Hence $\sum_{p^* \in P_t \cup P_p} k(p^*) = O(k \log n)$. ☐
In this way, the time complexity is $O(f \cdot nk \log n)$.
Next we consider the required space, which is easy to bound. The key point is again that the whole adjacency list of $G$ is not needed thanks to the use of the query $Q$. In these processes, we must store the endpoints of each path in $P_t \cup P_p$. For each vertex $v$, we must retain $g(v)$, $H(v)$, $h(v)$, $z_v$, a pointer to the path in $P_t \cup P_p$ that contains $v$, and so on. However, these sum up to only $O(n)$ words of information, and thus take only $O(n \log n) = o(m) \log n$ bits. Therefore, we have proved the following.
Lemma 11.
Given an undirected graph $G$ and its DFS tree $T$, there exists an algorithm such that, with $O(m\sqrt{\log n})$ preprocessing time, the articulation points and bridges of the graph obtained by applying any $k$ ($\le n$) updates (vertex/edge insertions/deletions) to $G$ can all be enumerated in $O(nk \log^{1+\varepsilon} n)$ ($O(nk \log^2 n)$) time. This algorithm requires $O(m \log n)$ ($(m + o(m)) \log n + O(nk \log^2 n)$) bits of space once the preprocessing is finished.
Using Lemma 11, we propose an efficient fully dynamic biconnectivity/2-edge-connectivity algorithm that supports vertex updates. For 2-edge-connectivity, it can be observed from the definition that each vertex belongs to exactly one 2-edge-connected component. With the relation between 2-edge-connectivity and bridges, we can number the 2-edge-connected components of $G$ and attach to each vertex $v$ the id of the 2-edge-connected component that $v$ belongs to, by simply traversing $T^*$. Then the 2-edge-connectivity query can be answered in the same way as the connectivity query. For biconnectivity, we first build an LCA data structure [20] for $T^*$ after each update. We also compute, for each vertex $v$, the lowest ancestor $a(v)$ among the articulation points (excluding $v$ itself). These can all be computed in $O(n)$ time by simply traversing $T^*$ after each update. Then, for a biconnectivity query with two input vertices $v, w$, first get $x = LCA(v, w)$ in $T^*$ by querying the LCA data structure. The path from $v$ to $w$ in $T^*$ consists of two ancestor-descendant paths in $T^*$, from $v$ to $x$ and from $x$ to $w$. We can check whether these paths contain articulation points by comparing $g(a(v))$ to $g(x)$ and $g(a(w))$ to $g(x)$. For example, if the comparison of $g(x)$ with $g(a(v))$ shows that $a(v)$ lies on the path from $v$ to $x$ in $T^*$, this path contains at least one articulation point; otherwise it contains none. For one biconnectivity query the overall time is $O(1)$, including the LCA query. The additional space required is also $O(n \log n) = o(m) \log n$ bits because the LCA data structure takes only $O(n \log n)$ bits. Therefore, we obtain the following theorem.
Theorem 7.
Given an undirected graph $G$, there exists an algorithm such that, with $O(m\sqrt{\log n})$ preprocessing time, for any on-line sequence of graph updates (edge/vertex insertion/deletion), biconnectivity queries and 2-edge-connectivity queries, each update can be processed in worst-case $O(\sqrt{mn} \log^{0.75+\varepsilon} n)$ time ($O(\sqrt{mn} \log^{1.25} n)$ time) and each query can be answered in worst-case $O(1)$ time. This algorithm requires $O(m \log n)$ bits ($(3m + o(m)) \log n$ bits) once the preprocessing is finished.
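The numbering of 2-edge-connected components by a traversal of $T^*$ can be sketched as follows. This is a sketch under assumptions: the bridges are taken as already enumerated and given as (parent, child) tree edges, the vertices are visited in pre-order, and the name `label_2ecc` is hypothetical. With a virtual root $r$, each child of $r$ would start a new component in the same way.

```python
def label_2ecc(order, parent, bridges):
    """Attach a 2-edge-connected component id to every vertex.

    order is the pre-order of the DFS tree, parent[v] the parent of v
    (None for the root), bridges a set of tree edges (parent, child).
    A vertex opens a new component iff the tree edge to its parent is a
    bridge; otherwise it inherits the component of its parent.
    """
    comp, next_id = {}, 0
    for v in order:                    # the parent is labelled before v
        p = parent[v]
        if p is None or (p, v) in bridges:
            comp[v] = next_id
            next_id += 1
        else:
            comp[v] = comp[p]
    return comp
```

A 2-edge-connectivity query for two vertices then compares their ids, exactly as in the connectivity case.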

Author Contributions

Conceptualization, K.N.; methodology, K.N. and K.S.; software, K.N.; validation, K.N. and K.S.; formal analysis, K.N. and K.S.; investigation, K.N.; resources, K.N.; data curation, K.N.; writing—original draft preparation, K.N.; writing—review and editing, K.N. and K.S.; visualization, K.N.; supervision, K.S.; project administration, K.S.; funding acquisition, K.S.

Funding

This work was supported by JST CREST Grant Number JPMJCR1402, Japan.

Conflicts of Interest

The first author is now working at NTT laboratories. This work was done while the first author was at the University of Tokyo.

Abbreviations

The following abbreviations are used in this manuscript:
DFS: Depth-First Search
ORS query: Orthogonal Range Successor query
ORP query: Orthogonal Range Predecessor query
HL decomposition: Heavy-Light decomposition
LCA query: Lowest Common Ancestor query

References

  1. Franciosa, P.G.; Gambosi, G.; Nanni, U. The incremental maintenance of a Depth-First-Search tree in directed acyclic graphs. Inf. Process. Lett. 1997, 61, 113–120.
  2. Baswana, S.; Choudhary, K. On dynamic DFS tree in directed graphs. In Proceedings of the 40th International Symposium on Mathematical Foundations of Computer Science (MFCS), Milan, Italy, 24–28 August 2015; Volume 9235, pp. 102–114.
  3. Baswana, S.; Khan, S. Incremental algorithm for maintaining DFS tree for undirected graphs. In Proceedings of the 41st International Colloquium on Automata, Languages, and Programming (ICALP), Copenhagen, Denmark, 8–11 July 2014; Volume 8572, pp. 138–149.
  4. Baswana, S.; Chaudhury, S.R.; Choudhary, K.; Khan, S. Dynamic DFS in undirected graphs: Breaking the O(m) barrier. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Arlington, VA, USA, 10–12 January 2016; pp. 730–739.
  5. Chen, L.; Duan, R.; Wang, R.; Zhang, H. Improved algorithms for maintaining DFS tree in undirected graphs. arXiv, 2016; arXiv:1607.04913.
  6. Baswana, S.; Chaudhury, S.R.; Choudhary, K.; Khan, S. Dynamic DFS in undirected graphs: Breaking the O(m) barrier. arXiv, 2015; arXiv:1502.02481.
  7. Chen, L.; Duan, R.; Wang, R.; Zhang, H.; Zhang, T. An improved algorithm for incremental DFS tree in undirected graphs. In Proceedings of the 16th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT), Malmö, Sweden, 18–20 June 2018; pp. 16:1–16:12.
  8. Baswana, S.; Goel, A.; Khan, S. Incremental DFS algorithms: A theoretical and experimental study. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), New Orleans, LA, USA, 7–10 January 2018; pp. 53–72.
  9. Khan, S. Near optimal parallel algorithms for dynamic DFS in undirected graphs. In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), Washington, DC, USA, 24–26 July 2017; pp. 283–292.
  10. Nakamura, K.; Sadakane, K. A space-efficient algorithm for the dynamic DFS problem in undirected graphs. In Proceedings of the 11th International Conference and Workshops on Algorithms and Computation (WALCOM), Hsinchu, Taiwan, 29–31 March 2017; Volume 10167, pp. 295–307.
  11. Baswana, S.; Gupta, S.K.; Tulsyan, A. Fault tolerant and fully dynamic DFS in undirected graphs: Simple yet efficient. arXiv, 2018; arXiv:1810.01726.
  12. Grossi, R.; Gupta, A.; Vitter, J.S. High-order entropy-compressed text indexes. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Baltimore, MD, USA, 12–14 January 2003; pp. 841–850.
  13. Frigioni, D.; Italiano, G.F. Dynamically switching vertices in planar graphs. Algorithmica 2000, 28, 76–103.
  14. Chan, T.M.; Pătraşcu, M.; Roditty, L. Dynamic connectivity: Connecting to networks and geometry. SIAM J. Comput. 2011, 40, 333–349.
  15. Jacobson, G. Space-efficient static trees and graphs. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science (FOCS), Research Triangle Park, NC, USA, 30 October–1 November 1989; pp. 549–554.
  16. Clark, D. Compact Pat Trees. Ph.D. Thesis, University of Waterloo, Waterloo, ON, Canada, 1996.
  17. Raman, R.; Raman, V.; Rao, S.S. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algorithms 2007, 3, 43.
  18. Munro, J.I.; Nekrich, Y.; Vitter, J.S. Fast construction of wavelet trees. Theor. Comput. Sci. 2016, 638, 91–97.
  19. Sleator, D.D.; Tarjan, R.E. A data structure for dynamic trees. J. Comput. Syst. Sci. 1983, 26, 362–391.
  20. Bender, M.A.; Farach-Colton, M. The LCA problem revisited. In Proceedings of the 4th Latin American Symposium on Theoretical Informatics (LATIN), Punta del Este, Uruguay, 10–14 April 2000; Volume 1776, pp. 88–94.
  21. Belazzougui, D.; Puglisi, S.J. Range predecessor and Lempel-Ziv parsing. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Arlington, VA, USA, 10–12 January 2016; pp. 2053–2071.
  22. Yu, C.C.; Hon, W.K.; Wang, B.F. Improved data structures for the orthogonal range successor problem. Comput. Geom. 2011, 44, 148–159.
  23. Navarro, G. Wavelet trees for all. J. Discret. Algorithms 2014, 25, 2–20.
  24. Kreft, S.; Navarro, G. Self-indexing based on LZ77. In Proceedings of the 22nd Annual Symposium on Combinatorial Pattern Matching (CPM), Palermo, Italy, 27–29 June 2011; Volume 6661, pp. 41–54.
Figure 1. The configurations of $T(w)$ and $\mathit{path}(x, y)$ in $T$ that can appear in $Q(T(w), x, y)$.
Figure 2. The configuration of $\mathit{path}(x, y)$ and $w$ in $T$ that can appear in $Q(w, x, y)$ and cannot appear in $Q(T(w), x, y)$.
Figure 3. An example of an undirected graph with vertex numbering, its adjacency list, and the corresponding rank space $R$.
Figure 4. (Left) An example of the grid $G$ and its corresponding array $S$ and bit vector $B$. Note that this grid corresponds to the graph in Figure 3. (Right) The upper triangular part $G_u$ and its corresponding array $S_u$ and bit vector $B_u$.
Table 1. Comparison of required space and worst-case update time for dynamic DFS algorithms.

| Algorithm | Space (bits) | Worst-case update time (fully dynamic) | Worst-case update time (incremental) |
|---|---|---|---|
| [4] | $O(m \log^{2} n)$ | $O(\sqrt{mn} \log^{2.5} n)$ | $O(n \log^{3} n)$ |
| [5,7] | $O(m \log^{2} n)$ | $O(\sqrt{mn} \log^{1.5} n)$ | $O(n)$ |
| A | $O(m \log n)$ | $O(\sqrt{mn} \log^{0.75+\varepsilon} n)$ | $O(n \log n)$ * |
| B | $(3m + o(m)) \log n$ | $O(\sqrt{mn} \log^{1.25} n)$ | $O(n \log n)$ |
| [11] | $O(m \log n)$ | $O(\sqrt{mn} \log^{0.5} n)$ | |

* If $m = O(n^{2} / \log n)$, this can be reduced to $O(n \log^{\varepsilon} n)$.
