A Survey on Approximation in Parameterized Complexity: Hardness and Algorithms

Parameterization and approximation are two popular ways of coping with NP-hard problems. More recently, the two have also been combined to derive many interesting results. We survey developments in the area both from the algorithmic and hardness perspectives, with emphasis on new techniques and potential future research directions.


Introduction
In their seminal papers of the mid 1960s, Cobham [1] and Edmonds [2] independently phrased what is now known as the Cobham-Edmonds thesis. It states that an optimization problem is feasibly solvable if it admits an algorithm with the following two properties:

1. Accuracy: the algorithm should always compute the best possible (optimum) solution.

2. Efficiency: the runtime of the algorithm should be polynomial in the input size n.
Shortly after the Cobham-Edmonds thesis was formulated, the development of the theory of NP-hardness and reducibility identified a whole plethora of problems that are seemingly intractable, i.e., for which algorithms with the above two properties do not seem to exist. Even though the reasons for this phenomenon remain elusive up to this day, this has not hindered the development of algorithms for such problems. To obtain an algorithm for an NP-hard problem, at least one of the two properties demanded by the Cobham-Edmonds thesis needs to be relaxed. Ideally, the properties are relaxed as little as possible, in order to stay close to the notion of feasible solvability suggested by the thesis.
A very common approach is to relax the accuracy condition, which means aiming for approximation algorithms [3,4]. The idea here is to use only polynomial time to compute an α-approximation, i.e., a solution that is at most a factor α times worse than the optimum solution obtainable for the given input instance. Such an algorithm may also be randomized, i.e., there is either a high probability that the output is an α-approximation, or the runtime is polynomial in expectation.
In a different direction, several relaxations of the efficiency condition have also been proposed. Popular among these is the notion of parameterized algorithms [5,6]. Here the input comes together with some parameter k ∈ N, which describes some property of the input and can be expected to be small bound framework for Turing kernels [19], and the question of approximate kernels for problems that do not even admit Turing kernels is fairly natural to ask. However we skip this discussion for the sake of brevity.
Finally, note that in the literature there is another notion of approximate kernels, called α-fidelity kernelization [20], which is different from the one mentioned above. Essentially, an α-fidelity kernel is a polynomial-time preprocessing procedure such that an optimal solution to the reduced instance translates to an α-approximate solution to the original. This definition allows a loss of precision in the preprocessing step, but demands that the reduced instance be solved to optimality. See [18] for a detailed discussion on the differences between the two approximate kernel notions.
Complexity-Theoretic Hypotheses. We assume that the readers have basic knowledge of (classic) parameterized complexity theory, including the W-hierarchy, the exponential time hypothesis (ETH), and the strong exponential time hypothesis (SETH). The reader may choose to recapitulate these definitions by referring to [6] (Sections 13 and 14).
We will additionally discuss two hypotheses that may not be standard to the community. The first is the Gap Exponential Time Hypothesis (Gap-ETH), which is a strengthening of ETH. Roughly speaking, it states that even the approximate version of 3SAT cannot be solved in subexponential time; a more formal statement of Gap-ETH can be found in Hypothesis 2. Another hypothesis we will discuss is the Parameterized Inapproximability Hypothesis (PIH), which states that the multicolored version of the DENSEST k-SUBGRAPH is hard to approximate in FPT time. Once again, we do not define PIH formally here; please refer to Hypothesis 1 for a formal statement.

FPT Hardness of Approximation
In this section, we focus on showing barriers against obtaining good parameterized approximation algorithms. The analogous field of study in the non-parameterized (NP-hardness) regime is the theory of hardness of approximation. The celebrated PCP Theorem [21,22] and numerous subsequent works have developed a rich set of tools that allowed researchers to show tight inapproximability results for many fundamental problems. In the context of parameterized approximation, the field is still in the nascent stage. Nonetheless, there have been quite a few tools that have already been developed, which are discussed in the subsequent subsections.
We divide this section into two parts. In Section 3.1, we discuss the results and techniques in the area of hardness of parameterized approximation under the standard assumption of W[1] ≠ FPT. In Section 3.2, we discuss results and techniques in hardness of parameterized approximation under less standard assumptions such as the Gap Exponential Time Hypothesis, where the gap is inherent in the assumption, and the challenge is to construct gap-preserving reductions.

W[1]-Hardness of Gap Problems
In this subsection, we discuss W[1]-hardness of approximation of a few fundamental problems. In particular, we discuss the parameterized inapproximability (i.e., W[1]-hard to even approximate) of the DOMINATING SET problem, the (ONE-SIDED) BICLIQUE problem, the EVEN SET problem, the SHORTEST VECTOR problem, and the STEINER ORIENTATION problem. We emphasize here that the main difficulty that is addressed in this subsection is gap generation, i.e., we focus on how to start from a hard problem (with no gap), say k-CLIQUE (which is the canonical W[1]-complete problem), and reduce it to one of the aforementioned problems, while generating a non-trivial gap in the process.

Parameterized Intractability of Biclique and Applications to Parameterized Inapproximability
In this subsubsection, we will discuss the parameterized inapproximability of the one-sided biclique problem, and show how both that result and its proof technique lead to more inapproximability results.
We begin our discussion by formally stating the k-BICLIQUE problem: we are given as input a graph G and an integer k, and the goal is to determine whether G contains a complete bipartite subgraph with k vertices on each side. The complexity of k-BICLIQUE was a long-standing open problem; it was resolved only recently by Lin [23], who showed that it is W[1]-hard. In fact, he showed a much stronger result, which is the focus of this subsubsection.

Theorem 1 ([23]). Given a bipartite graph $G(L \,\dot\cup\, R, E)$ and k ∈ N as input, it is W[1]-hard to distinguish between the following two cases:

• Completeness: There are k vertices in L with at least $n^{\Theta(1/k)}$ common neighbors in R;
• Soundness: Any k vertices in L have at most (k + 6)! common neighbors in R.

We shall refer to the gap problem in the above theorem as the ONE-SIDED k-BICLIQUE problem. To prove the above result, Lin introduced a technique which we shall refer to as Gadget Composition. The gadget composition technique has found more applications since [23]. We provide below a failed approach (given in [23]) to prove the above theorem; nonetheless it gives us good insight into how the gadget composition technique works.
Suppose we can construct a set family $T = \{S_1, S_2, \ldots, S_n\}$ of subsets of [n], for some integers k, n and $h > \ell$ (for example, $h = n^{1/k}$ and $\ell = (k + 1)!$), such that:

• Property 1: Any k + 1 distinct subsets in T have intersection size at most $\ell$;
• Property 2: Any k distinct subsets in T have intersection size at least h.

Then we can combine T with an instance of k-CLIQUE to obtain a gap instance of ONE-SIDED k-BICLIQUE as follows. Given a graph G and parameter k with V(G) ⊆ [n], we construct our instance of ONE-SIDED k-BICLIQUE, say $H(L \,\dot\cup\, R, E(H))$, by setting L := E(G) and R := [n], where for any $(v_i, v_j) \in L$ and $v \in [n]$, we have $((v_i, v_j), v) \in E(H)$ if and only if $v \in S_i \cap S_j$. Let $s := k(k-1)/2$. It is easy to check that if G has a k-vertex clique, say $\{v^*_1, \ldots, v^*_k\}$, then Property 2 implies that $\Delta := \bigcap_{i \in [k]} S_{v^*_i}$ satisfies $|\Delta| \ge h$. It follows that the s vertices in L given by $\{(v^*_i, v^*_j) : \{i, j\} \in \binom{[k]}{2}\}$ are neighbors of every vertex in Δ ⊆ R. On the other hand, if G contains no k-vertex clique, then any s distinct vertices in L (i.e., s edges in G) must have at least k + 1 vertices of G as their endpoints. Let $V'$ be the set of all vertices contained in these s edges. By Property 1, $\Delta' := \bigcap_{v \in V'} S_v$ satisfies $|\Delta'| \le \ell$, and thus any s distinct vertices in L have at most $\ell$ common neighbors in R.
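The composition step described above can be sketched in a few lines of Python. This is only an illustration: the helper names (`compose`, `common_neighbors`, `best_common_neighborhood`) and the toy set family are assumptions made for this example and are not taken from [23]. The toy family lives on four vertices with k = 3: any 3 sets intersect in 2 elements and any 4 sets have empty intersection, so it satisfies Properties 1 and 2 with h = 2 and ℓ = 0.

```python
from itertools import combinations

def compose(edges, family):
    """Build H: the left side is E(G), the right side is the ground set.

    `family[v]` is the set S_v attached to vertex v (a hypothetical gadget).
    The left vertex (u, v) is adjacent to every element of S_u ∩ S_v.
    """
    return {(u, v): family[u] & family[v] for (u, v) in edges}

def common_neighbors(H, left_vertices):
    """Common right-side neighbors of a collection of left vertices of H."""
    return set.intersection(*(H[e] for e in left_vertices))

def best_common_neighborhood(H, s):
    """Largest common neighborhood over all choices of s left vertices."""
    return max(len(common_neighbors(H, c)) for c in combinations(H, s))

# Toy gadget (an assumption for illustration): S_v = {0..7} minus {2v, 2v+1}.
ground = set(range(8))
family = {v: ground - {2 * v, 2 * v + 1} for v in range(4)}

k = 3
s = k * (k - 1) // 2
with_triangle = [(0, 1), (0, 2), (1, 2), (2, 3)]   # contains a 3-clique
four_cycle = [(0, 1), (1, 2), (2, 3), (0, 3)]      # contains no 3-clique

yes_value = best_common_neighborhood(compose(with_triangle, family), s)
no_value = best_common_neighborhood(compose(four_cycle, family), s)
```

In the toy run, the graph with a triangle yields s left vertices with h = 2 common neighbors, while in the triangle-free graph every choice of s left vertices has at most ℓ = 0 common neighbors, mirroring the completeness and soundness cases.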
It is indeed very surprising that this technique can yield non-trivial inapproximability results, as the gap is essentially produced from the gadget and is oblivious to the input! This also stands in stark contrast to the PCP theorem and hardness of approximation results in NP, where all known results were obtained by global transformations on the input. The key difference between the parameterized and NP worlds is the notion of locality. For example, consider the k-CLIQUE problem: if a graph does not have a clique of size k, then given any k vertices, a random vertex pair among these k vertices does not have an edge with probability at least $1/k^2$. It is philosophically possible to compose the input graph with a simple error correcting code to amplify this probability to a constant, as we are allowed to blow up the input size by any function of k. In contrast, when k is not fixed, as in the NP world, k is of the same magnitude as the input size, and thus we are only allowed to blow up the input size by a poly(n) factor. Nonetheless, we have to point out that the gadgets typically needed to make the gadget composition technique work must be extremely rich in combinatorial structure (and are typically constructed from random objects or algebraic objects), and were previously studied extensively in the area of extremal combinatorics.
Returning to the reduction above from k-CLIQUE to ONE-SIDED k-BICLIQUE, it turns out that we do not know how to construct the set system T, and hence the reduction does not pan out. Nonetheless, Lin constructed a variant of T, where Property 2 was more refined, and the reduction from k-CLIQUE to ONE-SIDED k-BICLIQUE went through with slightly more effort.
Before we move on to discussing some applications of Theorem 1 and the gadget composition technique, we remark on the known stronger running time lower bounds for ONE-SIDED k-BICLIQUE under stronger hypotheses. Lin [23] showed a lower bound of $n^{\Omega(\sqrt{k})}$ for ONE-SIDED k-BICLIQUE assuming ETH. We wonder if this can be further improved.
Open Question 1 (Lower bound of ONE-SIDED k-BICLIQUE under ETH and SETH). Can the running time lower bound on ONE-SIDED k-BICLIQUE be improved to $n^{\Omega(k)}$ under ETH? Can it be improved to $n^{k-o(1)}$ under SETH?
We remark that a direction to address the above question was detailed in [24]. While on the topic of the k-BICLIQUE problem, it is worth noting that the lower bound of $n^{\Omega(\sqrt{k})}$ for ONE-SIDED k-BICLIQUE assuming ETH yields a running time lower bound of $n^{\Omega\left(\frac{\log k}{\log\log k}\right)}$ for the k-BICLIQUE problem (due to the soundness parameters in Theorem 1). However, assuming randomized ETH, the running time lower bound for the k-BICLIQUE problem can be improved to $n^{\Omega(\sqrt{k})}$ [23]. Can this improved running time lower bound be obtained just under (deterministic) ETH? Finally, we remark that we shall discuss the hardness of approximation of the k-BICLIQUE problem in Section 3.2.3.

Inapproximability of k-DOMINATING SET via Gadget Composition.
We shall discuss the inapproximability of k-DOMINATING SET in detail in the next subsubsection. Here we simply highlight how the above framework was used by Chen and Lin [25] and Lin [26] to obtain inapproximability results for k-DOMINATING SET.
In [25], the authors, starting from Theorem 1, obtain the W[1]-hardness of approximating k-DOMINATING SET to a factor of almost two. They then amplify the gap to any constant by using a specialized graph product.
We now turn our attention to a recent result of Lin [26], who provided a strong inapproximability result for k-DOMINATING SET (we refer the reader to Section 3.1.2 for the context of this result). Lin's proof of inapproximability of k-DOMINATING SET is a one-step reduction from an instance of k-SET COVER on a universe of size O(log n) (where n is the number of subsets given in the collection) to an instance of k-SET COVER on a universe of size poly(n) with a gap of $\left(\frac{\log n}{\log\log n}\right)^{1/k}$. Lin then uses this gap-producing self-reduction to provide running time lower bounds (under different time hypotheses) for approximating k-SET COVER to a factor of $(1 - o(1)) \cdot \left(\frac{\log n}{\log\log n}\right)^{1/k}$. Recall that k-DOMINATING SET is essentially equivalent to k-SET COVER: there is a pair of polynomial-time L-reductions between the minimum dominating set problem and the set cover problem [27].
Elaborating, Lin designs a gadget by combining the hypercube partition gadget of Feige [28] with a derandomizing combinatorial object called a universal set, to obtain a gap gadget, and then combines the gap gadget with the input k-SET COVER instance (on a small universe but with no gap) to obtain a gap k-SET COVER instance. This is another success story of the gadget composition technique.
Finally, we remark that Lai [29] recently extended Lin's inapproximability results for dominating set (using the same proof framework) to rule out constant-depth circuits of size $f(k) \cdot n^{o(\sqrt{k})}$ for any computable function f.
Even Set. A recent success story of Theorem 1 is its application to resolve a long-standing open problem, the k-MINIMUM DISTANCE PROBLEM (also referred to as k-EVEN SET), where we are given as input a generator matrix $A \in \mathbb{F}_2^{n \times m}$ of a binary linear code and an integer k, and the goal is to determine whether the code has distance at most k. Recall that the distance of a linear code is $\min_{0 \ne x \in \mathbb{F}_2^m} \|Ax\|_0$, where $\|\cdot\|_0$ denotes the 0-norm (aka the Hamming norm).
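To make the definition of distance concrete, the following is a minimal brute-force sketch in Python that evaluates $\min_{0 \ne x} \|Ax\|_0$ directly. The function name `min_distance` and the example matrix are assumptions made for illustration; the search takes time exponential in m and says nothing about parameterized algorithms for the problem.

```python
from itertools import product

def min_distance(A):
    """Minimum Hamming weight of Ax over all nonzero x in F_2^m.

    A is a list of n rows, each a list of m bits (an n x m generator matrix).
    Brute force over all 2^m - 1 nonzero vectors x.
    """
    m = len(A[0])
    best = None
    for x in product([0, 1], repeat=m):
        if not any(x):
            continue  # skip the zero vector
        # Hamming weight of Ax, with arithmetic over F_2
        weight = sum(sum(a * b for a, b in zip(row, x)) % 2 for row in A)
        if best is None or weight < best:
            best = weight
    return best

# Columns (1,1,0) and (0,1,1) generate the codewords 110, 011 and 101.
A = [[1, 0],
     [1, 1],
     [0, 1]]
distance = min_distance(A)
```

For this example every nonzero codeword has weight 2, so the instance (A, k) is a yes-instance of k-MINIMUM DISTANCE PROBLEM exactly when k ≥ 2.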
In [30], the authors showed that k-EVEN SET is W[1]-hard under randomized reductions. The result was obtained by starting from the inapproximability result stated in Theorem 1, followed by a series of intricate reductions. In fact, they proved the following stronger inapproximability result.

Theorem 2 ([30]). For any γ ≥ 1, given input $(A, k) \in \mathbb{F}_2^{n \times m} \times \mathbb{N}$, it is W[1]-hard (under randomized reductions) to distinguish between

• Completeness: Distance of the code generated by A is at most k, and,
• Soundness: Distance of the code generated by A is more than γ · k.

We emphasize that even to obtain the W[1]-hardness of k-EVEN SET (with no gap), they needed to start from the gap problem given in Theorem 1.
The proof of the above theorem proceeds by first showing FPT hardness of approximation of the non-homogeneous variant of k-MINIMUM DISTANCE PROBLEM called the k-NEAREST CODEWORD PROBLEM. In k-NEAREST CODEWORD PROBLEM, we are given a target vector y (in $\mathbb{F}_2^n$) in addition to (A, k), and the goal is to determine whether there is any x (in $\mathbb{F}_2^m$) such that the Hamming norm of Ax − y is at most k. As an intermediate step of the proof of Theorem 2, they showed that k-NEAREST CODEWORD PROBLEM is W[1]-hard to approximate to any constant factor.
An important intermediate problem studied in [30] to prove the inapproximability of k-NEAREST CODEWORD PROBLEM was the k-LINEAR DEPENDENT SET problem, where given a set A of n vectors over a finite field $\mathbb{F}_q$ and an integer k, the goal is to decide if there are k vectors in A that are linearly dependent. They ruled out constant factor approximation algorithms for this problem running in FPT time. Summarizing, the high-level proof overview of Theorem 2 follows by reducing ONE-SIDED k-BICLIQUE to k-LINEAR DEPENDENT SET, which is then reduced to k-NEAREST CODEWORD PROBLEM, followed by a final randomized reduction to k-MINIMUM DISTANCE PROBLEM.
Finally, we note that there is no reason to define k-MINIMUM DISTANCE PROBLEM only for binary codes; it can be defined over larger fields as well. It turns out that [30] cannot rule out FPT algorithms for k-MINIMUM DISTANCE PROBLEM over $\mathbb{F}_p$ with p > 2, when p is fixed and is not part of the input. Thus we have the following open problem.
Open Question 2. Is it W[1]-hard to decide k-MINIMUM DISTANCE PROBLEM over $\mathbb{F}_p$ with p > 2, when p is fixed and is not part of the input?
Shortest Vector Problem. Theorem 1 (or more precisely the constant inapproximability of k-LINEAR DEPENDENT SET stated above) was also used to resolve the complexity of the parameterized k-SHORTEST VECTOR PROBLEM in lattices, where the input (in the $\ell_p$ norm) is an integer k ∈ N and a matrix $A \in \mathbb{Z}^{n \times m}$ representing the basis of a lattice, and we want to determine whether the shortest (non-zero) vector in the lattice has length at most k, i.e., whether $\min_{0 \ne x \in \mathbb{Z}^m} \|Ax\|_p \le k$. Again, k is the parameter of the problem. It should also be noted here that (as in [31]) we require the basis of the lattice to be integer valued, which is sometimes not enforced in the literature (e.g., [32,33]). This is because, if A is allowed to be any matrix in $\mathbb{R}^{n \times m}$, then parameterization is meaningless, since we can simply scale A down by a large multiplicative factor.
In [30], the authors showed that k-SHORTEST VECTOR PROBLEM is W[1]-hard under randomized reductions. In fact they proved the following stronger inapproximability result.

Theorem 3 ([30]). For any p > 1, there exists a constant $\gamma_p > 1$ such that given input $(A, k) \in \mathbb{Z}^{n \times m} \times \mathbb{N}$, it is W[1]-hard (under randomized reductions) to distinguish between

• Completeness: The $\ell_p$ norm of the shortest vector of the lattice generated by A is ≤ k, and,
• Soundness: The $\ell_p$ norm of the shortest vector of the lattice generated by A is > $\gamma_p \cdot k$.

Notice that Theorem 2 rules out FPT approximation algorithms with any constant approximation ratio for k-EVEN SET. In contrast, the above result only proves FPT inapproximability with some constant ratio for k-SHORTEST VECTOR PROBLEM in the $\ell_p$ norm for p > 1. As with k-EVEN SET, even to prove the W[1]-hardness of k-SHORTEST VECTOR PROBLEM (with no gap), they needed to start from the gap problem given in Theorem 1.
The proof of the above theorem proceeds by first showing FPT hardness of approximation of the non-homogeneous variant of k-SHORTEST VECTOR PROBLEM called the k-NEAREST VECTOR PROBLEM. In k-NEAREST VECTOR PROBLEM, we are given a target vector y (in $\mathbb{Z}^n$) in addition to (A, k), and the goal is to determine whether there is any x (in $\mathbb{Z}^m$) such that the $\ell_p$ norm of Ax − y is at most k. As an intermediate step of the proof of Theorem 3, they showed that k-NEAREST VECTOR PROBLEM is W[1]-hard to approximate to any constant factor. Summarizing, the high-level proof overview of Theorem 3 follows by reducing ONE-SIDED k-BICLIQUE to k-LINEAR DEPENDENT SET, which is then reduced to k-NEAREST VECTOR PROBLEM, followed by a final randomized reduction to k-SHORTEST VECTOR PROBLEM.
An immediate question left open by their work is whether Theorem 3 can be extended to k-SHORTEST VECTOR PROBLEM in the $\ell_1$ norm. In other words:
Open Question 3 (Approximation of k-SHORTEST VECTOR PROBLEM in $\ell_1$ norm). Is k-SHORTEST VECTOR PROBLEM in the $\ell_1$ norm in FPT?

Parameterized Inapproximability of Dominating Set
In the k-DOMINATING SET problem we are given an integer k and a graph G on n vertices as input, and the goal is to determine if there is a dominating set of size at most k. It was a long standing open question to design an algorithm which runs in time T(k) · poly(n) (i.e., FPT-time), that would find a dominating set of size at most F(k) · k whenever the graph G has a dominating set of size k, for any computable functions T and F.
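For reference, the exact problem can be solved by exhaustive search over all vertex subsets of size at most k, which takes roughly $n^{k}$ time rather than FPT time. The sketch below is a minimal Python illustration; the adjacency-dictionary representation and the function names are assumptions made for this example.

```python
from itertools import combinations

def is_dominating(adj, S):
    """True if every vertex is in S or has a neighbor in S.

    `adj` maps each vertex to the set of its neighbors.
    """
    S = set(S)
    return all(v in S or adj[v] & S for v in adj)

def min_dominating_set(adj):
    """Smallest dominating set, found by trying all subsets in order of size."""
    for size in range(1, len(adj) + 1):
        for S in combinations(adj, size):
            if is_dominating(adj, S):
                return set(S)

# A path on six vertices 0 - 1 - 2 - 3 - 4 - 5.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
optimum = min_dominating_set(path)
```

In this notation, an F(k)-approximation algorithm would be allowed to return any dominating set of size at most F(k) · k whenever the optimum has size k; for the path above, any dominating set of size at most 2 · F(2) would be acceptable.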
The first non-trivial progress on this problem was by Chen and Lin [25], who ruled out the existence of such algorithms (under W[1] ≠ FPT) for all constant functions F (i.e., F(k) = c, where c is any universal constant). We discussed their proof technique in the previous subsubsection. A couple of years later, Karthik C. S. et al. [34] completely settled the question by ruling out the existence of such an algorithm (under W[1] ≠ FPT) for any computable function F. Thus, k-DOMINATING SET was shown to be totally inapproximable. We elaborate on their proof below.

Theorem 4 ([34]). Let F : N → N be any computable function. Given an instance (G, k) of k-DOMINATING SET as input, it is W[1]-hard to distinguish between the following two cases:

• Completeness: G has a dominating set of size k.
• Soundness: Every dominating set of G is of size at least F(k) · k.

The overall proof follows by reducing k-MULTICOLOR CLIQUE to the gap k-DOMINATING SET with parameters as given in the theorem statement. In the k-MULTICOLOR CLIQUE problem, we are given an integer k and a graph G on vertex set $V := V_1 \,\dot\cup\, V_2 \,\dot\cup\, \cdots \,\dot\cup\, V_k$ as input, where each $V_i$ is an independent set of cardinality n, and the goal is to determine if there is a clique of size k in G. Following a straightforward reduction from the k-CLIQUE problem, it is fairly easy to see that k-MULTICOLOR CLIQUE is W[1]-hard.
The reduction from k-MULTICOLOR CLIQUE to the gap k-DOMINATING SET proceeds in two steps. In the first step we reduce k-MULTICOLOR CLIQUE to k-GAP CSP. This is the step where we generate the gap. In the second step, we reduce k-GAP CSP to gap k-DOMINATING SET. This step is fairly standard and mimics ideas from Feige's proof of the NP-hardness of approximating the MAX COVERAGE problem [28].
Before we proceed with the details of the above two steps, let us introduce a small technical tool from coding theory that we will need. We need codes known in the literature as good codes; these are binary error correcting codes whose rate and relative distance are both constants bounded away from 0 (see [35] (Appendix E.1.2.5) for definitions). The reader may think of them as follows: for every ℓ ∈ N, we say that $C_\ell \subseteq \{0,1\}^\ell$ is a good code if (i) $|C_\ell| = 2^{\rho\ell}$, for some universal constant ρ > 0, and (ii) any distinct $c, c' \in C_\ell$ have different values on at least a δ fraction of coordinates, for some universal constant δ > 0. An encoding of $C_\ell$ is an injective function $E_{C_\ell} : \{0,1\}^{\rho\ell} \to C_\ell$. The encoding is said to be efficient if $E_{C_\ell}(x)$ can be computed in poly(ℓ) time for any $x \in \{0,1\}^{\rho\ell}$.
Let us fix k ∈ N and F : N → N as in the theorem statement. We further define .
From k-MULTICOLOR CLIQUE to k-GAP CSP. Starting from an instance of k-MULTICOLOR CLIQUE, say G on vertex set $V := V_1 \,\dot\cup\, V_2 \,\dot\cup\, \cdots \,\dot\cup\, V_k$, we write down a set of constraints P on a variable set $X := \{x_{i,j} \mid i, j \in [k], i \ne j\}$ as follows. For every i, j ∈ [k] such that i ≠ j, define $E_{i,j}$ to be the set of all edges in G whose endpoints are in $V_i$ and $V_j$. An assignment to the variable $x_{i,j}$ is an element of $E_{i,j}$, i.e., a pair of vertices, one from $V_i$ and the other from $V_j$. Suppose that $x_{i,j}$ was assigned the edge $\{v_i, v_j\}$, where $v_i \in V_i$ and $v_j \in V_j$. Then we define the assignment of $x^i_{i,j}$ to be $v_i$ and the assignment of $x^j_{i,j}$ to be $v_j$. We define $P := \{P_1, \ldots, P_k\}$, where the constraint $P_i$ is defined to be satisfied if the assignments to all of $x^i_{j,i}$, for $j \in [k] \setminus \{i\}$, are the same. We refer to the problem of determining if there is an assignment to the variables in X such that all the constraints are satisfied as the k-CSP problem. Notice that while this is a natural way to write k-MULTICOLOR CLIQUE as a CSP, where we check whether all variables having a vertex in common agree on its assignment, there is no gap yet in the k-CSP problem. In particular, if there is a clique of size k in G, then there is an assignment to the variables of X (obtained by assigning the edges of the clique in G to the corresponding variables in X) such that all the constraints in P are satisfied; however, if every clique in G is of size less than k, then every assignment to the variables of X violates some constraint in P, but it may violate only one constraint (and not more).
In order to amplify the gap, we rewrite the set of constraints P in a different way to obtain the set of constraints $P'$, on the same variable set X, as follows. Suppose that $x_{i,j}$ was assigned the edge $\{v_i, v_j\}$, where $v_i \in V_i$ and $v_j \in V_j$; then for β ∈ [log n], we define the assignment of $x^{i,\beta}_{i,j}$ to be the β-th coordinate of $v_i$. Recall that $|V_i| = n$, and therefore we can label all vertices in $V_i$ by vectors in $\{0,1\}^{\log n}$. We define $P' := \{P'_1, \ldots, P'_{\log n}\}$, where the constraint $P'_\beta$ is defined to be satisfied if and only if the following holds for all i ∈ [k]: the assignments to all of $x^{i,\beta}_{j,i}$, for $j \in [k] \setminus \{i\}$, are the same. Again notice that an assignment to the variables of X satisfies all the constraints in P if and only if the same assignment also satisfies all the constraints in $P'$.
However, rewriting P as $P'$ allows us to simply apply the error correcting code $C_\ell$ (with parameters ρ and δ, and encoding function $E_{C_\ell}$) to the constraints in $P'$, to obtain a gap! In particular, we choose ℓ to be such that ρℓ = log n. Consider a new set of constraints $P''$, on the same variable set X, as follows. For any $z \in \{0,1\}^{\log n}$ and β ∈ [ℓ], we denote by $E_{C_\ell}(z)_\beta$ the β-th coordinate of $E_{C_\ell}(z)$. We define $P'' := \{P''_1, \ldots, P''_\ell\}$, where the constraint $P''_\beta$ is defined to be satisfied if and only if the following holds for all i ∈ [k]: the values $E_{C_\ell}(x^i_{j,i})_\beta$, for $j \in [k] \setminus \{i\}$, are all the same (here the assignment of $x^i_{j,i}$ is identified with its label in $\{0,1\}^{\log n}$). Notice, as before, that an assignment to the variables of X satisfies all the constraints in $P'$ if and only if the same assignment also satisfies all the constraints in $P''$. However, for every assignment to X that violates at least one constraint in $P'$, the same assignment violates at least a δ fraction of the constraints in $P''$. To see this, consider an assignment that violates the constraint $P'_1$ in $P'$. This implies that there is some i ∈ [k] such that the assignments to $x^{i,1}_{j,i}$, for $j \in [k] \setminus \{i\}$, are not all the same. Let us suppose, without loss of generality, that the assignments to $x^{i,1}_{1,i}$ and $x^{i,1}_{2,i}$ are different. In other words, we have that $x^i_{1,i} \ne x^i_{2,i}$. Let Δ ⊆ [ℓ] be the set of coordinates on which $E_{C_\ell}(x^i_{1,i})$ and $E_{C_\ell}(x^i_{2,i})$ differ. By the distance of the code $C_\ell$ we have that |Δ| ≥ δℓ. Finally, notice that for all β ∈ Δ, the assignment does not satisfy constraint $P''_\beta$ in $P''$. We refer to the problem of distinguishing whether there is an assignment to X such that all the constraints are satisfied or every assignment to X fails to satisfy a constant fraction of the constraints as the k-GAP CSP problem.
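The effect of encoding the vertex labels can be seen in a small experiment. The Python sketch below uses the Hadamard code as a stand-in: this is an assumption made purely for illustration, since the Hadamard code has relative distance 1/2 but exponentially small rate and is therefore not a good code in the sense required by the proof. The function names are likewise hypothetical.

```python
from itertools import product

def hadamard_encode(z):
    """Hadamard encoding of a bit tuple z: one bit <z, a> mod 2 per a in {0,1}^m."""
    return [sum(zi * ai for zi, ai in zip(z, a)) % 2
            for a in product([0, 1], repeat=len(z))]

def violated_fraction(labels, encode=None):
    """Fraction of coordinates on which the (encoded) labels do not all agree.

    `labels` plays the role of the labels that the variables x_{j,i}
    propose for the vertex of V_i; one coordinate corresponds to one constraint.
    """
    words = [encode(z) if encode else list(z) for z in labels]
    length = len(words[0])
    bad = sum(1 for b in range(length) if len({w[b] for w in words}) > 1)
    return bad / length

consistent = [(0, 1, 1), (0, 1, 1), (0, 1, 1)]
inconsistent = [(0, 1, 1), (0, 1, 0), (0, 1, 1)]   # labels differ in one bit

raw_gap = violated_fraction(inconsistent)                       # bitwise check only
coded_gap = violated_fraction(inconsistent, hadamard_encode)    # after encoding
```

Without encoding, the inconsistent labels disagree on a single coordinate out of log n, so the violated fraction shrinks as n grows; after encoding, the disagreement is spread over a constant fraction of the coordinates (exactly one half for the Hadamard code), while consistent labels still violate nothing.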
In order to rule out F(k)-approximation FPT algorithms for k-DOMINATING SET, we will need that every assignment to X that violates at least one constraint in P violates at least an α fraction of the constraints (instead of just a δ fraction, as in $P''$; note that α is very close to 1, whereas δ can be at most half). To boost the gap, we apply a simple repetition/direct-product trick to our constraint system. Starting from $P''$, we construct a new set of constraints $P^*$, on the same variable set X, as follows.
Let $t := \frac{\log(1-\alpha)}{\log(1-\delta)}$, and let $P^* := \{P^*_S \mid S \in [\ell]^t\}$. For every $S \in [\ell]^t$, we define $P^*_S$ to be satisfied if and only if for all β ∈ S, the constraint $P''_\beta$ is satisfied.
It is easy to see that $P''$ and $P^*$ have the same set of completely satisfying assignments. However, for every assignment to X that violates at least a δ fraction of the constraints in $P''$, the same assignment violates at least an α fraction of the constraints in $P^*$. To see this, consider an assignment that violates a δ fraction of the constraints in $P''$, say it violates the constraint $P''_\beta$ for every β ∈ Δ ⊆ [ℓ]. This implies that the assignment satisfies the constraint $P^*_S$ if and only if $S \in ([\ell] \setminus \Delta)^t$. Hence the fraction of constraints in $P^*$ that the assignment can satisfy is upper bounded by $(1 - \delta)^t = 1 - \alpha$.
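The counting behind the bound $(1-\delta)^t$ can be checked numerically. The following Python sketch is an illustration only: the function names are hypothetical, and the number of repetitions is rounded up to an integer, which is an assumption added here (the text treats t as given).

```python
import math
from itertools import product

def repetitions_needed(delta, alpha):
    """Smallest integer t with (1 - delta)**t <= 1 - alpha (rounded up)."""
    return math.ceil(math.log(1 - alpha) / math.log(1 - delta))

def satisfied_fraction(length, violated, t):
    """Fraction of tuples S in [length]^t that avoid every violated index.

    A tuple S corresponds to one constraint of the product system; it is
    satisfied only if none of its t coordinates is a violated constraint.
    """
    good = sum(1 for S in product(range(length), repeat=t)
               if not any(b in violated for b in S))
    return good / length ** t

# 4 base constraints, one of them violated (delta = 1/4), t = 3 repetitions.
fraction = satisfied_fraction(4, {0}, 3)    # equals (3/4)**3
t = repetitions_needed(0.5, 0.99)           # repetitions for delta = 1/2, alpha = 0.99
```

The brute-force count agrees with the closed form $(1 - \delta)^t$, and for δ = 1/2 only a handful of repetitions already push the violated fraction above 0.99; the price is that the number of constraints grows from ℓ to $\ell^t$.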
From k-GAP CSP to gap k-DOMINATING SET. In the second part, starting from the aforementioned instance of k-GAP CSP (after boosting the gap), we construct an instance H of k-DOMINATING SET. The construction is due to Feige [28] and it proceeds as follows. Let $\mathcal{F}$ be the set of all functions from $\{0,1\}^{tk}$ to $\binom{k}{2}$, i.e., $\mathcal{F} := \{f : \{0,1\}^{tk} \to \binom{k}{2}\}$. The graph H is on vertex set $U = A \,\dot\cup\, B$, where $A = P^* \times \mathcal{F}$ and B = E(G), i.e., B is simply the edge set of G. We introduce an edge between all pairs of vertices in B. We introduce an edge between $a := (S := (s_1, \ldots, s_t) \in [\ell]^t,\ f : \{0,1\}^{tk} \to \binom{k}{2}) \in A$ and $e := (v_i, v_j) \in E(G)$ if and only if the following holds.
Notice that the number of vertices in H is |A| We skip presenting details of this part of the proof here. The proofs have been derived many times in literature; if needed, the readers may refer to Appendix A of [34]. This completes our sketch of the proof of Theorem 4.
A few remarks are in order. First, the k-GAP CSP problem described in the proof above is formalized as the k-MAXCOVER problem in [34] (and was originally introduced in [36]). In particular, the formalism of k-MAXCOVER (which may be thought of as the parameterized label cover problem) is generic enough to be used as an intermediate gap problem to reduce to both k-DOMINATING SET (as in [34]) and k-CLIQUE (as in [36]). Moreover, it was robust enough to capture stronger running time lower bounds (under stronger hypotheses); this will be elaborated below. However, in order to keep the above proof succinct, we skipped introducing the k-MAXCOVER problem, and worked with k-GAP CSP, which was sufficient for the above proof.
Second, Karthik C. S. et al. [34] additionally showed that for all computable functions T, F : N → N and every constant ε > 0:

• Assuming the Exponential Time Hypothesis (ETH), there is no F(k)-approximation algorithm for k-DOMINATING SET that runs in $T(k) \cdot n^{o(k)}$ time.
• Assuming the Strong Exponential Time Hypothesis (SETH), for every integer k ≥ 2, there is no F(k)-approximation algorithm for k-DOMINATING SET that runs in $T(k) \cdot n^{k-\varepsilon}$ time.

In order to establish Theorem 4 and the above two results, Karthik C. S. et al. [34] introduced a framework to prove parameterized hardness of approximation results. In this framework, the objective was to start from either the W[1] ≠ FPT hypothesis, ETH, or SETH, and end up with the gap k-DOMINATING SET, i.e., they design reductions from instances of k-CLIQUE, 3-CNF-SAT, and ℓ-CNF-SAT to an instance of gap k-DOMINATING SET. A prototype reduction in this framework has two modular parts. In the first part, which is specific to the problem they start from, they generate a gap and obtain hardness of gap k-MAXCOVER. In the second part, they show a gap-preserving reduction from gap k-MAXCOVER to gap k-DOMINATING SET, which is essentially the same as the reduction from k-GAP CSP to k-DOMINATING SET in the proof of Theorem 4.
The first part of a prototype reduction, from the computational problem underlying a hypothesis of interest to gap k-MAXCOVER, follows by the design of an appropriate communication protocol. In particular, the computational problem is first reduced to a constraint satisfaction problem (CSP) over k (or some function of k) variables over an alphabet of size n. The predicate of this CSP depends on the computational problem underlying the hypothesis from which we started. Generalizing ideas from [37], they then show how a protocol for computing this predicate in the multiparty communication model (where the number of players is the number of variables of the CSP) can be combined with the CSP to obtain an instance of gap k-MAXCOVER. For example, for the W[1] ≠ FPT hypothesis and ETH, the predicate is a variant of the equality function, and for SETH, the predicate is the well-studied disjointness function. The completeness and soundness of the protocols computing these functions translate directly to the completeness and soundness of k-MAXCOVER.
Third, we recall that Lin [26] recently provided alternate proofs of Theorem 4 and the above-mentioned stronger running time lower bounds. While we discussed his proof technique in Section 3.1.1, we would like to discuss his result here. Following the right setting of parameters in the proof of Theorem 4 (for example, setting $\alpha = 1 - \frac{1}{(\log n)^{\Omega(1/k)}}$), we obtain that approximating k-DOMINATING SET to a factor of $(\log n)^{1/k^3}$ is W[1]-hard. Lin improved the exponent of $1/k^3$ in the approximation factor to h(k) for any computable function h. Can this inapproximability be further improved? On the other hand, can we do better than the simple polynomial-time greedy algorithm, which provides a (1 + ln n) factor approximation? This leads us to the following question:
Open Question 4 (Tight inapproximability of k-DOMINATING SET). Is there a $(\log n)^{1-o(1)}$ factor approximation algorithm for k-DOMINATING SET running in time $n^{k-0.1}$?
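For comparison with the hardness results, the greedy algorithm mentioned above is easy to state. The Python sketch below is a minimal version under assumed conventions (an adjacency dictionary, ties broken by iteration order); it repeatedly picks the vertex that dominates the largest number of not-yet-dominated vertices, which is the standard greedy rule for set cover applied to closed neighborhoods.

```python
def greedy_dominating_set(adj):
    """Greedy dominating set: pick the vertex covering the most undominated vertices.

    `adj` maps each vertex to the set of its neighbors. A vertex dominates
    itself and all of its neighbors.
    """
    undominated = set(adj)
    chosen = []
    while undominated:
        # Vertex whose closed neighborhood covers the most undominated vertices.
        v = max(adj, key=lambda u: len(({u} | adj[u]) & undominated))
        chosen.append(v)
        undominated -= {v} | adj[v]
    return chosen

# A path on six vertices 0 - 1 - 2 - 3 - 4 - 5.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
solution = greedy_dominating_set(path)
```

The loop always terminates because any undominated vertex covers at least itself, so each round removes at least one vertex from `undominated`.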
We conclude the discussion on k-DOMINATING SET with an open question on W[2]-hardness of approximation. As noted earlier, k-DOMINATING SET is a W[2]-complete problem, and Theorem 4 shows that the problem is W[1]-hard to approximate to any F(k) factor. However, is there some computable function F for which approximating k-DOMINATING SET is in W[1]? In other words we have: Open Question 5 (W[2]-completeness of approximating k-DOMINATING SET). Can we base total inapproximability of k-DOMINATING SET on W[2] ≠ FPT?

Parameterized Inapproximability of Steiner Orientation by Gap Amplification
Gap amplification is a widely used technique in the classic literature on (NP-)hardness of approximation (e.g., [38][39][40]). In fact, arguably the simplest proof of the PCP theorem, due to Dinur [40], is via repeated gap amplification. The overall idea here is simple: we start with a hardness of approximation for a problem with a small factor (e.g., 1 + 1/n). At each step, we perform an operation that transforms an instance of our problem to another instance, in such a way that the gap becomes bigger; usually this new instance will also be bigger than our instance. By repeatedly applying this operation, one can finally arrive at a constant, or even super-constant, factor hardness of approximation.
There are two main parameters that determine the success/failure of such an approach: how large the new instance is compared to the old instance (i.e., size blow-up) and how large the new gap is compared to the old gap, in each operation. To see how these two come into the picture, let us first consider a case study where a (straightforward) gap amplification procedure does not work: k-CLIQUE. The standard way to amplify the gap for k-CLIQUE is through graph product. Recall that the (tensor) graph product of a graph G = (V, E) with itself, denoted by $G^{\otimes 2}$, is a graph whose vertex set is $V^2$ and there is an edge between $(u_1, u_2)$ and $(v_1, v_2)$ if and only if $(u_1, v_1) \in E$ and $(u_2, v_2) \in E$. It is not hard to check that, if we can find a clique of size t in $G^{\otimes 2}$, then we can find one of size $\sqrt{t}$ in G (and vice versa). This implies that, if we have an instance of CLIQUE that is hard to approximate to within a factor of (1 + ε), then we may take the graph product with itself, which yields an instance of CLIQUE that is hard to approximate to within a factor of $(1 + \varepsilon)^2$.
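The squaring effect of the graph product can be checked on a toy instance. The following is a minimal sketch, assuming that every vertex of G is treated as carrying a self-loop (so that two pairs sharing a coordinate can be adjacent); the function names are illustrative and not taken from the cited works, and the clique search is plain brute force.

```python
from itertools import combinations, product

def related(adj, a, b):
    """Adjacent in G, or equal (the implicit self-loop assumed in this sketch)."""
    return a == b or frozenset((a, b)) in adj

def graph_square(vertices, edges):
    """Return (vertices, edges) of the product of G with itself."""
    adj = {frozenset(e) for e in edges}
    sq_vertices = list(product(vertices, repeat=2))
    sq_edges = [
        (p, q)
        for p, q in combinations(sq_vertices, 2)
        if related(adj, p[0], q[0]) and related(adj, p[1], q[1])
    ]
    return sq_vertices, sq_edges

def max_clique_size(vertices, edges):
    """Brute-force clique number; only meant for tiny graphs."""
    adj = {frozenset(e) for e in edges}
    best = 0
    for r in range(1, len(vertices) + 1):
        found = any(
            all(frozenset(pair) in adj for pair in combinations(subset, 2))
            for subset in combinations(vertices, r)
        )
        if not found:
            break  # cliques are closed under taking subsets
        best = r
    return best

# A triangle with a pendant vertex: the largest clique has size 3.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (0, 2), (2, 3)]
omega = max_clique_size(V, E)
omega_sq = max_clique_size(*graph_square(V, E))
print(omega, omega_sq)
```

On this instance the clique number goes from 3 to 9, while the number of vertices goes from 4 to 16, which is exactly the trade-off between gap and size blow-up discussed above.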
Now, let us imagine that we start with a hard instance of the exact version of k-CLIQUE. We may think of this as being hard to approximate to within a factor of (1 − 1/k). Hence, we may apply the above gap amplification procedure log k times, resulting in an instance of CLIQUE that is hard to approximate to within a factor of $(1 - 1/k)^{2^{\log k}}$, which is a constant bounded away from one (i.e., ≈ 1/e). The bad news here is that the number of vertices of the final graph is $n^{2^{\log k}} = n^k$, where n is the number of vertices of the initial graph. This does not give any lower bound, because we can solve k-CLIQUE in the original graph in $n^{O(k)}$ time trivially! In the next subsection, we will see a simple way to prove hardness of approximating k-CLIQUE under stronger assumptions. However, it remains an interesting and important open question how to prove such hardness from a non-gap assumption: Open Question 6. Is it W[1]-hard or ETH-hard to approximate k-CLIQUE to within a constant factor in FPT time?
Having seen a failed attempt, we may now move on to a success story. Remarkably, Wlodarczyk [41] recently managed to use gap amplification to prove hardness of approximation for connectivity problems, including the k-STEINER ORIENTATION problem. Here we are given a mixed graph G, whose edges are either directed or undirected, and a set of k terminal pairs $\{(s_i, t_i)\}_{i \in [k]}$. The goal is to orient all the undirected edges in such a way that maximizes the number of $t_i$ that can be reached from $s_i$. The problem is known to be in XP [42] but is W[1]-hard even when all terminal pairs can be connected [43]. Starting from this W[1]-hardness, Wlodarczyk [41] devises a gap amplification step that implies a hardness of approximation with factor $(\log k)^{o(1)}$ for the problem. Due to the technicality of the gap amplification step, we will not go into the specifics in this survey. However, let us point out the differences between this gap amplification and the (failed) one for CLIQUE above. The key point here is that the new instance of Wlodarczyk's gap amplification has size of the form $f(k) \cdot n$ instead of $n^2$ as in the graph product. This means that, even if we apply Wlodarczyk's gap amplification step log(k) times, or, more generally, g(k) times, it only results in an instance of size $\underbrace{f(f(\cdots f(k) \cdots))}_{g(k)\ \text{times}} \cdot n$, which is still FPT! Since the technique is still quite new, it is an exciting frontier to examine whether other parameterized problems allow similar gap amplification steps.
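To make the contrast between the two kinds of blow-up concrete, the following sketch compares them numerically. It assumes a hypothetical overhead function f(x) = x², chosen purely for illustration; the actual function in [41] is not specified here, and the helper names are likewise illustrative.

```python
import math

def product_size_exponent(k):
    """Exponent e such that ceil(log2 k) rounds of squaring turn n vertices into n**e."""
    rounds = math.ceil(math.log2(k))
    return 2 ** rounds

def iterated_overhead(f, k, rounds):
    """Parameter-only factor f(f(...f(k)...)) after `rounds` applications of f."""
    value = k
    for _ in range(rounds):
        value = f(value)
    return value

k = 8
# Graph product: the instance grows from n to n ** exponent vertices.
exponent = product_size_exponent(k)
# Size of the form f(k) * n per step: the instance grows to overhead * n.
overhead = iterated_overhead(lambda x: x * x, k, rounds=3)
print(exponent, overhead)
```

In the first case the parameter ends up in the exponent of n, whereas in the second it only contributes a multiplicative factor that depends on k alone, however large that factor may be.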

Hardness from Gap Hypotheses
In the previous subsection, we have seen that several hardness of approximation results can be proved based on standard assumptions. However, as alluded to briefly, some basic problems, including k-CLIQUE, still evade attempts at proving such results. This has motivated several researchers in the community to come up with new assumptions that allow more power and flexibility in proving inapproximability results. We will take a look at two of these hypotheses in this subsection; we note that other assumptions have also been formulated, but we only focus on these two since they arguably have been used most often.
The first assumption, called the Parameterized Inapproximability Hypothesis (PIH for short), can be viewed as a gap analogue of the W[1] ≠ FPT assumption. There are many (equivalent) ways to state PIH. We choose to state it in terms of inapproximability of the colored version of DENSEST k-SUBGRAPH.
In MULTICOLORED DENSEST k-SUBGRAPH, we are given a graph G = (V, E) where the vertex set V is partitioned into k parts $V_1, \dots, V_k$. The goal is to select k vertices $v_1 \in V_1, v_2 \in V_2, \dots, v_k \in V_k$ such that $\{v_1, \dots, v_k\}$ induces as many edges as possible.
It is easy to see that the exact version of this problem is W[1]-hard, via a straightforward reduction from k-CLIQUE. PIH postulates that even the approximate version of this problem is hard: Hypothesis 1 (Parameterized Inapproximability Hypothesis (PIH) 4 [44]). For some constant ε > 0, there is no (1 + ε) factor FPT approximation algorithm for MULTICOLORED DENSEST k-SUBGRAPH.
There are two important remarks about PIH. First, the factor (1 + ε) is not important, and the conjecture remains equivalent even if we state it for a factor C for any arbitrarily large constant C; this is due to gap amplification via parallel repetition [39]. Second, PIH implies that k-CLIQUE is hard to approximate to within any constant factor: Lemma 1. Assuming PIH, there is no constant factor FPT approximation algorithm for k-CLIQUE.
The above result can be shown via a classic reduction of Feige, Goldwasser, Lovász, Safra and Szegedy (henceforth FGLSS) [45], which was one of the first works connecting proof systems and hardness of approximation. Specifically, the FGLSS reduction transforms G to another graph G′ by viewing the edges of G as vertices of G′. Then, we connect $\{u_1, v_1\}$ and $\{u_2, v_2\}$ except when the union $\{u_1, v_1\} \cup \{u_2, v_2\}$ contains two distinct vertices from the same partition. One can argue that the size of the largest clique in G′ is exactly equal to the number of edges in the optimal solution of MULTICOLORED DENSEST k-SUBGRAPH on G. As a result, PIH implies hardness of approximation of the former. Interestingly, however, it is not known if the converse is true and this remains an interesting open question: Open Question 7. Does PIH hold if we assume that k-CLIQUE is FPT inapproximable to within any constant factor?
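The FGLSS-style transformation described above is short enough to run on a toy instance. The sketch below is illustrative only: the function names are not from [45], edges inside a single part are discarded up front (an assumption of this sketch), and both optima are computed by brute force. The transformed graph is called G′ in the comments.

```python
from itertools import combinations, product

def fglss_graph(parts, edges):
    """Build G': one vertex per (cross-part) edge of G."""
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    nodes = [tuple(e) for e in edges if part_of[e[0]] != part_of[e[1]]]

    def compatible(e1, e2):
        union = set(e1) | set(e2)
        used_parts = [part_of[v] for v in union]
        # Adjacent unless two distinct vertices of the union share a part.
        return len(used_parts) == len(set(used_parts))

    new_edges = [(a, b) for a, b in combinations(nodes, 2) if compatible(a, b)]
    return nodes, new_edges

def densest_multicolored(parts, edges):
    """Optimum of MULTICOLORED DENSEST k-SUBGRAPH by exhaustive search."""
    edge_set = {frozenset(e) for e in edges}
    return max(
        sum(frozenset(pair) in edge_set for pair in combinations(choice, 2))
        for choice in product(*parts)
    )

def max_clique_size(nodes, edges):
    """Brute-force clique number; only meant for tiny graphs."""
    adj = {frozenset(e) for e in edges}
    best = 0
    for r in range(1, len(nodes) + 1):
        if not any(
            all(frozenset(pair) in adj for pair in combinations(subset, 2))
            for subset in combinations(nodes, r)
        ):
            break
        best = r
    return best

parts = [[0, 1], [2, 3], [4, 5]]
edges = [(0, 2), (0, 4), (2, 4), (1, 3), (1, 5), (3, 4), (0, 3)]
optimum = densest_multicolored(parts, edges)
clique = max_clique_size(*fglss_graph(parts, edges))
print(optimum, clique)
```

On this instance both quantities equal 3, attained by the selection {0, 2, 4} and by the three edges it induces.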
As demonstrated by the FGLSS reduction, once we have a gap, it is much easier to give a reduction to another hardness of approximation result, because we do not have to create the initial gap ourselves (as in the previous subsection) but only need to preserve or amplify the gap. Indeed, PIH turns out to be a pretty robust hypothesis that gives FPT inapproximability for many problems, including k-CLIQUE, DIRECTED ODD CYCLE TRAVERSAL [44] and STRONGLY CONNECTED STEINER SUBGRAPH [46]. We remark that the current situation here is quite similar to the landscape of the classic theory of hardness of approximation before the PCP Theorem [21,22] was proved. There, Papadimitriou and Yannakakis introduced the complexity class MAX-SNP and showed that many optimization problems are hard (or complete) for this class [47]. Later, the PCP Theorem confirmed that these problems are NP-hard. In our case of FPT inapproximability, PIH seems to be a good analogue of MAX-SNP for problems in W[1] and, as mentioned before, PIH has been used as a starting point of many hardness of approximation results. However, there have not yet been many reverse reductions to PIH, and this is one of the motivations behind Question 7 above.
4 We remark that the original conjecture in [44] says that the problem is W[1]-hard to approximate. However, we choose to state the more relaxed form here.

Despite the aforementioned applications of PIH, there are still quite a few questions that seem out of reach of PIH, such as whether there is an o(k) factor FPT approximation for k-CLIQUE or questions related to running time lower bounds of approximation algorithms. On this front, another stronger conjecture called the Gap Exponential Time Hypothesis (Gap-ETH) is often used instead: Hypothesis 2 (Gap Exponential Time Hypothesis (Gap-ETH) [48,49]). For some constants ε, δ > 0, there is no $O(2^{\delta n})$-time algorithm that can, given a 3CNF formula, distinguish between the following two cases:
• (Completeness) the formula is satisfiable.
• (Soundness) any assignment violates more than ε fraction of the clauses.
Here n denotes the number of clauses. 5 Clearly, Gap-ETH is a strengthening of ETH, which can be thought of in the above form but with ε = 1/n. Another interesting fact is that Gap-ETH is stronger than PIH. This can be shown via the standard reduction from 3SAT to k-CLIQUE that establishes an $N^{\Omega(k)}$ lower bound for the latter. The reduction, due to Chen et al. [50,51], proceeds as follows. First, we partition the set of clauses C into $C_1, \dots, C_k$, each of size n/k. For each $C_i$, we create a partition $V_i$ in the new graph whose vertices correspond to all partial assignments (to variables that appear in at least one clause of $C_i$) that satisfy all the clauses in $C_i$. Two vertices are connected if the corresponding partial assignments are consistent, i.e., they do not assign a variable to different values.
If there is an assignment that satisfies all the clauses, then clearly the restrictions of this assignment to the variables of each $C_i$ correspond to k vertices from different partitions that form a clique. On the other hand, it is also not hard to argue that, in the soundness case, the number of edges induced by any k vertices from different partitions is at most a 1 − Θ(ε) fraction of the $\binom{k}{2}$ pairs. Thus, Gap-ETH implies PIH as claimed.
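The reduction just described can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: clauses are tuples of nonzero integers (a positive integer for a variable, a negative one for its negation), the clauses are split round-robin into k groups, and adjacency is tested on the fly rather than stored as an explicit edge list. All function names are illustrative.

```python
from itertools import combinations, product

def satisfies(assignment, clause):
    """A clause is a tuple of nonzero ints: +v is variable v, -v its negation."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def build_parts(clauses, k):
    """Split the clauses into k groups; list each group's satisfying partial assignments."""
    parts = []
    for i in range(k):
        group = clauses[i::k]  # assumption: round-robin split into k groups
        variables = sorted({abs(lit) for clause in group for lit in clause})
        part = []
        for values in product([False, True], repeat=len(variables)):
            assignment = dict(zip(variables, values))
            if all(satisfies(assignment, clause) for clause in group):
                part.append(assignment)
        parts.append(part)
    return parts

def consistent(a, b):
    """Two partial assignments are adjacent iff they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def has_multicolored_clique(parts):
    """Brute force: pick one vertex per part and test all pairs for adjacency."""
    return any(
        all(consistent(a, b) for a, b in combinations(choice, 2))
        for choice in product(*parts)
    )

satisfiable = [(1, 2, -3), (-1, 3, 4), (2, -4, 5), (-2, 3, -5)]
# All eight sign patterns over three variables: no assignment satisfies them all.
unsatisfiable = [(a * 1, b * 2, c * 3) for a, b, c in product([1, -1], repeat=3)]

sat_result = has_multicolored_clique(build_parts(satisfiable, 2))
unsat_result = has_multicolored_clique(build_parts(unsatisfiable, 2))
print(sat_result, unsat_result)
```

The satisfiable formula yields a multicolored k-clique and the unsatisfiable one does not, since pairwise consistent partial assignments can always be merged into one global assignment.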
Now that we have demonstrated that Gap-ETH is at least as strong as PIH, we may go further and ask how much more we can achieve from Gap-ETH, compared to PIH. The obvious consequence of Gap-ETH is that it can give explicit running time lower bounds for FPT hardness of approximation results. Perhaps more surprising, however, is that it can be used to improve the inapproximability ratio as well. The rest of this subsection is devoted to presenting some of these examples, together with brief overviews of how the proofs of these results work.

Strong Inapproximability of k-Clique
Our first example is the k-CLIQUE problem. Obviously, we can approximate k-CLIQUE to within a factor of k, by just outputting any single vertex. It had long been asked whether an o(k)-approximation is achievable in FPT time. As we saw above, PIH implies that a constant factor FPT approximation does not exist, but does not yet resolve this question. Nonetheless, assuming Gap-ETH, this question can be resolved in the negative (Theorem 5). The reduction used in [36] to prove the above inapproximability is just a simple modification of the above reduction [50,51] that we saw for k-CLIQUE. Suppose that we would like to rule out a k/g-approximation, where g = g(k) is a function such that $\lim_{k \to \infty} g(k) = \infty$. The only change in the reduction is that, instead of letting $C_1, \dots, C_k$ be a partition of the set of clauses C, we let each $C_i$ be a set of Dn/g clauses for some sufficiently large constant D > 0. The rest of the reduction works similarly to before: for each $C_i$, we create a vertex corresponding to each partial assignment that satisfies all the clauses in $C_i$. Two vertices are joined by an edge if and only if they are consistent. This completes the description of the reduction. To see that the reduction yields Theorem 5, first note that, if there is an assignment that satisfies the CNF formula, then we can again pick the restrictions of this assignment onto $C_1, \dots, C_k$; these give k vertices that induce a clique in the graph.

5 The version where n denotes the number of variables is equivalent to the current formulation, because we can always assume without loss of generality that m = O(n) (see [48,49]).
On the other hand, suppose that every assignment violates more than an ε fraction of clauses. We will argue that there is no clique of size g in the constructed graph. The only property we need from the subsets $C_1, \dots, C_k$ is that the union of any g such subsets contains at least a (1 − ε) fraction of the clauses. It is not hard to show that this is true with high probability, when we choose D to be sufficiently large. Now, suppose for the sake of contradiction that there exists a clique of size g in the graph. Since the vertices corresponding to the same subset $C_i$ form an independent set, it must be that these g vertices are from different subsets. Let us call these subsets $C_{i_1}, \dots, C_{i_g}$. Because these vertices induce a clique, we can find a global assignment that is consistent with each vertex. This global assignment satisfies all the clauses in $C_{i_1} \cup \cdots \cup C_{i_g}$. However, $C_{i_1} \cup \cdots \cup C_{i_g}$ contains at least a 1 − ε fraction of all clauses, which contradicts our assumption that every assignment violates more than an ε fraction of the clauses.
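Finally, suppose that there were an algorithm that distinguishes the two cases above (a clique of size k versus no clique of size g) in $f(k) \cdot N^{O(1)}$ time, where N denotes the size of the constructed graph. Then, we may run this algorithm to distinguish the two cases in Gap-ETH in $f(k) \cdot (2^{Dn/g})^{O(1)} = 2^{o(n)}$ time, which violates Gap-ETH. This concludes our proof sketch. We end by remarking that the reduction may also be viewed as an instantiation of the randomized graph product [38,52,53], and it can also be derandomized. We omit the details of the latter here. Interested readers may refer to [36] for more detail.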

Strong Inapproximability of Multicolored Densest k-Subgraph and Label Cover
For our second example, we go back to the MULTICOLORED DENSEST k-SUBGRAPH once again. Recall that PIH asserts that this problem is hard to approximate to some constant factor, and we have seen above that Gap-ETH also implies this. On the approximation front, however, only the trivial k-approximation algorithm is known: just pick a vertex that has edges to as many partitions as possible. Then, output that vertex and one of its neighbors from each partition. It is hence a natural question to ask whether it is possible to beat this approximation ratio. This question has been, up to lower order terms, answered in the negative, assuming Gap-ETH (Theorem 6). An interesting aspect of the above result is that, even in the NP-hardness regime, no NP-hardness of factor $k^{\gamma}$ for some constant γ > 0 is known. In fact, the problem is closely related to (and is a special case of) a well-known conjecture in the hardness of approximation community called the Sliding Scale Conjecture (SSC) [55]. 6 (See [54] for more discussion on the relation between the two.) Thus, this is yet another instance where taking a parameterized complexity perspective helps us advance knowledge even in the classical settings.
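The trivial approximation algorithm mentioned above can be written down directly. This is a minimal sketch with illustrative names; as an assumption of the sketch, parts in which the chosen vertex has no neighbor are filled with an arbitrary vertex (the first one listed), and every part is assumed to be non-empty.

```python
from itertools import combinations

def trivial_k_approx(parts, edges):
    """Pick a vertex adjacent to the most parts, plus one neighbor from each such part."""
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    # neighbors[v] maps a part index to one neighbor of v inside that part.
    neighbors = {v: {} for v in part_of}
    for u, v in edges:
        if part_of[u] != part_of[v]:
            neighbors[u].setdefault(part_of[v], v)
            neighbors[v].setdefault(part_of[u], u)
    center = max(part_of, key=lambda v: len(neighbors[v]))
    solution = [part[0] for part in parts]  # arbitrary filler for the remaining parts
    for i, nb in neighbors[center].items():
        solution[i] = nb
    solution[part_of[center]] = center
    return solution

def induced_edges(solution, edges):
    """Number of input edges with both endpoints in the solution."""
    edge_set = {frozenset(e) for e in edges}
    return sum(frozenset(pair) in edge_set for pair in combinations(solution, 2))

parts = [[0, 1], [2, 3], [4, 5]]
edges = [(0, 2), (0, 4), (2, 4), (1, 3)]
solution = trivial_k_approx(parts, edges)
value = induced_edges(solution, edges)
print(solution, value)
```

If the chosen vertex has neighbors in d parts, the output induces at least d edges, while any feasible solution induces at most kd/2 edges, because each selected vertex is adjacent to at most d of the other selected vertices.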
6 See also the related Projection Game Conjecture (PGC) [56].

To prove Theorem 6, arguably the most natural reduction here is the above reduction for Clique! Note that we now view the vertices corresponding to each subset $C_i$ as forming a partition $V_i$. The argument in the YES case is exactly the same as before: if the formula is satisfiable, then there is a (multicolored) k-clique. However, as the readers might have noticed, the argument in the NO case does not go through anymore. In particular, even when the graph is quite dense (e.g., having half of the edges present), it may not contain any large clique at all and hence it is unclear how to recover an assignment that satisfies a large fraction of constraints.
This obstacle was overcome in [54] by proving an agreement testing theorem (i.e., direct product theorem), which is of the following form. Suppose we are given k local functions $f_1, \dots, f_k$, where each $f_i : S_i \to \{0, 1\}$ is a boolean function whose domain $S_i$ is a subset of a universe U. If some (small) ζ fraction of the pairs agree 7 with each other, then we can find (i.e., "decode") a global function h : [n] → {0, 1} that "approximately agrees" with roughly a ζ fraction of the local functions. The theorem in [54] works when $S_1, \dots, S_k$ are sets of size Ω(n).
Due to the technical nature of the definitions, we will not fully formalize the notions in the previous paragraph. Nonetheless, let us sketch how to apply the agreement testing theorem to prove the NO case for our reduction. Suppose for the sake of contradiction that the formula is not (1 − δ)-satisfiable and that there exists a k-subgraph with density $\zeta \geq \frac{1}{k^{1-o(1)}}$. Recall that each selected vertex is simply a partial assignment onto the subset of clauses $C_i$ for some i; we may view this as a function $f_i : S_i \to \{0, 1\}$ where $S_i$ denotes the set of variables that appear in $C_i$. Here the universe U is the set of all variables. With this perspective, we can apply the agreement testing theorem to recover a global function h : U → {0, 1} that "approximately agrees" with roughly ζk of the local functions. Notice that, in this context, h is simply a global assignment for the CNF formula. Previously in the proof for inapproximability of Clique, we had a global assignment that (perfectly) agrees with g local functions, from which we can conclude that this assignment satisfies all but a δ fraction of the clauses. It turns out that relaxing "perfect agreement" to "approximate agreement" does not affect the proof too much, and the latter still implies that h satisfies all but a δ fraction of clauses as desired.
As for the proof of the agreement testing theorem itself, we will not delve too much into detail here. However, we note that the proof is based on looking at different "agreement levels" and the graphs associated with them. It turns out that such a graph has a certain transitivity property, which allows one to "decode" the global function h. This general approach of looking at different agreement levels and their transitivity properties is standard in the direct product/agreement testing literature [57][58][59]. The main challenge in [54] is to make the proof work for ζ as small as 1/k, which requires a new notion of transitivity.
To end this subsection, we remark that MULTICOLORED DENSEST k-SUBGRAPH is known as the 2-ary Constraint Satisfaction Problem (2-CSP) in the classical hardness of approximation community. The problem, and in particular its special case called Label Cover, serves as the starting point of almost all known NP-hardness of approximation results (see, e.g., [60][61][62]). The technique in [54] can also be used to show inapproximability for Label Cover with a strong running time lower bound of the form $f(k) \cdot N^{\Omega(k)}$ [63]. Due to known reductions, this has numerous consequences. For example, it implies, assuming Gap-ETH, that approximating k-EVEN SET to within any factor less than two cannot be done in $f(k) \cdot N^{o(k)}$ time, considerably improving the lower bound mentioned in the previous subsection.

Inapproximability of k-Biclique and Densest k-Subgraph
7 Naturally, we say that two functions $f_i$ and $f_j$ agree iff

While PIH (or equivalently the MULTICOLORED DENSEST k-SUBGRAPH problem) can serve as a starting point for hardness of approximation of many problems, there are some problems for which not even a constant factor hardness is known under PIH, but strong inapproximability results can be obtained via Gap-ETH. We will see two examples of this here.
First is the k-BICLIQUE problem. Recall that in this problem, we are given a bipartite graph and we would like to determine whether there is a complete bipartite subgraph of size k. As stated earlier in the previous subsection, despite its close relationship to k-CLIQUE, k-BICLIQUE turned out to be a much more challenging problem for proving intractability, and even its W[1]-hardness was only shown recently [23]. This difficulty is corroborated by its approximability status in the classical (non-parameterized) regime; while CLIQUE has long been known to be NP-hard to approximate to within an $N^{1-o(1)}$ factor [64], BICLIQUE is not even known to be NP-hard to approximate to within, say, a 1.01 factor. 8 With this in mind, it is perhaps not a surprise that k-BICLIQUE is not known to be hard to approximate under PIH. Nonetheless, when we assume Gap-ETH, we can in fact prove a very strong hardness of approximation for the problem (Theorem 7). Note that, similar to k-CLIQUE, a k-approximation for BICLIQUE can be easily achieved by outputting a single edge. Hence, in terms of the inapproximability ratio, the above result is tight.
Due to its technicality, we only sketch an outline of the proof of Theorem 7 here. Firstly, the reduction starts by constructing a graph that is similar (but not identical) to that of k-CLIQUE described above. The main properties of this graph are that (i) in the YES case where the formula is satisfiable, the graph contains many copies of k-BICLIQUE, and (ii) in the NO case where the formula is not even (1 − δ)-satisfiable, the graph contains few copies of g-BICLIQUE. The construction and these properties were in fact shown in [68]. In [36], it was observed that, if we subsample the graph by keeping each vertex independently with probability p for an appropriate value of p, then (i) ensures that at least one of the k-BICLIQUE copies survives the subsampling, whereas (ii) ensures that no g-BICLIQUE survives. This indeed gives the claimed result in the above theorem.
We remark that, while Theorem 7 seems to resolve the approximability of k-BICLIQUE, there is still one aspect that is not yet completely understood: the running time lower bound. To demonstrate this, recall that, for k-CLIQUE, the reduction that gives hardness of k-VS-g CLIQUE has size $2^{O(n/g)}$; this means that we have a running time lower bound of $f(k) \cdot N^{\Omega(g)}$ on the problem. This is of course tight, because we can determine whether a graph has a g-clique in $N^{O(g)}$ time. However, for k-BICLIQUE, the known reduction that gives hardness for k-VS-g BICLIQUE has size $2^{O(n/\sqrt{g})}$. This results in a running time lower bound of only $f(k) \cdot N^{\Omega(\sqrt{g})}$. Specifically, for the most basic setting of constant factor approximation, Theorem 7 only rules out algorithms with running time $f(k) \cdot N^{o(\sqrt{k})}$. Hence, an immediate question here is:

Open Question 8.
Is there an $f(k) \cdot N^{o(k)}$-time algorithm that approximates k-BICLIQUE to within a constant factor?
To put things into perspective, we note that, even for exact algorithms for k-BICLIQUE, the best running time lower bound is still $f(k) \cdot N^{\Omega(\sqrt{k})}$ [23] (under any reasonable complexity assumption). This means that, to answer Question 8, one has to first improve the best known running time lower bound for exact algorithms, which would already be a valuable contribution to the understanding of the problem.
Let us now point out an interesting consequence of Theorem 7 for the DENSEST k-SUBGRAPH problem. This is the "uncolored" version of the MULTICOLORED DENSEST k-SUBGRAPH problem as defined above, where there are no partitions $V_1, \dots, V_k$ and we can pick any k vertices in the input graph G with the objective of maximizing the number of induced edges. The approximability status of DENSEST k-SUBGRAPH very much mirrors that of k-BICLIQUE. Namely, in the parameterized setting, PIH is not known to imply hardness of approximation for DENSEST k-SUBGRAPH. Furthermore, in the classic (non-parameterized) setting, DENSEST k-SUBGRAPH is not known 9 to be NP-hard to approximate even to within a factor of, say, 1.01. Despite these, Gap-ETH does give a strong inapproximability for DENSEST k-SUBGRAPH (Theorem 8). In fact, this result is a simple consequence of Theorem 7. To see this, recall the following classic result in extremal graph theory commonly referred to as the Kővári-Sós-Turán (KST) Theorem [71]: any k-vertex graph that does not contain a g-biclique as a subgraph has density at most $O(k^{-1/g})$. Now, the hardness for k-BICLIQUE from Theorem 7 tells us that there is no FPT time algorithm that can distinguish between a graph containing a k-biclique and one that does not contain a g-biclique, for any g = ω(1). When the graph contains a k-biclique, we have a k-vertex subgraph with density (at least) 1/2. On the other hand, when the graph does not even contain a g-biclique, the KST Theorem ensures that any k-vertex subgraph has density at most $O(k^{-1/g})$. This indeed gives a gap of $\Omega(k^{1/g})$ in terms of approximating DENSEST k-SUBGRAPH and finishes the proof sketch for Theorem 8.
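The gap computation in this sketch can be written out explicitly. Below, c > 0 stands for the unspecified constant hidden in the KST bound; it is notation introduced here for illustration only.

```latex
\[
\frac{\text{density in the YES case}}{\text{density in the NO case}}
\;\geq\; \frac{1/2}{c \cdot k^{-1/g}}
\;=\; \frac{k^{1/g}}{2c}
\;=\; \Omega\bigl(k^{1/g}\bigr).
\]
```

Since g = ω(1), the exponent 1/g tends to zero, so the resulting hardness factor is of the form $k^{o(1)}$.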
Unfortunately, Theorem 8 does not yet resolve the FPT approximability of DENSEST k-SUBGRAPH. In particular, while the hardness is only of the form $k^{o(1)}$, the best known algorithm (which is the same as that of the multicolored version discussed above) only gives an approximation ratio of k. Hence, we may ask whether this can be improved: Open Question 9. Is there an o(k)-FPT-approximation algorithm for DENSEST k-SUBGRAPH?
This should be contrasted with Theorem 6, for which the FPT approximability of MULTICOLORED DENSEST k-SUBGRAPH is essentially resolved (up to lower order terms).

Algorithms
In this section we survey some of the developments on the algorithmic side in recent years. The organization of this section is according to problem types. We begin with basic packing and covering problems in Sections 4.1 and 4.2. We then move on to clustering in Section 4.3, network design in Section 4.4, and cut problems in Section 4.5. In Section 4.6 we present width reduction problems.
The algorithms in the above-mentioned subsections compute approximate solutions to problems that are W[1]-hard. Therefore it is necessary to approximate, even when using parameterization. However, one may also aim to obtain faster parameterized runtimes than the known FPT algorithms, by sacrificing solution quality. We present some results of this type in Section 4.7.

Packing Problems
For a packing problem the task is to select as many combinatorial objects of some mathematical structure (such as a graph or a set system) as possible under some constraint, which forbids some objects from being picked if others are. A basic example is the INDEPENDENT SET problem, for which a maximum sized set of vertices of a graph needs to be found, such that none of them are adjacent to each other.

9 Again, similar to BICLIQUE, DENSEST k-SUBGRAPH is known to be hard to approximate under stronger assumptions [65,68,69,70].

Independent Set
The INDEPENDENT SET problem is notoriously hard in general. Not only is there no polynomial time $n^{1-\varepsilon}$-approximation algorithm [72] for any constant ε > 0, unless P=NP, but also, under Gap-ETH, no g(k)-approximation can be computed in $f(k) n^{O(1)}$ time [36] for any computable functions f and g, where k is the solution size. On the other hand, for planar graphs a PTAS exists [73]. Hence a natural question is how the problem behaves for graphs that are "close" to being planar.
One way to generalize planar graphs is to consider minor-free graphs, because planar graphs are exactly those excluding $K_5$ and $K_{3,3}$ as minors. When parameterizing by the size of an excluded minor, the INDEPENDENT SET problem is paraNP-hard, since the problem is NP-hard on planar graphs [74]. Nevertheless a PAS can be obtained for this parameter [75]. This result is part of the large framework of "bidimensionality theory" where any graph in an appropriate minor-closed class has treewidth bounded above in terms of the problem's solution value, typically by the square root of that value. These properties lead to efficient, often subexponential, fixed-parameter algorithms, as well as polynomial-time approximation schemes, for bidimensional problems in many minor-closed graph classes. The bidimensionality theory is based on algorithmic and combinatorial extensions to parts of the Robertson-Seymour Graph Minor Theory, in particular initiating a parallel theory of graph contractions. The foundation of this work is the topological theory of drawings of graphs on surfaces. We refer the reader to the survey of [77] and more recent papers [76,78,79].
A different way to generalize planar graphs is to consider a planar deletion set, i.e., a set of vertices in the input graph whose removal leaves a planar graph. Taking the size of such a set as a parameter, INDEPENDENT SET is again paraNP-hard [74]. However, by first finding a minimum sized planar deletion set, then guessing the intersection of this set with the optimum solution to INDEPENDENT SET, and finally using the PTAS for planar graphs [73], a PAS can be obtained parameterized by the size of a planar deletion set [8].
Theorem 10 ([8]). For the INDEPENDENT SET problem a (1 + ε)-approximation can be computed in $2^k n^{O(1/\varepsilon)}$ time for any ε > 0, where k is the size of a minimum planar deletion set.
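The guessing step behind this result can be sketched as follows. The code is an illustration under stated assumptions rather than the algorithm of [8]: the planar deletion set is taken as given, the graph is a dictionary mapping each vertex to its set of neighbors, and an exact brute-force search stands in for the planar PTAS of [73]. The `planar_solver` parameter and all function names are hypothetical.

```python
from itertools import combinations

def brute_force_mis(vertices, adj):
    """Stand-in for the planar PTAS: exact search, only meant for tiny inputs."""
    vertices = list(vertices)
    for r in range(len(vertices), -1, -1):
        for subset in combinations(vertices, r):
            if all(b not in adj[a] for a, b in combinations(subset, 2)):
                return set(subset)
    return set()

def independent_set_via_deletion_set(adj, deletion_set, planar_solver=brute_force_mis):
    """Try all 2^k guesses for how the optimum intersects the deletion set."""
    deletion = list(deletion_set)
    best = set()
    for r in range(len(deletion) + 1):
        for guess in combinations(deletion, r):
            if any(b in adj[a] for a, b in combinations(guess, 2)):
                continue  # the guessed part must itself be independent
            blocked = set(deletion)
            for v in guess:
                blocked |= adj[v]  # neighbors of guessed vertices are excluded
            rest = [v for v in adj if v not in blocked]
            candidate = set(guess) | planar_solver(rest, adj)
            if len(candidate) > len(best):
                best = candidate
    return best

# K5 on vertices 0..4 with pendant vertices 5 and 6; removing vertex 0 leaves a planar graph.
edges = list(combinations(range(5), 2)) + [(0, 5), (1, 6)]
adj = {v: set() for v in range(7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
result = independent_set_via_deletion_set(adj, deletion_set={0})
print(sorted(result))
```

The outer loops enumerate the $2^k$ subsets of the deletion set, which is where the $2^k$ factor in the running time comes from; the polynomial factor is determined by whichever solver is plugged in for the planar remainder.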
Ideas using linear programming allow us to generalize and handle larger noise at the expense of worse dependence on ε. Bansal et al. [80] showed that given a graph obtained by adding δn edges to some planar graph, one can compute a (1 + O(ε + δ))-approximate independent set in time $n^{O(1/\varepsilon^4)}$, which is faster than the $2^k n^{O(1/\varepsilon)}$ running time of Theorem 10 for large k = δn. Magen and Moharrami [81] showed that for every graph H and ε > 0, given a graph G = (V, E) that can be made H-minor-free after at most δn deletions and additions of vertices or edges, the size of the maximum independent set can be approximately computed within a factor (1 + ε + O(δ|H| log |H|)) in time $n^{f(\varepsilon, H)}$. Note that this algorithm does not find an independent set. Recently, Demaine et al. [82] presented a general framework to obtain better approximation algorithms for various problems including INDEPENDENT SET and CHROMATIC NUMBER, when the input graph is close to well-structured graphs (e.g., bounded degeneracy, degree, or treewidth).
It is also worth noting here that the INDEPENDENT SET problem can be generalized to the d-SCATTERED SET problem, where we are given an (edge-weighted) graph and are asked to select at least k vertices, so that the distance between any pair is at least d [83]. Recently, some lower and upper bounds on the approximation of the d-SCATTERED SET problem have been provided in [84].
A special case of INDEPENDENT SET is the INDEPENDENT SET OF RECTANGLES problem, where a set of axis-parallel rectangles is given in the two-dimensional plane, and the task is to find a maximum sized subset of non-intersecting rectangles. This is a special case, since pairwise intersections of rectangles can be encoded by edges in a graph for which the vertices are the rectangles. Parameterized by the solution size, the problem is W[1]-hard [85], and while a QPTAS is known [86], it is a challenging open question whether a PTAS exists. It was shown [87] however that both a PAS and a PSAKS exist for INDEPENDENT SET OF RECTANGLES parameterized by the solution size, even for the weighted version.
The runtime of this PAS is f(k, ε) · n^{g(ε)} for some functions f and g, where k is the solution size. Note that the dependence on ε in the degree of the polynomial factor of this algorithm cannot be removed, unless FPT = W[1], since any efficient PAS with runtime f(k, ε) · n^{O(1)} could be used to compute the optimum solution in FPT time by setting ε to 1/(k + 1) in the W[1]-hard unweighted version of the problem [85]. However, in the so-called shrinking model an efficient PAS can be obtained [88] for INDEPENDENT SET OF RECTANGLES. The parameter in this case is a factor 0 < δ < 1 by which every rectangle is shrunk before computing an approximate solution, which is compared to the optimum solution without shrinking.
Another special case of INDEPENDENT SET is the INDEPENDENT SET ON UNIT DISK GRAPH problem, where, given a set of n unit disks in the Euclidean plane, the task is to determine whether there exists a set of k non-intersecting disks. The problem is NP-hard [89] but admits a PTAS [90]. Marx [85] showed that, when parameterized by the solution size, the problem is W[1]-hard; this also rules out an EPTAS (and even an efficient PAS) for the problem, assuming FPT ≠ W[1]. On the other hand, in [91] the authors give an FPT algorithm for a special case of INDEPENDENT SET ON UNIT DISK GRAPH when there is a lower bound on the distance between any pair of centers.
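The exactness argument can be spelled out as a short calculation. This is only a sketch, assuming the optimum has size k and that S denotes the (integer-sized) solution returned by a (1 + ε)-approximation with ε = 1/(k + 1); the symbol S is introduced here for illustration.

```latex
\[
  |S| \;\ge\; \frac{k}{1+\varepsilon}
      \;=\; \frac{k}{1+\frac{1}{k+1}}
      \;=\; \frac{k(k+1)}{k+2}
      \;=\; k - \frac{k}{k+2}
      \;>\; k-1 .
\]
```

Since |S| is an integer strictly larger than k − 1, it must equal k, so the approximate solution is in fact optimal.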

Vertex Coloring
A problem related to INDEPENDENT SET is the VERTEX COLORING problem, for which the vertices need to be colored with integer values, such that no two adjacent vertices have the same color (which means that each color class forms an independent set in the graph). The task is to minimize the number of used colors. For planar graphs the problem has a polynomial time 4/3-approximation algorithm [8] via the celebrated Four Color Theorem, and a better approximation is not possible in polynomial time [92]. Using this algorithm, a 7/3-approximation can be computed in FPT time when parameterizing by the size of a planar deletion set [8]. When generalizing planar graphs by excluding any fixed minor, and taking its size as the parameter, a 2-approximation can be computed in FPT time [93]. Due to the NP-hardness for planar graphs [92], neither of these two parameterizations admits a PAS, unless P=NP. One way to generalize VERTEX COLORING is to see each color class as inducing a graph of maximum degree 0. The DEFECTIVE COLORING problem correspondingly asks for a coloring of the vertices, such that each color class induces a graph of maximum degree ∆, for some given ∆. The aim again is to minimize the number of used colors. In contrast to VERTEX COLORING, the DEFECTIVE COLORING problem is W[1]-hard [94] parameterized by the treewidth. This parameter measures how "tree-like" a graph is, and is defined as follows.

Definition 1.
A tree decomposition of a graph G = (V, E) is a tree T for which every node is associated with a bag X ⊆ V, such that the following properties hold:
1. the union of all bags is the vertex set V of G,
2. for every edge (u, v) of G, there is a node of T for which the associated bag contains u and v, and
3. for every vertex u of G, all nodes of T for which the associated bags contain u induce a connected subtree of T.
The width of a tree decomposition is the size of the largest bag minus 1 (which implies that a tree has a decomposition of width 1 where each bag contains the endpoints of one edge). The treewidth of a graph is the smallest width of any of its tree decompositions.
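As an illustration of Definition 1, the following Python sketch checks the three properties for a candidate decomposition and computes its width. The representation (a dictionary of bags indexed by tree-node ids plus a list of tree edges) and all function names are assumptions made here for illustration; they do not come from the cited works.

```python
from collections import defaultdict, deque

def _connected(nodes, adj):
    """True if the non-empty set `nodes` induces a connected subgraph under `adj`."""
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        t = queue.popleft()
        for s in adj[t] & nodes:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return seen == nodes

def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check the properties of Definition 1.

    bags: dict mapping a tree-node id to a set of graph vertices (non-empty dict).
    tree_edges: list of pairs of tree-node ids.
    """
    adj = defaultdict(set)
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    # T itself must be a tree: connected with (#nodes - 1) edges.
    if len(tree_edges) != len(bags) - 1 or not _connected(set(bags), adj):
        return False
    # Property 1: the union of all bags is the vertex set.
    if set().union(*bags.values()) != set(vertices):
        return False
    # Property 2: every edge of G is contained in some bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # Property 3: the bags containing a fixed vertex form a connected subtree.
    for v in vertices:
        nodes = {t for t, bag in bags.items() if v in bag}
        if not _connected(nodes, adj):
            return False
    return True

def width(bags):
    """Size of the largest bag minus 1."""
    return max(len(bag) for bag in bags.values()) - 1
```

For a path on three vertices, the two bags {a, b} and {b, c} joined by one tree edge pass the check and have width 1, matching the remark that trees have decompositions of width 1.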
Treewidth is a fundamental parameter of a graph and will be discussed in more detail in Section 4.6.1. However, it is worth mentioning here that VERTEX COLORING is in FPT when parameterized by treewidth.
The strong polynomial-time approximation lower bound of n^{1−ε} for VERTEX COLORING [72] naturally carries over to the more general DEFECTIVE COLORING problem. A much improved approximation factor of 2 is possible though in FPT time if the parameter is the treewidth [94]. It can be shown, however, that a PAS is not possible in this case, as there is no (3/2 − ε)-approximation algorithm for any ε > 0 parameterized by the treewidth [94], unless FPT = W[1]. A natural question is whether the bound ∆ of DEFECTIVE COLORING can be approximated instead of the number of colors. For this setting, a bicriteria PAS parameterized by the treewidth exists [94], which computes a solution with the optimum number of colors where each color class induces a graph of maximum degree at most (1 + ε)∆.
Theorem 13 ([94]). For the DEFECTIVE COLORING problem, given a tree decomposition of width k of the input graph,
The algorithms of the previous theorem build on the techniques of [95] using approximate addition trees in combination with dynamic programs that yield XP algorithms for these problems. This technique can be applied to various problems (cf. Section 4.2), including a different generalization of VERTEX COLORING called EQUITABLE COLORING. Here the aim is to color the vertices of a graph with as few colors as possible, such that every two adjacent vertices receive different colors, and all color classes contain the same number of vertices. It is a generalization of VERTEX COLORING, since one may add a sufficiently large independent set (i.e., a set of isolated vertices) to a graph such that the number of colors needed for an optimum VERTEX COLORING solution is the same as for an optimum EQUITABLE COLORING solution.
The EQUITABLE COLORING problem is W[1]-hard even when combining the number of colors needed and the treewidth of the graph as parameters [96]. On the other hand, a PAS exists [95] if the parameter is the cliquewidth of the input graph. This is a weaker parameter than treewidth, as the cliquewidth of a graph is bounded as a function of its treewidth. However, while bounded treewidth graphs are sparse, cliquewidth also allows for dense graphs (such as complete graphs). Formally, a graph of cliquewidth ℓ can be constructed using the following recursive operations using ℓ labels on the vertices:
1. Introduce(x): create a graph containing a single vertex with label x.
2. Union(G1, G2): take the disjoint union of two vertex-labelled graphs G1 and G2.
3. Join(G, x, y): add all edges connecting a vertex of label x with a vertex of label y to the vertex-labelled graph G.
4. Rename(G, x, y): change the label of every vertex of G with label x to y.
A cliquewidth expression with ℓ labels is a recursion tree describing how to construct a graph using the above four operations using only labels from the set {1, . . . , ℓ}. Notice that the cliquewidth of a complete graph is two, and therefore there are graphs of bounded cliquewidth but unbounded treewidth. As stated earlier, the cliquewidth of a graph is bounded above exponentially in its treewidth, and this dependence is tight for some graph families [97].
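To make the claim about complete graphs concrete, here is a minimal Python sketch of the four standard cliquewidth operations (introduce a labelled vertex, disjoint union, join, rename) and an expression that builds a complete graph with only two labels. The function names and the graph representation (a label dictionary plus an edge set) are assumptions chosen for illustration.

```python
import itertools

_fresh = itertools.count()  # supplies distinct vertex ids

def introduce(x):
    """A graph with a single vertex carrying label x."""
    return ({next(_fresh): x}, set())

def union(g1, g2):
    """Disjoint union of two vertex-labelled graphs."""
    return ({**g1[0], **g2[0]}, g1[1] | g2[1])

def join(g, x, y):
    """Add every edge between a vertex labelled x and a vertex labelled y (x != y)."""
    labels, edges = g
    new_edges = {frozenset((u, v))
                 for u in labels for v in labels
                 if labels[u] == x and labels[v] == y and u != v}
    return (labels, edges | new_edges)

def rename(g, x, y):
    """Relabel every vertex with label x to label y."""
    labels, edges = g
    return ({v: (y if lab == x else lab) for v, lab in labels.items()}, edges)

def complete_graph(n):
    """Build K_n with labels {1, 2} only: add one vertex at a time with label 2,
    join it to all previous vertices (label 1), then rename 2 -> 1."""
    g = introduce(1)
    for _ in range(n - 1):
        g = union(g, introduce(2))
        g = join(g, 1, 2)
        g = rename(g, 2, 1)
    return g
```

Each round attaches one new vertex to all existing ones, so the expression never needs more than two labels regardless of n.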
The PAS for EQUITABLE COLORING will compute a coloring using at most k colors such that the ratio between the sizes of any two color classes is at most 1 + ε. In this sense it is a bicriteria approximation algorithm.
Theorem 14 ([95]). For the EQUITABLE COLORING problem, given a cliquewidth expression with ℓ labels for the input graph, a solution with optimum number of colors where the ratio between the sizes of any two color classes is at most 1 + ε, can be computed in (k/ε)^{O(kℓ)} n^{O(1)} time for any ε > 0, where k is the optimum number of colors.
A variant of VERTEX COLORING is the MIN SUM COLORING problem, where, instead of minimizing the number of colors, the aim is to minimize the sum of (integer) colors, where the sum is taken over all vertices. This problem is FPT parameterized by the treewidth [99], but the related MIN SUM EDGE COLORING problem is NP-hard [100] on graphs of treewidth 2 (while being polynomial time solvable on trees [101]). For this problem the edges need to be colored with integer values, so that no two edges sharing a vertex have the same color, and the aim again is to minimize the total sum of colors. Despite being APX-hard [100] and also paraNP-hard for parameter treewidth, MIN SUM EDGE COLORING admits a PAS for this parameter [102].
Theorem 15 ([102]). For the MIN SUM EDGE COLORING problem a (1 + ε)-approximation can be computed in f(k, ε) · n time for any ε > 0, where k is the treewidth of the input graph.

Subgraph Packing
A special family of packing problems can be obtained by subgraph packing. Let H be a fixed "pattern" graph. The H-PACKING problem, given the "host" graph G, asks to find the maximum number of vertex-disjoint copies of H. One can also let H be a family of graphs and ask the analogous problem. There is another choice whether each copy of H is required to be an induced subgraph or a regular subgraph. We focus on the regular subgraph case here.
When H is a single graph with k vertices, a simple greedy algorithm that repeatedly finds an arbitrary copy of H and adds it to the packing guarantees a k-approximation in time f(H, n) · n. Here f(H, n) denotes the time to find a copy of H in an n-vertex graph. Following a general result for k-SET PACKING, a (k + 1 + ε)/3-approximation algorithm that runs in polynomial time for fixed k, ε exists [103]. When H is 2-vertex-connected or a star graph, even for fixed k, it is NP-hard to approximate the problem better than a factor Ω(k/polylog(k)) [104]. There is no known connected H that admits an FPT (or even XP) algorithm achieving a k^{1−δ}-approximation for some δ > 0; in particular, the parameterized approximability of k-PATH PACKING is wide open. It is conceivable that k-PATH PACKING admits a parameterized o(k)-approximation algorithm, given an O(log k)-approximation algorithm for k-PATH DELETION [105] and an improved kernel for INDUCED P_3 PACKING [106].
When H is the family of all cycles, the problem becomes the VERTEX CYCLE PACKING problem, for which the largest number of vertex-disjoint cycles of a graph needs to be found. No polynomial time O(log^{1/2−ε} n)-approximation is possible for this problem [107] for any ε > 0, unless every problem in NP can be solved in randomized quasi-polynomial time. Furthermore, despite being FPT [108] parameterized by the solution size, VERTEX CYCLE PACKING does not admit any polynomial-sized exact kernel for this parameter [109], unless NP⊆coNP/poly. Nevertheless, a PSAKS can be found [18].
Theorem 16 ([18]). For the VERTEX CYCLE PACKING problem, a (1 + ε)-approximate kernel of size k^{O(1/(ε log ε))} can be computed in polynomial time, where k is the solution size.
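The greedy k-approximation mentioned above can be sketched for the concrete pattern H = triangle (so k = 3). This is an illustrative Python sketch under the assumption that vertices are sortable identifiers; the brute-force triangle search stands in for the generic f(H, n) subroutine and is not taken from the cited works.

```python
from itertools import combinations

def greedy_triangle_packing(vertices, edges):
    """Repeatedly pick any triangle among unused vertices until none remains."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    free = set(vertices)
    packing = []
    found = True
    while found:
        found = False
        # Brute-force search for one triangle on the still-free vertices.
        for a, b, c in combinations(sorted(free), 3):
            if b in adj[a] and c in adj[a] and c in adj[b]:
                packing.append((a, b, c))
                free -= {a, b, c}
                found = True
                break
    return packing
```

The output is a maximal packing: every triangle of an optimum packing shares a vertex with some chosen triangle, and each chosen triangle can block at most three optimum triangles, which gives the factor k = 3. The bound can be attained, for example when a central triangle is chosen first and each of its vertices lies in a separate outer triangle.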

Scheduling
Yet another packing problem on graphs, which, however, has applications in scheduling and bandwidth allocation, is the UNSPLITTABLE FLOW ON A PATH problem. Here a path with edge capacities is given together with a set of tasks, each of which specifies a start and an end vertex on the path and a demand value. The goal is to find the largest number of tasks such that for each edge on the path the total demand of selected tasks for which the edge lies between its start and end vertex, does not exceed the capacity of the edge. This problem admits a QPTAS [110], but it remains a challenging open question whether a PTAS exists. When parameterizing by the solution size, UNSPLITTABLE FLOW ON A PATH is W[1]-hard [111]. However a PAS exists [111] for this parameter.
Theorem 17 ([111]). For the UNSPLITTABLE FLOW ON A PATH problem a (1 + ε)-approximation can be computed in 2^{O(k log k)} n^{g(ε)} time for some computable function g and any ε > 0, where k is the solution size.
Another scheduling problem is FLOW TIME SCHEDULING, for which a set of jobs is given, each of which is specified by a processing time, a release date, and a weight. The jobs need to be scheduled on a given number of machines, such that no job is processed before its release date and a job only runs on one machine at a time. Given a schedule, the flow time of a job is the weighted difference between its completion time and release date, and the task for the FLOW TIME SCHEDULING problem is to minimize the sum of all flow times. Two types of schedules are distinguished: in a preemptive schedule a job may be interrupted on one machine and then resumed on another, while in a non-preemptive schedule every job runs on one machine until its completion once it was started. If preemptive schedules are allowed, FLOW TIME SCHEDULING has no polynomial time O(log^{1−ε} p)-approximation algorithm [112], unless P = NP, where p is the maximum processing time. For the more restrictive non-preemptive setting, no O(n^{1/2−ε})-approximation can be computed in polynomial time [113], unless P = NP, where n is the number of jobs. The latter lower bound is in fact even valid for only one machine, and thus parameterizing FLOW TIME SCHEDULING by the number of machines will not yield any better approximation ratio in this setting. A natural parameter for FLOW TIME SCHEDULING is the maximum over all processing times and weights of the given jobs. It is not known whether the problem is FPT or W[1]-hard for this parameter. However, when combining this parameter with the number of machines, a PAS can be obtained [114] despite the strong polynomial time approximation lower bounds.
Theorem 18 ([114]). For the FLOW TIME SCHEDULING problem a (1 + ε)-approximation can be computed in (mk)^{O(mk^3/ε)} n^{O(1)} time in the preemptive setting, and in (mk/ε)^{O(mk^5)} n^{O(1)} time in the non-preemptive setting, for any ε > 0, where m is the number of machines and k is an upper bound on every processing time and weight.
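To make the capacity constraint of UNSPLITTABLE FLOW ON A PATH concrete, the following Python sketch checks whether a chosen set of tasks is feasible. The encoding is an assumption for illustration: path vertices are 0, 1, ..., capacities[i] is the capacity of the edge between vertices i and i + 1, and each task is a triple (start, end, demand) with start < end.

```python
def is_feasible(capacities, tasks, selected):
    """Return True if the selected tasks respect every edge capacity.

    capacities: list where capacities[i] belongs to path edge (i, i + 1).
    tasks: list of (start, end, demand) triples with start < end.
    selected: iterable of indices into `tasks`.
    """
    load = [0] * len(capacities)
    for idx in selected:
        start, end, demand = tasks[idx]
        # A task uses exactly the edges between its start and end vertex.
        for edge in range(start, end):
            load[edge] += demand
    return all(used <= cap for used, cap in zip(load, capacities))
```

The optimization problem then asks for a largest index set for which this check succeeds.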

Covering Problems
For a covering problem the task is to select a set of k combinatorial objects in a mathematical structure, such as a graph or set system (i.e., hypergraph), under some constraints that demand certain other objects to be intersected/covered. A basic example is the SET COVER problem where we are given a set system, which is simply a collection of subsets of a universe. The goal is to determine whether there are k subsets whose union covers the whole universe.
There are two ways to define optimization problems based on covering problems. First, we may view the covering demands as strict constraints and aim to find a solution that minimizes the size/cost while covering all objects (i.e., relaxing the size-k constraint); this results in a minimization problem. Second, we may view the size constraint as a strict constraint and aim to find a solution that covers as many objects as possible; this results in a maximization problem. We divide our discussion mainly into two parts, based on these two types of optimization problems. In Section 4.2.3, we discuss problems related to covering that fall into neither category.

Minimization Variants
We start our discussion with the minimization variants. For brevity, we overload the problem name and use the same name for the minimization variant (e.g., we use SET COVER instead of the more cumbersome MIN SET COVER). Later on, we will use different names for the maximization versions; hence, there will be no confusion.
Set Cover, Dominating Set and Vertex Cover. As discussed in detail in Section 3.1.2, SET COVER and equivalently DOMINATING SET are very hard to approximate in the general case. Hence, special cases where some constraints are placed on the set system are often considered. Arguably the most well-studied special case of SET COVER is the VERTEX COVER problem, in which the set system is a graph. That is, we would like to find the smallest set of vertices such that every edge has at least one endpoint in the selected set (i.e., the edge is "covered"). VERTEX COVER is well known to be FPT [115] and to admit a linear-size kernel [116]. A generalization of VERTEX COVER to d-uniform hypergraphs, where the input is now a hypergraph and the goal is to find the smallest set of vertices such that every hyperedge contains at least one vertex from the set, is also often referred to as d-HITTING SET in the parameterized complexity community. However, we will mostly use the nomenclature VERTEX COVER on d-uniform hypergraphs because many algorithms generalize well from VERTEX COVER in graphs to hypergraphs. Indeed, branching algorithms for VERTEX COVER on graphs can be easily generalized to hypergraphs, and hence the latter is also FPT. Polynomial-size kernels are also known for VERTEX COVER on d-uniform hypergraphs [117].
While VERTEX COVER is already tractable both on graphs and on d-uniform hypergraphs, approximation can still help make algorithms even faster and kernels even smaller. We defer this discussion to Section 4.7.

Connected Vertex Cover.
A popular variant of VERTEX COVER is the CONNECTED VERTEX COVER problem, for which the computed solution is required to induce a connected subgraph of the input. Just like VERTEX COVER, the problem is FPT [118]. However, unlike VERTEX COVER, CONNECTED VERTEX COVER does not admit a polynomial-size kernel [119], unless NP⊆coNP/poly. In spite of this, a PSAKS for CONNECTED VERTEX COVER exists:
Theorem 19 ([18]). For any ε > 0, a (1 + ε)-approximate kernel with k^{O(1/ε)} vertices can be computed in polynomial time.
The idea behind [18] is quite neat and we sketch it here. There are two reduction rules: (i) if there exists a vertex with degree more than ∆ := 1/ε, just "select" the vertex, and (ii) if we see a vertex with more than k false twins, i.e., vertices with the same set of neighbors, then we simply remove it from the graph. An important observation for (i) is that, since we have to either pick the vertex or all of its ≥ ∆ neighbors anyway, we might as well just select it even in the second case, because this affects the size of the solution by a factor of at most (1 + ∆)/∆ = 1 + 1/∆ = 1 + ε. For (ii), it is not hard to see that we either select one of the false twins or all of them; hence, if a vertex has more than k false twins, then it surely cannot be in the optimal solution. Roughly speaking, these two observations show that this is a (1 + ε)-approximate kernel. Of course, in the actual proof, "selecting" a vertex needs to be defined more carefully, but we will not do it here. Nonetheless, imagine the end step when we cannot apply these two reduction rules anymore. Essentially speaking, we end up with a graph where some (less than (1 + ε)k) vertices are marked as "selected" and the remaining vertices have degree at most ∆. Now, every vertex is either inside the solution, or all of its neighbors must lie in the solution. There are only (at most) k vertices in the first case. For the second case, note that these vertices have degree at most ∆ and they have at most k false twins, meaning that there are at most k^{1+∆} = k^{1+1/ε} such vertices. In other words, the kernel is of size k^{O(1/ε)} as desired. These are the main ideas of the proof; let us stress again that the actual proof is of course more complicated than this since we did not define rule (i) formally.
Recently, Krithika et al. [120] considered the following structural parameters beyond the solution size: split deletion set, clique cover and cluster deletion set. In each case, the authors provide a PSAKS for the problem. We will not fully define these parameters here, but we note that the first parameter (split deletion set) is always no larger than the size of the minimum vertex cover of the graph. In another very recent work, Majumdar et al. [121] give a PSAKS for each of the following parameters, each of which is always no larger than the solution size: the deletion distance of the input graph to the class of cographs, the class of bounded treewidth graphs, and the class of all chordal graphs. Hence, these results may be viewed as a generalization of the aforementioned PSAKS from [18].
Connected Dominating Set. Similar to CONNECTED VERTEX COVER, the CONNECTED DOMINATING SET problem is the variant of DOMINATING SET for which the solution additionally needs to induce a connected subgraph of the input graph. When placing no restriction on the input graph, the problem is as hard to approximate as DOMINATING SET. However, for some special classes of graphs, PSAKS or bi-PSAKS are known; these include graphs with bounded expansion, nowhere dense graphs, and d-biclique-free graphs [122].

Covering Problems parameterized by Graph Width Parameters.
Several works in the literature also study the approximability of variants of VERTEX COVER and DOMINATING SET parameterized by graph widths [95,123]. These variants include:
• POWER VERTEX COVER (PVC). Here, along with the input graph, each edge has an integer demand and we have to assign (power) values to vertices, such that each edge has at least one endpoint with a value at least its demand. The goal is to minimize the total assigned power. Note that this generalizes VERTEX COVER, where edges have unit demands.
• CAPACITATED VERTEX COVER (CVC). The problem is similar to VERTEX COVER, except that each vertex has a capacity which limits the number of edges that it can cover. Once again, VERTEX COVER is a special case of CVC where each vertex's capacity is ∞.
• CAPACITATED DOMINATING SET (CDS). Analogous to CVC, this is a generalization of DOMINATING SET where each vertex has a capacity and it can only cover/dominate at most that many other vertices.
All problems above are FPT under the standard parameter (i.e., the optimum) [123,124]. However, when parameterizing by the treewidth, all three problems become W[1]-hard [123]. (This is in contrast to VERTEX COVER and DOMINATING SET, both of which admit straightforward dynamic programming FPT algorithms parameterized by treewidth.) Despite this, good FPT approximation algorithms are known for these problems. In particular, a PAS is known for PVC [123]. For CVC and CDS, a bicriteria PAS exists [95], which in this case computes a solution of size at most the optimum, so that no vertex capacity is violated by more than a factor of 1 + ε. The approximation algorithms for CVC and CDS are results of a more general approach of Lampis [95]. The idea is to execute an "approximate" version of dynamic programming on a tree decomposition instead of the exact version; this helps reduce the running time from n^{O(w)} to (log n/ε)^{O(w)}, which is FPT. The approach is quite flexible: approximation algorithms for several graph problems, including covering problems, can be obtained via this method, and it also applies to cliquewidth. Please refer to [95] for more details.
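As a small illustration of the POWER VERTEX COVER definition, the Python sketch below checks whether a power assignment satisfies all edge demands and computes its cost. The encoding (edges as (u, v, demand) triples, powers as a dictionary with missing vertices treated as power 0) is an assumption made here.

```python
def is_valid_power_assignment(edges_with_demand, power):
    """Every edge needs at least one endpoint whose power meets its demand."""
    return all(max(power.get(u, 0), power.get(v, 0)) >= demand
               for u, v, demand in edges_with_demand)

def total_power(power):
    """Objective value of PVC: the sum of all assigned powers."""
    return sum(power.values())
```

With all demands equal to 1 and powers restricted to {0, 1}, a valid assignment is exactly a vertex cover and its total power is the cover size.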
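One ingredient behind such savings can be illustrated in isolation: if a dynamic programming table stores integer values up to n, rounding them to powers of (1 + δ) leaves only about log n / δ distinct values per table coordinate. The Python sketch below shows this counting effect only; it is not the algorithm of [95], and the function names and the choice of δ are assumptions for illustration.

```python
import math

def rounded_exponent(value, delta):
    """Exponent j such that (1 + delta)**j is `value` rounded down to a power
    of (1 + delta). Assumes value >= 1 and delta > 0."""
    return math.floor(math.log(value) / math.log(1 + delta))

def distinct_rounded_values(n, delta):
    """Number of distinct table entries needed for the integers 1..n after rounding."""
    return len({rounded_exponent(v, delta) for v in range(1, n + 1)})
```

Each rounded value underestimates the true one by a factor below 1 + δ, so an analysis has to control how these errors accumulate along the decomposition; that part is where the actual technical work lies and is not attempted here.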
Packing-Covering Duality and Erdős-Pósa Property. Given a set system (V, C) where V is the universe and C = {C_1, . . . , C_m} is a collection of subsets of V, HITTING SET is the problem of computing the smallest S ⊆ V that intersects every C_i, and SET PACKING is the problem of computing the largest subcollection C′ ⊆ C such that no two sets in C′ intersect. It can also be observed that the optimal value for HITTING SET is at least the optimal value for SET PACKING, while the standard LP relaxations for them (covering LP and packing LP) have the same optimal value by strong duality. Studying the other direction of the inequality (often called the packing-covering duality) for natural families of set systems has been a central theme in combinatorial optimization. The gap between the covering optimum and packing optimum is large in general (e.g., DOMINATING SET/INDEPENDENT SET), but can be small for some families of set systems (e.g., s-t CUT/s-t DISJOINT PATHS and VERTEX COVER/MATCHING especially in bipartite graphs).
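For the pair VERTEX COVER/MATCHING, both directions of the inequality can be seen in a few lines. The Python sketch below uses the classical maximal-matching argument (a well-known textbook fact, not a result of the cited works): any matching is a packing of edges, so its size lower-bounds every vertex cover, while the endpoints of a maximal matching form a cover of exactly twice that size. The function names are chosen here for illustration.

```python
def maximal_matching(edges):
    """Greedily collect pairwise disjoint edges (a packing in the edge set system)."""
    matched, matching = set(), []
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

def cover_from_matching(matching):
    """All endpoints of the matching; a vertex cover if the matching is maximal."""
    return {x for edge in matching for x in edge}
```

Hence, for graphs, the packing optimum and the covering optimum differ by a factor of at most 2.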
One notion that has been important for both parameterized and approximation algorithms is the Erdős-Pósa property [125]. A family of set systems is said to have the Erdős-Pósa property when there is a function f : N → N such that for any set system in the family, if the packing optimum is k, the covering optimum is at most f (k). This immediately implies that the multiplicative gap between these two optima is at most f (k)/k, and constructive proofs for the property for various set systems have led to ( f (k)/k)-approximation algorithms. Furthermore, for some problems including CYCLE PACKING, the Erdős-Pósa property gives an immediate parameterized algorithm. We refer the reader to a recent survey [126] and papers [108,127,128].
The original paper of Erdős and Pósa [125] proved the property for set systems (V, C) when there is an underlying graph G = (V, E) and C is the set of cycles, which corresponds to the pair CYCLE PACKING/FEEDBACK VERTEX SET; every graph either has at least k vertex-disjoint cycles or there is a feedback vertex set of size at most O(k log k). Many subsequent papers also studied natural set systems arising from graphs where V is the set of vertices or edges and C denotes a collection of subgraphs of interest. For those set systems, Erdős-Pósa Properties are closely related to Set Packing introduced in Section 4.1.3 and F -DELETION problems introduced in Section 4.6.

Maximization Variants
We now move on to the maximization variants of covering problems. To our knowledge, these covering problems are much less studied in the context of parameterized approximability compared to their minimization counterparts. In particular, we are only aware of works on the maximization variants of SET COVER and VERTEX COVER, which are typically called MAX k-COVERAGE and MAX k-VERTEX COVER respectively.
Max k-Coverage. Recall that here we are given a set system and the goal is to select k subsets whose union is maximized. It is well known that the simple greedy algorithm yields an e/(e − 1)-approximation [129]. Furthermore, Feige showed in his seminal work [28] that this is tight: (e/(e − 1) − ε)-approximation is NP-hard for any constant ε > 0. In fact, it has recently been shown that this inapproximability also applies to the parameterized setting. Specifically, under Gap-ETH, an (e/(e − 1) − ε)-approximation cannot be achieved in FPT time [130] or even in f(k) · n^{o(k)} time [63]. In other words, the trivial algorithm is tight in terms of running time, the greedy algorithm is tight in terms of approximation ratio, and there is essentially no trade-off possible between these two extremes. We remark here that this hardness of approximation is also the basis of hardness for k-MEDIAN and k-MEANS [130] (see Section 4.3).
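The greedy algorithm referred to above is the standard one: in each of k rounds, pick the set that covers the most elements not yet covered. A minimal Python sketch follows; the function name and the input encoding (a list of Python sets) are assumptions for illustration.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets, each time maximizing the number of newly covered elements.

    sets: list of Python sets. Returns (chosen indices, covered elements).
    """
    covered, chosen = set(), []
    for _ in range(min(k, len(sets))):
        candidates = [i for i in range(len(sets)) if i not in chosen]
        best = max(candidates, key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered
```

On the instance [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {7}] with k = 2, greedy first takes the set of size four and ends with five covered elements, whereas the second and third sets together cover six; this shows that greedy can be strictly suboptimal while staying within the stated ratio.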
Due to the strong inapproximability result for the general case of MAX k-COVERAGE, different parameters have to be considered in order to obtain a PAS for MAX k-COVERAGE. An interesting positive result here is when the parameters are k and the VC dimension of the set system, for which a PAS exists while the exact version of the problem is W[1]-hard [131].
Max k-Vertex Cover. Another special case of MAX k-COVERAGE is the restriction when each element belongs to at most d subsets in the system. This corresponds exactly to the maximization variant of the VERTEX COVER problem on d-uniform hypergraphs, which we will refer to as MAX k-VERTEX COVER. Note here that, for such set systems, their VC-dimensions are also bounded by log d + 1 and hence the aforementioned PAS of [131] applies here as well. Nonetheless, MAX k-VERTEX COVER admits a much simpler PAS (and even PSAKS) compared to MAX k-COVERAGE parameterized by k and VC-dimension, as we will discuss more below.
MAX k-VERTEX COVER was first studied in the context of parameterized complexity by Guo et al. [132], who showed that the problem is W[1]-hard. Marx, in his survey on parameterized approximation algorithms [8], gave a PAS for the problem with running time 2^{O(k^3/ε)}. Later, Lokshtanov et al. [18] showed that Marx's approach can be used to give a PSAKS of size O(k^5/ε^2). Both of these results mainly focus on graphs. Later, Skowron and Faliszewski [133] gave a more general argument that both works for any d-uniform hypergraph and improves the running time and kernel size:
Theorem 20 ([133]). For the MAX k-VERTEX COVER problem in d-uniform hypergraphs, a (1 + ε)-approximation can be computed in O*((d/ε)^k) time for any ε > 0. Moreover, a (1 + ε)-approximate kernel with O(dk/ε) vertices can be computed in polynomial time.
The main idea of the above proof is simple and elegant, and hence we include it here. For convenience, we will only discuss the graph case, i.e., d = 2. It suffices to just give the O(k/ε)-vertex kernel; the PAS immediately follows by running the brute force algorithm on the output instance from the kernel. The kernel is as simple as it gets: just keep the 2k/ε vertices with highest degrees and throw the remaining vertices away! Note that there is a subtle point here, which is that we do not want to throw away the edges linking the kept vertices to the remaining vertices. If self-loops are allowed in a graph, this is not an issue since we may just add a self-loop to each vertex for each edge adjacent to it with the other endpoint being discarded. When self-loops are not allowed, it is still possible to overcome this issue but with a slightly larger kernel; we refer the reader to Section 3.2 of [134] for more details.
Having defined the kernel, let us briefly discuss the intuition as to why it works. Let V_{2k/ε} denote the set of 2k/ε highest-degree vertices. The main argument of the proof is that, if there is an optimal solution S, then we may modify it to be entirely contained in V_{2k/ε} while preserving the number of covered edges to within a (1 + ε) factor. The modification is simple: every vertex of S that is outside of V_{2k/ε} is replaced with a random vertex from V_{2k/ε}. Notice here that we always replace a vertex with a higher-degree vertex. Naturally, this should be good in terms of covering more edges, but there is a subtle point here: it is possible that the high degree vertices are "double counted" if a particular edge is covered by both endpoints. The size 2k/ε is selected exactly to combat this issue; since the set is large enough, "double counting" is rare for random vertices. This finishes our outline of the intuition.
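The kernel-plus-brute-force scheme just outlined can be sketched in Python for the graph case. This is a minimal sketch under stated assumptions: the input is a simple graph given as an edge list, and coverage of a candidate set is counted against the original edge list, which plays the role of the self-loops that remember edges to discarded vertices. The function name is hypothetical, and the (1 + ε) guarantee rests on the analysis in the text, not on this code.

```python
import math
from itertools import combinations

def max_k_vertex_cover_sketch(edges, k, eps):
    """Keep the ceil(2k/eps) highest-degree vertices, then brute-force over
    k-subsets of the kept vertices. Returns (chosen set, covered edge count)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    keep = math.ceil(2 * k / eps)
    kept = sorted(degree, key=degree.get, reverse=True)[:keep]
    best_set, best_cov = (), -1
    for cand in combinations(kept, min(k, len(kept))):
        chosen = set(cand)
        # Count edges with at least one endpoint in the candidate set.
        cov = sum(1 for u, v in edges if u in chosen or v in chosen)
        if cov > best_cov:
            best_set, best_cov = cand, cov
    return set(best_set), best_cov
```

The brute-force step enumerates subsets of a set whose size depends only on k and ε, which is what makes the overall running time FPT.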
We end by remarking that MAX k-VERTEX COVER on graphs is already APX-hard [135], and hence the PASes mentioned above once again demonstrate additional power of FPT approximation algorithms over polynomial-time approximation algorithms.

Other Related Problems
There are several other covering-related problems that do not fall into the two categories we discussed so far. We discuss a couple such problems below.
Min k-Uncovered. The first is the MIN k-UNCOVERED problem, where the input is a set system and we would like to select k sets as to minimize the number of uncovered elements. When we are concerned with exact solutions, this is of course the SET COVER. However, the optimization version becomes quite different from MAX k-COVERAGE. In particular, since it is hard to determine whether we can find k subsets that cover the whole universe, the problem is not approximable at all in the general case. However, if restrict ourselves to graphs and hypergraphs (for which we refer to the problem as MIN k-VERTEX COVER), it is possible to get a (randomized) PAS for the problem [133]: Theorem 21 ([133]). For the MIN k-VERTEX UNCOVERED problem in d-uniform hypergraphs, a (1 + ε)-approximation can be computed in O * (d/ε) k time for any ε > 0.
The algorithm is based on the following simple randomized branching: pick a random uncovered element and branch on all possibilities of selecting a subset that contains it. Notice that since an element belongs to only d subsets, the branching factor is at most d. The key intuition in the approximation proof is that, when the number of elements we have covered so far is still much less than that in the optimal solution, there is a relatively large probability (i.e., ε) that the random element is covered in the optimal solution. If we always pick such a "good" element in most branching steps, then we would end up with a solution close to the optimum. Skowron and Faliszewski [133] formalizes this intuition by showing that the algorithm outputs an (1 + ε)-approximate solution with probability roughly ε k . Hence, by repeating the algorithm (1/ε) k time, one arrives at the claimed PAS. To the best of our knowledge, it is unknown whether a PSAKS exists for the problem.
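The branching scheme described above can be sketched as follows for (hyper)graphs, where elements are hyperedges and "subsets" are vertices. This Python sketch only mirrors the high-level description in the text (random uncovered edge, branch on each of its at most d vertices, depth k, independent repetitions); it is not the exact procedure or analysis of [133], and all names and parameters are assumptions for illustration.

```python
import random

def _branch(edges, k, chosen, rng):
    """Return (number of uncovered edges, chosen vertex set) for the best leaf."""
    uncovered = [e for e in edges if not (set(e) & chosen)]
    if k == 0 or not uncovered:
        return len(uncovered), set(chosen)
    pivot = rng.choice(uncovered)      # random uncovered element
    best = (len(uncovered) + 1, None)
    for v in pivot:                    # at most d branches
        result = _branch(edges, k - 1, chosen | {v}, rng)
        if result[0] < best[0]:
            best = result
    return best

def min_k_uncovered(edges, k, repetitions, seed=0):
    """Repeat the randomized branching and keep the best outcome."""
    rng = random.Random(seed)
    runs = (_branch(edges, k, set(), rng) for _ in range(repetitions))
    return min(runs, key=lambda r: r[0])
```

When a vertex cover of size k exists, every pivot edge contains one of its vertices, so a single run already reaches a leaf with zero uncovered edges; the repetitions matter only when some edges must stay uncovered.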
Min k-Coverage. Another variant of the SET COVER problem studied is MIN k-COVERAGE, 15 where we would like to select k subsets that minimizes the number of covered elements. We stress here that this problem is not a relaxation of SET COVER but rather is much more closely related to graph expansion problems (see [137]).
It is known that, when there is no restriction on the input set system, the problem is (up to a polynomial factor) as hard to approximate as the DENSEST k-SUBGRAPH problem [136]. Hence, by the inapproximability of the latter discussed earlier in the survey (Theorem 8), we also have that there is no $k^{o(1)}$-approximation algorithm for the problem that runs in FPT time.
Once again, the special case that has been studied in the literature is when the input set system is a graph, in which case we refer to the problem as MIN k-VERTEX COVER. Gupta, Lee, and Li [138,139] used the technique of Marx [8] to give a PAS for the problem with running time $O^*((k/\varepsilon)^{O(k)})$. The running time was later improved in [134] to $O^*((1/\varepsilon)^{O(k)})$. The algorithm there is again based on branching, but the rules are more delicate and we will not discuss them here. An interesting aspect to note here is that, while both MAX k-VERTEX COVER and MIN k-VERTEX COVER have PASes of (asymptotically) the same running time, the former admits a PSAKS whereas the latter does not (assuming a variant of the Small Set Expansion Conjecture) [134].
To the best of our knowledge, MIN k-VERTEX COVER has not been explicitly studied on d-uniform hypergraphs before, but we suspect that the above results should carry over from graphs to hypergraphs as well.

Clustering
Clustering is a representative task in unsupervised machine learning that has been studied in many fields. In combinatorial optimization communities, it is often formulated as follows: given a set P of points and a set F of candidate centers (also known as facilities), and a metric on X ⊇ P ∪ F given by the distance $\rho : X \times X \to \mathbb{R}^+ \cup \{0\}$, choose k centers C ⊆ F to minimize some objective function cost := cost(P, C). Let $\rho(C, p) := \min_{c \in C} \rho(c, p)$. To fully specify the problem, the following choices have to be made.

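• Objective function: Three well-studied objective functions are
  - k-MEDIAN: $\mathrm{cost}(P, C) := \sum_{p \in P} \rho(C, p)$,
  - k-MEANS: $\mathrm{cost}(P, C) := \sum_{p \in P} \rho(C, p)^2$,
  - k-CENTER: $\mathrm{cost}(P, C) := \max_{p \in P} \rho(C, p)$.
• Metric space: The ambient metric space X can be
  - A general metric space explicitly given by the distance $\rho : X \times X \to \mathbb{R}^+ \cup \{0\}$.
  - The Euclidean space $\mathbb{R}^d$ equipped with the $\ell_2$ distance.
  - Other structured metric spaces including metrics with bounded doubling dimension or bounded highway dimension.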
While many previous results on clustering focused on non-parameterized polynomial time, there are at least three natural parameters one can parameterize by: the number of clusters k, the dimension d (if defined), and the approximation accuracy parameter ε. In general metric spaces, parameterized approximation algorithms (mainly with parameter k) were considered very recently, but in Euclidean spaces, many previous results already give parameterized approximation algorithms with parameters k, d, and ε.

General Metric Space
We can assume X = P ∪ F without loss of generality. Let n := |X| and note that the distance $\rho : X \times X \to \mathbb{R}^+ \cup \{0\}$ is explicitly specified by $\Theta(n^2)$ numbers. A simple exact algorithm running in time $O(n^{k+1})$ can be achieved by enumerating all k centers $c_1, \ldots, c_k \in F$ and assigning each point p to the closest center. In this setting, the best approximation ratios achieved by polynomial time algorithms are 2.611 + ε for k-MEDIAN [140], 9 + ε for k-MEANS [141], and 3 for k-CENTER [142]. 16 From the hardness side, it is NP-hard to approximate k-MEDIAN within a factor 1 + 2/e − ε ≈ 1.73 − ε, k-MEANS within a factor 1 + 8/e − ε ≈ 3.94 − ε, and k-CENTER within a factor 3 − ε [143].
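The simple exact algorithm mentioned above can be sketched as follows: enumerate every k-subset of F and assign each point to its closest chosen center. The function names and the `objective` strings are assumptions made for the illustration; `rho` is any user-supplied distance function.

```python
from itertools import combinations

def cost(points, centers, rho, objective="median"):
    """Clustering cost of `centers` for the three standard objectives."""
    d = [min(rho(c, p) for c in centers) for p in points]
    if objective == "median":
        return sum(d)
    if objective == "means":
        return sum(x * x for x in d)
    if objective == "center":
        return max(d)
    raise ValueError("unknown objective: " + objective)

def brute_force_clustering(points, facilities, k, rho, objective="median"):
    """Try all k-subsets of the facilities; returns (optimal cost, centers)."""
    candidates = ((cost(points, C, rho, objective), C)
                  for C in combinations(facilities, k))
    return min(candidates, key=lambda t: t[0])

# Six points on a line forming two obvious clusters.
P = [0, 1, 2, 10, 11, 12]
best_cost, best_centers = brute_force_clustering(P, P, 2, lambda a, b: abs(a - b))
```

There are $\binom{|F|}{k}$ candidate center sets and each is evaluated in $O(nk)$ time, so this is only practical for very small k.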
While there are some gaps between the best algorithms and the best hardness results for k-MEDIAN and k-MEANS, it is an interesting question to ask how parameterization by k changes the approximation ratios for both problems. Cohen-Addad et al. [130] studied this question and gave exact answers.
These results show that if we parameterize by k, 1 + 2/e (for k-MEDIAN) and 1 + 8/e (for k-MEANS) are the exact limits of approximation for parameterized approximation algorithms. Similar reductions also show that no parameterized approximation algorithm can achieve a (3 − ε)-approximation for k-CENTER for any ε > 0 (only assuming W[2] ≠ FPT), so the power of parameterized approximation is exactly revealed for all three objective functions.
Algorithm for k-MEDIAN. We briefly describe ideas for the algorithm for k-MEDIAN in Theorem 22. The main technical tool that the algorithm uses is a coreset, which will also be frequently used for Euclidean spaces in the next subsection.
When S is a set of points with weight function $w : S \to \mathbb{R}^+$, let us extend the definition of the objective function cost(S, C) such that, for k-MEDIAN,
$$\mathrm{cost}(S, C) := \sum_{p \in S} w(p) \cdot \rho(C, p).$$
Given a clustering instance (P, F, ρ, k) and ε > 0, a subset S ⊆ P with weight function $w : S \to \mathbb{R}^+$ is called a (strong) coreset if for any k centers $C = \{c_1, \ldots, c_k\} \subseteq F$,
$$|\mathrm{cost}(S, C) - \mathrm{cost}(P, C)| \le \varepsilon \cdot \mathrm{cost}(P, C).$$
16 A special case that has received significant attention assumes P = F. In this case, the best approximation ratio for k-CENTER becomes 2.
Given the coreset, it remains to give a good parameterized approximation algorithm for the problem on a much smaller (albeit weighted) point set with $|P| = \mathrm{poly}(k, \log n)$. Note that |F| can still be as large as n, so naively choosing k centers from F will take $n^k$ time and exhaustively partitioning P into k sets will take $k^{|P|} = n^{\mathrm{poly}(k)}$ time. (Indeed, exactly solving this small case would give an EPAS, which would contradict the Gap-ETH.) Fix an optimal solution, and let $C^* = \{c^*_1, \ldots, c^*_k\}$ be the optimal centers and $P^*_i \subseteq P$ be the cluster assigned to $c^*_i$. One piece of information we can guess is, for each i ∈ [k], the point $p_i \in P^*_i$ closest to $c^*_i$ and (approximately) the distance $\rho(c^*_i, p_i)$. Since |P| = poly(k log n), guessing them only takes time $(k \log n)^{O(k)}$, which can be made FPT by separately considering the case $(\log n)^k \le n$ and the case $(\log n)^k \ge n$, in which $k = \Omega(\log n / \log \log n)$ and $(\log n)^k = (k \log k)^{O(k)}$.
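The case distinction at the end of this paragraph can be spelled out as a short worked calculation. This is a sketch of the standard argument, with constants not optimized:

```latex
% Case 1: (\log n)^k \le n.
(k \log n)^{O(k)} = k^{O(k)} \cdot \big((\log n)^{k}\big)^{O(1)} \le k^{O(k)} \cdot n^{O(1)}.

% Case 2: (\log n)^k > n. Taking logarithms gives k \log\log n > \log n.
% Writing x = \log n, the inequality x < k \log x implies x = O(k \log k), hence
(\log n)^{k} = \big(O(k \log k)\big)^{k} = (k \log k)^{O(k)}
\quad\text{and}\quad
(k \log n)^{O(k)} = (k \log k)^{O(k)}.
```

In both cases the guessing step runs in $f(k) \cdot n^{O(1)}$ time for some computable function f.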
Let $r_i$ denote the guessed (approximate) value of $\rho(c^*_i, p_i)$, and let $F_i \subseteq F$ be the set of candidate centers that are at distance approximately $r_i$ from $p_i$, so that $c^*_i \in F_i$ for each i. The algorithm chooses k centers $C = \{c_1, \ldots, c_k\}$ with $c_i \in F_i$ for each i. For any point p ∈ P (say $p \in P^*_j$, though the algorithm doesn't need to know j), the triangle inequality gives
$$\rho(C, p) \le \rho(c_j, p) \le \rho(c_j, p_j) + \rho(p_j, c^*_j) + \rho(c^*_j, p) \lesssim 3\,\rho(c^*_j, p),$$
since $\rho(c_j, p_j) \approx r_j \approx \rho(p_j, c^*_j)$ and $\rho(p_j, c^*_j) \le \rho(c^*_j, p)$ by the choice of $p_j$. This immediately gives a 3-approximation algorithm in FPT time, which is worse than the best polynomial time approximation algorithm. To get the optimal (1 + 2/e)-approximation algorithm, we further reduce the job of finding $c_i \in F_i$ to maximizing a monotone submodular function with a partition matroid constraint, which is known to admit an optimal (1 − 1/e)-approximation algorithm [146]. Then we can ensure that for a (1 − 1/e) fraction of points, the distance to the chosen centers is shorter than in the optimal solution, and for the remaining 1/e fraction, the distance is at most three times the distance in the optimal solution. We refer the reader to [130] for further details.
Constructing a coreset. As discussed above, a coreset is a fundamental building block for optimal parameterized approximation algorithms for k-MEDIAN and k-MEANS for general metrics. We briefly describe the construction of Chen [144] that gives a coreset of cardinality $O(k^2 \log^2 n / \varepsilon^2)$ for k-MEDIAN. Similar ideas can also be used to obtain an EPAS for Euclidean spaces parameterized by k, though better specific constructions are known in Euclidean spaces.
We first partition P into $P_1, \ldots, P_\ell$ such that $\ell = O(k \log n)$ and
$$\sum_{i=1}^{\ell} |P_i| \cdot \mathrm{diam}(P_i) = O(\mathrm{OPT}),$$
where OPT denotes the optimum k-MEDIAN cost. Such a partition can be obtained by using a known (bicriteria) constant factor approximation algorithm for k-MEDIAN. Next, let t = O(k log n), and for each i = 1, . . . , ℓ, we let $S_i = \{s_1, \ldots, s_t\}$ be a random subset of t points of $P_i$ where each $s_j$ is an independent and uniform sample from $P_i$ and is given weight $|P_i|/t$. (If $|P_i| \le t$, we simply let $S_i = P_i$ with weights 1.) The final coreset S is the union of all $S_i$'s.
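To prove that it works, we simply need to show that for any set of k centers C ⊆ F with |C| = k,
$$\Pr\big[\,|\mathrm{cost}(S, C) - \mathrm{cost}(P, C)| > \varepsilon \cdot \mathrm{cost}(P, C)\,\big] = o(n^{-k}),$$
so that the union bound over $\binom{n}{k}$ choices of C works. Indeed, we show that for each i = 1, . . . , ℓ,
$$\Pr\big[\,|\mathrm{cost}(S_i, C) - \mathrm{cost}(P_i, C)| > \varepsilon \cdot |P_i| \cdot \mathrm{diam}(P_i)\,\big] = o(n^{-k}/\ell), \qquad (1)$$
so that we can also union bound and sum over i ∈ [ℓ], using the fact that $\sum_{i=1}^{\ell} |P_i| \cdot \mathrm{diam}(P_i) = O(\mathrm{OPT}) \le O(\mathrm{cost}(P, C))$ and rescaling ε by a constant factor.
It is left to prove (1). Fix C and i (let $P_i = \{p_1, \ldots, p_{|P_i|}\}$), and recall that $\mathrm{cost}(P_i, C) = \sum_{j=1}^{|P_i|} \rho(C, p_j)$. When $|P_i| \le t$, $S_i = P_i$, so (1) holds. Otherwise, recall that $S_i = \{s_1, \ldots, s_t\}$ where each $s_j$ is an independent and uniform sample from $P_i$ with weight $w := |P_i|/t$. For j = 1, . . . , t, let $X_j := w \cdot \rho(C, s_j)$. Note that $\mathrm{cost}(S_i, C) = \sum_j X_j$ and $\mathrm{cost}(P_i, C) = \mathbb{E}[\sum_j X_j]$. If we let $X_{\min} := \min_{j \in [|P_i|]} (w \rho(C, p_j))$ and $Y_j := (X_j - X_{\min})/(w \cdot \mathrm{diam}(P_i))$, the $Y_j$'s are t i.i.d. random variables that are supported in [0, 1]. The standard Chernoff-Hoeffding inequality gives
$$\Pr\Big[\,\Big|\sum_{j} Y_j - \mathbb{E}\Big[\sum_{j} Y_j\Big]\Big| > \varepsilon t\,\Big] \le 2\exp(-2\varepsilon^2 t),$$
and since $\sum_j X_j - \mathbb{E}[\sum_j X_j] = w \cdot \mathrm{diam}(P_i) \cdot (\sum_j Y_j - \mathbb{E}[\sum_j Y_j])$ with $w t = |P_i|$, this proves (1) for t = O(k log n) (hiding the dependence on ε) and finishes the proof. A precise version of this argument was stated in Haussler [147].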
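The sampling step of the construction can be sketched as follows. The partition $P_1, \ldots, P_\ell$ is assumed to be given (in the construction it comes from a bicriteria approximation, which is not implemented here), and the function names are hypothetical.

```python
import random

def sample_coreset(parts, t, seed=0):
    """Sample t weighted points from each part; small parts are kept whole.

    parts: list of lists of points (the assumed partition P_1, ..., P_l).
    Returns a list of (point, weight) pairs.
    """
    rng = random.Random(seed)
    coreset = []
    for part in parts:
        if len(part) <= t:
            coreset.extend((p, 1.0) for p in part)        # S_i = P_i, weights 1
        else:
            w = len(part) / t                             # weight |P_i| / t
            coreset.extend((rng.choice(part), w) for _ in range(t))
    return coreset

def weighted_cost(coreset, centers, rho):
    """Weighted k-MEDIAN cost of `centers` on the coreset."""
    return sum(w * min(rho(c, p) for c in centers) for p, w in coreset)

# Two parts: one large (sampled) and one small (kept entirely).
S = sample_coreset([list(range(100)), [1000, 1001]], t=10)
```

By construction the total weight of the samples from each part equals the size of that part, which is what makes the weighted cost an unbiased estimate of the true cost.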

Euclidean Space
For Euclidean spaces, we assume that $X = F = \mathbb{R}^d$ for some d ∈ N, endowed with the standard $\ell_2$ metric. Let n = |P| in this subsection. Now we have k and d as natural structural parameters of clustering tasks. Many previous approximation algorithms for EUCLIDEAN k-MEDIAN and EUCLIDEAN k-MEANS, without explicit mention of parameterized complexity, are parameterized approximation algorithms parameterized by k or d (or both). The highlight of this subsection is that for both EUCLIDEAN k-MEDIAN and EUCLIDEAN k-MEANS, an EPAS exists with only one of k and d as a parameter. Without any parameterization, both EUCLIDEAN k-MEDIAN and EUCLIDEAN k-MEANS are known to be APX-hard [148,149]. We introduce these results in chronological order, highlighting important ideas.
EUCLIDEAN k-MEDIAN with parameter d. The first PTAS for EUCLIDEAN k-MEDIAN with fixed d appears in Arora et al. [150]. The techniques extend Arora's previous PTAS for the EUCLIDEAN TRAVELING SALESMAN problem [151], first proving that there exists a near-optimal solution that interacts with a quadtree (a geometric division of $\mathbb{R}^d$ into a hierarchy of square regions) in a restricted sense, and then finding such a solution using dynamic programming. The running time is $n^{O(1/\varepsilon)}$ for d = 2 and $n^{(\log n/\varepsilon)^{d-2}}$ for d > 2. Kolliopoulos and Rao [152] improved the running time to $2^{O((\log(1/\varepsilon)/\varepsilon)^{d-1})} n \log^{d+6} n$, which is an EPAS with parameter d.

EUCLIDEAN k-MEDIAN and EUCLIDEAN k-MEANS with parameter k.
An EPAS for EUCLIDEAN k-MEANS even with both k and d as parameters took longer to be discovered, and first appeared when Matoušek [153] gave an approximation scheme that runs in time $O(n \varepsilon^{-2k^2 d} \log^k n)$. After this, several improvements on the running time followed: Bădoiu et al. [154], De La Vega et al. [155], Har-Peled and Mazumdar [156].
A crucial property of the Euclidean space that allows an EPAS with parameter k (which is ruled out for general metrics by Theorem 22) is the sampling property, which says that for any set $Q \subseteq \mathbb{R}^d$ as one cluster, there is an algorithm that is given only g(1/ε) samples from Q and outputs h(1/ε) candidate centers such that one of them is ε-close to the optimal center for the entire cluster Q for some functions g, h. (For example, for k-MEANS, the mean of O(1/ε) random samples ε-approximates the actual mean with constant probability.) This idea leads to a (1 + ε)-approximation algorithm running in time $|P|^{f(\varepsilon,k)}$. Combined even with a general coreset construction of size poly(k, log n, 1/ε), one already gets an EPAS with parameter k. Better coreset constructions are also given in Euclidean spaces. Recent developments [160][161][162] construct coresets of size poly(k, 1/ε) (no dependence on n or d), which is further extended to the shortest-path metric of an excluded-minor graph [163].
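The k-MEANS example of the sampling property can be illustrated for a single cluster as follows. The helper names are hypothetical and the sample size is chosen arbitrarily. For m uniform samples drawn with replacement, a direct calculation shows that the expected cost of the sampled mean is (1 + 1/m) times the optimal cost, which follows from the identity checked in the last lines.

```python
import random

def mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n, d = len(points), len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(d))

def sq_cost(points, c):
    """1-means cost: sum of squared Euclidean distances to center c."""
    return sum(sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for p in points)

def sampled_center(points, m, rng):
    """Mean of m points sampled uniformly with replacement."""
    return mean([rng.choice(points) for _ in range(m)])

rng = random.Random(1)
Q = [(rng.gauss(0, 1), rng.gauss(5, 2)) for _ in range(200)]
mu = mean(Q)
c = sampled_center(Q, 20, rng)

# Identity: cost(Q, c) = cost(Q, mu) + |Q| * ||c - mu||^2, so mu is optimal.
shift = sum((a - b) ** 2 for a, b in zip(c, mu))
ratio = sq_cost(Q, c) / sq_cost(Q, mu)
```

The value `ratio` is always at least 1 and measures how much is lost by using the sampled center instead of the true mean.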

EUCLIDEAN k-MEANS with parameter d.
Cohen-Addad et al. [164] and Friggstad et al. [165] recently gave approximation schemes running in time $n^{f(d,\varepsilon)}$ using local search techniques. These results were improved to an EPAS in [166], and also extended to doubling metrics [167].
Other metrics and k-CENTER. For the k-CENTER problem an EPAS exists when parametrizing by both k and the doubling dimension [168], and also for planar graphs there is an EPAS for parameter k, which is implied by the EPTAS of Fox-Epstein et al. [169] (cf. [168]).
There are also parameterized approximation schemes for metric spaces with bounded highway dimension [168,170,171] and various graph width parameters [98].

Capacitated clustering and other variants.
Another example where the parameterization by k helps is CAPACITATED k-MEDIAN, where each possible center c ∈ F has a capacity $u_c \in \mathbb{N}$ and can be assigned at most $u_c$ points. It is not known whether there exists a constant-factor approximation algorithm in polynomial time, and known constant factor approximation algorithms either open (1 + ε)k centers [172] or violate capacity constraints by a (1 + ε) factor [173]. Adamczyk et al. [174] gave a (7 + ε)-approximation algorithm in $f(k, \varepsilon) n^{O(1)}$ time, showing that a constant factor parameterized approximation algorithm is possible. The approximation ratio was soon improved to (3 + ε) [175]. For CAPACITATED EUCLIDEAN k-MEANS, [176] also gave a (69 + ε)-approximation algorithm in $f(k, \varepsilon) n^{O(1)}$ time.
While the capacitated versions of clustering look much harder than their uncapacitated counterparts, there is no known theoretical separation between the capacitated version and the uncapacitated version in any clustering task. Since the power of parameterized algorithms for uncapacitated clustering is well understood, it is natural to study the "capacitated vs. uncapacitated" question in the FPT setting.
Open Question 10. Does CAPACITATED k-MEDIAN admit a (1 + 2/e)-approximation algorithm in FPT time with parameter k? Do CAPACITATED EUCLIDEAN k-MEANS/k-MEDIAN admit an EPAS with parameter k or d?
Since clustering is a universal task, many variants beyond the capacitated versions have been studied, including k-MEDIAN/k-MEANS WITH OUTLIERS [177] and MATROID/KNAPSACK MEDIAN [178]. While no variant is proved to be harder than the basic versions, it would be interesting to see whether they all have the same parameterized approximability as the basic versions.

Network design
In network design, the task is to connect some set of vertices in a metric, which is often given by the shortest-path metric of an edge-weighted graph. Two very prominent problems of this type are the TRAVELLING SALESPERSON (TSP) and STEINER TREE problems. For TSP all vertices need to be connected in a closed walk (called a route), and the length of the route needs to be minimized. 17 For STEINER TREE a subset of the vertices (called terminals) is given as part of the input, and the objective is to connect all terminals by a tree of minimum weight in the metric (or graph). Both of these are fundamental problems that have been widely studied in the past, both on undirected and directed input graphs.
Undirected graphs. A well-studied parameter for STEINER TREE is the number of terminals, for which the problem has been known to be FPT since the early 1970s due to the work of Dreyfus and Wagner [179]. Their algorithm is based on dynamic programming and runs in $3^k n^{O(1)}$ time if k is the number of terminals. Faster algorithms based on the same ideas with runtime $(2 + \delta)^k n^{O(1)}$ for any constant δ > 0 exist [180] (here the degree of the polynomial depends on δ). The unweighted STEINER TREE problem also admits a $2^k n^{O(1)}$ time algorithm [181] using a different technique based on subset convolution. Given any of these exact algorithms as a subroutine, a faster PAS can also be found [20] (cf. Section 4.7). On the other hand, no exact polynomial-sized kernel exists [119] for the STEINER TREE problem, unless NP ⊆ coNP/poly. Interestingly though, a PSAKS can be obtained [18].
This kernel is based on a well-known fact proved by Borchers and Du [182], which is very useful to obtain approximation algorithms for the STEINER TREE problem. It states that any Steiner tree can be covered by smaller trees containing few terminals, such that these trees do not overlap much. More formally, a full-component is a subtree of a Steiner tree, for which the leaves coincide with its terminals. For the optimum Steiner tree T and any ε > 0, there exist full-components $C_1, \ldots, C_\ell$ of T such that
1. each full-component $C_i$ contains at most $2^{1/\varepsilon}$ terminals,
2. the sum of the weights of the full-components is at most 1 + ε times the cost of T, and
3. taking any collection of Steiner trees $T_1, \ldots, T_\ell$, such that each tree $T_i$ connects the subset of terminals that forms the leaves of full-component $C_i$, the union $\bigcup_{i=1}^{\ell} T_i$ is a feasible solution to the input instance.
Not knowing the optimum Steiner tree, it is not possible to know the subsets of terminals of the full-components corresponding to the optimum. However, it is possible to compute the optimum Steiner tree for every subset of terminals of size at most $2^{1/\varepsilon}$ using an FPT algorithm for STEINER TREE. The time to compute all these solutions is $k^{O(2^{1/\varepsilon})} n^{O(1)}$, using for instance the Dreyfus and Wagner [179] algorithm. Now the above three properties guarantee that the graph given by the union of all the computed Steiner trees contains a (1 + ε)-approximation for the input instance. In fact, the best polynomial time approximation algorithm known to date [183] uses an iterative rounding procedure to find a ln(4)-approximation of the optimum solution in the union of these Steiner trees. To obtain a kernel, the union needs to be sparsified, since it may contain many Steiner vertices and also the edge weights might be very large. However, Lokshtanov et al. [18] show that the number of Steiner vertices can be reduced using standard techniques, while the edge weights can be encoded so that their space requirement is bounded in the parameter and the cost of any solution is distorted by at most a 1 + ε factor.
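The Dreyfus and Wagner dynamic program used as a subroutine above can be sketched as follows. This is a minimal textbook-style version that only returns the optimum cost. It assumes vertices are numbered 0 to n − 1, at least one terminal, and a connected graph; the function names are chosen for the example.

```python
INF = float("inf")

def metric_closure(n, edges):
    """All-pairs shortest paths (Floyd-Warshall) of an undirected graph."""
    d = [[INF] * n for _ in range(n)]
    for v in range(n):
        d[v][v] = 0
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for m in range(n):
        for u in range(n):
            for v in range(n):
                if d[u][m] + d[m][v] < d[u][v]:
                    d[u][v] = d[u][m] + d[m][v]
    return d

def steiner_tree_cost(n, edges, terminals):
    """dp[mask][v]: cheapest tree connecting the terminals in mask and v."""
    d = metric_closure(n, edges)
    k = len(terminals)
    full = (1 << k) - 1
    dp = [[INF] * n for _ in range(full + 1)]
    for i, t in enumerate(terminals):
        dp[1 << i] = d[t][:]
    for mask in range(1, full + 1):
        if mask & (mask - 1) == 0:
            continue                      # single terminals are base cases
        merged = [INF] * n
        sub = (mask - 1) & mask
        while sub:                        # split mask into two subtrees at v
            for v in range(n):
                c = dp[sub][v] + dp[mask ^ sub][v]
                if c < merged[v]:
                    merged[v] = c
            sub = (sub - 1) & mask
        for v in range(n):                # attach v by a shortest path
            dp[mask][v] = min(merged[u] + d[u][v] for u in range(n))
    return min(dp[full])

# Star with center 0 and unit edges, plus a triangle of weight-2 edges.
g = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (1, 2, 2), (2, 3, 2), (1, 3, 2)]
opt = steiner_tree_cost(4, g, [1, 2, 3])
```

Enumerating all submasks of all masks takes $3^k$ steps per vertex, which matches the $3^k n^{O(1)}$ bound quoted above.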
A natural alternative to the number of terminals is to consider the vertices remaining in the optimum tree after removing the terminals: a folklore result states that STEINER TREE is W[2]-hard parameterized by the number of non-terminals (called Steiner vertices) in the optimum solution. At the same time, unless P=NP there is no PTAS for the problem, as it is APX-hard [184]. However an approximation scheme is obtainable when parametrizing by the number of Steiner vertices k in the optimum, and also a PSAKS is obtainable under this parameterization.
To obtain both of these results, Dvořák et al. [185] devise a reduction rule that is based on the following observation: if the optimum tree contains few Steiner vertices but many terminals, then the tree must contain (1) a large component containing only terminals, or (2) a Steiner vertex that has many terminal neighbours. Intuitively, in case (2) we would like to identify a large star with terminal leaves and small cost in the current graph, while in case (1) we would like to find a cheap edge between two terminals. Note that such a single edge also is a star with terminal leaves. The reduction rule will therefore find the star with minimum weight per contained terminal, which can be done in polynomial time. This rule is applied until the number of terminals, which decreases after each use, falls below a threshold depending on the input parameter k and the desired approximation ratio 1 + ε. Once the number of terminals is bounded by a function of k and ε, the Dreyfus and Wagner [179] algorithm can be applied on the remaining instance, or a kernel can be computed using the PSAKS of Theorem 23. It can be shown that the reduction rule does not distort the optimum solution by much as long as the threshold is large enough, which implies the following theorem. This theorem is also generalizable to the STEINER FOREST problem, where a list of terminal pairs is given and the task is to find a minimum weight forest in the input graph connecting each pair. In this case though, the parameter has to be combined with the number of connected components of the optimum forest [185].
A variation of the STEINER FOREST problem is the SHALLOW-LIGHT STEINER NETWORK (SLSN) problem. Here a graph with both edge costs and edge lengths is given, together with a set of terminal pairs and a length threshold L. The task is to compute a minimum cost subgraph, which connects each terminal pair with a path of length at most L. For this problem a dichotomy result was shown [186] in terms of the pattern given by the terminal pairs. More precisely, the terminal pairs are interpreted as edges in a graph for which the vertices are the terminals: if $\mathcal{C}$ is some class of graphs, then SLSN$_{\mathcal{C}}$ is the SHALLOW-LIGHT STEINER NETWORK problem restricted to sets of terminal pairs that span some graph in $\mathcal{C}$. Let $\mathcal{C}^*$ denote the class of all stars, and $\mathcal{C}_\lambda$ the class of graphs with at most λ edges. The SLSN$_{\mathcal{C}^*}$ problem is APX-hard [184], as it is a generalization of STEINER TREE (where L = ∞). At the same time, both the SLSN$_{\mathcal{C}^*}$ and SLSN$_{\mathcal{C}_\lambda}$ problems parameterized by the number of terminals are paraNP-hard [187], since they are generalizations of the RESTRICTED SHORTEST PATH problem (where there is exactly one terminal pair). A PAS can however be obtained for both of these problems (whenever λ is a constant), but for no other class $\mathcal{C}$ of demand patterns [186].
Theorem 25 ([186]). For any constant λ > 0, there is an FPTAS for the SLSN$_{\mathcal{C}_\lambda}$ problem. For the SLSN$_{\mathcal{C}^*}$ problem a (1 + ε)-approximation can be computed in $4^k (n/\varepsilon)^{O(1)}$ time for any ε > 0, where k is the number of terminal pairs. Moreover, under Gap-ETH no (5/3 − ε)-approximation for SLSN$_{\mathcal{C}}$ can be computed in $f(k) n^{O(1)}$ time for any ε > 0 and computable function f, whenever $\mathcal{C}$ is a recursively enumerable class for which $\mathcal{C} \not\subseteq \mathcal{C}^* \cup \mathcal{C}_\lambda$ for every constant λ.
A notable special case is when all edge lengths are 1 but edge costs are arbitrary. Then SLSN$_{\mathcal{C}_\lambda}$ is polynomial time solvable for any constant λ, while SLSN$_{\mathcal{C}^*}$ is FPT parameterized by the number of terminals [186]. At the same time the parameterized approximation lower bound of Theorem 25 is still valid for this case. It is not known however, whether constant approximation factors can be obtained for SLSN$_{\mathcal{C}}$ when $\mathcal{C}$ is a class different from $\mathcal{C}_\lambda$ and $\mathcal{C}^*$. More generally we may ask the following question.
Open Question 11. Given some class of graphs $\mathcal{C} \not\subseteq \mathcal{C}^* \cup \mathcal{C}_\lambda$, which approximation factor $\alpha_{\mathcal{C}}$ can be obtained in FPT time for SLSN$_{\mathcal{C}}$ parameterized by the number of terminals?
Turning to the TSP problem, a generalization of TSP introduces deadlines until which vertices need to be visited by the computed tour. A natural parameterization in this setting is the number of vertices that have deadlines. It can be shown [188] that no approximation better than 2 can be computed when using this parameter. Nevertheless, a 2.5-approximation can be computed in FPT time [188]. The algorithm guesses the order in which the vertices with deadlines are visited by the optimum solution. It then computes a 3/2-approximation for the remaining vertices using Christofides' algorithm [3]. The approximation ratio follows, since the optimum tour can be thought of as two tours, of which one visits only the deadline vertices, while the other contains all remaining vertices. The approximation algorithm incurs a cost of OPT for the former, and a cost of $\frac{3}{2} \cdot \mathrm{OPT}$ for the latter part of the optimum tour.
Low dimensional metrics. Just as for clustering problems, another well-studied parameter in network design is the dimension of the underlying geometric space. A typical setting is when the input is assumed to be a set of points in some k-dimensional $\ell_p$-metric, where distances between points x and y are given by the function $\mathrm{dist}(x, y) = (\sum_{i=1}^{k} |x_i - y_i|^p)^{1/p}$. Two prominent examples are Euclidean metrics (where p = 2) and Manhattan metrics (where p = 1). The dimension k of the metric space has been studied as a parameter from the parameterized approximation point-of-view avant la lettre for quite a while. It was shown [189,190] that both STEINER TREE and TSP are paraNP-hard for this parameter (since they are NP-hard even if k = 2), and that they are APX-hard in general metrics [184,191]. However, a PAS for Euclidean metrics both for the STEINER TREE and the TSP problems was shown to exist in the seminal work of Arora [151]. 18 The techniques are similar to those used for clustering, and we refer to Section 4.3.2 for an overview.
18 In [151] the runtime of these algorithms is stated as $O(n (\log n)^{O(\sqrt{k}/\varepsilon)^{k-1}})$, which can be shown to be upper bounded by
This result also holds for the t-MST and t-TSP problems [151], where the cheapest tree or tour, respectively, on at least t nodes needs to be found. In this case the runtime has to be multiplied by t however.
A related setting is the parameterization by the doubling dimension of the underlying metric, that is, when the parameter k is the smallest integer such that any ball in the metric can be covered by $2^k$ balls of half the radius. Any point set in a k-dimensional $\ell_p$-metric has doubling dimension O(k), and thus the latter parameter generalizes the former. For the TSP problem the above theorem can be generalized [192] to a PAS parameterized by the doubling dimension.
Theorem 28 ([192]). For the TSP problem a (1 + ε)-approximation can be computed in $2^{(k/\varepsilon)^{O(k^2)}} n \log^2 n$ time for any ε > 0, if the input consists of n points with doubling dimension k.
Given that a PAS exists for STEINER TREE in the Euclidean case, it is only natural to ask whether this is also possible for low doubling metrics. Only a QPTAS is known so far [193]. Moreover, a related parameter is the highway dimension, which is used to model transportation networks. As shown by Feldmann et al. [194] the techniques of Talwar [193] for low doubling metrics can be generalized to the highway dimension to obtain a QPTAS as well. Again, it is quite plausible to assume that a PAS exists.

Open Question 12.
Is there a PAS for STEINER TREE parameterized by the doubling dimension? Is there a PAS for either STEINER TREE or TSP parameterized by the highway dimension?
Directed Graphs. When considering directed input graphs (asymmetric metrics), the DIRECTED STEINER TREE problem takes as input a terminal set and a special terminal called the root. The task is to compute a directed tree of minimum weight that contains a path from each terminal to the root. In general no f(k)-approximation can be computed in FPT time for any computable function f, when the parameter k is the number of Steiner vertices in the optimum solution [185]. A notable special case is the unweighted DIRECTED STEINER TREE problem, which for this parameter admits a PAS. The techniques here are the same as those used to obtain Theorem 24 for the undirected case. However, in contrast to the undirected case which admits a PSAKS, no polynomial-sized (2 − ε)-approximate kernelization exists for DIRECTED STEINER TREE [185], unless NP ⊆ coNP/poly. It is an intriguing question whether a 2-approximate kernel exists.

Open Question 13.
Is there a polynomial-sized 2-approximate kernel for the unweighted DIRECTED STEINER TREE problem parameterized by the number of Steiner vertices in the optimum solution?
If the parameter is the number of terminals, the (weighted) DIRECTED STEINER TREE problem is FPT, using the same algorithm as for the undirected version [179,180]. A different variant of STEINER TREE in directed graphs is the STRONGLY CONNECTED STEINER SUBGRAPH problem, where a terminal set needs to be strongly connected in the cheapest possible way. This problem is W[1]-hard parameterized by the number of terminals [195], and no $O(\log^{2-\varepsilon} n)$-approximation can be computed in polynomial time [196], unless NP ⊆ ZTIME($n^{\mathrm{polylog}(n)}$). However, a 2-approximation can be computed in FPT time [197].
The crucial observation for this algorithm is that in any strongly connected solution, fixing some terminal as the root, every terminal can be reached from the root, while at the same time the root can be reached from each terminal. Thus the optimum solution is the union of two directed trees, of which one is directed towards the root and the other is directed away from the root, and the leaves of both trees are terminals. Hence it suffices to compute two solutions to the DIRECTED STEINER TREE problem, which can be done in FPT time, to obtain a 2-approximation for STRONGLY CONNECTED STEINER SUBGRAPH. Interestingly, no better approximation is possible with this runtime [46].
Theorem 29 ([46,197]). For the STRONGLY CONNECTED STEINER SUBGRAPH problem a 2-approximation can be computed in $(2 + \delta)^k n^{O(1)}$ time for any constant δ > 0, where k is the number of terminals. Moreover, under Gap-ETH no (2 − ε)-approximation can be computed in $f(k) n^{O(1)}$ time for any ε > 0 and computable function f.
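The two-tree argument can be sketched as follows. The sketch uses a plain $3^k$-style dynamic program on shortest-path distances rather than the faster algorithm of [180], it returns only the sum of the two tree costs (an upper bound on the cost of their union), and all function names are hypothetical.

```python
INF = float("inf")

def directed_closure(n, arcs):
    """All-pairs shortest path distances of a directed graph (Floyd-Warshall)."""
    d = [[INF] * n for _ in range(n)]
    for v in range(n):
        d[v][v] = 0
    for u, v, w in arcs:
        d[u][v] = min(d[u][v], w)
    for m in range(n):
        for u in range(n):
            for v in range(n):
                if d[u][m] + d[m][v] < d[u][v]:
                    d[u][v] = d[u][m] + d[m][v]
    return d

def out_tree_cost(d, root, targets):
    """Cheapest tree directed away from root that reaches all targets."""
    n, k = len(d), len(targets)
    if k == 0:
        return 0
    full = (1 << k) - 1
    dp = [[INF] * n for _ in range(full + 1)]
    for i, t in enumerate(targets):
        for v in range(n):
            dp[1 << i][v] = d[v][t]
    for mask in range(1, full + 1):
        if mask & (mask - 1) == 0:
            continue
        merged = [INF] * n
        sub = (mask - 1) & mask
        while sub:
            for v in range(n):
                merged[v] = min(merged[v], dp[sub][v] + dp[mask ^ sub][v])
            sub = (sub - 1) & mask
        for v in range(n):
            dp[mask][v] = min(d[v][u] + merged[u] for u in range(n))
    return dp[full][root]

def scss_two_approx_cost(n, arcs, terminals):
    """Cost of an out-tree plus an in-tree rooted at the first terminal."""
    d = directed_closure(n, arcs)
    rev = [[d[v][u] for v in range(n)] for u in range(n)]   # reversed graph
    root, others = terminals[0], terminals[1:]
    return out_tree_cost(d, root, others) + out_tree_cost(rev, root, others)

# Directed triangle: the optimum costs 3, the two trees cost 2 + 2 = 4.
cycle_cost = scss_two_approx_cost(3, [(0, 1, 1), (1, 2, 1), (2, 0, 1)], [0, 1, 2])
```

Each of the two trees costs at most the optimum, since the optimum contains a feasible tree of each kind, so the returned value is at most twice the optimum.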
A generalization of both DIRECTED STEINER TREE and STRONGLY CONNECTED STEINER SUBGRAPH is the DIRECTED STEINER NETWORK problem, 19 for which an edge-weighted directed graph is given together with a list of ordered terminal pairs. The aim is to compute the cheapest subgraph that contains a path from s to t for every terminal pair (s, t). If k is the number of terminals, then for this problem no $k^{1/4-o(1)}$-approximation can be computed in $f(k) n^{O(1)}$ time [54] for any computable function f, under Gap-ETH. Both a PAS and a PSAKS exist [46] for the special case when the input graph is planar and bidirected, i.e., for every directed edge uv the reverse edge vu exists and has the same cost.
Similar to the PSAKS for the STEINER TREE problem, these two algorithms are based on a generalization of Borchers and Du [182]. That is, Chitnis et al. [46] show that a planar solution in a bidirected graph can be covered by planar graphs with at most 2 O(1/ε) terminals each, such that the sum of their costs is at most 1 + ε times the cost of the solution. These covering graphs may need to contain edges that are reverse to those in the solution, but are themselves not part of the solution. For this the underlying graph needs to be bidirected. Analogous to STEINER TREE, to obtain a kernel it then suffices to compute solutions for every possible list of ordered pairs of at most 2 O(1/ε) terminals. In contrast to STEINER TREE however, there is no FPT algorithm for this. Instead, an XP algorithm with runtime 2 O(k 3/2 log k) n O( needs to be used, which runs in polynomial time for k ≤ 2 O(1/ε) terminals with ε being a constant. After taking the union of all computed solutions, the number of Steiner vertices and the encoding length of the edge weights can be reduced in a similar way as for the STEINER TREE problem. To obtain a PAS, the algorithm will guess how the planar optimum can be covered by solutions involving only small numbers of terminals. It will then compute solutions on these subsets of at most 2 O(1/ε) terminals using the same XP algorithm.

Cut Problems
Starting from Menger's theorem and the corresponding algorithm for s-t CUT, graph cut problems have always been at the heart of combinatorial optimization. While many natural generalizations of s-t CUT are NP-hard, further study of these cut problems yielded beautiful techniques such as flow-cut gaps and metric embeddings in approximation algorithms [198,199], and also important separators and randomized contractions in parameterized algorithms [200][201][202][203].

Multicut
An instance of UNDIRECTED MULTICUT (resp. DIRECTED MULTICUT) is an undirected (resp. directed) graph G = (V, E) with k pairs of vertices $(s_1, t_1), \ldots, (s_k, t_k)$. The goal is to remove the minimum number of edges such that there is no path from $s_i$ to $t_i$ for every i ∈ [k]. UNDIRECTED MULTIWAY CUT (resp. DIRECTED MULTIWAY CUT) is a special case of UNDIRECTED MULTICUT (resp. DIRECTED MULTICUT) where k vertices are given as terminals and the goal is to make sure there is no path between any pair of terminals. They have been actively studied from both approximation and parameterized algorithms perspectives. We survey parameterized approximation algorithms for these problems with parameters k and the solution size OPT.
Undirected Multicut. UNDIRECTED MULTICUT admits an O(log k)-approximation algorithm [204] in polynomial time, and is NP-hard to approximate within any constant factor assuming the Unique Games Conjecture [205]. UNDIRECTED MULTIWAY CUT admits a 1.2965-approximation algorithm [206] in polynomial time, and is NP-hard to approximate within a factor 1.20016 [207]. UNDIRECTED MULTICUT (and thus UNDIRECTED MULTIWAY CUT) admits an exact FPT algorithm parameterized by OPT [200,201].
With k as a parameter, we cannot hope for an exact algorithm or an approximation scheme, since even UNDIRECTED MULTIWAY CUT with 3 terminals is NP-hard to approximate within a factor 12/11 − ε for any ε > 0 under the Unique Games Conjecture. However, UNDIRECTED MULTICUT with k pairs (s_1, t_1), . . . , (s_k, t_k) can be reduced to k^{O(k)} instances of UNDIRECTED MULTIWAY CUT with at most 2k terminals each. To do so, guess a partition of s_1, t_1, . . . , s_k, t_k according to the connected components containing them in the optimal solution (so s_i and t_i must always be in different groups), merge the vertices of each group into one vertex, and solve UNDIRECTED MULTIWAY CUT with the merged vertices as terminals. This yields a 1.2965-approximation algorithm for UNDIRECTED MULTICUT that runs in time k^{O(k)} n^{O(1)}. Some recent results improve or generalize this observation. For graphs with bounded genus g, Cohen-Addad et al. [208] gave an EPAS running in time f(g, k, ε) · n log n. Chekuri and Madan [209] considered the demand graph H, which is the graph formed by the k edges (s_1, t_1), . . . , (s_k, t_k). When t is the smallest integer such that H does not contain t disjoint edges as an induced subgraph, they presented a 2-approximation algorithm that runs in time k^{O(t)} n^{O(1)}.
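The guessing step of the reduction above can be sketched as follows. This is a minimal illustration, not the implementation from the cited works: the function names `set_partitions` and `valid_guesses` are hypothetical, and the code only enumerates the candidate groupings of the terminals (at most the Bell number of 2k, which is k^{O(k)}); the merging and the call to a MULTIWAY CUT approximation are left out.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or open a new block for it.
        yield [[first]] + partition


def valid_guesses(pairs):
    """Yield the terminal partitions that separate s_i from t_i for every pair.

    Each yielded partition is one candidate for how the terminals are
    distributed over connected components after removing an optimal multicut.
    """
    terminals = sorted({v for pair in pairs for v in pair})
    for partition in set_partitions(terminals):
        block_of = {v: i for i, block in enumerate(partition) for v in block}
        if all(block_of[s] != block_of[t] for s, t in pairs):
            yield partition
```

For two disjoint pairs, 7 of the 15 partitions of the four terminals are consistent with the requirement that s_i and t_i end up in different groups; each such guess would then give one MULTIWAY CUT instance.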

Directed Multicut.
Generally, DIRECTED MULTICUT is a much harder computational task than UNDIRECTED MULTICUT in terms of both approximation and parameterized algorithms. DIRECTED MULTICUT admits a min(k, O(n^{11/23}))-approximation algorithm [210]. It is NP-hard to approximate within a factor k − ε for any ε > 0 for fixed k [211] under the Unique Games Conjecture, or within a factor 2^{Ω(log^{1−ε} n)} for any ε > 0 [212] for general k. DIRECTED MULTIWAY CUT admits a 2-approximation algorithm [213], which is tight even when k = 2 [211]. Parameterized by OPT, DIRECTED MULTICUT is FPT for k = 2, but W[1]-hard even when k = 4 [43]. DIRECTED MULTIWAY CUT on the other hand is in FPT [202].
Since it is hard to improve the trivial k-approximation algorithm even for fixed k [211], parameterizing by k does not yield a better approximation algorithm. Chitnis and Feldmann [214] gave a k/2-approximation algorithm that runs in time 2^{O(OPT^2)} n^{O(1)}, and also proved that the problem under the same parameterization is still hard to approximate within a factor 59/58 with k = 4.

Open Question 14.
What is the best approximation ratio (as a function of k) achieved by a parameterized algorithm (with parameter OPT)? Will it be close to O(1) or Ω(k)?

Minimum Bisection and Balanced Separator.
Given a graph G = (V, E), MINIMUM EDGE BISECTION (resp. MINIMUM VERTEX BISECTION) asks to remove the fewest edges (resp. vertices) such that the graph is partitioned into two parts A and B with ||A| − |B|| ≤ 1. BALANCED EDGE SEPARATOR (resp. BALANCED VERTEX SEPARATOR) is a more relaxed version of the problem where the goal is to bound the size of the largest component by αn for some 1/2 < α < 1. These problems have been actively studied in the approximation algorithms literature, culminating in O(√(log n))-approximation algorithms for both BALANCED EDGE SEPARATOR and BALANCED VERTEX SEPARATOR [199,215], and an O(log n)-approximation algorithm for MINIMUM EDGE BISECTION [216].

k-Cut
Given an undirected graph G = (V, E) and an integer k ∈ N, the k-CUT problem asks to remove the smallest number of edges such that G is partitioned into at least k non-empty connected components. The edge contraction algorithm by Karger and Stein [218] yields a randomized exact XP algorithm running in time O(n^{2k}), which was made deterministic by Thorup [219]. There were recent improvements to the running time [139,220]. There is an exact parameterized algorithm with parameter OPT [202,221]. For general k, it admits a (2 − 2/k)-approximation algorithm [222], and is NP-hard to approximate within a factor (2 − ε) for any ε > 0 under the Small Set Expansion Hypothesis [67].
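To make the edge contraction idea concrete, here is a minimal sketch of plain repeated random contraction for k-CUT. It is the basic contraction procedure, not the recursive Karger–Stein variant, and it makes no attempt to match the running time quoted above. The function names, the trial count, and the seed are assumptions chosen for illustration; vertices are assumed to be numbered 0, ..., n − 1.

```python
import random


def contract_to_k(n, edges, k, rng):
    """One run: contract uniformly random edges until k super-vertices remain.

    Returns the number of original edges whose endpoints end up in
    different super-vertices, i.e. the size of the resulting k-cut.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    remaining = list(edges)
    while components > k:
        # Drop edges that became self-loops after earlier contractions.
        remaining = [(u, v) for u, v in remaining if find(u) != find(v)]
        if not remaining:
            break  # the graph already has more than k components
        u, v = rng.choice(remaining)
        parent[find(u)] = find(v)
        components -= 1
    return sum(1 for u, v in edges if find(u) != find(v))


def min_k_cut_estimate(n, edges, k, trials=300, seed=0):
    """Best k-cut found over several independent contraction runs."""
    rng = random.Random(seed)
    return min(contract_to_k(n, edges, k, rng) for _ in range(trials))
```

Every run returns a feasible k-cut, so the estimate is always an upper bound on the optimum; repeating the run is what makes finding an optimal cut likely.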
A simple reduction shows that k-CUT captures (k − 1)-CLIQUE, so an exact FPT algorithm with parameter k is unlikely to exist. Gupta et al. [138] gave a (2 − δ)-approximation algorithm for a small universal constant δ > 0 that runs in time f(k) · n^{O(1)}. The approximation ratio was improved to 1.81 in [139], and further to 1.66 [223]. Very recently, Lokshtanov et al. [224] gave a PAS that runs in time (k/ε)^{O(k)} n^{O(1)}, thereby (essentially) resolving the parameterized approximability of k-CUT.

F -DELETION Problems
Let F be a vertex-hereditary family of undirected graphs, which means that if G ∈ F and H is a vertex-induced subgraph of G, then H ∈ F as well. F -DELETION is the problem where given a graph G = (V, E), we are supposed to find S ⊆ V such that the subgraph induced by V \ S (denoted by G \ S) belongs to F . The goal is to minimize |S|. The natural weighted version, where there is a non-negative weight w(v) for each vertex v and the goal is to minimize the sum of the weights of the vertices in S, is called WEIGHTED F -DELETION.
F -DELETION captures numerous combinatorial optimization problems, including VERTEX COVER (when F includes all graphs with no edges), FEEDBACK VERTEX SET (when F is the set of all forests), and ODD CYCLE TRANSVERSAL (when F is the set of all bipartite graphs). There are a lot more interesting graph classes F studied in structural and algorithmic graph theory. Some famous examples include planar graphs, perfect graphs, chordal graphs, and graphs with bounded treewidth.
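The definition can be made concrete with a brute-force sketch in which the family F is passed as a membership test. The code below is only an illustration of the problem statement (it runs in exponential time); the names `f_deletion`, `is_edgeless`, and `is_forest` are hypothetical, and the sketch assumes that the empty graph belongs to F, which holds for every non-empty vertex-hereditary family.

```python
from itertools import combinations


def induced_edges(edges, keep):
    """Edges of the subgraph induced by the vertex set `keep`."""
    return [(u, v) for u, v in edges if u in keep and v in keep]


def is_edgeless(vertices, edges):
    """Membership test for F = graphs with no edges (VERTEX COVER)."""
    return not edges


def is_forest(vertices, edges):
    """Membership test for F = forests (FEEDBACK VERTEX SET), via union-find."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge closes a cycle
        parent[ru] = rv
    return True


def f_deletion(vertices, edges, in_family):
    """Smallest S such that G \\ S is in F, found by trying sets by size."""
    vertices = list(vertices)
    for size in range(len(vertices) + 1):
        for S in combinations(vertices, size):
            keep = set(vertices) - set(S)
            if in_family(keep, induced_edges(edges, keep)):
                return set(S)
```

Swapping the predicate changes the problem: on the complete graph K_4 the same routine deletes three vertices for the edgeless family and two vertices for the family of forests.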
In addition to beautiful structural results that give multiple equivalent characterizations, these graph classes often admit very efficient algorithms for some tasks that are believed to be hard in general graphs. Therefore, a systematic study of F -DELETION for more graph classes is not only an interesting algorithmic task by itself, but also a way to obtain better algorithms for other optimization problems when the given graph G is close to a nice class F (i.e., deleting few vertices from G makes it belong to F .) Indeed, some algorithms for INDEPENDENT SET for noisy planar/minor-free graphs discussed in Section 4.1 use an algorithm for F -DELETION as a subroutine [80].
For the maximization version where the goal is to maximize |V \ S|, a powerful but pessimistic characterization is known. Lund and Yannakakis [225] showed that whenever F is vertex-hereditary and nontrivial (i.e., there are infinitely many graphs in F and out of F), the maximization version is hard to approximate within a factor 2^{log^{1/2−ε} n} for any ε > 0. So no nontrivial F is likely to admit even a polylogarithmic approximation algorithm. However, the situation is different for the minimization problem, since VERTEX COVER admits a 2-approximation algorithm, while ODD CYCLE TRANSVERSAL [226] and PERFECT DELETION [227] are NP-hard to approximate within any constant factor. (The first of these two hardness results assumes the Unique Games Conjecture.) This indicates that a characterization of approximability for the minimization versions will be more complex and challenging.
There are two (closely related) frameworks to capture large graph classes.
• Choose a graph width parameter (e.g., treewidth, pathwidth, cliquewidth, rankwidth, etc.) and k ∈ N. Let F be the set of graphs G with the chosen width parameter at most k. The parameter of F -DELETION is k.

• Choose a notion of subgraph (e.g., subgraph, induced subgraph, minor, etc.) and a finite family of forbidden graphs H. Let F be the set of graphs G that do not have any graph in H as the chosen notion of subgraph. The parameter of F -DELETION is |H| := ∑_{H∈H} |V(H)|.
Many interesting classes are captured by the above frameworks. For example, to express FEEDBACK VERTEX SET, we can take F to be the set of graphs with treewidth at most 1, or equivalently, the set of graphs that do not have the triangle graph K_3 as a minor. In the rest of the subsection, we introduce known results on F -DELETION under the above two parameterizations. Note that under these two parameterizations, the need for approximation is inherent since the simplest problem in both frameworks, VERTEX COVER, already does not admit a polynomial-time (2 − ε)-approximation algorithm under the Unique Games Conjecture.
Finally, we mention that the parameterization by the size of the optimal solution has been studied more actively by the parameterized complexity community, where many important problems are shown to be in FPT [228][229][230].

Treewidth and Planar Minor Deletion
The treewidth of a graph (see Definition 1) is arguably the most well-studied graph width parameter with numerous structural and algorithmic applications. It is one of the most important concepts in the graph minor project of Robertson and Seymour. Algorithmically, Courcelle's theorem [231] states that every problem expressible in the monadic second-order logic of graphs can be solved in FPT time parameterized by treewidth. We refer the reader to the survey of Bodlaender [232]. Computing treewidth is NP-hard in general [233], but if we parameterize by treewidth, it can be done in FPT time [234], and there is a faster constant-factor parameterized approximation algorithm [235].
Let k ∈ N be the parameter. TREEWIDTH k-DELETION (also known as TREEWIDTH k-MODULATOR in the literature) is a special case of F -DELETION where F is the set of all graphs with treewidth at most k. Note the case k = 0 yields VERTEX COVER and k = 1 yields FEEDBACK VERTEX SET.
Fomin et al. [228] gave a randomized f(k)-approximation algorithm that runs in time g(k) · nm for some computable functions f and g. The approximation ratio was improved by Gupta et al. [236], who gave a deterministic O(log k)-approximation algorithm that runs in time f(k) · n^{O(1)} for some function f. This result has immediate applications to minor deletion problems. Let H be a finite set of graphs, and consider H-MINOR DELETION, which is a special case of F -DELETION when F is the set of all graphs that do not have any graph in H as a minor. Its parameterized and kernelization complexity (with parameter OPT) for family H has been actively studied [228,237,238].
When H contains a planar graph H (also known as PLANAR H-DELETION in the literature), by the polynomial grid-minor theorem [239], any graph G ∈ F has treewidth at most k := poly(|V(H)|). Therefore, in order to solve H-MINOR DELETION, one can first solve TREEWIDTH k-DELETION to reduce the treewidth to k and then solve H-MINOR DELETION optimally using Courcelle's theorem [231]. Combined with the above algorithm for TREEWIDTH k-DELETION [236], this strategy yields an O(log k)-approximation algorithm that runs in f (|H|) · n O(1) time.
Beyond PLANAR H-DELETION, there are not many results known for H-MINOR DELETION. The case H = {K_5, K_{3,3}} is called MINIMUM PLANARIZATION and was recently shown to admit an O(log^{O(1)} n)-approximation algorithm in n^{O(log n/ log log n)} time [240].
While the unweighted versions of TREEWIDTH k-DELETION and PLANAR H-DELETION admit an approximation algorithm whose approximation ratio depends only on k and not on n, such an algorithm is not known for WEIGHTED TREEWIDTH k-DELETION or WEIGHTED PLANAR H-DELETION. Agrawal et al. [241] gave a randomized O(log^{1.5} n)-approximation algorithm and a deterministic O(log^2 n)-approximation algorithm that run in polynomial time for fixed k, i.e., the degree of the polynomial depends on k. Bansal et al. [80] gave an O(log n log log n)-approximation algorithm for the edge deletion version. The only graphs H whose weighted minor deletion problem is known to admit a constant factor approximation algorithm are a single edge (WEIGHTED VERTEX COVER), a triangle (WEIGHTED FEEDBACK VERTEX SET), and a diamond [242]. For the weighted versions, no hardness beyond VERTEX COVER is known.
Open Question 15. Does WEIGHTED TREEWIDTH k-DELETION admit an f (k)-approximation algorithm with parameter k for some function f ? Does TREEWIDTH k-DELETION admit a c-approximation algorithm with parameter k for some universal constant c?
Algorithms for TREEWIDTH k-DELETION. Here we present the high-level ideas of [236,241] for TREEWIDTH k-DELETION and WEIGHTED TREEWIDTH k-DELETION respectively. These two algorithms share the following two important ingredients:

2. There are good approximation algorithms to find such separators.
Given an undirected and vertex-weighted graph G = (V, E) and an integer k ∈ N, let (WEIGHTED) k-VERTEX SEPARATOR be the problem whose goal is to remove the vertices of minimum total weight so that each connected component has at most k vertices. An algorithm is called an α-bicriteria approximation algorithm if it returns a solution whose total weight is at most α · OPT and each connected component has at most 1.1k vertices. 20 The case k = 2n/3 is called BALANCED SEPARATOR and has been actively studied in the approximation algorithms community, and the best approximation algorithm achieves an O(√(log n))-bicriteria approximation [215]. When k is small, an O(log k)-bicriteria approximation is also possible [105].
WEIGHTED TREEWIDTH k-DELETION. Agrawal et al. [241] achieve an O(log^{1.5} n)-approximation for WEIGHTED TREEWIDTH k-DELETION in time n^{O(k)}. It would be interesting to see whether the running time can be made FPT with parameter k.
The main structure of their algorithm is top-down recursive. Deleting the optimal solution S* from G reduces the treewidth of G \ S* to at most k, so from the tree decomposition of G \ S*, there exists a set M* ⊆ V(G \ S*) with at most k + 1 vertices such that each connected component of G \ (M* ∪ S*) has at most 2n/3 vertices. While we do not know S*, we can exhaustively try every possible M ⊆ V with |M| ≤ k + 1 and use the bicriteria approximation algorithm for BALANCED SEPARATOR to find M and S such that (1) |M| ≤ k + 1, (2) w(S) ≤ O(√(log n)) · OPT, and (3) each connected component of G \ (M ∪ S) has at most 1.1 · (2n/3) ≤ 3n/4 vertices.
20 Here 1.1 can be replaced by 1 + ε for any constant ε > 0.
Let G_1, . . . , G_t be the resulting connected components of G \ (S ∪ M). We solve each G_i recursively to compute S_i such that each G_i \ S_i has treewidth at most k. The weight of S was already bounded in terms of OPT, but the weight of M was not, so we finally need to consider the graph induced by M ∪ V(G_1 \ S_1) ∪ · · · ∪ V(G_t \ S_t) and delete vertices of small weight to ensure small treewidth. However, this task is easy since the treewidth of each G_i \ S_i is bounded by k and |M| ≤ k + 1, which bounds the treewidth of the considered graph by 2k + 1. So we can invoke an exact algorithm for graphs of small treewidth to solve the problem optimally. Note that the total weight of removed vertices in this recursive call is at most (O(√(log n)) + 1) · OPT. Since ∑_i OPT(G_i) ≤ OPT(G) and the recursion depth is at most O(log n), the total approximation ratio is O(log^{1.5} n).
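The final accounting can be written out as a short calculation. The symbols c (the constant hidden in the bicriteria guarantee), d (the recursion depth), and \mathcal{G}_\ell (the set of subgraphs handled at recursion level \ell) are notation introduced here for illustration only. The subgraphs on one level are vertex-disjoint, so restricting an optimal solution of G to each of them gives \sum_{H \in \mathcal{G}_\ell} \mathrm{OPT}(H) \le \mathrm{OPT}(G), and each level shrinks component sizes by a factor of at least 3/4, so d \le \log_{4/3} n.

```latex
\begin{align*}
w(\text{output})
  &\le \sum_{\ell=0}^{d} \sum_{H \in \mathcal{G}_\ell}
       \bigl(c\sqrt{\log n} + 1\bigr)\,\mathrm{OPT}(H) \\
  &\le \sum_{\ell=0}^{d} \bigl(c\sqrt{\log n} + 1\bigr)\,\mathrm{OPT}(G) \\
  &= (d+1)\bigl(c\sqrt{\log n} + 1\bigr)\,\mathrm{OPT}(G)
   = O(\log^{1.5} n)\cdot \mathrm{OPT}(G).
\end{align*}
```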
TREEWIDTH k-DELETION. Gupta et al. [236] give an O(log k)-approximation algorithm that runs in time f(k) · n^{O(1)} for the unweighted version of TREEWIDTH k-DELETION. The main structure of this algorithm is bottom-up iterative refinement. The algorithm maintains a feasible solution S ⊆ V (we can start with S = V), and iteratively uses S to obtain another feasible solution S'. If the new solution is not smaller (i.e., |S'| ≥ |S|), then |S| ≤ O(log k) · OPT.
Let us focus on one refinement step with the current feasible solution S. Let S* be the optimal solution, so that G \ S* has treewidth at most k. We use the following simple lemma showing the existence of a good separator of G in a finer scale than before: for every graph H of treewidth at most k, every T ⊆ V(H), and every ε > 0, there exists R ⊆ V(H) with |R| ≤ ε|T| such that each connected component of H \ R has at most O(k/ε) vertices from T. Plugging H ← G \ S*, T ← S in the above lemma and letting S' = R ∪ S*, we can conclude that there exists S' ⊆ V such that |S'| ≤ |R| + |S*| ≤ ε|S| + OPT and each connected component of G \ S' has at most O(k/ε) vertices from S.
How can we find such a set S' efficiently? Note that if S = V, then S' is an O(k/ε)-vertex separator of G. Lee [105] defined a generalization of k-VERTEX SEPARATOR called k-SUBSET VERTEX SEPARATOR, where the input consists of G = (V, E), S ⊆ V, k ∈ N, and the goal is to remove the smallest number of vertices so that each connected component has at most k vertices from S, and gave an O(log k)-bicriteria approximation algorithm.
Since the above lemma guarantees that OPT for O(k/ε)-SUBSET VERTEX SEPARATOR is at most OPT for TREEWIDTH k-DELETION plus ε|S|, applying this bicriteria approximation algorithm yields S' such that |S'| ≤ O(log k)(OPT + ε|S|) and each connected component of G \ S' has at most O(k/ε) vertices from S. Since S is a feasible solution, it implies that the treewidth of each connected component is bounded by O(k/ε), so we can solve each component optimally in time f(k/ε) · n^{O(1)}. By setting ε = 0.5, we can see the size of the new solution is strictly decreased unless |S| = O(log k) · OPT, finishing the proof.

Subgraph Deletion
Let H be a fixed pattern graph with k vertices. Given a host graph G, deciding whether H is a subgraph of G (in the usual sense) is known as SUBGRAPH ISOMORPHISM, whose parameterized complexity with various parameters (e.g., k, tw(H), genus(G), etc.) was studied by Marx and Pilipczuk [243].
Guruswami and Lee [104] studied the corresponding vertex deletion problem H-SUBGRAPH DELETION (called H-TRANSVERSAL in the paper), which is a special case of F -DELETION where F is the set of graphs that do not have H as a subgraph. Note that the problem admits a simple k-approximation algorithm that runs in time O(n · f(n, H)), where f(n, H) denotes the time to solve SUBGRAPH ISOMORPHISM with the pattern graph H and a host graph with n vertices. Their main hardness result states that assuming the Unique Games Conjecture, whenever H is 2-vertex-connected, for any ε > 0, no polynomial-time algorithm (including algorithms running in time n^{f(k)} for any f) can achieve a (k − ε)-approximation. (Without the UGC, they still ruled out a (k − 1 − ε)-approximation.) Among H that are not 2-vertex-connected, there is an O(log k)-approximation algorithm when H is a star (in time n^{O(1)}) or a path (in time f(k) n^{O(1)}) [104,105,244]. The algorithm for k-path follows from the result for TREEWIDTH k-DELETION, because any graph without a k-path has treewidth at most k. Whenever H is a tree with k vertices, detecting a copy of H in G with n vertices can be done in 2^{O(k)} n^{O(1)} time [245], and it is open whether there is an O(log k)-approximation algorithm for H-SUBGRAPH DELETION in time f(k) · n^{O(1)}.
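The simple k-approximation mentioned above can be sketched as follows: repeatedly find any copy of H and delete all k of its vertices. Every copy found is vertex-disjoint from the earlier ones and any solution must hit each of them, which gives the factor k. The code is a minimal illustration in which a brute-force search stands in for the SUBGRAPH ISOMORPHISM routine; the function names are hypothetical and host vertices are assumed to be sortable (e.g., integers).

```python
from itertools import permutations


def find_copy(host_vertices, host_edges, pattern_vertices, pattern_edges):
    """Brute-force search for one copy of the pattern as a subgraph.

    Returns the set of host vertices used by the copy, or None.
    """
    adjacent = {frozenset(e) for e in host_edges}
    for image in permutations(sorted(host_vertices), len(pattern_vertices)):
        mapping = dict(zip(pattern_vertices, image))
        if all(frozenset((mapping[a], mapping[b])) in adjacent
               for a, b in pattern_edges):
            return set(image)
    return None


def subgraph_deletion_k_approx(vertices, edges, pattern_vertices, pattern_edges):
    """Delete all vertices of a found copy of H until no copy remains."""
    remaining = set(vertices)
    deleted = set()
    while True:
        live_edges = [(u, v) for u, v in edges
                      if u in remaining and v in remaining]
        copy = find_copy(remaining, live_edges, pattern_vertices, pattern_edges)
        if copy is None:
            return deleted
        deleted |= copy
        remaining -= copy
```

On two vertex-disjoint triangles with H a triangle, the routine deletes all six vertices while two suffice, so the factor k = 3 is attained by this procedure.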

Other Deletion Problems
Chordal graphs. A graph is chordal if it does not have an induced cycle of length ≥ 4. Chordal graphs form a subclass of perfect graphs that have been actively studied. Initially motivated by efficient kernels, approximation algorithms for CHORDAL DELETION have been developed recently. The current best results are a poly(OPT)-approximation [127,246] and an O(log^2 n)-approximation [241].
Edge versions. While this subsection focused on the vertex deletion problem, there are some results on the edge deletion, edge addition, and edge modification versions. (Edge modification allows both addition and deletion.) Cao and Sandeep [247] studied MINIMUM FILL-IN, whose goal is to add the minimum number of edges to make a graph chordal. They gave new inapproximability results implying improved time lower bounds for parameterized algorithms. Giannopoulou et al. [248] gave O(1)-approximation algorithms for PLANAR H-IMMERSION DELETION parameterized by H. Bliznets et al. [249] considered H-free edge modification for a forbidden induced subgraph H and give an almost complete characterization on its approximability depending on H.
Directed graphs. There is also a large body of work on parameterized algorithms for vertex deletion problems in directed graphs. While many of the known problems (including DIRECTED FEEDBACK VERTEX SET [250]) admit an exact FPT algorithm, Lokshtanov et al. [44] studied DIRECTED ODD CYCLE TRANSVERSAL, and proved that it is W[1]-hard and is unlikely to admit a PAS under the Parameterized Inapproximability Hypothesis (or Gap-ETH). They complemented the result by showing a 2-approximation algorithm running in time f(OPT) n^{O(1)}.

Faster Algorithms and Smaller Kernels via Approximation
The focus of this section so far has been on problems whose exact versions are intractable (i.e., W[1]/W[2]-hard), where the goal is to obtain good approximations in FPT time. In this subsection, we shift our focus slightly by asking: does approximation allow us to find faster algorithms for problems already known to be in FPT?
To illustrate this, let us consider VERTEX COVER. It is of course well-known that the exact version of the problem can be solved in FPT time, with the current best running time being O*(1.2738^k) [251]. The question here would be: if we are allowed to output a (1 + ε)-approximate solution, instead of just an exact one, can we speed up the algorithm?
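To the best of our knowledge, such a question was tackled for the first time by Bourgeois et al. [252] and revisited quite a few times in the literature [20,[253][254][255][256]. As one might have suspected, the answer to this question is YES: if VERTEX COVER can be solved exactly in time O*(δ^k), then for any ε ∈ (0, 1) a (1 + ε)-approximation can be computed in time O*(δ^{(1−ε)k}) (Theorem 31).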
The main idea of the algorithm is inspired by the "local ratio" method in the approximation algorithms literature (see e.g. [257]) and we sketch it here. The algorithm works in two stages. In the first stage, we run the greedy algorithm: as long as we have picked fewer than 2εk vertices so far and not all edges are covered, pick an uncovered edge and add both endpoints to our solution. In the second stage, we run the exact algorithm on the remaining part of the graph to find a vertex cover of size at most (1 − ε)k. Since the first stage runs in polynomial time, the running time of the entire algorithm is dominated by the second stage, whose running time is O*(δ^{(1−ε)k}) as desired. The correctness of the algorithm follows from the fact that, for each edge selected in the first stage, the optimal solution still needs to pick at least one endpoint. As a result, the optimal solution must pick at least εk vertices with respect to the first stage (compared to 2εk picked by the algorithm). Thus, when the optimal solution is of size at most k, there must be a solution in the second stage of size at most (1 − ε)k, meaning that the algorithm finds such a solution and outputs a vertex cover of size at most (1 + ε)k as claimed.
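The two stages can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, a brute-force search stands in for the O*(δ^k)-time exact algorithm, and because the greedy stage picks whole edges, the output size is bounded by k plus the number of greedy edges, which is εk rounded up.

```python
from itertools import combinations


def exact_vertex_cover(vertices, edges, budget):
    """Stand-in for an exact FPT algorithm: a cover of size <= budget, or None."""
    vertices = sorted(vertices)
    for size in range(min(budget, len(vertices)) + 1):
        for cand in combinations(vertices, size):
            chosen = set(cand)
            if all(u in chosen or v in chosen for u, v in edges):
                return chosen
    return None


def approx_vertex_cover(edges, k, eps):
    """Return a cover of size about (1 + eps) * k if a cover of size k exists.

    Returns None if the exact stage certifies that no cover of size k exists.
    """
    cover = set()
    # Stage 1 (greedy): take both endpoints of uncovered edges until at
    # least 2 * eps * k vertices have been picked.
    while len(cover) < 2 * eps * k:
        uncovered = [(u, v) for u, v in edges
                     if u not in cover and v not in cover]
        if not uncovered:
            return cover
        u, v = uncovered[0]
        cover |= {u, v}
    # Stage 2 (exact): any cover of size k uses at least one endpoint per
    # greedy edge, so at most k - (#greedy edges) vertices remain for the rest.
    rest = [(u, v) for u, v in edges if u not in cover and v not in cover]
    rest_vertices = {x for e in rest for x in e}
    budget = k - len(cover) // 2
    tail = exact_vertex_cover(rest_vertices, rest, budget)
    return None if tail is None else cover | tail
```

For instance, on a cycle with 8 vertices, k = 4 and ε = 0.5, the greedy stage picks two edges (four vertices) and the exact stage only has to search for a cover of size 2 on what is left.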
The above "approximate a small fraction and brute force the rest" approach of Fellows et al. [20] generalizes naturally to problems beyond VERTEX COVER. They formalized the method in terms of α-fidelity kernelization and applied it to several problems, including CONNECTED VERTEX COVER, d-HITTING SET and STEINER TREE. For these problems, the method gives a (1 + ε)-approximation algorithm that runs in time O*(δ^{(1−Ω(ε))k}), where δ > 0 denotes a constant for which an O*(δ^k)-time algorithm is known for the exact version of the corresponding problem. The approach, in some form or another, is also applicable both to other parameterized problems [258,259] and to non-parameterized problems (e.g. [252]); since the latter is out of scope for this survey, we will not discuss the specifics here.
An intriguing question related to this line of work is whether the running time of (1 + ε)-approximation algorithms must be of the form O*(δ^{(1−Ω(ε))k}). That is, can we get a (1 + o(1))-approximation for these problems in time O*(λ^k) where λ is a constant strictly smaller than δ? More specifically, we may ask the following:
Open Question 16. Let δ > 0 be the smallest (known) constant such that an O*(δ^k)-time exact algorithm exists for VERTEX COVER. Is there an algorithm that, for any ε > 0, runs in time f(1/ε) · O*(λ^k) for some constant λ < δ and achieves an approximation ratio of (1 + ε)?
Of course, the question applies not only to VERTEX COVER but to other problems in the list as well. The informal crux of this question is whether, in the regime of very good approximation factors (i.e. 1 + o(1)), approximation can still be exploited in such a way that the algorithm works significantly better than the approach "approximate a o(1) fraction and then brute force".
Turning back once again to our running example of VERTEX COVER, it turns out that algorithms faster than "approximate a small fraction and then brute force" are known [253,255,256] but only for the regime of large approximation ratios. In particular, Brankovic and Fernau [253] give faster algorithms than in Theorem 31 already for approximation ratios as small as 3/2. The algorithms in [255,256] focus on the case of "barely non-trivial" (2 − ρ)-approximation factors. (Recall the greedy algorithm yields a 2-approximation and, under the Unique Games Conjecture, the problem is NP-hard to approximate to within any constant factor less than 2.) The algorithm in [255] has a running time of O*(2^{k/2^{Ω(1/ρ)}}), which was later improved in [256] to O*(2^{k/2^{Ω(1/ρ^2)}}). These running times should be contrasted with that of "approximate a small fraction and then brute force" (i.e., applying Theorem 31 directly with ε = 1 − ρ) which gives an algorithm with running time O*(2^{kρ}). In other words, Refs. [255,256] improve the "saving factor" from 1/ρ to 2^{Ω(1/ρ)} and 2^{Ω(1/ρ^2)} respectively. It should be noted however that, since the known (2 − o(1))-factor hardness of approximation is shown via the Unique Games Conjecture and unique games admit subexponential time algorithms [260,261], it is still entirely possible that this regime of approximating VERTEX COVER admits subexponential time algorithms as well. This is perhaps the biggest open question in the "barely non-trivial" approximation range:
Open Question 17. Is there an algorithm that runs in 2^{o(k)} n^{O(1)} time and achieves an approximation ratio of (2 − ρ) for some absolute constant ρ > 0?
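To make the "saving factor" comparison explicit, each of the three running times can be written as 2^{k/s}, where s is notation introduced here for the factor by which the exponent k is divided:

```latex
\begin{align*}
\text{Theorem 31 with } \varepsilon = 1-\rho:&\quad
  2^{k\rho} = 2^{k/s}, && s = 1/\rho,\\
\text{[255]}:&\quad
  2^{k/2^{\Omega(1/\rho)}} = 2^{k/s}, && s = 2^{\Omega(1/\rho)},\\
\text{[256]}:&\quad
  2^{k/2^{\Omega(1/\rho^{2})}} = 2^{k/s}, && s = 2^{\Omega(1/\rho^{2})}.
\end{align*}
```

As ρ tends to 0, the saving factor thus grows exponentially in 1/ρ or 1/ρ^2 rather than linearly in 1/ρ.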
Let us now briefly discuss the techniques used in some of the aforementioned works. The algorithms in [253,255] are based on branching in conjunction with certain approximation techniques. (See also [262] where a similar technique is used for a related problem TOTAL VERTEX COVER.) A key idea in [253,255] is that (i) if the (average or maximum) degree of the graph is small, then good polynomial-time approximation algorithms are known [263] and (ii) if the degree is large, then branching algorithms are naturally already fast. The second part of [253] involves a delicate branching rule. However, for [255], it is quite simple: for some threshold d (to be specified), as long as there exists a vertex with degree at least d, either (1) with some probability, simply add the vertex to the vertex cover, or (2) branch on both possibilities of it being inside the cover and outside. After this branching finishes and we are left with low-degree graphs, just run the known polynomial-time approximation algorithms [263] on these graphs. The point here is that the "error" incurred if option (1) is chosen will be absorbed by the approximation. By carefully selecting d and the probability, one can arrive at the desired running time and approximation guarantee. This algorithm is randomized, but can be derandomized using the sparsification lemma [264].
To the best of our knowledge, this "barely non-trivial approximation" regime has not been studied beyond VERTEX COVER. In particular, while Bansal et al. [255] apply their techniques on several problems, these are not parameterized problems and we are not aware of any other parameterized study related to the regime discussed here.
Parallel to the running time questions we have discussed so far, one may ask an analogous question in the kernelization regime: does approximation allow us to find smaller kernels for problems that already admit polynomial-size kernels? As is the case with exact algorithms, parameterized approximation algorithms go hand in hand with approximate kernels. Indeed, many algorithmic improvements mentioned can also be viewed as improvements in terms of the size of the kernels. In particular, recall the proof sketch of Theorem 31 for VERTEX COVER. If we stop and do not proceed with brute force in the second step, then we are left with a (1 + ε)-approximate kernel. It is also not hard to argue that, by for instance applying the standard 2k-size kernelization at the end, we are left with at most 2(1 − ε)k vertices. This improves upon the best known 2k − Θ(log k) bound for the exact kernel [265]. A similar improvement is known also for d-HITTING SET [20].

Future Directions
Although we have provided open questions along the way, we end this survey by zooming out and discussing some general future directions or meta-questions, which we find to be interesting and could be the basis for future work.

Approximation Factors
The quality of a polynomial-time approximation algorithm is mainly measured by the obtainable approximation factor α: the smaller it is the more feasibly solvable the problem is. Therefore, a lot of work has been invested into determining the smallest obtainable approximation factor α for all kinds of computationally hard problems. In the non-parameterized (i.e., NP-hardness) world, a whole spectrum of approximability has been discovered (cf. [3,4]): the most feasibly solvable NP-hard problems (e.g., the KNAPSACK problem) admit a so-called polynomial-time approximation scheme (PTAS), which is an algorithm computing a (1 + ε)-approximation for any given constant ε > 0. Some problems can be shown not to admit a PTAS (under reasonable complexity assumptions), but still allow constant approximation factors (e.g., the STEINER TREE problem). Yet others can only be approximated within a polylogarithmic factor (e.g., the SET COVER problem), while some are even harder than this, as the best approximation factor obtainable is polynomial in the input size (e.g., the CLIQUE problem).
In contrast to polynomial-time approximation algorithms, a full spectrum of obtainable approximation ratios is still missing when allowing parameterized runtimes. Instead, only some scattered basic results are known. In particular, most parameterized approximation problems belong to one of the following categories:
• A parameterized approximation scheme (PAS) exists, i.e., for any constant ε > 0 a (1 + ε)-approximation can be computed in f(k) n^{O(1)} time for some parameter k. These are currently the most prevalent types of results in the literature. To just mention one example, the STEINER TREE problem is APX-hard, but admits a PAS [185] when parameterized by the number of non-terminals (so-called Steiner vertices) in the optimum solution (cf. Section 4.4).

• A lower bound excluding any non-trivial approximation factor exists. For example, under ETH the DOMINATING SET problem has no g(k)-approximation in f(k) n^{o(k)} time [34] for any functions g and f, where k is the size of the smallest dominating set.
• A polynomial-time approximation algorithm can achieve a similar approximation ratio, i.e., the parameterization is not very helpful. For instance, for the k-CENTER problem 21 a 2-approximation can be computed in polynomial time [266], but even when parameterizing by k no (2 − ε)-approximation is possible [171] for any ε > 0, under standard complexity assumptions. A similar situation holds for MAX k-COVERAGE, which we discussed in Section 4.2.2.

• Constant or logarithmic approximation ratios can be shown, which beat any approximation ratio obtainable in polynomial time. For instance, consider the STRONGLY CONNECTED STEINER SUBGRAPH problem: under standard complexity assumptions, for this problem no polynomial-time O(log^{2−ε} n)-approximation algorithm exists [196], and there is no FPT algorithm parameterized by the number k of terminals [195]. However it is not hard to compute a 2-approximation in 2^{O(k)} n^{O(1)} time [197], and no (2 − ε)-approximation algorithm with runtime f(k) n^{O(1)} exists [46] under Gap-ETH, for any function f and any ε > 0 (cf. Section 4.4).
For many problems discussed in this survey, including DENSEST k-SUBGRAPH and STEINER TREE with bounded doubling/highway dimension, it has not been determined to which category they belong. There are also many problems in the final category for which asymptotically tight approximation ratios have not been found, including DIRECTED MULTICUT and TREEWIDTH k-DELETION (both weighted and unweighted). The parameterized approximability of H-MINOR DELETION for non-planar H is also wide open, except for MINIMUM PLANARIZATION (H = {K_5, K_{3,3}}) [240]. It is an immediate but still interesting direction to prove tight parameterized approximation ratios for these (and more) problems.

21 Here we consider the version where the set of candidate centers is not separately given.
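To make the k-CENTER example from the list above concrete, the following is a minimal sketch of a polynomial-time 2-approximation, assuming points in the Euclidean plane and using the classical farthest-first greedy traversal. This is an illustrative choice and not necessarily the algorithm of [266]; the function name `greedy_k_center` and the input representation are assumptions made for this sketch only.

```python
from math import dist


def greedy_k_center(points, k):
    """Farthest-first traversal for k-CENTER (illustrative sketch).

    Centers are chosen among the input points, matching the version in
    which no separate set of candidate centers is given. In a metric
    space the returned radius is at most twice the optimum radius.
    """
    if not points or k <= 0:
        raise ValueError("need at least one point and k >= 1")
    centers = [points[0]]  # arbitrary first center
    # d[i] = distance from points[i] to its nearest chosen center
    d = [dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # pick the point that is currently farthest from all centers
        far = max(range(len(points)), key=d.__getitem__)
        centers.append(points[far])
        d = [min(d[j], dist(points[j], points[far])) for j in range(len(points))]
    return centers, max(d)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
    print(greedy_k_center(pts, 2))
```

The sketch runs in O(nk) distance evaluations, so the parameter k plays no special role in its running time. This is consistent with the point of the third category: according to [171], allowing f(k)n^{O(1)} time does not improve on the ratio 2.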
Digressing, we remark that this survey does not include FPT-approximation of counting problems, such as approximately counting the number of k-paths in a graph. The best known (1 + ε)-multiplicative approximation algorithm [267,268] for counting the number of k-paths runs in time 4^k f(ε) poly(n) for some subexponential function f (cf. [269]). So a natural question is: can we count k-paths approximately in time c^k, where c is as close as possible to the base of the running time of the best algorithm for deciding the existence of a k-path in a graph (the best currently known c is roughly 1.657 [270,271])?

Parameterized Running Times
The quality of FPT algorithms is mainly measured in the obtainable runtime. Given a parameter k, for some problems the optimum solution can be computed in f(k)n^{g(k)} time, for some functions f and g independent of the input size n (i.e., the degree of the polynomial also depends on the parameter). If such an algorithm exists, the problem is slice-wise polynomial (XP), and the algorithm is called an XP algorithm. A typical example is when a solution of size k is to be found within a data set of size n, in which case an n^{O(k)} time exhaustive search algorithm often exists. However, an FPT algorithm with runtime, say, O(2^k n) is a lot more efficient than an XP algorithm with runtime n^{O(k)}, and therefore the aim is usually to find FPT algorithms, while XP algorithms are counted as prohibitively slow. The discovery of the W-hierarchy in complexity theory has paved the way to providing evidence when an FPT algorithm is unlikely to exist. Assuming ETH, it is even possible to provide lower bounds on the runtimes obtainable by any FPT or XP algorithm. Similar to approximation algorithms, this has led to the discovery of a spectrum of tractability (cf. [6]): starting from slightly sub-exponential 2^{O(√k)} n^{O(1)} time, through single exponential 2^{O(k)} n^{O(1)} time, to double exponential 2^{2^{O(k)}} n^{O(1)} time for FPT algorithms with matching asymptotic lower bounds under ETH (e.g., for the PLANAR VERTEX COVER, VERTEX COVER, and EDGE CLIQUE COVER problems, respectively, each parameterized by the solution size). For XP algorithms, asymptotically tight runtime bounds of the form n^{O(k)} and n^{O(√k)} can be obtained under ETH (e.g., for the CLIQUE problem parameterized by the solution size, and the PLANAR BIDIRECTED STEINER NETWORK problem parameterized by the number of terminals [46], respectively).
Finally, problems that are NP-hard when the given parameter is constant do not even allow XP algorithms unless P=NP (e.g., the GRAPH COLOURING problem where the parameter is the number of colours).
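The gap between XP and FPT runtimes described above can be illustrated on VERTEX COVER parameterized by the solution size k. The following is a minimal sketch, not taken from the cited works: `vc_brute_force` is an exhaustive search over vertex subsets of size at most k (n^{O(k)} time), while `vc_branching` is the textbook bounded search tree with O(2^k · m) time, where m is the number of edges. Both function names and the edge-list input format are assumptions made for this illustration.

```python
from itertools import combinations


def vc_brute_force(edges, k):
    """XP-style exhaustive search: try every vertex subset of size <= k."""
    vertices = sorted({v for e in edges for v in e})
    for size in range(min(k, len(vertices)) + 1):
        for cand in combinations(vertices, size):
            chosen = set(cand)
            if all(u in chosen or v in chosen for u, v in edges):
                return chosen
    return None  # no vertex cover of size <= k exists


def vc_branching(edges, k):
    """FPT bounded search tree: some endpoint of any edge must be chosen."""
    if not edges:
        return set()  # nothing left to cover
    if k <= 0:
        return None  # budget exhausted but edges remain
    u, v = edges[0]
    for w in (u, v):
        # choose w, delete all edges it covers, recurse with budget k - 1
        rest = [e for e in edges if w not in e]
        sub = vc_branching(rest, k - 1)
        if sub is not None:
            return sub | {w}
    return None


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    print(vc_brute_force(cycle, 3), vc_branching(cycle, 3))
```

The recursion in `vc_branching` has depth at most k and two branches per level, so the dependence on n stays linear for fixed k, whereas the exponent of n in `vc_brute_force` grows with k.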
In terms of tight runtime bounds, existing results on parameterized approximation algorithms are few and far between. In particular, most of them show that for a given parameter k one of the following cases applies.

• An approximation is possible in f(k)n^{O(1)} time for some function f. Most current results are only concerned with the existence of an algorithm with this type of runtime, i.e., they do not provide any evidence that the obtained runtime is best possible, or try to optimize it. The only lower bounds known exclude certain types of approximation schemes when a hardness result for the parameterization by the solution size exists. For instance, it is known that if some problem does not admit a 2^{o(k)} n^{O(1)} time algorithm for this parameter k then it also does not admit an EPTAS with runtime 2^{o(1/ε)} n^{O(1)} (cf. [5,8]).

• A certain approximation ratio cannot be obtained in f(k)n^{O(1)} time for any function f. For example, it is known that while a 2-approximation for the STRONGLY CONNECTED STEINER SUBGRAPH problem can be computed in 2^{O(k)} n^{O(1)} time [197], where k is the number of terminals, no (2 − ε)-approximation can be computed in f(k)n^{O(1)} time [46] for any function f, under Gap-ETH (cf. Section 4.4).
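The implication stated in the first case above (a lower bound of 2^{o(k)} n^{O(1)} for the exact problem excludes an EPTAS with runtime 2^{o(1/ε)} n^{O(1)}) follows from a standard argument, sketched below. The sketch assumes a minimization problem with integer objective values in which the parameter k is the target solution value; these assumptions, and the symbols ALG and OPT for the value returned by the EPTAS and the optimum value, are ours and are not spelled out in the text above.

```latex
% Assumptions (ours): minimization problem, integer objective values,
% parameter k = target solution value, EPTAS runtime 2^{o(1/\varepsilon)} n^{O(1)}.
\begin{align*}
\varepsilon &:= \frac{1}{k+1}, \\
\mathrm{ALG} &\le (1+\varepsilon)\,\mathrm{OPT}
   = \mathrm{OPT} + \frac{\mathrm{OPT}}{k+1}
   < \mathrm{OPT} + 1
   \quad \text{whenever } \mathrm{OPT} \le k, \\
\text{runtime} &= 2^{o(1/\varepsilon)}\, n^{O(1)}
   = 2^{o(k+1)}\, n^{O(1)}
   = 2^{o(k)}\, n^{O(1)}.
\end{align*}
```

By integrality, ALG = OPT whenever OPT ≤ k, while ALG ≥ OPT > k otherwise. Comparing ALG with k would therefore decide the exact problem in 2^{o(k)} n^{O(1)} time, contradicting the assumed lower bound.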
Hence, matching lower bounds on the time needed to compute an approximation are missing. For example, is the runtime of 2^{O(k)} n^{O(1)} best possible to compute a 2-approximation for the STRONGLY CONNECTED STEINER SUBGRAPH problem? Could there be a 2^{O(√k)} n^{O(1)} time algorithm to compute a 2-approximation as well? For PASs the exact obtainable runtime is often elusive, even if certain types of approximation schemes can be excluded. For instance, for the STEINER TREE problem parameterized by the number of Steiner vertices in the optimum solution a (1 + ε)-approximation can be computed in 2^{O(k^2/ε^4)} n^{O(1)} time [185]. Is the dependence on k and ε best possible? Could there be a 2^{O(k/ε^4)} n^{O(1)} or 2^{O(k^2/ε)} n^{O(1)} time algorithm as well?
We remark that, for problems for which straightforward algorithms are known to be (essentially) the best possible in FPT time, or for which an improvement over polynomial-time approximation is not possible, tight running time lower bounds are sometimes known in conjunction with tight inapproximability ratios. This includes k-DOMINATING SET (Section 3.1.2), k-CLIQUE (Section 3.2.1) and MAX k-COVERAGE (Section 4.2.2).

Kernel Sizes
The development of compositionality has led to a theory from which lower bounds on the size of the smallest possible kernel of a problem can be derived (under reasonable complexity assumptions). The spectrum (cf. [6]) here reaches from polynomial-sized kernels (e.g., for any q ≥ 3 and ε > 0 the q-SAT problem parameterized by the number of variables n has no O(n^{q−ε})-sized kernel) to exponential-sized kernels (e.g., the STEINER TREE problem parameterized by the number of terminals does not admit any polynomial-sized kernel despite being FPT).
For approximate kernels, only a small number of publications exist, and the few known results fall into two categories:

• A polynomial-sized approximate kernelization scheme (PSAKS) exists, i.e., for any ε > 0 there is a (1 + ε)-approximate kernelization algorithm that computes a (1 + ε)-approximate kernel of size polynomial in the parameter k. For example, the STEINER TREE problem admits a PSAKS for both the parameterization by the number of terminals [18] and by the number of Steiner vertices in the optimum [185], even though neither of these two parameterizations admits a polynomial-sized (exact) kernel.

• A lower bound excluding any approximation factor for polynomial-sized kernels exists. For example, the LONGEST PATH problem parameterized by the maximum path length has no α-approximate polynomial-sized kernel for any α [18], despite being FPT for this parameter [6].
Hence, again, the intermediate cases, for which tight constant or logarithmic approximation factors can be proved for polynomial-sized kernels, are missing. Studying approximate kernelization algorithms, however, is of undeniable importance to the field of parameterized approximation algorithms, as witnessed by the importance of exact kernelization to fixed-parameter tractability.

Completeness in Hardness of Approximation
A final direction we would like to highlight is to obtain more completeness in inapproximability results. Most of the results so far for FPT hardness of approximation either (i) rely on gap hypotheses or (ii) yield a hardness in terms of the W-hierarchy, while the exact version of the problem is known to be complete on an even higher level (e.g., DOMINATING SET is known to be W[1]-hard to approximate but its exact version is W[2]-complete). We have discussed (i) extensively in Section 3.2 and some examples of (ii) in Section 3.1. There are also some examples of (ii) that are not covered here; for instance, Marx [272] showed W[t]-hardness for certain monotone/anti-monotone circuit satisfiability problems, and the exact versions of these problems are known to be complete for higher levels of the W-hierarchy. The situation here is unlike that in the theory of NP-hardness of approximation; there the PCP Theorem [21,22] implies NP-completeness of optimization problems (see footnote 22). Thus, in the parameterized inapproximability arena, the main question is whether we can prove completeness results for hardness of approximation for the aforementioned problems. The two important examples here are: is k-CLIQUE W[1]-hard to approximate, and is k-DOMINATING SET W[2]-hard to approximate? As discussed in Section 3.2, the former is also closely related to resolving PIH.
Finally, we note that, while completeness results are somewhat rare in FPT hardness of approximation, some are known. We give two such examples here. The first is the k-STEINER ORIENTATION problem, discussed in Section 3.1.3; it is W[1]-complete to approximate [41]. The second is the MONOTONE CIRCUIT SATISFIABILITY problem (without depth bound), which was proved to be W[P]-complete by Marx [272]. However, it does not seem clear to us whether these techniques can be applied elsewhere, e.g., for k-CLIQUE.