Sorting by Multi-Cut Rearrangements

Abstract. Let $S$ be a string built on some alphabet $\Sigma$. A multi-cut rearrangement of $S$ is a string $S'$ obtained from $S$ by an operation called $k$-cut rearrangement, which consists in (1) cutting $S$ at a given number $k$ of places, making $S$ the concatenated string $X_1 \cdot X_2 \cdot X_3 \cdots X_k \cdot X_{k+1}$, where $X_1$ and $X_{k+1}$ are possibly empty, and (2) rearranging the $X_i$s so as to obtain $S' = X_{\pi(1)} \cdot X_{\pi(2)} \cdot X_{\pi(3)} \cdots X_{\pi(k+1)}$, $\pi$ being a permutation on $1, 2, \ldots, k+1$ satisfying $\pi(1) = 1$ and $\pi(k+1) = k+1$. Given two strings $S$ and $T$ built on the same multiset of characters from $\Sigma$, the Sorting by Multi-Cut Rearrangements (SMCR) problem asks whether a given number $\ell$ of $k$-cut rearrangements suffices to transform $S$ into $T$. The SMCR problem generalizes several classical genomic rearrangement problems, such as Sorting by Transpositions and Sorting by Block Interchanges. It may also model chromoanagenesis, a recently discovered phenomenon consisting in massive simultaneous rearrangements. In this paper, we study the SMCR problem from an algorithmic complexity viewpoint and provide, depending on the respective values of $k$ and $\ell$, polynomial-time algorithms as well as NP-hardness, FPT algorithms, W[1]-hardness and approximation results, either in the general case or when $S$ and $T$ are permutations.


Introduction
Genome rearrangements refer to large-scale evolutionary events that affect the genome of a species. They include, among others, reversals [1], transpositions [2], and block interchanges [5]; see also [9] for a full description. Compared to small-scale evolutionary events such as insertion, deletion or substitution of single DNA nucleotides, they are considered to be rare and, until recently, were assumed to happen one after the other. In the recent literature, however, a new type of event, called chromoanagenesis, has been shown to occur in genomes [13,12]. The term chromoanagenesis subsumes different types of rearrangements (namely, chromothripsis, chromoanasynthesis and chromoplexy) whose common ground is the following: in a single event, the genome is cut into many blocks, and then rearranged. As stated by Pellestor and Gatinois [12], these are "massive chromosomal rearrangements arising during single chaotic cellular events". Chromoanagenesis, and notably chromothripsis, is suspected to play a role in cancer and congenital diseases [13]. In this paper, we introduce a new model for genome rearrangements that is general enough to encompass most of the previously described genome rearrangements [9] as well as chromoanagenesis. Our goal here is to study its properties in terms of computational complexity.
To appear in Proceedings of the 47th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM '21), Bolzano-Bozen, Italy, January 2021. © Springer.
Notation. Given an alphabet $\Sigma$, we say that two strings $S \in \Sigma^*$ and $T \in \Sigma^*$ are balanced if $S$ and $T$ are built on the same multiset of characters; in other words, each character of $S$ appears in $T$ with the same number of occurrences. We denote by $|S|$ the length of a string $S$. Unless otherwise stated, we assume that $|S| = |T| = n$. We denote by $S_i$, $1 \leq i \leq n$, the $i$-th character of $S$. Given a string $S$ in $\Sigma^*$, we denote by $d$ the maximum number of occurrences of any character of $\Sigma$ in $S$. In the specific case where $d = 1$ (i.e., when $S$ and $T$ are permutations), and for any $0 \leq i \leq n$, we say that there is a breakpoint at position $i$ in $S$ (or, equivalently, that $(S_i, S_{i+1})$ is a breakpoint) if the two consecutive characters $S_i$ and $S_{i+1}$ are not consecutive in $T$. For the specific cases $i = 0$ and $i = n$, we artificially set $S_0 = T_0 = \alpha_0$ and $S_{n+1} = T_{n+1} = \alpha_{n+1}$, where $\alpha_0 \notin \Sigma$ and $\alpha_{n+1} \notin \Sigma$. Thus, there is a breakpoint at position 0 (resp. $n$) in $S$ whenever $S_1 \neq T_1$ (resp. $S_n \neq T_n$). We also denote by $b(S, T)$ the number of breakpoints in $S$ with respect to $T$. If $(S_i, S_{i+1})$ is not a breakpoint, we say that it is an adjacency.
Definition 1. Given a string $S \in \Sigma^*$ and an integer $k$, a $k$-cut rearrangement of $S$ is an operation consisting in the two following steps: (1) cut $S$ at $k$ locations (thus $S$ can be written as the concatenation of $k+1$ strings, i.e., $S = X_1 \cdot X_2 \cdot X_3 \cdots X_{k+1}$, where each $X_i$ is possibly empty, and where a cut occurs between $X_i$ and $X_{i+1}$, $1 \leq i \leq k$) and (2) rearrange (i.e., permute) the $X_i$s so as to obtain $S' = X_{\pi(1)} \cdot X_{\pi(2)} \cdot X_{\pi(3)} \cdots X_{\pi(k+1)}$, $\pi$ being a permutation on the elements $1, 2, \ldots, k+1$ such that $\pi(1) = 1$ and $\pi(k+1) = k+1$. Each of the $X_i$s considered in a given $k$-cut rearrangement will be called a block.
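As an illustration of Definition 1, here is a minimal Python sketch of a single k-cut rearrangement on a string. The function name `k_cut_rearrangement` and its argument conventions (cut positions given as indices between 0 and n, blocks numbered from 0 rather than from 1) are assumptions made for this example only.

```python
def k_cut_rearrangement(s, cuts, perm):
    """Apply one k-cut rearrangement to the string s (illustrative sketch).

    cuts: non-decreasing list of k positions in 0..len(s); a cut at position c
          separates s[:c] from s[c:]. Repeated positions yield empty blocks.
    perm: order in which the k+1 blocks are concatenated, 0-indexed, with
          perm[0] == 0 and perm[-1] == k (first and last blocks stay in place).
    """
    k = len(cuts)
    if list(cuts) != sorted(cuts) or any(c < 0 or c > len(s) for c in cuts):
        raise ValueError("cuts must be sorted positions in 0..len(s)")
    if sorted(perm) != list(range(k + 1)) or perm[0] != 0 or perm[-1] != k:
        raise ValueError("perm must fix the first and last blocks")
    bounds = [0] + list(cuts) + [len(s)]
    blocks = [s[bounds[i]:bounds[i + 1]] for i in range(k + 1)]
    return "".join(blocks[i] for i in perm)


# A 3-cut rearrangement exchanging the two middle blocks "bc" and "d".
print(k_cut_rearrangement("abcde", [1, 3, 4], [0, 2, 1, 3]))
```

With three cuts the only non-trivial choice of `perm` exchanges the two middle blocks, which is a transposition of adjacent blocks.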
Note that, although a $k$-cut rearrangement has been defined as a cut along the string at $k$ locations, it is always possible, if necessary, to perform only $k' \leq k$ cuts at a given step (thus mimicking a $k$-cut rearrangement while actually realizing a $k'$-cut rearrangement) by cutting several times at the same location. The case where $X_1$ (resp. $X_{k+1}$) is empty corresponds to the case where the leftmost (resp. rightmost) block of $S$ is moved to obtain $S'$; otherwise $X_1$ (resp. $X_{k+1}$) remains unmoved in $S'$. Note that, in this model, the blocks $X_i$ can only be permuted: no reversal of an $X_i$ is allowed, and therefore the strings we consider are always unsigned. In this paper, we study the following problem.

Sorting by Multi-Cut Rearrangements (SMCR)
Instance: Two balanced strings $S$ and $T$, two integers $\ell$ and $k$.
Question: Is there a sequence of at most $\ell$ many $k$-cut rearrangements that transforms $S$ into $T$?
For convenience, we may also refer to the SMCR problem with parameters $k$ and $\ell$ as the $(k, \ell)$-SMCR problem. Our goal in this paper is to provide algorithmic results regarding SMCR. The computational complexity of SMCR highly depends on whether we set bounds on $k$ and $\ell$: depending on applications, they can either be fixed constants (in which case algorithms running in, e.g., $O(n^k)$ are acceptable), parameters (unbounded, but far smaller than $n$, in which case algorithms running in $f(k) \cdot \mathrm{poly}(n)$, that is, Fixed-Parameter Tractable (FPT) algorithms [8,7], would be relevant even for fast-growing functions $f$), or parts of the input (i.e., unbounded, in which case only algorithms polynomial in $n$ and $k$ are relevant). Hence, we will consider each of these cases for both $\ell$ and $k$. For this study, we will consider separately the case of strings (i.e., $d > 1$ both in $S$ and $T$) and the case of permutations (i.e., $d = 1$ both in $S$ and $T$), in Sections 2 and 3, respectively.
Basic observations. Both in permutations and strings, the cases $k = 1$ and $k = 2$ are trivial, since they do not allow any block to be moved, and thus we have a Yes-instance iff $S = T$.
Additionally, the SMCR problem naturally generalizes a number of problems that have already been studied in the literature, as described hereafter.
When $k = 3$, each $k$-cut rearrangement is necessarily a transposition of blocks $X_2$ and $X_3$. Thus SMCR in that case is equivalent to the Sorting by Transpositions problem [2], which is known to be NP-hard, even if $S$ and $T$ are permutations [3].
When $k = 4$, each $k$-cut rearrangement allows two blocks among $X_2$, $X_3$ and $X_4$ to be moved, which exactly corresponds to the Sorting by Block Interchange problem. This problem is known to be in P for permutations [5] and NP-hard for strings (an NP-hardness proof for binary strings is given in [6], Theorem 5.7.2).
When $\ell = 1$, the SMCR problem comes down to deciding whether $k$ cuts are sufficient to rearrange $S$ into $T$ in one atomic move (i.e., one $k$-cut rearrangement). In permutations, the problem is trivially solved by counting the number $b(S, T)$ of breakpoints between $S$ and $T$, since we have a Yes-instance iff $b(S, T) \leq k$. In strings, the SMCR problem resembles the Minimum Common String Partition problem [11], as will be discussed in Theorems 3 and 4.
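The breakpoint test for permutations described above can be sketched in a few lines of Python. The names `breakpoints` and `one_step_yes_instance` are illustrative; the two sentinel objects play the role of the extra symbols $\alpha_0$ and $\alpha_{n+1}$ from the notation paragraph.

```python
def breakpoints(S, T):
    """Number b(S, T) of breakpoints of permutation S with respect to T."""
    S, T = list(S), list(T)
    if sorted(S) != sorted(T) or len(set(S)) != len(S):
        raise ValueError("S and T must be permutations of the same symbols")
    start, end = object(), object()          # sentinels alpha_0 and alpha_{n+1}
    s = [start] + S + [end]
    t = [start] + T + [end]
    adjacent_in_T = set(zip(t, t[1:]))       # ordered pairs consecutive in T
    return sum(1 for pair in zip(s, s[1:]) if pair not in adjacent_in_T)


def one_step_yes_instance(S, T, k):
    """Decide (k, 1)-SMCR on permutations via the test b(S, T) <= k."""
    return breakpoints(S, T) <= k


print(breakpoints("afedcbg", "abcdefg"))
```

Strings with distinct characters are used here as a convenient encoding of permutations; any sequence of distinct hashable symbols would work equally well.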
When $\ell$ and $k$ are constant, SMCR is trivially polynomial-time solvable, since a brute-force algorithm, exhaustively testing all possible $k$-cut rearrangements at each of the $\ell$ authorized moves, has a running time of $O(n^{k\ell+1})$, the additional factor $n$ being needed to verify that the result corresponds to string $T$.
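The brute-force idea can be sketched as follows in Python. This is only an illustrative exhaustive search for very small instances, not an optimized implementation: the names `one_step_neighbors` and `smcr_brute_force` are assumptions of this sketch, and cut positions may repeat so that fewer than $k$ distinct cuts are also covered.

```python
from itertools import combinations_with_replacement, permutations


def one_step_neighbors(s, k):
    """All strings obtainable from s by one k-cut rearrangement (k >= 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(s)
    result = set()
    for cuts in combinations_with_replacement(range(n + 1), k):
        bounds = (0,) + cuts + (n,)
        blocks = [s[bounds[i]:bounds[i + 1]] for i in range(k + 1)]
        # The first and last blocks stay in place; the others are permuted.
        for middle in permutations(range(1, k)):
            order = (0,) + middle + (k,)
            result.add("".join(blocks[i] for i in order))
    return result


def smcr_brute_force(S, T, k, ell):
    """Exhaustively decide (k, ell)-SMCR; practical only for tiny inputs."""
    if sorted(S) != sorted(T):
        return False                  # unbalanced strings: No-instance
    frontier = {S}
    for _ in range(ell):
        if T in frontier:
            return True
        frontier = set().union(*(one_step_neighbors(s, k) for s in frontier))
    return T in frontier


print(smcr_brute_force("afedcbg", "abcdefg", 6, 1))
print(smcr_brute_force("afedcbg", "abcdefg", 3, 2))
```

Since the identity rearrangement is always available, the set of reachable strings only grows from one round to the next, so checking membership of $T$ after each round is enough.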
It is also natural to wonder whether $(k, \ell)$-SMCR and $(k\ell, 1)$-SMCR are equivalent. It is easy to see that a Yes-instance for $(k, \ell)$-SMCR is also a Yes-instance for $(k\ell, 1)$-SMCR: it suffices to aggregate all cuts from the $(k, \ell)$-SMCR solution (of which there are at most $k\ell$) and rearrange accordingly. However, the converse (i.e., from $(k\ell, 1)$-SMCR to $(k, \ell)$-SMCR) is not always true. For example, take $S = afedcbg$, $T = abcdefg$, $k = 3$, and $\ell = 2$: this is a Yes-instance for $(6, 1)$-SMCR, whereas it is a No-instance for $(3, 2)$-SMCR. Indeed, in this instance the number $b(S, T)$ of breakpoints is equal to 6. Thus, in the former case, the 6 following cuts (symbolized as vertical segments) in $S = a|f|e|d|c|b|g$ suffice to obtain $T$ after a single 6-cut rearrangement. In the latter case, every 3-cut rearrangement is a transposition, and in this instance no transposition can decrease $b(S, T)$ by 3, so at least 4 breakpoints remain after the first rearrangement and a second one cannot remove them all. Thus at least three 3-cut rearrangements are necessary to transform $S$ into $T$.
[Table 1. Summary of results for SMCR in strings. Hardness entries: $\ell = 1$, even when $d = 2$ (Thm 3); any fixed $\ell \geq 1$ (Thm 2); any $k \geq 5$, even in $k$-ary strings (Thm 1); $k = 3, 4$, even in binary strings [3,6]. Open entry: "?".]

Sorting by Multi-Cut Rearrangements in Strings
In this section, we provide algorithmic results concerning the Sorting by Multi-Cut Rearrangements problem, in the general case where S and T are strings.
Our results are summarized in Table 1.
As mentioned in the previous section, we know that SMCR is NP-hard in binary strings for $k = 3, 4$. In the following theorem, we extend this result to any value of $k$, albeit for strings over larger alphabets.
Theorem 1. SMCR is NP-hard for any fixed k ≥ 5, even in k-ary strings.
Proof. We reduce from the NP-hard 3-Partition problem, in which the input is a set $A$ of integers and an integer $B$, and the question is whether $A$ can be partitioned into triples such that the integers of each triple sum up to $B$. Given an instance $(A, B)$ of 3-Partition with $A = \{a_1, a_2, \ldots, a_{3m}\}$ and $mB = \sum_{i=1}^{3m} a_i$, we construct an instance of SMCR for any fixed $k \geq 5$ as follows. For ease of presentation, we assume that each $a_i$ is a multiple of $4m$ and that $\frac{B}{4} < a_i < \frac{B}{2}$, so that we have the following property: if for some subset $I$ of $\{1, \ldots, 3m\}$ and some $\delta$ with $0 \leq \delta \leq 4m$ we have $\sum_{i \in I} a_i + \delta = B + 4$, then $\sum_{i \in I} a_i = B$, $\delta = 4$, and $|I| = 3$. We use a size-$k$ alphabet $\{0, 1, \ldots, k-1\}$; we denote by $X$ the string $(k-1) \cdot (k-2) \cdots 3$, and by $\overline{X}$ the reverse of $X$, i.e., $3 \cdot 4 \cdots (k-1)$ (note that $X$ and $\overline{X}$ have length $k - 3 \geq 2$). We define $S := 1\,0^{a_1}\,1\,0^{a_2} \cdots 1\,0^{a_{3m}}\,1\,(2\,0\,X\,0\,X\,0\,X\,0)^m\,2$ and $T := (1\,\overline{X})^{3m}\,1\,(2\,0^{B+4})^m\,2$, and set $\ell = 3m$. This completes the construction. Before proving its correctness, we group the adjacencies of the strings $S$ and $T$ based on whether the adjacencies of the two involved characters are in excess in $S$, in $T$, or equal in both strings.
There are no further adjacencies in $S$ or $T$. To show the correctness of the reduction, we show that $(A, B)$ is a Yes-instance of 3-Partition iff there exists a sequence of at most $\ell = 3m$ many $k$-cut rearrangements transforming $S$ into $T$.
(⇒) Pick a solution of 3-Partition. For each triple $(a_i, a_j, a_p)$ of the solution, choose a unique substring $2\,0\,X\,0\,X\,0\,X\,0$ of $S$ and perform the following three $k$-cut rearrangements. First, cut $S$ around $0^{a_i}$ and around the first copy of $X$ in the chosen substring, and cut at every position inside $X$. Observe that the number of cuts is exactly $k$. Now reverse $X$ into $\overline{X}$ and exchange $0^{a_i}$ and $\overline{X}$. Perform a similar $k$-cut rearrangement with $a_j$ and the second occurrence of $X$, and with $a_p$ and the third occurrence of $X$ in the selected substring. The selected substring is transformed into $2\,0\,0^{a_i}\,0\,0^{a_j}\,0\,0^{a_p}\,0 = 2\,0^{B+4}$ and, since each string $0^{a_i}$ is replaced by $\overline{X}$, the first part of the string is $(1\,\overline{X})^{3m}$. Hence, the string obtained by the $3m$ many $k$-cut rearrangements described above is $T$.
(⇐) There are altogether $6m + (k-2)3m = 3km$ adjacencies in Group 1 which are in excess in $S$, and $3km$ adjacencies in Group 2 which are in excess in $T$. Since $\ell = 3m$, each $k$-cut rearrangement cuts $k$ adjacencies in Group 1 (and no adjacency in Group 2 or 3). In particular, no $00$ adjacency may be cut in a feasible solution, so each substring $0^{B+4}$ in $T$ is obtained by concatenating a number of strips of the form $0^{a_i}$ as well as some number $\delta$ of singleton 0s. Since $S$ has $4m$ of these singleton 0s, we have $0 \leq \delta \leq 4m$. By the constraint on the values of $a_i$, each substring $0^{B+4}$ in $T$ contains four singletons from $S$ and three substrings $0^{a_i}$ of $S$ whose lengths sum to $B$. Thus, the $m$ substrings $0^{B+4}$ in $T$ correspond to a partition of $A$ into $m$ sets of three integers whose values sum up to $B$.
Theorem 2. SMCR is NP-hard for any fixed $\ell$.
Proof. The reduction being very similar to the one of Theorem 1, we only highlight the differences needed to have a fixed $\ell$ instead of a fixed $k$. First assume that $m$ is a multiple of $\ell$ (add up to $\ell$ triples of dummy elements otherwise), and let $k = \frac{15m}{\ell}$. Note that $k$ is a multiple of 5. The reduction is the same as above with a size-5 alphabet. In other words, we have $X = 43$ and $\overline{X} = 34$.
In the forward direction, use the described scenario using 5-cut rearrangements, but combine a series of $k/5$ such rearrangements into a single $k$-cut rearrangement, as described at the end of Section 1. This gives a total of $\frac{3m}{k/5} = \ell$ many $k$-cut rearrangements sorting $S$ into $T$. In the reverse direction, the same breakpoint count holds, namely $3k'm$ adjacencies (where $k' = 5$ is the alphabet size) need to be broken using $\ell$ $k$-cut rearrangements, with $k\ell = 3k'm$. So again no $00$ adjacency may be broken, and by the same argument, we obtain a valid 3-partition of $A$.
The previous theorem shows NP-hardness of SMCR for any fixed $\ell$. However, a stronger result can be obtained in the specific case $\ell = 1$.
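To make the construction used in the proofs of Theorems 1 and 2 concrete, the following Python sketch builds the strings $S$ and $T$ from a 3-Partition instance and counts the adjacencies that are in excess in one string with respect to the other. Symbols are represented as integers, and the names `build_smcr_instance` and `excess_adjacencies` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter


def build_smcr_instance(a, k):
    """Build (S, T, ell) from the integers a of a 3-Partition instance.

    Assumes len(a) == 3m and sum(a) == m * B, as in the reduction.
    """
    if k < 5 or len(a) % 3 != 0 or sum(a) % (len(a) // 3) != 0:
        raise ValueError("expected k >= 5 and a valid 3-Partition instance")
    m = len(a) // 3
    B = sum(a) // m
    X = list(range(k - 1, 2, -1))          # k-1, k-2, ..., 3
    X_rev = X[::-1]                        # 3, 4, ..., k-1
    S = []
    for a_i in a:
        S += [1] + [0] * a_i
    S += [1]
    for _ in range(m):
        S += [2, 0] + X + [0] + X + [0] + X + [0]
    S += [2]
    T = []
    for _ in range(3 * m):
        T += [1] + X_rev
    T += [1]
    for _ in range(m):
        T += [2] + [0] * (B + 4)
    T += [2]
    return S, T, 3 * m


def excess_adjacencies(S, T):
    """Number of ordered adjacent pairs occurring more often in S than in T."""
    in_S = Counter(zip(S, S[1:]))
    in_T = Counter(zip(T, T[1:]))
    return sum((in_S - in_T).values())     # Counter subtraction keeps positives


S, T, ell = build_smcr_instance([8, 8, 8], k=5)
print(sorted(S) == sorted(T), ell, excess_adjacencies(S, T))
```

On this small instance ($m = 1$, $B = 24$, $k = 5$) the two strings are balanced and the excess count matches the value $3km$ used in the reverse direction of the proof.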
Proof. The proof is obtained by reduction from the Minimum Common String Partition (MCSP) problem, which has been proved to be NP-hard in strings, even when $d = 2$ [11]. Recall that the decision version of MCSP asks, given two balanced strings $S$ and $T$, and an integer $p$, whether $S$ can be written as the concatenation of $p$ blocks $S = X_1 \cdot X_2 \cdots X_{p-1} \cdot X_p$ and $T$ can be written as $T = X_{\pi(1)} \cdot X_{\pi(2)} \cdots X_{\pi(p)}$, where $\pi$ is a permutation of $1, 2, \ldots, p$. Note that here we may have $\pi(1) \neq 1$ and/or $\pi(p) \neq p$.
Given an instance $(S, T, p)$ of MCSP, we build an instance $(S', T', k, \ell)$ of SMCR by setting $S' = x \cdot S$, $T' = T \cdot x$ (with $x \notin \Sigma$), $k = p + 2$ and $\ell = 1$. Clearly, if $(S, T, p)$ is a Yes-instance for MCSP, then $(S', T', p + 2, 1)$ is a Yes-instance for SMCR: the MCSP solution uses $p - 1$ cuts, to which we add one before $x$, one after $x$, and one after $S_n$ for solving SMCR. Conversely, if $(S', T', p + 2, 1)$ is a Yes-instance for SMCR, then, since $x$ occurs only once in $S'$ and $S'_1 = x$, 2 cuts are used to "isolate" $x$ from $S'$. Besides, since $T'$ ends with $x$, there must exist a cut after the last character of $S'$. Hence, since $S' = x \cdot S$, at most $k - 3 = p - 1$ cuts are used strictly within $S$, which in turn means that $S$ has been decomposed into $p$ blocks, which can be rearranged so as to obtain $T$ since $\ell = 1$. Thus, $(S, T, p)$ is a Yes-instance for MCSP.
Note that MCSP has been proved to be in FPT with respect to the size of the solution [4]. It can be seen that this result can be adapted for the SMCR problem in the case $\ell = 1$.
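The instance transformation used in this reduction is short enough to be written out. The Python sketch below assumes strings over single-character symbols and uses `"#"` as the fresh character $x$; both the function name `mcsp_to_smcr` and that default are illustrative choices.

```python
def mcsp_to_smcr(S, T, p, x="#"):
    """Map an MCSP instance (S, T, p) to an SMCR instance (S', T', k, ell)."""
    if len(x) != 1 or x in S or x in T:
        raise ValueError("x must be a single character occurring in neither string")
    if sorted(S) != sorted(T):
        raise ValueError("S and T must be balanced")
    S_prime = x + S          # x is prepended to S ...
    T_prime = T + x          # ... and appended to T, forcing three extra cuts
    return S_prime, T_prime, p + 2, 1


print(mcsp_to_smcr("abab", "baba", 3))
```

The three additional cuts are the ones before $x$, after $x$, and after the last character of $S'$, which is why the budget grows from $p - 1$ cuts to $k = p + 2$.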

Sorting by Multi-Cut Rearrangements in Permutations
In this section, we provide algorithmic results concerning the Sorting by Multi-Cut Rearrangements problem, in the specific case where $S$ and $T$ are permutations. Our results are summarized in Table 2.
Theorem 5. SMCR in permutations is FPT with respect to parameter $\ell + k$.
Proof. We obtain the fixed-parameter tractability result by using the following reduction rule: if there is a common adjacency $(a, b)$ in $S$ and $T$, then remove $b$ from $S$ and $T$. Before we show the correctness, observe that exhaustive application of the rule indeed gives the desired result. Any Yes-instance that is reduced exhaustively with respect to the above rule has $O(k\ell)$ letters: we must cut between every adjacency in $S$, since no pair of consecutive letters of $S$ is consecutive in $T$ anymore. Overall, we may create at most $2k\ell$ cuts via $\ell$ many $k$-cut rearrangements. Hence, if $S$ has more than $2k\ell$ adjacencies, then $(S, T)$ is a No-instance. Thus, after applying the rule exhaustively, we either know that the instance is a No-instance or we may solve it in $f(k, \ell)$ time by using a brute-force algorithm. Thus it remains to show correctness of the rule. Consider an instance consisting of the permutations $S$ and $T$ to which the rule is applied, and let $S'$ and $T'$ denote the resulting instance. We show that $(S, T)$ is a Yes-instance if and only if $(S', T')$ is a Yes-instance.
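The reduction rule can be sketched in Python as follows. This is a simple quadratic-time illustration rather than an optimized kernelization; the name `contract_common_adjacencies` is an assumption of the sketch, and only adjacencies between two actual letters are contracted (the sentinels at both ends are ignored here).

```python
def contract_common_adjacencies(S, T):
    """Exhaustively apply the rule: if (a, b) is consecutive in both S and T,
    remove b from both permutations."""
    S, T = list(S), list(T)
    if sorted(S) != sorted(T) or len(set(S)) != len(S):
        raise ValueError("S and T must be permutations of the same symbols")
    while True:
        adjacent_in_T = set(zip(T, T[1:]))
        common = next(
            (pair for pair in zip(S, S[1:]) if pair in adjacent_in_T), None
        )
        if common is None:
            return S, T              # no common adjacency left: reduced instance
        _, b = common
        S.remove(b)
        T.remove(b)


print(contract_common_adjacencies([1, 2, 3, 5, 4], [1, 2, 3, 4, 5]))
```

After the rule has been applied exhaustively, every pair of consecutive letters of the reduced $S$ is a breakpoint, which is what the size bound in the proof relies on.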
(⇒) If $(S, T)$ is a Yes-instance, then there is a sequence of $\ell + 1$ permutations $(S = S_1, S_2, \ldots, S_{\ell+1} = T)$ such that $S_{i+1}$ can be obtained from $S_i$ via one $k$-cut rearrangement. Removing $b$ from each $S_i$ gives a sequence $(S' = S'_1, S'_2, \ldots, S'_{\ell+1} = T')$ such that $S'_{i+1}$ can be obtained from $S'_i$ via one $k$-cut rearrangement.
(⇐) If $(S', T')$ is a Yes-instance, then there is a sequence of $\ell + 1$ permutations $(S' = S'_1, S'_2, \ldots, S'_{\ell+1} = T')$ such that $S'_{i+1}$ can be obtained from $S'_i$ via one $k$-cut rearrangement. Replacing $a$ by $ab$ in each permutation $S'_i$ gives a sequence $(S = S_1, S_2, \ldots, S_{\ell+1} = T)$ such that $S_{i+1}$ can be obtained from $S_i$ via one $k$-cut rearrangement.
Theorem 6. SMCR in permutations is W[1]-hard parameterized by $\ell$.
Proof. The proof is by reduction from Unary Bin Packing, whose instance is a multiset $A = \{a_1, a_2, \ldots, a_n\}$ of integers encoded in unary, and two integers $b$ and $C$. The goal is to decide whether one can partition $A$ into $b$ multisets $A_1, \ldots, A_b$ such that $\sum_{a_j \in A_i} a_j \leq C$ for each $1 \leq i \leq b$. This problem has been shown to be W[1]-hard [10] with respect to the number $b$ of multisets, even when the sum of the elements $\sum_{i=1}^{n} a_i$ is equal to $bC$. Take an instance $I$ of Unary Bin Packing such that $\sum_{i=1}^{n} a_i = bC$. From $I$, we construct, in polynomial time, the following instance $I^*$ of SMCR. We first define $S$ as the following permutation of $[bC + 1]$: … where each $X_i$ is the length-$(a_i - 1)$ decreasing sequence over … We set $T$ to be the identity over the same alphabet $[bC + 1]$.
An element at position $i$ in $S$ is called an anchor if $S_i = i$ (in bold above). Since we want to transform $S$ into the identity permutation, the anchors correspond to fixed points that are already well located. For any $1 \leq i \leq n$, the reversed sequence $X_i$ is delimited by two anchors. We finally set $\ell = b$ and $k = C$. Each $X_i$ has exactly $a_i$ breakpoints: two at its extremities with the anchors and $a_i - 2$ internal ones. Since $\sum_{i=1}^{n} a_i = bC$, it can be seen that $S$ contains exactly $k\ell$ breakpoints. Since a $k$-cut rearrangement can remove at most $k$ breakpoints, at least $\ell$ such rearrangements are necessary to sort $S$. We now show that $I$ is a Yes-instance for Unary Bin Packing iff $I^*$ is a Yes-instance for SMCR.
(⇒) Suppose $I$ is a Yes-instance for Unary Bin Packing. Thus there exists a partition $A_1, \ldots, A_b$ of the multiset $A$. To sort $S$, the $k$-cut rearrangements we apply consist in reversing the $X_i$s. Note that in order to reverse a complete $X_i$ corresponding to a given $a_i$, $1 \leq i \leq n$, we need exactly $a_i$ cuts, e.g. to transform $\sigma = \ldots p + 1 | p$ … For any $1 \leq i \leq b$, we have $\sum_{a_j \in A_i} a_j = C$ and, since $C = k$, we can define a $k$-cut rearrangement that consists in reversing the $X_{j_1}, X_{j_2}, \ldots, X_{j_p}$ corresponding to elements $a_{j_1}, a_{j_2}, \ldots, a_{j_p}$ of $A_i$. Since there are $b$ such multisets and since $b = \ell$, $\ell$ such $k$-cut rearrangements are sufficient to sort $S$.
(⇐) Suppose $I^*$ is a Yes-instance for SMCR. Since $b(S, T) = k\ell$, using $\ell$ $k$-cut rearrangements to sort $S$ means that each rearrangement removes exactly $k$ breakpoints. This is only feasible if each $X_i$ is reversed at once (i.e., during a single $k$-cut rearrangement) using exactly $a_i$ cuts. Indeed, if we cut at $c_i < a_i$ places in $X_i$, we will be able to fix strictly fewer than $c_i$ breakpoints, and so the $k$-cut rearrangement in which the $c_i$ cuts take place will not remove $k$ breakpoints as expected. By following the moves during the sorting of $S$, it suffices to see which $X_i$s are reversed within the same $k$-cut rearrangement. In that case, the sum of the corresponding $a_i$s is equal to $k = C$ and, using $\ell = b$, such multisets of $a_i$s provide a solution to Unary Bin Packing.
Theorem 7. For any $k \geq 5$, SMCR in permutations is NP-hard.
This hardness proof is by reduction from Sorting By Transpositions on 3-cyclic permutations [3]. Intuitively, in such permutations, it is straightforward to identify triples of breakpoints, called 3-cycles, that should be solved together in a transposition; the difficulty lies in selecting a correct order in which those 3-cycles should be solved. Our approach consists in extending these 3-cycles into $k$-cycles, such that any $k$-cut rearrangement solving the original cycle must solve all $k$ breakpoints together, and still performs a simple transposition on the rest of the sequence (to this end, $k - 3$ dummy elements are created in order to consume the extra blocks in $k$-cut rearrangements). We first recall the necessary definitions and properties for breakpoints and cyclic permutations, then show how to extend a single cycle by only two or three elements, and finally successively apply this method to extend all cycles to any size $k \geq 5$.
Breakpoints and Cycle Graph. For a permutation $S$ of length $n$, we assume the alphabet of $S$ is $\{1, \ldots, n\}$. We further write $S_0 = 0$ and $S_{n+1} = n + 1$. For a rearrangement $r$ transforming $S$ into $S'$, we write $r(S) = S'$ and $r(S, T) = (S', T)$. The cycle graph $C(S, T)$ of strings $S$ and $T$ is the graph over $n + 1$ vertices $\{0, \ldots, n\}$ with arcs $T_j \to S_i$ if $T_{j+1} = S_{i+1}$. Every vertex has in-degree and out-degree 1, so the graph is a disjoint union of cycles. Self-loops are called trivial cycles (when seen as a cycle) or adjacencies (when seen as an arc); other arcs are breakpoints. An element (or vertex) $x$ is an adjacency (resp. breakpoint) according to its outgoing arc (we use transparently the bijection between a vertex and its outgoing arc). A $k$-cycle is a cycle of length $k$. The next breakpoint of breakpoint $x \to y$ in $C(S, T)$ is $y$ (or equivalently, the outgoing arc of $y$). We write $C_x(S, T)$ for the cycle of $C(S, T)$ containing element $x$. A cycle graph is $k$-cyclic (and, by extension, so is a pair of sequences generating this cycle graph) if it contains only adjacencies and $k$-cycles. A rearrangement $r$ applied to a permutation $S$ cuts an element $x$, $0 \leq x \leq n$, if it cuts between $x = S_i$ and $S_{i+1}$. Furthermore, it solves breakpoint $x$ if $r$ cuts $x$ and $x$ is an adjacency in $r(S)$. It solves a cycle if it solves all breakpoints in it. We write $d_b(S, T)$ for the number of breakpoints of $C(S, T)$. A $k$-cut rearrangement is efficient if it solves $k$ breakpoints. A pair $(S, T)$ is $k$-efficiently sortable if there exists a sequence of efficient $k$-cut rearrangements transforming $S$ into $T$. The following is a trivial generalization of a well-known lower bound for the transposition distance.
Proposition 1. A $k$-cut rearrangement may not solve more than $k$ breakpoints, so $S$ needs at least $\frac{d_b(S,T)}{k}$ $k$-cut rearrangements to be transformed into $T$. Furthermore, the bound is reached if and only if $(S, T)$ is $k$-efficiently sortable.
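The cycle graph and the lower bound of Proposition 1 can be computed directly from the definitions. The Python sketch below is illustrative: the names `cycle_graph`, `cycles` and `lower_bound` are assumptions, permutations are lists over $\{1, \ldots, n\}$, and the bound is rounded up since the number of rearrangements is an integer.

```python
def cycle_graph(S, T):
    """Return the arcs of C(S, T) as a dict {tail: head} over vertices 0..n."""
    n = len(S)
    if sorted(S) != list(range(1, n + 1)) or sorted(T) != list(range(1, n + 1)):
        raise ValueError("S and T must be permutations of 1..n")
    s = [0] + list(S) + [n + 1]              # S_0 = 0 and S_{n+1} = n + 1
    t = [0] + list(T) + [n + 1]
    position_in_s = {value: i for i, value in enumerate(s)}
    arcs = {}
    for j in range(n + 1):
        i = position_in_s[t[j + 1]] - 1      # index i with S_{i+1} = T_{j+1}
        arcs[t[j]] = s[i]                    # arc T_j -> S_i
    return arcs


def cycles(arcs):
    """Decompose the arcs into cycles, each listed from its minimum vertex."""
    seen, result = set(), []
    for v in sorted(arcs):
        cycle = []
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = arcs[v]
        if cycle:
            result.append(cycle)
    return result


def lower_bound(S, T, k):
    """Smallest integer at least d_b(S, T) / k, following Proposition 1."""
    d_b = sum(1 for x, y in cycle_graph(S, T).items() if x != y)
    return -(-d_b // k)                      # ceiling division


print(cycles(cycle_graph([3, 2, 1], [1, 2, 3])), lower_bound([3, 2, 1], [1, 2, 3], 3))
```

Self-loops in the returned dictionary are the adjacencies, and all other arcs are the breakpoints counted by $d_b(S, T)$.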
Proposition 2. If r solves a breakpoint, it cuts the next breakpoint in the cycle graph.
Proof. Let $x \to y$ be an arc of the cycle graph, and let $x'$ be the successor of $x$ in $T$ as well as the successor of $y$ in $S$. If $r$ solves $x$, then $r$ joins a block ending in $x$ with a block starting in $x'$, so $x'$ is the first element of some block of $r$. Thus, $y$ is the last element of some block of $r$, and $r$ cuts the breakpoint $y$ in $C(S, T)$.
Proposition 3. If $r$ is efficient, it solves a cycle iff it solves any breakpoint in it. Furthermore $r$ solves all breakpoints in a union of cycles of total size $k$.
Proof. If $r$ is efficient, then it solves all breakpoints that it cuts (since it may not solve a breakpoint without cutting it, and it solves and cuts $k$ breakpoints). By Proposition 2, if $r$ solves a breakpoint in a cycle, then it must solve all subsequent arcs in the same cycle. Hence, $r$ either solves all breakpoints of a cycle or none at all. The size constraint follows from the fact that all cycles are disjoint.
Cycle $C_1$ is tied to another cycle $C_2$ through the pair of breakpoints $(x, y)$ if $x$ is in $C_1$, $y$ is in $C_2$, the permutation $S$ has $S_i = y$ and $S_{i+1} = x$ for some $i$, and $T$ has $T_j = x$ and $T_{j+1} = y$ for some $j$. A breakpoint is without ties if no cycle is tied to the cycle containing it.
Proof. Let $r$ be an efficient rearrangement solving $C_1$ and, in particular, $x$. Then $r$ must place $y$ after $x$ in $r(S)$, although $y$ is before $x$ in $S$, so $r$ must have a cut somewhere between $y$ and $x$, i.e., just after $y$. So $r$ cuts breakpoint $y$, and solves cycle $C_2$.
One-cycle Extensions. Let $(S, T)$ be a pair of permutations. Let $x$ be a vertex of $C(S, T)$ with the following properties (we say that $x$ is safe): $x$ is either an adjacency or a breakpoint without ties in a cycle of length $k_x \geq 3$, and all 2-cycles in $C(S, T)$ are tied. The $p$-extension of $(S, T)$ on $x$, with $p \in \{2, 3\}$, denoted $\phi^p_x(S, T)$, is the pair $(S', T')$ such that: …
Lemma 1. A $p$-extension on $x$ has the following effects on the cycle graph:
- If $x$ is an adjacency, it adds $p$ trivial cycles.
- If $x$ is a breakpoint and $p = 2$, it adds $n + 1$ and $n + 2$ to the cycle containing $x$.
- If $x$ is a breakpoint and $p = 3$, it adds $n + 2$ to the cycle containing $x$ and a 2-cycle $(n + 1, n + 3)$ tied to the one containing $x$.
Other arcs and tied cycles are unchanged.
Proof. If $x$ is an adjacency, the $p$-extension inserts elements $n + 1$ to $n + p$ in both strings in the same order after $x$, and they are followed by the same element in both strings since $x$ is an adjacency, so only trivial cycles are added. Assume now that $x$ is a breakpoint. Consider first an arc $T_j \to S_i$ with $T_j \neq x$ in $C(S, T)$. Since no element is inserted after $T_j$ or $S_i$, $T_j \to S_i$ also appears in $C(S', T')$ (the case $i = j = n$ is particular, as $n + 1$ is explicitly introduced in both sequences, but it also yields the arc $T'_j \to S'_j$ in $C(S', T')$). If a cycle is tied to another one through a pair $(x, y)$ in $S$ and $(y, x)$ in $T$, these factors cannot be broken by the $p$-extension (since $x$ is safe, no cycle can be tied to $C_x(S, T)$), so it is still tied after the extension. Similarly, a non-tied cycle cannot become tied because of the extension.
It remains to describe arcs going out from $\{x, n + 1, \ldots, n + p\}$. Let $y$ be the head of the outgoing arc from $x$.
For $j$ such that $T'_j = x$, we have $T'_{j+1} = n + p = S'_{n+p}$, so there exists an arc $x \to S'_{n+p-1} = n + p - 1$ in $C(S', T')$ (note in particular that $y$ no longer has its incoming arc $x \to y$).
For $j$ such that $T'_j = n + 1$, we have $T'_{j+1} = n + p + 1 = S'_{n+p+1}$, so there exists an arc $n + 1 \to S'_{n+p} = n + p$ in $C(S', T')$.
At this point, the out-going arcs for all vertices except n + 2 have been described, as well as in-coming arcs for all vertices except y, so the last remaining arc is n + 2 → y.
Overall, for $p = 2$, arc $x \to y$ is replaced with the path $x \to n + 1 \to n + 2 \to y$. For $p = 3$, arc $x \to y$ is replaced with $x \to n + 2 \to y$ and a 2-cycle $n + 1 \leftrightarrow n + 3$ is created. Note that this 2-cycle is tied to $C_x(S', T')$ through $(n + 2, n + 3)$.
We now show how efficient rearrangements can be adapted through extensions. Let $r$ be a $k$-cut rearrangement of $(S, T)$. We write $r' = \psi^p_x(r)$ for the $k'$-cut rearrangement of $(S', T') = \phi^p_x(S, T)$ defined as follows:
- If $r$ does not cut $x$, then $k' = k$, $r'$ cuts the same elements as $r$, and rearranges the blocks in the same order.
- If $r$ cuts $x$, then $k' = k + p$, $r'$ cuts the same elements as $r$ as well as $n + 1$, $n + 2$ and $n + 3$ (when $p = 3$), and rearranges the blocks in the same way as $r$, with elements $n + 3$ (when $p = 3$) and $n + 2$ inserted after $x$.
The following two lemmas show how efficient rearrangements of $(S, T)$ and those of $\phi^p_x(S, T)$ are related through $\psi^p_x$.
Lemma 2. If $r$ is an efficient $k$-cut rearrangement of $(S, T)$, then $r' = \psi^p_x(r)$ is an efficient $k'$-cut rearrangement of $(S', T') = \phi^p_x(S, T)$. Furthermore $r'(S', T') = \phi^p_x(r(S, T))$.
Proof. If $r$ does not cut $x$, then $k' = k$ and $r'$ solves in $(S', T')$ exactly the same breakpoints as $r$, so it is efficient. Furthermore, all elements in $r'(S', T')$ and $\phi^p_x(r(S, T))$ are in the same order as in $r(S, T)$, except for $n + 1, \ldots, n + p$ which are inserted, in both cases, at the end of $S'$ and in $T'$ as in $T'$ (since $r$ and $r'$ do not edit the second string).
If $r$ cuts $x$, $r'$ furthermore solves breakpoints $n + 1, \ldots, n + p$, since it rearranges these elements in the same order as in $T'$. So it is an efficient rearrangement as well. Finally, all elements in $r'(S', T')$ and $\phi^p_x(r(S, T))$ are in the same order as in $r(S, T)$, except for $n + p, \ldots, n + 2$ (which are inserted after $x$ in both strings) and $n + 1$ (which is inserted as a last element).
Lemma 3. If $r'$ is an efficient $k'$-cut rearrangement of $(S', T') = \phi^p_x(S, T)$ with $k' \in \{k_x, k_x + p\}$, then there exists an efficient $k$-cut rearrangement $r$ of $(S, T)$ such that $r' = \psi^p_x(r)$, where $k = k' - p = k_x$ if $r'$ cuts $x$ and $k = k'$ otherwise. Furthermore, $r'(S', T') = \phi^p_x(r(S, T))$.
Proof. We build $r$ from $r'$ using the converse operations of Lemma 2: mimicking the cuts and reordering of $r'$, but ignoring cuts after $n + 1, \ldots, n + p$ if $r'$ cuts $x$.
The relation between $k$ and $k'$ and the efficiency of $r$ follow from the fact that $r'$ solves either all of $x, n + 1, \ldots, n + p$, or none at all, as proven in the claim below. The 'furthermore' part follows from Lemma 2, applied to $r$.
Proof. For $p = 2$, this is a direct application of Proposition 3 since elements $x$, $n + 1$ and $n + 2$ are in the same cycle of $C(S', T')$. For $p = 3$, by Lemma 1, $C'_x = C_x(S', T')$ is a $(k_x + 1)$-cycle containing $x$ and $n + 2$, and $C(S', T')$ also contains a cycle denoted $C'_y$ with elements $n + 1$ and $n + 3$. By Proposition 3, $r'$ solves any element in $C'_x$ (resp. $C'_y$) iff it solves all elements in the same cycle (in particular, $k' \geq k_x + 1$ if $r'$ cuts $x$, so $k' = k + p$). Furthermore $C'_y$ is tied to $C'_x$, so if $r'$ solves $C'_y$ it must also solve $C'_x$ (by Proposition 4). It remains to check the last direction: if $r'$ solves $C'_x$, then it also solves $C'_y$. Indeed, $C'_x$ is a $(k_x + 1)$-cycle and $r'$ solves a total of $k_x + 3$ breakpoints, so it must also solve some 2-cycle $C''_y$. Aiming at a contradiction, assume that $C''_y \neq C'_y$. Then $C''_y$ is already a 2-cycle of $C(S, T)$, and it is tied to some other cycle $C''_x$ (both in $C(S, T)$ and $C(S', T')$), so $r'$ also solves $C''_x$. Since $C''_x$ may not be equal to $C'_x$ ($x$ was chosen without ties), $r'$ solves at least …
Extending all cycles. We use the natural order over integers as an arbitrary total order over the nodes. The representative of a cycle is its minimum node. We assume $(S, T)$ to be $k$-cyclic for some $k$. A sample for $(S, T)$, where $(S, T)$ is $k$-cyclic, is a list $X$ containing the representative from each $k$-cycle, and an arbitrary number of adjacencies. The $p$-extensions of $(S, T)$ for sample $X = (x_1, \ldots, x_\ell)$ and of a rearrangement $r$ of $(S, T)$ are, respectively, $\Phi^p_X(S, T) = \phi^p_{x_\ell}(\ldots \phi^p_{x_2}(\phi^p_{x_1}(S, T)) \ldots)$ and $\Psi^p_X(r) = \psi^p_{x_\ell}(\ldots \psi^p_{x_2}(\psi^p_{x_1}(r)) \ldots)$.
Lemma 6.For any odd k ≥ 5, deciding whether a k-cyclic pair (S, T ) is kefficiently sortable is NP-hard.For any even k ≥ 6, deciding whether a pair (S, T ) is k-efficiently sortable is NP-hard.
Proof.By induction on k.Deciding if a 3-cyclic pair (S, T ) is efficiently sortable is NP-hard (cf.[3], where it is shown that deciding if a permutation can be sorted with d b (S,T ) 3 transpositions is NP-hard).For any k ≥ 5, take p = 2 if k is odd and p = 3 otherwise, and consider a (k − p)-cyclic instance (S, T ) and a sample X for (S, T ) (note that one always exists): (S, T ) is (k − p)-efficiently sortable iff Φ p X (S, T ) is k-efficiently sortable by Lemma 5, and Φ p X (S, T ) is k-cyclic for p = 2 by Proposition 5.This gives a polynomial reduction proving hardness for k (even when restricted to k-cyclic permutations when k is odd).
Theorem 7 is a corollary of Lemma 6, since a $k$-cyclic pair $(S, T)$ is $k$-efficiently sortable iff $S$ can be rearranged into $T$ with no more than $d_b(S,T)/k$ $k$-cut rearrangements (Proposition 1).
Let Opt-SMCR be the optimisation version of SMCR, where we look for the smallest $\ell$ such that $T$ can be obtained from $S$ by $\ell$ $k$-cut rearrangements.
Proof. Let $I = (S, T, k)$ be an instance of Opt-SMCR. We first rewrite $S$ and $T$ into $S'$ and $T'$ in such a way that $T' = id_n$. Let $k' = \lfloor k/2 \rfloor$. The algorithm consists in iterating the following three steps, starting from $S'$: (a) rewrite $S'$ by contracting adjacencies so as to obtain a permutation without fixed point, (b) cut around (i.e., right before and right after) the first $k'$ elements $1, 2, 3, \ldots, k'$ of that permutation, and (c) rearrange it so as to obtain $id_{k'}$ followed by the rest of the permutation. Steps (b) and (c) above actually correspond to the case where $k$ is even. If $k$ is odd, (b) and (c) are slightly modified, since we are left with an unused cut: (b') do as (b) and additionally cut to the left of $k'+1$, (c') do as (c) but rearrange in such a way that $k'$ and $k'+1$ are consecutive.
Clearly, the optimal value for Opt-SMCR is at least $b(S,T)/k$. Our algorithm removes at least $k'$ (resp. at least $k'+1$) breakpoints at each iteration when $k$ is even (resp. odd), and thus requires at most $b(S,T)/k'$ (resp. at most $b(S,T)/(k'+1)$) $k$-cut rearrangements. Altogether, the ratio between the number of rearrangements used by the algorithm and the optimal value is at most $k/k'$ if $k$ is even and at most $k/(k'+1)$ if $k$ is odd. Since $k' = \lfloor k/2 \rfloor$, we conclude that this ratio is at most 2.
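As an illustration of step (a) and of the lower bound used in the proof, here is a minimal sketch for a permutation compared against the identity. It rests on assumptions not spelled out in this section: an adjacency is taken to be a pair of consecutive values $x, x+1$ in the permutation framed by $0$ and $n+1$, and breakpoints are the remaining neighbouring pairs. The function names are hypothetical, and steps (b) and (c) are not implemented.

```python
from math import ceil


def run_heads(perm):
    """First element of each maximal run v, v+1, v+2, ... in the framed permutation."""
    framed = [0] + list(perm) + [len(perm) + 1]
    heads = [framed[0]]
    for prev, cur in zip(framed, framed[1:]):
        if cur != prev + 1:  # (prev, cur) is a breakpoint, so cur starts a new run
            heads.append(cur)
    return heads


def breakpoints(perm):
    """Number of breakpoints of perm with respect to the identity (assumed definition)."""
    return len(run_heads(perm)) - 1


def contract_adjacencies(perm):
    """Step (a): merge each run into one element and renumber, dropping the frame."""
    heads = run_heads(perm)
    rank = {v: i for i, v in enumerate(sorted(heads))}
    return [rank[v] for v in heads][1:-1]


def lower_bound(perm, k):
    """Each k-cut rearrangement removes at most k breakpoints."""
    return ceil(breakpoints(perm) / k)


print(contract_adjacencies([3, 1, 2]), breakpoints([3, 1, 2]), lower_bound([3, 2, 1], 3))
```

Contraction leaves the number of breakpoints unchanged, which is why the algorithm can apply it freely between iterations.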

Conclusion
We introduced Sorting by Multi-Cut Rearrangements, a generalization of usual genome rearrangement problems that do not incorporate reversals. We discussed its classical computational complexity (P vs. NP-hard) and its membership in FPT with respect to the parameters $\ell$ and $k$. For this, we distinguished the case where $S$ (and thus $T$) is a permutation from the case where it is a string.
The obvious remaining open problems are the ones indicated with a question mark in Tables 1 and 2, namely (a) the FPT status of SMCR with respect to parameter $\ell + k$ in strings, and (b) the computational complexity for constant $\ell$ and $k$ part of the input in permutations. Extensions or variants of SMCR could also be studied, notably the one allowing reversals (and thus applicable to signed strings/permutations), or the one where $T$ is the lexicographically ordered string derived from $S$. Finally, it would also be interesting to better understand the comparative roles of $\ell$ and $k$ in SMCR, for instance by studying the following question: assuming $k$ is increased by some constant $c$, what impact does it have on the optimal distance?
GF was partially supported by the PHC Procope program 17746PC. GJ was partially supported by the PHC Procope program 17746PC. CK was partially supported by the DAAD Procope program 57317050.

Theorem 4. When $\ell = 1$, SMCR is FPT with respect to parameter $k$.
Proof. Assuming $S \neq T$, let $A$ (resp. $B$) be the length of the longest common prefix (resp. suffix) of $S$ and $T$. For $0 \leq a \leq A$ and $0 \leq b \leq B$, let $S_{a,b}$, $T_{a,b}$ be the strings obtained from $S$, $T$ by removing the first $a$ and last $b$ characters. Then $T$ can be obtained from $S$ by one $k$-cut rearrangement if and only if, for some pair $(a, b)$, $S_{a,b}$ and $T_{a,b}$ admit a common string partition into $k-1$ blocks. Indeed, this is easy to verify by matching the limits of the blocks in MCSP (including at the end of the strings) with the cuts of the rearrangement. So SMCR when $\ell = 1$ can be solved using $O(n^2)$ calls to MCSP with parameter $k-1$, each with a different pair $(a, b)$, and MCSP is itself FPT for $k$ [4].
FPT (Thm 5) W[1]-hard (Thm 6) part of the input NP-hard [3] NP-hard (Thm 7) *

Note that it is not sufficient to check only with the longest common prefix and suffix (i.e. $S_{A,B}$ and $T_{A,B}$), as can be seen in the following example, where $S$ can be transformed into $T$ via one 3-cut rearrangement, $A = B = 2$, but only $S_{1,1}$ and $T_{1,1}$ have a common partition into 2 blocks: $S$ = a acb adb b and $T$ = a adb acb b.
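The characterization in the proof of Theorem 4 can be tried out on small inputs with the sketch below. It is not the FPT algorithm: the MCSP routine of [4] is replaced by an exponential brute-force check, and both function names are hypothetical. It assumes $S \neq T$ and that both strings have the same length.

```python
from itertools import combinations, permutations


def has_common_partition(s, t, m):
    """Brute force: can s be cut into exactly m non-empty blocks that reorder into t?"""
    n = len(s)
    if n != len(t) or m < 1 or m > n:
        return False
    for cuts in combinations(range(1, n), m - 1):
        bounds = (0, *cuts, n)
        blocks = [s[bounds[i]:bounds[i + 1]] for i in range(m)]
        if any("".join(order) == t for order in permutations(blocks)):
            return True
    return False


def one_k_cut_suffices(s, t, k):
    """Try every pair (a, b) up to the longest common prefix A and suffix B."""
    if len(s) != len(t) or s == t:
        raise ValueError("expects two distinct strings of equal length")
    n = len(s)
    A = 0
    while s[A] == t[A]:  # terminates because s != t
        A += 1
    B = 0
    while s[n - 1 - B] == t[n - 1 - B]:
        B += 1
    return any(
        has_common_partition(s[a:n - b], t[a:n - b], k - 1)
        for a in range(A + 1)
        for b in range(B + 1)
    )


# Example from the text: S = a acb adb b, T = a adb acb b, one 3-cut rearrangement.
print(one_k_cut_suffices("aacbadbb", "aadbacbb", 3))
```

On the example above, the pair $(a, b) = (2, 2)$ fails while $(1, 1)$ succeeds, which is why the loop ranges over all pairs rather than only the longest common prefix and suffix.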

Table 1. Summary of the results for Sorting by Multi-Cut Rearrangements in strings. $d$ is the maximum number of occurrences of a character in the input string $S$.

Table 2. Summary of the results for Sorting by Multi-Cut Rearrangements in permutations. * Existence of a 2-approximation algorithm for Opt-SMCR (Thm 8).