Nonasymptotic Upper Bounds on Binary Single Deletion Codes via Mixed Integer Linear Programming

The size of the largest binary single deletion code has been unknown for more than 50 years. It is known that the Varshamov–Tenengolts (VT) code is an optimal single deletion code for block length n ≤ 10; however, only a few upper bounds on the size of single deletion codes have been proposed for larger n. We provide improved upper bounds using a Mixed Integer Linear Programming (MILP) relaxation technique. In particular, we show that the size of a single deletion code is at most 173 when the block length n is 11. In the second half of the paper, we propose a conjecture that is equivalent to the long-standing conjecture that “VT code is optimum for all n”. This equivalent formulation contains small sub-problems that can be verified numerically. We provide numerical results that support the conjecture.


Introduction
The deletion channel is one of the most important channels in the history of communication. In this channel, a symbol is removed and the receiver is not told where the deletion occurred. Unlike many other channels, in which every received symbol stays in its original position, the decoder must also recover the position of each symbol; this is called the synchronization issue. Mainly because of this issue, the deletion channel is surprisingly hard to analyze.
There are several different mathematical formulations of the deletion channel. One natural approach is probabilistic: each symbol is deleted independently (i.i.d.) with probability p. Kanoria and Montanari provided an approximation of the capacity of the binary deletion channel as p → 0 [1]. However, the channel capacity remains unknown in this setting, even in the binary case.
Alternatively, we can define the problem algebraically. We assume that at most k deletion errors occur while n symbols are transmitted, and we ask for the maximum size of a deletion code that can correct any k deletion errors. Sloane wrote a nice survey of this problem [2], from which it is easy to see that the problem is extremely challenging even in the single deletion case.
Although the problem is still open, there has been some progress in this algebraic setting. In the binary case, Varshamov and Tenengolts proposed a simple code construction (the VT-code) that corrects any single deletion error [3]. It is asymptotically optimal as n grows while the number of deletions k = 1 is fixed. The VT-code is known to be optimal for n ≤ 10 and is conjectured to be optimal for all n. Tenengolts generalized the VT-code to a non-binary version [4]. Gabrys and Sala proposed a code that can correct two deletions [5], and Sima and Bruck generalized the VT-code to correct k deletions [6]. Both results [5,6] achieve n − log |C| = O(k log n), where C is the deletion code.
Definition 1. For a positive integer n, a set of binary vectors $C \subset \mathcal{X}^n$ is a single deletion code if $B_D(c_1) \cap B_D(c_2) = \emptyset$ for all $c_1 \neq c_2$ in C. Here $\mathcal{X} = \{0,1\}$, and $B_D(x^n)$ is the single deletion ball of $x^n$, i.e., the set of all length-(n − 1) sequences obtained by deleting one symbol of $x^n$; the single insertion ball $B_I(x^n)$ is defined analogously.
The definition implies that a single deletion code can always correct a single deletion error. The following lemma shows that a single deletion code can also correct a single insertion error.
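The above lemma follows from the fact that $B_I(x^n) \cap B_I(\tilde{x}^n) \neq \emptyset$ if and only if $B_D(x^n) \cap B_D(\tilde{x}^n) \neq \emptyset$.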

Varshamov-Tenengolts Codes
Let $v_n : \mathcal{X}^n \to \mathbb{Z}$ be the function that computes the “VT-weight” of an n-dimensional binary vector, $v_n(x^n) = \sum_{i=1}^{n} i\, x_i$.
Note that we do not take any modulo operation, and therefore $v_n(x^n)$ can take values from 0 to $n(n+1)/2$. For $0 \le a \le n$, the VT-code $VT_a(n) = \{x^n \in \mathcal{X}^n : v_n(x^n) \equiv a \pmod{n+1}\}$ is a single deletion code [7,15]. Levenshtein showed that $VT_a(n)$ is perfect for all $0 \le a \le n$ [16]; in other words, the deletion balls $B_D(x^n)$, $x^n \in VT_a(n)$, partition $\mathcal{X}^{n-1}$. For any a, we have $|VT_0(n)| \ge |VT_a(n)| \ge |VT_1(n)|$, where the first inequality is from Varshamov [3] and the second inequality is from Ginzburg [17]. Thus, the size of the optimum single deletion code is lower bounded by $|VT_0(n)|$. An analytic formula for $|VT_0(n)|$ is given in [2]:
$$|VT_0(n)| = \frac{1}{2(n+1)} \sum_{d \mid n+1,\ d\ \mathrm{odd}} \varphi(d)\, 2^{(n+1)/d},$$
where φ is Euler's totient function. Borchers showed that $VT_0(n)$ is an optimum single deletion code for n ≤ 10 [14]. The optimality of the VT-code is still open for n ≥ 11. In the case of n = 11, the size of the VT-code is $|VT_0(11)| = 172$, but the best known upper bound on the size of the largest single deletion code is $|C| \le 174$.
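The definitions and the size formula above can be checked by brute force for small n. The following Python sketch is our own illustration, not the code used in this work, and all function names (`vt_weight`, `vt_code`, `is_perfect`, `vt0_size_formula`) are hypothetical. It builds $VT_a(n)$ directly from the VT-weight, tests Levenshtein's perfectness property, and compares the analytic formula for $|VT_0(n)|$ with an exhaustive count.

```python
from itertools import product
from math import gcd


def vt_weight(x):
    """VT-weight v_n(x^n) = sum_{i=1}^{n} i * x_i (1-indexed positions, no modulo)."""
    return sum(i * bit for i, bit in enumerate(x, start=1))


def vt_code(n, a):
    """VT_a(n): all binary n-tuples whose VT-weight is congruent to a modulo n + 1."""
    return [x for x in product((0, 1), repeat=n) if vt_weight(x) % (n + 1) == a]


def deletion_ball(x):
    """B_D(x^n): all (n-1)-tuples obtained by deleting exactly one symbol of x^n."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}


def is_perfect(code, n):
    """True if the deletion balls of the codewords are disjoint and cover {0,1}^(n-1)."""
    covered = [y for x in code for y in deletion_ball(x)]
    return len(covered) == len(set(covered)) == 2 ** (n - 1)


def euler_phi(d):
    """Euler's totient function (naive, fine for small d)."""
    return sum(1 for k in range(1, d + 1) if gcd(k, d) == 1)


def vt0_size_formula(n):
    """|VT_0(n)| = 1/(2(n+1)) * sum over odd divisors d of n+1 of phi(d) * 2^((n+1)/d)."""
    m = n + 1
    total = sum(euler_phi(d) * 2 ** (m // d) for d in range(1, m + 1, 2) if m % d == 0)
    return total // (2 * m)


if __name__ == "__main__":
    for n in range(1, 12):
        print(n, vt0_size_formula(n), len(vt_code(n, 0)))
```

Running the script should print two identical size columns for n = 1, …, 11, ending with `11 172 172`.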

Maximum Independent Set Approach
Consider a graph whose nodes are all binary vectors of length n. There is an edge between two nodes $x^n$ and $\tilde{x}^n$ if and only if $B_D(x^n) \cap B_D(\tilde{x}^n) \neq \emptyset$. Then, an optimum single deletion code corresponds to a maximum independent set of this graph. Note that the Maximum Independent Set (MIS) problem is NP-complete.
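As an illustration, the conflict graph can be built by grouping vectors according to the elements of their deletion balls, since all vectors that share a given $y^{n-1}$ are pairwise adjacent. This is a minimal sketch under our own naming (`conflict_graph` and `is_independent` are hypothetical helpers), intended only for small n.

```python
from itertools import combinations, product


def deletion_ball(x):
    """B_D(x^n): all (n-1)-tuples obtained by deleting one symbol of x^n."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}


def conflict_graph(n):
    """Adjacency sets on {0,1}^n: x ~ z iff x != z and B_D(x), B_D(z) intersect."""
    vectors = list(product((0, 1), repeat=n))
    # Group the vectors by each (n-1)-tuple they can produce after one deletion.
    holders = {}
    for x in vectors:
        for y in deletion_ball(x):
            holders.setdefault(y, []).append(x)
    adj = {x: set() for x in vectors}
    # Vectors sharing some y are pairwise adjacent, so each group is a clique.
    for group in holders.values():
        for x, z in combinations(group, 2):
            adj[x].add(z)
            adj[z].add(x)
    return adj


def is_independent(adj, code):
    """A code is a single deletion code iff it is an independent set of the graph."""
    members = set(code)
    return all(not (adj[x] & members) for x in members)
```

For example, `is_independent(conflict_graph(6), code)` should return True exactly when `code` satisfies Definition 1.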
There are reduction rules in graph theory that yield an equivalent problem on a smaller graph. The isolated vertex removal technique is useful in our case. An isolated vertex is a node whose neighborhood forms a clique. For example, in our case, the neighborhood of node $0^n$ is $\{10\cdots00, 01\cdots00, \ldots, 00\cdots01\}$, and $\{00\cdots00, 10\cdots00, 01\cdots00, \ldots, 00\cdots01\}$ forms a clique, since every vector in it contains $0^{n-1}$ in its deletion ball. This implies that $0^n$ is an isolated vertex and, by the same argument with 0 and 1 swapped, so is $1^n$. Butenko et al. showed that there exists a maximum independent set that contains all isolated vertices [12]. Thus, there exists an optimum single deletion code that contains both $0^n$ and $1^n$.
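The isolated-vertex property of $0^n$ and $1^n$ can be verified directly for small n. The sketch below uses hypothetical helpers of our own: it rebuilds the conflict graph and checks whether the neighborhood of a given vertex is a clique.

```python
from itertools import combinations, product


def conflict_graph(n):
    """x ~ z iff x != z and their single deletion balls intersect."""
    holders = {}
    for x in product((0, 1), repeat=n):
        for i in range(n):
            holders.setdefault(x[:i] + x[i + 1:], set()).add(x)
    adj = {x: set() for x in product((0, 1), repeat=n)}
    for group in holders.values():
        for x, z in combinations(group, 2):
            adj[x].add(z)
            adj[z].add(x)
    return adj


def is_isolated_vertex(adj, v):
    """Isolated (simplicial) vertex: every two neighbours of v are adjacent."""
    return all(b in adj[a] for a, b in combinations(adj[v], 2))
```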
Note that it is still infeasible to solve our MIS problem with state-of-the-art algorithms [18,19] when n ≥ 11. Segundo et al. proposed a variation of BBMC [20] and found the maximum independent set of the graph induced by the two-deletion (k = 2) channel [14]. However, finding a maximum independent set of the graph induced by the single deletion channel (k = 1) is more challenging.
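For very small n, the MIS problem is easy to solve exactly even with a naive method. The following sketch is not BBMC or any of the cited solvers; it is a plain memoized branching search of our own (names are hypothetical), based on the fact that every maximal independent set meets the closed neighborhood N[v] of any vertex v. It is only practical for roughly n ≤ 5.

```python
from functools import lru_cache
from itertools import combinations, product


def conflict_graph(n):
    """x ~ z iff x != z and their single deletion balls intersect."""
    holders = {}
    for x in product((0, 1), repeat=n):
        for i in range(n):
            holders.setdefault(x[:i] + x[i + 1:], set()).add(x)
    adj = {x: set() for x in product((0, 1), repeat=n)}
    for group in holders.values():
        for x, z in combinations(group, 2):
            adj[x].add(z)
            adj[z].add(x)
    return adj


def max_independent_set_size(adj):
    """Exact MIS size by branching on the closed neighbourhood of a min-degree vertex."""

    @lru_cache(maxsize=None)
    def solve(vertices):
        if not vertices:
            return 0
        # Some maximum independent set contains a vertex of N[v]; try each choice u.
        v = min(vertices, key=lambda u: len(adj[u] & vertices))
        return max(1 + solve(vertices - adj[u] - {u}) for u in (adj[v] & vertices) | {v})

    return solve(frozenset(adj))
```

For n = 1, …, 5 it returns 1, 2, 2, 4, 6, which matches $|VT_0(n)|$.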

LP Relaxation
For simplicity, we define several functions and new notations. Let $N = 2^n$, and let $[N-1] = \{0, 1, \ldots, N-1\}$ be the set of all nonnegative integers smaller than or equal to N − 1. We further let $b_n : [N-1] \to \{0,1\}^n$ be the function that converts a decimal number to its n-bit binary representation (e.g., $b_4(3) = 0011$). We drop the subscript n when it is clear from the context, i.e., $b \equiv b_n$.
In the above section, we defined a graph with an edge between $x^n$ and $\tilde{x}^n$ if their deletion balls share an element. Instead, we now define an equivalent graph whose set of nodes is $V = [N-1]$, where there is an edge between i and j if and only if there is an edge between $b(i)$ and $b(j)$.
Our goal is to find an independent set $U \subset V$ with the maximum number of elements. Let $X = (X_0, X_1, \ldots, X_{N-1})$ be binary variables, where for $0 \le i \le N-1$, $X_i = 1$ if $i \in U$ and $X_i = 0$ if $i \notin U$. If there is an edge between i and j, then the independent set U cannot contain both i and j. Thus, the following Integer Programming (IP) problem is equivalent to the maximum independent set problem:
$$\max \sum_{i=0}^{N-1} X_i \quad \text{s.t.} \quad X_i + X_j \le 1 \ \text{for every edge } (i,j), \qquad X_i \in \{0,1\},\ i = 0, 1, \ldots, N-1.$$
Clearly, this IP is NP-hard and extremely challenging to solve. Instead, we can relax it to an easier problem and bound the solution of the IP. One way of doing so is the classical Linear Programming (LP) relaxation, which replaces $X_i \in \{0,1\}$ with $0 \le X_i \le 1$ in the above IP. The LP relaxation allows each variable $X_i$ to take any value between 0 and 1, and the solution of the relaxed problem provides an upper bound on that of the original IP. However, it gives a trivial solution in our case, namely $X_i = 1/2$ for all $i \in \{0, 1, \ldots, N-1\}$. The maximum value of the objective function is then $2^{n-1}$, which is much larger than the known upper bound $(2^n - 2)/(n-1)$ [9]. Kulkarni et al. proposed another LP relaxation [9]. The idea is that an independent set can contain at most one node from each clique. For any (n − 1)-dimensional vector $y^{n-1}$, the set $Q_I(y^{n-1}) = \{i \in V : b(i) \in B_I(y^{n-1})\}$ forms a clique, since $y^{n-1} \in B_D(b(i)) \cap B_D(b(j))$ for all $i, j \in Q_I(y^{n-1})$. Thus, the independent set U can contain at most one node from $Q_I(y^{n-1})$, and we have the clique constraints $\sum_{i \in Q_I(y^{n-1})} X_i \le 1$. This yields another LP relaxation, which maximizes $\sum_{i=0}^{N-1} X_i$ subject to the clique constraints for all $y^{n-1} \in \mathcal{X}^{n-1}$ and $0 \le X_i \le 1$, and which provides a tighter upper bound on the original IP.
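The clique constraints can be generated directly from insertion balls, because $i \in Q_I(y^{n-1})$ exactly when $b(i) \in B_I(y^{n-1})$. The following sketch (hypothetical helper names; exact arithmetic via `fractions.Fraction`) builds all $2^{n-1}$ constraints, each of which contains n + 1 variables, and shows that the all-1/2 point violates them, while the uniform point $X_i = 1/(n+1)$ satisfies them.

```python
from fractions import Fraction
from itertools import product


def b(i, n):
    """b_n(i): the n-bit binary representation of i as a tuple, e.g. b(3, 4) == (0, 0, 1, 1)."""
    return tuple(int(c) for c in format(i, f"0{n}b"))


def insertion_ball(y):
    """B_I(y): all tuples obtained by inserting one bit anywhere in y."""
    return {y[:p] + (bit,) + y[p:] for p in range(len(y) + 1) for bit in (0, 1)}


def clique_constraints(n):
    """Map every y in {0,1}^(n-1) to Q_I(y) = {i : b(i) in B_I(y)}, a clique of the graph."""
    index = {b(i, n): i for i in range(2 ** n)}
    return {y: sorted(index[x] for x in insertion_ball(y)) for y in product((0, 1), repeat=n - 1)}


def satisfies_cliques(X, constraints):
    """True if sum_{i in Q_I(y)} X_i <= 1 holds for every clique constraint."""
    return all(sum(X[i] for i in q) <= 1 for q in constraints.values())


if __name__ == "__main__":
    n = 5
    Q = clique_constraints(n)
    print(satisfies_cliques([Fraction(1, 2)] * 2 ** n, Q))  # the all-1/2 point is cut off
```

For scale, at n = 11 the all-1/2 point has objective value $2^{10} = 1024$, whereas $(2^{11} - 2)/10 = 204.6$.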
On the other hand, Lovász proposed an SDP relaxation of the Maximum Independent Set (MIS) problem [11]. The optimal value of this SDP is called the Lovász theta number, and it provides a tighter upper bound on the size of a maximum independent set.
The following table presents the upper bounds from the above relaxations, as well as $|VT_0(n)|$, which is a lower bound. Note that the LP relaxation has low complexity, so we can obtain bounds for n ≥ 13 as well. For example, the size of the maximum single deletion code is at most 593 when n = 13. However, due to its complexity, we were not able to compute the Lovász theta number for n ≥ 12; for example, it took more than 24 h on our machine when n = 12.

Mixed Integer Linear Programming
LP is faster than the original IP; however, it is hard to parallelize. On the other hand, IP solvers inherently use the branch-and-bound technique, which can be parallelized, but the problem remains intractable with current multi-thread processors. In this section, we propose a Mixed Integer Linear Programming (MILP) problem that lies in between the LP problem and the original IP problem.

Main Results
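In the LP relaxation, all variables are relaxed as described in Section 2.4. Instead, we relax only specific variables, which yields a semi-relaxed optimization problem. More precisely, we design a set $S \subset V$ and keep $X_i$ binary for $i \in S$, while the other variables are relaxed as in the LP relaxation. This gives a Mixed Integer Programming (MIP) problem whose variables are either integers or real numbers:
$$\begin{aligned} \max\ & \sum_{i=0}^{N-1} X_i \\ \text{s.t.}\ & \sum_{i \in Q_I(y^{n-1})} X_i \le 1, \quad y^{n-1} \in \mathcal{X}^{n-1}, \\ & X_i \in \{0,1\}, \quad i \in S, \\ & 0 \le X_i \le 1, \quad i \notin S, \\ & X_0 = 1, \\ & X_{N-1} = 1. \end{aligned}$$
The last two constraints hold because there exists a maximum independent set that contains both $0^n$ and $1^n$, as shown in Section 2.3.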
MIP problems are generally as computationally demanding as IP problems. However, since all constraints are linear, the above optimization problem is a Mixed Integer Linear Programming (MILP) problem, which can be solved in a reasonable amount of time if we carefully design the set S. Let $\mathrm{MILP}_n(S)$ denote the above MILP problem.
If $S = \emptyset$, then $\mathrm{MILP}_n(S)$ is equivalent to the LP relaxation. At the other extreme, if $S = V$, then $\mathrm{MILP}_n(S)$ is equivalent to the original IP. If S is a nontrivial subset of V, then the optimal value of $\mathrm{MILP}_n(S)$ is an upper bound on the size of the maximum independent set that is at least as tight as the LP bound, while the complexity stays lower than that of the IP.
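To make $\mathrm{MILP}_n(S)$ concrete, the following sketch writes it as a text model in the CPLEX LP file format, which many MILP solvers can read. This is our own illustration (the experiments in this work used PuLP with CBC); the variable names x0, …, x{N−1} and the helper name `milp_lp_format` are hypothetical, and the output has not been tested against a particular solver.

```python
from itertools import product


def milp_lp_format(n, S):
    """Return MILP_n(S) as a CPLEX LP-format string (sketch).

    Variables x_i with i in S are binary, x_0 (0^n) and x_{N-1} (1^n) are fixed
    to 1, and all remaining variables are relaxed to the interval [0, 1].
    """
    N = 2 ** n
    index = {tuple(int(c) for c in format(i, f"0{n}b")): i for i in range(N)}
    terms = [f"x{i}" for i in range(N)]
    # Objective: maximize sum_i X_i, wrapped over several lines to keep them short.
    chunks = [" + ".join(terms[k:k + 10]) for k in range(0, N, 10)]
    lines = ["Maximize", " obj: " + chunks[0]] + ["  + " + c for c in chunks[1:]]
    lines.append("Subject To")
    for k, y in enumerate(product((0, 1), repeat=n - 1)):
        # Clique constraint: the sum of X_i over Q_I(y) = {i : b(i) in B_I(y)} is at most 1.
        q = sorted({index[y[:p] + (bit,) + y[p:]] for p in range(n) for bit in (0, 1)})
        lines.append(f" c{k}: " + " + ".join(f"x{i}" for i in q) + " <= 1")
    lines.append("Bounds")
    for i in range(N):
        if i in (0, N - 1):
            lines.append(f" x{i} = 1")  # 0^n and 1^n belong to some optimal code
        elif i not in S:
            lines.append(f" 0 <= x{i} <= 1")  # relaxed (continuous) variable
    binaries = sorted(set(S) - {0, N - 1})
    if binaries:
        lines.append("Binary")
        lines.extend(f" x{i}" for i in binaries)
    lines.append("End")
    return "\n".join(lines)
```

With `S = set()` the model is the LP relaxation (plus the two fixed variables); with `S = set(range(2 ** n))` every free variable is binary, as in the original IP.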
Clearly, we prefer a smaller S because of complexity. Thus, the goal is to design S in a smart way. The main idea is to increase the size of S greedily: we start from the fully relaxed LP problem, i.e., $S = \emptyset$, and add elements one by one according to a certain criterion.
More precisely, we solve $\mathrm{MILP}_n(S)$ in each iteration and add a node i to S based on the following rule. Let $d : [N-1] \to \{0, 1, \ldots, 2^{n-1}\}$ be the function that gives the number of clique constraints containing $X_i$. Since $i \in Q_I(y^{n-1})$ if and only if $y^{n-1} \in B_D(b(i))$, we have $d(i) = |B_D(b(i))|$. Thus, the variable $X_i$ appears in d(i) clique constraints. Furthermore, if $X_i$ is large, it restricts the other variables in those clique constraints more. Thus, we measure the “impact” of the variable $X_i$ by $d(i) \times X_i$. Finally, the algorithm finds the node $i \notin S$ that maximizes $d(i) \times X_i$ and adds it to S. This procedure is described in Algorithm 1.
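The quantity d(i) is cheap to compute: $|B_D(x^n)|$ equals the number of runs (maximal blocks of equal bits) in $x^n$, because deleting any symbol of a run gives the same result. The sketch below (hypothetical helper names) computes d(i) both ways and applies the greedy rule $i_0 = \arg\max_{i \notin S} d(i) \times X_i$; breaking ties toward the smaller index is our own choice, since the rule above does not specify one.

```python
def d(i, n):
    """d(i) = |B_D(b(i))|: the number of clique constraints that contain X_i."""
    x = format(i, f"0{n}b")
    return len({x[:p] + x[p + 1:] for p in range(n)})


def num_runs(i, n):
    """Number of maximal runs of equal bits in b(i)."""
    x = format(i, f"0{n}b")
    return 1 + sum(x[p] != x[p + 1] for p in range(n - 1))


def next_node(X, S, n):
    """Greedy rule: argmax over i not in S of d(i) * X_i, ties broken toward smaller i."""
    return max((i for i in range(2 ** n) if i not in S), key=lambda i: (d(i, n) * X[i], -i))
```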

Algorithm 1 Sequential MILP.
Input: target threshold τ
S ← ∅
loop
    Solve MILP_n(S) and let T be the objective function value and X be the solution
    if T < τ or S = V then return T
    i_0 ← argmax_{i ∉ S} d(i) × X_i
    S ← S ∪ {i_0}
end loop

The above algorithm takes a target threshold τ as input, which can be a previously known upper bound on the original IP. In each iteration, it computes the objective function value T of $\mathrm{MILP}_n(S)$. Whenever T is smaller than the target bound τ, the algorithm halts and we obtain a new bound T on the size of the maximum independent set. For example, suppose we let $\tau = |VT_0(n)| + 1$ and the above program halts with T < τ; then we have a new upper bound showing that the size of the maximum single deletion code is strictly smaller than $|VT_0(n)| + 1$. In such a case, we can claim that $VT_0(n)$ is an optimum single deletion code. On the other hand, suppose the program ends with |S| = N, which means S = V. In that case, the returned value T is the size of the maximum independent set, because it is the objective function value of the original IP. Note that the size of S increases by 1 in each iteration, and therefore there will be at most N iterations.
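The control flow of Algorithm 1 can be sketched independently of any particular solver. In the sketch below, `solve` is a hypothetical user-supplied function that returns the optimal value T and a solution X of $\mathrm{MILP}_n(S)$ (for example, a wrapper around an MILP solver), and `d` is a list with d[i] = |B_D(b(i))|. Ties in the argmax go to the first index, which is our own choice.

```python
def sequential_milp(solve, d, N, tau):
    """Sketch of Algorithm 1 (Sequential MILP).

    solve(S) -> (T, X): optimal value and solution of MILP_n(S) (user-supplied).
    d[i]: number of clique constraints containing X_i.
    Returns the final bound T and the set S of variables kept binary.
    """
    S = set()
    while True:
        T, X = solve(frozenset(S))
        # Stop once the bound beats the target, or once every variable is binary (the IP).
        if T < tau or len(S) == N:
            return T, S
        # Keep binary the relaxed variable with the largest impact d(i) * X_i.
        i0 = max((i for i in range(N) if i not in S), key=lambda i: d[i] * X[i])
        S.add(i0)
```

Passing a mock `solve` is an easy way to exercise both stopping rules before connecting a real solver.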
Note that choosing $i_0 = \arg\max_{i \notin S} d(i) \times X_i$ is not necessarily optimal. However, it is effective in practice, as shown in Section 3.2.

Experiments
We implemented the algorithm in Python using the PuLP package [21] with the CBC solver [22]. Note that the CBC solver supports multiple threads. For our experiments, we used a machine with an AMD Threadripper 1950X processor and 64 GB of RAM, running Ubuntu 18.04 LTS.

Connection to Metaheuristics
Although the MILP problem has lower complexity than the original IP problem, MILP solvers often encounter computational issues as well. This is because most state-of-the-art MILP solvers, such as CPLEX [23], are based on branch-and-bound techniques, which often face an exponentially large search space. Thus, smartly choosing which variables to keep binary is necessary, as presented in the previous section.
Similar heuristic algorithms appear in various other computationally challenging (Mixed) Integer Programming problems, such as the lot sizing problem [24,25] and the connected facility location problem [26]. This is commonly referred to as hybrid metaheuristics. For example, Wilbaut and Hanafi proposed an iterative approach to solving MIP problems [27], and the authors applied this idea to IP problems such as the knapsack problem [28]. The idea is to iteratively solve the LP-relaxed problem to obtain an upper bound, and a reduced problem with fixed variables to obtain a lower bound, until the lower and upper bounds match. In this paper, we do not fix the values of variables; instead, we remove the relaxation of selected variables so that they remain binary. For other works in metaheuristics, we refer the interested reader to the survey by Blum et al. [29].

Equivalent Conjecture
The above semi-relaxation provides an improved upper bound; however, the running time is still an issue. In this section, we provide smaller optimization problems that give insight into the optimality of the $VT_0(n)$ code.

VT-Sum Based Partition
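For $0 \le i \le n(n+1)/2$, let $S_{n,i} \subset \mathcal{X}^n$ be the set of binary vectors whose VT-weight is i, i.e., $S_{n,i} = \{x^n \in \mathcal{X}^n : v_n(x^n) = i\}$. Then $\{S_{n,i} : 0 \le i \le n(n+1)/2\}$ is a partition of $\mathcal{X}^n$, and $\{S_{n,i} : 0 \le i \le n(n+1)/2,\ i \equiv a \pmod{n+1}\}$ is a partition of $VT_a(n)$.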
The following lemma provides useful properties of $S_{n,i}$.

Proof.
1. There exists a one-to-one correspondence between $S_{n,i}$ and $S_{n,n(n+1)/2-i}$. Thus, the third property is a direct consequence of the second property.

Remark 1.
If we view the original problem as a maximum independent set problem, the above partition $S_{n,0}, \ldots, S_{n,n(n+1)/2}$ can be useful since:
• There are no internal edges in $S_{n,i}$ for all i.
• There are no edges between $S_{n,i}$ and $S_{n,j}$ if $|i - j| \ge n + 1$.
This is not exactly a “partite” graph, but it has a similar flavor, so there might be an efficient way of finding a maximum independent set.
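Both properties in Remark 1 can be checked numerically for small n. The sketch below (hypothetical helper `edge_weight_gaps`) collects $|v_n(x^n) - v_n(\tilde{x}^n)|$ over all edges of the graph; Remark 1 says that 0 never occurs and that every value is at most n.

```python
from itertools import combinations, product


def vt_weight(x):
    """VT-weight v_n(x^n) = sum_{i=1}^{n} i * x_i."""
    return sum(i * bit for i, bit in enumerate(x, start=1))


def edge_weight_gaps(n):
    """Set of |v_n(x) - v_n(z)| over all edges x ~ z of the single deletion graph."""
    holders = {}
    for x in product((0, 1), repeat=n):
        for i in range(n):
            holders.setdefault(x[:i] + x[i + 1:], set()).add(x)
    gaps = set()
    for group in holders.values():  # each group is a clique, i.e., a set of edges
        for x, z in combinations(group, 2):
            gaps.add(abs(vt_weight(x) - vt_weight(z)))
    return gaps
```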

Equivalent Conjecture
For a given single deletion code C, we also define a similar partition of C. For $0 \le i \le n(n+1)/2$, we let $C_i = C \cap S_{n,i} = \{x^n \in C : v_n(x^n) = i\}$. We are now ready to state our first lemma, which is a building block of the main conjecture.
Lemma 3. Let n be a positive integer, and let $C \subset \mathcal{X}^n$ be a single deletion code. If there exists an integer $0 \le k \le n/2$ such that the inequality in Equation (1) holds, then $VT_0(n)$ is not an optimum single deletion code.
By the first property of Lemma 2, we have $|VT_0(n)| = |VT_{(n+1)/2}(n)|$ for odd n. Thus, we also have the following lemma, which is essentially the same as Lemma 3.
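One correspondence that explains this equality can be made explicit for illustration: complementing every bit maps $v_n(x^n)$ to $n(n+1)/2 - v_n(x^n)$, and for odd n we have $n(n+1)/2 \equiv (n+1)/2 \pmod{n+1}$, so complementation maps $VT_0(n)$ onto $VT_{(n+1)/2}(n)$. We do not claim that this is the correspondence used in Lemma 2; the sketch below (hypothetical helper names) simply checks it for small n.

```python
from itertools import product


def vt_weight(x):
    """VT-weight v_n(x^n) = sum_{i=1}^{n} i * x_i."""
    return sum(i * bit for i, bit in enumerate(x, start=1))


def complement(x):
    """Flip every bit; this maps v_n(x) to n(n+1)/2 - v_n(x)."""
    return tuple(1 - bit for bit in x)


def vt_code(n, a):
    """VT_a(n) as a set of tuples."""
    return {x for x in product((0, 1), repeat=n) if vt_weight(x) % (n + 1) == a}
```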

Lemma 4.
Let n be an odd positive integer, and let $C \subset \mathcal{X}^n$ be a single deletion code. If there exists an integer $0 \le k \le (n-1)/2$ such that the inequality in Equation (2) holds, then $VT_0(n)$ is not an optimum single deletion code.
The following theorem shows that the above conjecture is equivalent to the original conjecture that “VT-code is optimum”.
Theorem 1. The $VT_0(n)$ code is an optimum single deletion code if and only if Conjecture 1 holds.

Proof.
Note that the "only if" part (for both even and odd n) directly comes from Lemmas 3 and 4.

Special Case of k = 1
Theorem 1 implies that if we can find any code that satisfies the inequality in Equation (1) or the inequality in Equation (2), then the VT code is not optimum. Thus, a plausible strategy to disprove the conjecture is to find a counterexample to Conjecture 1. However, in this section, we show that, when k = 1, no single deletion code satisfies the inequality in Equation (1), for any n.
We define two functions that are useful in the remaining sections. The following lemma is used to define the first function. Proof. It is well known that $|B_I(x^n)| = n + 2$ for all $x^n$ [30]. Clearly, a single insertion cannot decrease the VT-sum, since every existing bit keeps or increases its position; in other words, $v_{n+1}(y^{n+1}) \ge v_n(x^n)$ for all $y^{n+1} \in B_I(x^n)$.
On the other hand, a single insertion can increase the VT-sum by at most n + 1, which is attained by appending “1” at the last position. More precisely, for $y^{n+1} \in B_I(x^n)$, we have $v_{n+1}(y^{n+1}) \le v_n(x^n) + n + 1 = v_{n+1}(x^n 1)$.
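The statement used below, that the VT-weights over $B_I(x^n)$ are exactly the n + 2 consecutive values $v_n(x^n), \ldots, v_n(x^n) + n + 1$, can be checked by brute force for small n. This is a sketch with hypothetical helper names.

```python
from itertools import product


def vt_weight(x):
    """VT-weight v(x) = sum_i i * x_i, for a tuple of any length."""
    return sum(i * bit for i, bit in enumerate(x, start=1))


def insertion_ball(x):
    """B_I(x^n): all (n+1)-tuples obtained by inserting one bit anywhere in x^n."""
    return {x[:p] + (bit,) + x[p:] for p in range(len(x) + 1) for bit in (0, 1)}


def insertion_weights(x):
    """Sorted VT-weights v_{n+1}(y) over all y in B_I(x^n)."""
    return sorted(vt_weight(y) for y in insertion_ball(x))
```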
First, we define a map g that maps $x^n$ to some $y^{n+1} \in B_I(x^n)$ with $v_{n+1}(y^{n+1}) \equiv 0 \pmod{n+1}$. More precisely, we have $g : \mathcal{X}^n \to \bigcup_{i \ge 0} S_{n+1,(n+1)i}$, where $g(x^n) \in B_I(x^n)$.
Since $v_{n+1}(B_I(x^n)) = \{v_n(x^n), \ldots, v_n(x^n) + n + 1\}$ consists of n + 2 consecutive numbers, we can determine $g(x^n)$ uniquely when $x^n \notin VT_0(n)$. On the other hand, if $x^n \in VT_0(n)$, both $x^n 0$ and $x^n 1$ are possible candidates for $g(x^n)$. In this case, we set $g(x^n) = x^n 0$.
Define another function $h : \mathcal{X}^{n+1} \to \mathcal{X}^n$, where $h(x^{n+1}) = x^n$; that is, h simply deletes the last bit. Combining the two functions, we get $f(x^n) = h(g(x^n))$. Then, the function f satisfies the following properties.
This concludes the proof.
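For illustration, the maps g, h, and f can be implemented literally from their definitions. In this sketch (our own; the function names follow the text, but the code is hypothetical), g collects the elements of $B_I(x^n)$ whose VT-weight is divisible by n + 1 and applies the tie-breaking rule $g(x^n) = x^n 0$. The check confirms $f(x^n) \in VT_0(n)$ and $v_n(f(x^n)) \le v_n(x^n) + n$, the two facts used in the proof of Theorem 2.

```python
from itertools import product


def vt_weight(x):
    """VT-weight v(x) = sum_i i * x_i, for a tuple of any length."""
    return sum(i * bit for i, bit in enumerate(x, start=1))


def insertion_ball(x):
    """B_I(x): all tuples obtained by inserting one bit anywhere in x."""
    return {x[:p] + (bit,) + x[p:] for p in range(len(x) + 1) for bit in (0, 1)}


def g(x):
    """Pick y in B_I(x^n) with v_{n+1}(y) divisible by n + 1; prefer x^n 0 on a tie."""
    n = len(x)
    candidates = [y for y in insertion_ball(x) if vt_weight(y) % (n + 1) == 0]
    if len(candidates) == 2:
        # Only happens for x in VT_0(n), where the candidates are x0 and x1.
        assert set(candidates) == {x + (0,), x + (1,)}
        return x + (0,)
    (y,) = candidates  # otherwise there is exactly one candidate
    return y


def h(y):
    """Delete the last bit."""
    return y[:-1]


def f(x):
    """f = h o g."""
    return h(g(x))
```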
Then, we have a theorem that supports Conjecture 1.

Theorem 2.
For a positive integer n, any single deletion code C satisfies the following inequality:
$$\sum_{i=0}^{n+1} |C_i| \le |S_{n,0}| + |S_{n,n+1}|.$$
Proof. Let $C' = \bigcup_{i=0}^{n+1} C_i$. From the above lemma, we have $v_n(f(x^n)) \le v_n(x^n) + n \le 2n + 1$ for all $x^n \in C'$. Since $f(x^n) \in VT_0(n)$, $v_n(f(x^n))$ must be either 0 or n + 1. In other words, $f(C') \subseteq S_{n,0} \cup S_{n,n+1}$, so it suffices to show that f is injective on C'. Suppose there exist distinct $x^n, \tilde{x}^n \in C'$ such that $f(x^n) = f(\tilde{x}^n) = y^n$; then $v_n(y^n)$ is either 0 or n + 1. First, if $v_n(y^n) = 0$, then $y^n = 0^n$. In this case, since $g(x^n), g(\tilde{x}^n) \in \{0^{n+1}, 0^n 1\}$, the weights (numbers of ones) of $x^n$ and $\tilde{x}^n$ are at most one, i.e., $w(x^n), w(\tilde{x}^n) \le 1$. This implies that $0^{n-1} \in B_D(x^n) \cap B_D(\tilde{x}^n)$, which is a contradiction. On the other hand, consider the case where $v_n(y^n) = n + 1$. Since $v_{n+1}(g(x^n)), v_{n+1}(g(\tilde{x}^n)) \le 2n + 1$ and $v_{n+1}(y^n 1) = 2n + 2$, we have $g(x^n) = g(\tilde{x}^n) = y^n 0$. This implies that $B_I(x^n) \cap B_I(\tilde{x}^n)$ has at least one element, and therefore $B_D(x^n) \cap B_D(\tilde{x}^n) \neq \emptyset$. This is a contradiction.
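Theorem 2 can be sanity-checked on random codes. The sketch below (hypothetical helper names) generates random maximal single deletion codes greedily and compares both sides of the inequality; $VT_0(n)$ attains equality, since its codewords with VT-weight at most n + 1 are exactly $S_{n,0} \cup S_{n,n+1}$.

```python
import random
from itertools import product


def vt_weight(x):
    """VT-weight v_n(x^n) = sum_{i=1}^{n} i * x_i."""
    return sum(i * bit for i, bit in enumerate(x, start=1))


def deletion_ball(x):
    """B_D(x^n): all (n-1)-tuples obtained by deleting one symbol of x^n."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}


def greedy_code(n, rng):
    """Random maximal single deletion code: keep x if B_D(x) is disjoint from all kept balls."""
    order = list(product((0, 1), repeat=n))
    rng.shuffle(order)
    used, code = set(), []
    for x in order:
        ball = deletion_ball(x)
        if not ball & used:
            code.append(x)
            used |= ball
    return code


def theorem2_sides(code, n):
    """Left side: sum_{i <= n+1} |C_i|; right side: |S_{n,0}| + |S_{n,n+1}|."""
    lhs = sum(1 for x in code if vt_weight(x) <= n + 1)
    rhs = sum(1 for x in product((0, 1), repeat=n) if vt_weight(x) in (0, n + 1))
    return lhs, rhs
```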

Integer Programming
As mentioned above, if we can find a single deletion code that satisfies the inequality in Equation (1) or the inequality in Equation (2), the optimality of the VT code is immediately disproved. Since the problem is relatively small when k is small, we can numerically solve the Integer Programming (IP) problem without relaxation.
For example, we can check whether the inequality in Equation (1) holds for some fixed n and k using the following optimization problem, with the constraints
$$\sum_{i \in Q_I(y^{n-1})} X_i \le 1 \ \text{for } y^{n-1} \in \mathcal{X}^{n-1}, \qquad X_i \in \{0,1\},\ i = 0, 1, \ldots, N-1.$$
For some n and k, if the maximum value of the objective function is smaller than or equal to $M_{n,k} \triangleq |S_{n,0}| + |S_{n,n+1}| + \cdots + |S_{n,(n+1)k}|$, then no single deletion code satisfies the inequality in Equation (1).
Note that the number of variables is $|\{x^n \in \mathcal{X}^n : v_n(x^n) \le (n+1)k\}|$, which is strictly smaller than $2^n$. We solved the above optimization problem numerically for various n and k. For all combinations that we tried, the maximum value of the objective function is $M_{n,k}$. The following table shows all combinations of (n, k) that we were able to check in a reasonable amount of time; entries without a check mark are combinations that we could not check due to the running time. Similarly, we numerically verified the inequality in Equation (2) from Lemma 4. For all combinations of (n, k) that we tried, we were not able to find any counterexample. The following table shows all combinations that we were able to check.
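For tiny parameters, the restricted problem can also be solved without an IP solver, since it is a maximum independent set problem on the vectors with $v_n(x^n) \le (n+1)k$. The following sketch is our own naive branching search (hypothetical names `restricted_max_code` and `M`), not the solver used for the tables; for (n, k) ∈ {(4, 1), (5, 1), (5, 2), (6, 1)} it returns $M_{n,k}$.

```python
from functools import lru_cache
from itertools import combinations, product


def vt_weight(x):
    """VT-weight v_n(x^n) = sum_{i=1}^{n} i * x_i."""
    return sum(i * bit for i, bit in enumerate(x, start=1))


def restricted_max_code(n, k):
    """Largest single deletion code inside {x : v_n(x) <= (n+1)k}, by exact search (small n only)."""
    nodes = [x for x in product((0, 1), repeat=n) if vt_weight(x) <= (n + 1) * k]
    holders = {}
    for x in nodes:
        for i in range(n):
            holders.setdefault(x[:i] + x[i + 1:], set()).add(x)
    adj = {x: set() for x in nodes}
    for group in holders.values():  # vectors sharing a deletion result are adjacent
        for x, z in combinations(group, 2):
            adj[x].add(z)
            adj[z].add(x)

    @lru_cache(maxsize=None)
    def solve(vertices):
        if not vertices:
            return 0
        # Some maximum independent set contains a vertex of N[v] for a min-degree v.
        v = min(vertices, key=lambda u: len(adj[u] & vertices))
        return max(1 + solve(vertices - adj[u] - {u}) for u in (adj[v] & vertices) | {v})

    return solve(frozenset(nodes))


def M(n, k):
    """M_{n,k} = |S_{n,0}| + |S_{n,n+1}| + ... + |S_{n,(n+1)k}|."""
    weights = (vt_weight(x) for x in product((0, 1), repeat=n))
    return sum(1 for w in weights if w % (n + 1) == 0 and w <= (n + 1) * k)
```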

Conclusions
We investigated the maximum size of binary single deletion codes. Tighter upper bounds were provided using Mixed Integer Linear Programming relaxation. In the case of n = 11, we showed that the size of the largest single deletion code is at most 173. Together with $|VT_0(11)| = 172$, this implies that the largest single deletion code has size either 172 or 173. In addition, we proposed a conjecture that is equivalent to the optimality of the VT code, and presented numerical results that support the conjecture that the VT code is an optimum single deletion code.
One possible direction for future work is a semi-relaxed Semidefinite Programming problem. Since the Lovász number (which is based on Semidefinite Programming) provides a better bound than LP, one could formulate a Mixed Integer Semidefinite Programming problem in the same way that we semi-relaxed the LP problem. Solvers for Mixed Integer Semidefinite Programming are not as widely available as MILP solvers, apart from a few initial works [31]. However, we believe this approach could verify the optimality of the VT code in the case of n = 11.