Reconstruction of Multiple Strings of Constant Weight from Prefix–Suffix Compositions

Yang, Yaoyu; Chen, Zitan

doi:10.3390/e27010039


Reconstruction of Multiple Strings of Constant Weight from Prefix–Suffix Compositions^†

by

Yaoyu Yang

¹

and

Zitan Chen

^2,*

¹

School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China

²

School of Science and Engineering, Future Networks of Intelligence Institute, The Chinese University of Hong Kong, Shenzhen 518172, China

^*

Author to whom correspondence should be addressed.

^†

This paper was presented in part at the 2024 IEEE International Symposium on Information Theory (ISIT 2024).

Entropy 2025, 27(1), 39; https://doi.org/10.3390/e27010039

Submission received: 15 November 2024 / Revised: 30 December 2024 / Accepted: 3 January 2025 / Published: 6 January 2025

(This article belongs to the Special Issue Coding and Algorithms for DNA-Based Data Storage Systems)


Abstract

Motivated by studies of data retrieval in polymer-based storage systems, we consider the problem of reconstructing a multiset of binary strings that have the same length and the same weight from the compositions of their prefixes and suffixes of every possible length. We provide necessary and sufficient conditions under which unique reconstruction up to the reversal of the strings is possible. Additionally, we present two algorithms for reconstructing constant-length, constant-weight strings from the compositions of their prefixes and suffixes.

Keywords:

string reconstruction; constant-weight strings; integer partition; DNA and polymer-based storage; prefix-suffix compositions

1. Introduction

The growing demand for archival data storage calls for innovative solutions to store information beyond traditional methods that rely on magnetic tapes or hard disk drives. Recent advances in macromolecule synthesis and sequencing suggest that polymers, such as DNA, are promising media for future archival data storage, largely owing to their high storage density and durability. Data retrieval in polymer-based storage systems depends on macromolecule sequencing technologies [1,2] to read out the information stored in the polymers. However, common sequencing technologies often read only random fragments of polymers. Thus, data retrieval in these systems has to be based on the information provided by the fragments.

Under proper assumptions, one may represent polymers by binary strings and turn the problem of data retrieval into the problem of string reconstruction from substring compositions, i.e., from the number of zeros and the number of ones in substrings of every possible length. In [3], the authors characterized the length for which strings can be uniquely reconstructed from their substring compositions up to reversal. Extending the work of [3], the authors of [4,5] studied the problem of string reconstruction from erroneous substring compositions. Specifically, ref. [4] designed coding schemes capable of reconstructing strings in the presence of substitution errors, and [5] further proposed codes that can deal with insertion and deletion errors. Observing that it may not be realistic to assume that the compositions of all substrings are available, the authors of [6] initiated the study of string reconstruction based on the compositions of prefixes and suffixes of all possible lengths. In fact, ref. [6] considered the more general problem of reconstructing multiple distinct strings of the same length simultaneously from the compositions of their prefixes and suffixes. The main result of [6] reveals that for the reconstruction of no more than h distinct strings of the same length, there exists a code with a rate approaching

1 / h

asymptotically. Following [6], the authors of [7] studied in depth the problem of reconstructing a single string from the compositions of its prefixes and suffixes. In particular, their work completely characterized the strings that can be uniquely reconstructed from their prefix and suffix compositions up to reversal.

The efficiency of data retrieval is a major concern for practical polymer-based storage systems, and thus, low-complexity algorithms for string reconstruction are of great interest. In the case of reconstruction from error-free substring compositions, ref. [3] described a backtracking algorithm for binary strings of length n with worst-case time complexity exponential in

\sqrt{n}

. Moreover, refs. [4,8] constructed sets of binary strings that can be uniquely reconstructed with a time complexity polynomial in n. In the case of reconstruction from error-free compositions of prefixes and suffixes, refs. [6,7] presented sets of binary strings that can be efficiently reconstructed. For reconstruction in the presence of substitution composition errors, ref. [4] showed that when the number of errors is a constant independent of n, there exist coding schemes with decoding complexity polynomial in n.

We note that string reconstruction is a classic problem [9,10,11] and has been studied under various settings, including reconstruction from substrings [9,12,13] and from subsequences [10,11,14,15,16,17] under either combinatorial or probabilistic assumptions.

In this paper, we consider the problem of reconstructing h strings that are not necessarily distinct but have the same length

n \geq 1

and weight

\bar{w} \leq n

from the error-free compositions of their prefixes and suffixes of all possible lengths. The problem of reconstructing multiple strings from prefix–suffix compositions becomes more amenable to analysis if the strings are of constant weight. This is because useful symmetry properties can then be tied to the prefix–suffix compositions. It is worth mentioning that the work of [18] studied the largest possible set of constant-weight binary

B_{2}

-sequences, i.e., the set of constant-weight binary strings with the property that the real-valued sums of all distinct pairs of strings are different. Such sequences, albeit without the constraint of being constant weight, were used in [6] to ensure unique reconstructions of strings based on their prefix and suffix compositions.

Our first result is a characterization of the properties of constant-weight strings that enable unique reconstructions up to reversal, expanding our earlier work [19]. Additionally, we present two algorithms that reconstruct constant-weight strings from prefix–suffix compositions. Given prefix–suffix compositions as input, one of the algorithms can efficiently output a multiset of strings whose prefix–suffix compositions are the same as the input, and the other is able to output all multisets of strings up to reversal that are allowed by the input. Our analysis relies on the running weight information of the strings that can be extracted from the prefix–suffix compositions and the inherent symmetry of constant-weight strings and their reversals.

The rest of this paper is organized as follows. In Section 2, we present the problem statement and introduce necessary notation and preliminaries that are helpful for later sections. In particular, we introduce the notion of cumulative weight functions that capture the running weight information of a multiset of strings, which is used throughout the paper. In Section 3, we derive the necessary and sufficient conditions for unique reconstruction. Section 4 is devoted to the reconstruction algorithms. We conclude this paper and mention a few open problems in Section 5.

2. Notation and Preliminaries

Let n be a positive integer. Denote

[n] = {1, 2, \dots, n}

and

〚 n 〛 = {0, 1, \dots, n}

. For integers

n_{1}, n_{2}

, define

[n_{1}, n_{2}] = {n_{1}, n_{1} + 1, \dots, n_{2}}

if

n_{1} \leq n_{2}

and

[n_{1}, n_{2}] = \emptyset

if

n_{1} > n_{2}

. Let

t = t_{1} t_{2} \dots t_{n} \in {0, 1}^{n}

be a binary string of length n, and the reversal of

t

is denoted by

\overset{\leftarrow}{t} = t_{n} t_{n - 1} \dots t_{1}

. The weight of

t

is the number of ones in

t

, denoted by

wt (t)

. The composition of

t

is formed by the number of zeros and the number of ones in

t

. More precisely, the ordered pair

(n - wt (t), wt (t))

is called the composition of

t

. For

1 \leq l \leq n

, the length-l prefix and the length-l suffix of

t

are denoted by

t [l]

and

t [- l]

, respectively. We will use “∪” to denote both the set union and the multiset union. The exact meaning of “∪” will be clear from the context.

Definition 1.

The set of compositions of all prefixes of a string

t \in {0, 1}^{n}

is called the prefix compositions of

t

, denoted by

M_{p} (t)

. More precisely,

\begin{matrix} M_{p} (t) = {(j - wt (t [j]), wt (t [j])) ∣ 1 \leq j \leq n} . \end{matrix}

The suffix compositions of

t

are similarly defined to be

\begin{matrix} M_{s} (t) = {(j - wt (t [- j]), wt (t [- j])) ∣ 1 \leq j \leq n} . \end{matrix}

The prefix–suffix compositions of

t

are defined to be the multiset union of

M_{p} (t)

and

M_{s} (t)

, denoted by

M (t)

. Let U be a multiset of binary strings. Define

M (U)

to be the multiset union of

M (t), t \in U

, i.e.,

\begin{matrix} M (U) = ⋃_{t \in U} M (t) . \end{matrix}

The multiset

M (U)

is called the prefix–suffix compositions of U.

Example 1.

Take

t = 110101

. The prefix compositions of

t

are

M_{p} (t) = {(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4)},

and the suffix compositions of

t

are

M_{s} (t) = {(0, 1), (1, 1), (1, 2), (2, 2), (2, 3), (2, 4)} .

Taking the multiset union of

M_{p} (t)

and

M_{s} (t)

, we get

M (t) = {(0, 1), (0, 1), (0, 2), (1, 1), (1, 2), (1, 2), (1, 3), (2, 2), (2, 3), (2, 3), (2, 4), (2, 4)} .

Consider the multiset

U = {110101, 110101, 101110}

. The prefix–suffix compositions

M (U)

can be found to be

{{(0, 1)}^{5}, {(1, 0)}^{1}, {(0, 2)}^{2}, {(1, 1)}^{4}, {(1, 2)}^{6}, {(1, 3)}^{4}, {(2, 2)}^{2}, {(1, 4)}^{1}, {(2, 3)}^{5}, {(2, 4)}^{6}},

where by

{(i, j)}^{t}

, we mean t compositions of the form

(i, j)

, namely, compositions of i zeros and j ones.
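
Definition 1 and Example 1 can be checked mechanically. The following Python sketch is ours and not part of the original algorithms; the function names are hypothetical, and a `Counter` is assumed as the representation of a multiset of compositions (zeros, ones).

```python
from collections import Counter

def prefix_suffix_compositions(t: str) -> Counter:
    """Multiset M(t): compositions (zeros, ones) of every prefix and suffix of t."""
    comps = Counter()
    n = len(t)
    for j in range(1, n + 1):
        # t[:j] is the length-j prefix, t[n - j:] is the length-j suffix
        for piece in (t[:j], t[n - j:]):
            ones = piece.count("1")
            comps[(j - ones, ones)] += 1
    return comps

def compositions_of_multiset(strings) -> Counter:
    """Multiset union M(U) over all strings in U, with repeated strings counted."""
    total = Counter()
    for t in strings:
        total += prefix_suffix_compositions(t)
    return total

M_t = prefix_suffix_compositions("110101")
M_U = compositions_of_multiset(["110101", "110101", "101110"])
for comp, mult in sorted(M_U.items()):
    print(comp, mult)
```

Each string of length n contributes 2n compositions, so the multiplicities printed for U should sum to 2nh = 36.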

Note that different multisets may result in the same prefix–suffix compositions. For example, reversing a string in a multiset gives rise to a different multiset that has the same prefix–suffix compositions.

Definition 2.

Let U and V be two multisets of strings. The multiset V is said to be a reversal of U, denoted by

V \sim U

, if

| V | = | U |

, and for any string

t \in U

the sum of the multiplicities of

t

and

\overset{\leftarrow}{t}

in U equals the sum of the multiplicities of

t

and

\overset{\leftarrow}{t}

in V. The collection of multisets that are reversals of U forms an equivalence class, denoted by

[U]

, i.e.,

[U] : = {V ∣ V \sim U}

.
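
One way to test the relation of Definition 2 in practice is to map every string to a canonical representative of the pair formed by the string and its reversal, and then compare the resulting multisets. The sketch below assumes the lexicographically smaller of the two as the representative; this choice and the function names are ours.

```python
from collections import Counter

def canonical(t: str) -> str:
    """Representative of the pair {t, reversal of t}: the lexicographically smaller one."""
    return min(t, t[::-1])

def is_reversal(V, U) -> bool:
    """True if the multiset V is a reversal of the multiset U (Definition 2)."""
    return Counter(map(canonical, V)) == Counter(map(canonical, U))

U = ["110101", "110101", "101110"]
print(is_reversal(["101011", "110101", "011101"], U))
print(is_reversal(["110110", "110101", "101101"], U))
```

Equality of the two counters forces equal cardinalities and, for every string, equal combined multiplicities of the string and its reversal, which is the condition in the definition.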

Given the prefix–suffix compositions of a multiset H of

h \geq 1

binary strings of length n, we are interested in finding the h strings in H. The only constraint for the strings in H at this point is the length, and later, we will restrict them to the same weight. We do not impose any other constraints on H; in particular, H is allowed to have a string and its reversal at the same time. In the sequel, we denote

M : = M (H)

for simplicity. Clearly, any reversal of H has prefix–suffix compositions M. However, there may exist multisets that are not reversals of H but have the same compositions as H. If a multiset has prefix–suffix compositions M, we say the multiset is compatible with M. Let

\mathcal{H} : = {[U] ∣ M (U) = M}

be the collection of all equivalence classes whose members are compatible with M. We say that H can be uniquely reconstructed up to reversal if and only if

| \mathcal{H} | = 1

.

As we will see later, it is helpful to present the information provided by each composition, i.e., the length and weight of the corresponding substring, on a two-dimensional grid. This motivates the following notation. Note that since

| H | = h

, there are

2 n h

compositions in M. Denote the grid by

T : = {(l, m) ∣ l \in 〚 n 〛, m \in [2 h]} .

Assume the strings in H are given by

h_{j}

,

j = 1, \dots, h

. Then, one can record

wt (h_{j} [l])

on the grid T with coordinates

(l, 2 j - 1)

and

wt (h_{j} [- l])

on grid T with coordinates

(l, 2 j)

. Therefore, the task of reconstructing H from M amounts to correctly identifying the second coordinate of

(l, m) \in T

, i.e., the label of the string in H, based on the weights of the prefixes and suffixes. To this end, we define an integer-valued bivariate function on T, as stated below.

Definition 3.

A function

f : T \to 〚 n 〛

is called a cumulative weight function (CWF) if it satisfies the following conditions:

(i): $f (0, m) = 0$ for any $m \in [2 h]$ ;
(ii): $f (l, m) - f (l - 1, m) \in {0, 1}$ for any $(l, m) \in [n] \times [2 h]$ ;
(iii): for each $j \in [h]$ , there exists $w_{j} \in 〚 n 〛$ such that $f (l, 2 j - 1) + f (n - l, 2 j) = w_{j}$ for all $l \in 〚 n 〛$ .

If

w_{j} = \bar{w} \in 〚 n 〛

for all

j \in [h]

, then f is said to be a constant-weight CWF or have constant weight

\bar{w}

.

It is clear that a CWF can be induced by the weights of the prefixes and suffixes of the strings in H. In particular, Item (iii) in Definition 3 is satisfied by taking

w_{j} = wt (h_{j})

. At the same time, a CWF also identifies a multiset of h strings because one can reconstruct string

h_{j}

using the weights of the prefixes given by

{f (l, 2 j - 1) ∣ l \in 〚 n 〛}

straightforwardly.

Definition 4.

Let

f : T \to 〚 n 〛

be a CWF. The multiset

H_{f} : = {t_{j} = t_{j, 1} \dots t_{j, n} ∣ t_{j, l} = f (l, 2 j - 1) - f (l - 1, 2 j - 1) \text{ for all } l \in [n], j \in [h]}

is called the multiset of strings corresponding to CWF f.

Note that a CWF f uniquely determines

H_{f}

(with an ordering of the strings induced by f), and the multiset H (or any of its reversals) induces CWFs that are equivalent up to permutation of the ordering of the strings in H. Therefore, one may use CWFs as a proxy for analyzing the reconstructibility of H based on M. By definition, a CWF

f (l, m)

consists of

2 h

univariate functions obtained by fixing the variable

m \in [2 h]

. It is convenient to deal with these component functions directly.

Definition 5.

Let

f : T \to 〚 n 〛

be a CWF. For

m \in [2 h]

, let

f_{m} : 〚 n 〛 \to 〚 n 〛

be the function given by

f_{m} (l) = f (l, m)

.

Example 2.

Consider the multiset

U = {110101, 110101, 101110}

given in Example 1. The values of the CWF

f : 〚 6 〛 \times [6] \to 〚 6 〛

induced by U are given in Table 1. The graphs of the component functions

f_{1}

,

f_{2}

, …,

f_{6}

are shown in Figure 1.
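
To make Definitions 3 to 5 concrete, the sketch below builds the CWF induced by a list of strings, checks Items (i) to (iii) of Definition 3, and recovers the strings as in Definition 4. The dictionary layout (label m mapped to the list of values of f_m on 0, ..., n) and the function names are our assumptions.

```python
def running_weights(s: str):
    """Return [wt of the length-0 prefix, ..., wt of the length-n prefix] of s."""
    weights = [0]
    for bit in s:
        weights.append(weights[-1] + int(bit))
    return weights

def induced_cwf(strings):
    """f[2j-1] records prefix weights of the j-th string, f[2j] its suffix weights."""
    f = {}
    for j, t in enumerate(strings, start=1):
        f[2 * j - 1] = running_weights(t)
        f[2 * j] = running_weights(t[::-1])  # suffix weights = prefix weights of the reversal
    return f

def is_cwf(f, n, h) -> bool:
    """Check Items (i)-(iii) of Definition 3."""
    for m in range(1, 2 * h + 1):
        if f[m][0] != 0:
            return False
        if any(f[m][l] - f[m][l - 1] not in (0, 1) for l in range(1, n + 1)):
            return False
    for j in range(1, h + 1):
        sums = {f[2 * j - 1][l] + f[2 * j][n - l] for l in range(n + 1)}
        if len(sums) != 1:  # Item (iii): the sum must be the same w_j for every l
            return False
    return True

def strings_of_cwf(f, n, h):
    """Definition 4: read each string off the increments of f_{2j-1}."""
    return ["".join(str(f[2 * j - 1][l] - f[2 * j - 1][l - 1]) for l in range(1, n + 1))
            for j in range(1, h + 1)]

U = ["110101", "110101", "101110"]
f = induced_cwf(U)
for m in sorted(f):
    print(m, f[m])
print(is_cwf(f, 6, 3), strings_of_cwf(f, 6, 3))
```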

By Item (iii) of Definition 3, if f is the CWF induced by H, then

f_{2 j - 1}

and

f_{2 j}

record the weight information starting from the two ends of the same string in H. In other words,

2 j - 1

and

2 j

refer to the same string. As we will be constantly relating

f_{2 j - 1}

to

f_{2 j}

, or the other way around, let us introduce the following definition for notational convenience.

Definition 6.

Let

m \in [2 h]

. Define

m^{*} \in [2 h]

by

m^{*} = \begin{cases} m - 1, & \text{if } m \text{ is even}, \\ m + 1, & \text{if } m \text{ is odd}. \end{cases}

The problem of reconstructing a single string from its prefix–suffix compositions, i.e., the case where

h = 1

in our setting, is examined in [7]. The authors of [7] introduced the so-called swap operation for a string

t

to generate all the strings that have the same prefix–suffix compositions as

t

, thereby deducing the conditions for a single string to be reconstructed uniquely up to reversal from prefix–suffix compositions. Specifically, the swap operation is performed on carefully chosen coordinates where

t

and

\overset{\leftarrow}{t}

disagree, so as to produce new strings that maintain the same prefix–suffix compositions. Let f be the CWF induced by

{t}

and let

f_{1}, f_{2}

correspond to

t

and

\overset{\leftarrow}{t}

, respectively. Using the language of CWFs, the swap operation should be performed over the domain where

f_{1}

and

f_{2}

take different values. Since

f_{1}, f_{2}

capture the running weight information from the two ends of the same string

t

, they must be 180-degree rotationally symmetric. More precisely, if

wt (t) = \bar{w}

, then

f_{1}

should be the same as

f_{2}

when it is rotated 180 degrees about

(n / 2, \bar{w} / 2)

. With this observation, it follows that if

f_{1}^{'}

and

f_{2}^{'}

are the functions corresponding to the strings obtained by swapping bits of

t

with

\overset{\leftarrow}{t}

, then

f_{1}^{'}, f_{2}^{'}

must be 180-degree rotationally symmetric for them to record the weight information from the two ends of a single string.

Generalizing the idea of comparing

t

and

\overset{\leftarrow}{t}

for producing new strings, we introduce the notions of discrepancy and maximal intervals between functions

f_{m_{1}}

and

f_{m_{2}}

for any

m_{1}, m_{2} \in [2 h]

and

h \geq 1

as follows.

Definition 7.

For

m_{1}, m_{2} \in [2 h]

, define the discrepancy between the functions

f_{m_{1}}

and

f_{m_{2}}

to be the set

D (m_{1}, m_{2}) : = {l \in [n] ∣ f_{m_{1}} (l) \neq f_{m_{2}} (l)}

. For

k_{1}, k_{2} \in [n]

, the set

I : = [k_{1}, k_{2}] \subset [n]

is called a maximal interval (of the discrepancy) between

f_{m_{1}}

and

f_{m_{2}}

, if I is a nonempty set such that

I \subset D (m_{1}, m_{2})

,

k_{1} - 1 \notin D (m_{1}, m_{2})

, and

k_{2} + 1 \notin D (m_{1}, m_{2})

.

Due to Item (iii) of Definition 3, the maximal intervals between

f_{m}, f_{m^{*}}

exhibit symmetry about

n / 2

, as shown in the next proposition.

Proposition 1.

Let

[k_{1}, k_{2}] \subset 〚 n 〛

be a maximal interval between

f_{m}

and

f_{m^{*}}

. If

k_{2} + 1 < n - k_{2}

, i.e.,

k_{2} < ⌊ n / 2 ⌋

, then

[n - k_{2}, n - k_{1}]

is another maximal interval between

f_{m}

and

f_{m^{*}}

. Similarly, if

k_{1} > ⌈ n / 2 ⌉

, then

[n - k_{2}, n - k_{1}]

is another maximal interval between

f_{m}

and

f_{m^{*}}

. If

k_{1} \leq ⌈ n / 2 ⌉

and

k_{2} \geq ⌊ n / 2 ⌋

, then it is necessary that

k_{2} = n - k_{1}

and so

k_{1} \leq ⌊ n / 2 ⌋

and

k_{2} \geq ⌈ n / 2 ⌉

.

Proof.

Since

f_{m} (l) \neq f_{m^{*}} (l)

for

l \in [k_{1}, k_{2}]

, by Item (iii) of Definition 3, we have

f_{m} (l) \neq f_{m^{*}} (l)

for

l \in [n - k_{2}, n - k_{1}]

. The proposition follows by inspecting the intersection of

[k_{1}, k_{2}]

and

[n - k_{2}, n - k_{1}]

. □

As the focus of this paper is on constant-weight strings, let us mention the following simple observation for constant-weight CWFs, which is also a consequence of Item (iii) of Definition 3.

Proposition 2.

Assume f is a constant-weight CWF. Let

m_{1}, m_{2} \in [2 h]

and

k_{1}, k_{2} \in [n - 1]

. If

[k_{1}, k_{2}] \subset [n - 1]

is a maximal interval between

f_{m_{1}}

and

f_{m_{2}}

, then

[n - k_{2}, n - k_{1}]

is a maximal interval between

f_{m_{1}^{*}}

and

f_{m_{2}^{*}}

.

Example 3.

Continuing Example 2, the CWF f has constant weight since it is induced by the multiset U in which all strings have weight 4. As shown in Figure 1, we observe the following:

The set $[1, 2]$ is a maximal interval between $f_{1}$ and $f_{6}$ , and $[4, 5]$ is also a maximal interval between $f_{2}$ and $f_{5}$ , as asserted by Proposition 2;
The set ${1}$ is a maximal interval between $f_{2}$ and $f_{6}$ , and ${5}$ is also a maximal interval between $f_{1}$ and $f_{5}$ , as asserted by Proposition 2.
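
The maximal intervals listed in Example 3 can be recomputed directly from Definition 7. The sketch below is an illustration with hypothetical function names; it represents each component function f_m as the list of its values on 0, ..., n and each maximal interval as a pair (k_1, k_2).

```python
def running_weights(s: str):
    weights = [0]
    for bit in s:
        weights.append(weights[-1] + int(bit))
    return weights

def induced_cwf(strings):
    f = {}
    for j, t in enumerate(strings, start=1):
        f[2 * j - 1] = running_weights(t)        # prefix weights
        f[2 * j] = running_weights(t[::-1])      # suffix weights
    return f

def discrepancy(fa, fb):
    """D(m1, m2): the lengths l in [n] at which the two component functions differ."""
    return [l for l in range(1, len(fa)) if fa[l] != fb[l]]

def maximal_intervals(fa, fb):
    """Group the discrepancy into maximal runs of consecutive integers."""
    intervals = []
    for l in discrepancy(fa, fb):
        if intervals and intervals[-1][1] == l - 1:
            intervals[-1][1] = l          # extend the current run
        else:
            intervals.append([l, l])      # start a new run
    return [tuple(iv) for iv in intervals]

f = induced_cwf(["110101", "110101", "101110"])
print(maximal_intervals(f[1], f[6]), maximal_intervals(f[2], f[5]))
print(maximal_intervals(f[2], f[6]), maximal_intervals(f[1], f[5]))
```

For this multiset, each interval [k_1, k_2] found between f_{m_1} and f_{m_2} is mirrored by [n - k_2, n - k_1] between the starred functions, in line with Proposition 2.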

Next, we introduce the notion of swap between functions

f_{m_{1}}

and

f_{m_{2}}

for any

m_{1}, m_{2} \in [2 h]

that ensures the resulting component functions still form a CWF. In view of Proposition 1, the swap operation has to be defined properly so that the symmetries of

f_{m_{1}}, f_{m_{1}^{*}}

and

f_{m_{2}}, f_{m_{2}^{*}}

are preserved after swapping, i.e., Item (iii) of Definition 3 is still satisfied by the new functions obtained after swapping.

Definition 8.

Let f be a CWF,

m_{1}, m_{2} \in [2 h]

, and

I \subset 〚 n 〛

be a maximal interval between

f_{m_{1}}

and

f_{m_{2}}

. Let g be the CWF obtained from f by swapping the image of

(l, m_{1})

under f for that of

(l, m_{2})

, and the image of

(n - l, m_{1}^{*})

under f for that of

(n - l, m_{2}^{*})

for all

l \in I

. More precisely, if

m_{1}^{*} \neq m_{2}

, then g satisfies

g_{m} = f_{m}

for

m \in [2 h] ∖ {m_{1}, m_{1}^{*}, m_{2}, m_{2}^{*}}

and

\begin{aligned} g_{m_{1}} (l) & = \begin{cases} f_{m_{2}} (l), & l \in I \\ f_{m_{1}} (l), & l \notin I \end{cases}, & g_{m_{1}^{*}} (n - l) & = \begin{cases} f_{m_{2}^{*}} (n - l), & l \in I \\ f_{m_{1}^{*}} (n - l), & l \notin I \end{cases}, \\ g_{m_{2}} (l) & = \begin{cases} f_{m_{1}} (l), & l \in I \\ f_{m_{2}} (l), & l \notin I \end{cases}, & g_{m_{2}^{*}} (n - l) & = \begin{cases} f_{m_{1}^{*}} (n - l), & l \in I \\ f_{m_{2}^{*}} (n - l), & l \notin I \end{cases} . \end{aligned}

If

m_{1}^{*} = m_{2}

, then g satisfies

g_{m} = f_{m}

for

m \in [2 h] ∖ {m_{1}, m_{1}^{*}}

and writing

I = [k_{1}, k_{2}], \bar{I} : = [n - k_{2}, n - k_{1}]

, we have

\begin{aligned} g_{m_{1}} (l) & = \begin{cases} f_{m_{2}} (l), & l \in I \cup \bar{I} \\ f_{m_{1}} (l), & l \notin I \cup \bar{I} \end{cases}, & g_{m_{1}^{*}} (n - l) & = \begin{cases} f_{m_{2}^{*}} (n - l), & l \in I \cup \bar{I} \\ f_{m_{1}^{*}} (n - l), & l \notin I \cup \bar{I} \end{cases} . \end{aligned}

Denote the mapping

(f, I, m_{1}, m_{2}) \mapsto g

by ϕ.

Example 4.

Continuing Example 3, we perform the swap operation on

f_{1}, f_{2}, f_{5}, f_{6}

:

Let $g = ϕ (f, I = [1, 2], m_{1} = 1, m_{2} = 6)$ . Then, the multiset corresponding to g is $H_{g} = {011101, 110101, 110101}$ by Definition 4. Observe in Table 1 and Figure 1 that $f_{1} (l) = f_{6} (l)$ for all $l \in 〚 6 〛 ∖ [1, 2]$ , so $g_{1} = f_{6}$ . This explains the fact that the first string in $H_{g}$ is the reversal of the third string in U. If we define $g^{'} = ϕ (f, I = {1}, m_{1} = 5, m_{2} = 6)$ , then $H_{g} = H_{g^{'}}$ .
Let $g = ϕ (f, I = {1}, m_{1} = 2, m_{2} = 6)$ . Then, the multiset corresponding to g is $H_{g} = {110110, 110101, 101101}$ by Definition 4. Observe in Table 1 and Figure 1 that $f_{2} (1) = f_{5} (1)$ , $f_{1} (5) = f_{6} (5)$ , and $f_{5} (l) = f_{6} (l)$ for all $l \in 〚 6 〛 ∖ ({1} \cup {5})$ . By the swap operation, we have $g_{6} (1) = f_{2} (1) = f_{5} (1)$ and $g_{5} (5) = f_{1} (5) = f_{6} (5)$ . Thus, $g_{5} = g_{6}$ . Indeed, the third string in $H_{g}$ is a palindrome.
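
The swap operation of Definition 8 can be sketched in a few lines. The code below is a minimal illustration under the assumption that f has constant weight (as in Examples 2 to 4); the names `swap`, `star`, `strings_of`, and `compositions_of` are ours. Intervals are passed as pairs (k_1, k_2), and the caller is assumed to supply a maximal interval between f_{m_1} and f_{m_2}.

```python
from collections import Counter

def running_weights(s: str):
    weights = [0]
    for bit in s:
        weights.append(weights[-1] + int(bit))
    return weights

def induced_cwf(strings):
    f = {}
    for j, t in enumerate(strings, start=1):
        f[2 * j - 1] = running_weights(t)
        f[2 * j] = running_weights(t[::-1])
    return f

def star(m: int) -> int:
    """Definition 6: the label of the other end of the same string."""
    return m - 1 if m % 2 == 0 else m + 1

def swap(f, interval, m1, m2):
    """Sketch of the map (f, I, m1, m2) -> g from Definition 8."""
    n = len(f[m1]) - 1
    k1, k2 = interval
    g = {m: list(vals) for m, vals in f.items()}
    if star(m1) != m2:
        for l in range(k1, k2 + 1):
            g[m1][l], g[m2][l] = f[m2][l], f[m1][l]
            s1, s2 = star(m1), star(m2)
            g[s1][n - l], g[s2][n - l] = f[s2][n - l], f[s1][n - l]
    else:
        # m2 is the other end of the same string: swap on I and on its mirror image
        for l in set(range(k1, k2 + 1)) | set(range(n - k2, n - k1 + 1)):
            g[m1][l], g[m2][l] = f[m2][l], f[m1][l]
    return g

def strings_of(f):
    h, n = len(f) // 2, len(f[1]) - 1
    return ["".join(str(f[2 * j - 1][l] - f[2 * j - 1][l - 1]) for l in range(1, n + 1))
            for j in range(1, h + 1)]

def compositions_of(f):
    """Multiset of compositions (zeros, ones) recorded by all component functions."""
    return Counter((l - fm[l], fm[l]) for fm in f.values() for l in range(1, len(fm)))

f = induced_cwf(["110101", "110101", "101110"])
g = swap(f, (1, 1), 2, 6)
print(strings_of(g), compositions_of(g) == compositions_of(f))
```

Because the odd-labelled component of a string may end up recording it from the opposite end after a swap, outputs of `strings_of` are best compared with the multisets in Example 4 up to reversal.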

The ideas of maximal intervals and swapping are particularly helpful in establishing the necessity of the conditions for unique reconstruction as we will see in Section 3.1.

If f is constant weight, then by the rotational symmetry of

f_{m}

and

f_{m^{*}}

, the behavior of f is completely characterized by

(l, f_{m} (l))

for all

l \leq ⌊ n / 2 ⌋

and

m \in [2 h]

. This motivates us to look at the “median weight” of the component functions

{f_{m}}

, introduced below.

Definition 9.

For

m \in [2 h]

, the median weight of

f_{m}

is defined to be

med (f_{m}) = \frac{1}{2} (f_{m} (⌊ n / 2 ⌋) + f_{m} (⌈ n / 2 ⌉)) \in \mathbb{R}

. For

w \in \mathbb{R}

, let

A_{f} (w) = {m \in [2 h] ∣ med (f_{m}) = w}

be the set of labels of the component functions

{f_{m}}

for which the median weight is w. If f is clear from the context, denote

A_{f} (w)

by

A (w)

for simplicity.

The set

A (w)

plays an important role in showing the sufficiency of the conditions for the unique reconstruction in Section 3.2. Note that if f has constant weight

\bar{w}

, then

| A_{f} (\bar{w} / 2) |

must be even. In fact, for any

m \in [2 h]

, we have

m \in A (\bar{w} / 2)

if and only if

m^{*} \in A (\bar{w} / 2)

, due to the 180-degree rotational symmetry of

f_{m}

and

f_{m^{*}}

about

(n / 2, \bar{w} / 2)

.
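
As a small illustration of Definition 9, the sketch below computes median weights exactly with `Fraction` (the median weight can be a half-integer when n is odd) and collects the label set A(w). The function names and the list representation of f_m are our assumptions.

```python
from fractions import Fraction

def running_weights(s: str):
    weights = [0]
    for bit in s:
        weights.append(weights[-1] + int(bit))
    return weights

def induced_cwf(strings):
    f = {}
    for j, t in enumerate(strings, start=1):
        f[2 * j - 1] = running_weights(t)
        f[2 * j] = running_weights(t[::-1])
    return f

def median_weight(fm) -> Fraction:
    """med(f_m) = (f_m(floor(n/2)) + f_m(ceil(n/2))) / 2."""
    n = len(fm) - 1
    return Fraction(fm[n // 2] + fm[(n + 1) // 2], 2)

def labels_with_median(f, w):
    """A_f(w): labels m whose component function has median weight w."""
    return {m for m, fm in f.items() if median_weight(fm) == w}

f = induced_cwf(["110101", "110101", "101110"])   # n = 6, constant weight 4
print(labels_with_median(f, 2))

g = induced_cwf(["11010", "10101"])               # n = 5, constant weight 3
print({m: median_weight(fm) for m, fm in g.items()})
print(labels_with_median(g, Fraction(3, 2)))
```

In both toy inputs the set of labels with median weight equal to half the constant weight is closed under m ↦ m*, so its size is even, as noted above.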

As mentioned earlier, CWFs may be used as a proxy for reconstructing multisets given M. In fact, our reconstruction algorithms presented in Section 4 essentially reconstruct CWFs whose corresponding multisets are compatible with M, and such CWFs are said to be “solutions” to M.

Definition 10.

A CWF

f : T \to 〚 n 〛

is called a solution to the composition multiset M if the multiset equality

M = {(l - f_{m} (l), f_{m} (l)) ∣ m \in [2 h], l \in [n]}

holds.

Remark 1.

If f is a solution to M and

I \subset 〚 n 〛

is a maximal interval between

f_{m_{1}}

and

f_{m_{2}}

, then

g = ϕ (f, I, m_{1}, m_{2})

is also a solution to M.

In order to recover all multisets of strings compatible with M, it suffices to find all CWF solutions to M. Therefore, it is helpful to establish connections between multiset M and CWF f, which is what we will do next.

Definition 11.

Let

f : T \to 〚 n 〛

be a CWF. For

(l, w) \in 〚 n 〛^{2}

, let

A_{f} (l, w) = {m \in [2 h] ∣ f_{m} (l) = w}

. When the underlying CWF f is clear from the context, denote

A_{f} (l, w)

by

A (l, w)

for simplicity.

Definition 12.

For

(l, w) \in 〚 n 〛^{2}

, let

a_{l, w}

be the number of pairs

(l - w, w)

in M if

(l, w) \neq (0, 0)

and define

a_{0, 0} = 2 h

.

Remark 2.

By Definitions 11 and 12,

| A (l, w) |

is the number of functions in

{f_{m}}

that satisfy

f_{m} (l) = w

, and

a_{l, w}

is the number of length-l prefixes and suffixes of weight w. Therefore, by Definition 10, a CWF f is a solution to M if and only if

| A (l, w) | = a_{l, w}

for all

(l, w) \in 〚 n 〛^{2}

.
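
The criterion of Remark 2 translates directly into a check. In the sketch below, M is given as a `Counter` of compositions (zeros, ones), the numbers a_{l,w} are tabulated as in Definition 12, and a candidate CWF is accepted when the sizes of the sets A(l, w) match them for all (l, w). The function names are hypothetical.

```python
from collections import Counter

def running_weights(s: str):
    weights = [0]
    for bit in s:
        weights.append(weights[-1] + int(bit))
    return weights

def induced_cwf(strings):
    f = {}
    for j, t in enumerate(strings, start=1):
        f[2 * j - 1] = running_weights(t)
        f[2 * j] = running_weights(t[::-1])
    return f

def counts_from_compositions(M: Counter, h: int) -> Counter:
    """a_{l,w}: number of pairs (l - w, w) in M, with a_{0,0} = 2h by convention."""
    a = Counter({(0, 0): 2 * h})
    for (zeros, ones), mult in M.items():
        a[(zeros + ones, ones)] += mult
    return a

def is_solution(f, M: Counter, h: int) -> bool:
    """Remark 2: f solves M iff |A(l, w)| = a_{l,w} for all (l, w)."""
    n = len(f[1]) - 1
    sizes = Counter((l, fm[l]) for fm in f.values() for l in range(n + 1))
    return sizes == counts_from_compositions(M, h)

# The multiset M(U) from Example 1, keyed by (zeros, ones)
M = Counter({(0, 1): 5, (1, 0): 1, (0, 2): 2, (1, 1): 4, (1, 2): 6,
             (1, 3): 4, (2, 2): 2, (1, 4): 1, (2, 3): 5, (2, 4): 6})
print(is_solution(induced_cwf(["110101", "110101", "101110"]), M, 3))
print(is_solution(induced_cwf(["110110", "110101", "101101"]), M, 3))
print(is_solution(induced_cwf(["110101", "110101", "110110"]), M, 3))
```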

By Remark 2, to find a solution f to M, one may plot the elements of the multiset M on a two-dimensional grid (see Figure 2 for an example) and construct a CWF f whose component functions pass through the point

(l, w)

exactly

a_{l, w}

times on the grid. Below, we mention a few basic properties of

A (l, w)

and

a_{l, w}

that immediately follow from Definitions 11 and 12.

Proposition 3.

(i): For $l \in [n]$ and $w_{1}, w_{2} \in 〚 n 〛$ , if $w_{1} \neq w_{2}$ , then $A (l, w_{1}) \cap A (l, w_{2}) = \emptyset$ .
(ii): For $(l, w) \in 〚 n - 1 〛^{2}$ , it holds that $A (l, w) \subset A (l + 1, w) \cup A (l + 1, w + 1)$ .
(iii): For $(l, w) \in {[n]}^{2}$ , it holds that $A (l, w) \subset A (l - 1, w) \cup A (l - 1, w - 1)$ .

Note that (ii) (resp., (iii)) of Proposition 3 simply says that the weight of a substring cannot decrease (resp., increase) if its length increases (resp., decreases). As mentioned previously, a solution f to M must pass through

(l, w)

exactly

a_{l, w}

times. To further assist in finding such CWFs, we will be interested in the number of length-l, weight-w prefixes and suffixes whose weight remains the same when the length decreases by one, and the number of those whose weight decreases by one along with the length. They are denoted by

b_{l, w}

and

c_{l, w}

, introduced in the next definition.

Definition 13.

Let f be a solution to M. For all

(l, w) \in {[n]}^{2}

, define

b_{l, w} = | A (l, w) \cap A (l - 1, w) |

and

c_{l, w} = | A (l, w) \cap A (l - 1, w - 1) |

. Moreover, define

b_{l, 0} = | A (l, 0) |

,

c_{l, 0} = 0

for all

l \in [n]

.

Proposition 4.

Let f be a solution to M. The numbers

{b_{l, w}, c_{l, w} ∣ (l, w) \in [n] \times 〚 n 〛; w \leq l}

can be computed from the numbers

{a_{l, w} ∣ (l, w) \in 〚 n 〛^{2}; w \leq l}

.

Proof.

Since f is a solution to M, we have

a_{l, w} = | A (l, w) |

. By Definition 13,

b_{l, l} = 0

,

c_{l, l} = a_{l, l}

for all

l \in [n]

. It remains to find

b_{l, w}, c_{l, w}

where

0 \leq w \leq l - 1

. By (i) of Proposition 3,

A (l - 1, w)

and

A (l - 1, w - 1)

are disjoint. Therefore,

b_{l, w} + c_{l, w} \leq a_{l, w}

. At the same time, by (iii) of Proposition 3, we have

a_{l, w} \leq b_{l, w} + c_{l, w}

, and thus

\begin{matrix} a_{l, w} = b_{l, w} + c_{l, w}, (l, w) \in [n] \times 〚 n 〛 . \end{matrix}

(1)

Using (ii) of Proposition 3, we have

a_{l, w} \leq b_{l + 1, w} + c_{l + 1, w + 1}

. By (i) of Proposition 3,

A (l + 1, w)

and

A (l + 1, w + 1)

are disjoint, so we also have

b_{l + 1, w} + c_{l + 1, w + 1} \leq a_{l, w}

. Therefore,

\begin{matrix} a_{l, w} = b_{l + 1, w} + c_{l + 1, w + 1}, (l, w) \in 〚 n - 1 〛^{2} . \end{matrix}

(2)

It follows from (2) that

b_{l, l - 1} = a_{l - 1, l - 1} - c_{l, l}

for

l \in [n]

. Using (1), we obtain

c_{l, l - 1} = a_{l, l - 1} - b_{l, l - 1}

for

l \in [n]

. Therefore, we have found

{b_{l, l - 1}, c_{l, l - 1} ∣ l \in [n]}

. Next, from (2) and (1), we have

b_{l, l - 2} = a_{l - 1, l - 2} - c_{l, l - 1}

and

c_{l, l - 2} = a_{l, l - 2} - b_{l, l - 2}

for

l \in [2, n]

. Repeating this process, we can determine

{b_{l, l - i}, c_{l, l - i} ∣ l \in [i, n]}

for all

i \in [n]

. □

Remark 3.

As a consequence of Proposition 4, the numbers

{b_{l, w}}

and

{c_{l, w}}

can be found by inspecting M, and thus, they are properties of M in the sense that all solutions to M result in the same

{b_{l, w}}

and

{c_{l, w}}

. In fact, from the recursive procedures in the above proof, for

w \leq l

, we have

\begin{matrix} b_{l, w} & = \sum_{v = w}^{l - 1} a_{l - 1, v} - \sum_{v = w + 1}^{l} a_{l, v}, \end{matrix}

(3)

\begin{matrix} c_{l, w} & = \sum_{v = w}^{l} a_{l, v} - \sum_{v = w}^{l - 1} a_{l - 1, v} . \end{matrix}

(4)

Since

b_{l, w} \geq 0, c_{l, w} \geq 0

, it follows that for all

w \leq l

\begin{matrix} \sum_{v = w}^{l - 1} a_{l - 1, v} & \geq \sum_{v = w + 1}^{l} a_{l, v}, \end{matrix}

(5)

\begin{matrix} \sum_{v = w}^{l} a_{l, v} & \geq \sum_{v = w}^{l - 1} a_{l - 1, v} . \end{matrix}

(6)
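
Formulas (3) and (4) can be evaluated directly from the numbers a_{l,w}. The sketch below does so and, as a sanity check on a toy input, compares the result with a direct count over the component functions of a known solution. The helper names are ours, and a_{l,w} is assumed to be stored in a dictionary keyed by (l, w) with a_{0,0} = 2h.

```python
from collections import Counter

def running_weights(s: str):
    weights = [0]
    for bit in s:
        weights.append(weights[-1] + int(bit))
    return weights

def induced_cwf(strings):
    f = {}
    for j, t in enumerate(strings, start=1):
        f[2 * j - 1] = running_weights(t)
        f[2 * j] = running_weights(t[::-1])
    return f

def counts_a(f):
    """a_{l,w} = |A(l, w)| for a CWF f that is a solution (includes a_{0,0} = 2h)."""
    return Counter((l, fm[l]) for fm in f.values() for l in range(len(fm)))

def b_and_c(a, n):
    """Evaluate (3) and (4) for all 1 <= l <= n and 0 <= w <= l."""
    b, c = {}, {}
    for l in range(1, n + 1):
        for w in range(l + 1):
            prev_tail = sum(a.get((l - 1, v), 0) for v in range(w, l))        # v = w..l-1
            cur_tail = sum(a.get((l, v), 0) for v in range(w + 1, l + 1))     # v = w+1..l
            b[(l, w)] = prev_tail - cur_tail                                  # Equation (3)
            c[(l, w)] = a.get((l, w), 0) + cur_tail - prev_tail               # Equation (4)
    return b, c

def b_and_c_direct(f, n):
    """Count directly from Definition 13, using one known solution f."""
    b, c = Counter(), Counter()
    for fm in f.values():
        for l in range(1, n + 1):
            if fm[l] == fm[l - 1]:
                b[(l, fm[l])] += 1    # weight unchanged when the length drops to l - 1
            else:
                c[(l, fm[l])] += 1    # weight drops by one together with the length
    return b, c

f = induced_cwf(["110101", "110101", "101110"])
a = counts_a(f)
b, c = b_and_c(a, 6)
print(b[(1, 0)], c[(1, 1)], b[(6, 4)], c[(6, 4)])
```

The values depend only on the numbers a_{l,w}, so any other solution to the same M would lead to the same tables, as stated in Remark 3.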

The numbers

{a_{l, w}}, {b_{l, w}}, {c_{l, w}}

are instrumental in analyzing the possible behaviors of the component functions

{f_{m}}

in Section 4. Before proceeding to present our main results, we summarize some important notation introduced in this section in Table 2 for ease of reference.

3. Necessary and Sufficient Conditions for Unique Reconstruction

In this section, we assume H is a multiset of h strings of length n and weight

\bar{w}

. The main result of this section is stated in the following theorem.

Theorem 1.

Let f be a solution to M. There is exactly one multiset of strings (up to reversal) compatible with M, i.e.,

| \mathcal{H} | = 1

, if and only if f satisfies the following conditions:

(i): For any $m_{1}, m_{2} \in [2 h]$ with $m_{1}^{*} = m_{2}$ , there exist at most two maximal intervals between $f_{m_{1}}$ and $f_{m_{2}}$ .
(ii): For any $m_{1}, m_{2} \in [2 h]$ with $m_{1}^{*} \neq m_{2}$ , there exists at most one maximal interval between $f_{m_{1}}$ and $f_{m_{2}}$ .
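
Conditions (i) and (ii) of Theorem 1 are easy to test once a solution f is available. The following sketch checks them for the CWF induced by a given list of constant-weight strings; it is only an illustration with hypothetical function names and does not implement the reconstruction algorithms of Section 4.

```python
from itertools import combinations

def running_weights(s: str):
    weights = [0]
    for bit in s:
        weights.append(weights[-1] + int(bit))
    return weights

def induced_cwf(strings):
    f = {}
    for j, t in enumerate(strings, start=1):
        f[2 * j - 1] = running_weights(t)
        f[2 * j] = running_weights(t[::-1])
    return f

def star(m: int) -> int:
    return m - 1 if m % 2 == 0 else m + 1

def count_maximal_intervals(fa, fb) -> int:
    """Each maximal interval starts where the functions differ right after agreeing."""
    return sum(1 for l in range(1, len(fa))
               if fa[l] != fb[l] and fa[l - 1] == fb[l - 1])

def satisfies_theorem1_conditions(strings) -> bool:
    """Assumes all strings have the same length and the same weight."""
    f = induced_cwf(strings)
    for m1, m2 in combinations(sorted(f), 2):
        limit = 2 if star(m1) == m2 else 1   # condition (i) versus condition (ii)
        if count_maximal_intervals(f[m1], f[m2]) > limit:
            return False
    return True

for H in (["011101"], ["010101"], ["1100", "1100"], ["110101", "110101", "101110"]):
    print(H, satisfies_theorem1_conditions(H))
```

For the multiset U of Example 1, the check fails because f_2 and f_6 have two maximal intervals, which is consistent with the new multiset obtained by swapping in Example 4.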

3.1. Necessity

To give a rough idea of why the conditions in Theorem 1 are necessary for unique reconstruction, let us first consider some simple examples for the case where there is a single string

t

. Suppose

t = 011101

and so

\overset{\leftarrow}{t} = 101110

. A string

s = 101110

, which has the same prefix–suffix compositions as

t

, can be obtained by swapping the first two and last two bits of

t

for those of

\overset{\leftarrow}{t}

. Note that

s

is simply

\overset{\leftarrow}{t}

and we only obtain the reversal of

t

after swapping. Using the language of CWFs, let f be the CWF induced by

{t}

and

f_{1}, f_{2}

be the functions corresponding to

t, \overset{\leftarrow}{t}

. We observe that there are only two maximal intervals between

f_{1}

and

f_{2}

.

Next, let us examine an example where we produce a new string by swapping. Take

t = 010101

and so

\overset{\leftarrow}{t} = 101010

. In this case, there are three maximal intervals between the corresponding functions

f_{1}, f_{2}

. Swapping the first two and last two bits of

t

with

\overset{\leftarrow}{t}

, we obtain a new string

s = 100110

. Clearly,

s \neq \overset{\leftarrow}{t}

, and

s

has the same prefix–suffix compositions as

t

.

From the above two examples, one may expect that if there are at least three maximal intervals between

t

and

\overset{\leftarrow}{t}

, then

t

cannot be uniquely reconstructed, and therefore, the existence of at most two maximal intervals is necessary for the unique reconstruction of a single string up to reversal. A similar analysis can also be carried out for two strings that are not reversals of each other, and it turns out that the existence of at most one maximal interval is necessary for the unique reconstruction up to reversal in this case.
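
For a single short string, the two examples above can be confirmed by exhaustive search. The sketch below enumerates all binary strings of the same length and keeps those with the same prefix–suffix compositions; it is a brute-force check with running time exponential in n, intended only for toy inputs, and the function names are ours.

```python
from collections import Counter
from itertools import product

def compositions(t: str) -> Counter:
    """Prefix-suffix compositions M(t) as a multiset of (zeros, ones)."""
    comps = Counter()
    n = len(t)
    for j in range(1, n + 1):
        for piece in (t[:j], t[n - j:]):
            ones = piece.count("1")
            comps[(j - ones, ones)] += 1
    return comps

def compatible_strings(t: str):
    """All strings of length len(t) whose prefix-suffix compositions equal M(t)."""
    target = compositions(t)
    candidates = ("".join(bits) for bits in product("01", repeat=len(t)))
    return sorted(s for s in candidates if compositions(s) == target)

def classes_up_to_reversal(strings):
    """One representative per pair {s, reversal of s}."""
    return sorted({min(s, s[::-1]) for s in strings})

for t in ("011101", "010101"):
    found = compatible_strings(t)
    print(t, found, len(classes_up_to_reversal(found)))
```

In this search, 011101 is compatible only with itself and its reversal, whereas 010101 shares its compositions with 100110 and 011001 as well, giving two classes up to reversal.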

Lemma 1.

Let f be a solution to M and let

m \in [2 h]

. If there exist at least three maximal intervals between

f_{m}

and

f_{m^{*}}

, then

| H | > 1

.

Proof.

Let

I_{1} = [k_{1}, k_{2}], I_{2} = [k_{3}, k_{4}], I_{3} = [k_{5}, k_{6}]

be three maximal intervals between

f_{m}

and

f_{m^{*}}

. Without loss of generality, we may assume

0 < k_{1} \leq k_{2} < k_{3} \leq k_{4} < k_{5} \leq k_{6} < n

. Construct

g = ϕ (f, I_{1}, m, m^{*})

. By Remark 1, g is also a solution to M. Let

{\bar{I}}_{1} = [n - k_{2}, n - k_{1}]

. By construction of g, we have

g_{m} \neq f_{m}

on

I_{1} \cup {\bar{I}}_{1}

. Moreover,

g_{m} \neq f_{m^{*}}

on either

I_{2}

or

I_{3}

since

{\bar{I}}_{1}

cannot equal both of them. Therefore, the string corresponding to

g_{m}

differs from the strings corresponding to

f_{m}, f_{m^{*}}

and we have

[H_{g}] \neq [H_{f}]

. Hence, if there exist at least three maximal intervals between

f_{m}

and

f_{m^{*}}

, then

| H | > 1

. □

Lemma 2.

Let f be a solution to M, and let

m_{1}, m_{2} \in [2 h]

with

m_{1}^{*} \neq m_{2}

. If there exist at least two maximal intervals between

f_{m_{1}}

and

f_{m_{2}}

, then

| H | > 1

.

Proof.

Let

I_{1}

,

I_{2}

be two maximal intervals between

f_{m_{1}}

and

f_{m_{2}}

. Without loss of generality, assume

{⌊ n / 2 ⌋, ⌈ n / 2 ⌉} ⊄ I_{1}

. Construct

g = ϕ (f, I_{1}, m_{1}, m_{2})

. By Remark 1, g is also a solution to M. In the following, we will show that

\begin{matrix} {f_{m_{1}}, f_{m_{1}^{*}}, f_{m_{2}}, f_{m_{2}^{*}}} \neq {g_{m_{1}}, g_{m_{1}^{*}}, g_{m_{2}}, g_{m_{2}^{*}}}, \end{matrix}

(7)

implying

| H | > 1

.

Since

m_{1}^{*} \neq m_{2}

and

I_{1}

is a maximal interval between

f_{m_{1}}

and

f_{m_{2}}

, by construction of g, we have

g_{m_{1}} \neq f_{m_{1}}

, and

I_{1}

is the only maximal interval between

g_{m_{1}}

and

f_{m_{1}}

. We claim

g_{m_{1}} \neq f_{m_{1}^{*}}

also holds. Indeed, if

g_{m_{1}} = f_{m_{1}^{*}}

, then

I_{1}

is the only maximal interval between

f_{m_{1}^{*}}

and

f_{m_{1}}

. However, since

I_{1} ⊅ {⌊ \frac{n}{2} ⌋, ⌈ \frac{n}{2} ⌉}

, according to Proposition 1, there are at least two maximal intervals between

f_{m_{1}^{*}}

and

f_{m_{1}}

, which is a contradiction. Therefore,

g_{m_{1}} \neq f_{m_{1}^{*}}

. So far, we have shown

\begin{matrix} g_{m_{1}} \neq f_{m_{1}}, g_{m_{1}} \neq f_{m_{1}^{*}} . \end{matrix}

(8)

By construction of g, we have

g_{m_{1}} \neq f_{m_{2}}

. If

g_{m_{1}} \neq f_{m_{2}^{*}}

, then (7) holds and we are done.

Consider the case where

g_{m_{1}} = f_{m_{2}^{*}}

. Using arguments similar to those leading to (8), one can obtain

\begin{matrix} g_{m_{2}} \neq f_{m_{2}}, g_{m_{2}} \neq f_{m_{2}^{*}} . \end{matrix}

By construction of g, we also have

g_{m_{2}} \neq f_{m_{1}}

. Next, we would like to show

g_{m_{2}} \neq f_{m_{1}^{*}}

for (7) to hold. Recall that the set

I_{1}

is the only maximal interval between

g_{m_{1}}

and

f_{m_{1}}

. Since

g_{m_{1}} = f_{m_{2}^{*}}

, it follows that

I_{1}

is the only maximal interval between

f_{m_{2}^{*}}

and

f_{m_{1}}

. Write

I_{1} = [k_{1}, k_{2}]

. By Proposition 2, the set

[n - k_{2}, n - k_{1}]

is a maximal interval between

f_{m_{2}}

and

f_{m_{1}^{*}}

, and so

f_{m_{2}} (n - k_{1}) \neq f_{m_{1}^{*}} (n - k_{1})

. Since

I_{1} ⊅ {⌊ \frac{n}{2} ⌋, ⌈ \frac{n}{2} ⌉}

, we have

I_{1} \cap [n - k_{2}, n - k_{1}] = \emptyset

. By construction of g, we have

g_{m_{2}} (n - k_{1}) = f_{m_{2}} (n - k_{1})

and it follows that

g_{m_{2}} (n - k_{1}) \neq f_{m_{1}^{*}} (n - k_{1})

, i.e.,

g_{m_{2}} \neq f_{m_{1}^{*}}

. Therefore, (7) also holds.

In summary, no matter whether

g_{m_{1}}

and

f_{m_{2}^{*}}

are the same or not, (7) always holds. It follows that the multisets corresponding to

f, g

satisfy

[H_{f}] \neq [H_{g}]

, and thus,

| H | > 1

. □

The necessity part of Theorem 1 follows from Lemmas 1 and 2.

3.2. Sufficiency

From the above discussion on the necessity, it is not difficult to see that if f is a solution to M such that the conditions in Theorem 1 hold, then any CWF g resulting from a series of swap operations between

f_{1}, \dots, f_{2 h}

satisfies

[H_{g}] = [H_{f}]

. Therefore, the sufficiency of the conditions follows if one can further show that any solution to M can be obtained from repeated applications of the swap operation between

f_{1}, \dots, f_{2 h}

. However, it is, in general, not obvious how to establish such a connection between f and an arbitrary solution to M. Thus, we take a different approach to showing the sufficiency. Our main idea is to translate the conditions in Theorem 1 into properties shared by all solutions to M and to use these properties to establish the sufficiency of the conditions.

As mentioned before, the CWF f induced by h strings of length n and weight

\bar{w}

is determined by the behaviors of the functions

{f_{m}}

on

〚 ⌊ n / 2 ⌋ 〛

because of the constant weight. Based on the values that the functions

{f_{m}}

take at

n / 2

, i.e., the median weight

med (f_{m})

, the functions

{f_{m}}

can be formed into groups

A (w), w = 0, 1 / 2, 1, \dots, \bar{w}

. In the following, we analyze the behaviors of the functions

{f_{m}}

according to their membership in these groups. Let us first rephrase the conditions for

f_{m}, f_{m^{*}}

in Theorem 1 using their rotational symmetry.

Proposition 5.

Let f be a solution that satisfies the conditions in Theorem 1. Then the following holds:

(i): For any $m \in A (\bar{w} / 2)$ , either $f_{m} = f_{m^{*}}$ or there are exactly two maximal intervals between $f_{m}$ and $f_{m^{*}}$ , and exactly one of the two intervals is contained in $[⌊ n / 2 ⌋]$ .
(ii): For any $m \in [2 h] ∖ A (\bar{w} / 2)$ , there is exactly one maximal interval between $f_{m}$ and $f_{m^{*}}$ .

Proof.

As mentioned previously, for any

m \in [2 h]

, we have

m \in A (\bar{w} / 2)

if and only if

m^{*} \in A (\bar{w} / 2)

. For any

m \in A (\bar{w} / 2)

with

f_{m} \neq f_{m^{*}}

, since f satisfies the conditions in Theorem 1, there is either one maximal interval or two maximal intervals between

f_{m}

and

f_{m^{*}}

. Since

med (f_{m}) = med (f_{m^{*}})

, it follows that at least one of the maximal intervals is contained in

[⌊ n / 2 ⌋]

or

[⌊ n / 2 ⌋ + 1, n]

. Suppose there is only one maximal interval between

f_{m}

and

f_{m^{*}}

. Then, the maximal interval is contained in

[⌊ n / 2 ⌋]

or

[⌊ n / 2 ⌋ + 1, n]

, but by Proposition 1, there are two maximal intervals between

f_{m}

and

f_{m^{*}}

, which is a contradiction. So, there are exactly two maximal intervals. Now, suppose the two intervals are both in

[⌊ n / 2 ⌋]

or both in

[⌊ n / 2 ⌋ + 1, n]

. Then, by Proposition 1, there are more than two maximal intervals between

f_{m}

and

f_{m^{*}}

, which is a contradiction. It follows that exactly one of the two intervals is contained in

[⌊ n / 2 ⌋]

.

For any

m \in [2 h] ∖ A (\bar{w} / 2)

, we have

med (f_{m}) \neq med (f_{m^{*}})

; so, by Proposition 1, there exists one maximal interval between

f_{m}

and

f_{m^{*}}

that contains

{⌊ n / 2 ⌋, ⌈ n / 2 ⌉}

. Furthermore, if there is another maximal interval contained in

[⌊ n / 2 ⌋]

or

[⌊ n / 2 ⌋ + 1, n]

, by Proposition 1, there are at least three maximal intervals between

f_{m}

and

f_{m^{*}}

, which is a contradiction to the conditions in Theorem 1. Therefore, there is exactly one maximal interval between

f_{m}

and

f_{m^{*}}

. □

Example 5.

Consider the set of strings

V = {1000111, 1110001, 1100011, 1010011}

. The CWF g induced by V is given by Table 3. One can check that g satisfies the conditions in Theorem 1. Below, let us verify what Proposition 5 claims. Note that the strings in V have the same weight

\bar{w} = 4

and

A_{g} (\bar{w} / 2) = {5, 6, 7, 8}

. From Table 3, we can observe that

g_{5} = g_{6}

, and

{2} \subset [⌊ n / 2 ⌋] = [3]

and

{5}

are the only two maximal intervals between

g_{7}

and

g_{8}

. At the same time,

[2 h] ∖ A_{g} (\bar{w} / 2) = {1, 2, 3, 4}

. From Table 3, we can observe that there is exactly one maximal interval between

g_{1}

and

g_{2}

, and the same holds for

g_{3}

and

g_{4}

.
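The observations in Example 5 can be reproduced with a short Python sketch. It rests on assumptions that are not spelled out in this section: the function with index 2k - 1 is taken from the k-th string of V and the function with index 2k from its reversal, the median weight is the average of the values at ⌊n/2⌋ and ⌈n/2⌉, and a maximal interval is a maximal run of lengths on which two functions differ. All helper names are hypothetical.

```python
from fractions import Fraction

def weight_function(s):
    """Return [f(0), ..., f(n)], where f(l) is the weight of the length-l prefix."""
    f = [0]
    for bit in s:
        f.append(f[-1] + int(bit))
    return f

def maximal_intervals(f1, f2):
    """Maximal runs [k1, k2] of consecutive lengths on which f1 and f2 differ."""
    runs = []
    for l in range(len(f1)):
        if f1[l] != f2[l]:
            if runs and runs[-1][1] == l - 1:
                runs[-1][1] = l
            else:
                runs.append([l, l])
    return [tuple(r) for r in runs]

V = ["1000111", "1110001", "1100011", "1010011"]
n, wbar = len(V[0]), V[0].count("1")

# Assumed indexing: g[2k-1] from the k-th string, g[2k] from its reversal.
g = {}
for k, s in enumerate(V, start=1):
    g[2 * k - 1] = weight_function(s)
    g[2 * k] = weight_function(s[::-1])

def med(f):
    """Assumed median weight: average of f at floor(n/2) and ceil(n/2)."""
    return Fraction(f[n // 2] + f[(n + 1) // 2], 2)

A_half = sorted(m for m in g if med(g[m]) == Fraction(wbar, 2))
between = {m: maximal_intervals(g[m], g[m + 1]) for m in (1, 3, 5, 7)}
print(A_half)   # [5, 6, 7, 8]
print(between)
```

With these conventions the sketch reports no interval between g_5 and g_6, the intervals {2} and {5} between g_7 and g_8, and a single interval for each of the pairs (g_1, g_2) and (g_3, g_4).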

Below, we introduce two more definitions that are helpful for discussing the behaviors of the functions

{f_{m}}

in this subsection.

Definition 14.

For

m \in [2 h]

and

I \subset 〚 n 〛

, let

G (f_{m}, I) : = {(l, f_{m} (l)) ∣ l \in I}

be the graph of

f_{m}

over I and denote

G (f_{m}) : = G (f_{m}, 〚 n 〛)

.

Definition 15.

An element

(l, w) \in {[n]}^{2}

is called a branching point if

b_{l, w} > 0

and

c_{l, w} > 0

. An element

(l, w) \in 〚 n - 1 〛^{2}

is called a merging point if

b_{l + 1, w} > 0

and

c_{l + 1, w + 1} > 0

. The branching and merging points are so named because we would like to visualize the graphs

{G (f_{m}) ∣ m \in [2 h]}

evolving from

l = n

to

l = 0

.

Example 6.

Let

U = {110101, 110101, 101110}

and

V = {1000111, 1110001, 1100011, 1010011}

as given in Examples 1 and 5. In Figure 2, we depict

M (U)

and

M (V)

by writing the non-zero numbers

a_{l, w}

on top of the points

(l, w)

. The numbers

b_{l, w}

and

c_{l, w}

can be determined by (3) and (4) in Remark 3, from which branching and merging points can be identified using Definition 15.

Using Proposition 5, we examine the conditions in Theorem 1 in terms of the branching points and merging points on

{f_{m}}

in a series of lemmas below. Lemma 3 first examines the functions

{f_{m}}

for which

m \in [2 h] ∖ A (\bar{w} / 2)

.

Lemma 3.

Let f be a solution to M that satisfies the conditions in Theorem 1 and let

m \in [2 h] ∖ A (\bar{w} / 2)

. If

(l, w) \in G (f_{m})

is a merging point, then there are no branching points in

G (f_{m}, [l])

.

Proof.

If

(l, w) \in G (f_{m})

is a merging point, there exists

m_{1} \in [2 h] ∖ {m}

such that

f_{m} (l + 1) \neq f_{m_{1}} (l + 1)

and

f_{m} (l) = f_{m_{1}} (l)

. We claim

G (f_{m}, [l]) = G (f_{m_{1}}, [l])

. Indeed, if

G (f_{m}, [l]) \neq G (f_{m_{1}}, [l])

then there is at least one maximal interval between

f_{m}

and

f_{m_{1}}

contained in

[l - 1]

, in addition to the one contained in

[l + 1, n]

. Since f satisfies the conditions in Theorem 1, we must have

m_{1} = m^{*}

. However, by Proposition 5, if

m_{1} = m^{*}

there should be only one maximal interval between

f_{m_{1}}

and

f_{m}

, leading to a contradiction. Hence,

G (f_{m}, [l]) = G (f_{m_{1}}, [l])

.

Suppose

(k, v) \in G (f_{m}, [l])

is a branching point. Then there exists

m_{2} \in [2 h] ∖ {m, m_{1}}

such that

f_{m_{2}} (k) = f_{m} (k)

and

f_{m_{2}} (k - 1) \neq f_{m} (k - 1)

. Since

(l, w) \in G (f_{m})

is a merging point and we have

f_{m} (l + 1) \neq f_{m_{1}} (l + 1), G (f_{m}, [l]) = G (f_{m_{1}}, [l])

, there must exist

\tilde{m} \in {m, m_{1}}

such that

f_{\tilde{m}} (k - 1) \neq f_{m_{2}} (k - 1)

and

f_{\tilde{m}} (l + 1) \neq f_{m_{2}} (l + 1)

. It follows that there are two maximal intervals between

f_{\tilde{m}}

and

f_{m_{2}}

, and therefore, by the conditions in Theorem 1, we have

m_{2} = {\tilde{m}}^{*}

.

If

\tilde{m} \in [2 h] ∖ A (\bar{w} / 2)

, then by Proposition 5, there should be exactly one maximal interval between

f_{\tilde{m}}, f_{m_{2}}

, which is a contradiction.

If

\tilde{m} \in A (\bar{w} / 2)

, then

\tilde{m} = m_{1}

and

m_{2} = m_{1}^{*} \in A (\bar{w} / 2)

. Therefore, the median weight of

f_{m}

is different from those of

f_{m_{1}}, f_{m_{2}}

and there exists

l_{*} \in {⌊ n / 2 ⌋, ⌈ n / 2 ⌉}

such that

f_{m} (l_{*}) \neq f_{m_{1}} (l_{*}), f_{m} (l_{*}) \neq f_{m_{2}} (l_{*})

. Since

G (f_{m}, [l]) = G (f_{m_{1}}, [l])

, we have

l < l_{*}

. It follows that

k < l_{*}

. So there exists a maximal interval between

f_{m}, f_{m_{2}}

that is contained in

[k, n]

, in addition to the one contained in

[k - 1]

. Since

m \in [2 h] ∖ A (\bar{w} / 2)

we have

m_{2} \neq m^{*}

, and therefore, by the conditions in Theorem 1, there should be only one maximal interval between

f_{m}, f_{m_{2}}

, which is a contradiction.

Thus, there are no branching points in

G (f_{m}, [l])

. □

Remark 4.

One can also verify that if

(l, w) \in G (f_{m})

is a branching point, where

m \in [2 h] ∖ A (\bar{w} / 2)

, then

G (f_{m}, [l, n])

has no merging points.

Example 7.

Continuing Example 5, let us use the CWF g to verify Lemma 3 and Remark 4. In this case,

[2 h] ∖ A_{g} (\bar{w} / 2) = {1, 2, 3, 4}

. Note that

G (g_{1}) = {(0, 0), (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 3), (7, 4)}

contains two branching points

(5, 2)

,

(6, 3)

and two merging points

(1, 1)

,

(2, 1)

, as shown in Figure 2.
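The branching and merging points quoted in Example 7 can be recomputed with the Python sketch below. Since Remark 3 is not restated here, the sketch assumes the reading of b and c that is used in the proofs of Propositions 6 and 7: b_{l,w} counts the functions with f(l) = f(l - 1) = w, and c_{l,w} counts those with f(l) = f(l - 1) + 1 = w. Variable names are illustrative.

```python
from collections import Counter

def weight_function(s):
    """Return [f(0), ..., f(n)], where f(l) is the weight of the length-l prefix."""
    f = [0]
    for bit in s:
        f.append(f[-1] + int(bit))
    return f

V = ["1000111", "1110001", "1100011", "1010011"]
n = len(V[0])
# One function per string and one per reversal (2h functions in total).
funcs = [weight_function(x) for s in V for x in (s, s[::-1])]

# Assumed meaning of b and c (flat step versus rising step into (l, w)).
b, c = Counter(), Counter()
for f in funcs:
    for l in range(1, n + 1):
        (b if f[l] == f[l - 1] else c)[(l, f[l])] += 1

# Definition 15: (l, w) branches if b[l, w] > 0 and c[l, w] > 0;
# (l, w) merges if b[l + 1, w] > 0 and c[l + 1, w + 1] > 0.
branching = {p for p in b if c[p] > 0}
merging = {(l - 1, w) for (l, w) in b if c[(l, w + 1)] > 0}

graph_g1 = set(enumerate(funcs[0]))  # graph of g_1, from the string 1000111
print(sorted(branching & graph_g1))  # [(5, 2), (6, 3)]
print(sorted(merging & graph_g1))    # [(1, 1), (2, 1)]
```

Both merging points on the graph of g_1 lie to the left of both branching points, which is consistent with Lemma 3 and Remark 4.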

The next three lemmas examine the behaviors of

{f_{m}}

for which

m \in A (\bar{w} / 2)

. In particular, the discussion is based on whether

f_{m}, m \in A (\bar{w} / 2)

are all the same or not.

Lemma 4.

Let f be a solution to M that satisfies the conditions in Theorem 1. If

f_{m}, m \in A (\bar{w} / 2)

are all the same, then there are no branching points in

G (f_{m}, [⌊ n / 2 ⌋])

for all

m \in A (\bar{w} / 2)

.

Proof.

Suppose there exist branching points in

G (f_{m}, [⌊ n / 2 ⌋])

for some

m \in A (\bar{w} / 2)

and let

(l_{*}, w_{*}) \in G (f_{m}, [⌊ n / 2 ⌋])

be a branching point. Since

f_{m_{1}} = f_{m_{2}}

for any

m_{1}, m_{2} \in A (\bar{w} / 2)

, there must exist

\tilde{m} \in [2 h] ∖ A (\bar{w} / 2)

such that

f_{\tilde{m}} (l_{*}) = f_{m} (l_{*})

and

f_{\tilde{m}} (l_{*} - 1) \neq f_{m} (l_{*} - 1)

. Moreover, since

\tilde{m} \notin A (\bar{w} / 2)

, we have

f_{\tilde{m}} (l) \neq f_{m} (l)

for some

l \in [l_{*}, ⌈ n / 2 ⌉]

. It follows that there exist two maximal intervals between

f_{m}

and

f_{\tilde{m}}

. This is a contradiction to Item (ii) in Theorem 1 by noticing

m^{*} \neq \tilde{m}

since

m^{*} \in A (\bar{w} / 2)

for all

m \in A (\bar{w} / 2)

. □

If

f_{m}, m \in A (\bar{w} / 2)

are not all the same, Lemma 5 shows that graphs of

f_{m}

over

[⌊ n / 2 ⌋]

are essentially of two kinds. The proof of Lemma 5 is presented in Appendix A.

Lemma 5.

Let f be a solution to M that satisfies the conditions in Theorem 1. If

f_{m}, m \in A (\bar{w} / 2)

are not all the same, then there exists

m_{1} \in A (\bar{w} / 2)

such that there are exactly two maximal intervals between

f_{m_{1}}

and

f_{m_{1}^{*}}

, and

f_{m}, m \in A (\bar{w} / 2) ∖ {m_{1}, m_{1}^{*}}

are all the same. Moreover, it holds that

G (f_{m}, [⌊ n / 2 ⌋]) = G (f_{m_{1}}, [⌊ n / 2 ⌋])

for all

m \in A (\bar{w} / 2) ∖ {m_{1}, m_{1}^{*}}

or

G (f_{m}, [⌊ n / 2 ⌋]) = G (f_{m_{1}^{*}}, [⌊ n / 2 ⌋])

for all

m \in A (\bar{w} / 2) ∖ {m_{1}, m_{1}^{*}}

.

Using Lemma 5, we can further deduce the property of the branching points and merging points on

f_{m}, m \in A (\bar{w} / 2)

.

Example 8.

Let g be the CWF induced by

V = {1000111, 1110001, 1100011, 1010011}

as in Example 5. In this case,

A_{g} (\bar{w} / 2) = {5, 6, 7, 8}

. Note that

g_{m}, m \in A_{g} (\bar{w} / 2)

are not all the same (see Figure 3). As pointed out by Lemma 5, there exists

m_{1} = 7

, such that there are exactly two maximal intervals (

{2}

and

{5}

) between

g_{7}

and

g_{8}

. Moreover,

g_{5} = g_{6}

and it holds that

G (g_{5}, [⌊ n / 2 ⌋]) = G (g_{6}, [⌊ n / 2 ⌋]) = G (g_{7}, [⌊ n / 2 ⌋])

.

In addition, observe in Figure 3 that

(3, 2)

is the only branching point with

l \in [⌊ n / 2 ⌋]

on the graph of

g_{7}

, as asserted in the next lemma.

Lemma 6.

Let f be a solution to M that satisfies the conditions in Theorem 1. If

f_{m}, m \in A (\bar{w} / 2)

are not all the same, then there exists

m_{1} \in A (\bar{w} / 2)

such that there is a maximal interval

[l_{1} + 1, l_{2} - 1] \subset [n]

between

f_{m_{1}}

and

f_{m_{1}^{*}}

, where

l_{2} \leq ⌊ n / 2 ⌋

. Moreover,

(l_{2}, f_{m_{1}} (l_{2}))

is the only branching point in

G (f_{m}, [⌊ n / 2 ⌋])

and there is no merging point in

G (f_{m}, [l_{2}, ⌊ n / 2 ⌋])

for all

m \in A (\bar{w} / 2)

.

Proof.

By Lemma 5 and Proposition 5, there exists

m_{1} \in A (\bar{w} / 2)

such that there is a maximal interval

[l_{1} + 1, l_{2} - 1] \subset [n]

between

f_{m_{1}}

and

f_{m_{1}^{*}}

, where

l_{2} \leq ⌊ n / 2 ⌋

. In addition, by Lemma 5, we have that

(l_{1}, f_{m_{1}} (l_{1}))

is a merging point in

G (f_{m})

for all

m \in A (\bar{w} / 2)

. In what follows, let

m \in A (\bar{w} / 2)

.

Suppose there exists

l_{*} \in [⌊ n / 2 ⌋], l_{*} \neq l_{2}

such that

(l_{*}, f_{m} (l_{*}))

is a branching point. By Lemma 5, there exists

a \in [2 h] ∖ A (\bar{w} / 2)

such that

f_{a} (l_{*}) = f_{m} (l_{*})

and

f_{a} (l_{*} - 1) \neq f_{m} (l_{*} - 1)

. It follows that there are two maximal intervals between

f_{a}, f_{m}

: one is contained in

[l_{*} - 1]

and the other is contained in

[l_{*} + 1, n]

(since the median weight of

f_{a}

is different from that of

f_{m}

). However, we have

a \neq m^{*}

, which is a contradiction to the conditions in Theorem 1. Therefore,

(l_{2}, f_{m_{1}} (l_{2}))

is the only branching point in

G (f_{m}, [⌊ n / 2 ⌋])

.

Suppose there exists

l_{*} \in [l_{2}, ⌊ n / 2 ⌋]

such that

(l_{*}, f_{m} (l_{*}))

is a merging point. By Lemma 5, there exists

a \in [2 h] ∖ A (\bar{w} / 2)

such that

f_{a} (l_{*}) = f_{m} (l_{*})

and

f_{a} (l_{*} + 1) \neq f_{m} (l_{*} + 1)

. Since

(l_{2}, f_{m_{1}} (l_{2}))

is the only branching point in

G (f_{\tilde{m}}, [⌊ n / 2 ⌋])

for all

\tilde{m} \in A (\bar{w} / 2)

, it follows from Lemma 5 that there exists

b \in A (\bar{w} / 2)

such that

G (f_{b}, [l_{2}]) \neq G (f_{a}, [l_{2}])

. Therefore, there exist two maximal intervals between

f_{a}, f_{b}

: one is contained in

[l_{2}]

and the other is contained in

[l_{*} + 1, n]

. However, we have

a \neq b^{*}

, which is a contradiction to the conditions in Theorem 1. Thus, there is no merging point in

G (f_{m}, [l_{2}, ⌊ n / 2 ⌋])

. □

So far, we have translated the conditions in Theorem 1 to properties of the branching points and merging points on

{f_{m}}

. The advantage of doing so is that properties of branching points and merging points are shared by all solutions to M. Let

f, f^{'}

be two solutions to M. As a result of Remarks 2 and 3,

(l, w) \in {[n]}^{2}

is a branching point in

G (f_{m})

for some

m \in [2 h]

if and only if

(l, w)

is a branching point in

G (f_{m^{'}}^{'})

for some

m^{'} \in [2 h]

. In particular, there is no branching point in

G (f_{m}, I)

for

I \subset 〚 n 〛

if and only if there is no branching point in

G (f_{m^{'}}^{'}, I)

. The same statements hold for merging points. In view of this, we can use the description of the conditions in Theorem 1 in terms of branching points and merging points to establish the sufficiency of these conditions.

Let us present two simple propositions that relate

f, f^{'}

using branching points and merging points.

Proposition 6.

Let f,

f^{'}

be two solutions to M and

[l_{1}, l_{2}] \subset [n]

. If there is no branching point in

G (f_{m}, [l_{1}, l_{2}])

and

f_{m} (l_{2}) = f_{m^{'}}^{'} (l_{2})

for some

m^{'} \in [2 h]

, then for any

l \in [l_{1} - 1, l_{2}]

it holds that

f_{m} (l) = f_{m^{'}}^{'} (l)

.

Proof.

Suppose there exists

l \in [l_{1} - 1, l_{2}]

such that

f_{m} (l) \neq f_{m^{'}}^{'} (l)

. Let

l_{*} \in [l_{1}, l_{2}]

be such that

f_{m} (l_{*}) = f_{m^{'}}^{'} (l_{*})

and

f_{m} (l_{*} - 1) \neq f_{m^{'}}^{'} (l_{*} - 1)

. Let

w = f_{m} (l_{*})

. Since

f, f^{'}

are solutions to M, it follows that the number of pairs

(l_{*} - w, w)

in M is at least 2, i.e.,

a_{l_{*}, w} \geq 2

. Moreover, we have

b_{l_{*}, w} \geq 1, c_{l_{*}, w} \geq 1

. Thus, by Definition 15,

(l_{*}, w)

is a branching point in

G (f_{m}, [l_{1}, l_{2}])

, resulting in a contradiction. □

Proposition 7.

Let f,

f^{'}

be two solutions to M and

[l_{1}, l_{2}] \subset [n]

. If

f_{m} (l_{2}) = f_{m^{'}}^{'} (l_{2})

and

f_{m} (l_{1}) \neq f_{m^{'}}^{'} (l_{1})

, there must be a branching point in

G (f_{m}, [l_{1} + 1, l_{2}])

. Similarly, if

f_{m} (l_{2}) \neq f_{m^{'}}^{'} (l_{2})

and

f_{m} (l_{1}) = f_{m^{'}}^{'} (l_{1})

, there must be a merging point in

G (f_{m}, [l_{1}, l_{2} - 1])

.

Proof.

The first part of the statement is a direct consequence of Proposition 6. For the second part, we observe that there exists

l_{*} \in [l_{1} + 1, l_{2}]

such that

f_{m} (l_{*}) \neq f_{m^{'}}^{'} (l_{*})

and

f_{m} (l_{*} - 1) = f_{m^{'}}^{'} (l_{*} - 1)

. Without loss of generality, assume

f_{m} (l_{*}) = f_{m} (l_{*} - 1) = w

and

f_{m^{'}}^{'} (l_{*}) = f_{m^{'}}^{'} (l_{*} - 1) + 1

. These two equations imply

b_{l_{*}, w} \geq 1

and

c_{l_{*}, w + 1} \geq 1

, respectively. Thus, by Definition 15,

(l_{*} - 1, w)

is a merging point in

G (f_{m}, [l_{1}, l_{2} - 1])

. □

In the next two lemmas, we show that if

f, f^{'}

are two solutions to M with f satisfying the conditions in Theorem 1, then the multiset equality

{f_{m} ∣ m \in [2 h]} = {f_{m}^{'} ∣ m \in [2 h]}

must hold, thereby proving the sufficiency of the conditions in Theorem 1 for unique reconstruction up to reversal.

Lemma 7.

Let f,

f^{'}

be two solutions to M, with f satisfying the conditions in Theorem 1. Let

ψ_{1} (f) = {f_{m} ∣ m \in [2 h] ∖ A_{f} (\bar{w} / 2)}

be a multiset and define

ψ_{1} (f^{'})

accordingly. Then,

ψ_{1} (f) = ψ_{1} (f^{'})

.

Proof.

Let

m \in S_{f} : = [2 h] ∖ A_{f} (\bar{w} / 2)

. Note that

f_{m} \neq f_{m^{*}}

since their median weights are different. Thus, there are branching points in

G (f_{m})

. Let

(l_{*}, w_{*})

be the branching point in

G (f_{m})

such that

l_{*} \leq l

for any branching point

(l, w) \in G (f_{m})

. Let

\begin{matrix} r = \begin{cases} 0 & \text{if } w_{*} = f_{m} (l_{*} - 1), \\ 1 & \text{if } w_{*} = f_{m} (l_{*} - 1) + 1 . \end{cases} \end{matrix}

In other words, r is an indicator of the behavior of

f_{m}

to the left of the branching point

(l_{*}, w_{*})

. By definition of r, we have

m \in S_{f} \cap A_{f} (l_{*}, w_{*}) \cap A_{f} (l_{*} - 1, w_{*} - r)

.

Let

S_{f^{'}} = [2 h] ∖ A_{f^{'}} (\bar{w} / 2)

and

m^{'} \in S_{f^{'}} \cap A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r)

. In the following, we will show

f_{m^{'}}^{'} = f_{m}

. Since there is no branching point in

G (f_{m}, [l_{*} - 1])

and

f_{m} (l_{*} - 1) = f_{m^{'}}^{'} (l_{*} - 1)

, by Proposition 6, we have

f_{m} (l) = f_{m^{'}}^{'} (l)

for any

l \in [0, l_{*} - 1]

. Suppose

f_{m} (l) \neq f_{m^{'}}^{'} (l)

for some

l \in [l_{*}, n]

. Then by Proposition 7, there is a merging point in

G (f_{m}, [l_{*}, l - 1])

. But then by Lemma 3, there are no branching points in

G (f_{m}, [l_{*}])

, contradicting that

(l_{*}, w_{*}) \in G (f_{m}, [l_{*}])

is a branching point. Thus,

f_{m} (l) = f_{m^{'}}^{'} (l)

for all

l \in [l_{*}, n]

. It follows that

f_{m} = f_{m^{'}}^{'}

for any

m^{'} \in S_{f^{'}} \cap A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r)

.

Next, let us show that

S_{f^{'}} \cap A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r) = A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r)

. Toward a contradiction, suppose that there exists

m_{0} \in A_{f^{'}} (\bar{w} / 2) \cap A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r)

. Then,

f_{m_{0}}^{'} (l_{*}) = f_{m} (l_{*})

and

f_{m_{0}}^{'} (l_{*} - 1) = f_{m} (l_{*} - 1)

. Note that

med (f_{m_{0}}^{'}) \neq med (f_{m})

. If

l_{*} - 1 \geq ⌈ n / 2 ⌉

, by Proposition 7, there must be a branching point in

G (f_{m}, [l_{*} - 1])

, contradicting the assumption that

l_{*} \leq l

for any branching point

(l, w) \in G (f_{m})

. If

l_{*} - 1 < ⌈ n / 2 ⌉

, there must be a merging point in

G (f_{m}, [l_{*}, ⌈ n / 2 ⌉])

, but by Lemma 3 there should be no branching points in

G (f_{m}, [l_{*}])

, contradicting that

(l_{*}, w_{*})

is a branching point. We thus conclude

A_{f^{'}} (\bar{w} / 2) \cap A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r) = \emptyset

, and so,

S_{f^{'}} \supset A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r)

.

Note that for any

m^{'} \in S_{f^{'}} ∖ (A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r))

, we have

f_{m^{'}}^{'} \neq f_{m}

. Therefore, the multiplicity of

f_{m}

in

ψ_{1} (f^{'})

is

| S_{f^{'}} \cap A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r) | = | A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r) |

. Taking

f^{'} = f

, one can repeat the above arguments to show that

f_{m} = f_{\tilde{m}}

for any

\tilde{m} \in S_{f} \cap A_{f} (l_{*}, w_{*}) \cap A_{f} (l_{*} - 1, w_{*} - r)

and the multiplicity of

f_{m}

in

ψ_{1} (f)

is

| A_{f} (l_{*}, w_{*}) \cap A_{f} (l_{*} - 1, w_{*} - r) |

.

Since

f, f^{'}

are solutions to M,

| A_{f} (l_{*}, w_{*}) \cap A_{f} (l_{*} - 1, w_{*} - r) | = | A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r) |

, i.e., the multiplicity of

f_{m}

in

ψ_{1} (f)

equals the multiplicity of

f_{m}

in

ψ_{1} (f^{'})

. Furthermore, this holds for distinct

f_{m} \in ψ_{1} (f)

. Since

| S_{f} | = | S_{f^{'}} |

, i.e.,

| ψ_{1} (f) | = | ψ_{1} (f^{'}) |

, we obtain

ψ_{1} (f) = ψ_{1} (f^{'})

. □

Lemma 8.

Let f,

f^{'}

be two solutions to M with f satisfying the conditions in Theorem 1. Let

ψ_{0} (f) = {f_{m} ∣ m \in A_{f} (\bar{w} / 2)}

be a multiset and define

ψ_{0} (f^{'})

accordingly. Then,

ψ_{0} (f) = ψ_{0} (f^{'})

.

The idea of the proof of Lemma 8 is similar to that of Lemma 7, except that it relies on Lemmas 4 and 6 instead of Lemma 3. The complete proof is given in Appendix B.

It follows from Lemmas 7 and 8 that the conditions in Theorem 1 are sufficient for unique reconstruction up to reversal.

Example 9.

Let us use Theorem 1 to determine whether the multiset

U = {110101, 110101, 101110}

given in Example 1 can be uniquely reconstructed up to reversal. The CWF f induced by U is given in Example 2. As shown in Figure 1, there are two maximal intervals (

{1}

and

{4}

) between

f_{2}

and

f_{6}

. This violates Item (ii) of Theorem 1, so we conclude that U cannot be uniquely reconstructed from

M (U)

up to reversal. Indeed, in Example 4, we found multisets not equivalent to U but compatible with

M (U)

.
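The test applied in Example 9 can be automated when a multiset of strings is available. The sketch below checks Items (i) and (ii) of Theorem 1 pair by pair. It assumes that consecutive functions come from a string and its reversal, and that a maximal interval is a maximal run of lengths on which two functions differ; the function names are hypothetical.

```python
def weight_function(s):
    """Return [f(0), ..., f(n)], where f(l) is the weight of the length-l prefix."""
    f = [0]
    for bit in s:
        f.append(f[-1] + int(bit))
    return f

def count_maximal_intervals(f1, f2):
    """Number of maximal runs of consecutive lengths on which f1 and f2 differ."""
    count, inside = 0, False
    for a, b in zip(f1, f2):
        if a != b and not inside:
            count += 1
        inside = (a != b)
    return count

def satisfies_theorem1(strings):
    """Check Items (i) and (ii) of Theorem 1 for the CWF induced by `strings`."""
    funcs = []
    for s in strings:
        funcs += [weight_function(s), weight_function(s[::-1])]
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            reversal_pair = (i // 2 == j // 2)  # 0-based indices 2k and 2k + 1
            limit = 2 if reversal_pair else 1
            if count_maximal_intervals(funcs[i], funcs[j]) > limit:
                return False
    return True

U = ["110101", "110101", "101110"]
V = ["1000111", "1110001", "1100011", "1010011"]
print(satisfies_theorem1(U), satisfies_theorem1(V))  # False True
```

The multiset U fails the check, in agreement with Example 9, while the multiset V of Example 5 passes it.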

4. Reconstruction Algorithms

As before, we assume in this section that M is the prefix–suffix compositions of the multiset H of h strings of length n and weight

\bar{w}

. We present two algorithms that produce multisets of strings compatible with M. Both algorithms first construct CWFs and then find the corresponding multisets as in Definition 4. The algorithm in Section 4.1 is a greedy algorithm that outputs a single multiset compatible with M with running time

O (n h)

. The algorithm in Section 4.2 is able to output all compatible multisets up to reversal. Its running time is, in general, exponential as it relies on a breadth-first search to find all possible CWFs and solve a number of integer partition problems.

4.1. An Algorithm That Outputs a Multiset of Strings Compatible with M

To construct a multiset compatible with M, it suffices to find a CWF f that is a solution to M. In Algorithm 1, we construct such a CWF by assigning larger w to

f_{2 k - 1} (l), k \in [h]

in a greedy way as l goes from n to 0.

Algorithm 1 Algorithm for obtaining one multiset of strings compatible with M

By Remark 2, Algorithm 1 produces a multiset of strings compatible with M if the function f constructed in the algorithm is a CWF and

| A_{f} (l, w) | = a_{l, w}

for

(l, w) \in 〚 n 〛^{2}

.

Claim 2.

The function f constructed in Algorithm 1 is a CWF.

Proof.

Let us first show that f is a mapping from T to

〚 n 〛

. Noticing Lines 7 and 8 in the algorithm, it suffices to show that for each

l \in [n - 1], k \in [h]

, there exists

w \in [l]

such that

1 + s_{l, w + 1} \leq k \leq a_{l, w} + s_{l, w + 1}

. Note that

s_{l, w + 1}

is a non-increasing function of w. Thus, as w decreases from l to 1,

s_{l, w + 1}

increases from 0 to at most

2 h

. Moreover,

a_{l, w} + s_{l, w + 1} = s_{l, w}

. Therefore, Lines 13 and 14 are well defined for each

l \in [n - 1], k \in [h]

. It remains to show that f as constructed satisfies the conditions required in Definition 3. Clearly, Item (i) in the definition is satisfied according to Line 7 and Item (iii) is satisfied by Lines 7, 8 and 14.

As for Item (ii) in Definition 3, it suffices to show that if

f (l, 2 k - 1) = w

, then

f (l - 1, 2 k - 1) \in {w, w - 1}

. According to Line 12, it suffices to show that if

k \in [1 + s_{l, w + 1}, min {s_{l, w}, h}]

then

k \in [1 + s_{l - 1, w + 1}, min {s_{l - 1, w}, h}] \cup [1 + s_{l - 1, w}, min {s_{l - 1, w - 1}, h}] = [1 + s_{l - 1, w + 1}, min {s_{l - 1, w - 1}, h}]

. From (6), we have

\begin{matrix} s_{l, w + 1} = \sum_{v = w + 1}^{l} a_{l, v} \geq \sum_{v = w + 1}^{l - 1} a_{l - 1, v} = s_{l - 1, w + 1} . \end{matrix}

At the same time, from (5), we have

\begin{matrix} s_{l - 1, w - 1} = \sum_{v = w - 1}^{l - 1} a_{l - 1, v} \geq \sum_{v = w}^{l} a_{l, v} \geq s_{l, w} . \end{matrix}

Therefore,

[1 + s_{l, w + 1}, min {s_{l, w}, h}] \subset [1 + s_{l - 1, w + 1}, min {s_{l - 1, w - 1}, h}]

. □

Next, we would like to show that

| A_{f} (l, w) | = a_{l, w}

. Before that, let us make some simple observations. Since M is the prefix–suffix compositions of h strings of length n, for each

l \in [n]

, the number of prefixes and suffixes of length l with weights in

〚 l 〛

is equal to

2 h

. Moreover, since the h strings are of the same weight

\bar{w}

, for each

l \in [n]

and

w \in 〚 \bar{w} 〛

, the number of prefixes and suffixes of length l with weight w is the same as the number of suffixes and prefixes of length

n - l

with weight

\bar{w} - w

. These observations extend to the case where

l = 0

since

a_{0, 0} = 2 h

by Definition 12. Thus, we have the following proposition.

Proposition 8.

(i): $\sum_{w = 0}^{l} a_{l, w} = 2 h$ for all $l \in 〚 n 〛$ .
(ii): If M is the prefix–suffix composition of strings with constant weight, then $a_{l, w} = a_{n - l, \bar{w} - w}$ for all $l \in 〚 n 〛$ and $w \in 〚 \bar{w} 〛$ .
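Both identities in Proposition 8 are easy to confirm numerically. The Python sketch below does so for the multiset V of Example 5, with a_{l,w} stored in a `Counter` keyed by (l, w). The variable names are illustrative, and the pair (0, 0) is counted 2h times, following the convention a_{0,0} = 2h mentioned above.

```python
from collections import Counter

V = ["1000111", "1110001", "1100011", "1010011"]
h, n, wbar = len(V), len(V[0]), V[0].count("1")

# a[(l, w)]: number of prefixes and suffixes of length l with weight w.
a = Counter()
for s in V:
    for l in range(n + 1):
        a[(l, s[:l].count("1"))] += 1      # prefix of length l
        a[(l, s[n - l:].count("1"))] += 1  # suffix of length l

# Item (i): each length accounts for exactly 2h prefixes and suffixes.
row_sums_ok = all(
    sum(a[(l, w)] for w in range(l + 1)) == 2 * h for l in range(n + 1)
)
# Item (ii): the symmetry a_{l,w} = a_{n-l, wbar-w} for constant weight wbar.
symmetry_ok = all(
    a[(l, w)] == a[(n - l, wbar - w)]
    for l in range(n + 1)
    for w in range(wbar + 1)
)
print(row_sums_ok, symmetry_ok)  # True True
```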

Claim 3.

For

(l, w) \in 〚 n 〛^{2}

, it holds that

| A_{f} (l, w) | = a_{l, w}

.

Proof.

From Lines 7 and 8 in Algorithm 1, we have

| A_{f} (0, 0) | = | A_{f} (n, \bar{w}) | = 2 h = a_{0, 0} = a_{n, \bar{w}}

. In what follows, let

l \in [n - 1]

. It is clear that

| A_{f} (l, w) | = 0 = a_{l, w}

for all

w > l

. Since

s_{l, w + 1}

increases from 0 to at most

2 h

as w goes from l to 1, there exists

w_{*} \in [l]

such that

s_{l, w_{*} + 1} < h

and

s_{l, w_{*}} \geq h

. From Lines 12 and 13 in the algorithm, for each

w \in [w_{*} + 1, l]

, we have

\begin{matrix} A_{f} (l, w) = {2 k - 1 ∣ 1 + s_{l, w + 1} \leq k \leq a_{l, w} + s_{l, w + 1}} . \end{matrix}

Therefore, for all

l \in [n - 1]

and

w \in [w_{*} + 1, l]

, we have

| A_{f} (l, w) | = a_{l, w}

.

Consider the case where

w \in [0, w_{*} - 1]

. Since

s_{l, w + 1} \geq s_{l, w_{*}} \geq h

, we have

\begin{aligned}
h &\geq 2 h - s_{l, w + 1} \\
&= 2 h - \sum_{v = w + 1}^{\bar{w}} a_{l, v} \\
&= 2 h - \sum_{v = w + 1}^{\bar{w}} a_{n - l, \bar{w} - v} && (9) \\
&= 2 h - \sum_{v = 0}^{\bar{w} - w - 1} a_{n - l, v} \\
&= \sum_{v = 0}^{n - l} a_{n - l, v} - \sum_{v = 0}^{\bar{w} - w - 1} a_{n - l, v} && (10) \\
&= \sum_{v = \bar{w} - w}^{n - l} a_{n - l, v} \\
&= s_{n - l, \bar{w} - w},
\end{aligned}

where (9) follows from Item (ii) of Proposition 8 and (10) follows from Item (i) of Proposition 8. From Lines 12 and 14, for each

w \in [0, w_{*} - 1]

we have

\begin{matrix} A_{f} (l, w) = {2 k ∣ 1 + s_{n - l, \bar{w} - w + 1} \leq k \leq a_{n - l, \bar{w} - w} + s_{n - l, \bar{w} - w + 1}} . \end{matrix}

Therefore, for all

l \in [n - 1]

and

w \in [0, w_{*} - 1]

, we have

| A_{f} (l, w) | = a_{n - l, \bar{w} - w} = a_{l, w}

. Lastly, note that

a_{l, w_{*}} + s_{l, w_{*} + 1} \geq h

, and, similarly to the calculations above that lead to

h \geq s_{n - l, \bar{w} - w}

for

w \in [0, w_{*} - 1]

, one can also obtain

h < 2 h - s_{l, w_{*} + 1} = a_{n - l, \bar{w} - w_{*}} + s_{n - l, \bar{w} - w_{*} + 1}

. Then, from Lines 12, 13, and 14 we have

\begin{matrix} A_{f} (l, w_{*}) = {2 k - 1 ∣ 1 + s_{l, w_{*} + 1} \leq k \leq h} \cup {2 k ∣ 1 + s_{n - l, \bar{w} - w_{*} + 1} \leq k \leq h} . \end{matrix}

Therefore, for all

l \in [n - 1]

, we have

\begin{aligned}
| A_{f} (l, w_{*}) | & = h - s_{l, w_{*} + 1} + h - s_{n - l, \bar{w} - w_{*} + 1} \\
& = 2 h - \sum_{v = w_{*} + 1}^{\bar{w}} a_{l, v} - \sum_{v = \bar{w} - w_{*} + 1}^{\bar{w}} a_{n - l, v} \\
& = 2 h - \sum_{v = w_{*} + 1}^{\bar{w}} a_{l, v} - \sum_{v = \bar{w} - w_{*} + 1}^{\bar{w}} a_{l, \bar{w} - v} && \text{(11)} \\
& = 2 h - \sum_{v = w_{*} + 1}^{\bar{w}} a_{l, v} - \sum_{v = 0}^{w_{*} - 1} a_{l, v} \\
& = a_{l, w_{*}}, && \text{(12)}
\end{aligned}

where (11) follows from (ii) in Proposition 8 and (12) follows from (i) in Proposition 8. Hence, for all

l \in [n - 1]

and

w \in [l]

, we have

| A_{f} (l, w) | = a_{l, w}

. □

As a consequence of Claims 2 and 3, we have the following theorem.

Theorem 4.

The output of Algorithm 1 is a multiset of strings compatible with M.

Algorithm 1 is efficient, with time complexity

O (n h)

, but it produces only one multiset compatible with M, so it does not help when all compatible multisets are desired. Nevertheless, Algorithm 1 has one important application. In Theorem 1, the necessary and sufficient conditions for unique reconstruction from the prefix–suffix compositions M are stated in terms of a CWF rather than M itself. Therefore, to determine the unique reconstructibility of M using Theorem 1, one must first find a CWF that is a solution to M. Algorithm 1 does exactly this.

Moreover, given a CWF f that is a solution to M, in view of Lemmas 1 and 2, it is tempting to use the swap operation of Definition 8 to enumerate all compatible multisets up to reversal. However, keeping track of the swap operations is, in general, not easy. In the next subsection, we take a different route: we construct all compatible multisets by exploiting the inherent symmetry of constant-length constant-weight strings, which bypasses the complexity of the swap operations.

4.2. An Algorithm That Outputs All Multisets of Strings Compatible with M

As mentioned before, to find a multiset of strings compatible with M, one may plot the elements of multiset M on a two-dimensional grid and construct a CWF f such that it passes each point

(l, w)

exactly

a_{l, w}

times on the grid. Moreover, one may infer the behavior of the component functions

{f_{m}}

from the numbers

{b_{l, w}}, {c_{l, w}}

. Therefore, to obtain all possible multisets of strings (up to reversal) that are compatible with M, one may examine all possible behaviors of

{f_{m}}

based on

{a_{l, w}}, {b_{l, w}}, {c_{l, w}}

.
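
To make the role of these numbers concrete, the following Python sketch counts the transitions of the component functions directly from a multiset of strings. It assumes that $c_{l, w}$ counts the component functions that rise from $w - 1$ at $l - 1$ to $w$ at $l$ (as used in the scan stage below) and that $b_{l, w}$ counts those that stay at $w$ from $l - 1$ to $l$; the latter is our reading of the definition, and the helper name `transition_counts` is hypothetical.

```python
from collections import Counter


def transition_counts(strings):
    """Return (b, c) keyed by (l, w).

    b[(l, w)]: component functions equal to w at both l - 1 and l (assumed).
    c[(l, w)]: component functions equal to w - 1 at l - 1 and w at l.
    Each string contributes its own running weight and that of its reversal.
    """
    n = len(strings[0])
    b, c = Counter(), Counter()
    for t in strings:
        for s in (t, t[::-1]):
            w = 0
            for l in range(1, n + 1):
                if s[l - 1] == "1":
                    w += 1
                    c[(l, w)] += 1  # the running weight rises at step l
                else:
                    b[(l, w)] += 1  # the running weight stays flat at step l
    return b, c


U = ["110101", "110101", "101110"]  # multiset from Example 1
b, c = transition_counts(U)
print(c[(3, 2)], c[(2, 2)], c[(2, 1)], c[(1, 1)], c[(1, 0)])
```

Under these assumptions, $b_{l, w} + c_{l, w}$ is the number of component functions passing through $(l, w)$, that is, $a_{l, w}$ for $l \geq 1$.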

Since all h strings that give rise to M have the same length n and the same weight

\bar{w}

, the graph of the component function

f_{m}

is the same as that of

f_{m^{*}}

when

f_{m}

is rotated 180 degrees around

(n / 2, \bar{w} / 2)

. As a result of this rotational symmetry, given the values of

f_{m} (l)

for all

m \in [2 h]

and

l \in [0, ⌊ n / 2 ⌋]

, the remaining values of

f_{m} (l), l \in [⌊ n / 2 ⌋ + 1, n]

can be fully determined for all

m \in [2 h]

. Thus, it suffices to reconstruct all possible

{f_{m}}

from the midpoint

n / 2

to 0 and then extend them from

n / 2

to n. However, there is one catch. Such an extension is possible because

f_{m}, f_{m^{*}}

capture the running weights starting from the two ends of a single string. However, for functions

g_{i}, g_{j} : 〚 ⌊ n / 2 ⌋ 〛 \to 〚 \bar{w} 〛, i \neq j

reconstructed from M, it is, in general, not clear whether

g_{i}, g_{j}

capture the weight information of the same string. Nevertheless, by the rotational symmetry,

g_{i}

and

g_{j}

capture the weight information of the same string only if their median weights sum to

\bar{w}

when they are extended. Therefore, one may identify h pairs of functions from the

2 h

component functions

{g_{i}}

reconstructed from M such that the sum of median weights within each pair is

\bar{w}

. With the identification of such pairs, the resulting CWF formed by

{g_{i}}

corresponds to a multiset of strings compatible with M. Thus, to obtain all compatible multisets, one needs to enumerate all possible ways of forming pairs that satisfy the median weight constraint.
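
The rotational symmetry and the median weight constraint can be illustrated with a short Python sketch. It assumes that the median weight of a component function is its value at $n / 2$ when n is even and the average of its values at $⌊ n / 2 ⌋$ and $⌈ n / 2 ⌉$ when n is odd, which is consistent with the half-integer medians used below; the helper names are hypothetical.

```python
from fractions import Fraction


def running_weight(t):
    """Return [f(0), ..., f(n)], the running weight of the binary string t."""
    f = [0]
    for ch in t:
        f.append(f[-1] + (ch == "1"))
    return f


def median_weight(f):
    """Median weight (assumed): average of the values at floor(n/2) and ceil(n/2)."""
    n = len(f) - 1
    return Fraction(f[n // 2] + f[(n + 1) // 2], 2)


def is_rotation_pair(f, g):
    """Check g(l) = w_bar - f(n - l), i.e., a 180-degree rotation about (n/2, w_bar/2)."""
    n, w_bar = len(f) - 1, f[-1]
    return all(g[l] == w_bar - f[n - l] for l in range(n + 1))


t = "10110"  # an odd-length example chosen for illustration
f, g = running_weight(t), running_weight(t[::-1])
print(is_rotation_pair(f, g), median_weight(f) + median_weight(g))
```

For any string, the running weights read from its two ends form such a rotation pair, so their median weights always sum to $\bar{w}$; the converse need not hold, which is why the pairing step must be enumerated.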

Based on the above discussion, our algorithm for constructing all multisets of strings compatible with M is divided into two stages. In the first stage, which we call the scan stage, all possible “half strings” are generated based on M. In the second stage, which we call the assembly stage, pairs of “half strings” are combined to form “full strings”. The details of the two stages are described below. For ease of discussion, below, the subscript of the component functions will be referred to as the label.

4.2.1. Scan Stage

In the scan stage, we keep track of the behaviors of the component functions from the midpoint

n / 2

to 0. Consider the case where n is even. For each

w \in 〚 \bar{w} 〛

,

a_{n / 2, w}

indicates the number of component functions that evaluate to w at

n / 2

. Moreover, we have

\sum_{w = 0}^{\bar{w}} a_{n / 2, w} = 2 h

. As there are

2 h

component functions, we may partition the

2 h

labels into disjoint subsets of sizes

a_{n / 2, w}, w \in 〚 \bar{w} 〛

. If n is odd,

a_{n / 2, w}

is undefined, but the behavior of the component functions at

n / 2

can be determined by

b_{⌈ n / 2 ⌉, w}, c_{⌈ n / 2 ⌉, w}

. Since

\sum_{w = 0}^{\bar{w}} (b_{⌈ n / 2 ⌉, w} + c_{⌈ n / 2 ⌉, w}) = 2 h

, we can partition the

2 h

labels into disjoint subsets of sizes

b_{⌈ n / 2 ⌉, w}, c_{⌈ n / 2 ⌉, w}, w \in 〚 \bar{w} 〛

. More precisely, as the first step of the scan stage, we construct a collection

{P (t / 2) ∣ t = 0, \dots, 2 \bar{w}}

of disjoint subsets of

[2 h]

such that

⋃_{t = 0}^{2 \bar{w}} P (t / 2) = [2 h]

and that if n is even,

\{\begin{matrix} | P (t / 2) | = a_{n / 2, t / 2}, & t is even, \\ | P (t / 2) | = 0, & t is odd; \end{matrix}

(13)

if n is odd,

\{\begin{matrix} | P (t / 2) | = b_{⌈ n / 2 ⌉, t / 2}, & t is even, \\ | P (t / 2) | = c_{⌈ n / 2 ⌉, ⌈ t / 2 ⌉}, & t is odd . \end{matrix}

(14)

Observe that the elements in the set

P (t / 2)

are the labels of the component functions whose median weight equals

t / 2

. Therefore, we basically reconstruct the values of

2 h

component functions at

n / 2

by constructing the collection

{P (t / 2)}

. Given the values of the component functions at

n / 2

, we reconstruct their values at

l \leq n / 2

according to

{b_{l, w}}, {c_{l, w}}

as l goes from

⌊ n / 2 ⌋

to 0. Specifically, we keep track of the labels of the component functions as we assign values to the component functions at

l = ⌊ n / 2 ⌋, \dots, 0

according to

{b_{l, w}}, {c_{l, w}}

, and obtain finer partitions of the

2 h

labels as l goes to 0. The bookkeeping of the partitions is done by a function F that maps each

(l, w) \in 〚 ⌊ n / 2 ⌋ 〛 \times 〚 \bar{w} 〛

to a collection of disjoint nonempty subsets of the

2 h

labels. The labels in these disjoint subsets correspond to component functions that evaluate to w at l. Moreover, the subsets in

F (l, w), w \in 〚 \bar{w} 〛

are all disjoint, and we have

⋃_{w = 0}^{\bar{w}} ⋃_{J \in F (l, w)} J = [2 h]

for

l \leq n / 2

. The construction of F is described below.

By construction of

{P (t / 2)}

, the component functions that evaluate to

\bar{w}

at

⌊ n / 2 ⌋

are those with labels in

P (\bar{w})

, and for each

w \in 〚 \bar{w} - 1 〛

, the component functions that evaluate to w at

⌊ n / 2 ⌋

are those with labels in

P (w)

and

P (w + 1 / 2)

. Therefore,

\begin{matrix} F (⌊ n / 2 ⌋, \bar{w}) & = {P (\bar{w})}, \\ F (⌊ n / 2 ⌋, w) & = {P (w), P (w + 1 / 2)}, w \in 〚 \bar{w} - 1 〛 . \end{matrix}

As the value of a component function at

l - 1

may remain the same as or decrease by one from the value at l, given

F (l, w), w \in 〚 \bar{w} 〛

, we can further partition each subset in

F (l, w), w \in 〚 \bar{w} 〛

into two subsets of sizes

b_{l, w}, c_{l, w}

for

l = ⌊ n / 2 ⌋, \dots, 0

. Eventually, we obtain the set

F (0, 0)

in which every element is a subset of the

2 h

labels for which the corresponding component functions have exactly the same values at

l = 0, \dots, ⌈ n / 2 ⌉

. Moreover, component functions with labels in different elements in

F (0, 0)

are not equal. At this point, the behaviors of the

2 h

component functions are determined over

〚 ⌈ n / 2 ⌉ 〛

. In particular, one can define

2 h

component functions

{g_{m}}

over

〚 ⌊ n / 2 ⌋ 〛

to be

g_{m} (l) = w if m \in ⋃_{J \in F (l, w)} J .

Note that there are different ways of partitioning subsets in

F (l, w), w \in 〚 \bar{w} 〛

, and each of them leads to a distinct F. However, we are only interested in those F’s that result in distinct “half strings”, i.e., distinct multisets

{g_{m}}

. In other words, we only care about the number of labels for which the corresponding component functions are the same over

〚 ⌊ n / 2 ⌋ 〛

. In fact, this is the reason why we only stipulate the size of the subsets in the initial partition

{P (t / 2)}

. In order to construct all possible F, each of which leads to a distinct multiset

{g_{m}}

, we need to enumerate different ways of partitioning subsets in

F (l, w), w \in 〚 \bar{w} 〛

. This is accomplished as follows. Let

q = | F (l, w) |

and write

F (l, w) = {J_{1}, \dots, J_{q}}

. Further, let

K_{i} \subset J_{i}

be the labels for which the corresponding component functions have a value equal to

w - 1

at

l - 1

. Denote

| K_{i} |

by

x_{i}

. Since

c_{l, w}

is the number of component functions that have values equal to

w - 1

at

l - 1

and have values equal to w at l, we have

\begin{matrix} \sum_{i = 1}^{q} x_{i} = c_{l, w} . \end{matrix}

(15)

Every solution to (15) such that

x_{i} \in 〚 | J_{i} | 〛

gives rise to a distinct partition of the subsets in

F (l, w)

. By enumerating all possible solutions to (15) for every

l \in [⌊ n / 2 ⌋]

and

w \in 〚 \bar{w} 〛

, we are able to find the set

F

of all possible F that lead to distinct

{g_{m}}

via a breadth-first search. The scan stage is formally stated in Algorithm 2.
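
The enumeration described above can be sketched in Python as follows. This is a simplified illustration rather than Algorithm 2 itself: it handles only even n, takes the numbers $c_{l, w}$ as input, tracks group sizes instead of explicit label sets, and represents each “half string” by its sequence of running weights over $l = 0, \dots, n / 2$. The names `bounded_compositions` and `scan` are hypothetical.

```python
from itertools import product


def bounded_compositions(total, caps):
    """Yield all tuples x with 0 <= x[i] <= caps[i] and sum(x) == total, cf. (15)."""
    if not caps:
        if total == 0:
            yield ()
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in bounded_compositions(total - first, caps[1:]):
            yield (first,) + rest


def scan(mid_counts, c, mid):
    """Enumerate multisets of half paths (even n only; a sketch).

    mid_counts: {w: number of component functions equal to w at l = mid}.
    c: {(l, w): c_{l, w}}; missing entries are treated as 0.
    Returns a set of frozensets of (path, multiplicity) pairs.
    """
    # A state maps w to the groups currently at (l, w); a group is (path, size).
    states = [{w: [((w,), k)] for w, k in mid_counts.items() if k > 0}]
    for l in range(mid, 0, -1):
        next_states = []
        for state in states:
            weights = sorted(state)
            choices = [
                list(bounded_compositions(c.get((l, w), 0), [k for _, k in state[w]]))
                for w in weights
            ]
            for pick in product(*choices):
                new = {}
                for w, xs in zip(weights, pick):
                    for (path, k), x in zip(state[w], xs):
                        if x:  # x labels drop to w - 1 at l - 1
                            new.setdefault(w - 1, []).append(((w - 1,) + path, x))
                        if k - x:  # the remaining labels stay at w
                            new.setdefault(w, []).append(((w,) + path, k - x))
                next_states.append(new)
        states = next_states
    return {
        frozenset((path, k) for groups in state.values() for path, k in groups)
        for state in states
    }


# Values of c_{l, w} taken from Example 10 (n = 6, all six functions equal 2 at l = 3).
c = {(3, 2): 4, (2, 2): 2, (2, 1): 1, (1, 1): 5, (1, 0): 0}
for half_strings in scan({2: 6}, c, 3):
    print(sorted(half_strings))
```

Each level of the outer loop expands every partial bookkeeping state by all admissible solutions to (15), which mirrors the breadth-first search over F.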

As a consequence of the scan stage, we obtain a set of all possible “half strings” from M in the sense of the following claim.

Claim 5.

Let

{t_{j} ∣ j \in [h]}

be a multiset of strings compatible with M and define the multiset of length-

⌈ n / 2 ⌉

prefixes and suffixes of

t_{j}, j \in [h]

to be

S = {s_{2 j - 1} = t_{j} [⌈ n / 2 ⌉], s_{2 j} = \overset{\leftarrow}{t_{j}} [⌈ n / 2 ⌉] ∣ j \in [h]}

. Let

S^{'}

be the underlying set of S, i.e.,

S^{'}

is the set of distinct strings in S. Then, there exists

F \in F

output by Algorithm 2 such that there is a bijection between

F (0, 0)

and

S^{'}

that maps

J \in F (0, 0)

to

s \in S^{'}

with

| J | = | {m ∣ s_{m} = s, s_{m} \in S, m \in [2 h]} |

.

In other words, there exists

F \in F

such that every element J in

F (0, 0)

can be identified with a distinct string

s

in S whose multiplicity in S equals

| J |

.

Proof.

Let f be the CWF induced by

{t_{j} ∣ j \in [h]}

with

f_{2 j - 1}

being induced by the running weight of

t_{j}

and

f_{2 j}

by the running weight of

\overset{\leftarrow}{t_{j}}

. Then, f is a solution to M. Let us construct a function

\tilde{F}

that maps each

(l, w) \in 〚 ⌊ n / 2 ⌋ 〛 \times 〚 \bar{w} 〛

to a collection of disjoint nonempty subsets of

[2 h]

dependent on f. Given M, we can construct a collection

{P (t / 2) ∣ t \in 〚 2 \bar{w} 〛}

of disjoint subsets of

[2 h]

such that

⋃_{t = 0}^{2 \bar{w}} P (t / 2) = [2 h]

and that satisfies (13) or (14) based on the parity of n. Furthermore, there exists a permutation on

[2 h]

such that

P (t / 2)

is formed by

m \in [2 h]

for which

med (f_{m}) = t / 2

. Define

\begin{matrix} \tilde{F} (⌊ n / 2 ⌋, \bar{w}) & = {P (\bar{w})}, \\ \tilde{F} (⌊ n / 2 ⌋, w) & = {P (w), P (w + 1 / 2)}, w \in 〚 \bar{w} - 1 〛 . \end{matrix}

For

l \in [⌊ n / 2 ⌋]

, define

\begin{matrix} \tilde{F} (l - 1, \bar{w}) & = {A_{f} (l - 1, \bar{w}) \cap J ∣ J \in \tilde{F} (l, \bar{w})}, \\ \tilde{F} (l - 1, w) & = {A_{f} (l - 1, w) \cap J ∣ J \in \tilde{F} (l, w) \cup \tilde{F} (l, w + 1)}, w \in 〚 \bar{w} - 1 〛, \end{matrix}

where

A_{f} (l, w)

is as given in Definition 11. Moreover, we exclude the empty set in

\tilde{F} (l, w)

for each

(l, w)

. It follows that

⋃_{\tilde{J} \in \tilde{F} (l, w)} \tilde{J} = A_{f} (l, w)

for

l \in 〚 ⌊ n / 2 ⌋ 〛, w \in 〚 \bar{w} 〛

. In addition,

m, m^{'} \in [2 h]

are in the same set

\tilde{J} \in \tilde{F} (l, w)

if and only if the component functions

f_{m}

and

f_{m^{'}}

have the same graph over

[l, ⌈ n / 2 ⌉]

. Therefore,

| \tilde{F} (0, 0) |

equals the number of distinct graphs over

〚 ⌈ n / 2 ⌉ 〛

of

f_{m}, m \in [2 h]

, i.e.,

| \tilde{F} (0, 0) | = | S^{'} |

. Furthermore, there is a bijection between

\tilde{F} (0, 0)

and

S^{'}

that maps

\tilde{J} \in \tilde{F} (0, 0)

to

s \in S^{'}

with

\tilde{J} = {m ∣ s_{m} = s, s_{m} \in S, m \in [2 h]}

.

The set

F

output by Algorithm 2 is the set of bookkeeping functions F that keep track of all admissible behaviors of the component functions given M. Moreover, every element in

F (0, 0)

is a subset of the

2 h

labels for which the corresponding component functions have the same graph over

〚 ⌈ n / 2 ⌉ 〛

. The construction of

P (t / 2)

in Line 7 and

K_{i}

in Line 22 in Algorithm 2 is oblivious of which labels in

[2 h]

to choose but dependent on the admissible sizes of the sets. Since the size of

P (t / 2)

must satisfy (13), (14) and the set X constructed on Line 20 enumerates all admissible sizes for

K_{i}

, there exists

F \in F

such that

| F (0, 0) | = | \tilde{F} (0, 0) |

and a bijection between

F (0, 0)

and

\tilde{F} (0, 0)

that maps

J \in F (0, 0)

to

\tilde{J} \in \tilde{F} (0, 0)

with

| J | = | \tilde{J} |

. Therefore, there are bijections between

\tilde{F} (0, 0), S^{'}

and between

F (0, 0), \tilde{F} (0, 0)

and it follows that there is a bijection between

F (0, 0), S^{'}

. □

4.2.2. Assembly Stage

In the assembly stage, we construct CWFs for each

F \in F

by identifying pairs in

{g_{m}}

whose median weights sum to

\bar{w}

. As mentioned in the scan stage,

F (0, 0)

is a partition of

[2 h]

, and for each

J \in F (0, 0)

, the component functions with labels in J have the same graph over

〚 ⌈ n / 2 ⌉ 〛

. As we would like to form pairs of component functions based on their median weights, it is helpful to group the elements of

F (0, 0)

based on the median weight. More precisely, for each possible median weight

w = 0, 1 / 2, 1, \dots, \bar{w}

, we construct a collection

R_{w}

of sets for which the corresponding component functions have median weight w, given by

\begin{matrix} R_{w} = {J ∣ J \in F (0, 0), J \subset P (w)} . \end{matrix}

Let

r_{w} = | R_{w} |

. Since different elements in

F (0, 0)

correspond to component functions with different graphs,

r_{w}

is the number of distinct component functions that have median weight w. Moreover, each element

R_{w, i} \in R_{w}, i \in [r_{w}]

is a set of labels for which the corresponding component functions have median weight w and the same graph over

〚 ⌈ n / 2 ⌉ 〛

.

By the rotational symmetry, two component functions capture the weight information of the same string only if their median weights sum to

\bar{w}

. Therefore, a label in

R_{w, i} \in R_{w}, i \in [r_{w}]

must be paired with a label in

R_{\bar{w} - w, j} \in R_{\bar{w} - w}, j \in [r_{\bar{w} - w}]

in order to combine two “half strings” into a single “full string”. Formally, the pairing of labels can be described by a permutation

σ

on

[2 h]

such that if

u \in R_{w, i}

is paired with

v \in R_{\bar{w} - w, j}

, then

σ (u) = m, σ (v) = m^{*}

for some

m \in [2 h]

, i.e.,

σ {(u)}^{*} = σ (v)

.

Algorithm 2 Scan stage

To enumerate all possible methods of forming pairs that satisfy the median weight constraint, we need to consider different methods of pairing a component function of median weight

w \in {0, 1 / 2, 1, \dots, \bar{w}}

with a component function of median weight

\bar{w} - w

. Let us first consider the case where

w \in {0, 1 / 2, 1, \dots, (\bar{w} - 1) / 2}

. For

i \in [r_{w}], j \in [r_{\bar{w} - w}]

, let

y_{w, i, j}

be the number of labels chosen in

R_{w, i}

to be paired with labels in

R_{\bar{w} - w, j}

. Then,

{(y_{w, i, j})}_{i \in [r_{w}], j \in [r_{\bar{w} - w}]}

must satisfy

\begin{matrix} \sum_{i = 1}^{r_{w}} y_{w, i, j} = | R_{\bar{w} - w, j} |, j \in [r_{\bar{w} - w}], \end{matrix}

(16)

\begin{matrix} \sum_{j = 1}^{r_{\bar{w} - w}} y_{w, i, j} = | R_{w, i} |, i \in [r_{w}] . \end{matrix}

(17)

For each solution

{(y_{w, i, j})}_{i \in [r_{w}], j \in [r_{\bar{w} - w}]}

to (16) and (17), we partition

R_{\bar{w} - w, j}

into disjoint subsets

{V_{w, i, j} ∣ i \in [r_{w}]}

and

R_{w, i}

into disjoint subsets

{U_{w, i, j} ∣ j \in [r_{\bar{w} - w}]}

such that

| V_{w, i, j} | = | U_{w, i, j} | = y_{w, i, j}

. The labels in

V_{w, i, j}

are then paired with the labels in

U_{w, i, j}

.
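
Solving (16) and (17) amounts to enumerating nonnegative integer matrices with prescribed row sums $| R_{w, i} |$ and column sums $| R_{\bar{w} - w, j} |$. A minimal Python sketch of this enumeration is given below; the function names and the sample sizes are hypothetical and chosen only for illustration.

```python
def bounded_rows(total, caps):
    """Yield tuples x with 0 <= x[j] <= caps[j] and sum(x) == total."""
    if not caps:
        if total == 0:
            yield ()
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in bounded_rows(total - first, caps[1:]):
            yield (first,) + rest


def pairing_tables(row_sums, col_sums):
    """Yield matrices y (tuples of rows) satisfying (16) and (17).

    row_sums[i] plays the role of |R_{w, i}| and col_sums[j] the role of
    |R_{w_bar - w, j}|; entry y[i][j] is the number of labels of R_{w, i}
    paired with labels of R_{w_bar - w, j}.
    """
    if not row_sums:
        if all(col == 0 for col in col_sums):
            yield ()
        return
    for row in bounded_rows(row_sums[0], col_sums):
        remaining = tuple(col - used for col, used in zip(col_sums, row))
        for rest in pairing_tables(row_sums[1:], remaining):
            yield (row,) + rest


# Hypothetical sizes: two groups of sizes 2 and 1 against two groups of sizes 1 and 2.
for y in pairing_tables((2, 1), (1, 2)):
    print(y)
```

Filling the matrix row by row and passing the residual column sums downward prunes infeasible partial assignments early, which keeps the enumeration proportional to the number of partial solutions explored.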

Consider the case where

w = \bar{w} / 2

. In this case, the labels in

R_{\bar{w} / 2}

need to be paired with each other so we have a slightly different integer partition problem. For

i \in [r_{\bar{w} / 2}], j \in [r_{\bar{w} / 2}]

, let

y_{\bar{w} / 2, i, j}

be the number of labels chosen in

R_{\bar{w} / 2, i} \in R_{\bar{w} / 2}

to be paired with labels in

R_{\bar{w} / 2, j} \in R_{\bar{w} / 2}

. Then,

y_{\bar{w} / 2, i, i}

must be even for all i and

y_{\bar{w} / 2, i, j} = y_{\bar{w} / 2, j, i}

for all

i \neq j

. Moreover,

{(y_{\bar{w} / 2, i, j})}_{i \in [r_{\bar{w} / 2}], j \in [r_{\bar{w} / 2}]}

must satisfy

\begin{matrix} \sum_{j = 1}^{r_{\bar{w} / 2}} y_{\bar{w} / 2, i, j} = | R_{\bar{w} / 2, i} |, i \in [r_{\bar{w} / 2}] . \end{matrix}

(18)

For each solution

{(y_{\bar{w} / 2, i, j})}_{i \in [r_{\bar{w} / 2}], j \in [r_{\bar{w} / 2}]}

to (18), we partition

R_{\bar{w} / 2, i}

into disjoint subsets

{U_{\bar{w} / 2, i, j} ∣ j \in [r_{\bar{w} / 2}]}

such that

| U_{\bar{w} / 2, i, j} | = y_{\bar{w} / 2, i, j}

. The labels in

U_{\bar{w} / 2, j, i}

are then paired with the labels in

U_{\bar{w} / 2, i, j}

for

i \neq j

, and the labels in

U_{\bar{w} / 2, i, i}

are organized into

y_{\bar{w} / 2, i, i} / 2

pairs arbitrarily.
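
The symmetric problem (18) can be solved by a direct search over the upper triangle of the matrix. The Python sketch below is a brute-force illustration under the constraints stated above (symmetry, even diagonal entries, prescribed row sums); the function name `symmetric_pairings` is hypothetical, and the sizes (1, 2, 3) are those that arise in Example 10.

```python
from itertools import combinations_with_replacement, product


def symmetric_pairings(sizes):
    """Return all symmetric matrices y with even diagonal and row sums `sizes`.

    sizes[i] plays the role of |R_{w_bar/2, i}| in (18).
    """
    r = len(sizes)
    cells = list(combinations_with_replacement(range(r), 2))  # pairs (i, j), i <= j
    # An entry can exceed neither of the two row sums it contributes to.
    caps = [min(sizes[i], sizes[j]) for i, j in cells]
    solutions = []
    for values in product(*(range(cap + 1) for cap in caps)):
        y = [[0] * r for _ in range(r)]
        for (i, j), v in zip(cells, values):
            y[i][j] = y[j][i] = v
        if any(y[i][i] % 2 for i in range(r)):
            continue  # labels inside one group must be paired two at a time
        if all(sum(y[i]) == sizes[i] for i in range(r)):
            solutions.append(tuple(map(tuple, y)))
    return solutions


for y in symmetric_pairings((1, 2, 3)):
    print(y)
```

The search space grows exponentially in the number of groups, so this version is suitable only for small instances; a row-by-row recursion with residual sums would scale better.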

Let

Y_{w} = {{(y_{w, i, j})}_{i \in [r_{w}], j \in [r_{\bar{w} - w}]}}

be the set of all solutions to the integer partition problem associated with

w \in {0, 1 / 2, 1, \dots, \bar{w} / 2}

, and let

Y = Y_{0} \times Y_{1 / 2} \times Y_{1} \times \dots \times Y_{\bar{w} / 2}

. Then, each

(y_{w, i, j}) \in Y

corresponds to a distinct method of forming pairs of the component functions such that the median weight constraint is satisfied. Specifically, since

R_{t / 2, i}, t \in 〚 2 \bar{w} 〛, i \in [r_{t / 2}]

are disjoint and

⋃_{t = 0}^{2 \bar{w}} ⋃_{i = 1}^{r_{t / 2}} R_{t / 2, i} = ⋃_{J \in F (0, 0)} J = [2 h],

one can easily define a permutation

σ

on

[2 h]

such that if

u \in R_{w, i}

is paired with

v \in R_{\bar{w} - w, j}

then

σ (u) = m, σ (v) = m^{*}

for some

m \in [2 h]

. Furthermore, given

σ

, a CWF f can be determined by combining the paired component functions, i.e., those with labels

u, v \in [2 h]

satisfying

σ {(u)}^{*} = σ (v)

. The corresponding multiset

H_{f}

can then be found using Definition 4. The details are presented in Algorithm 3.

Theorem 6.

The output

H

of running Algorithm 2 followed by Algorithm 3 is the set of all multisets compatible with M up to reversal.

Proof.

Let

{t_{j} ∣ j \in [h]}

be a multiset of strings compatible with M and define the multiset of length-

⌈ n / 2 ⌉

prefixes and suffixes of

t_{j}, j \in [h]

to be

S = {s_{2 j - 1} = t_{j} [⌈ n / 2 ⌉], s_{2 j} = \overset{\leftarrow}{t_{j}} [⌈ n / 2 ⌉] ∣ j \in [h]}

. Let

S^{'}

be the underlying set of S. By Claim 5, there exists

F \in F

output by Algorithm 2 such that there is a bijection

π

between

F (0, 0)

and

S^{'}

that maps

J \in F (0, 0)

to

s \in S^{'}

with

| J | = | {m ∣ s_{m} = s, s_{m} \in S, m \in [2 h]} |

. Denote the set J mapped to

s

under

π

by

J_{s}

and denote

{m ∣ s_{m} = s, s_{m} \in S, m \in [2 h]}

by

I_{s}

. Since

[2 h] = ⋃_{s \in S^{'}} J_{s} = ⋃_{s \in S^{'}} I_{s}

, a permutation

\tilde{σ}

on

[2 h]

can be further constructed such that it is a bijection between

J_{s}

and

I_{s}

for every

s \in S^{'}

.

In Algorithm 3, given F, all possible permutations for pairing labels in

R_{w}

and

R_{\bar{w} - w}

for all

w \in {0, 1 / 2, 1, \dots, \bar{w} / 2}

are found. In particular, there exists a permutation

σ

such that for any

u, v \in [2 h]

satisfying

\tilde{σ} {(u)}^{*} = \tilde{σ} (v)

, it holds that

σ {(u)}^{*} = σ (v)

. The way

σ

is constructed is shown on Lines 23 to 24, 30, and 33 to 34 in Algorithm 3. Next, a function f is constructed according to

F, σ

on Lines 37 to 44 in Algorithm 3. It is easy to verify that f is a CWF. The way that

σ

is constructed ensures that the multiset

H_{f}

constructed on Line 46 in Algorithm 3 satisfies

H_{f} \sim {t_{j} ∣ j \in [h]}

, i.e.,

{t_{j} ∣ j \in [h]} \in [H_{f}]

. It follows that any multiset compatible with M is in the same equivalence class as some element of the output

H

.

It remains to check that the elements in

H

are all distinct. In fact, let us show that the CWFs constructed in Algorithm 3 as multisets

{f_{m}}

are distinct. Let

F_{1}, F_{2} \in F

with

F_{1} \neq F_{2}

. Then,

F_{1}, F_{2}

correspond to distinct sets of “half strings”, and any pairing permutations

σ_{1}, σ_{2}

admissible for

F_{1}, F_{2}

, respectively, lead to distinct multisets of component functions. Furthermore, if

σ_{1}, σ_{2}

are two different pairing permutations admissible for

F_{1}

, then the two multisets of component functions resulting from

σ_{1}, σ_{2}

are also different since each element

R_{w, i} \in R_{w}

corresponds to a distinct “half string”. Therefore, all CWFs constructed in Algorithm 3 are distinct as multisets. Moreover, since a multiset and its reversals induce the same multiset of component functions, if a multiset is in

H

, then none of its reversals is in

H

. Hence,

H

is the set of all multisets compatible with M up to reversal. □

We end this section with an example of running Algorithms 2 and 3, and a checklist in Table 4 of some important notation used in discussing the algorithms.

Algorithm 3 Assembly stage

Example 10.

Consider the multiset

U = {110101, 110101, 101110}

given in Example 1. Let us go through Algorithms 2 and 3 to find all multisets compatible with

M (U)

(up to reversal). Note that

n = 6, \quad a_{3, 2} = 6 = 2 h

; thus, right before the steps for finding finer partitions in Algorithm 2, we have

F (3, 2) = {P (2)}

, where

P (2) = [6]

, and

F (l, w)

is empty for the other values of

(l, w)

. Next, let us look into the nested for-loops to find finer partitions.

For $l = 3, w = 2$ : Note that $q = | F (3, 2) | = 1$ , and $x_{1} = c_{3, 2} = 4$ . Choose $K_{1} \subset P (2)$ to be $[4]$ , a size-4 subset of $P (2)$ . Construct $G (2, 2) = {[6] ∖ K_{1}}$ , $G (2, 1) = {K_{1}}$ , and let $F = {G}$ .
For $l = 2, w = 2$ : Take the only element $F \in F$ . Note that $q = | F (2, 2) | = 1$ and $x_{1} = c_{2, 2} = 2$ . Then, $K_{1} = [5, 6] \in F (2, 2)$ . Construct $G (1, 1) = {K_{1}}$ , and let $F = {G}$ .
For $l = 2, w = 1$ : Take the only element $F \in F$ . Note that $q = | F (2, 1) | = 1$ and $x_{1} = c_{2, 1} = 1$ . Choose $K_{1} \subset [4] \in F (2, 1)$ to be ${1}$ , a size-1 subset of $[4]$ . Construct $G (1, 1) = F (1, 1) \cup {[4] ∖ K_{1}} = {[5, 6], [2, 4]}$ , $G (1, 0) = {K_{1}}$ , and let $F = {G}$ .
For $l = 1, w = 1$ : Take the only element $F \in F$ . In this step, $q = | F (1, 1) | = 2$ and $c_{1, 1} = 5$ . Take $J_{1} = [2, 4], J_{2} = [5, 6]$ . The equation in Line 20 of Algorithm 2 becomes $x_{1} + x_{2} = 5$ , where $x_{1} \in {0, 1, 2, 3}$ and $x_{2} \in {0, 1, 2}$ . The only solution to this equation is $x_{1} = 3$ and $x_{2} = 2$ . Construct $G (0, 0) = F (1, 1)$ and let $F = {G}$ .
For $l = 1, w = 0$ : Take the only element $F \in F$ . Note that $q = | F (1, 0) | = 1$ and $x_{1} = c_{1, 0} = 0$ . Construct $G (0, 0) = F (0, 0) \cup F (1, 0) = {{1}, {2, 3, 4}, {5, 6}}$ and output $F = {G}$ .

At this point, Algorithm 2 terminates, and we obtain

F

that contains only one element. Let us call this element F. Observe that $F (0, 0)$ contains three subsets of

[6]

:

{1}, {2, 3, 4}, {5, 6}

. Tracing back how these three sets are generated, we see that

{1}, {5, 6}, {2, 3, 4}

correspond to three “half strings”: 011, 110, 101, respectively.

As the first step of Algorithm 3, we need to construct

R_{w}

for

w \in {0, 1 / 2, 1, \dots, \bar{w} = 4}

. Since

P (w) = \emptyset

for

w \neq 2

and

P (2) = [6]

, the only nonempty set among the

R_{w}

’s is

R_{2} = F (0, 0)

and

r_{2} = | R_{2} | = 3

. To proceed, noticing that

\bar{w} / 2 = 2

, we need to find the set

Y_{\bar{w} / 2} = {(y_{i, j}) ∣ i, j \in {1, 2, 3}, y_{i, i} \in 2 N, y_{i, j} = y_{j, i} \in N, and (18) is satisfied}

. (Here, we omit the first subscript

\bar{w} / 2

of y.) Since

r_{\bar{w} / 2} = 3

, from (18), we have the following three equations:

\{\begin{matrix} y_{1, 1} + y_{1, 2} + y_{1, 3} = 1, \\ y_{2, 1} + y_{2, 2} + y_{2, 3} = 2, \\ y_{3, 1} + y_{3, 2} + y_{3, 3} = 3 . \end{matrix}

Here, we may take

R_{\bar{w} / 2, 1} = {1}

,

R_{\bar{w} / 2, 2} = {5, 6}

,

R_{\bar{w} / 2, 3} = {2, 3, 4}

. The remaining steps in Algorithm 3 essentially pair up “half strings”. Each solution in

Y_{0} \times Y_{1 / 2} \times Y_{1} \times \dots \times Y_{\bar{w} / 2}

indicates a way to assemble them. Since the only nonempty set among the

R_{w}

’s is

R_{2}

, the

Y_{w}

’s for which

w \neq \bar{w} / 2

are trivial. By calculation,

Y_{\bar{w} / 2}

contains three feasible solutions:

[\begin{matrix} y_{1, 1} & y_{1, 2} & y_{1, 3} \\ y_{2, 1} & y_{2, 2} & y_{2, 3} \\ y_{3, 1} & y_{3, 2} & y_{3, 3} \end{matrix}] = [\begin{matrix} 0 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 2 \end{matrix}] or [\begin{matrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 2 \end{matrix}] or [\begin{matrix} 0 & 0 & 1 \\ 0 & 0 & 2 \\ 1 & 2 & 0 \end{matrix}] .

The first solution suggests that we combine the first half string 011 with the reversal of the third one 101, resulting in a full string 011101. It also suggests that we combine two half strings 110 into 110011 (the second half comes from the reversal of 110) and that we combine two half strings 101 into 101101. Thus, the first solution can generate a multiset of strings

H_{1} = {011101, 110011, 101101}

that is compatible with

M (U)

. Similarly, the second solution gives

H_{2} = {011011, 110101, 101101}

. Lastly, according to the third solution, we have another multiset

H_{3} = {011101, 110101, 110101}

.

In summary, the output of Algorithm 3 is

H = {H_{1}, H_{2}, H_{3}}

. According to Theorem 6, this gives all multisets compatible with

M (U)

up to reversal.
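
The conclusion of this example can be cross-checked by exhaustive search. The Python sketch below assumes that a multiset is compatible with $M (U)$ exactly when it yields the same counts $a_{l, w}$ of prefix and suffix compositions as U, and it identifies each string with its reversal by keeping the lexicographically smaller of the two; all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement


def composition_counts(strings):
    """Return a[(l, w)]: number of length-l prefixes/suffixes of weight w."""
    n = len(strings[0])
    a = Counter()
    for t in strings:
        for s in (t, t[::-1]):
            for l in range(n + 1):
                a[(l, s[:l].count("1"))] += 1
    return a


def canonical(t):
    """Representative of {t, reversal of t}: the lexicographically smaller one."""
    return min(t, t[::-1])


def compatible_multisets(reference):
    """Brute-force all multisets (up to reversal) with the same counts as `reference`."""
    n, h = len(reference[0]), len(reference)
    weight = reference[0].count("1")
    target = composition_counts(reference)
    candidates = set()
    for ones in combinations(range(n), weight):
        t = "".join("1" if i in ones else "0" for i in range(n))
        candidates.add(canonical(t))
    return [
        multiset
        for multiset in combinations_with_replacement(sorted(candidates), h)
        if composition_counts(multiset) == target
    ]


U = ["110101", "110101", "101110"]
for multiset in compatible_multisets(U):
    print(multiset)
```

Up to reversal of individual strings, the search returns the three multisets $H_{1}, H_{2}, H_{3}$ found above; for instance, 110101 appears through its reversal 101011.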

5. Concluding Remarks

We propose to use cumulative weight functions to describe the prefix–suffix compositions of a multiset of binary strings, and we use this description to derive necessary and sufficient conditions for the unique reconstruction of multisets of strings of the same weight up to reversal. Moreover, two reconstruction algorithms are presented. One is an efficient algorithm that outputs one multiset of strings compatible with the given prefix–suffix compositions and can be used to assist in determining the unique reconstructibility of the given compositions. The other outputs all admissible multisets up to reversal that are compatible with the given compositions.

Many problems in the reconstruction of multiple strings remain open. For example, can one lift the constant-weight assumption and characterize the conditions for the unique reconstruction of multiple strings from prefix–suffix compositions? In addition, if the prefix–suffix compositions are erroneous, can one design low-redundancy encoding schemes for the strings such that they can be recovered efficiently?

Author Contributions

Conceptualization, Z.C.; Formal analysis, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the Basic Research Project of Hetao Shenzhen-Hong Kong Science and Technology Cooperation Zone under Project HZQB-KCZYZ-2021067, the Guangdong Provincial Key Laboratory of Future Network of Intelligence under Project 2022B1212010001, the National Natural Science Foundation of China under Grant 62201487, and the Shenzhen Science and Technology Stable Support Program.

Data Availability Statement

Data sharing is not applicable to this article because the work is entirely theoretical, involving only mathematical statements and proofs.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemma 5

Recall that

m \in A (\bar{w} / 2)

if and only if

m^{*} \in A (\bar{w} / 2)

so the size of

A (\bar{w} / 2)

must be even. The case where

| A (\bar{w} / 2) | = 0

is vacuously true and the case where

| A (\bar{w} / 2) | = 2

follows from Proposition 5. Below we assume

| A (\bar{w} / 2) | \geq 4

.

Let

\tilde{m} \in A (\bar{w} / 2)

. If

f_{\tilde{m}} \neq f_{{\tilde{m}}^{*}}

, then by Proposition 5, there are exactly two maximal intervals between

f_{\tilde{m}}

and

f_{{\tilde{m}}^{*}}

. Let

m \in A (\bar{w} / 2) ∖ {\tilde{m}, {\tilde{m}}^{*}}

. By the conditions in Theorem 1, there is at most one maximal interval between

f_{m}

and

f_{\tilde{m}}

. We claim that there is exactly one maximal interval between them. Indeed, if

f_{m} = f_{\tilde{m}}

, then there are two maximal intervals between

f_{m}

and

f_{{\tilde{m}}^{*}}

, contradicting the conditions in Theorem 1. Similarly, one can show that there is exactly one maximal interval between

f_{m}

and

f_{{\tilde{m}}^{*}}

.

Suppose

G (f_{m}, [⌊ n / 2 ⌋]) \neq G (f_{\tilde{m}}, [⌊ n / 2 ⌋])

and

G (f_{m}, [⌊ n / 2 ⌋]) \neq G (f_{{\tilde{m}}^{*}}, [⌊ n / 2 ⌋])

. Since there is exactly one maximal interval between

f_{m}, f_{\tilde{m}}

and exactly one maximal interval between

f_{m}, f_{{\tilde{m}}^{*}}

, it is necessary that

G (f_{m}, [⌊ n / 2 ⌋ + 1, n]) = G (f_{\tilde{m}}, [⌊ n / 2 ⌋ + 1, n]) = G (f_{{\tilde{m}}^{*}}, [⌊ n / 2 ⌋ + 1, n])

. However, by Proposition 5, there is a maximal interval between

f_{\tilde{m}}, f_{{\tilde{m}}^{*}}

that is contained in

[⌊ n / 2 ⌋ + 1, n]

, leading to a contradiction. Therefore,

G (f_{m}, [⌊ n / 2 ⌋]) = G (f_{\tilde{m}}, [⌊ n / 2 ⌋])

or

G (f_{m}, [⌊ n / 2 ⌋]) = G (f_{{\tilde{m}}^{*}}, [⌊ n / 2 ⌋])

for all

m \in A (\bar{w} / 2) ∖ {\tilde{m}, {\tilde{m}}^{*}}

.

Let

a, b \in A (\bar{w} / 2) ∖ {\tilde{m}, {\tilde{m}}^{*}}

. Suppose

G (f_{a}, [⌊ n / 2 ⌋]) \neq G (f_{b}, [⌊ n / 2 ⌋])

. Without loss of generality, we may assume further that

G (f_{a}, [⌊ n / 2 ⌋]) = G (f_{\tilde{m}}, [⌊ n / 2 ⌋])

and

G (f_{b}, [⌊ n / 2 ⌋]) = G (f_{{\tilde{m}}^{*}}, [⌊ n / 2 ⌋])

. Therefore, there is a maximal interval contained in

[⌊ n / 2 ⌋]

between

f_{a}, f_{{\tilde{m}}^{*}}

. Then, by the conditions in Theorem 1, we have

G (f_{a}, [⌊ n / 2 ⌋ + 1, n]) = G (f_{{\tilde{m}}^{*}}, [⌊ n / 2 ⌋ + 1, n])

. Similarly, we also have

G (f_{b}, [⌊ n / 2 ⌋ + 1, n]) = G (f_{\tilde{m}}, [⌊ n / 2 ⌋ + 1, n])

. It follows that

a^{*} \neq b

, and there are two maximal intervals between

f_{a}, f_{b}

, contradicting the conditions in Theorem 1. Therefore,

G (f_{a}, [⌊ n / 2 ⌋]) = G (f_{b}, [⌊ n / 2 ⌋])

for all

a, b \in A (\bar{w} / 2) ∖ {\tilde{m}, {\tilde{m}}^{*}}

.

Lastly, consider the case where

f_{\tilde{m}} = f_{{\tilde{m}}^{*}}

. Let

m \in A (\bar{w} / 2) ∖ {\tilde{m}, {\tilde{m}}^{*}}

be such that

f_{m} \neq f_{\tilde{m}}

. Since

f_{m} \neq f_{\tilde{m}}

, by definition of

A (\bar{w} / 2)

and the conditions in Theorem 1, there exists exactly one maximal interval between

f_{m}, f_{\tilde{m}}

that is either contained in

[⌊ n / 2 ⌋]

or

[⌊ n / 2 ⌋ + 1, n]

. Without loss of generality, assume

G (f_{m}, [⌊ n / 2 ⌋]) \neq G (f_{\tilde{m}}, [⌊ n / 2 ⌋])

and

G (f_{m}, [⌊ n / 2 ⌋ + 1, n]) = G (f_{\tilde{m}}, [⌊ n / 2 ⌋ + 1, n])

. By the 180-degree rotational symmetry of

f_{m}, f_{\tilde{m}}

and

f_{m^{*}}, f_{{\tilde{m}}^{*}}

, we have

G (f_{m^{*}}, [⌊ n / 2 ⌋]) = G (f_{{\tilde{m}}^{*}}, [⌊ n / 2 ⌋])

and

G (f_{m^{*}}, [⌊ n / 2 ⌋ + 1, n]) \neq G (f_{{\tilde{m}}^{*}}, [⌊ n / 2 ⌋ + 1, n])

. Since

f_{\tilde{m}} = f_{{\tilde{m}}^{*}}

, we have

G (f_{m}, [⌊ n / 2 ⌋]) \neq G (f_{m^{*}}, [⌊ n / 2 ⌋])

and

G (f_{m}, [⌊ n / 2 ⌋ + 1, n]) \neq G (f_{m^{*}}, [⌊ n / 2 ⌋ + 1, n])

. Therefore, there exist two maximal intervals between

f_{m}, f_{m^{*}}

. The remainder of the proof for this case follows similarly to the case where

f_{\tilde{m}} \neq f_{{\tilde{m}}^{*}}

.

Appendix B. Proof of Lemma 8

Consider the case where

f_{m}, m \in A (\bar{w} / 2)

are all the same. By Lemma 4, there are no branching points in

G (f_{m}, [⌊ n / 2 ⌋])

for all

m \in A_{f} (\bar{w} / 2)

.

Let

m \in A_{f} (\bar{w} / 2)

. Note that for any

m^{'} \in A_{f^{'}} (\bar{w} / 2)

, we have

f_{m^{'}}^{'} (⌊ n / 2 ⌋) = f_{m} (⌊ n / 2 ⌋)

. Therefore, by Proposition 6, we have

G (f_{m}, [0, ⌊ n / 2 ⌋]) = G (f_{m^{'}}^{'}, [0, ⌊ n / 2 ⌋])

for all

m \in A_{f} (\bar{w} / 2)

and all

m^{'} \in A_{f^{'}} (\bar{w} / 2)

. Recall that

m \in A_{f} (\bar{w} / 2)

if and only if

m^{*} \in A_{f} (\bar{w} / 2)

, and

G (f_{m^{*}}, [⌈ n / 2 ⌉, n])

is the same as

G (f_{m}, [0, ⌊ n / 2 ⌋])

when rotated 180 degrees about

(n / 2, \bar{w} / 2)

. Moreover, the same holds for

m^{'} \in A_{f^{'}} (\bar{w} / 2)

. It follows that

G (f_{m}, [⌈ n / 2 ⌉, n]) = G (f_{m^{'}}^{'}, [⌈ n / 2 ⌉, n])

for all

m \in A_{f} (\bar{w} / 2)

and all

m^{'} \in A_{f^{'}} (\bar{w} / 2)

. Thus,

f_{m} = f_{m^{'}}^{'}

for all

m \in A_{f} (\bar{w} / 2)

and all

m^{'} \in A_{f^{'}} (\bar{w} / 2)

. Since

f, f^{'}

are solutions to M, we have

| A_{f} (\bar{w} / 2) | = | A_{f^{'}} (\bar{w} / 2) |

. Therefore, if

f_{m}, m \in A (\bar{w} / 2)

are all the same, then

ψ_{0} (f) = ψ_{0} (f^{'})

.
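The 180-degree rotational symmetry used in this case can be checked numerically on a small example. The sketch below assumes that a component function records the prefix weights of a string and that the label m^{*} corresponds to the reversed string, as the rows of Table 3 suggest; the helper names `prefix_weights` and `rotate_180` are illustrative and do not appear in the paper.

```python
def prefix_weights(s):
    """Return [f(0), f(1), ..., f(n)]: the number of 1s in each prefix of s."""
    weights = [0]
    for ch in s:
        weights.append(weights[-1] + (ch == "1"))
    return weights


def rotate_180(points, n, w_bar):
    """Rotate each point (l, w) about (n/2, w_bar/2): (l, w) -> (n - l, w_bar - w)."""
    return sorted((n - l, w_bar - w) for l, w in points)


s = "1010011"                        # one of the strings in V (Figure 3)
n, w_bar = len(s), s.count("1")
f_m = prefix_weights(s)
f_m_star = prefix_weights(s[::-1])   # assumption: m* labels the reversed string

lo, hi = n // 2, -(-n // 2)          # floor(n/2) and ceil(n/2)
left = [(l, f_m[l]) for l in range(0, lo + 1)]            # G(f_m, [0, floor(n/2)])
right = [(l, f_m_star[l]) for l in range(hi, n + 1)]      # G(f_{m*}, [ceil(n/2), n])
symmetric = rotate_180(left, n, w_bar) == right
print(symmetric)
```

The rotation maps a point (l, w) to (n - l, \bar{w} - w), which is the identity f_{m^{*}} (n - l) = \bar{w} - f_{m} (l): a prefix of the reversed string is a reversed suffix of the original string, and the weights of a prefix and the complementary suffix sum to \bar{w}.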

Consider the case where

f_{m}, m \in A (\bar{w} / 2)

are not all the same. By Lemma 5, there exists

m_{1} \in A (\bar{w} / 2)

such that there is a maximal interval

[l_{1} + 1, l_{2} - 1] \subset [n]

between

f_{m_{1}}, f_{m_{1}^{*}}

, where

l_{2} \leq ⌊ n / 2 ⌋

. Moreover, by Lemma 6,

(l_{2}, f_{m_{1}} (l_{2}))

is the only branching point in

G (f_{m}, [⌊ n / 2 ⌋])

and there is no merging point in

G (f_{m}, [l_{2}, ⌊ n / 2 ⌋])

for all

m \in A (\bar{w} / 2)

.

Let

m \in A_{f} (\bar{w} / 2)

. Note that

(l_{2}, f_{m_{1}} (l_{2}))

is the branching point in

G (f_{m})

such that

l_{2} \leq l

for any branching point

(l, w) \in G (f_{m})

. Similar to the proof of Lemma 7, let

(l_{*}, w_{*}) = (l_{2}, f_{m_{1}} (l_{2}))

and

r = \begin{cases} 0 & \text{if } w_{*} = f_{m} (l_{*} - 1), \\ 1 & \text{if } w_{*} = f_{m} (l_{*} - 1) + 1 . \end{cases}

By definition of r, we have

m \in A_{f} (\bar{w} / 2) \cap A_{f} (l_{*}, w_{*}) \cap A_{f} (l_{*} - 1, w_{*} - r)

.

Let

m^{'} \in A_{f^{'}} (\bar{w} / 2) \cap A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r)

. In the following we will show

G (f_{m^{'}}^{'}, [⌊ n / 2 ⌋]) = G (f_{m}, [⌊ n / 2 ⌋])

. For notational convenience, let us define

{\hat{f}}_{m} = G (f_{m}, [⌊ n / 2 ⌋])

and

{\hat{f}}_{m^{'}}^{'} = G (f_{m^{'}}^{'}, [⌊ n / 2 ⌋])

. Further, define

{\hat{ψ}}_{0} (f) = {{\hat{f}}_{m} ∣ m \in A_{f} (\bar{w} / 2)}

and

{\hat{ψ}}_{0} (f^{'}) = {{\hat{f}}_{m^{'}}^{'} ∣ m^{'} \in A_{f^{'}} (\bar{w} / 2)}

.

By definition of

l_{*}

, there is no branching point in

G (f_{m}, [l_{*} - 1])

. Since

f_{m} (l_{*} - 1) = f_{m^{'}}^{'} (l_{*} - 1)

, by Proposition 6, we have

f_{m} (l) = f_{m^{'}}^{'} (l)

for any

l \in [0, l_{*} - 1]

. Suppose

f_{m} (l) \neq f_{m^{'}}^{'} (l)

for some

l \in [l_{*}, ⌊ n / 2 ⌋]

. Then, by Proposition 7, there is a merging point in

G (f_{m}, [l_{*}, l - 1])

. However, there is no merging point in

G (f_{m}, [l_{*} = l_{2}, ⌊ n / 2 ⌋])

, which is a contradiction. Thus,

f_{m} (l) = f_{m^{'}}^{'} (l)

for all

l \in [l_{*}, ⌊ n / 2 ⌋]

. It follows that

{\hat{f}}_{m^{'}}^{'} = {\hat{f}}_{m}

for any

m^{'} \in A_{f^{'}} (\bar{w} / 2) \cap A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r)

.

In the following, we will show

A_{f^{'}} (\bar{w} / 2) \supset A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r)

. Toward a contradiction, suppose that there exists

m_{0} \in ([2 h] ∖ A_{f^{'}} (\bar{w} / 2)) \cap A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r)

. Then,

f_{m_{0}}^{'} (l_{*}) = f_{m} (l_{*})

and

f_{m_{0}}^{'} (l_{*} - 1) = f_{m} (l_{*} - 1)

. Note that

med (f_{m_{0}}^{'}) \neq med (f_{m})

. Since

l_{*} \leq ⌊ n / 2 ⌋

, by Proposition 7, there must be a merging point in

G (f_{m}, [l_{*}, ⌊ n / 2 ⌋])

, which contradicts Lemma 6. Therefore,

A_{f^{'}} (\bar{w} / 2) \supset A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r)

.

Note that for any

m^{'} \in A_{f^{'}} (\bar{w} / 2) ∖ (A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r))

, we have

{\hat{f}}_{m^{'}}^{'} \neq {\hat{f}}_{m}

. Therefore, the multiplicity of

{\hat{f}}_{m}

in

{\hat{ψ}}_{0} (f^{'})

is

| A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r) |

. Taking

f^{'} = f

, one can repeat the above arguments to show that

{\hat{f}}_{m} = {\hat{f}}_{\tilde{m}}

for any

\tilde{m} \in A_{f} (\bar{w} / 2) \cap A_{f} (l_{*}, w_{*}) \cap A_{f} (l_{*} - 1, w_{*} - r)

and the multiplicity of

{\hat{f}}_{m}

in

{\hat{ψ}}_{0} (f)

is

| A_{f} (l_{*}, w_{*}) \cap A_{f} (l_{*} - 1, w_{*} - r) |

.

Since

f, f^{'}

are solutions to M,

| A_{f} (l_{*}, w_{*}) \cap A_{f} (l_{*} - 1, w_{*} - r) | = | A_{f^{'}} (l_{*}, w_{*}) \cap A_{f^{'}} (l_{*} - 1, w_{*} - r) |

, i.e., the multiplicity of

{\hat{f}}_{m}

in

{\hat{ψ}}_{0} (f)

equals the multiplicity of

{\hat{f}}_{m}

in

{\hat{ψ}}_{0} (f^{'})

. Furthermore, this holds for distinct

{\hat{f}}_{m} \in {\hat{ψ}}_{0} (f)

. Since

| A_{f} (\bar{w} / 2) | = | A_{f^{'}} (\bar{w} / 2) |

, i.e.,

| {\hat{ψ}}_{0} (f) | = | {\hat{ψ}}_{0} (f^{'}) |

, we obtain

{\hat{ψ}}_{0} (f) = {\hat{ψ}}_{0} (f^{'})

.

Lastly, by Lemma 6, it holds that

{\hat{f}}_{a} = {\hat{f}}_{m_{1}}

for all

a \in A_{f} (\bar{w} / 2) ∖ {m_{1}, m_{1}^{*}}

or

{\hat{f}}_{a} = {\hat{f}}_{m_{1}^{*}}

for all

a \in A_{f} (\bar{w} / 2) ∖ {m_{1}, m_{1}^{*}}

. Thus, the multiplicity of

{\hat{f}}_{m}

in

{\hat{ψ}}_{0} (f)

is either 1 or

| A_{f} (\bar{w} / 2) | - 1

. Without loss of generality, assume the multiplicity of

{\hat{f}}_{m}

in

{\hat{ψ}}_{0} (f)

is 1. Then, the multiplicity of

{\hat{f}}_{m^{*}}

in

{\hat{ψ}}_{0} (f)

is

| A_{f} (\bar{w} / 2) | - 1

. Moreover, the multiplicity of

{\hat{f}}_{m^{'}}^{'}

in

{\hat{ψ}}_{0} (f^{'})

is 1 and the multiplicity of

{\hat{f}}_{{(m^{'})}^{*}}^{'}

in

{\hat{ψ}}_{0} (f^{'})

is

| A_{f^{'}} (\bar{w} / 2) | - 1 = | A_{f} (\bar{w} / 2) | - 1

. Since

f_{m}

(resp.,

f_{m^{'}}^{'}

) is the same as

f_{m^{*}}

(resp.,

f_{{(m^{'})}^{*}}^{'}

) when rotated 180 degrees about

(n / 2, \bar{w} / 2)

, we have

f_{m} = f_{m^{'}}^{'}, f_{m^{*}} = f_{{(m^{'})}^{*}}^{'}

and the multiplicity of

f_{m}

(resp.,

f_{m^{'}}^{'}

) equals one in

ψ_{0} (f)

(resp.,

ψ_{0} (f^{'})

). Now for all

b \in A_{f} (\bar{w} / 2) ∖ {m, m^{*}}

and all

b^{'} \in A_{f^{'}} (\bar{w} / 2) ∖ {m^{'}, {(m^{'})}^{*}}

, we have

f_{b} = f_{b^{'}}^{'}

since

{\hat{f}}_{b} = {\hat{f}}_{b^{'}}^{'}

. Hence, we conclude

ψ_{0} (f) = ψ_{0} (f^{'})

.

References

1. Ouahabi, A.A.; Amalian, J.-A.; Charles, L.; Lutz, J.-F. Mass spectrometry sequencing of long digital polymers facilitated by programmed inter-byte fragmentation. Nat. Commun. 2017, 8, 967.
2. Launay, K.; Amalian, J.-A.; Laurent, E.; Oswald, L.; Ouahabi, A.A.; Burel, A.; Dufour, F.; Carapito, C.; Clément, J.-L.; Lutz, J.-F.; et al. Precise alkoxyamine design to enable automated tandem mass spectrometry sequencing of digital poly(phosphodiester)s. Angew. Chem. 2021, 133, 930–939.
3. Acharya, J.; Das, H.; Milenkovic, O.; Orlitsky, A.; Pan, S. String reconstruction from substring compositions. SIAM J. Discret. Math. 2015, 29, 1340–1371.
4. Pattabiraman, S.; Gabrys, R.; Milenkovic, O. Coding for polymer-based data storage. IEEE Trans. Inf. Theory 2023, 69, 4812–4836.
5. Banerjee, A.; Wachter-Zeh, A.; Yaakobi, E. Insertion and deletion correction in polymer-based data storage. IEEE Trans. Inf. Theory 2023, 69, 4384–4406.
6. Gabrys, R.; Pattabiraman, S.; Milenkovic, O. Reconstruction of sets of strings from prefix/suffix compositions. IEEE Trans. Commun. 2023, 71, 3–12.
7. Ye, Z.; Elishco, O. Reconstruction of a single string from a part of its composition multiset. IEEE Trans. Inf. Theory 2023, 70, 3922–3940.
8. Gupta, U.; Mahdavifar, H. A new algebraic approach for string reconstruction from substring compositions. In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; pp. 354–359.
9. Margaritis, D.; Skiena, S.S. Reconstructing strings from substrings in rounds. In Proceedings of the IEEE 36th Annual Foundations of Computer Science, Milwaukee, WI, USA, 23–25 October 1995; pp. 613–620.
10. Levenshtein, V.I. Efficient reconstruction of sequences from their subsequences or supersequences. J. Comb. Theory Ser. A 2001, 93, 310–332.
11. Batu, T.; Kannan, S.; Khanna, S.; McGregor, A. Reconstructing strings from random traces. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, ser. SODA’04, New Orleans, LA, USA, 11–14 January 2004; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2004; pp. 910–918.
12. Marcovich, S.; Yaakobi, E. Reconstruction of strings from their substrings spectrum. IEEE Trans. Inf. Theory 2021, 67, 4369–4384.
13. Yehezkeally, Y.; Bar-Lev, D.; Marcovich, S.; Yaakobi, E. Generalized unique reconstruction from substrings. IEEE Trans. Inf. Theory 2023, 69, 5648–5659.
14. Cheraghchi, M.; Gabrys, R.; Milenkovic, O.; Ribeiro, J. Coded trace reconstruction. IEEE Trans. Inf. Theory 2020, 66, 6084–6103.
15. Krishnamurthy, A.; Mazumdar, A.; McGregor, A.; Pal, S. Trace reconstruction: Generalized and parameterized. IEEE Trans. Inf. Theory 2021, 67, 3233–3250.
16. Ravi, A.N.; Vahid, A.; Shomorony, I. Coded shotgun sequencing. IEEE J. Sel. Areas Inf. Theory 2022, 3, 147–159.
17. Levick, K.; Shomorony, I. Fundamental limits of multiple sequence reconstruction from substrings. In Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, 25–30 June 2023; pp. 791–796.
18. Sima, J.; Li, Y.; Shomorony, I.; Milenkovic, O. On constant-weight binary B₂-sequences. In Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, 25–30 June 2023; pp. 886–891.
19. Yang, Y.; Chen, Z. Reconstruction of multiple strings of constant weight from prefix-suffix compositions. In Proceedings of the 2024 IEEE International Symposium on Information Theory (ISIT), Athens, Greece, 7–12 July 2024; pp. 897–902.

Figure 1. The graphs of the component functions

{f_{m}}

of the CWF induced by

U = {110101, 110101, 101110}

.

Figure 2. (a) The numbers

{a_{l, w}}

for

M (U)

. (b) The numbers

{a_{l, w}}

for

M (V)

. Circles are branching points and diamonds are merging points; disks are neither branching nor merging points.

Figure 3. The component functions of g with median weight

\bar{w} / 2

, where g is induced by

V = {1000111, 1110001, 1100011, 1010011}

. Circles are branching points and diamonds are merging points; disks are neither branching nor merging points.

Table 1. The values of

f (l, m)

induced by

U = {110101, 110101, 101110}

.

$f (l, m)$	$l = 1$	$l = 2$	$l = 3$	$l = 4$	$l = 5$	$l = 6$
$m = 1$	1	2	2	3	3	4
$m = 2$	1	1	2	2	3	4
$m = 3$	1	2	2	3	3	4
$m = 4$	1	1	2	2	3	4
$m = 5$	1	1	2	3	4	4
$m = 6$	0	1	2	3	3	4
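
The entries of such a table can be recomputed directly from the strings. The following Python sketch assumes that odd labels m correspond to the strings of U in the listed order and that each even label corresponds to the reversal of the preceding string, so that f (l, m) is the weight of the length-l prefix; this labeling is inferred from the table, and the function names are illustrative.

```python
def prefix_weights(s):
    """Return [f(1), ..., f(n)]: the number of 1s in each nonempty prefix of s."""
    weights, total = [], 0
    for ch in s:
        total += ch == "1"
        weights.append(total)
    return weights


def cwf_rows(strings):
    """One row per label m: each string followed by its reversal (assumed labeling)."""
    rows = []
    for s in strings:
        rows.append(prefix_weights(s))
        rows.append(prefix_weights(s[::-1]))
    return rows


U = ["110101", "110101", "101110"]
rows = cwf_rows(U)
for m, row in enumerate(rows, start=1):
    print(f"m = {m}:", *row)
```

For instance, the row for m = 5 comes out as 1 1 2 3 4 4, because the length-5 prefix 10111 of 101110 has weight 4.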

Table 2. A checklist for some important notation introduced in Section 2.

Notation	Meaning	Definition
H	A multiset of h strings
$M (\cdot)$	The prefix–suffix compositions of a multiset	Definition 1
M	The prefix–suffix compositions of H	$M (H)$
$H$	The collection of all equivalence classes whose members are compatible with M	${[U] ∣ M (U) = M}$
f	A cumulative weight function	Definition 3
$H_{f}$	The multiset of strings corresponding to f	Definition 4
$f_{m}$	A component function of f	Definition 5
$med (f_{m})$	The median weight of $f_{m}$	Definition 9
$A_{f} (w)$	The labels of component functions with $med (f_{m}) = w$	Definition 9
$A_{f} (l, w)$	The labels of component functions such that $f_{m} (l) = w$	Definition 11
$a_{l, w}$	The multiplicity of $(l - w, w)$ in M	Definition 12
$b_{l, w}$	The number of length-$l$, weight-$w$ affixes whose weight remains the same if the length decreases	Definition 13
$c_{l, w}$	The number of length-$l$, weight-$w$ affixes whose weight decreases with the length	Definition 13
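
As an illustration of the quantity $a_{l, w}$ in this checklist, the sketch below counts, for a given multiset of strings, how many prefixes and suffixes have length l and weight w, which is the multiplicity of the composition $(l - w, w)$. It assumes that M collects the compositions of all prefixes and all suffixes of lengths 1 through n for every string; the function name `composition_counts` is hypothetical.

```python
from collections import Counter


def composition_counts(strings):
    """Return a Counter a with a[(l, w)] = number of affixes of length l and weight w."""
    a = Counter()
    for s in strings:
        n = len(s)
        for l in range(1, n + 1):
            a[(l, s[:l].count("1"))] += 1       # prefix of length l
            a[(l, s[n - l:].count("1"))] += 1   # suffix of length l
    return a


U = ["110101", "110101", "101110"]
a = composition_counts(U)
for (l, w), count in sorted(a.items()):
    print(f"a_({l},{w}) = {count}")
```

With h strings there are 2h affixes of each length, so the counts for a fixed l always sum to 2h (here 6).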

Table 3. The values of

g (l, m)

induced by

V = {1000111, 1110001, 1100011, 1010011}

.

$g (l, m)$	$l = 1$	$l = 2$	$l = 3$	$l = 4$	$l = 5$	$l = 6$	$l = 7$
$m = 1$	1	1	1	1	2	3	4
$m = 2$	1	2	3	3	3	3	4
$m = 3$	1	2	3	3	3	3	4
$m = 4$	1	1	1	1	2	3	4
$m = 5$	1	2	2	2	2	3	4
$m = 6$	1	2	2	2	2	3	4
$m = 7$	1	1	2	2	2	3	4
$m = 8$	1	2	2	2	3	3	4

Table 4. A checklist for some important notation in Section 4.

Notations	Meanings
$P (w)$	A subset of $[2 h]$ whose size equals the number of component functions of median weight w
F	A bookkeeping function defined on $〚 ⌊ n / 2 ⌋ 〛 \times 〚 \bar{w} 〛$ that tracks the labels of $f_{m}$ as we assign values to $f_{m} (l)$ from $l = ⌊ n / 2 ⌋$ to $l = 0$
$F (l, w)$	A collection of sets that partitions labels of $f_{m} (l)$ according to their behaviors from length $⌈ n / 2 ⌉$ to l
$F$	A collection of bookkeeping functions
$R_{w}$	A collection of sets, each of which corresponds to $f_{m}$ ’s of median weight w that have the same graph over $〚 ⌈ n / 2 ⌉ 〛$
$r_{w}$	The size of $R_{w}$ and equals the number of different “half strings” with median weight w
$σ$	A “pairing” permutation on $[2 h]$ such that if $u \in R_{w, i}$ is paired with $v \in R_{\bar{w} - w, j}$ then $σ {(u)}^{*} = σ (v)$ .

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
