Article

Reconstruction of Multiple Strings of Constant Weight from Prefix–Suffix Compositions †

1 School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
2 School of Science and Engineering, Future Networks of Intelligence Institute, The Chinese University of Hong Kong, Shenzhen 518172, China
* Author to whom correspondence should be addressed.
† This paper was presented in part at the 2024 IEEE International Symposium on Information Theory (ISIT 2024).
Entropy 2025, 27(1), 39; https://doi.org/10.3390/e27010039
Submission received: 15 November 2024 / Revised: 30 December 2024 / Accepted: 3 January 2025 / Published: 6 January 2025
(This article belongs to the Special Issue Coding and Algorithms for DNA-Based Data Storage Systems)

Abstract

Motivated by studies of data retrieval in polymer-based storage systems, we consider the problem of reconstructing a multiset of binary strings that have the same length and the same weight from the compositions of their prefixes and suffixes of every possible length. We provide necessary and sufficient conditions under which unique reconstruction up to the reversal of the strings is possible. Additionally, we present two algorithms for reconstructing constant-length, constant-weight strings from the compositions of their prefixes and suffixes.

1. Introduction

The growing demand for archival data storage calls for innovative solutions to store information beyond traditional methods that rely on magnetic tapes or hard disk drives. Recent advances in macromolecule synthesis and sequencing suggest that polymers, such as DNA, are promising media for future archival data storage, largely owing to their high storage density and durability. Data retrieval in polymer-based storage systems depends on macromolecule sequencing technologies [1,2] to read out the information stored in the polymers. However, common sequencing technologies often only read random fragments of polymers. Thus, the task of data retrieval in these systems has to be based on the information provided by the fragments.
Under proper assumptions, one may represent polymers by binary strings and turn the problem of data retrieval into the problem of string reconstruction from substring compositions, i.e., from the number of zeros and the number of ones in substrings of every possible length. In [3], the authors characterized the lengths for which strings can be uniquely reconstructed from their substring compositions up to reversal. Extending the work of [3], the authors of [4,5] studied the problem of string reconstruction from erroneous substring compositions. Specifically, ref. [4] designed coding schemes capable of reconstructing strings in the presence of substitution errors, and [5] further proposed codes that can deal with insertion and deletion errors. Observing that it may not be realistic to assume that the compositions of all substrings are available, the authors of [6] initiated the study of string reconstruction based on the compositions of prefixes and suffixes of all possible lengths. In fact, ref. [6] considered the more general problem of reconstructing multiple distinct strings of the same length simultaneously from the compositions of their prefixes and suffixes. The main result of [6] reveals that for the reconstruction of no more than $h$ distinct strings of the same length, there exists a code with a rate approaching $1/h$ asymptotically. Following [6], the authors of [7] studied in depth the problem of reconstructing a single string from the compositions of its prefixes and suffixes. In particular, their work completely characterized the strings that can be uniquely reconstructed from their prefix and suffix compositions up to reversal.
The efficiency of data retrieval is a major concern for practical polymer-based storage systems, and thus, low-complexity algorithms for string reconstruction are of great interest. In the case of reconstruction from error-free substring compositions, ref. [3] described a backtracking algorithm for binary strings of length $n$ with worst-case time complexity exponential in $n$. Moreover, refs. [4,8] constructed sets of binary strings that can be uniquely reconstructed with a time complexity polynomial in $n$. In the case of reconstruction from error-free compositions of prefixes and suffixes, refs. [6,7] presented sets of binary strings that can be efficiently reconstructed. For reconstruction in the presence of substitution composition errors, ref. [4] showed that when the number of errors is a constant independent of $n$, there exist coding schemes with decoding complexity polynomial in $n$.
We note that string reconstruction is a classic problem [9,10,11] and has been studied under various settings, including reconstruction from substrings [9,12,13] and from subsequences [10,11,14,15,16,17] under either combinatorial or probabilistic assumptions.
In this paper, we consider the problem of reconstructing $h$ strings that are not necessarily distinct but have the same length $n \ge 1$ and weight $\bar{w} \le n$ from their error-free compositions of prefixes and suffixes of all possible lengths. The problem of reconstructing multiple strings from prefix–suffix compositions becomes more amenable to analysis if the strings are of constant weight. This is because useful symmetry properties can then be tied to the prefix–suffix compositions. It is worth mentioning that the work of [18] studied the largest possible set of constant-weight binary $B_2$-sequences, i.e., the set of constant-weight binary strings with the property that the real-valued sums of all distinct pairs of strings are different. Such sequences, albeit without the constraint of being constant weight, were used in [6] to ensure unique reconstructions of strings based on their prefix and suffix compositions.
Our first result is a characterization of the properties of constant-weight strings that enable unique reconstructions up to reversal, expanding our earlier work [19]. Additionally, we present two algorithms that reconstruct constant-weight strings from prefix–suffix compositions. Given prefix–suffix compositions as input, one of the algorithms can efficiently output a multiset of strings whose prefix–suffix compositions are the same as the input, and the other is able to output all multisets of strings up to reversal that are allowed by the input. Our analysis relies on the running weight information of the strings that can be extracted from the prefix–suffix compositions and the inherent symmetry of constant-weight strings and their reversals.
The rest of this paper is organized as follows. In Section 2, we present the problem statement and introduce necessary notation and preliminaries that are helpful for later sections. In particular, we introduce the notion of cumulative weight functions that capture the running weight information of a multiset of strings, which is used throughout the paper. In Section 3, we derive the necessary and sufficient conditions for unique reconstruction. Section 4 is devoted to the reconstruction algorithms. We conclude this paper and mention a few open problems in Section 5.

2. Notation and Preliminaries

Let $n$ be a positive integer. Denote $[n] = \{1, 2, \ldots, n\}$ and $\llbracket n \rrbracket = \{0, 1, \ldots, n\}$. For integers $n_1, n_2$, define $[n_1, n_2] = \{n_1, n_1 + 1, \ldots, n_2\}$ if $n_1 \le n_2$ and $[n_1, n_2] = \emptyset$ if $n_1 > n_2$. Let $\mathbf{t} = t_1 t_2 \cdots t_n \in \{0,1\}^n$ be a binary string of length $n$; the reversal of $\mathbf{t}$ is denoted by $\overleftarrow{\mathbf{t}} = t_n t_{n-1} \cdots t_1$. The weight of $\mathbf{t}$ is the number of ones in $\mathbf{t}$, denoted by $\mathrm{wt}(\mathbf{t})$. The composition of $\mathbf{t}$ is formed by the number of zeros and the number of ones in $\mathbf{t}$. More precisely, the ordered pair $(n - \mathrm{wt}(\mathbf{t}), \mathrm{wt}(\mathbf{t}))$ is called the composition of $\mathbf{t}$. For $1 \le l \le n$, the length-$l$ prefix and the length-$l$ suffix of $\mathbf{t}$ are denoted by $\mathbf{t}[l]$ and $\overleftarrow{\mathbf{t}}[l]$, respectively. We will use “∪” to denote both the set union and the multiset union. The exact meaning of “∪” will be clear from the context.
Definition 1. 
The set of compositions of all prefixes of a string $\mathbf{t} \in \{0,1\}^n$ is called the prefix compositions of $\mathbf{t}$, denoted by $M_p(\mathbf{t})$. More precisely,
$$M_p(\mathbf{t}) = \{ (j - \mathrm{wt}(\mathbf{t}[j]), \mathrm{wt}(\mathbf{t}[j])) \mid 1 \le j \le n \}.$$
The suffix compositions of $\mathbf{t}$ are similarly defined to be
$$M_s(\mathbf{t}) = \{ (j - \mathrm{wt}(\overleftarrow{\mathbf{t}}[j]), \mathrm{wt}(\overleftarrow{\mathbf{t}}[j])) \mid 1 \le j \le n \}.$$
The prefix–suffix compositions of $\mathbf{t}$ are defined to be the multiset union of $M_p(\mathbf{t})$ and $M_s(\mathbf{t})$, denoted by $M(\mathbf{t})$. Let $U$ be a multiset of binary strings. Define $M(U)$ to be the multiset union of $M(\mathbf{t})$, $\mathbf{t} \in U$, i.e.,
$$M(U) = \bigcup_{\mathbf{t} \in U} M(\mathbf{t}).$$
The multiset $M(U)$ is called the prefix–suffix compositions of $U$.
Example 1. 
Take $\mathbf{t} = 110101$. The prefix compositions of $\mathbf{t}$ are
$$M_p(\mathbf{t}) = \{ (0,1), (0,2), (1,2), (1,3), (2,3), (2,4) \},$$
and the suffix compositions of $\mathbf{t}$ are
$$M_s(\mathbf{t}) = \{ (0,1), (1,1), (1,2), (2,2), (2,3), (2,4) \}.$$
Taking the multiset union of $M_p(\mathbf{t})$ and $M_s(\mathbf{t})$, we get
$$M(\mathbf{t}) = \{ (0,1), (0,1), (0,2), (1,1), (1,2), (1,2), (1,3), (2,2), (2,3), (2,3), (2,4), (2,4) \}.$$
Consider the multiset $U = \{110101, 110101, 101110\}$. The prefix–suffix compositions $M(U)$ can be found to be
$$\{ (0,1)^5, (1,0)^1, (0,2)^2, (1,1)^4, (1,2)^6, (1,3)^4, (2,2)^2, (1,4)^1, (2,3)^5, (2,4)^6 \},$$
where by $(i,j)^t$, we mean $t$ compositions of the form $(i,j)$, namely, compositions of $i$ zeros and $j$ ones.
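The computation in Example 1 can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names `prefix_suffix_compositions` and `multiset_compositions` are our own, strings are represented as Python `str` objects over the characters `0` and `1`, and a multiset of compositions is represented by a `collections.Counter` keyed by (zeros, ones).

```python
from collections import Counter

def prefix_suffix_compositions(t: str) -> Counter:
    """Multiset M(t): compositions (zeros, ones) of every prefix and suffix of t."""
    comps = Counter()
    n = len(t)
    for j in range(1, n + 1):
        # t[:j] is the length-j prefix, t[n - j:] is the length-j suffix
        for piece in (t[:j], t[n - j:]):
            ones = piece.count("1")
            comps[(j - ones, ones)] += 1
    return comps

def multiset_compositions(strings) -> Counter:
    """Multiset M(U): multiset union of M(t) over all strings t in U."""
    total = Counter()
    for t in strings:
        total += prefix_suffix_compositions(t)
    return total

M_t = prefix_suffix_compositions("110101")
M_U = multiset_compositions(["110101", "110101", "101110"])
for composition, multiplicity in sorted(M_U.items()):
    print(composition, multiplicity)
```

Reversing any string in the input list leaves the resulting counter unchanged, since a reversal merely exchanges the roles of prefixes and suffixes.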
Note that different multisets may result in the same prefix–suffix compositions. For example, reversing a string in a multiset gives rise to a different multiset that has the same prefix–suffix compositions.
Definition 2. 
Let $U$ and $V$ be two multisets of strings. The multiset $V$ is said to be a reversal of $U$, denoted by $V \sim U$, if $|V| = |U|$, and for any string $\mathbf{t} \in U$ the sum of the multiplicities of $\mathbf{t}$ and $\overleftarrow{\mathbf{t}}$ in $U$ equals the sum of the multiplicities of $\mathbf{t}$ and $\overleftarrow{\mathbf{t}}$ in $V$. The collection of multisets that are reversals of $U$ forms an equivalence class, denoted by $[U]$, i.e., $[U] := \{ V \mid V \sim U \}$.
Given the prefix–suffix compositions of a multiset $H$ of $h \ge 1$ binary strings of length $n$, we are interested in finding the $h$ strings in $H$. The only constraint for the strings in $H$ at this point is the length, and later, we will restrict them to the same weight. We do not impose any other constraints on $H$; in particular, $H$ is allowed to have a string and its reversal at the same time. In the sequel, we denote $M := M(H)$ for simplicity. Clearly, any reversal of $H$ has prefix–suffix compositions $M$. However, there may exist multisets that are not reversals of $H$ but have the same compositions as $H$. If a multiset has prefix–suffix compositions $M$, we say the multiset is compatible with $M$. Let $\mathcal{H} := \{ [U] \mid M(U) = M \}$ be the collection of all equivalence classes whose members are compatible with $M$. We say that $H$ can be uniquely reconstructed up to reversal if and only if $|\mathcal{H}| = 1$.
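One convenient way to test whether two multisets are reversals of each other in the sense of Definition 2 is to replace every string by a canonical representative of the pair formed by the string and its reversal. The sketch below uses the lexicographically smaller of the two as that representative; this choice, and the names `reversal_class` and `is_reversal`, are our own illustration rather than part of the paper.

```python
from collections import Counter

def reversal_class(strings) -> Counter:
    """Canonical form of [U]: each string is replaced by min(t, reversed t)."""
    return Counter(min(t, t[::-1]) for t in strings)

def is_reversal(U, V) -> bool:
    """True if V is a reversal of U, i.e., both have the same canonical form."""
    return reversal_class(U) == reversal_class(V)

U = ["110101", "110101", "101110"]
V = ["101011", "110101", "011101"]  # first and third strings of U reversed
print(is_reversal(U, V))
print(is_reversal(["010101"], ["100110"]))
```

Two multisets have the same canonical form exactly when, for every string, the combined multiplicity of the string and its reversal agrees in both multisets, which also forces the two multisets to have the same size.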
As we will see later, it is helpful to present the information provided by each composition, i.e., the length and weight of the corresponding substring, on a two-dimensional grid. This motivates the following notation. Note that since $|H| = h$, there are $2nh$ compositions in $M$. Denote the grid by
$$T := \{ (l, m) \mid l \in \llbracket n \rrbracket, m \in [2h] \}.$$
Assume the strings in $H$ are given by $\mathbf{h}_j$, $j = 1, \ldots, h$. Then, one can record $\mathrm{wt}(\mathbf{h}_j[l])$ on the grid $T$ with coordinates $(l, 2j-1)$ and $\mathrm{wt}(\overleftarrow{\mathbf{h}_j}[l])$ on the grid $T$ with coordinates $(l, 2j)$. Therefore, the task of reconstructing $H$ from $M$ amounts to identifying the second coordinate of $(l, m) \in T$, i.e., the label of the string in $H$, based on the weights of the prefixes and suffixes. To this end, we define an integer-valued bivariate function on $T$, as stated below.
Definition 3. 
A function $f : T \to \llbracket n \rrbracket$ is called a cumulative weight function (CWF) if it satisfies the following conditions:
(i) $f(0, m) = 0$ for any $m \in [2h]$;
(ii) $f(l, m) - f(l-1, m) \in \{0, 1\}$ for any $(l, m) \in [n] \times [2h]$;
(iii) for each $j \in [h]$, there exists $w_j \in \llbracket n \rrbracket$ such that $f(l, 2j-1) + f(n-l, 2j) = w_j$ for all $l \in \llbracket n \rrbracket$.
If $w_j = \bar{w} \in \llbracket n \rrbracket$ for all $j \in [h]$, then $f$ is said to be a constant-weight CWF or to have constant weight $\bar{w}$.
It is clear that a CWF can be induced by the weights of the prefixes and suffixes of the strings in $H$. In particular, Item (iii) in Definition 3 is satisfied by taking $w_j = \mathrm{wt}(\mathbf{h}_j)$. At the same time, a CWF also identifies a multiset of $h$ strings because one can straightforwardly reconstruct the string $\mathbf{h}_j$ from the weights of its prefixes given by $\{ f(l, 2j-1) \mid l \in \llbracket n \rrbracket \}$.
Definition 4. 
Let $f : T \to \llbracket n \rrbracket$ be a CWF. The multiset $H_f := \{ \mathbf{t}_j = t_{j,1} \cdots t_{j,n} \mid t_{j,l} = f(l, 2j-1) - f(l-1, 2j-1) \text{ for all } l \in [n], j \in [h] \}$ is called the multiset of strings corresponding to the CWF $f$.
Note that a CWF $f$ uniquely determines $H_f$ (with an ordering of the strings induced by $f$), and the multiset $H$ (or any of its reversals) induces CWFs that are equivalent up to permutation of the ordering of the strings in $H$. Therefore, one may use CWFs as a proxy for analyzing the reconstructibility of $H$ based on $M$. By definition, a CWF $f(l, m)$ consists of $2h$ univariate functions obtained by fixing the variable $m \in [2h]$. It is convenient to deal with these component functions directly.
Definition 5. 
Let $f : T \to \llbracket n \rrbracket$ be a CWF. For $m \in [2h]$, let $f_m : \llbracket n \rrbracket \to \llbracket n \rrbracket$ be the function given by $f_m(l) = f(l, m)$.
Example 2. 
Consider the multiset $U = \{110101, 110101, 101110\}$ given in Example 1. The values of the CWF $f : \llbracket 6 \rrbracket \times [6] \to \llbracket 6 \rrbracket$ induced by $U$ are given in Table 1. The graphs of the component functions $f_1, f_2, \ldots, f_6$ are shown in Figure 1.
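To make Definitions 3 to 5 concrete, the following sketch builds the component functions of the CWF induced by a list of strings, checks Items (i) to (iii) of Definition 3, and recovers the strings as in Definition 4. The representation is an assumption of ours: the CWF is stored as a dictionary mapping the label $m$ to the list of values at $l = 0, 1, \ldots, n$, and the function names are hypothetical.

```python
def prefix_weights(t: str):
    """Running weights of t: entry l is the weight of the length-l prefix."""
    weights = [0]
    for bit in t:
        weights.append(weights[-1] + int(bit))
    return weights

def induced_cwf(strings):
    """Component functions: label 2j-1 for the j-th string, 2j for its reversal."""
    f = {}
    for j, t in enumerate(strings, start=1):
        f[2 * j - 1] = prefix_weights(t)
        f[2 * j] = prefix_weights(t[::-1])
    return f

def is_cwf(f, n: int) -> bool:
    """Check Items (i)-(iii) of Definition 3 for the dictionary f."""
    h = len(f) // 2
    for values in f.values():
        if values[0] != 0:
            return False
        if any(values[l] - values[l - 1] not in (0, 1) for l in range(1, n + 1)):
            return False
    for j in range(1, h + 1):
        sums = {f[2 * j - 1][l] + f[2 * j][n - l] for l in range(n + 1)}
        if len(sums) != 1:
            return False
    return True

def strings_of(f, n: int):
    """Strings of H_f, read off from the odd-labelled component functions."""
    h = len(f) // 2
    return ["".join(str(f[2 * j - 1][l] - f[2 * j - 1][l - 1])
                    for l in range(1, n + 1)) for j in range(1, h + 1)]

U = ["110101", "110101", "101110"]
f = induced_cwf(U)
for m in sorted(f):
    print(m, f[m])
```

The printed rows are the values that this sketch computes for the multiset $U$ of Example 2; for instance, the first row is the list of running weights of 110101.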
By Item (iii) of Definition 3, if $f$ is the CWF induced by $H$, then $f_{2j-1}$ and $f_{2j}$ record the weight information starting from the two ends of the same string in $H$. In other words, $2j-1$ and $2j$ refer to the same string. As we will be constantly relating $f_{2j-1}$ to $f_{2j}$, or the other way around, let us introduce the following definition for notational convenience.
Definition 6. 
Let $m \in [2h]$. Define $m^* \in [2h]$ by
$$m^* = \begin{cases} m - 1 & \text{if } m \text{ is even}, \\ m + 1 & \text{if } m \text{ is odd}. \end{cases}$$
The problem of reconstructing a single string from its prefix–suffix compositions, i.e., the case where $h = 1$ in our setting, is examined in [7]. The authors of [7] introduced the so-called swap operation for a string $\mathbf{t}$ to generate all the strings that have the same prefix–suffix compositions as $\mathbf{t}$, thereby deducing the conditions for a single string to be reconstructed uniquely up to reversal from prefix–suffix compositions. Specifically, the swap operation is performed on carefully chosen coordinates where $\mathbf{t}$ and $\overleftarrow{\mathbf{t}}$ disagree, so as to produce new strings that maintain the same prefix–suffix compositions. Let $f$ be the CWF induced by $\{\mathbf{t}\}$ and let $f_1, f_2$ correspond to $\mathbf{t}$ and $\overleftarrow{\mathbf{t}}$, respectively. Using the language of CWFs, the swap operation should be performed over the domain where $f_1$ and $f_2$ take different values. Since $f_1, f_2$ capture the running weight information from the two ends of the same string $\mathbf{t}$, they must be 180-degree rotationally symmetric. More precisely, if $\mathrm{wt}(\mathbf{t}) = \bar{w}$, then $f_1$ should be the same as $f_2$ when it is rotated 180 degrees about $(n/2, \bar{w}/2)$. With this observation, it follows that if $f'_1$ and $f'_2$ are the functions corresponding to the strings obtained by swapping bits of $\mathbf{t}$ with $\overleftarrow{\mathbf{t}}$, then $f'_1, f'_2$ must be 180-degree rotationally symmetric for them to record the weight information from the two ends of a single string.
Generalizing the idea of comparing $\mathbf{t}$ and $\overleftarrow{\mathbf{t}}$ for producing new strings, we introduce the notions of discrepancy and maximal intervals between functions $f_{m_1}$ and $f_{m_2}$ for any $m_1, m_2 \in [2h]$ and $h \ge 1$ as follows.
Definition 7. 
For $m_1, m_2 \in [2h]$, define the discrepancy between the functions $f_{m_1}$ and $f_{m_2}$ to be the set $D(m_1, m_2) := \{ l \in [n] \mid f_{m_1}(l) \ne f_{m_2}(l) \}$. For $k_1, k_2 \in [n]$, the set $I := [k_1, k_2] \subseteq [n]$ is called a maximal interval (of the discrepancy) between $f_{m_1}$ and $f_{m_2}$ if $I$ is a nonempty set such that $I \subseteq D(m_1, m_2)$, $k_1 - 1 \notin D(m_1, m_2)$, and $k_2 + 1 \notin D(m_1, m_2)$.
Due to Item (iii) of Definition 3, the maximal intervals between $f_m$ and $f_{m^*}$ exhibit symmetry about $n/2$, as shown in the next proposition.
Proposition 1. 
Let $[k_1, k_2] \subseteq \llbracket n \rrbracket$ be a maximal interval between $f_m$ and $f_{m^*}$. If $k_2 + 1 < n - k_2$, i.e., $k_2 < \lfloor n/2 \rfloor$, then $[n - k_2, n - k_1]$ is another maximal interval between $f_m$ and $f_{m^*}$. Similarly, if $k_1 > \lceil n/2 \rceil$, then $[n - k_2, n - k_1]$ is another maximal interval between $f_m$ and $f_{m^*}$. If $k_1 \le \lceil n/2 \rceil$ and $k_2 \ge \lfloor n/2 \rfloor$, then it is necessary that $k_2 = n - k_1$ and so $k_1 \le \lfloor n/2 \rfloor$ and $k_2 \ge \lceil n/2 \rceil$.
Proof. 
Since $f_m(l) \ne f_{m^*}(l)$ for $l \in [k_1, k_2]$, by Item (iii) of Definition 3, we have $f_m(l) \ne f_{m^*}(l)$ for $l \in [n - k_2, n - k_1]$. The proposition follows by inspecting the intersection of $[k_1, k_2]$ and $[n - k_2, n - k_1]$.    □
As the focus of this paper is on constant-weight strings, let us mention the following simple observation for constant-weight CWFs, which is also a consequence of Item (iii) of Definition 3.
Proposition 2. 
Assume $f$ is a constant-weight CWF. Let $m_1, m_2 \in [2h]$ and $k_1, k_2 \in [n-1]$. If $[k_1, k_2] \subseteq [n-1]$ is a maximal interval between $f_{m_1}$ and $f_{m_2}$, then $[n - k_2, n - k_1]$ is a maximal interval between $f_{m_1^*}$ and $f_{m_2^*}$.
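The reasoning behind Proposition 2 is not spelled out in the text, so we sketch one possible derivation here. Write $m^*$ for the partner label of $m$ from Definition 6, i.e., the label of the component function read from the other end of the same string. If $f$ has constant weight $\bar{w}$, Item (iii) of Definition 3 gives $f_m(l) + f_{m^*}(n - l) = \bar{w}$ for every $m \in [2h]$ and every $l \in \{0, 1, \ldots, n\}$. Consequently,

$$f_{m_1^*}(n - l) - f_{m_2^*}(n - l) = \bigl(\bar{w} - f_{m_1}(l)\bigr) - \bigl(\bar{w} - f_{m_2}(l)\bigr) = f_{m_2}(l) - f_{m_1}(l).$$

Hence $f_{m_1}(l) \ne f_{m_2}(l)$ if and only if $f_{m_1^*}(n - l) \ne f_{m_2^*}(n - l)$. The map $l \mapsto n - l$ therefore sends the run $[k_1, k_2]$ of disagreements to the run $[n - k_2, n - k_1]$, and it sends the two points of agreement $k_1 - 1$ and $k_2 + 1$ to the points of agreement $n - k_1 + 1$ and $n - k_2 - 1$ that bound the new run, which is exactly maximality. Note that the argument uses the same constant $\bar{w}$ for both strings, which is why constant weight is assumed.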
Example 3. 
Continuing Example 2, the CWF $f$ has constant weight since it is induced by the multiset $U$ in which all strings have weight 4. As shown in Figure 1, we observe the following:
  • The set $[1, 2]$ is a maximal interval between $f_1$ and $f_6$, and $[4, 5]$ is also a maximal interval between $f_2$ and $f_5$, as asserted by Proposition 2;
  • The set $\{1\}$ is a maximal interval between $f_2$ and $f_6$, and $\{5\}$ is also a maximal interval between $f_1$ and $f_5$, as asserted by Proposition 2.
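The maximal intervals of Definition 7 can be found by a single scan over $l = 1, \ldots, n$. The sketch below is our own illustration; `maximal_intervals` is a hypothetical helper that takes two component functions as lists of values at $l = 0, \ldots, n$ and returns the maximal intervals as pairs $(k_1, k_2)$.

```python
def prefix_weights(t: str):
    """Running weights of t: entry l is the weight of the length-l prefix."""
    weights = [0]
    for bit in t:
        weights.append(weights[-1] + int(bit))
    return weights

def maximal_intervals(fa, fb):
    """Maximal runs (k1, k2) of consecutive l in [n] with fa[l] != fb[l]."""
    n = len(fa) - 1
    intervals, start = [], None
    for l in range(1, n + 1):
        if fa[l] != fb[l]:
            if start is None:
                start = l          # a new run of disagreements begins
        elif start is not None:
            intervals.append((start, l - 1))
            start = None
    if start is not None:          # a run that extends up to l = n
        intervals.append((start, n))
    return intervals

# Component functions induced by U = {110101, 110101, 101110}
f1, f2 = prefix_weights("110101"), prefix_weights("110101"[::-1])
f5, f6 = prefix_weights("101110"), prefix_weights("101110"[::-1])

print(maximal_intervals(f1, f6), maximal_intervals(f2, f5))
print(maximal_intervals(f2, f6), maximal_intervals(f1, f5))
```

In this sketch the pair $f_2, f_6$ has a second maximal interval besides $\{1\}$, and its mirror image appears for the pair $f_1, f_5$, in line with Proposition 2.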
Next, we introduce the notion of swap between functions $f_{m_1}$ and $f_{m_2}$ for any $m_1, m_2 \in [2h]$ that ensures the resulting component functions still form a CWF. In view of Proposition 1, the swap operation has to be defined properly so that the symmetry of $f_{m_1}, f_{m_1^*}$ and of $f_{m_2}, f_{m_2^*}$ is preserved after swapping, i.e., Item (iii) of Definition 3 is still satisfied by the new functions obtained after swapping.
Definition 8. 
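Let $f$ be a CWF, $m_1, m_2 \in [2h]$, and $I \subseteq \llbracket n \rrbracket$ be a maximal interval between $f_{m_1}$ and $f_{m_2}$. Let $g$ be the CWF obtained from $f$ by swapping the image of $(l, m_1)$ under $f$ for that of $(l, m_2)$, and the image of $(n - l, m_1^*)$ under $f$ for that of $(n - l, m_2^*)$ for all $l \in I$. More precisely, if $m_1 \ne m_2^*$, then $g$ satisfies $g_m = f_m$ for $m \in [2h] \setminus \{m_1, m_1^*, m_2, m_2^*\}$ and
$$g_{m_1}(l) = \begin{cases} f_{m_2}(l), & l \in I \\ f_{m_1}(l), & l \notin I, \end{cases} \qquad g_{m_1^*}(n - l) = \begin{cases} f_{m_2^*}(n - l), & l \in I \\ f_{m_1^*}(n - l), & l \notin I, \end{cases}$$
$$g_{m_2}(l) = \begin{cases} f_{m_1}(l), & l \in I \\ f_{m_2}(l), & l \notin I, \end{cases} \qquad g_{m_2^*}(n - l) = \begin{cases} f_{m_1^*}(n - l), & l \in I \\ f_{m_2^*}(n - l), & l \notin I. \end{cases}$$
If $m_1 = m_2^*$, then $g$ satisfies $g_m = f_m$ for $m \in [2h] \setminus \{m_1, m_1^*\}$ and, writing $I = [k_1, k_2]$ and $\bar{I} := [n - k_2, n - k_1]$, we have
$$g_{m_1}(l) = \begin{cases} f_{m_2}(l), & l \in I \cup \bar{I} \\ f_{m_1}(l), & l \notin I \cup \bar{I}, \end{cases} \qquad g_{m_1^*}(n - l) = \begin{cases} f_{m_2^*}(n - l), & l \in I \cup \bar{I} \\ f_{m_1^*}(n - l), & l \notin I \cup \bar{I}. \end{cases}$$
Denote the mapping $(f, I, m_1, m_2) \mapsto g$ by $\phi$.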
Example 4. 
Continuing Example 3, we perform the swap operation on $f_1, f_2, f_5, f_6$:
  • Let $g = \phi(f, I = [1, 2], m_1 = 1, m_2 = 6)$. Then, the multiset corresponding to $g$ is $H_g = \{011101, 110101, 110101\}$ by Definition 4. Observe in Table 1 and Figure 1 that $f_1(l) = f_6(l)$ for all $l \in \llbracket 6 \rrbracket \setminus [1, 2]$, so $g_1 = f_6$. This explains the fact that the first string in $H_g$ is the reversal of the third string in $U$. If we define $g' = \phi(f, I = \{1\}, m_1 = 5, m_2 = 6)$, then $H_{g'} = H_g$.
  • Let $g'' = \phi(f, I = \{1\}, m_1 = 2, m_2 = 6)$. Then, the multiset corresponding to $g''$ is $H_{g''} = \{110110, 110101, 101101\}$ by Definition 4. Observe in Table 1 and Figure 1 that $f_2(1) = f_5(1)$, $f_1(5) = f_6(5)$, and $f_5(l) = f_6(l)$ for all $l \in \llbracket 6 \rrbracket \setminus (\{1\} \cup \{5\})$. By the swap operation, we have $g''_6(1) = f_2(1) = f_5(1)$ and $g''_5(5) = f_1(5) = f_6(5)$. Thus, $g''_5 = g''_6$. Indeed, the third string in $H_{g''}$ is a palindrome.
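The swap operation of Definition 8 can be prototyped as follows. This is a sketch under our own conventions: a CWF is a dictionary from the label $m$ to the list of values at $l = 0, \ldots, n$, `partner(m)` returns the label of the component function read from the other end of the same string (Definition 6), and the interval $I$ is passed as a pair $(k_1, k_2)$. The helper does not verify that $I$ is a maximal interval; that is left to the caller.

```python
def prefix_weights(t: str):
    weights = [0]
    for bit in t:
        weights.append(weights[-1] + int(bit))
    return weights

def induced_cwf(strings):
    f = {}
    for j, t in enumerate(strings, start=1):
        f[2 * j - 1] = prefix_weights(t)
        f[2 * j] = prefix_weights(t[::-1])
    return f

def strings_of(f):
    """Strings read off from the odd-labelled component functions."""
    h = len(f) // 2
    return ["".join(str(f[2 * j - 1][l] - f[2 * j - 1][l - 1])
                    for l in range(1, len(f[1]))) for j in range(1, h + 1)]

def partner(m: int) -> int:
    return m - 1 if m % 2 == 0 else m + 1

def swap(f, interval, m1: int, m2: int):
    """Swap f_{m1} and f_{m2} on the interval, mirroring the swap on the partners."""
    k1, k2 = interval
    n = len(f[m1]) - 1
    g = {m: list(values) for m, values in f.items()}
    if m1 != partner(m2):
        for l in range(k1, k2 + 1):
            g[m1][l], g[m2][l] = f[m2][l], f[m1][l]
            p1, p2 = partner(m1), partner(m2)
            g[p1][n - l], g[p2][n - l] = f[p2][n - l], f[p1][n - l]
    else:
        # m2 is the partner of m1: swap on the interval and on its mirror image
        for l in set(range(k1, k2 + 1)) | set(range(n - k2, n - k1 + 1)):
            g[m1][l], g[m2][l] = f[m2][l], f[m1][l]
    return g

f = induced_cwf(["110101", "110101", "101110"])
g = swap(f, (1, 2), 1, 6)
g_alt = swap(f, (1, 1), 5, 6)
g_pal = swap(f, (1, 1), 2, 6)
print(strings_of(g), strings_of(g_alt), strings_of(g_pal))
```

Because `strings_of` reads each string from its odd-labelled component function only, the output for `g` may list a string in the opposite orientation from the one printed in Example 4; the two multisets agree up to reversal.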
The ideas of maximal intervals and swapping are particularly helpful in establishing the necessity of the conditions for unique reconstruction as we will see in Section 3.1.
If $f$ is constant weight, then by the rotational symmetry of $f_m$ and $f_{m^*}$, the behavior of $f$ is completely characterized by $(l, f_m(l))$ for all $l \le \lfloor n/2 \rfloor$ and $m \in [2h]$. This motivates us to look at the “median weight” of the component functions $\{f_m\}$, introduced below.
Definition 9. 
For $m \in [2h]$, the median weight of $f_m$ is defined to be $\mathrm{med}(f_m) = \frac{1}{2} \bigl( f_m(\lfloor n/2 \rfloor) + f_m(\lceil n/2 \rceil) \bigr) \in \mathbb{R}$. For $w \in \mathbb{R}$, let $A_f(w) = \{ m \in [2h] \mid \mathrm{med}(f_m) = w \}$ be the set of labels of the component functions $\{f_m\}$ for which the median weight is $w$. If $f$ is clear from the context, denote $A_f(w)$ by $A(w)$ for simplicity.
The set $A(w)$ plays an important role in showing the sufficiency of the conditions for the unique reconstruction in Section 3.2. Note that if $f$ has constant weight $\bar{w}$, then $|A_f(\bar{w}/2)|$ must be even. In fact, for any $m \in [2h]$, we have $m \in A(\bar{w}/2)$ if and only if $m^* \in A(\bar{w}/2)$, due to the 180-degree rotational symmetry of $f_m$ and $f_{m^*}$ about $(n/2, \bar{w}/2)$.
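As a small illustration of Definition 9, the sketch below groups the labels of the component functions by median weight. The example multiset of length-5, weight-3 strings is our own and does not appear in the paper; exact halves are kept with `fractions.Fraction`, and the function names are hypothetical.

```python
from fractions import Fraction

def prefix_weights(t: str):
    weights = [0]
    for bit in t:
        weights.append(weights[-1] + int(bit))
    return weights

def induced_cwf(strings):
    f = {}
    for j, t in enumerate(strings, start=1):
        f[2 * j - 1] = prefix_weights(t)
        f[2 * j] = prefix_weights(t[::-1])
    return f

def median_weight(fm) -> Fraction:
    """Average of f_m at floor(n/2) and at ceil(n/2)."""
    n = len(fm) - 1
    return Fraction(fm[n // 2] + fm[(n + 1) // 2], 2)

def median_groups(f):
    """Map each median weight w to the set A(w) of labels attaining it."""
    groups = {}
    for m, fm in f.items():
        groups.setdefault(median_weight(fm), set()).add(m)
    return groups

f = induced_cwf(["11100", "10101", "10110"])   # n = 5, constant weight 3
A = median_groups(f)
for w in sorted(A):
    print(w, sorted(A[w]))
```

In this example the median weights of each pair of partner labels sum to the common weight 3, so the labels with median weight 3/2 come in partner pairs and their number is even.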
As mentioned earlier, CWFs may be used as a proxy for reconstructing multisets given M. In fact, our reconstruction algorithms presented in Section 4 essentially reconstruct CWFs whose corresponding multisets are compatible with M, and such CWFs are said to be “solutions” to M.
Definition 10. 
A CWF $f : T \to \llbracket n \rrbracket$ is called a solution to the composition multiset $M$ if the multiset equality $M = \{ (l - f_m(l), f_m(l)) \mid m \in [2h], l \in [n] \}$ holds.
Remark 1. 
If $f$ is a solution to $M$ and $I \subseteq \llbracket n \rrbracket$ is a maximal interval between $f_{m_1}$ and $f_{m_2}$, then $g = \phi(f, I, m_1, m_2)$ is also a solution to $M$.
In order to recover all multisets of strings compatible with M, it suffices to find all CWF solutions to M. Therefore, it is helpful to establish connections between multiset M and CWF f, which is what we will do next.
Definition 11. 
Let $f : T \to \llbracket n \rrbracket$ be a CWF. For $(l, w) \in \llbracket n \rrbracket^2$, let $A_f(l, w) = \{ m \in [2h] \mid f_m(l) = w \}$. When the underlying CWF $f$ is clear from the context, denote $A_f(l, w)$ by $A(l, w)$ for simplicity.
Definition 12. 
For $(l, w) \in \llbracket n \rrbracket^2$, let $a_{l,w}$ be the number of pairs $(l - w, w)$ in $M$ if $(l, w) \ne (0, 0)$ and define $a_{0,0} = 2h$.
Remark 2. 
By Definitions 11 and 12, $|A(l, w)|$ is the number of functions in $\{f_m\}$ that satisfy $f_m(l) = w$, and $a_{l,w}$ is the number of length-$l$ prefixes and suffixes of weight $w$. Therefore, by Definition 10, a CWF $f$ is a solution to $M$ if and only if $|A(l, w)| = a_{l,w}$ for all $(l, w) \in \llbracket n \rrbracket^2$.
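Remark 2 translates directly into a test of whether a CWF is a solution to a given composition multiset. The following sketch is illustrative only: compositions are stored in a `Counter` keyed by (zeros, ones), a CWF is a dictionary from labels to lists of values at $l = 0, \ldots, n$, and the names `a_from_M` and `is_solution` are hypothetical.

```python
from collections import Counter

def prefix_weights(t: str):
    weights = [0]
    for bit in t:
        weights.append(weights[-1] + int(bit))
    return weights

def induced_cwf(strings):
    f = {}
    for j, t in enumerate(strings, start=1):
        f[2 * j - 1] = prefix_weights(t)
        f[2 * j] = prefix_weights(t[::-1])
    return f

def compositions(strings) -> Counter:
    """Prefix-suffix compositions M(U) as a Counter keyed by (zeros, ones)."""
    M = Counter()
    for t in strings:
        n = len(t)
        for j in range(1, n + 1):
            for piece in (t[:j], t[n - j:]):
                ones = piece.count("1")
                M[(j - ones, ones)] += 1
    return M

def a_from_M(M: Counter, h: int) -> Counter:
    """a[(l, w)]: number of compositions (l - w, w) in M, with a[(0, 0)] = 2h."""
    a = Counter({(0, 0): 2 * h})
    for (zeros, ones), multiplicity in M.items():
        a[(zeros + ones, ones)] += multiplicity
    return a

def is_solution(f, M: Counter) -> bool:
    """Check |A(l, w)| = a[(l, w)] for every grid point (l, w)."""
    h = len(f) // 2
    visits = Counter((l, fm[l]) for fm in f.values() for l in range(len(fm)))
    return visits == a_from_M(M, h)

M = compositions(["110101", "110101", "101110"])
print(is_solution(induced_cwf(["110110", "110101", "101101"]), M))
print(is_solution(induced_cwf(["110101", "110101", "110110"]), M))
```

The first candidate is the multiset obtained by a swap in Example 4 and is compatible with $M$; the second already fails at length 2.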
By Remark 2, to find a solution $f$ to $M$, one may plot the elements of the multiset $M$ on a two-dimensional grid (see Figure 2 for example) and construct a CWF $f$ such that it passes through the point $(l, w)$ exactly $a_{l,w}$ times on the grid. Below, we mention a few basic properties of $A(l, w)$ and $a_{l,w}$ that immediately follow from Definitions 11 and 12.
Proposition 3. 
(i) For $l \in [n]$ and $w_1, w_2 \in \llbracket n \rrbracket$, if $w_1 \ne w_2$, then $A(l, w_1) \cap A(l, w_2) = \emptyset$.
(ii) For $(l, w) \in \llbracket n - 1 \rrbracket^2$, it holds that $A(l, w) \subseteq A(l + 1, w) \cup A(l + 1, w + 1)$.
(iii) For $(l, w) \in [n]^2$, it holds that $A(l, w) \subseteq A(l - 1, w) \cup A(l - 1, w - 1)$.
Note that (ii) (resp., (iii)) of Proposition 3 simply says that the weight of a substring cannot decrease (resp., increase) if its length increases (resp., decreases). As mentioned previously, a solution $f$ to $M$ must pass through $(l, w)$ exactly $a_{l,w}$ times. To further assist in finding such CWFs, we will be interested in the number of length-$l$, weight-$w$ prefixes and suffixes whose weight remains the same if the length decreases by one, and the number of those whose weight decreases with the length. They are denoted by $b_{l,w}$ and $c_{l,w}$, respectively, and are introduced in the next definition.
Definition 13. 
Let $f$ be a solution to $M$. For all $(l, w) \in [n]^2$, define $b_{l,w} = |A(l, w) \cap A(l - 1, w)|$ and $c_{l,w} = |A(l, w) \cap A(l - 1, w - 1)|$. Moreover, define $b_{l,0} = |A(l, 0)|$, $c_{l,0} = 0$ for all $l \in [n]$.
Proposition 4. 
Let $f$ be a solution to $M$. The numbers $\{ b_{l,w}, c_{l,w} \mid (l, w) \in [n] \times \llbracket n \rrbracket; w \le l \}$ can be computed from the numbers $\{ a_{l,w} \mid (l, w) \in \llbracket n \rrbracket^2; w \le l \}$.
Proof. 
Since $f$ is a solution to $M$, we have $a_{l,w} = |A(l, w)|$. By Definition 13, $b_{l,l} = 0$, $c_{l,l} = a_{l,l}$ for all $l \in [n]$. It remains to find $b_{l,w}, c_{l,w}$ where $0 \le w \le l - 1$. By (i) of Proposition 3, $A(l - 1, w)$ and $A(l - 1, w - 1)$ are disjoint. Therefore, $b_{l,w} + c_{l,w} \le a_{l,w}$. At the same time, by (iii) of Proposition 3, we have $a_{l,w} \le b_{l,w} + c_{l,w}$, and thus
$$a_{l,w} = b_{l,w} + c_{l,w}, \quad (l, w) \in [n] \times \llbracket n \rrbracket. \tag{1}$$
Using (ii) of Proposition 3, we have $a_{l,w} \le b_{l+1,w} + c_{l+1,w+1}$. By (i) of Proposition 3, $A(l + 1, w)$ and $A(l + 1, w + 1)$ are disjoint, so we also have $b_{l+1,w} + c_{l+1,w+1} \le a_{l,w}$. Therefore,
$$a_{l,w} = b_{l+1,w} + c_{l+1,w+1}, \quad (l, w) \in \llbracket n - 1 \rrbracket^2. \tag{2}$$
It follows from (2) that $b_{l,l-1} = a_{l-1,l-1} - c_{l,l}$ for $l \in [n]$. Using (1), we obtain $c_{l,l-1} = a_{l,l-1} - b_{l,l-1}$ for $l \in [n]$. Therefore, we have found $\{ b_{l,l-1}, c_{l,l-1} \mid l \in [n] \}$. Next, from (2) and (1), we have $b_{l,l-2} = a_{l-1,l-2} - c_{l,l-1}$ and $c_{l,l-2} = a_{l,l-2} - b_{l,l-2}$ for $l \in [2, n]$. Repeating this process, we can determine $\{ b_{l,l-i}, c_{l,l-i} \mid l \in [i, n] \}$ for all $i \in [n]$.    □
Remark 3. 
As a consequence of Proposition 4, the numbers $\{b_{l,w}\}$ and $\{c_{l,w}\}$ can be found by inspecting $M$, and thus, they are properties of $M$ in the sense that all solutions to $M$ result in the same $\{b_{l,w}\}$ and $\{c_{l,w}\}$. In fact, from the recursive procedures in the above proof, for $w \le l$, we have
$$b_{l,w} = \sum_{v=w}^{l-1} a_{l-1,v} - \sum_{v=w+1}^{l} a_{l,v}, \tag{3}$$
$$c_{l,w} = \sum_{v=w}^{l} a_{l,v} - \sum_{v=w}^{l-1} a_{l-1,v}. \tag{4}$$
Since $b_{l,w} \ge 0$, $c_{l,w} \ge 0$, it follows that for all $w \le l$
$$\sum_{v=w}^{l-1} a_{l-1,v} \ge \sum_{v=w+1}^{l} a_{l,v}, \tag{5}$$
$$\sum_{v=w}^{l} a_{l,v} \ge \sum_{v=w}^{l-1} a_{l-1,v}. \tag{6}$$
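The closed-form expressions of Remark 3 can be checked numerically against the direct counts of Definition 13. The sketch below is our own illustration: `bc_from_a` evaluates the two differences of partial sums over $v$, `bc_direct` counts, for each grid point, the component functions that arrive there with and without a unit increase, and both are compared on the CWF induced by the multiset $U$ of Example 1.

```python
from collections import Counter

def prefix_weights(t: str):
    weights = [0]
    for bit in t:
        weights.append(weights[-1] + int(bit))
    return weights

def induced_cwf(strings):
    f = {}
    for j, t in enumerate(strings, start=1):
        f[2 * j - 1] = prefix_weights(t)
        f[2 * j] = prefix_weights(t[::-1])
    return f

def bc_from_a(a, n: int):
    """b and c on {(l, w): 1 <= l <= n, 0 <= w <= l}, computed from a alone."""
    b, c = {}, {}
    for l in range(1, n + 1):
        for w in range(l + 1):
            prev_at_least_w = sum(a.get((l - 1, v), 0) for v in range(w, l))
            cur_above_w = sum(a.get((l, v), 0) for v in range(w + 1, l + 1))
            b[(l, w)] = prev_at_least_w - cur_above_w
            c[(l, w)] = cur_above_w + a.get((l, w), 0) - prev_at_least_w
    return b, c

def bc_direct(f, n: int):
    """Direct counts: b if the weight stayed at w, c if it rose from w - 1."""
    b, c = Counter(), Counter()
    for fm in f.values():
        for l in range(1, n + 1):
            if fm[l] == fm[l - 1]:
                b[(l, fm[l])] += 1
            else:
                c[(l, fm[l])] += 1
    return b, c

n = 6
f = induced_cwf(["110101", "110101", "101110"])
# a[(l, w)] counted from the CWF itself; this includes a[(0, 0)] = 2h
a = Counter((l, fm[l]) for fm in f.values() for l in range(n + 1))
b, c = bc_from_a(a, n)
b_ref, c_ref = bc_direct(f, n)
print(all(b[k] == b_ref[k] and c[k] == c_ref[k] for k in b))
```

Since `bc_from_a` only reads the counts $a_{l,w}$, any other solution to the same composition multiset yields the same values, which is the point of Remark 3.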
The numbers { a l , w } , { b l , w } , { c l , w } are instrumental in analyzing the possible behaviors of the component functions { f m } in Section 4. Before proceeding to present our main results, we summarize some important notation introduced in this section in Table 2 for ease of reference.

3. Necessary and Sufficient Conditions for Unique Reconstruction

In this section, we assume $H$ is a multiset of $h$ strings of length $n$ and weight $\bar{w}$. The main result of this section is stated in the following theorem.
Theorem 1. 
Let $f$ be a solution to $M$. There is exactly one multiset of strings (up to reversal) compatible with $M$, i.e., $|\mathcal{H}| = 1$, if and only if $f$ satisfies the following conditions:
(i) For any $m_1, m_2 \in [2h]$ with $m_1 = m_2^*$, there exist at most two maximal intervals between $f_{m_1}$ and $f_{m_2}$.
(ii) For any $m_1, m_2 \in [2h]$ with $m_1 \ne m_2^*$, there exists at most one maximal interval between $f_{m_1}$ and $f_{m_2}$.
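The two conditions of Theorem 1 are easy to test for a given CWF by counting maximal intervals for every pair of labels. The sketch below is a hypothetical helper of ours, not one of the reconstruction algorithms of Section 4; `partner(m)` returns the label of the component function read from the other end of the same string.

```python
def prefix_weights(t: str):
    weights = [0]
    for bit in t:
        weights.append(weights[-1] + int(bit))
    return weights

def induced_cwf(strings):
    f = {}
    for j, t in enumerate(strings, start=1):
        f[2 * j - 1] = prefix_weights(t)
        f[2 * j] = prefix_weights(t[::-1])
    return f

def partner(m: int) -> int:
    return m - 1 if m % 2 == 0 else m + 1

def count_maximal_intervals(fa, fb) -> int:
    """Number of maximal runs of positions l >= 1 where fa and fb differ."""
    count, inside = 0, False
    for l in range(1, len(fa)):
        differs = fa[l] != fb[l]
        if differs and not inside:
            count += 1
        inside = differs
    return count

def satisfies_theorem1(f) -> bool:
    labels = sorted(f)
    for i, m1 in enumerate(labels):
        for m2 in labels[i + 1:]:       # the count is symmetric in (m1, m2)
            limit = 2 if m1 == partner(m2) else 1
            if count_maximal_intervals(f[m1], f[m2]) > limit:
                return False
    return True

print(satisfies_theorem1(induced_cwf(["011101"])))
print(satisfies_theorem1(induced_cwf(["010101"])))
print(satisfies_theorem1(induced_cwf(["110101", "110101", "101110"])))
```

The multiset $U$ of Example 1 fails the test, which is consistent with Example 4, where a swap produces a compatible multiset that is not a reversal of $U$.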

3.1. Necessity

To give a rough idea of why the conditions in Theorem 1 are necessary for unique reconstruction, let us first consider some simple examples for the case where there is a single string $\mathbf{t}$. Suppose $\mathbf{t} = 011101$ and so $\overleftarrow{\mathbf{t}} = 101110$. A string $\mathbf{s} = 101110$, which has the same prefix–suffix compositions as $\mathbf{t}$, can be obtained by swapping the first two and last two bits of $\mathbf{t}$ for those of $\overleftarrow{\mathbf{t}}$. Note that $\mathbf{s}$ is simply $\overleftarrow{\mathbf{t}}$ and we only obtain the reversal of $\mathbf{t}$ after swapping. Using the language of CWFs, let $f$ be the CWF induced by $\{\mathbf{t}\}$ and $f_1, f_2$ be the functions corresponding to $\mathbf{t}, \overleftarrow{\mathbf{t}}$. We observe that there are only two maximal intervals between $f_1$ and $f_2$.
Next, let us examine an example where we produce a new string by swapping. Take $\mathbf{t} = 010101$ and so $\overleftarrow{\mathbf{t}} = 101010$. In this case, there are three maximal intervals between the corresponding functions $f_1, f_2$. Swapping the first two and last two bits of $\mathbf{t}$ with those of $\overleftarrow{\mathbf{t}}$, we obtain a new string $\mathbf{s} = 100110$. Clearly, $\mathbf{s} \notin \{\mathbf{t}, \overleftarrow{\mathbf{t}}\}$, and $\mathbf{s}$ has the same prefix–suffix compositions as $\mathbf{t}$.
From the above two examples, one may expect that if there are at least three maximal intervals between $\mathbf{t}$ and $\overleftarrow{\mathbf{t}}$, then $\mathbf{t}$ cannot be uniquely reconstructed, and therefore, the existence of at most two maximal intervals is necessary for the unique reconstruction of a single string up to reversal. A similar analysis can also be carried out for two strings that are not reversals of each other, and it turns out that the existence of at most one maximal interval is necessary for the unique reconstruction up to reversal in this case.
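For short strings, the two single-string examples above can be confirmed by exhaustive search. The sketch below enumerates all binary strings of the same length and keeps those with the same prefix–suffix compositions; it is a brute-force check of our own with exponential running time, not one of the algorithms of the paper, and each reversal pair is reported through its lexicographically smaller member.

```python
from collections import Counter
from itertools import product

def compositions(t: str) -> Counter:
    """Prefix-suffix compositions M(t) as a Counter keyed by (zeros, ones)."""
    M = Counter()
    n = len(t)
    for j in range(1, n + 1):
        for piece in (t[:j], t[n - j:]):
            ones = piece.count("1")
            M[(j - ones, ones)] += 1
    return M

def compatible_classes(t: str):
    """All strings with the same compositions as t, one per reversal pair."""
    target = compositions(t)
    classes = set()
    for bits in product("01", repeat=len(t)):
        s = "".join(bits)
        if compositions(s) == target:
            classes.add(min(s, s[::-1]))
    return sorted(classes)

print(compatible_classes("011101"))
print(compatible_classes("010101"))
```

The first string is unique up to reversal, whereas the second shares its compositions with the pair formed by 100110 and its reversal 011001.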
Lemma 1. 
Let $f$ be a solution to $M$ and let $m \in [2h]$. If there exist at least three maximal intervals between $f_m$ and $f_{m^*}$, then $|\mathcal{H}| > 1$.
Proof. 
Let $I_1 = [k_1, k_2]$, $I_2 = [k_3, k_4]$, $I_3 = [k_5, k_6]$ be three maximal intervals between $f_m$ and $f_{m^*}$. Without loss of generality, we may assume $0 < k_1 \le k_2 < k_3 \le k_4 < k_5 \le k_6 < n$. Construct $g = \phi(f, I_1, m, m^*)$. By Remark 1, $g$ is also a solution to $M$. Let $\bar{I}_1 = [n - k_2, n - k_1]$. By construction of $g$, we have $g_m \ne f_m$ on $I_1 \cup \bar{I}_1$. Moreover, $g_m \ne f_{m^*}$ on either $I_2$ or $I_3$ since $\bar{I}_1$ cannot equal both of them. Therefore, the string corresponding to $g_m$ differs from the strings corresponding to $f_m$ and $f_{m^*}$, and we have $[H_g] \ne [H_f]$. Hence, if there exist at least three maximal intervals between $f_m$ and $f_{m^*}$, then $|\mathcal{H}| > 1$.    □
Lemma 2. 
Let $f$ be a solution to $M$, and let $m_1, m_2 \in [2h]$ with $m_1 \ne m_2^*$. If there exist at least two maximal intervals between $f_{m_1}$ and $f_{m_2}$, then $|\mathcal{H}| > 1$.
Proof. 
Let $I_1, I_2$ be two maximal intervals between $f_{m_1}$ and $f_{m_2}$. Without loss of generality, assume $\{\lfloor n/2 \rfloor, \lceil n/2 \rceil\} \not\subseteq I_1$. Construct $g = \phi(f, I_1, m_1, m_2)$. By Remark 1, $g$ is also a solution to $M$. In the following, we will show that
$$\{ f_{m_1}, f_{m_1^*}, f_{m_2}, f_{m_2^*} \} \ne \{ g_{m_1}, g_{m_1^*}, g_{m_2}, g_{m_2^*} \}, \tag{7}$$
implying $|\mathcal{H}| > 1$.
Since $m_1 \ne m_2^*$ and $I_1$ is a maximal interval between $f_{m_1}$ and $f_{m_2}$, by construction of $g$, we have $g_{m_1} \ne f_{m_1}$, and $I_1$ is the only maximal interval between $g_{m_1}$ and $f_{m_1}$. We claim $g_{m_1} \ne f_{m_1^*}$ also holds. Indeed, if $g_{m_1} = f_{m_1^*}$, then $I_1$ is the only maximal interval between $f_{m_1^*}$ and $f_{m_1}$. However, since $I_1 \not\supseteq \{\lfloor n/2 \rfloor, \lceil n/2 \rceil\}$, according to Proposition 1, there are at least two maximal intervals between $f_{m_1}$ and $f_{m_1^*}$, which is a contradiction. Therefore, $g_{m_1} \ne f_{m_1^*}$. So far, we have shown
$$g_{m_1} \ne f_{m_1}, \quad g_{m_1} \ne f_{m_1^*}. \tag{8}$$
By construction of $g$, we have $g_{m_1} \ne f_{m_2}$. If $g_{m_1} \ne f_{m_2^*}$, then (7) holds and we are done.
Consider the case where $g_{m_1} = f_{m_2^*}$. Using arguments similar to those leading to (8), one can obtain
$$g_{m_2} \ne f_{m_2}, \quad g_{m_2} \ne f_{m_2^*}. \tag{9}$$
By construction of $g$, we also have $g_{m_2} \ne f_{m_1}$. Next, we would like to show $g_{m_2} \ne f_{m_1^*}$ for (7) to hold. Recall that the set $I_1$ is the only maximal interval between $g_{m_1}$ and $f_{m_1}$. Since $g_{m_1} = f_{m_2^*}$, it follows that $I_1$ is the only maximal interval between $f_{m_2^*}$ and $f_{m_1}$. Write $I_1 = [k_1, k_2]$. By Proposition 2, the set $[n - k_2, n - k_1]$ is a maximal interval between $f_{m_2}$ and $f_{m_1^*}$, and so $f_{m_2}(n - k_1) \ne f_{m_1^*}(n - k_1)$. Since $I_1 \not\supseteq \{\lfloor n/2 \rfloor, \lceil n/2 \rceil\}$, we have $I_1 \cap [n - k_2, n - k_1] = \emptyset$. By construction of $g$, we have $g_{m_2}(n - k_1) = f_{m_2}(n - k_1)$ and it follows that $g_{m_2}(n - k_1) \ne f_{m_1^*}(n - k_1)$, i.e., $g_{m_2} \ne f_{m_1^*}$. Therefore, (7) also holds.
In summary, no matter whether $g_{m_1}$ and $f_{m_2^*}$ are the same or not, (7) always holds. It follows that the multisets corresponding to $f, g$ satisfy $[H_f] \ne [H_g]$, and thus, $|\mathcal{H}| > 1$.    □
The necessity part of Theorem 1 follows from Lemmas 1 and 2.

3.2. Sufficiency

From the above discussion of necessity, it is not difficult to see that if f is a solution to M such that the conditions in Theorem 1 hold, then any CWF g resulting from a series of swap operations between $f_1, \ldots, f_{2h}$ satisfies $[H_g] = [H_f]$. Therefore, the sufficiency of the conditions follows if one can further show that any solution to M can be obtained by repeated applications of the swap operation between $f_1, \ldots, f_{2h}$. However, it is, in general, not obvious how to establish such a connection between f and an arbitrary solution to M. Thus, we take a different approach to showing sufficiency. Our main idea is to translate the conditions in Theorem 1 into properties shared by all solutions to M and to use these properties to establish the sufficiency of the conditions.
As mentioned before, the CWF f induced by h strings of length n and weight w ¯ is determined by the behaviors of the functions { f m } on n / 2 because of the constant weight. Based on the values that the functions { f m } take at n / 2 , i.e., the median weight med ( f m ) , the functions { f m } can be formed into groups A ( w ) , w = 0 , 1 / 2 , 1 , , w ¯ . In the following, we analyze the behaviors of the functions { f m } according to their membership in these groups. Let us first rephrase the conditions for f m , f m in Theorem 1 using their rotational symmetry.
Proposition 5. 
Let f be a solution that satisfies the conditions in Theorem 1. Then the following holds:
(i) 
For any m A ( w ¯ / 2 ) , either f m = f m or there are exactly two maximal intervals between f m and f m , and exactly one of the two intervals is contained in [ n / 2 ] .
(ii) 
For any m [ 2 h ] A ( w ¯ / 2 ) , there is exactly one maximal interval between f m and f m .
Proof. 
As mentioned previously, for any m [ 2 h ] , we have m A ( w ¯ / 2 ) if and only if m A ( w ¯ / 2 ) . For any m A ( w ¯ / 2 ) with f m f m , since f satisfies the conditions in Theorem 1, there is either one maximal interval or two maximal intervals between f m and f m . Since med ( f m ) = med ( f m ) , it follows that at least one of the maximal intervals is contained in [ n / 2 ] or [ n / 2 + 1 , n ] . Suppose there is only one maximal interval between f m and f m . Then, the maximal interval is contained in [ n / 2 ] or [ n / 2 + 1 , n ] , but by Proposition 1, there are two maximal intervals between f m and f m , which is a contradiction. So, there are exactly two maximal intervals. Now, suppose the two intervals are both in [ n / 2 ] or both in [ n / 2 + 1 , n ] . Then, by Proposition 1, there are more than two maximal intervals between f m and f m , which is a contradiction. It follows that exactly one of the two intervals is contained in [ n / 2 ] .
For any m [ 2 h ] A ( w ¯ / 2 ) , we have med ( f m ) med ( f m ) ; so, by Proposition 1, there exists one maximal interval between f m and f m that contains { n / 2 , n / 2 } . Furthermore, if there is another maximal interval contained in [ n / 2 ] or [ n / 2 + 1 , n ] , by Proposition 1, there are at least three maximal intervals between f m and f m , which is a contradiction to the conditions in Theorem 1. Therefore, there is exactly one maximal interval between f m and f m .    □
Example 5. 
Consider the set of strings V = { 1000111 , 1110001 , 1100011 , 1010011 } . The CWF g induced by V is given by Table 3. One can check that g satisfies the conditions in Theorem 1. Below, let us verify what Proposition 5 claims. Note that the strings in V have the same weight w ¯ = 4 and A g ( w ¯ / 2 ) = { 5 , 6 , 7 , 8 } . From Table 3, we can observe that g 5 = g 6 , and { 2 } [ n / 2 ] = [ 3 ] and { 5 } are the only two maximal intervals between g 7 and g 8 . At the same time, [ 2 h ] A g ( w ¯ / 2 ) = { 1 , 2 , 3 , 4 } . From Table 3, we can observe that there is exactly one maximal interval between g 1 and g 2 , and the same holds for g 3 and g 4 .
Below, we introduce two more definitions that are helpful for discussing the behaviors of the functions { f m } in this subsection.
Definition 14. 
For m [ 2 h ] and I n , let G ( f m , I ) : = { ( l , f m ( l ) ) l I } be the graph of f m over I and denote G ( f m ) : = G ( f m , n ) .
Definition 15. 
An element ( l , w ) [ n ] 2 is called a branching point if b l , w > 0 and c l , w > 0 . An element ( l , w ) n 1 2 is called a merging point if b l + 1 , w > 0 and c l + 1 , w + 1 > 0 . The branching and merging points are so named because we would like to visualize the graphs { G ( f m ) m [ 2 h ] } evolving from l = n to l = 0 .
Example 6. 
Let U = { 110101 , 110101 , 101110 } and V = { 1000111 , 1110001 , 1100011 , 1010011 } as given in Examples 1 and 5. In Figure 2, we depict M ( U ) and M ( V ) by writing the non-zero numbers a l , w on top of the points ( l , w ) . The numbers b l , w and c l , w can be determined by (3) and (4) in Remark 3, from which branching and merging points can be identified using Definition 15.
Using Proposition 5, we examine the conditions in Theorem 1 in terms of the branching points and merging points on { f m } in a series of lemmas below. Lemma 3 first examines the functions { f m } for which m [ 2 h ] A ( w ¯ / 2 ) .
Lemma 3. 
Let f be a solution to M that satisfies the conditions in Theorem 1 and let m [ 2 h ] A ( w ¯ / 2 ) . If ( l , w ) G ( f m ) is a merging point, then there are no branching points in G ( f m , [ l ] ) .
Proof. 
If ( l , w ) G ( f m ) is a merging point, there exists m 1 [ 2 h ] { m } such that f m ( l + 1 ) f m 1 ( l + 1 ) and f m ( l ) = f m 1 ( l ) . We claim G ( f m , [ l ] ) = G ( f m 1 , [ l ] ) . Indeed, if G ( f m , [ l ] ) G ( f m 1 , [ l ] ) then there is at least one maximal interval between f m and f m 1 contained in [ l 1 ] , in addition to the one contained in [ l + 1 , n ] . Since f satisfies the conditions in Theorem 1, we must have m 1 = m . However, by Proposition 5, if m 1 = m there should be only one maximal interval between f m 1 and f m , leading to a contradiction. Hence, G ( f m , [ l ] ) = G ( f m 1 , [ l ] ) .
Suppose ( k , v ) G ( f m , [ l ] ) is a branching point. Then there exists m 2 [ 2 h ] { m , m 1 } such that f m 2 ( k ) = f m ( k ) and f m 2 ( k 1 ) f m ( k 1 ) . Since ( l , w ) G ( f m ) is a merging point and we have f m ( l + 1 ) f m 1 ( l + 1 ) , G ( f m , [ l ] ) = G ( f m 1 , [ l ] ) , there must exist m ˜ { m , m 1 } such that f m ˜ ( k 1 ) f m 2 ( k 1 ) and f m ˜ ( l + 1 ) f m 2 ( l + 1 ) . It follows that there are two maximal intervals between f m ˜ and f m 2 , and therefore, by the conditions in Theorem 1, we have m 2 = m ˜ .
If m ˜ [ 2 h ] A ( w ¯ / 2 ) , then by Proposition 5, there should be exactly one maximal interval between f m ˜ , f m 2 , which is a contradiction.
If m ˜ A ( w ¯ / 2 ) , then m ˜ = m 1 and m 2 = m 1 A ( w ¯ / 2 ) . Therefore, the median weights of f m are different from that of f m 1 , f m 2 and there exists l { n / 2 , n / 2 } such that f m ( l ) f m 1 ( l ) , f m ( l ) f m 2 ( l ) . Since G ( f m , [ l ] ) = G ( f m 1 , [ l ] ) , we have l < l . It follows that k < l . So there exists a maximal interval between f m , f m 2 that is contained in [ k , n ] , in addition to the one contained in [ k 1 ] . Since m [ 2 h ] A ( w ¯ / 2 ) we have m 2 m , and therefore, by the conditions in Theorem 1, there should be only one maximal interval between f m , f m 2 , which is a contradiction.
Thus, there are no branching points in G ( f m , [ l ] ) .    □
Remark 4. 
One can also verify that if ( l , w ) G ( f m ) is a branching point, where m [ 2 h ] A ( w ¯ / 2 ) , then G ( f m , [ l , n ] ) has no merging points.
Example 7. 
Continuing Example 5, let us use the CWF g to verify Lemma 3 and Remark 4. In this case, [ 2 h ] A g ( w ¯ / 2 ) = { 1 , 2 , 3 , 4 } . Note that
$$G(g_1) = \{(0,0), (1,1), (2,1), (3,1), (4,1), (5,2), (6,3), (7,4)\}$$
contains two branching points ( 5 , 2 ) , ( 6 , 3 ) and two merging points ( 1 , 1 ) , ( 2 , 1 ) , as shown in Figure 2.
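The points listed in Example 7 can be checked mechanically. The following is a minimal sketch, not the authors' code: it assumes that $b_{l,w}$ counts the component functions that stay at weight w on the step from l − 1 to l and that $c_{l,w}$ counts those that rise from w − 1 to w, which matches how $c_{l,w}$ is described in Section 4.2.1. The helper names (`running_weights`, `component_functions`, `step_counts`) are hypothetical.

```python
from collections import Counter

def running_weights(bits):
    """Return [f(0), ..., f(n)], the weight of every prefix of a 0/1 string."""
    out = [0]
    for ch in bits:
        out.append(out[-1] + int(ch))
    return out

def component_functions(strings):
    """For each string: its prefix function, then its suffix function
    (the prefix function of the reversed string)."""
    fs = []
    for s in strings:
        fs.append(running_weights(s))
        fs.append(running_weights(s[::-1]))
    return fs

def step_counts(fs):
    """b[(l, w)]: functions with f(l-1) = f(l) = w.
    c[(l, w)]: functions with f(l-1) = w - 1 and f(l) = w."""
    b, c = Counter(), Counter()
    for f in fs:
        for l in range(1, len(f)):
            if f[l] == f[l - 1]:
                b[(l, f[l])] += 1
            else:
                c[(l, f[l])] += 1
    return b, c

def branching_points(b, c):
    """Points (l, w) with b_{l,w} > 0 and c_{l,w} > 0 (Definition 15)."""
    return {p for p in b if c[p] > 0}

def merging_points(b, c, n):
    """Points (l, w) with b_{l+1,w} > 0 and c_{l+1,w+1} > 0 (Definition 15)."""
    return {(l, w) for l in range(n) for w in range(l + 1)
            if b[(l + 1, w)] > 0 and c[(l + 1, w + 1)] > 0}

V = ["1000111", "1110001", "1100011", "1010011"]
fs = component_functions(V)
b, c = step_counts(fs)
graph_g1 = set(enumerate(fs[0]))  # G(g_1) as a set of (l, g_1(l)) pairs

print(sorted(branching_points(b, c) & graph_g1))   # [(5, 2), (6, 3)]
print(sorted(merging_points(b, c, 7) & graph_g1))  # [(1, 1), (2, 1)]
```

Intersecting with the graph of $g_1$ restricts the global sets of branching and merging points to those lying on that one component function.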
The next three lemmas examine the behaviors of { f m } for which m A ( w ¯ / 2 ) . In particular, the discussion is based on whether f m , m A ( w ¯ / 2 ) are all the same or not.
Lemma 4. 
Let f be a solution to M that satisfies the conditions in Theorem 1. If f m , m A ( w ¯ / 2 ) are all the same, then there are no branching points in G ( f m , [ n / 2 ] ) for all m A ( w ¯ / 2 ) .
Proof. 
Suppose there exist branching points in G ( f m , [ n / 2 ] ) for some m A ( w ¯ / 2 ) and let ( l , w ) G ( f m , [ n / 2 ] ) be a branching point. Since f m 1 = f m 2 for any m 1 , m 2 A ( w ¯ / 2 ) , there must exist m ˜ [ 2 h ] A ( w ¯ / 2 ) such that f m ˜ ( l ) = f m ( l ) and f m ˜ ( l 1 ) f m ( l 1 ) . Moreover, since m ˜ A ( w ¯ / 2 ) , we have f m ˜ ( l ) f m ( l ) for some l [ l , n / 2 ] . It follows that there exist two maximal intervals between f m and f m ˜ . This is a contradiction to Item (ii) in Theorem 1 by noticing m m ˜ since m A ( w ¯ / 2 ) for all m A ( w ¯ / 2 ) .    □
If f m , m A ( w ¯ / 2 ) are not all the same, Lemma 5 shows that graphs of f m over [ n / 2 ] are essentially of two kinds. The proof of Lemma 5 is presented in Appendix A.
Lemma 5. 
Let f be a solution to M that satisfies the conditions in Theorem 1. If f m , m A ( w ¯ / 2 ) are not all the same, then there exists m 1 A ( w ¯ / 2 ) such that there are exactly two maximal intervals between f m 1 and f m 1 , and f m , m A ( w ¯ / 2 ) { m 1 , m 1 } are all the same. Moreover, it holds that G ( f m , [ n / 2 ] ) = G ( f m 1 , [ n / 2 ] ) for all m A ( w ¯ / 2 ) { m 1 , m 1 } or G ( f m , [ n / 2 ] ) = G ( f m 1 , [ n / 2 ] ) for all m A ( w ¯ / 2 ) { m 1 , m 1 } .
Using Lemma 5, we can further deduce the property of the branching points and merging points on f m , m A ( w ¯ / 2 ) .
Example 8. 
Let g be the CWF induced by V = { 1000111 , 1110001 , 1100011 , 1010011 } as in Example 5. In this case, A g ( w ¯ / 2 ) = { 5 , 6 , 7 , 8 } . Note that f m , m A g ( w ¯ / 2 ) are not all the same (see Figure 3). As pointed out by Lemma 5, there exists m 1 = 7 , such that there are exactly two maximal intervals ( { 2 } and { 5 } ) between g 7 and g 8 . Moreover, g 5 = g 6 and it holds that G ( g 5 , [ n / 2 ] ) = G ( g 6 , [ n / 2 ] ) = G ( g 7 , [ n / 2 ] ) .
In addition, observe in Figure 3 that ( 3 , 2 ) is the only branching point with l [ n / 2 ] on the graph of g 7 , as asserted in the next lemma.
Lemma 6. 
Let f be a solution to M that satisfies the conditions in Theorem 1. If f m , m A ( w ¯ / 2 ) are not all the same, then there exists m 1 A ( w ¯ / 2 ) such that there is a maximal interval [ l 1 + 1 , l 2 1 ] [ n ] between f m 1 and f m 1 , where l 2 n / 2 . Moreover, ( l 2 , f m 1 ( l 2 ) ) is the only branching point in G ( f m , [ n / 2 ] ) and there is no merging point in G ( f m , [ l 2 , n / 2 ] ) for all m A ( w ¯ / 2 ) .
Proof. 
By Lemma 5 and Proposition 5, there exists m 1 A ( w ¯ / 2 ) such that there is a maximal interval [ l 1 + 1 , l 2 1 ] [ n ] between f m 1 and f m 1 , where l 2 n / 2 . In addition, by Lemma 5, we have that ( l 1 , f m 1 ( l 1 ) ) is a merging point in G ( f m ) for all m A ( w ¯ / 2 ) . In what follows, let m A ( w ¯ / 2 ) .
Suppose there exists l [ n / 2 ] , l l 2 such that ( l , f m ( l ) ) is a branching point. By Lemma 5, there exist a [ 2 h ] A ( w ¯ / 2 ) such that f a ( l ) = f m ( l ) and f a ( l 1 ) f m ( l 1 ) . It follows that there are two maximal intervals between f a , f m : one is contained in [ l 1 ] and the other is contained in [ l + 1 , n ] (since the median weight of f a is different from that of f m ). However, we have a m , which is a contradiction to the conditions in Theorem 1. Therefore, ( l 2 , f m 1 ( l 2 ) ) is the only branching point in G ( f m , [ n / 2 ] ) .
Suppose there exists l [ l 2 , n / 2 ] such that ( l , f m ( l ) ) is a merging point. By Lemma 5, there exists a [ 2 h ] A ( w ¯ / 2 ) such that f a ( l ) = f m ( l ) and f a ( l + 1 ) f m ( l + 1 ) . Since ( l 2 , f m 1 ( l 2 ) ) is the only branching point in G ( f m ˜ , [ n / 2 ] ) for all m ˜ A ( w ¯ / 2 ) , it follows from Lemma 5 that there exists b A ( w ¯ / 2 ) such that G ( f b , [ l 2 ] ) G ( f a , [ l 2 ] ) . Therefore, there exist two maximal intervals between f a , f b : one contained in [ l 2 ] and the other is contained in [ l + 1 , n ] . However, we have a b , which is a contradiction to the conditions in Theorem 1. Thus, there is no merging point in G ( f m , [ l 2 , n / 2 ] ) .    □
So far, we have translated the conditions in Theorem 1 to properties of the branching points and merging points on { f m } . The advantage of doing so is that properties of branching points and merging points are shared by all solutions to M. Let f , f be two solutions to M. As a result of Remarks 2 and 3, ( l , w ) [ n ] 2 is a branching point in G ( f m ) for some m [ 2 h ] if and only if ( l , w ) is a branching point in G ( f m ) for some m [ 2 h ] . In particular, there is no branching point in G ( f m , I ) for I n if and only if there is no branching point in G ( f m , I ) . The same statements hold for merging points. In view of this, we can then facilitate the description of the conditions in Theorem 1 in terms of branching points and merging points to establish the sufficiency of the conditions in Theorem 1.
Let us present two simple propositions that relate f , f using branching points and merging points.
Proposition 6. 
Let f, f be two solutions to M and [ l 1 , l 2 ] [ n ] . If there is no branching point in G ( f m , [ l 1 , l 2 ] ) and f m ( l 2 ) = f m ( l 2 ) for some m [ 2 h ] , then for any l [ l 1 1 , l 2 ] it holds that f m ( l ) = f m ( l ) .
Proof. 
Suppose there exist l [ l 1 1 , l 2 ] such that f m ( l ) f m ( l ) . Let l [ l 1 , l 2 ] be such that f m ( l ) = f m ( l ) and f m ( l 1 ) f m ( l 1 ) . Let w = f m ( l ) . Since f , f are solutions to M, it follows that the number of pairs ( l w , w ) in M is at least 2, i.e., a l , w 2 . Moreover, we have b l , w 1 , c l , w 1 . Thus, by Definition 15, ( l , w ) is a branching point in G ( f m , [ l 1 , l 2 ] ) , resulting in a contradiction.    □
Proposition 7. 
Let f, f be two solutions to M and [ l 1 , l 2 ] [ n ] . If f m ( l 2 ) = f m ( l 2 ) and f m ( l 1 ) f m ( l 1 ) , there must be a branching point in G ( f m , [ l 1 + 1 , l 2 ] ) . Similarly, if f m ( l 2 ) f m ( l 2 ) and f m ( l 1 ) = f m ( l 1 ) , there must be a merging point in G ( f m , [ l 1 , l 2 1 ] ) .
Proof. 
The first part of the statement is a direct consequence of Proposition 6. For the second part, we observe that there exists l [ l 1 + 1 , l 2 ] such that f m ( l ) f m ( l ) and f m ( l 1 ) = f m ( l 1 ) . Without loss of generality, assume f m ( l ) = f m ( l 1 ) = w and f m ( l ) = f m ( l 1 ) + 1 . These two equations imply b l , w 1 and c l , w + 1 1 , respectively. Thus, by Definition 15, ( l 1 , w ) is a merging point in G ( f m , [ l 1 , l 2 1 ] ) .    □
In the next two lemmas, we show that if f , f are two solutions to M with f satisfying the conditions in Theorem 1, then the multiset equality { f m m [ 2 h ] } = { f m m [ 2 h ] } must hold, thereby proving the sufficiency of the conditions in Theorem 1 for unique reconstruction up to reversal.
Lemma 7. 
Let f, f be two solutions to M, with f satisfying the conditions in Theorem 1. Let ψ 1 ( f ) = { f m m [ 2 h ] A f ( w ¯ / 2 ) } be a multiset and define ψ 1 ( f ) accordingly. Then, ψ 1 ( f ) = ψ 1 ( f ) .
Proof. 
Let m S f : = [ 2 h ] A f ( w ¯ / 2 ) . Note that f m f m since their median weights are different. Thus, there are branching points in G ( f m ) . Let ( l , w ) be the branching point in G ( f m ) such that l l for any branching point ( l , w ) G ( f m ) . Let
r = 0 if w = f m ( l 1 ) , 1 if w = f m ( l 1 ) + 1 .
In other words, r is an indicator of the behavior of f m to the left of the branching point ( l , w ) . By definition of r, we have m S f A f ( l , w ) A f ( l 1 , w r ) .
Let S f = [ 2 h ] A f ( w ¯ / 2 ) and m S f A f ( l , w ) A f ( l 1 , w r ) . In the following, we will show f m = f m . Since there is no branching point in G ( f m , [ l 1 ] ) and f m ( l 1 ) = f m ( l 1 ) , by Proposition 6, we have f m ( l ) = f m ( l ) for any l [ 0 , l 1 ] . Suppose f m ( l ) f m ( l ) for some l [ l , n ] . Then by Proposition 7, there is a merging point in G ( f m , [ l , l 1 ] ) . But then by Lemma 3, there are no branching points in G ( f m , [ l ] ) , contradicting that ( l , w ) G ( f m , [ l ] ) is a branching point. Thus, f m ( l ) = f m ( l ) for all l [ l , n ] . It follows that f m = f m for any m S f A f ( l , w ) A f ( l 1 , w r ) .
Next, let us show that S f A f ( l , w ) A f ( l 1 , w r ) = A f ( l , w ) A f ( l 1 , w r ) . Toward a contradiction, suppose that there exists m 0 A f ( w ¯ / 2 ) A f ( l , w ) A f ( l 1 , w r ) . Then, f m 0 ( l ) = f m ( l ) and f m 0 ( l 1 ) = f m ( l 1 ) . Note that med ( f m 0 ) med ( f m ) . If l 1 n / 2 , by Proposition 7, there must be a branching point in G ( f m , [ l 1 ] ) , contradicting the assumption that l l for any branching point ( l , w ) G ( f m ) . If l 1 < n / 2 , there must be a merging point in G ( f m , [ l , n / 2 ] ) , but by Lemma 3 there should be no branching points in G ( f m , [ l ] ) , contradicting that ( l , w ) is a branching point. We thus conclude A f ( w ¯ / 2 ) A f ( l , w ) A f ( l 1 , w r ) = , and so, S f A f ( l , w ) A f ( l 1 , w r ) .
Note that for any m S f ( A f ( l , w ) A f ( l 1 , w r ) ) , we have f m f m . Therefore, the multiplicity of f m in ψ 1 ( f ) is | S f A f ( l , w ) A f ( l 1 , w r ) | = | A f ( l , w ) A f ( l 1 , w r ) | . Taking f = f , one can repeat the above arguments to show that f m = f m ˜ for any m ˜ S f A f ( l , w ) A f ( l 1 , w r ) and the multiplicity of f m in ψ 1 ( f ) is | A f ( l , w ) A f ( l 1 , w r ) | .
Since f , f are solutions to M, | A f ( l , w ) A f ( l 1 , w r ) | = | A f ( l , w ) A f ( l 1 , w r ) | , i.e., the multiplicity of f m in ψ 1 ( f ) equals the multiplicity of f m in ψ 1 ( f ) . Furthermore, this holds for distinct f m ψ 1 ( f ) . Since | S f | = | S f | , i.e., | ψ 1 ( f ) | = | ψ 1 ( f ) | , we obtain ψ 1 ( f ) = ψ 1 ( f ) .    □
Lemma 8. 
Let f, f be two solutions to M with f satisfying the conditions in Theorem 1. Let ψ 0 ( f ) = { f m m A f ( w ¯ / 2 ) } be a multiset and define ψ 0 ( f ) accordingly. Then, ψ 0 ( f ) = ψ 0 ( f ) .
The proof of Lemma 8 follows the same idea as that of Lemma 7, except that it relies on Lemmas 4 and 6 instead of Lemma 3. The complete proof is given in Appendix B.
It follows from Lemmas 7 and 8 that the conditions in Theorem 1 are sufficient for unique reconstruction up to reversal.
Example 9. 
Let us use Theorem 1 to determine whether the multiset U = { 110101 , 110101 , 101110 } given in Example 1 can be uniquely reconstructed up to reversal. The CWF f induced by U is given in Example 2. As shown in Figure 1, there are two maximal intervals ( { 1 } and { 4 } ) between f 2 and f 6 . This violates Item (ii) of Theorem 1, so we conclude that U cannot be uniquely reconstructed from M ( U ) up to reversal. Indeed, in Example 4, we found multisets not equivalent to U but compatible with M ( U ) .
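The maximal intervals quoted in Example 9 can be recomputed with a few lines of code. The sketch below assumes that a maximal interval between two component functions is a maximal run of consecutive lengths l on which the two functions differ, and that labels follow the listing order of the strings (odd labels for prefixes, even labels for suffixes). The function names are hypothetical.

```python
def running_weights(bits):
    """Return [f(0), ..., f(n)], the weight of every prefix of a 0/1 string."""
    out = [0]
    for ch in bits:
        out.append(out[-1] + int(ch))
    return out

def maximal_intervals(f, g):
    """Return the maximal runs (k1, k2) of consecutive l with f(l) != g(l)."""
    intervals, start = [], None
    for l, (x, y) in enumerate(zip(f, g)):
        if x != y and start is None:
            start = l                      # a run of disagreement begins
        elif x == y and start is not None:
            intervals.append((start, l - 1))
            start = None                   # the run ended at l - 1
    if start is not None:
        intervals.append((start, len(f) - 1))
    return intervals

U = ["110101", "110101", "101110"]
f2 = running_weights(U[0][::-1])  # suffix function of the first string
f6 = running_weights(U[2][::-1])  # suffix function of the third string
print(maximal_intervals(f2, f6))  # [(1, 1), (4, 4)]
```

The two intervals (1, 1) and (4, 4) correspond to {1} and {4} in Example 9. Applying the same function to the prefix and suffix functions of 1010011 returns (2, 2) and (5, 5), the intervals between $g_7$ and $g_8$ noted in Example 5.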

4. Reconstruction Algorithms

As before, we assume in this section that M is the prefix–suffix compositions of the multiset H of h strings of length n and weight w ¯ . We present two algorithms that produce multisets of strings compatible with M. Both algorithms first construct CWFs and then find the corresponding multisets as in Definition 4. The algorithm in Section 4.1 is a greedy algorithm that outputs a single multiset compatible with M with running time O ( n h ) . The algorithm in Section 4.2 is able to output all compatible multisets up to reversal. Its running time is, in general, exponential as it relies on a breadth-first search to find all possible CWFs and solve a number of integer partition problems.

4.1. An Algorithm That Outputs a Multiset of Strings Compatible with M

To construct a multiset compatible with M, it suffices to find a CWF f that is a solution to M. In Algorithm 1, we construct such a CWF by assigning larger w to f 2 k 1 ( l ) , k [ h ] in a greedy way as l goes from n to 0.
Algorithm 1 Algorithm for obtaining one multiset of strings compatible with M
[The pseudocode of Algorithm 1 appears as an image in the original article.]
By Remark 2, Algorithm 1 produces a multiset of strings compatible with M if the function f constructed in the algorithm is a CWF and | A f ( l , w ) | = a l , w for ( l , w ) n .
Claim 2. 
The function f constructed in Algorithm 1 is a CWF.
Proof. 
Let us first show that f is a mapping from T to n . Noticing Lines 7 and 8 in the algorithm, it suffices to show that for each l [ n 1 ] , k [ h ] , there exists w [ l ] such that 1 + s l , w + 1 k a l , w + s l , w + 1 . Note that s l , w + 1 is a non-increasing function of w. Thus, as w decreases from l to 1, s l , w + 1 increases from 0 to at most 2 h . Moreover, a l , w + s l , w + 1 = s l , w . Therefore, Lines 13 and 14 are well defined for each l [ n 1 ] , k [ h ] . It remains to show that f as constructed satisfies the conditions required in Definition 3. Clearly, Item (i) in the definition is satisfied according to Line 7 and Item (iii) is satisfied by Lines 7, 8 and 14.
As for Item (ii) in Definition 3, it suffices to show that if $f(l, 2k-1) = w$, then $f(l-1, 2k-1) \in \{w, w-1\}$. According to Line 12, it suffices to show that if $k \in [1 + s_{l,w+1}, \min\{s_{l,w}, h\}]$, then $k \in [1 + s_{l-1,w+1}, \min\{s_{l-1,w}, h\}] \cup [1 + s_{l-1,w}, \min\{s_{l-1,w-1}, h\}] = [1 + s_{l-1,w+1}, \min\{s_{l-1,w-1}, h\}]$. From (6), we have
$$s_{l,w+1} = \sum_{v=w+1}^{l} a_{l,v} \ge \sum_{v=w+1}^{l-1} a_{l-1,v} = s_{l-1,w+1}.$$
At the same time, from (5), we have
$$s_{l-1,w-1} = \sum_{v=w-1}^{l-1} a_{l-1,v} \ge \sum_{v=w}^{l} a_{l,v} = s_{l,w}.$$
Therefore, $[1 + s_{l,w+1}, \min\{s_{l,w}, h\}] \subseteq [1 + s_{l-1,w+1}, \min\{s_{l-1,w-1}, h\}]$.    □
Next, we would like to show that $|A_f(l,w)| = a_{l,w}$. Before that, let us make some simple observations. Since M is the prefix–suffix compositions of h strings of length n, for each $l \in [n]$, the number of prefixes and suffixes of length l with weights in $\{0, 1, \ldots, l\}$ is equal to $2h$. Moreover, since the h strings are of the same weight $\bar{w}$, for each $l \in [n]$ and $w \in \{0, 1, \ldots, \bar{w}\}$, the number of prefixes and suffixes of length l with weight w is the same as the number of suffixes and prefixes of length $n - l$ with weight $\bar{w} - w$. These observations extend to the case where $l = 0$ since $a_{0,0} = 2h$ by Definition 12. Thus, we have the following proposition.
Proposition 8. 
(i) 
$\sum_{w=0}^{l} a_{l,w} = 2h$ for all $l \in \{0, 1, \ldots, n\}$.
(ii) 
If M is the prefix–suffix compositions of strings with constant weight $\bar{w}$, then $a_{l,w} = a_{n-l,\bar{w}-w}$ for all $l \in \{0, 1, \ldots, n\}$ and $w \in \{0, 1, \ldots, \bar{w}\}$.
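Both items of Proposition 8 are easy to check numerically on a small instance. The sketch below is illustrative only: it stores the counts $a_{l,w}$ in a `Counter` keyed by `(l, w)`, which is a representation chosen here rather than one prescribed by the paper, and it uses the set V from Example 5.

```python
from collections import Counter

def compositions(strings):
    """a[(l, w)]: number of prefixes and suffixes of length l and weight w,
    counting the empty prefix and the empty suffix of every string at (0, 0)."""
    a = Counter()
    for s in strings:
        for t in (s, s[::-1]):        # prefixes of s, then suffixes of s
            w = 0
            a[(0, 0)] += 1
            for l, ch in enumerate(t, start=1):
                w += int(ch)
                a[(l, w)] += 1
    return a

V = ["1000111", "1110001", "1100011", "1010011"]
n, h, wbar = 7, 4, 4
a = compositions(V)

# Item (i): every length l carries exactly 2h prefixes and suffixes.
row_sums_ok = all(sum(a[(l, w)] for w in range(l + 1)) == 2 * h
                  for l in range(n + 1))

# Item (ii): a_{l,w} = a_{n-l, wbar-w} for constant-weight strings.
symmetry_ok = all(a[(l, w)] == a[(n - l, wbar - w)]
                  for l in range(n + 1) for w in range(wbar + 1))

print(row_sums_ok, symmetry_ok)  # True True
```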
Claim 3. 
For ( l , w ) n 2 , it holds that | A f ( l , w ) | = a l , w .
Proof. 
From Lines 7 and 8 in Algorithm 1, we have $|A_f(0,0)| = |A_f(n,\bar{w})| = 2h = a_{0,0} = a_{n,\bar{w}}$. In what follows, let $l \in [n-1]$. It is clear that $|A_f(l,w)| = 0 = a_{l,w}$ for all $w > l$. Since $s_{l,w+1}$ increases from 0 to at most $2h$ as w goes from l to 1, there exists $w^* \in [l]$ such that $s_{l,w^*+1} < h$ and $s_{l,w^*} \ge h$. From Lines 12 and 13 in the algorithm, for each $w \in [w^*+1, l]$, we have
$$A_f(l,w) = \{2k-1 \mid 1 + s_{l,w+1} \le k \le a_{l,w} + s_{l,w+1}\}.$$
Therefore, for all $l \in [n-1]$ and $w \in [w^*+1, l]$, we have $|A_f(l,w)| = a_{l,w}$.
Consider the case where $w \in [0, w^*-1]$. Since $s_{l,w+1} \ge s_{l,w^*} \ge h$, we have
$$\begin{aligned}
h \ge 2h - s_{l,w+1} &= 2h - \sum_{v=w+1}^{\bar{w}} a_{l,v} \\
&= 2h - \sum_{v=w+1}^{\bar{w}} a_{n-l,\bar{w}-v} && (9) \\
&= 2h - \sum_{v=0}^{\bar{w}-w-1} a_{n-l,v} \\
&= \sum_{v=0}^{n-l} a_{n-l,v} - \sum_{v=0}^{\bar{w}-w-1} a_{n-l,v} && (10) \\
&= \sum_{v=\bar{w}-w}^{n-l} a_{n-l,v} = s_{n-l,\bar{w}-w},
\end{aligned}$$
where (9) follows from Item (ii) of Proposition 8 and (10) follows from Item (i) of Proposition 8. From Lines 12 and 14, for each $w \in [0, w^*-1]$ we have
$$A_f(l,w) = \{2k \mid 1 + s_{n-l,\bar{w}-w+1} \le k \le a_{n-l,\bar{w}-w} + s_{n-l,\bar{w}-w+1}\}.$$
Therefore, for all $l \in [n-1]$ and $w \in [0, w^*-1]$, we have $|A_f(l,w)| = a_{n-l,\bar{w}-w} = a_{l,w}$.
Lastly, note that $a_{l,w^*} + s_{l,w^*+1} \ge h$, and similar to the above calculations that lead to $h \ge s_{n-l,\bar{w}-w}$ for $w \in [0, w^*-1]$, one can also obtain $h < 2h - s_{l,w^*+1} = a_{n-l,\bar{w}-w^*} + s_{n-l,\bar{w}-w^*+1}$. Then, from Lines 12, 13, and 14 we have
$$A_f(l,w^*) = \{2k-1 \mid 1 + s_{l,w^*+1} \le k \le h\} \cup \{2k \mid 1 + s_{n-l,\bar{w}-w^*+1} \le k \le h\}.$$
Therefore, for all $l \in [n-1]$, we have
$$\begin{aligned}
|A_f(l,w^*)| &= h - s_{l,w^*+1} + h - s_{n-l,\bar{w}-w^*+1} = 2h - \sum_{v=w^*+1}^{\bar{w}} a_{l,v} - \sum_{v=\bar{w}-w^*+1}^{\bar{w}} a_{n-l,v} \\
&= 2h - \sum_{v=w^*+1}^{\bar{w}} a_{l,v} - \sum_{v=\bar{w}-w^*+1}^{\bar{w}} a_{l,\bar{w}-v} && (11) \\
&= 2h - \sum_{v=w^*+1}^{\bar{w}} a_{l,v} - \sum_{v=0}^{w^*-1} a_{l,v} \\
&= a_{l,w^*}, && (12)
\end{aligned}$$
where (11) follows from Item (ii) of Proposition 8 and (12) follows from Item (i) of Proposition 8. Hence, for all $l \in [n-1]$ and $w \in [l]$, we have $|A_f(l,w)| = a_{l,w}$.    □
As a consequence of Claims 2 and 3, we have the following theorem.
Theorem 4. 
The output of Algorithm 1 is a multiset of strings compatible with M.
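Since the pseudocode of Algorithm 1 is not available as text here, the following is a reconstruction inferred from the proofs of Claims 2 and 3, not a transcription of the authors' algorithm. It assumes that, at every length l, the k-th odd-labelled function $f_{2k-1}$ takes the k-th largest of the $2h$ weights present at that length, and that the strings are read off the odd-labelled functions. The input format (a `Counter` of counts $a_{l,w}$) and the function names are hypothetical.

```python
from collections import Counter

def compositions(strings):
    """a[(l, w)]: number of prefixes and suffixes of length l and weight w."""
    a = Counter()
    for s in strings:
        for t in (s, s[::-1]):
            w = 0
            a[(0, 0)] += 1
            for l, ch in enumerate(t, start=1):
                w += int(ch)
                a[(l, w)] += 1
    return a

def greedy_multiset(a, n, h):
    """Greedy sketch: build h running-weight functions from the counts a,
    then convert each function into a binary string."""
    odd = [[0] * (n + 1) for _ in range(h)]
    for l in range(n + 1):
        # The 2h weights present at length l, largest first, with multiplicity.
        weights = [w for w in range(l, -1, -1) for _ in range(a[(l, w)])]
        for k in range(h):
            odd[k][l] = weights[k]       # k-th largest weight goes to f_{2k-1}
    # Consecutive differences of a running-weight function are the bits.
    return ["".join(str(f[l] - f[l - 1]) for l in range(1, n + 1)) for f in odd]

V = ["1000111", "1110001", "1100011", "1010011"]
a = compositions(V)
out = greedy_multiset(a, n=7, h=4)
print(sorted(out))
print(compositions(out) == a)  # True: the output is compatible with M(V)
```

This sketch materializes a sorted list per length for readability, so it does not match the $O(nh)$ bound stated for Algorithm 1 when n is much larger than h. On this input the output equals V up to reversal of individual strings.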
Algorithm 1 is efficient, with time complexity $O(nh)$, but it produces only one multiset compatible with M, so it does not help if one wants all compatible multisets. Nevertheless, let us mention one important application of Algorithm 1. In Theorem 1, the necessary and sufficient conditions for unique reconstruction given the prefix–suffix compositions M are described in terms of a CWF rather than M itself. Therefore, to determine the unique reconstructibility of M using Theorem 1, one must first be able to construct a CWF that is a solution to M. Algorithm 1 does exactly that.
Moreover, when one has a CWF f that is a solution to M at hand, in view of Lemmas 1 and 2, it is tempting to use the swap operation in Definition 8 to enumerate all possible compatible multisets up to reversal. However, it is, in general, not easy to keep track of the swap operations. In the next subsection, we take a different route to constructing all compatible multisets by exploiting the inherent symmetry of constant-length constant-weight strings, bypassing the difficulty brought about by the complexity of the swap operations.

4.2. An Algorithm That Outputs All Multisets of Strings Compatible with M

As mentioned before, to find a multiset of strings compatible with M, one may plot the elements of multiset M on a two-dimensional grid and construct a CWF f such that it passes each point ( l , w ) exactly a l , w times on the grid. Moreover, one may infer the behavior of the component functions { f m } from the numbers { b l , w } , { c l , w } . Therefore, to obtain all possible multisets of strings (up to reversal) that are compatible with M, one may examine all possible behaviors of { f m } based on { a l , w } , { b l , w } , { c l , w } .
Since all h strings that give rise to M have the same length n and the same weight w ¯ , the graph of the component function f m is the same as that of f m when f m is rotated 180 degrees around ( n / 2 , w ¯ / 2 ) . As a result of this rotational symmetry, given the values of f m ( l ) for all m [ 2 h ] and l [ 0 , n / 2 ] , the remaining values of f m ( l ) , l [ n / 2 + 1 , n ] can be fully determined for all m [ 2 h ] . Thus, it suffices to reconstruct all possible { f m } from the midpoint n / 2 to 0 and then extend them from n / 2 to n. However, there is one catch. The reason why such an extension is possible is that f m , f m captures the running weight starting from the two ends of a single string. However, for functions g i , g j : n / 2 w ¯ , i j reconstructed from M, it is, in general, not clear whether g i , g j capture the weight information of the same string. Nevertheless, by the rotational symmetry, g i and g j capture the weight information of the same string only if their median weights sum to w ¯ when they are extended. Therefore, one may identify h pairs of functions from the 2 h component functions { g i } reconstructed from M such that the sum of median weights within each pair is w ¯ . With the identification of such pairs, the resulting CWF formed by { g i } corresponds to a multiset of strings compatible with M. Thus, to obtain all compatible multisets, one needs to enumerate all possible ways of forming pairs that satisfy the median weight constraint.
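The rotational symmetry and the median-weight pairing condition can be illustrated on a single string. The sketch below assumes that rotating a graph by 180 degrees about $(n/2, \bar{w}/2)$ sends the point $(l, w)$ to $(n - l, \bar{w} - w)$, and that the median weight of a function is the average of its values at the two middle lengths. The names `partner` and `median_weight` are hypothetical.

```python
def running_weights(bits):
    """Return [f(0), ..., f(n)], the weight of every prefix of a 0/1 string."""
    out = [0]
    for ch in bits:
        out.append(out[-1] + int(ch))
    return out

def partner(f, wbar):
    """Rotate the graph of f by 180 degrees about (n/2, wbar/2)."""
    n = len(f) - 1
    return [wbar - f[n - l] for l in range(n + 1)]

def median_weight(f):
    """Average of f at floor(n/2) and ceil(n/2) (assumed definition)."""
    n = len(f) - 1
    return (f[n // 2] + f[(n + 1) // 2]) / 2

s, wbar = "1000111", 4
prefix_fn = running_weights(s)        # running weight from the left end
suffix_fn = running_weights(s[::-1])  # running weight from the right end

print(partner(prefix_fn, wbar) == suffix_fn)                 # True
print(median_weight(prefix_fn) + median_weight(suffix_fn))   # 4.0
```

The second line of output is the pairing constraint used in the assembly stage: the median weights of the two functions belonging to one string sum to $\bar{w}$.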
Based on the above discussion, our algorithm for constructing all multisets of strings compatible with M is divided into two stages. In the first stage, which we call the scan stage, all possible “half strings” are generated based on M. In the second stage, which we call the assembly stage, pairs of “half strings” are combined to form “full strings”. The details of the two stages are described below. For ease of discussion, the subscript of a component function will be referred to as its label.

4.2.1. Scan Stage

In the scan stage, we keep track of the behaviors of the component functions from the midpoint n / 2 to 0. Consider the case where n is even. For each w w ¯ , a n / 2 , w indicates the number of component functions that evaluate to w at n / 2 . Moreover, we have w = 0 w ¯ a n / 2 , w = 2 h . As there are 2 h component functions, we may partition the 2 h labels into disjoint subsets of sizes a n / 2 , w , w w ¯ . If n is odd, a n / 2 , w is undefined, but the behavior of the component functions at n / 2 can be determined by b n / 2 , w , c n / 2 , w . Since w = 0 w ¯ ( b n / 2 , w + c n / 2 , w ) = 2 h , we can partition the 2 h labels into disjoint subsets of sizes b n / 2 , w , c n / 2 , w , w w ¯ . More precisely, as the first step of the scan stage, we construct a collection { P ( t / 2 ) t = 0 , , 2 w ¯ } of disjoint subsets of [ 2 h ] such that t = 0 2 w ¯ P ( t / 2 ) = [ 2 h ] and that if n is even,
$$|P(t/2)| = \begin{cases} a_{n/2,\,t/2}, & t \text{ is even}, \\ 0, & t \text{ is odd}; \end{cases}$$
if n is odd,
$$|P(t/2)| = \begin{cases} b_{\lceil n/2 \rceil,\,t/2}, & t \text{ is even}, \\ c_{\lceil n/2 \rceil,\,\lceil t/2 \rceil}, & t \text{ is odd}. \end{cases}$$
Observe that the elements in the set P ( t / 2 ) are the labels of the component functions whose median weight equals t / 2 . Therefore, we basically reconstruct the values of 2 h component functions at n / 2 by constructing the collection { P ( t / 2 ) } . Given the values of the component functions at n / 2 , we reconstruct their values at l n / 2 according to { b l , w } , { c l , w } as l goes from n / 2 to 0. Specifically, we keep track of the labels of the component functions as we assign values to the component functions at l = n / 2 , , 0 according to { b l , w } , { c l , w } , and obtain finer partitions of the 2 h labels as l goes to 0. The bookkeeping of the partitions is done by a function F that maps each ( l , w ) n / 2 × w ¯ to a collection of disjoint nonempty subsets of the 2 h labels. The labels in these disjoint subsets correspond to component functions that evaluate to w at l. Moreover, the subsets in F ( l , w ) , w w ¯ are all disjoint, and we have w = 0 w ¯ J F ( l , w ) J = [ 2 h ] for l n / 2 . The construction of F is described below.
By construction of { P ( t / 2 ) } , the component functions that evaluate to w ¯ at n / 2 are those with labels in P ( w ¯ ) , and for each w w ¯ 1 , the component functions that evaluate to w at n / 2 are those with labels in P ( w ) and P ( w + 1 / 2 ) . Therefore,
F ( n / 2 , w ¯ ) = { P ( w ¯ ) } , F ( n / 2 , w ) = { P ( w ) , P ( w + 1 / 2 ) } , w w ¯ 1 .
As the value of a component function at $l - 1$ may remain the same as or decrease by one from the value at $l$, given $F(l, w)$, $w \in \llbracket \bar{w} \rrbracket$, we can further partition each subset in $F(l, w)$, $w \in \llbracket \bar{w} \rrbracket$, into two subsets of sizes $b_{l,w}$, $c_{l,w}$ for $l = \lfloor n/2 \rfloor, \ldots, 0$. Eventually, we obtain the set $F(0, 0)$ in which every element is a subset of the $2h$ labels for which the corresponding component functions have exactly the same values at $l = 0, \ldots, \lfloor n/2 \rfloor$. Moreover, component functions with labels in different elements in $F(0, 0)$ are not equal. At this point, the behaviors of the $2h$ component functions are determined over $\llbracket \lfloor n/2 \rfloor \rrbracket$. In particular, one can define $2h$ component functions $\{g_m\}$ over $\llbracket \lfloor n/2 \rfloor \rrbracket$ to be
$$g_m(l) = w \quad \text{if } m \in \bigcup_{J \in F(l,w)} J.$$
Note that there are different ways of partitioning subsets in $F(l, w)$, $w \in \llbracket \bar{w} \rrbracket$, and each of them leads to a distinct $F$. However, we are only interested in those $F$’s that result in distinct “half strings”, i.e., distinct multisets $\{g_m\}$. In other words, we only care about the number of labels for which the corresponding component functions are the same over $\llbracket \lfloor n/2 \rfloor \rrbracket$. In fact, this is the reason why we only stipulate the sizes of the subsets in the initial partition $\{P(t/2)\}$. In order to construct all possible $F$, each of which leads to a distinct multiset $\{g_m\}$, we need to enumerate different ways of partitioning subsets in $F(l, w)$, $w \in \llbracket \bar{w} \rrbracket$. This is accomplished as follows. Let $q = |F(l, w)|$ and write $F(l, w) = \{J_1, \ldots, J_q\}$. Further, let $K_i \subseteq J_i$ be the labels for which the corresponding component functions have a value equal to $w - 1$ at $l - 1$. Denote $|K_i|$ by $x_i$. Since $c_{l,w}$ is the number of component functions that have values equal to $w - 1$ at $l - 1$ and have values equal to $w$ at $l$, we have
$$\sum_{i=1}^{q} x_i = c_{l,w}. \tag{15}$$
Every solution to (15) such that $x_i \le |J_i|$ gives rise to a distinct partition of the subsets in $F(l, w)$. By enumerating all possible solutions to (15) for every $l \in [\lfloor n/2 \rfloor]$ and $w \in \llbracket \bar{w} \rrbracket$, we are able to find the set $\mathcal{F}$ of all possible $F$ that lead to distinct $\{g_m\}$ via a breadth-first search. The scan stage is formally stated in Algorithm 2.
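The enumeration of solutions to (15) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's Algorithm 2: the function name `bounded_solutions` and its arguments are hypothetical, with `sizes[i]` standing for $|J_i|$ and `total` for $c_{l,w}$.

```python
from typing import Iterator, List, Tuple


def bounded_solutions(sizes: List[int], total: int) -> Iterator[Tuple[int, ...]]:
    """Yield every (x_1, ..., x_q) with 0 <= x_i <= sizes[i-1] and sum equal to total."""
    if not sizes:
        # No blocks left: only the empty tuple solves the equation, and only for total 0.
        if total == 0:
            yield ()
        return
    head, rest = sizes[0], sizes[1:]
    rest_capacity = sum(rest)  # the most the remaining blocks can still absorb
    # x must leave a remainder that the other blocks can accommodate.
    for x in range(max(0, total - rest_capacity), min(head, total) + 1):
        for tail in bounded_solutions(rest, total - x):
            yield (x,) + tail
```

For instance, with the block sizes $|J_1| = 3$, $|J_2| = 2$ and $c_{1,1} = 5$ that arise in Example 10, `bounded_solutions([3, 2], 5)` yields the single tuple `(3, 2)`.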
As a consequence of the scan stage, we obtain a set of all possible “half strings” from M in the sense of the following claim.
Claim 5. 
Let $\{t_j \mid j \in [h]\}$ be a multiset of strings compatible with $M$ and define the multiset of length-$\lfloor n/2 \rfloor$ prefixes and suffixes of $t_j$, $j \in [h]$, to be $\mathcal{S} = \{ s_{2j-1} = t_j[\lfloor n/2 \rfloor],\ s_{2j} = \overleftarrow{t_j}[\lfloor n/2 \rfloor] \mid j \in [h] \}$, where $\overleftarrow{t_j}$ denotes the reversal of $t_j$. Let $S$ be the underlying set of $\mathcal{S}$, i.e., $S$ is the set of distinct strings in $\mathcal{S}$. Then, there exists $F \in \mathcal{F}$ output by Algorithm 2 such that there is a bijection between $F(0, 0)$ and $S$ that maps $J \in F(0, 0)$ to $s \in S$ with $|J| = |\{ m \mid s_m = s,\ s_m \in \mathcal{S},\ m \in [2h] \}|$.
In other words, there exists $F \in \mathcal{F}$ such that every element $J$ in $F(0, 0)$ can be identified with a distinct string $s$ in $S$ whose multiplicity in $\mathcal{S}$ equals $|J|$.
Proof. 
Let $f$ be the CWF induced by $\{t_j \mid j \in [h]\}$ with $f_{2j-1}$ being induced by the running weight of $t_j$ and $f_{2j}$ by the running weight of the reversal $\overleftarrow{t_j}$. Then, $f$ is a solution to $M$. Let us construct a function $\tilde{F}$ that maps each $(l, w) \in \llbracket \lfloor n/2 \rfloor \rrbracket \times \llbracket \bar{w} \rrbracket$ to a collection of disjoint nonempty subsets of $[2h]$ dependent on $f$. Given $M$, we can construct a collection $\{P(t/2) \mid t \in \llbracket 2\bar{w} \rrbracket\}$ of disjoint subsets of $[2h]$ such that $\bigcup_{t=0}^{2\bar{w}} P(t/2) = [2h]$ and that satisfies (13) or (14) based on the parity of $n$. Furthermore, there exists a permutation on $[2h]$ such that $P(t/2)$ is formed by $m \in [2h]$ for which $\operatorname{med}(f_m) = t/2$. Define
$$\tilde{F}(\lfloor n/2 \rfloor, \bar{w}) = \{P(\bar{w})\}, \qquad \tilde{F}(\lfloor n/2 \rfloor, w) = \{P(w), P(w + 1/2)\}, \quad w \in \llbracket \bar{w} - 1 \rrbracket.$$
For $l \in [\lfloor n/2 \rfloor]$, define
$$\tilde{F}(l-1, \bar{w}) = \{ A_f(l-1, \bar{w}) \cap J \mid J \in \tilde{F}(l, \bar{w}) \}, \qquad \tilde{F}(l-1, w) = \{ A_f(l-1, w) \cap J \mid J \in \tilde{F}(l, w) \cup \tilde{F}(l, w+1) \}, \quad w \in \llbracket \bar{w} - 1 \rrbracket,$$
where $A_f(l, w)$ is as given in Definition 11. Moreover, we exclude the empty set in $\tilde{F}(l, w)$ for each $(l, w)$. It follows that $\bigcup_{\tilde{J} \in \tilde{F}(l,w)} \tilde{J} = A_f(l, w)$ for $l \in \llbracket \lfloor n/2 \rfloor \rrbracket$, $w \in \llbracket \bar{w} \rrbracket$. In addition, $m, m' \in [2h]$ are in the same set $\tilde{J} \in \tilde{F}(l, w)$ if and only if the component functions $f_m$ and $f_{m'}$ have the same graph over $[l, \lfloor n/2 \rfloor]$. Therefore, $|\tilde{F}(0, 0)|$ equals the number of distinct graphs over $\llbracket \lfloor n/2 \rfloor \rrbracket$ of $f_m$, $m \in [2h]$, i.e., $|\tilde{F}(0, 0)| = |S|$. Furthermore, there is a bijection between $\tilde{F}(0, 0)$ and $S$ that maps $\tilde{J} \in \tilde{F}(0, 0)$ to $s \in S$ with $\tilde{J} = \{ m \mid s_m = s,\ s_m \in \mathcal{S},\ m \in [2h] \}$.
The set $\mathcal{F}$ output by Algorithm 2 is the set of bookkeeping functions $F$ that keep track of all admissible behaviors of the component functions given $M$. Moreover, every element in $F(0, 0)$ is a subset of the $2h$ labels for which the corresponding component functions have the same graph over $\llbracket \lfloor n/2 \rfloor \rrbracket$. The construction of $P(t/2)$ in Line 7 and $K_i$ in Line 22 in Algorithm 2 does not depend on which labels in $[2h]$ are chosen, only on the admissible sizes of the sets. Since the size of $P(t/2)$ must satisfy (13), (14) and the set $X$ constructed on Line 20 enumerates all admissible sizes for $K_i$, there exists $F \in \mathcal{F}$ such that $|F(0, 0)| = |\tilde{F}(0, 0)|$ and a bijection between $F(0, 0)$ and $\tilde{F}(0, 0)$ that maps $J \in F(0, 0)$ to $\tilde{J} \in \tilde{F}(0, 0)$ with $|J| = |\tilde{J}|$. Therefore, there are bijections between $\tilde{F}(0, 0)$ and $S$ and between $F(0, 0)$ and $\tilde{F}(0, 0)$, and it follows that there is a bijection between $F(0, 0)$ and $S$.    □

4.2.2. Assembly Stage

In the assembly stage, we construct CWFs for each $F \in \mathcal{F}$ by identifying pairs in $\{g_m\}$ whose median weights sum to $\bar{w}$. As mentioned in the scan stage, $F(0, 0)$ is a partition of $[2h]$, and for each $J \in F(0, 0)$, the component functions with labels in $J$ have the same graph over $\llbracket \lfloor n/2 \rfloor \rrbracket$. As we would like to form pairs of component functions based on their median weights, it is helpful to group the elements of $F(0, 0)$ based on the median weight. More precisely, for each possible median weight $w = 0, 1/2, 1, \ldots, \bar{w}$, we construct a collection $R_w$ of sets for which the corresponding component functions have median weight $w$, given by
$$R_w = \{ J \mid J \in F(0, 0),\ J \subseteq P(w) \}.$$
Let $r_w = |R_w|$. Since different elements in $F(0, 0)$ correspond to component functions with different graphs, $r_w$ is the number of distinct component functions that have median weight $w$. Moreover, each element $R_{w,i} \in R_w$, $i \in [r_w]$, is a set of labels for which the corresponding component functions have median weight $w$ and the same graph over $\llbracket \lfloor n/2 \rfloor \rrbracket$.
By the rotational symmetry, two component functions capture the weight information of the same string only if their median weights sum to $\bar{w}$. Therefore, a label in $R_{w,i} \in R_w$, $i \in [r_w]$, must be paired with a label in $R_{\bar{w}-w,j} \in R_{\bar{w}-w}$, $j \in [r_{\bar{w}-w}]$, in order to combine two “half strings” into a single “full string”. Formally, the pairing of labels can be described by a permutation $\sigma$ on $[2h]$ such that if $u \in R_{w,i}$ is paired with $v \in R_{\bar{w}-w,j}$, then $\sigma(u) = m$, $\sigma(v) = m^*$ for some $m \in [2h]$, i.e., $\sigma(u)^* = \sigma(v)$.
Algorithm 2. Scan stage (pseudocode image omitted).
To enumerate all possible methods of forming pairs that satisfy the median weight constraint, we need to consider different methods of pairing a component function of median weight $w \in \{0, 1/2, 1, \ldots, \bar{w}\}$ with a component function of median weight $\bar{w} - w$. Let us first consider the case where $w \in \{0, 1/2, 1, \ldots, (\bar{w} - 1)/2\}$. For $i \in [r_w]$, $j \in [r_{\bar{w}-w}]$, let $y_{w,i,j}$ be the number of labels chosen in $R_{w,i}$ to be paired with labels in $R_{\bar{w}-w,j}$. Then, $(y_{w,i,j})_{i \in [r_w],\, j \in [r_{\bar{w}-w}]}$ must satisfy
$$\sum_{i=1}^{r_w} y_{w,i,j} = |R_{\bar{w}-w,j}|, \quad j \in [r_{\bar{w}-w}], \tag{16}$$
$$\sum_{j=1}^{r_{\bar{w}-w}} y_{w,i,j} = |R_{w,i}|, \quad i \in [r_w]. \tag{17}$$
For each solution $(y_{w,i,j})_{i \in [r_w],\, j \in [r_{\bar{w}-w}]}$ to (16) and (17), we partition $R_{\bar{w}-w,j}$ into disjoint subsets $\{V_{w,i,j} \mid i \in [r_w]\}$ and $R_{w,i}$ into disjoint subsets $\{U_{w,i,j} \mid j \in [r_{\bar{w}-w}]\}$ such that $|V_{w,i,j}| = |U_{w,i,j}| = y_{w,i,j}$. The labels in $V_{w,i,j}$ are then paired with the labels in $U_{w,i,j}$.
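Solutions to (16) and (17) are nonnegative integer matrices with prescribed row and column sums. The following Python sketch enumerates them row by row; it is an illustration only, and the names `pairing_tables`, `row_sizes` (standing for the sizes $|R_{w,i}|$) and `col_sizes` (standing for the sizes $|R_{\bar{w}-w,j}|$) are hypothetical rather than taken from Algorithm 3.

```python
from typing import Iterator, Sequence, Tuple

Row = Tuple[int, ...]


def _bounded_rows(caps: Sequence[int], total: int) -> Iterator[Row]:
    """All rows x with 0 <= x[j] <= caps[j] and sum(x) == total."""
    if not caps:
        if total == 0:
            yield ()
        return
    for x in range(min(caps[0], total) + 1):
        for tail in _bounded_rows(caps[1:], total - x):
            yield (x,) + tail


def pairing_tables(row_sizes: Sequence[int],
                   col_sizes: Sequence[int]) -> Iterator[Tuple[Row, ...]]:
    """Yield every matrix y whose i-th row sums to row_sizes[i] (constraint (17))
    and whose j-th column sums to col_sizes[j] (constraint (16))."""
    def extend(i: int, col_left: Row) -> Iterator[Tuple[Row, ...]]:
        if i == len(row_sizes):
            if not any(col_left):  # every column quota has been met exactly
                yield ()
            return
        # Row i may use at most what is still unassigned in each column.
        for row in _bounded_rows(col_left, row_sizes[i]):
            new_left = tuple(c - x for c, x in zip(col_left, row))
            for rest in extend(i + 1, new_left):
                yield (row,) + rest

    yield from extend(0, tuple(col_sizes))
```

Each yielded matrix corresponds to one choice of the sizes $y_{w,i,j}$; which labels are placed in $U_{w,i,j}$ and $V_{w,i,j}$ is immaterial, since labels within one $R_{w,i}$ represent the same “half string”.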
Consider the case where $w = \bar{w}/2$. In this case, the labels in $R_{\bar{w}/2}$ need to be paired with each other, so we have a slightly different integer partition problem. For $i \in [r_{\bar{w}/2}]$, $j \in [r_{\bar{w}/2}]$, let $y_{\bar{w}/2,i,j}$ be the number of labels chosen in $R_{\bar{w}/2,i} \in R_{\bar{w}/2}$ to be paired with labels in $R_{\bar{w}/2,j} \in R_{\bar{w}/2}$. Then, $y_{\bar{w}/2,i,i}$ must be even for all $i$ and $y_{\bar{w}/2,i,j} = y_{\bar{w}/2,j,i}$ for all $i \ne j$. Moreover, $(y_{\bar{w}/2,i,j})_{i \in [r_{\bar{w}/2}],\, j \in [r_{\bar{w}/2}]}$ must satisfy
$$\sum_{j=1}^{r_{\bar{w}/2}} y_{\bar{w}/2,i,j} = |R_{\bar{w}/2,i}|, \quad i \in [r_{\bar{w}/2}]. \tag{18}$$
For each solution $(y_{\bar{w}/2,i,j})_{i \in [r_{\bar{w}/2}],\, j \in [r_{\bar{w}/2}]}$ to (18), we partition $R_{\bar{w}/2,i}$ into disjoint subsets $\{U_{\bar{w}/2,i,j} \mid j \in [r_{\bar{w}/2}]\}$ such that $|U_{\bar{w}/2,i,j}| = y_{\bar{w}/2,i,j}$. The labels in $U_{\bar{w}/2,j,i}$ are then paired with the labels in $U_{\bar{w}/2,i,j}$ for $i \ne j$, and the labels in $U_{\bar{w}/2,i,i}$ are organized into $y_{\bar{w}/2,i,i}/2$ pairs arbitrarily.
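The symmetric problem (18) can be enumerated by filling only the upper triangle of the matrix. The Python sketch below is a hedged illustration, not the procedure of Algorithm 3; the name `symmetric_pairings` is hypothetical and `sizes[i]` stands for $|R_{\bar{w}/2,i}|$.

```python
from typing import Iterator, List, Sequence, Tuple

Matrix = Tuple[Tuple[int, ...], ...]


def symmetric_pairings(sizes: Sequence[int]) -> Iterator[Matrix]:
    """Yield symmetric nonnegative integer matrices y with even diagonal
    entries whose i-th row sums to sizes[i], as required by (18)."""
    r = len(sizes)
    y: List[List[int]] = [[0] * r for _ in range(r)]
    left = list(sizes)  # left[k] = labels of class k not yet assigned to a pair

    def fill(i: int, j: int) -> Iterator[Matrix]:
        if i == r:
            yield tuple(tuple(row) for row in y)
            return
        if j == r:
            # Row i is complete; it is valid only if class i is used up.
            if left[i] == 0:
                yield from fill(i + 1, i + 1)
            return
        if i == j:
            choices = range(0, left[i] + 1, 2)  # labels paired within one class
        else:
            choices = range(0, min(left[i], left[j]) + 1)
        for v in choices:
            y[i][j] = y[j][i] = v
            left[i] -= v
            if i != j:
                left[j] -= v
            yield from fill(i, j + 1)
            left[i] += v
            if i != j:
                left[j] += v

    yield from fill(0, 0)
```

With class sizes 1, 2, 3, as in Example 10, `symmetric_pairings([1, 2, 3])` produces exactly the three matrices listed in that example.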
Let $Y_w = \{ (y_{w,i,j})_{i \in [r_w],\, j \in [r_{\bar{w}-w}]} \}$ be the set of all solutions to the integer partition problem associated with $w \in \{0, 1/2, 1, \ldots, \bar{w}/2\}$, and let $\mathcal{Y} = Y_0 \times Y_{1/2} \times Y_1 \times \cdots \times Y_{\bar{w}/2}$. Then, each $(y_{w,i,j}) \in \mathcal{Y}$ corresponds to a distinct method of forming pairs of the component functions such that the median weight constraint is satisfied. Specifically, since $R_{t/2,i}$, $t \in \llbracket 2\bar{w} \rrbracket$, $i \in [r_{t/2}]$, are disjoint and
$$\bigcup_{t=0}^{2\bar{w}} \bigcup_{i=1}^{r_{t/2}} R_{t/2,i} = \bigcup_{J \in F(0,0)} J = [2h],$$
one can easily define a permutation $\sigma$ on $[2h]$ such that if $u \in R_{w,i}$ is paired with $v \in R_{\bar{w}-w,j}$ then $\sigma(u) = m$, $\sigma(v) = m^*$ for some $m \in [2h]$. Furthermore, given $\sigma$, a CWF $f$ can be determined by combining the paired component functions, i.e., those with labels $u, v \in [2h]$ satisfying $\sigma(u)^* = \sigma(v)$. The corresponding multiset $H_f$ can then be found using Definition 4. The details are presented in Algorithm 3.
Theorem 6. 
The output $\mathcal{H}$ of running Algorithm 2 followed by Algorithm 3 is the set of all multisets compatible with $M$ up to reversal.
Proof. 
Let $\{t_j \mid j \in [h]\}$ be a multiset of strings compatible with $M$ and define the multiset of length-$\lfloor n/2 \rfloor$ prefixes and suffixes of $t_j$, $j \in [h]$, to be $\mathcal{S} = \{ s_{2j-1} = t_j[\lfloor n/2 \rfloor],\ s_{2j} = \overleftarrow{t_j}[\lfloor n/2 \rfloor] \mid j \in [h] \}$, where $\overleftarrow{t_j}$ denotes the reversal of $t_j$. Let $S$ be the underlying set of $\mathcal{S}$. By Claim 5, there exists $F \in \mathcal{F}$ output by Algorithm 2 such that there is a bijection $\pi$ between $F(0, 0)$ and $S$ that maps $J \in F(0, 0)$ to $s \in S$ with $|J| = |\{ m \mid s_m = s,\ s_m \in \mathcal{S},\ m \in [2h] \}|$. Denote the set $J$ mapped to $s$ under $\pi$ by $J_s$ and denote $\{ m \mid s_m = s,\ s_m \in \mathcal{S},\ m \in [2h] \}$ by $I_s$. Since $[2h] = \bigcup_{s \in S} J_s = \bigcup_{s \in S} I_s$, a permutation $\tilde{\sigma}$ on $[2h]$ can be further constructed such that it is a bijection between $J_s$ and $I_s$ for every $s \in S$.
In Algorithm 3, given $F$, all possible permutations for pairing labels in $R_w$ and $R_{\bar{w}-w}$ for all $w \in \{0, 1/2, 1, \ldots, \bar{w}/2\}$ are found. In particular, there exists a permutation $\sigma$ such that for any $u, v \in [2h]$ satisfying $\tilde{\sigma}(u)^* = \tilde{\sigma}(v)$, it holds that $\sigma(u)^* = \sigma(v)$. The way $\sigma$ is constructed is shown on Lines 23 to 24, 30, and 33 to 34 in Algorithm 3. Next, a function $f$ is constructed according to $F$, $\sigma$ on Lines 37 to 44 in Algorithm 3. It is easy to verify that $f$ is a CWF. The way that $\sigma$ is constructed ensures that the multiset $H_f$ constructed on Line 46 in Algorithm 3 satisfies $H_f \sim \{t_j \mid j \in [h]\}$, i.e., $\{t_j \mid j \in [h]\} \in [H_f]$. It follows that any multiset compatible with $M$ is in the same equivalence class as some element in the output $\mathcal{H}$.
It remains to check that the elements in $\mathcal{H}$ are all distinct. In fact, let us show that the CWFs constructed in Algorithm 3 as multisets $\{f_m\}$ are distinct. Let $F_1, F_2 \in \mathcal{F}$ with $F_1 \ne F_2$. Then, $F_1, F_2$ correspond to distinct sets of “half strings”, and any pairing permutations $\sigma_1, \sigma_2$ admissible for $F_1, F_2$, respectively, lead to distinct multisets of component functions. Furthermore, if $\sigma_1, \sigma_2$ are two different pairing permutations admissible for $F_1$, then the two multisets of component functions resulting from $\sigma_1, \sigma_2$ are also different since each element $R_{w,i} \in R_w$ corresponds to a distinct “half string”. Therefore, all CWFs constructed in Algorithm 3 are distinct as multisets. Moreover, since a multiset and its reversals induce the same multiset of component functions, if a multiset is in $\mathcal{H}$, none of its reversals is in $\mathcal{H}$. Hence, $\mathcal{H}$ is a set of all multisets compatible with $M$ up to reversal.    □
We end this section with an example of running Algorithms 2 and 3, and a checklist in Table 4 for some important notations used for discussing the algorithms.
Algorithm 3. Assembly stage (pseudocode image omitted).
Example 10. 
Consider the multiset $U = \{110101, 110101, 101110\}$ given in Example 1. Let us go through Algorithms 2 and 3 to find all multisets compatible with $M(U)$ (up to reversal). Note that $n = 6 = a_{3,2}$; thus, right before the steps for finding finer partitions in Algorithm 2, we have $F(3, 2) = \{P(2)\}$, where $P(2) = [6]$, and $F(l, w)$ is empty for the other values of $(l, w)$. Next, let us look into the nested for-loops to find finer partitions.
  • For $l = 3$, $w = 2$: Note that $q = |F(3, 2)| = 1$, and $x_1 = c_{3,2} = 4$. Choose $K_1 \subseteq P(2)$ to be $[4]$, a size-4 subset of $P(2)$. Construct $G(2, 2) = \{[6] \setminus K_1\}$, $G(2, 1) = \{K_1\}$, and let $\mathcal{F} = \{G\}$.
  • For $l = 2$, $w = 2$: Take the only element $F \in \mathcal{F}$. Note that $q = |F(2, 2)| = 1$ and $x_1 = c_{2,2} = 2$. Then, $K_1 = [5, 6] \in F(2, 2)$. Construct $G(1, 1) = \{K_1\}$, and let $\mathcal{F} = \{G\}$.
  • For $l = 2$, $w = 1$: Take the only element $F \in \mathcal{F}$. Note that $q = |F(2, 1)| = 1$ and $x_1 = c_{2,1} = 1$. Choose $K_1 \subseteq [4] \in F(2, 1)$ to be $\{1\}$, a size-1 subset of $[4]$. Construct $G(1, 1) = F(1, 1) \cup \{[4] \setminus K_1\} = \{[5, 6], [2, 4]\}$, $G(1, 0) = \{K_1\}$, and let $\mathcal{F} = \{G\}$.
  • For $l = 1$, $w = 1$: Take the only element $F \in \mathcal{F}$. In this step, $q = |F(1, 1)| = 2$ and $c_{1,1} = 5$. Take $J_1 = [2, 4]$, $J_2 = [5, 6]$. The equation in Line 20 of Algorithm 2 becomes $x_1 + x_2 = 5$, where $x_1 \in \{0, 1, 2, 3\}$ and $x_2 \in \{0, 1, 2\}$. The only solution to this equation is $x_1 = 3$ and $x_2 = 2$. Construct $G(0, 0) = F(1, 1)$ and let $\mathcal{F} = \{G\}$.
  • For $l = 1$, $w = 0$: Take the only element $F \in \mathcal{F}$. Note that $q = |F(1, 0)| = 1$ and $x_1 = c_{1,0} = 0$. Construct $G(0, 0) = F(0, 0) \cup F(1, 0) = \{\{1\}, \{2, 3, 4\}, \{5, 6\}\}$ and output $\mathcal{F} = \{G\}$.
At this point, Algorithm 2 terminates, and we obtain $\mathcal{F}$ that contains only one element. Let us call this element $F$. Observe that $F(0, 0)$ contains three subsets of $[6]$: $\{1\}$, $\{2, 3, 4\}$, $\{5, 6\}$. Tracing back how these three sets are generated, we see that $\{1\}$, $\{5, 6\}$, $\{2, 3, 4\}$ correspond to three “half strings”: 011, 110, 101, respectively.
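The counts $c_{l,w}$ used in the steps above can be cross-checked directly from $U$. The Python sketch below is only a verification aid: it computes the counts from the strings themselves, whereas Algorithm 2 works from $M$ alone. It assumes that $b_{l,w}$ counts component functions with $f_m(l) = f_m(l-1) = w$ and $c_{l,w}$ counts those with $f_m(l) = w$ and $f_m(l-1) = w - 1$; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def running_weights(s: str) -> List[int]:
    """Weights of the prefixes of s of lengths 0, 1, ..., len(s)."""
    weights = [0]
    for ch in s:
        weights.append(weights[-1] + (1 if ch == "1" else 0))
    return weights


def step_counts(strings: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return Counters (b, c) keyed by (l, w), taken over every string and its reversal."""
    b: Counter = Counter()
    c: Counter = Counter()
    for t in strings:
        for s in (t, t[::-1]):  # one component function per string and per reversal
            f = running_weights(s)
            for l in range(1, len(s) + 1):
                if f[l] == f[l - 1]:
                    b[(l, f[l])] += 1  # weight unchanged from l - 1 to l
                else:
                    c[(l, f[l])] += 1  # weight increased by one from l - 1 to l
    return b, c
```

For $U = \{110101, 110101, 101110\}$ this gives $c_{3,2} = 4$, $c_{2,2} = 2$, $c_{2,1} = 1$, $c_{1,1} = 5$ and $c_{1,0} = 0$, matching the values used in the bullet points.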
As the first step of Algorithm 3, we need to construct $R_w$ for $w \in \{0, 1/2, 1, \ldots, \bar{w} = 4\}$. Since $P(w) = \emptyset$ for $w \ne 2$ and $P(2) = [6]$, the only nonempty set among the $R_w$’s is $R_2 = F(0, 0)$ and $r_2 = |R_2| = 3$. To proceed, noticing that $\bar{w}/2 = 2$, we need to find the set $Y_{\bar{w}/2} = \{ (y_{i,j}) \mid i, j \in \{1, 2, 3\},\ y_{i,i} \in 2\mathbb{N},\ y_{i,j} = y_{j,i} \in \mathbb{N}, \text{ and (18) is satisfied} \}$. (Here, we omit the first subscript $\bar{w}/2$ of $y$.) Since $r_{\bar{w}/2} = 3$, from (18), we have the following three equations:
$$y_{1,1} + y_{1,2} + y_{1,3} = 1, \qquad y_{2,1} + y_{2,2} + y_{2,3} = 2, \qquad y_{3,1} + y_{3,2} + y_{3,3} = 3.$$
Here, we may take $R_{\bar{w}/2,1} = \{1\}$, $R_{\bar{w}/2,2} = \{5, 6\}$, $R_{\bar{w}/2,3} = \{2, 3, 4\}$. The remaining steps in Algorithm 3 essentially pair up “half strings”. Each solution in $Y_0 \times Y_{1/2} \times Y_1 \times \cdots \times Y_{\bar{w}/2}$ indicates a way to assemble them. Since the only nonempty set among the $R_w$’s is $R_2$, the $Y_w$’s for which $w \ne \bar{w}/2$ are trivial. By calculation, $Y_{\bar{w}/2}$ contains three feasible solutions:
$$\begin{pmatrix} y_{1,1} & y_{1,2} & y_{1,3} \\ y_{2,1} & y_{2,2} & y_{2,3} \\ y_{3,1} & y_{3,2} & y_{3,3} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 2 \end{pmatrix} \text{ or } \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 2 \end{pmatrix} \text{ or } \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 2 \\ 1 & 2 & 0 \end{pmatrix}.$$
The first solution suggests that we combine the first half string 011 with the reversal of the third one 101, resulting in a full string 011101. It also suggests that we combine two half strings 110 into 110011 (the second half comes from the reversal of 110) and that we combine two half strings 101 into 101101. Thus, the first solution can generate a multiset of strings $H_1 = \{011101, 110011, 101101\}$ that is compatible with $M(U)$. Similarly, the second solution gives $H_2 = \{011011, 110101, 101101\}$. Lastly, according to the third solution, we have another multiset $H_3 = \{011101, 110101, 110101\}$.
In summary, the output of Algorithm 3 is $\mathcal{H} = \{H_1, H_2, H_3\}$. According to Theorem 6, this gives all multisets compatible with $M(U)$ up to reversal.
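The compatibility of $H_1$, $H_2$, $H_3$ with $M(U)$ can be checked mechanically. The Python sketch below assumes that $M(\cdot)$ collects, for every string and every length $1 \le l \le n$, the composition (number of 0s, number of 1s) of the length-$l$ prefix and of the length-$l$ suffix; the function name `prefix_suffix_compositions` is illustrative and does not appear in the paper.

```python
from collections import Counter
from typing import Iterable


def prefix_suffix_compositions(strings: Iterable[str]) -> Counter:
    """Multiset of (zeros, ones) compositions over all nonempty prefixes and suffixes."""
    compositions: Counter = Counter()
    for t in strings:
        for l in range(1, len(t) + 1):
            for affix in (t[:l], t[-l:]):  # length-l prefix, then length-l suffix
                ones = affix.count("1")
                compositions[(l - ones, ones)] += 1
    return compositions


U = ["110101", "110101", "101110"]
H1 = ["011101", "110011", "101101"]
H2 = ["011011", "110101", "101101"]
H3 = ["011101", "110101", "110101"]

M_U = prefix_suffix_compositions(U)
# True for each of the three multisets produced in Example 10.
compatible = [prefix_suffix_compositions(H) == M_U for H in (H1, H2, H3)]
```

Since $H_1$ and $H_2$ are not obtained from $U$ by reversing some of its strings, this check also illustrates that $U$ is not uniquely reconstructible from $M(U)$.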

5. Concluding Remarks

We propose to use cumulative weight functions to describe the prefix–suffix compositions of a multiset of binary strings and use this description to derive necessary and sufficient conditions for the unique reconstruction of multisets of strings of the same weight up to reversal. Moreover, two reconstruction algorithms are presented. One is an efficient algorithm that outputs one multiset of strings compatible with the given prefix–suffix compositions and can be used to assist in determining the unique reconstructibility of the given compositions. The other outputs all admissible multisets, up to reversal, that are compatible with the given compositions.
Many problems in the reconstruction of multiple strings remain open. For example, can one lift the constant-weight assumption and characterize the conditions for the unique reconstruction of multiple strings from prefix–suffix compositions? In addition, if the prefix–suffix compositions are erroneous, can one design low-redundancy encoding schemes for the strings such that they can be recovered efficiently?

Author Contributions

Conceptualization, Z.C.; Formal analysis, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the Basic Research Project of Hetao Shenzhen-Hong Kong Science and Technology Cooperation Zone under Project HZQB-KCZYZ-2021067, the Guangdong Provincial Key Laboratory of Future Network of Intelligence under Project 2022B1212010001, the National Natural Science Foundation of China under Grant 62201487, and the Shenzhen Science and Technology Stable Support Program.

Data Availability Statement

Data sharing is not applicable to this article because the work is entirely theoretical, involving only mathematical statements and proofs.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemma 5

Recall that $m \in A(\bar{w}/2)$ if and only if $m^* \in A(\bar{w}/2)$, so the size of $A(\bar{w}/2)$ must be even. The case where $|A(\bar{w}/2)| = 0$ is vacuously true and the case where $|A(\bar{w}/2)| = 2$ follows from Proposition 5. Below we assume $|A(\bar{w}/2)| \ge 4$.
Let $\tilde{m} \in A(\bar{w}/2)$. If $f_{\tilde{m}} \ne f_{\tilde{m}^*}$, then by Proposition 5, there are exactly two maximal intervals between $f_{\tilde{m}}$ and $f_{\tilde{m}^*}$. Let $m \in A(\bar{w}/2) \setminus \{\tilde{m}, \tilde{m}^*\}$. By the conditions in Theorem 1, there is at most one maximal interval between $f_m$ and $f_{\tilde{m}}$. We claim that there is exactly one maximal interval between them. Indeed, if $f_m = f_{\tilde{m}}$, then there are two maximal intervals between $f_m$ and $f_{\tilde{m}^*}$, contradicting the conditions in Theorem 1. Similarly, one can show that there is exactly one maximal interval between $f_m$ and $f_{\tilde{m}^*}$.
Suppose $G(f_m, [n/2]) \ne G(f_{\tilde{m}}, [n/2])$ and $G(f_m, [n/2]) \ne G(f_{\tilde{m}^*}, [n/2])$. Since there is exactly one maximal interval between $f_m, f_{\tilde{m}}$ and exactly one maximal interval between $f_m, f_{\tilde{m}^*}$, it is necessary that $G(f_m, [n/2+1, n]) = G(f_{\tilde{m}}, [n/2+1, n]) = G(f_{\tilde{m}^*}, [n/2+1, n])$. However, by Proposition 5, there is a maximal interval between $f_{\tilde{m}}, f_{\tilde{m}^*}$ that is contained in $[n/2+1, n]$, leading to a contradiction. Therefore, $G(f_m, [n/2]) = G(f_{\tilde{m}}, [n/2])$ or $G(f_m, [n/2]) = G(f_{\tilde{m}^*}, [n/2])$ for all $m \in A(\bar{w}/2) \setminus \{\tilde{m}, \tilde{m}^*\}$.
Let $a, b \in A(\bar{w}/2) \setminus \{\tilde{m}, \tilde{m}^*\}$. Suppose $G(f_a, [n/2]) \ne G(f_b, [n/2])$. Without loss of generality, we may assume further that $G(f_a, [n/2]) = G(f_{\tilde{m}}, [n/2])$ and $G(f_b, [n/2]) = G(f_{\tilde{m}^*}, [n/2])$. Therefore, there is a maximal interval contained in $[n/2]$ between $f_a, f_{\tilde{m}^*}$. Then, by the conditions in Theorem 1, we have $G(f_a, [n/2+1, n]) = G(f_{\tilde{m}^*}, [n/2+1, n])$. Similarly, we also have $G(f_b, [n/2+1, n]) = G(f_{\tilde{m}}, [n/2+1, n])$. It follows that $a \ne b^*$, and there are two maximal intervals between $f_a, f_b$, contradicting the conditions in Theorem 1. Therefore, $G(f_a, [n/2]) = G(f_b, [n/2])$ for all $a, b \in A(\bar{w}/2) \setminus \{\tilde{m}, \tilde{m}^*\}$.
Lastly, consider the case where $f_{\tilde{m}} = f_{\tilde{m}^*}$. Let $m \in A(\bar{w}/2) \setminus \{\tilde{m}, \tilde{m}^*\}$ be such that $f_m \ne f_{\tilde{m}}$. Since $f_m \ne f_{\tilde{m}}$, by definition of $A(\bar{w}/2)$ and the conditions in Theorem 1, there exists exactly one maximal interval between $f_m, f_{\tilde{m}}$ that is either contained in $[n/2]$ or $[n/2+1, n]$. Without loss of generality, assume $G(f_m, [n/2]) \ne G(f_{\tilde{m}}, [n/2])$ and $G(f_m, [n/2+1, n]) = G(f_{\tilde{m}}, [n/2+1, n])$. By the 180-degree rotational symmetry of $f_m, f_{m^*}$ and $f_{\tilde{m}}, f_{\tilde{m}^*}$, we have $G(f_{m^*}, [n/2]) = G(f_{\tilde{m}^*}, [n/2])$ and $G(f_{m^*}, [n/2+1, n]) \ne G(f_{\tilde{m}^*}, [n/2+1, n])$. Since $f_{\tilde{m}} = f_{\tilde{m}^*}$, we have $G(f_m, [n/2]) \ne G(f_{m^*}, [n/2])$ and $G(f_m, [n/2+1, n]) \ne G(f_{m^*}, [n/2+1, n])$. Therefore, there exist two maximal intervals between $f_m, f_{m^*}$. The remainder of the proof for this case follows similarly to the case where $f_{\tilde{m}} \ne f_{\tilde{m}^*}$.

Appendix B. Proof of Lemma 8

Consider the case where f m , m A ( w ¯ / 2 ) are all the same. By Lemma 4, there are no branching points in G ( f m , [ n / 2 ] ) for all m A f ( w ¯ / 2 ) .
Let m A f ( w ¯ / 2 ) . Note that for any m A f ( w ¯ / 2 ) , we have f m ( n / 2 ) = f m ( n / 2 ) . Therefore, by Proposition 6, we have G ( f m , [ 0 , n / 2 ] ) = G ( f m , [ 0 , n / 2 ] ) for all m A f ( w ¯ / 2 ) and all m A f ( w ¯ / 2 ) . Recall that m A f ( w ¯ / 2 ) if and only if m A f ( w ¯ / 2 ) , and G ( f m , [ n / 2 , n ] ) is the same as G ( f m , [ 0 , n / 2 ] ) when rotated 180 degrees about ( n / 2 , w ¯ / 2 ) . Moreover, the same holds for m A f ( w ¯ / 2 ) . It follows that G ( f m , [ n / 2 , n ] ) = G ( f m , [ n / 2 , n ] ) for all m A f ( w ¯ / 2 ) and all m A f ( w ¯ / 2 ) . Thus, f m = f m for all m A f ( w ¯ / 2 ) and all m A f ( w ¯ / 2 ) . Since f , f are solutions to M, we have | A f ( w ¯ / 2 ) | = | A f ( w ¯ / 2 ) | . Therefore, if f m , m A ( w ¯ / 2 ) are all the same, then ψ 0 ( f ) = ψ 0 ( f ) .
Consider the case where f m , m A ( w ¯ / 2 ) are not all the same. By Lemma 5, there exists m 1 A ( w ¯ / 2 ) such that there is a maximal interval [ l 1 + 1 , l 2 1 ] [ n ] between f m 1 , f m 1 , where l 2 n / 2 . Moreover by Lemma 6, ( l 2 , f m 1 ( l 2 ) ) is the only branching point in G ( f m , [ n / 2 ] ) and there is no merging point in G ( f m , [ l 2 , n / 2 ] ) for all m A ( w ¯ / 2 ) .
Let m A f ( w ¯ / 2 ) . Note that ( l 2 , f m 1 ( l 2 ) ) is the branching point in G ( f m ) such that l 2 l for any branching point ( l , w ) G ( f m ) . Similar to the proof of Lemma 7, let ( l , w ) = ( l 2 , f m 1 ( l 2 ) ) and
r = 0 if w = f m ( l 1 ) , 1 if w = f m ( l 1 ) + 1 .
By definition of r, we have m A f ( w ¯ / 2 ) A f ( l , w ) A f ( l 1 , w r ) .
Let m A f ( w ¯ / 2 ) A f ( l , w ) A f ( l 1 , w r ) . In the following we will show G ( f m , [ n / 2 ] ) = G ( f m , [ n / 2 ] ) . For notational convenience, let us define f ^ m = G ( f m , [ n / 2 ] ) and f ^ m = G ( f m , [ n / 2 ] ) . Further, define ψ ^ 0 ( f ) = { f ^ m m A f ( w ¯ / 2 ) } and ψ ^ 0 ( f ) = { f ^ m m A f ( w ¯ / 2 ) } .
By definition of l , there is no branching point in G ( f m , [ l 1 ] ) . Since f m ( l 1 ) = f m ( l 1 ) , by Proposition 6, we have f m ( l ) = f m ( l ) for any l [ 0 , l 1 ] . Suppose f m ( l ) f m ( l ) for some l [ l , n / 2 ] . Then, by Proposition 7, there is a merging point in G ( f m , [ l , l 1 ] ) . However, there is no merging point in G ( f m , [ l = l 2 , n / 2 ] ) , which is a contradiction. Thus, f m ( l ) = f m ( l ) for all l [ l , n / 2 ] . It follows that f ^ m = f ^ m for any m A f ( w ¯ / 2 ) A f ( l , w ) A f ( l 1 , w r ) .
In the following, we will show A f ( w ¯ / 2 ) A f ( l , w ) A f ( l 1 , w r ) . Toward a contradiction, suppose that there exists m 0 ( [ 2 h ] A f ( w ¯ / 2 ) ) A f ( l , w ) A f ( l 1 , w r ) . Then, f m 0 ( l ) = f m ( l ) and f m 0 ( l 1 ) = f m ( l 1 ) . Note that med ( f m 0 ) med ( f m ) . Since l n / 2 , by Proposition 7, there must be a merging point in G ( f m , [ l , n / 2 ] ) , which contradicts Lemma 6. Therefore, A f ( w ¯ / 2 ) A f ( l , w ) A f ( l 1 , w r ) .
Note that for any m A f ( w ¯ / 2 ) ( A f ( l , w ) A f ( l 1 , w r ) ) , we have f ^ m f ^ m . Therefore, the multiplicity of f ^ m in ψ ^ 0 ( f ) is | A f ( l , w ) A f ( l 1 , w r ) | . Taking f = f , one can repeat the above arguments to show that f ^ m = f ^ m ˜ for any m ˜ A f ( w / 2 ¯ ) A f ( l , w ) A f ( l 1 , w r ) and the multiplicity of f ^ m in ψ ^ 0 ( f ) is | A f ( l , w ) A f ( l 1 , w r ) | .
Since f , f are solutions to M, | A f ( l , w ) A f ( l 1 , w r ) | = | A f ( l , w ) A f ( l 1 , w r ) | , i.e., the multiplicity of f ^ m in ψ ^ 0 ( f ) equals the multiplicity of f ^ m in ψ ^ 0 ( f ) . Furthermore, this holds for distinct f ^ m ψ ^ 0 ( f ) . Since | A f ( w ¯ / 2 ) | = | A f ( w ¯ / 2 ) | , i.e., | ψ ^ 0 ( f ) | = | ψ ^ 0 ( f ) | , we obtain ψ ^ 0 ( f ) = ψ ^ 0 ( f ) .
Lastly, by Lemma 6, it holds that f ^ a = f ^ m 1 for all a A f ( w ¯ / 2 ) { m 1 , m 1 } or f ^ a = f ^ m 1 for all a A f ( w ¯ / 2 ) { m 1 , m 1 } . Thus, the multiplicity of f ^ m in ψ ^ 0 ( f ) is either 1 or | A f ( w ¯ / 2 ) | 1 . Without loss of generality, assume the multiplicity of f ^ m in ψ ^ 0 ( f ) is 1. Then, the multiplicity of f ^ m in ψ ^ 0 ( f ) is | A f ( w ¯ / 2 ) | 1 . Moreover, the multiplicity of f ^ m in ψ ^ 0 ( f ) is 1 and the multiplicity of f ^ ( m ) in ψ ^ 0 ( f ) is | A f ( w ¯ / 2 ) | 1 = | A f ( w ¯ / 2 ) | 1 . Since f m (resp., f m ) is the same as f m (resp., f ( m ) ) when rotated 180 degrees about ( n / 2 , w ¯ / 2 ) , we have f m = f m , f m = f ( m ) and the multiplicity of f m (resp., f m ) equals one in ψ 0 ( f ) (resp., ψ 0 ( f ) ). Now for all b A f ( w ¯ / 2 ) { m , m } and all b A f ( w ¯ / 2 ) { m , ( m ) } , we have f b = f b since f ^ b = f ^ b . Hence, we conclude ψ 0 ( f ) = ψ 0 ( f ) .

References

  1. Ouahabi, A.A.; Amalian, J.-A.; Charles, L.; Lutz, J.-F. Mass spectrometry sequencing of long digital polymers facilitated by programmed inter-byte fragmentation. Nat. Commun. 2017, 8, 967.
  2. Launay, K.; Amalian, J.-A.; Laurent, E.; Oswald, L.; Ouahabi, A.A.; Burel, A.; Dufour, F.; Carapito, C.; Clément, J.-L.; Lutz, J.-F.; et al. Precise alkoxyamine design to enable automated tandem mass spectrometry sequencing of digital poly(phosphodiester)s. Angew. Chem. 2021, 133, 930–939.
  3. Acharya, J.; Das, H.; Milenkovic, O.; Orlitsky, A.; Pan, S. String reconstruction from substring compositions. SIAM J. Discret. Math. 2015, 29, 1340–1371.
  4. Pattabiraman, S.; Gabrys, R.; Milenkovic, O. Coding for polymer-based data storage. IEEE Trans. Inf. Theory 2023, 69, 4812–4836.
  5. Banerjee, A.; Wachter-Zeh, A.; Yaakobi, E. Insertion and deletion correction in polymer-based data storage. IEEE Trans. Inf. Theory 2023, 69, 4384–4406.
  6. Gabrys, R.; Pattabiraman, S.; Milenkovic, O. Reconstruction of sets of strings from prefix/suffix compositions. IEEE Trans. Commun. 2023, 71, 3–12.
  7. Ye, Z.; Elishco, O. Reconstruction of a single string from a part of its composition multiset. IEEE Trans. Inf. Theory 2023, 70, 3922–3940.
  8. Gupta, U.; Mahdavifar, H. A new algebraic approach for string reconstruction from substring compositions. In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; pp. 354–359.
  9. Margaritis, D.; Skiena, S.S. Reconstructing strings from substrings in rounds. In Proceedings of the IEEE 36th Annual Foundations of Computer Science, Milwaukee, WI, USA, 23–25 October 1995; pp. 613–620.
  10. Levenshtein, V.I. Efficient reconstruction of sequences from their subsequences or supersequences. J. Comb. Theory Ser. A 2001, 93, 310–332.
  11. Batu, T.; Kannan, S.; Khanna, S.; McGregor, A. Reconstructing strings from random traces. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, ser. SODA’04, New Orleans, LA, USA, 11–14 January 2004; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2004; pp. 910–918.
  12. Marcovich, S.; Yaakobi, E. Reconstruction of strings from their substrings spectrum. IEEE Trans. Inf. Theory 2021, 67, 4369–4384.
  13. Yehezkeally, Y.; Bar-Lev, D.; Marcovich, S.; Yaakobi, E. Generalized unique reconstruction from substrings. IEEE Trans. Inf. Theory 2023, 69, 5648–5659.
  14. Cheraghchi, M.; Gabrys, R.; Milenkovic, O.; Ribeiro, J. Coded trace reconstruction. IEEE Trans. Inf. Theory 2020, 66, 6084–6103.
  15. Krishnamurthy, A.; Mazumdar, A.; McGregor, A.; Pal, S. Trace reconstruction: Generalized and parameterized. IEEE Trans. Inf. Theory 2021, 67, 3233–3250.
  16. Ravi, A.N.; Vahid, A.; Shomorony, I. Coded shotgun sequencing. IEEE J. Sel. Areas Inf. Theory 2022, 3, 147–159.
  17. Levick, K.; Shomorony, I. Fundamental limits of multiple sequence reconstruction from substrings. In Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, 25–30 June 2023; pp. 791–796.
  18. Sima, J.; Li, Y.; Shomorony, I.; Milenkovic, O. On constant-weight binary B2-sequences. In Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, 25–30 June 2023; pp. 886–891.
  19. Yang, Y.; Chen, Z. Reconstruction of multiple strings of constant weight from prefix-suffix compositions. In Proceedings of the 2024 IEEE International Symposium on Information Theory (ISIT), Athens, Greece, 7–12 July 2024; pp. 897–902.
Figure 1. The graphs of the component functions $\{f_m\}$ of the CWF induced by $U = \{110101, 110101, 101110\}$.
Figure 2. (a) The numbers $\{a_{l,w}\}$ for $M(U)$. (b) The numbers $\{a_{l,w}\}$ for $M(V)$. Circles are branching points and diamonds are merging points; disks are neither branching nor merging points.
Figure 3. The component functions of $g$ with median weight $\bar{w}/2$, where $g$ is induced by $V = \{1000111, 1110001, 1100011, 1010011\}$. Circles are branching points and diamonds are merging points; disks are neither branching nor merging points.
Table 1. The values of $f(l, m)$ induced by $U = \{110101, 110101, 101110\}$.
| $f(l, m)$ | $l = 0$ | $l = 1$ | $l = 2$ | $l = 3$ | $l = 4$ | $l = 5$ | $l = 6$ |
|---|---|---|---|---|---|---|---|
| $m = 1$ | 0 | 1 | 2 | 2 | 3 | 3 | 4 |
| $m = 2$ | 0 | 1 | 1 | 2 | 2 | 3 | 4 |
| $m = 3$ | 0 | 1 | 2 | 2 | 3 | 3 | 4 |
| $m = 4$ | 0 | 1 | 1 | 2 | 2 | 3 | 4 |
| $m = 5$ | 0 | 1 | 1 | 2 | 3 | 4 | 4 |
| $m = 6$ | 0 | 0 | 1 | 2 | 3 | 3 | 4 |
Table 2. A checklist of important notation introduced in Section 2.

| Notation | Meaning | Definition |
|---|---|---|
| $H$ | A multiset of $h$ strings | |
| $M(\cdot)$ | The prefix–suffix compositions of a multiset | Definition 1 |
| $M$ | The prefix–suffix compositions of $H$ | $M(H)$ |
| $\mathcal{H}$ | The collection of all equivalence classes whose members are compatible with $M$ | $\{[U] : M(U) = M\}$ |
| $f$ | A cumulative weight function | Definition 3 |
| $H_f$ | The multiset of strings corresponding to $f$ | Definition 4 |
| $f_m$ | A component function of $f$ | Definition 5 |
| $\mathrm{med}(f_m)$ | The median weight of $f_m$ | Definition 9 |
| $A_f(w)$ | The labels of component functions with $\mathrm{med}(f_m) = w$ | Definition 9 |
| $A_f(l, w)$ | The labels of component functions such that $f_m(l) = w$ | Definition 11 |
| $a_{l,w}$ | The multiplicity of $(l - w, w)$ in $M$ | Definition 12 |
| $b_{l,w}$ | The number of length-$l$, weight-$w$ affixes whose weight remains the same if the length decreases | Definition 13 |
| $c_{l,w}$ | The number of length-$l$, weight-$w$ affixes whose weight decreases with the length | Definition 13 |
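As an informal illustration of the counts $a_{l,w}$ listed above, the following Python sketch tallies how many prefixes and suffixes of each length $l$ have weight $w$. It is only a sketch under stated assumptions: the helper name `affix_weight_counts` is hypothetical, every string is assumed to contribute both its prefix and its suffix for each length from $0$ to $n$, and the example multiset is the $V$ that appears in the caption of Figure 3. The precise conventions are those of Definition 12 in the article.

```python
from collections import Counter

def affix_weight_counts(strings):
    """Count, for each pair (l, w), the prefixes and suffixes of length l and weight w.

    Assumption: each string contributes one prefix and one suffix per length l,
    including l = 0 and l = n.
    """
    counts = Counter()
    for s in strings:
        n = len(s)
        for l in range(n + 1):
            counts[(l, s[:l].count("1"))] += 1      # prefix of length l
            counts[(l, s[n - l:].count("1"))] += 1  # suffix of length l
    return counts

# Example multiset V taken from the caption of Figure 3.
V = ["1000111", "1110001", "1100011", "1010011"]
a = affix_weight_counts(V)

# Number of length-3 affixes of weight 1, 2, and 3, respectively.
print(a[(3, 1)], a[(3, 2)], a[(3, 3)])
```

Because the counts depend only on the lengths and weights of the affixes, the same tally is obtained from any multiset that shares the prefix–suffix compositions of $V$.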
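Table 3. The values of $g(l, m)$ induced by $V = \{1000111, 1110001, 1100011, 1010011\}$.

| $g(l, m)$ | $l = 0$ | $l = 1$ | $l = 2$ | $l = 3$ | $l = 4$ | $l = 5$ | $l = 6$ | $l = 7$ |
|---|---|---|---|---|---|---|---|---|
| $m = 1$ | 0 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
| $m = 2$ | 0 | 1 | 2 | 3 | 3 | 3 | 3 | 4 |
| $m = 3$ | 0 | 1 | 2 | 3 | 3 | 3 | 3 | 4 |
| $m = 4$ | 0 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
| $m = 5$ | 0 | 1 | 2 | 2 | 2 | 2 | 3 | 4 |
| $m = 6$ | 0 | 1 | 2 | 2 | 2 | 2 | 3 | 4 |
| $m = 7$ | 0 | 1 | 1 | 2 | 2 | 2 | 3 | 4 |
| $m = 8$ | 0 | 1 | 2 | 2 | 2 | 3 | 3 | 4 |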
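The entries of Table 3 can be reproduced with a few lines of Python. The sketch below is illustrative only: the function name `cumulative_weights` is hypothetical, and it assumes the labeling suggested by the table, in which row $m = 2i - 1$ lists the prefix weights of the $i$-th string of $V$ and row $m = 2i$ lists its suffix weights. Other labelings of the component functions may be equally valid in the article's framework.

```python
def cumulative_weights(strings):
    """Return one row of running weights per prefix direction and suffix direction.

    Assumption: for the i-th string (1-indexed), row 2i - 1 holds the weights of
    its prefixes of length 0..n and row 2i holds the weights of its suffixes.
    """
    rows = []
    for s in strings:
        bits = [int(c) for c in s]
        # Reading the string forward gives prefixes; reading it backward gives suffixes.
        for seq in (bits, bits[::-1]):
            row = [0]
            for b in seq:
                row.append(row[-1] + b)
            rows.append(row)
    return rows

V = ["1000111", "1110001", "1100011", "1010011"]
table = cumulative_weights(V)
for m, row in enumerate(table, start=1):
    print(f"m = {m}:", " ".join(map(str, row)))
```

Each row starts at 0 and ends at the common weight 4, and consecutive entries differ by at most 1, which is what makes the rows readable as the graphs of component functions.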
Table 4. A checklist of important notation in Section 4.

| Notation | Meaning |
|---|---|
| $P(w)$ | A subset of $[2h]$ whose size equals the number of component functions of median weight $w$ |
| $F$ | A bookkeeping function defined on $n/2 \times \bar{w}$ that tracks the labels of $f_m$ as we assign values to $f_m(l)$ from $l = n/2$ to $l = 0$ |
| $F(l, w)$ | A collection of sets that partitions the labels of $f_m(l)$ according to their behavior from length $n/2$ to $l$ |
| $\mathcal{F}$ | A collection of bookkeeping functions |
| $R_w$ | A collection of sets, each of which corresponds to the $f_m$'s of median weight $w$ that have the same graph over $n/2$ |
| $r_w$ | The size of $R_w$, which equals the number of different "half strings" with median weight $w$ |
| $\sigma$ | A "pairing" permutation on $[2h]$ such that if $u \in R_{w,i}$ is paired with $v \in R_{\bar{w}-w,j}$, then $\sigma(u) = \sigma(v)$ |
