Assembly Theory of Binary Messages

Using assembly theory, we investigate the assembly pathways of binary strings (bitstrings) of length N formed by joining bits present in the assembly pool and the bitstrings that entered the pool as a result of previous joining operations. We show that the bitstring assembly index is bounded from below by the length of the shortest addition chain for N, and we conjecture about the form of the upper bound. We define the degree of causation for the minimum assembly index and show that, for certain N values, it has regularities that can be used to determine the length of the shortest addition chain for N. We show that a bitstring with the smallest assembly index for N can be assembled via a binary program of a length equal to this index if the length of this bitstring is expressible as a product of Fibonacci numbers. Knowing that the problem of determining the assembly index is at least as hard as NP-complete problems, we conjecture that this problem is NP-complete, while the problem of creating a bitstring with a predetermined largest assembly index is NP-hard. The proof of this conjecture would imply P ≠ NP since every computable problem and every computable solution can be encoded as a finite bitstring. The lower bound on the bitstring assembly index implies a creative path and an optimization path of the evolution of information, where only the latter is available to Turing machines (artificial intelligence). Furthermore, the upper bound hints at the role of dissipative structures and collective, in particular human, intelligence in this evolution.


Introduction
Assembly theory (AT) [1-7] provides a distinctive complexity measure, superior to established complexity measures used in information theory, such as Shannon entropy or Kolmogorov complexity [1,5]. AT does not alter the fundamental laws of physics [6]. Instead, it redefines the objects on which these laws operate. In AT, objects are not considered sets of point particles (as in most physics) but instead are defined via the histories of their formation (assembly pathways) as an intrinsic property, where, in general, there are multiple assembly pathways to create a given object.
AT explains and quantifies selection and evolution, capturing the amount of memory necessary to produce a given object [6] (this memory is the object [8]). This is because the more complex a given object, the less likely it is that an identical copy can be observed without the selection of some information-driven mechanism that generated that object. Formalizing assembly pathways as sequences of joining operations, AT begins with basic units (such as chemical bonds) and ends with a final object. This conceptual shift captures evidence of selection in objects [1,2,6].
The assembly index of an object corresponds to the smallest number of steps required to assemble this object, and, in general, it increases with the object's size but decreases with symmetry, so large objects with repeating substructures may have a smaller assembly index than smaller objects with greater heterogeneity [1]. The copy number specifies the observed number of copies of an object. Only these two quantities describe the evolutionary concept of selection by showing how many alternatives were excluded to assemble a given object [6,8].
AT has been experimentally confirmed in the case of molecules and has been probed directly via experiments with high accuracy by means of spectroscopy techniques, including mass spectrometry and IR and NMR spectroscopy [6,7]. It is a versatile concept with applications in various domains. Beyond its application in the fields of biology and chemistry [7], its adaptability to different data structures, such as text, graphs, groups, music notations, image files, compression algorithms, human languages, memes, etc., showcases its potential in diverse fields [2].
In this study, we investigated the assembly pathways of binary strings (bitstrings) by joining individual bits present in the assembly pool and bitstrings that entered the pool as a result of previous joining operations.
A bit is the smallest amount and the quantum of information. Perceivable information about any object can be encoded via a bitstring [9,10], but this does not imply that a bitstring defines an object. Information that defines a chemical compound, a virus, a computer program, etc. can be encoded via a bitstring. However, a dissipative structure [11], such as a living biological cell (or its conglomerate, such as a human, for example), cannot be represented with a bitstring (even if its genome can). This information can only be perceived (so it is not object-defining information). Therefore, we use emphasis for the object in this paper, since this term, understood as a collection of matter, is a misnomer in that it neglects (quantum) nonlocality [12]. Nonlocality is independent of the entanglement among particles [13], as well as quantum contextuality [14], and it increases as the number of particles [15] grows [16,17]. Furthermore, the ugly duckling theorem [9,10] asserts that every two objects we perceive are equally similar (or equally dissimilar).
Furthermore, a bitstring as such is neither dissipative nor creative. Its assembly process is what can be dissipative or creative. The perceivable universe is not big enough to contain the future; it is deterministic going back in time and non-deterministic going forward in time [18]. But we know [2,11,19-29] that it has evolved to the present since the Big Bang. Evolution is about assembling a novel structure of information and optimizing its assembly process until it reaches the assembly index. Once the new information is assembled (via a dissipative structure operating far from thermodynamic equilibrium, including humans), it enters the realm of the 2nd law of thermodynamics, and nature seeks how to optimize its assembly pathway.
At first, the newly assembled structure of information is discovered via groping [19], and its assembly pathway does not attain its most economical or efficient form at once. For a certain period of time, its evolution gropes about within itself. Try-out follows try-out, not being finally adopted. Then, finally, perfection comes within sight, and from that moment, the rhythm of change slows down [19]. The new information, having reached the limit of its potentialities, enters the phase of conquest. Stronger now than its less-perfected neighbours, the new information multiplies and consolidates. When the assembly index is reached, new information attains equilibrium, and its evolution terminates. It becomes stable.
Thanks to its characteristic additive power, living matter (unlike the matter of the physicists) finds itself "ballasted" with complications and instability. It falls, or rather rises, towards forms that are more and more improbable. Without orthogenesis, life would only have spread; with it, there is an ascent of life that is invincible [19].
The paper is structured as follows. Section 2 introduces the basic concepts and definitions used in the paper. Section 3 shows that the bitstring assembly index is bounded from below and provides the form of this bound. Section 4 defines the degree of causation for the smallest-assembly-index bitstrings. Section 5 shows that the bitstring assembly index is bounded from above, and we conjecture about the exact form of this bound. Section 6 introduces the concept of a binary assembling program and shows that, in general, the trivial assembling program assembles the smallest-assembly-index bitstrings. Section 7 discusses the findings of this study and concludes.

Preliminaries
For K subunits of an object, O, the assembly index, a_O, of this object is bounded [1] from below via

a_O ≥ log2(K), (1)

and from above via

max(a_O) = K − 1. (2)

The lower bound (1) represents the fact that the simplest way to increase the size of a subunit in a pathway is to take the largest subunit assembled so far and join it to itself [1], and, in the case of the upper bound (2), the subunits must be distinct so that they cannot be reused from the pool, which would decrease the index.
Here, we consider bitstrings C^(N)_k containing bits {1, 0}, with N_0 zeros and N_1 ones, having a length of N = N_0 + N_1. N_1 is called the binary Hamming weight, or the bit summation, of a bitstring. Bitstrings are our basic AT objects [2], and we consider the process of their formation within the AT framework. Where a bit value can be either 1 or 0, we write * = {1, 0}, with * being the same within the bitstring C^(N)_k. If we allow for a second possibility that can be the same as or different from *, we write ⋆ = {1, 0}. Thus, C_k = [*⋆], for example, is a placeholder for all four 2-bit strings.
In general, we consider bitstrings, C_k, to be messages transmitted through a communication channel between a source and a receiver, similar to the Claude Shannon approach [30] used in the derivation of the binary information entropy

H = −p_0 log2(p_0) − p_1 log2(p_1), (3)

where

p_0 = N_0/N, p_1 = N_1/N (4)

are the ratios of occurrences of zeros and ones within the bitstring C_k, and the unit of entropy (3) is a bit.

Definition 1. A bitstring assembly index, a^(N), is the smallest number of steps, s, required to assemble a bitstring, C^(N)_k, of length N by joining two distinct bits contained in the initial assembly pool, P = {1, 0}, and bitstrings assembled in previous steps that were added to the assembly pool.

Therefore, the assembly index, a^(N), satisfies the general bounds (1) and (2) with K = N. For example, the 8-bit string

C^(8)_k = [01010101] (5)

can be assembled in at most seven steps by joining one bit at a time: 1. Join 0 with 1 to form C^(2) = [01], adding it to P = {1, 0, 01}; 2. Join C^(2) with 0, adding [010] to P = {1, 0, 01, 010}; and so on, up to step 7.

However, by reusing bitstrings from the assembly pool, the bitstring (5) can be assembled in six, five, four, or even three steps: join 0 with 1 to form [01]; join [01] with [01] to form [0101]; and join [0101] with [0101] to form C^(8)_k. By contrast, the bitstring

C^(8)_l = [00010111] (6)

cannot be assembled in fewer than six steps,
as only the doublet [01] can be reused from the pool. Therefore, bitstrings (5) and (6), despite having the same length, N = 8, Hamming weight, N_1 = 4, and Shannon entropy (3), have respective assembly indices a^(8)(C_k) = 3 and a^(8)(C_l) = 6 that represent the lengths of their shortest assembly pathways, which, in turn, ensures that their assembly pools, P, are distinct sets for a given assembly pathway. Tables 1 and A5-A12 (Appendix C) list the bitstrings and their assembly indices. The following definition is commonly known, but we provide it here for clarity.
Without a loss of generality, we assume that, if N is odd, N_1 < N_0 (e.g., for N = 5, N_1 = 2, and N_0 = 3). However, our results are equivalently applicable if we assume the opposite (i.e., a larger number of ones for an odd N). The number |B^(N)| of balanced bitstrings among all 2^N bitstrings is

|B^(N)| = N!/(⌊N/2⌋! ⌈N/2⌉!),

where "⌊x⌋" is the floor function that yields the greatest integer less than or equal to x, and "⌈x⌉" is the ceiling function that yields the least integer greater than or equal to x. This is the OEIS A001405 sequence, the maximal number of subsets of an N-set such that no one contains another, as asserted via Sperner's theorem, and it can be approximated using Stirling's approximation for a large N. Balanced and even-length bitstrings are of particular interest in the following.

Theorem 1. An N = 4-bit string is the shortest string with more than one bitstring assembly index.
Proof. The proof is trivial. For N = 1, the assembly index a^(1)(C) = 0, as all basis objects have a pathway assembly index of 0 [2] (they are not assembled). N = 2 provides four available bitstrings with a^(2)(C) = 1. N = 3 provides eight available bitstrings with a^(3)(C) = 2. Only N = 4 provides sixteen bitstrings, including |B^(4)| = 6 balanced bitstrings, that comprise four strings with a^(4)(C) = 2 and twelve bitstrings with a^(4)(C) = 3, as shown in Tables 1 and 2. For example, to assemble the bitstring B_1 = [0101], we need only to assemble the bitstring [01] and reuse it. Therefore, a^(4)(B_1) = 2. Interestingly, Theorem 1 strengthens the meaning of N = 4 as the minimum information capacity that provides a minimum thermodynamic black hole entropy [31-33]. There is no disorder or uncertainty in an object that can be assembled in a number of steps s < 3.
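The small-N assembly indices quoted in this proof can be cross-checked by exhaustive search. The following sketch is ours, not part of the AT formalism; it uses iterative deepening and prunes any intermediate that is not a substring of the target, which is safe on a shortest pathway:

```python
def assembly_index(target: str) -> int:
    """Smallest number of joining steps needed to build `target` from the
    initial assembly pool P = {1, 0}, where each step concatenates two
    strings already in the pool and adds the result to the pool
    (Definition 1). Practical only for short strings."""
    def search(pool, depth, limit):
        if target in pool:
            return True
        if depth == limit:
            return False
        for x in pool:
            for y in pool:
                new = x + y
                # On a shortest pathway, every intermediate is a substring
                # of the target, so anything else can be pruned.
                if new in target and new not in pool:
                    if search(pool | {new}, depth + 1, limit):
                        return True
        return False

    limit = 0
    while not search(frozenset({"0", "1"}), 0, limit):
        limit += 1
    return limit

print(assembly_index("0101"))      # → 2: [01], then [0101]
print(assembly_index("01010101"))  # → 3: [01], [0101], [01010101]
print(assembly_index("0001"))      # → 3, in line with Theorem 1
```

Running this over all sixteen 4-bit strings reproduces the split into four strings with index 2 and twelve with index 3.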
The following definition, taking into account the cyclic order of bitstrings, is also provided for the sake of clarity.
Definition 3. A bitstring, C^(N)_k, is a ringed bitstring if a ring formed with this string by joining its beginning with its end is unique among the rings formed from the other ringed strings, R^(N)_l.

There are at least two and at most N forms of a ringed bitstring, R^(N)_k, that differ in the position of the starting bit. For example, for the |B^(4)| = 6 balanced bitstrings shown in Table 2, the two augmented strings with a^(4) = 2 correspond to each other if we change the starting bit, and, similarly, so do the four augmented bitstrings with a^(4) = 3. By neglecting the notion of the beginning and end of a string, we focus on its length and content. In Yoda's language, "Complete, no matter where it begins, a message is".
The numbers of the balanced bitstrings, |B^(N)_k|, and of the ringed balanced bitstrings, |R^(N)_k|, are shown in Table 3 and Figure 1. The sequence |R^(N)_k| is close to the OEIS A000014 sequence up to the eleventh term, and its formula remains to be researched. We note that, in general, the starting bit is relevant for the assembly index. Thus, different forms of a ringed bitstring may have different assembly indices. For example, for N = 7, the balanced bitstrings B_34 and B_35, shown in Table A15, have a^(7) = 6. However, these bitstrings are not ringed, since they correspond to each other and to the balanced bitstrings B_13, B_18, B_20, B_28, and B_30 with a^(7) = 5. They all have the same triplet of adjoining ones.

Definition 4. The assembly index of a ringed bitstring, R^(N)_k, is the smallest assembly index among all forms of this string.
Thus, if different forms of a ringed bitstring have different assembly indices, we assign the smallest assembly index to this string. In other words, we assume that the smallest number of steps over all forms,

a^(N)(R_k) = min_l a^(N)((R_k)_l),

where (R_k)_l denotes a particular lth form of a ringed bitstring, R_k, is the bitstring assembly index of this ringed string (cf., e.g., strings (19) and (20)). We assume that, if an object that can be represented with a ringed bitstring can be assembled in fewer steps, this procedure will be preferred in nature.
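Definitions 3 and 4 can be made concrete with a short sketch. Here, `index_of` is a placeholder of our own for any routine that computes the assembly index of a linear bitstring:

```python
def forms(s: str) -> set[str]:
    """All rotations of a bitstring: the forms of the corresponding ringed
    bitstring, differing in the position of the starting bit."""
    return {s[i:] + s[:i] for i in range(len(s))}

def ringed_assembly_index(s: str, index_of) -> int:
    """Definition 4: the assembly index of a ringed bitstring is the
    smallest assembly index among all of its forms."""
    return min(index_of(f) for f in forms(s))

print(sorted(forms("0011")))  # → ['0011', '0110', '1001', '1100']
```

Note that, e.g., forms("0101") yields only the two forms [0101] and [1010], matching the "at least two and at most N forms" observation above.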
The distribution of the assembly indices of the balanced, ringed bitstrings, E k , is shown in Table 4.
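For reference, the count |B^(N)| of balanced bitstrings introduced above is the central binomial coefficient, which can be checked directly against OEIS A001405:

```python
from math import comb

def balanced_count(n: int) -> int:
    """Number |B^(N)| of balanced bitstrings of length n, i.e., bitstrings
    with N1 = floor(n/2) ones: C(n, floor(n/2)) (OEIS A001405)."""
    return comb(n, n // 2)

print([balanced_count(n) for n in range(1, 11)])
# → [1, 2, 3, 6, 10, 20, 35, 70, 126, 252]
```

In particular, balanced_count(4) = 6 reproduces the |B^(4)| = 6 used in Theorem 1.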

Minimum Bitstring Assembly Index
In the following, we derive the tight lower bound of the set of different bitstring assembly indices.
Theorem 2 (Tight lower bound of the bitstring assembly index). The smallest bitstring assembly index, a^(N)(C_min), as a function of N corresponds to the length of the shortest addition chain for N (OEIS A003313).
Proof. Bitstrings, C_min, for which a^(N)(C_min) is the smallest, can be formed in subsequent steps, s, by joining the longest bitstring assembled so far with itself. For N = 2^s, only the four bitstrings

C^(N)_min = [*⋆*⋆ · · · *⋆], (11)

obtained by repeatedly doubling a 2-bit string [*⋆], have such an assembly index in this case.
An addition chain for N ∈ ℕ, having the shortest length s ∈ ℕ (commonly denoted as l(N)), is defined as a sequence, 1 = b_0, b_1, . . . , b_s = N, in which each member is the sum of two (not necessarily distinct) earlier members. The first step is necessarily b_1 = b_0 + b_0 = 2, and this corresponds to assembling a doublet, [*⋆], from the initial assembly pool, P. Thus, the lower bound for s of the addition chain for N, s ≥ log2(N), is achieved for N = 2^s. In our case, bitstrings (11) achieve this bound. The second step in creating an addition chain can be b_2 = 3 or b_2 = 4, corresponding to a triplet or a quadruplet. Thus, finding the shortest addition chain for N corresponds to finding the assembly index of a bitstring containing bits and/or doublets and/or triplets generated via these doublets for N ≠ 2^s since, due to Theorem 1, only they provide the same assembly indices {0, 1, 2}. Such strings correspond to linear molecules made of carbons (cf. supplementary material, S3.2 in [4]).
The smallest assembly indices, a^(N)_min, are shown in Table 5 for 1 ≤ N ≤ 21. Calculating the minimum length of the addition chain for N, as well as finding the shortest assembly pathway for a chemical molecule, has been shown to be at least as hard as NP-complete problems [4,34].
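The lower bound of Theorem 2 can be computed directly for small N. A sketch using iterative deepening over strictly increasing chains (exact but exponential, so suitable for small N only):

```python
def addition_chain_length(n: int) -> int:
    """Length l(n) of a shortest addition chain for n (OEIS A003313),
    found by iterative deepening over chains 1 = b_0 < b_1 < ... < b_s = n,
    where each member is a sum of two earlier members."""
    if n == 1:
        return 0

    def extend(chain, limit):
        last = chain[-1]
        if last == n:
            return True
        if len(chain) > limit:
            return False
        # The largest reachable value at most doubles per remaining step.
        if last << (limit - len(chain) + 1) < n:
            return False
        # Try larger sums first to reach n quickly.
        sums = sorted({a + b for a in chain for b in chain
                       if last < a + b <= n}, reverse=True)
        return any(extend(chain + [s], limit) for s in sums)

    limit = 1
    while not extend([1], limit):
        limit += 1
    return limit

print([addition_chain_length(n) for n in range(1, 16)])
# → [0, 1, 2, 2, 3, 3, 4, 3, 4, 4, 5, 4, 5, 5, 5]
```

The printed values reproduce the beginning of OEIS A003313, i.e., the smallest bitstring assembly indices a^(N)_min of Theorem 2.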

Degree of Causation for Minimum-Assembly-Index Bitstrings
Using the difference between the general AT lower bound (1) and the smallest bitstring assembly index (OEIS A003313), we can define the quantity capturing a degree of causation [6] of assembling the bitstrings of length N with the smallest assembly index, as shown in Figure 2. For N = 2^s, the degree of causation D_C(N) = 1, as all four bitstrings (11) can be assembled along a single pathway only; their assembly is entirely causal. However, for N ≠ 2^s, the bitstrings C_min can be assembled along different pathways; for example, there are two pathways for such a bitstring.

Equation (12) naturally divides the set of natural numbers into sections, 2^s ≤ N < 2^(s+1), and shows regularities that, for certain values of N, can be used to determine the smallest assembly index (i.e., the length of the shortest addition chain for N): a_min = s for N = 2^s and for each N being the sum of two powers of 2 (OEIS A048645), while, for the remaining Ñ not being the sum of two powers of 2 (OEIS A072823), where k = 2 for Ñ = {7, . . .}, the smallest assembly index follows Equations (15) and (16), so the number of Ns within each section not included in the set of general rules 2^s, (13), (15), and (16) is determined accordingly. The shortest-addition-chain sequence-generating factors for 1 ≤ s ≤ 5 are listed in Table 6, where the subsequent odd numbers of the form m_k generate sequences in which the last two values, a^(Ñ)_min, are higher than those given via the general rule. Based on the OEIS A003313 sequence for N ≤ 10^5, we have determined the number of exceptions, |N_exc|, such that a^(N_exc)_min ≠ {s, s + 1, s + 2} for 0 ≤ s ≤ 15, as shown in Table 7, where min(m_k) is the minimal generating factor, m_k, shown in Table 6.

Only living systems have been found to be capable of producing abundant molecules with an assembly index greater than an experimentally determined value of 15 steps [3,8]. The cut-off between 13 and 15 is sharp, which means that molecules made through random processes cannot have assembly indices exceeding 13 steps [3,8]. In particular, N = 15 is the length of the shortest addition chain for N, which is smaller than the number of multiplications needed to compute the Nth power using the Chandah-sutra method (OEIS A014701, OEIS A371894). Furthermore, the values of the sequence A014701 are larger than the lengths of the shortest addition chains for N ∉ 2^2 + 2^l. These values (OEIS A371894) are not given via Equation (15), but Equation (16) provides their subset. Their Hamming weight in binary representation is at least four. Furthermore, the exceptional a_min values bear similarity to the atomic numbers, Z, of the chemical elements that violate the Aufbau rule [15], which correctly predicts the electron configurations of most elements. Only about twenty elements within 24 ≤ Z ≤ 103 (with only two non-doubleton sets of consecutive ones) violate the Aufbau rule.

Maximum Bitstring Assembly Index
In the following, we conjecture upon the form of the upper bound of the set of different bitstring assembly indices. In general, of all bitstrings, C_k, having a given assembly index, shown in Tables 1 and A5-A12 (Appendix C), most have N_1 = ⌊N/2⌋, though we have found a few exceptions, mostly for non-maximal assembly indices, namely for a^(8) = 4 (4 < 8) and a^(8) = 6 (24 < 26), for a^(10) = 4 (2 < 5) and a^(10) = 5 (32 < 33), and for a^(12) = 4 (2 < 3). These observations allow us to restrict the search space of possible bitstrings with the largest assembly indices to balanced bitstrings only: with the exception of N = 8, of all bitstrings, C^(N)_k, having the largest assembly index, most are balanced. We can further restrict the search space to ringed bitstrings (Definition 3).

The bitstring assembly index must be bounded from above, and a^(N)(C_max) must be a monotonically nondecreasing function of N that can increase at most by one between N and N + 1. Certain heuristic rules apply in our binary case. For example,
• For N = 7, we cannot avoid two doublets (e.g., 2 × [00]) within a ringed bitstring, E_28 = [0011100];
• For N = 12, we cannot avoid repeated doublets within a ringed bitstring (e.g., E = [111000101100]), and thus, a^(12)(C_max) = 8 < 9;
• For N = 14, we cannot avoid two pairs of doublets and one doublet three times (e.g., 2 × [00], 2 × [11], and 3 × [01]), and thus, a^(14)(C_max) = 9 < 10;
• etc.
Table 8 shows the exemplary balanced bitstrings, B_max, having the largest assembly indices that we assembled (cf. also Appendix A). To determine the assembly index, a^(18) = 11, of the bitstring E_k, for example, we look for the longest patterns that appear at least twice within the string, and we look for the largest number of these patterns. Here, we find that the two triplets [001] and [110] appear twice in E_k and are based on the doublets [00] and [11], also appearing in E_k. Thus, we start with the assembly pool {1, 0, [00], [001], [11], [110]}, made in four steps, and join the elements of the pool in the following seven steps to arrive at a^(18)(E_k) = 11. On the other hand, another form of this balanced, ringed string has a^(18)(E_l) = 12. These results allow us to formulate the following conjecture.
However, at this moment, we cannot state whether this conjecture applies to ringed or non-ringed bitstrings. The assembly indices for N < 3 are the same for a given N, whereas the assembly indices for 4 ≤ N ≤ 10 were discussed above and are calculated in Appendix C for balanced and balanced, ringed bitstrings.
The conjectured sequence is shown in Figures 4 and 5, starting with a^(0) = −1 (we note in passing that n = −1 is the dimension of the void, the empty set ∅, or the (−1)-simplex). Subsequent terms are given via {0, 1, 2, 3, 4, 5, 5, 6, 7, 7, 8, 9, 9, 9, 10, . . .}, which is periodic for N = k(k + 3) and defines plateaus of a constant bitstring assembly index at a^(N)(C_max) = 4k − 3. This sequence can be generated using the procedure given in Listing 1. We note the similarity of this bound to the monotonically nondecreasing Shannon entropy of the chemical elements, including observable ones [15]. Perhaps the exceptions in the sequence of Conjecture 1 vanish as N increases.

Binputation
So far, we have assembled bitstrings "manually". Now, we shall automate this process using other bitstrings as assembling programs.
Definition 5. The binary assembling program Q_B is a bitstring of length s_Q that acts on the assembly pool P and outputs the assembled bitstrings, adding them to the pool.

Definition 6. The trivial assembling program Q is a binary assembling program with consecutive bits denoting the following commands: 0 ⇔ take the last element from P, join it with itself, and output; 1 ⇔ take the last two elements from P, join them with each other, and output.
As the assembly pool P is a set of distinct elements to which bitstrings are added in subsequent assembly steps, only these two commands apply to the initial assembly pool, P = {1, 0}, containing only two bits, regardless of the starting command.
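Definition 6 is easy to simulate. In the sketch below, we fix one ordering of the initial pool, P = (1, 0), so that "the last element" is 0; the reversed pool yields the bit-complemented outputs:

```python
def run_trivial_program(q: str, pool=("1", "0")) -> list[str]:
    """Execute a trivial assembling program Q (Definition 6) on the initial
    assembly pool P: command '0' joins the last pool element with itself;
    command '1' joins the last two pool elements. Every output is appended
    to the pool."""
    p = list(pool)
    for cmd in q:
        p.append(p[-1] + p[-1] if cmd == "0" else p[-2] + p[-1])
    return p[2:]  # the bitstrings assembled, in order

print(run_trivial_program("000"))  # → ['00', '0000', '00000000']
print(run_trivial_program("111"))  # → ['10', '010', '10010']
```

The all-zeros program doubles the last string, producing the minimal-assembly-index strings of length 2^s_Q, while the all-ones program produces prefixes of a Fibonacci word, with lengths following the Fibonacci sequence; mixed programs yield lengths that are products of Fibonacci numbers, in line with Theorem 3.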

Theorem 3. If a bitstring, C^(N)_min, can be assembled via an elegant trivial program of length s_Q = a^(N)(C_min), then N is expressible as a product of Fibonacci numbers (OEIS A065108), and the length, s_Q, of any trivial program Q is not shorter than the assembly index of the string that this trivial assembling program assembles.
Proof. An elegant program is the shortest program that produces a given output [35,36]. Furthermore, no program, P, shorter than an elegant program, Q, can find this elegant program, Q [35]. If it could, it could also generate Q's output. But if P is shorter than Q, then Q would not be elegant, which leads to a contradiction.
The 1st bit of the trivial assembling program Q is irrelevant, as Q = [0 . . .] and Q = [1 . . .] both assemble a 2-bit string C^(2) in their first step. Bitstrings of length 2^s_Q with the smallest assembly index, a^(2^s_Q)_min = s_Q, can be assembled with the same two programs starting with the reversed assembly pool P = {0, 1}. The remaining 2^(s_Q−1) − 2 programs will assemble some of the shorter bitstrings with the assembly index a^(N)_min = s_Q. In general, all programs, Q, assemble bitstrings that have lengths expressible as a product of Fibonacci numbers (OEIS A065108), as shown in Table A1 (Appendix B), wherein, out of 2^(s_Q−1) programs (cf. Tables A1 and A4), the following applies: for * = 1, binary assembling programs, Q, assemble subsequent Fibonacci words and their concatenations that have entropies (3) with ratios (4), where m = {1, 2, . . . , s_Q}, and F is the Fibonacci sequence starting from 1. Ratios (21) rapidly converge to lim_{s_Q→∞} p_{0,m} = φ − 1 ≈ 0.618033989 and lim_{s_Q→∞} p_{1,m} = 2 − φ ≈ 0.381966011, where φ is the golden ratio. Therefore, lim_{s_Q→∞} H_m ≈ 0.9594 is the binary entropy of the Fibonacci word limit. The Fibonacci sequence can be expressed through the golden ratio, which corresponds to the smallest Pythagorean triple {−3, 4, 5} [37,38].
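The limits quoted above can be checked numerically. A sketch, where p0 and p1 denote the limiting frequencies of zeros and ones in the Fibonacci word:

```python
from math import log2

phi = (1 + 5 ** 0.5) / 2   # the golden ratio
p0 = phi - 1               # limiting ratio of zeros, ≈ 0.618033989
p1 = 1 - p0                # = 2 - phi, limiting ratio of ones
# Binary Shannon entropy (3) of the Fibonacci word limit.
H = -p0 * log2(p0) - p1 * log2(p1)
print(round(H, 4))         # → 0.9594
```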
However, for s Q ≥ 4, some of the programs are no longer elegant if * = 0 and some of the assembled bitstrings are not C min if * = 1.
For s Q = {4, 7} and s Q ≥ 10, and for the shortest bitstring assembled via the program Q, the program Q is not elegant for * = 0, and the shortest bitstring assembled via the program is not C min for * = 1.
However, the length, s Q , of any program, Q, is not shorter than the assembly index of the bitstring that this program assembles.
The trivial assembly programs, Q, and the bitstrings they assemble are listed in Tables 9 and A2-A4 (Appendix B) for one version of the assembly pool and for 1 ≤ s Q ≤ 6.
We note in passing that there are other mathematical results on bitstrings and the Fibonacci sequence. For example, it was shown [39] that, having two concentric circles with radii {F_n, F_(n+2)} and drawing two pairs of parallel lines orthogonal to each other and tangent to the inner circle, one obtains an octagon, defined by the points of intersection of those lines with the outer circle, which comes very close to the regular octagon as n → ∞. Furthermore, each of these octagons defines a Sturmian binary word (a cutting sequence for lines of irrational slope), except in the case of n = 5 [39].
Perhaps the smallest assembly index given via Theorem 2 and the bitstrings of Theorem 3 are related to the Collatz conjecture, as the lengths of the strings (11) for N = 2^(2k) correspond to the numbers to which the Collatz conjecture converges from N = (2^(2k) − 1)/3, k ∈ ℕ (OEIS A002450).
Theorem 3 is also related to Gödel's incompleteness theorems and the halting problem. N cases of the halting problem correspond only to log2(N), not to N, bits of information [40], and therefore, complexity is more fundamental to incompleteness than the self-reference of Gödel's sentence [41]. Any formal axiomatic system only enables provable theorems to be proved. If a theorem can be proved with an automatic theorem prover, the prover will halt after proving this theorem. Thus, proving a theorem equals halting. If we assume that the axioms of the trivial program given via Definition 6 define the formal axiomatic system, then the bitstrings that have lengths expressible as a product of Fibonacci numbers assembled through this program would represent provable theorems.
If we wanted to define a binary assembling program, Q_B, that would use specific bitstrings other than the last one or two bitstrings in the assembly pool, we would have to index the bitstrings in the pool. However, at the beginning of the assembly process, we cannot predict in advance how many bitstrings will enter the assembly pool. Thus, we do not know how many bits will be needed to encode the indices of the strings in the pool. Therefore, we state the following conjecture.

Conjecture 2. There is no binary assembling program (Definition 5) with a length shorter than the length of a bitstring having the largest assembly index that could assemble this bitstring.
Theorem 3 would be violated if, in Definition 6, we specified the command "0", e.g., as "take the last element from the assembly pool, join it with itself, join the result with what you have already assembled (say, at 'the right'), and output". Then, the 2-bit program "00" would produce the 6-bit string [000000] with the assembly index a^(6) = 3. However, such a one-step command would violate the axioms of assembly theory, since it would perform two assembly steps in one program step. An elegant program to output a gigabyte bitstring of all zeros would take a few bits of code and would have a low Kolmogorov complexity [42]. However, such a bitstring would be outputted, not assembled. Furthermore, the length of such a program that outputs the bitstring [0 . . .] would be shorter than the length of the program that outputs the string [10 . . .], while in AT, the lengths of these programs must be the same if the strings have the same assembly indices. Definitions 5 and 6 and Theorem 3 are about binputation: about bitstrings assembling other bitstrings.
In particular, Theorem 3 confirms that the assembly index is related to the amount of physical memory required to store the information to direct the assembly of an object (a bitstring in our case) and to set a directionality in time from the simple to the complex [8]: s_Q-bit-long trivial assembling programs (i.e., with s_Q bits of memory) can assemble 2^s_Q-bit strings with minimal assembly indices, s_Q, and, for s_Q ≥ 4, some shorter but more complex bitstrings with non-minimal assembly indices s_Q. The memory defines the object [8].

Discussion and Conclusions
Consider the SARS-CoV-2 genome sequence defined by 29,903 nucleobases {A, C, G, T}: its initial version, MN908947 (available online at https://www.ncbi.nlm.nih.gov/nuccore/MN908947, accessed on 16 May 2024), collected in December 2019 in Wuhan, and its sample, OL351370 (available online at https://www.ncbi.nlm.nih.gov/nuccore/OL351370, accessed on 16 May 2024), collected in Egypt nearly two years after the Wuhan outbreak, on 23 October 2021. In the MN version, the nucleobases are distributed as |A| = 8954, |C| = 5492, |G| = 5863, and |T| = 9594, and in the OL version, they are distributed as |A| = 8954, |C| = 5470, |G| = 5856, and |T| = 9623, following Chargaff's parity rules with the same count of adenines. We can convert these sequences into bitstrings by assigning two bits per nucleobase. For such N = 59,806, not being the sum of two powers of 2, with the degree of causation [6] given via Equation (14), the assembly index is bounded via

21 ≤ a^(59,806) ≤ 975. (24)

If a bitstring, C^(N), were to encode four DNA/RNA nucleobases, then the smallest-assembly-index bitstrings (as well as the strings generated via the trivial assembly programs, Q, according to Definition 6) would not encode all the nucleobases. For example, such a bitstring would have an assembly index smaller than the lower bound of (24), given via Theorem 2, by one. The upper bound of (24) was estimated by finding the smallest k = 244 that satisfies k(k + 3) ≥ N and using the relation a^(N)(C_max) = 4k − 1 of Conjecture 1. We do not know the actual assembly indices of the MN and OL sequences. Their determination is an NP-complete problem, as we conjecture. However, we note a relatively wide range of 954 assembly indices that nature provides for this genome sequence.
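The upper-bound estimate used above can be reproduced with a few lines; the relation a^(N)(C_max) = 4k − 1 is taken from Conjecture 1:

```python
def conjectured_upper_bound(n: int) -> int:
    """Upper-bound estimate on the bitstring assembly index: the smallest
    k with k*(k + 3) >= n, plugged into a(C_max) = 4*k - 1 (Conjecture 1)."""
    k = 1
    while k * (k + 3) < n:
        k += 1
    return 4 * k - 1

print(conjectured_upper_bound(59806))  # → 975 (k = 244, since 244*247 = 60268 >= 59806)
print(conjectured_upper_bound(18))     # → 11, matching the worked N = 18 example above
```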
There are twelve possible assignments of two bits per nucleobase, with twelve different Hamming weights (25). The Hamming weights (25) show that all the sequences are almost balanced (N/2 = 29,903). However, the later OL versions are less balanced, producing lower Shannon entropies (26) and showcasing the existence of an entropic force that governs genetic mutations [25].
The bounds of Theorem 2 and Conjecture 1 are shown in Tables 5 and 8 and are illustrated in Figures 4 and 5. No bitstring can be assembled in a smaller number of steps than is given via the lower bound of Theorem 2. However, some bitstrings cannot be assembled in a smaller number of steps than is given via the upper bound; for them, this bound is attained.
We found it much easier to determine the assembly index of a given bitstring, C^(N)_k, than to assemble a bitstring so that it would have the largest assembly index. Similarly, a trivial bitstring with the smallest assembly index for N can have the form (11) or the form of a Fibonacci word generated via the trivial assembling program (Definition 6). Therefore, we state the following conjecture.

Conjecture 3. The problem of determining the assembly index of any bitstring, C^(N)_k, is NP-complete. The problem of assembling a bitstring so that it would have the largest assembly index for a large N is NP-hard. This corresponds to determining the largest assembly index value for a large N.
A proof of Conjecture 3 would also be the proof of the following known conjecture.

Conjecture 4. P ≠ NP.
Every computable problem and every computable solution can be encoded as a finite bitstring. Here, determining whether the assembly index of a given bitstring has its known maximal value corresponds to checking a solution to a problem for correctness, whereas assembling such a bitstring corresponds to solving the problem. Thus, AT would solve the P versus NP problem in theoretical computer science. There is ample pragmatic justification for adding P ≠ NP as a new axiom [40]; rather than attempting to prove this conjecture, mathematicians should accept that it may not be provable and simply accept it as an axiom [43].
The bounds on the bitstring assembly index given via Theorem 2 and Conjecture 1, and the general bounds (1) and (2) on the assembly index [1], are illustrated in Figure 6 (adopted from [1] and modified; not to scale). The lower bound on the bitstring assembly index implies two paths of the evolution of information:
1. a creative path (slanting lines in Figure 6); and
2. an optimization path (vertical lines in Figure 6), since for some bitstrings, C_m, of length N > 3, it admits a region of possible numbers of assembly steps, a^(N)(C_m) < s ≤ N − 1.
For 1 ≤ N ≤ 3, only the creative path is available, as there is nothing to optimize: a^(1≤N≤3) = N − 1. The second path becomes available already at N = 4, where the suboptimal number of three steps used to assemble a bitstring, [0101], can be optimized to a^(4)_min = 2. The evolution becomes interesting for N ≥ 7 (N > 7 for ringed strings; cf. Table 8), due to the upper bound on the bitstring assembly index. For each (N ≥ 7)-bit string, C_m, suboptimally assembled in a^(N≥7)(C_m) < s ≤ N − 1 steps, the search space is recursively explored to optimize the number of steps until the assembly index, a^(N≥7)(C_m), of this bitstring is reached.
We conjecture that, in general, the assembly of a novel, nontrivial bitstring, C^(N+l)_m, for l ∈ ℕ, with a longer length, N + l, using the first path of evolution is NP-hard, requires access to noncomputability, and is thus available only to dissipative structures, including living beings, such as humans. This path represents "true" creativity. However, once this new bitstring is assembled, it is unlikely to have been assembled optimally, in a number of steps corresponding to its assembly index. This implies the second path: minimizing the number of steps, s, required to assemble this newly found, nontrivial bitstring, C^(N+l)_m, towards its assembly index, which is only NP-complete. The bitstring C^(N+l)_m is reassembled in a simpler way, but such a reassembly is no longer creative. The second path represents "generative creativity", available both to dissipative structures and to artificial intelligence.
Figure 6. An illustrative graph of complexity against information capacity: the orange regions are impossible, as they are above or below the general assembly index bounds; the yellow region indicates the bitstring assembly index bounds; the green region contains structures that can be
assembled via dissipative structures of nature; the red region contains structures that can only be assembled by humans; the blue circles and dots denote, respectively, the numbers of steps of suboptimally assembled bitstrings and their assembly indices; and the blue slanting and vertical lines denote, respectively, the creative and optimization paths of the evolution of information (figure not to scale; see text for details).
To illustrate this process, consider two examples: one from biological evolution (the emergence of amphibians from fish) and another from technological evolution (the invention of the airplane). Fish began to evolve around 541 million years ago, forming a plethora of species and exploring the available search space, optimizing the fish assembly index and increasing the information capacity within the range delimited by the same upper-bound fish plateau (cf. Conjecture 1). Around 400 million years ago, some species of fish began using areas with fluctuating water levels, where water was occasionally scarce. The next, amphibian plateau of a larger assembly index was within sight. Proto-lungs developed by groping [19], allowing fish to obtain oxygen from the air instead of water. The breakthrough was made, and amphibians were formed, exploring the subsequent amphibian plateau and optimizing this evolutionary gain. Many inventions led to the first airplane: the airfoil (George Cayley), its use in gliders (Otto Lilienthal), the propeller, etc. Again, the search space was well explored, and the airplane plateau of a larger assembly index was close. Finally, it was the Wright brothers, bicycle retailers, who realized the importance of combining roll and yaw control in their first, suboptimal Wright Flyer foreplane configuration. Once it was shown that it could be done, other people began to optimize this invention, minimizing the number of steps required to recreate it.
AT captures the notion of intelligence, understood as a degree of ability to reach the same goal through different means (assembly pathways) [44], where a fundamental aspect of intelligence is collective behavior [45]. Once the search space is saturated, fish collectively explore it to develop lungs, just as humans, starting at least in the nineteenth century, began to think collectively about heavier-than-air flying machines. We assume that only dissipative structures can assemble novel structures of information, and we define living beings as dissipative structures provided with choice (the ability to select [6]) and humans as living dissipative structures provided with an abstract, modality-independent language. As shown in Figure 6, we predict a limit on complexity, or a maximum assembly index, a_H, achievable via nonhuman dissipative structures. These structures do not use the abstract, modality-independent language required for advanced human creativity. A human creative work also needs a certain minimum amount of information, N_H. We take it for granted that, presently, only Homo sapiens has a gift of creativity that exceeds a_H. Any creation is required to be shaped through the unique personality of its human creator(s) to such an extent that it is statistically one-time in nature [46]; it is an imprint of the author's personality. Subsequent plateaus of a^(N)_max > a_H can also be thought of as scientific paradigms [47] that define the basic concepts and research practices in science.
Any structure of information assembled via a dissipative structure in s steps can belong to one of the four regions shown in Figure 6. We do not exclude the possibility that nonhuman dissipative structures are capable of suboptimally assembling structures C above a_H, provided that their assembly indices satisfy a^(N)(C) < a_H. Thus, the optimization path shown in the white rectangle in Figure 6 is available only to humans. The results reported here can be applied in the fields of cryptography, data compression methods, stream ciphers, approximation algorithms [48,49], reinforcement learning algorithms [50], information-theoretically secure algorithms, etc. Another possible application of the results of this study could be molecular physics and crystallography. Overall, the results reported here support AT, emergent dimensionality [12,15,[22][23][24][26][27][28]38], and the second law of infodynamics [25,29], and they invite further research.
Table A13. |B^(5)| = 10 balanced bitstrings.
while the m_k numbers in red indicate that certain Ñ values within the sequences they generate are exceptions to the general a^(Ñ)_min = s + 2 rule.

Figure 4. Lower bound on the bitstring assembly index given by Theorem 2 (red) and log2(N) (red, dash-dot); upper bound on the bitstring assembly index given by Conjecture 1 (green); factual values of the bitstring assembly index (blue) and the ringed bitstring assembly index (cyan); and N − 1 (green, dash-dot) for bitstring lengths 0 ≤ N ≤ 20.

Table 1. Distribution of the assembly indices for N = 4.

Table 3. Number of all bitstrings, 2^N, and number of balanced bitstrings, B (blue), as functions of the bitstring length, N.

Table 5. The lower bound on the bitstring assembly index (OEIS A003313).

Table 6. List of the shortest-addition-chain sequence-generating factors for 1 ≤ s ≤ 5.

Table 8. Exemplary balanced bitstrings, B (red if below the conjectured value and green if above). Matlab code to generate the conjectured bitstring assembly index upper bound.

Table A5. Distribution of the assembly indices for N = 5.
Table A6. Distribution of the assembly indices for N = 6.
Table A7. Distribution of the assembly indices for N = 7.
Table A8. Distribution of the assembly indices for N = 8.
Table A9. Distribution of the assembly indices for N = 9.
Table A10. Distribution of the assembly indices for N = 10.
Table A11. Distribution of the assembly indices for N = 11.
Table A12. Distribution of the assembly indices for N = 12.