Oritatami: A Computational Model for Molecular Co-Transcriptional Folding

We introduce and study the computational power of Oritatami, a theoretical model that explores greedy molecular folding, whereby a molecular strand begins to fold before its production is complete. This model is inspired by our recent experimental work demonstrating the construction of shapes at the nanoscale from RNA, where strands of RNA fold into programmable shapes during their transcription from an engineered sequence of synthetic DNA. In the model of Oritatami, we explore the process of folding a single-strand bit by bit in such a way that the final fold emerges as a space-time diagram of computation. One major requirement in order to compute within this model is the ability to program a single sequence to fold into different shapes dependent on the state of the surrounding inputs. Another challenge is to embed all of the computing components within a contiguous strand, and in such a way that different fold patterns of the same strand perform different functions of computation. Here, we introduce general design techniques to solve these challenges in the Oritatami model. Our main result in this direction is the demonstration of a periodic Oritatami system that folds upon itself algorithmically into a prescribed set of shapes, depending on its current local environment, and whose final folding displays the sequence of binary integers from 0 to $N = 2^k - 1$ with a seed of size O(k). We prove that designing Oritatami is NP-hard in the number of possible local environments for the folding. Nevertheless, we provide an efficient algorithm, linear in the length of the sequence, that solves the Oritatami design problem when the number of local environments is a small fixed constant. This shows that this problem is in fact fixed-parameter tractable (FPT) and can thus be solved in practice efficiently.
We hope that the numerous structural strategies employed in Oritatami enabling computation will inspire new architectures for computing in RNA that take advantage of the rapid kinetic-folding of RNA.

* when Module C folds in the leftmost column of a Zag-row, its environment is to the left: always BT; and above: either A00,D0 or A01,D1 or A10,D0 or A11,D1. For each of these four environments, Module C is expected to fold into the following brick:

* In the rightmost column of a Zag-row ($\rightarrow$): Module D folds in a $3 \times 6$ region surrounded to the left by either C0 or C1 and above by DT. For each of these two environments, Module D is expected to fold into the brick DT:

The following lemmas prove that the molecule folds as claimed in the Zig-rows.
Lemma 1 (Modules A and C-inner column-Zig row-no carry propagation). When folded in an inner column of a Zig-row from the output configurations of the previous brick Dc (resp. Bc) and under a brick Cx (resp. Ax) s.t. $x + c \le 1$ (i.e., no carry propagation), module A (resp. C) folds into the brick A$(x+c)$ (resp. C$(x+c)$) with output configurations $\beta$.
Proof. This property is proved by the folding certificates in Figures:

Lemma 8 (Modules A and C-Zag row). When folded in a Zag-row from the output configurations of the preceding brick BT or B2 and under a brick Cxc (resp. Axc), module A (resp. C) folds into the brick Az (resp. Cz) where $z = x + c \bmod 2$, with output configuration $\lambda_3$, $\lambda'_3$, $\lambda_4$, $\lambda'_4$ or $\lambda_5$, which only differ by the positions of their bonds with the environment.
Proof. This property is proved by the folding certificates in Figures:

Lemma 9 (Modules A and C-Zag row). When folded in a Zag-row from the output configurations of the preceding brick Ax (resp. Cx) and under the bricks BzAyc (resp. DzCyc in an inner column or DT in the rightmost column), module B (resp. D) folds into the brick B2 (resp. D2) with output configuration $\lambda'_2$ if $y + c \le 1$ and output configurations $\mu \cup \mu'$ if $y + c = 2$.
Proof. This property is proved by the folding certificates in Figures

Lemma 10 (Modules A and C-inner column-Zag row-no carry propagation). When folded in an inner column of a Zig-row from the output configurations of the preceding brick Dc (resp. Bc) and under a brick Cx (resp. Ax) s.t. $x + c \le 1$ (i.e., no carry propagation), module A (resp. C) folds into the brick A$(x+c)$ (resp. C$(x+c)$) with output configurations $\beta$.
Proof. This property is proved by the folding certificates in Figures:

All the folding tree certificates are given in the next supplementary section.
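
The arithmetic that these lemmas attach to the bricks can be sketched independently of any folding: a brick reads the bit x written above it and an incoming carry c, displays (x + c) mod 2, and the carry continues only when x + c = 2. The following Python sketch covers that bookkeeping only. The names `increment_row` and `counter_rows`, and the choice of storing a row least-significant bit first, are illustrative assumptions and not part of the Oritatami system.

```python
def increment_row(bits, carry=1):
    """Add `carry` to a row of bits stored least-significant bit first.

    Each position mimics one brick: it reads the bit x of the previous row
    and the incoming carry c, writes (x + c) mod 2, and passes on a carry
    only when x + c == 2.
    """
    out = []
    c = carry
    for x in bits:
        s = x + c
        out.append(s % 2)  # value displayed by the brick
        c = s // 2         # 1 only when x + c == 2 (carry propagation)
    return out, c


def counter_rows(k):
    """Rows of a k-bit counter from 0 to 2**k - 1, each row LSB first."""
    row = [0] * k
    rows = [row]
    for _ in range(2**k - 1):
        row, _overflow = increment_row(row)
        rows.append(row)
    return rows
```

For instance, `counter_rows(3)` yields eight rows, the last of which is all ones.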

S.2. Folding Certificates
The following sections contain the certificates for the folding of each of the modules in all possible environments in a human-readable and human-checkable form (some of them require zooming in the PDF file). When reading these certificates, the top circles represent the input nascent configurations inherited from the end of the folding of the previous module (there might be several of them). The number of new bonds of each local configuration is written in the top left corner of that configuration. The path in bold shows the configurations with the maximum number of bonds, that is to say the only ones that are allowed to continue to grow by the inertial dynamics. To improve readability, we group in the same ball all paths that share a common prefix up to their last bond; the free end of each path in the group is drawn in a random color; the size of the group is given in the lower right corner for cross-checking. The last level of the folding tree represents the output nascent configurations, that is, the only configurations allowed to start the folding of the next module.
Output nascent configurations: $\theta \cup \theta'$
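
The selection that the bold paths depict, keeping only the configurations with the maximum number of bonds at a given level, can be illustrated with a small Python sketch. It covers a single level of a certificate only; the function name `select_max_bond` and the dictionary from configuration labels to bond counts are hypothetical conveniences, not the format of the actual certificates.

```python
def select_max_bond(configs):
    """Return the labels of the configurations with the most bonds.

    `configs` maps a (hypothetical) configuration label to the number of
    bonds it makes; only the maximal ones are allowed to keep growing.
    """
    if not configs:
        return set()
    best = max(configs.values())
    return {label for label, bonds in configs.items() if bonds == best}
```

Several configurations may tie for the maximum, which is why a certificate level can end with more than one output nascent configuration.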

S.3. Analysis of Algorithms 1 and 2
Proof of Theorem 3. The key is that every step of the folding is computed locally in a fixed and known environment: at each step $i$, the $\delta$ beads to be folded look for their best positions by interacting with beads with fixed and known positions within a radius $\delta + 1$. It follows that one can compute the set of all suitable subrules (i.e., those that place the $(i - \delta + 1)$-th bead of the molecule at the correct position), considering these $O(\delta^2)$ bead types only. Oblivious $O$ and inertial $I$ dynamics differ as $I$ only considers input nascent configurations which are output by the previous step, whereas $O$ does not use any information from the previous step. This implies that one needs to remember some information to connect one step to the next in $I$, whereas $O$ needs no memory at all.
Formally, a subrule $R : B^2 \to \{\text{true}, \text{false}, \bot\}$ is a symmetric function that states for each pair of beads if they attract each other (true) or not (false), or if this is undefined ($\bot$). We denote by $\operatorname{dom} R = \{(a, b) \in B^2 : R(a, b) \neq \bot\}$ the domain of $R$. Two subrules $R_1$ and $R_2$ are compatible, denoted by $R_1 = R_2$, if they agree for every pair where they are both defined, i.e., if for all $(a, b) \in \operatorname{dom} R_1 \cap \operatorname{dom} R_2$, $R_1(a, b) = R_2(a, b)$. We say that $R_1$ matches with $R_2$ if for all $(a, b) \in \operatorname{dom} R_2$, $R_1(a, b) = R_2(a, b)$. If $R_1 = R_2$, we denote by $R_1 \cup R_2$ the subrule obtained by merging $R_1$ and $R_2$, i.e., defined by $(R_1 \cup R_2)(a, b) = R_1(a, b)$ if $(a, b) \in \operatorname{dom} R_1$, and $= R_2(a, b)$ otherwise.
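
The three operations on subrules defined above (compatibility, matching and merging) can be made concrete with a short Python sketch. It assumes a subrule is stored as a dictionary from unordered pairs of bead types to booleans, with undefined pairs simply absent; this representation and the helper names `key`, `compatible`, `matches` and `merge` are illustrative choices, not taken from the paper's algorithms.

```python
def key(a, b):
    """Unordered pair of bead types, since subrules are symmetric."""
    return frozenset((a, b))


def compatible(r1, r2):
    """True if r1 and r2 agree on every pair where both are defined."""
    return all(r2[p] == v for p, v in r1.items() if p in r2)


def matches(r1, r2):
    """True if r1 is defined and equal to r2 on all of r2's domain."""
    return all(p in r1 and r1[p] == v for p, v in r2.items())


def merge(r1, r2):
    """Union of two compatible subrules (r1's value wins where defined)."""
    if not compatible(r1, r2):
        raise ValueError("subrules disagree on a common pair")
    merged = dict(r2)
    merged.update(r1)
    return merged
```

Note that matching is not symmetric: a subrule with a larger domain can match a smaller one, but not the other way around.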
For all $0 \le i \le |c|$, we denote by $c_{1..i}$ the prefix of length $i$ of a configuration $c$. Algorithm 1 solves RDP for the oblivious dynamics in time linear in the length of the sequence as follows. It incrementally constructs a set of rules that place each bead at its desired position in each of the $k$ environments. It proceeds by maintaining a set $\mathcal{R}$ of candidate subrules that correctly place the first $i$ beads in each of the $k$ environments. At the $i$-th step, the main procedure FINDRULEOBLIVIOUS() first extends each candidate subrule $R$ in $\mathcal{R}$ by calling procedure EXTENDOBLIVIOUS(), which scans all the possible attraction rule extensions of $R$ for the new nascent bead (the $(i + \delta - 1)$-th) with the bead types it can reach in each of the $k$ configurations, and retains only the ones that place the $i$-th bead at its correct position for all $k$ configurations. Note that this extension of the rule does not change the positioning of the first $i - 1$ beads, since the $(i + \delta - 1)$-th bead is not yet produced when they are placed. Now, in order to keep the processing time constant for each bead, the main procedure FINDRULEOBLIVIOUS() calls procedure PROJECTOBLIVIOUS(), which retains only one representative of each subset of rules that define the same attractions for the $\delta - 1$ nascent beads (indexed from $i + 1$ to $i + \delta - 1$), i.e., for the only bead types for which the rule matters in order to determine the positions of the upcoming beads. Once the $(n - \delta)$-th bead is placed, the main procedure concludes by checking that the surviving rules place the last $\delta - 1$ beads at their desired positions.
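
The idea behind the projection step, keeping one representative per class of candidate subrules that agree on the pairs that still matter, can be sketched in Python as follows. This illustrates the grouping idea only and is not the paper's PROJECTOBLIVIOUS procedure: the function name `project`, the parameter `relevant_pairs`, and the encoding of a subrule as a dictionary keyed by tuples of bead types are all assumptions made for the example.

```python
def project(rules, relevant_pairs):
    """Keep one representative subrule per restriction to `relevant_pairs`.

    `rules` is a list of dicts mapping a pair of bead types to True/False;
    a pair missing from a dict is treated as undefined (None).
    `relevant_pairs` is an ordered sequence of the pairs that can still
    influence the placement of upcoming beads.
    """
    representatives = {}
    for rule in rules:
        # Two rules with the same signature are interchangeable from now on.
        signature = tuple(rule.get(pair) for pair in relevant_pairs)
        representatives.setdefault(signature, rule)
    return list(representatives.values())
```

Because the number of distinct signatures depends only on the number of relevant pairs, the set returned has a size independent of how many beads have already been placed, which is what keeps the per-bead work bounded.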
Time and space complexity analysis. Let us first analyze the procedure EXTENDOBLIVIOUS. All the beads reachable by the $(i + \delta - 1)$-th bead are located at distance at most $\delta + 1$ from $c^j_{i-1}$ in the $j$-th target environment. The size of $N$ is thus at most $3k(\delta+1)^2$. It follows that there are at most $2^{3k(\delta+1)^2}$ subrules $\rho$ to consider. Testing each subrule takes $O(5^{\delta})$ time and $O(\delta^2)$ space. It follows that each execution of procedure EXTENDOBLIVIOUS takes $O(5^{\delta}\, 2^{3k(\delta+1)^2})$ time and $O(\delta^2\, 2^{3k(\delta+1)^2})$ space. Let us now analyze the procedure PROJECTOBLIVIOUS. Again, all the beads reachable by the $(i + \delta - 1)$-th bead are located at distance at most $\delta$ from $c^j_i$ in the $j$-th target environment. The size of $N$ is thus at most $3k\delta^2$. It follows that there are at most $2^{3k(\delta-1)\delta^2}$ subrules $\rho$ to consider, and hence that the size of the output set $\Pi$ is bounded by $2^{3k(\delta-1)\delta^2}$. Procedure PROJECTOBLIVIOUS then runs in $O(|\mathcal{R}| + 2^{3k(\delta-1)\delta^2})$ time and uses at most $O(\delta^2\, 2^{3k(\delta-1)\delta^2})$ space. Note that the size of $\mathcal{R}$ in the main procedure FINDRULEOBLIVIOUS is always bounded by the size of the output $\Pi$ of PROJECTOBLIVIOUS times the size of the set $\Sigma$ output by EXTENDOBLIVIOUS. The size of $\mathcal{R}$ is thus at most $2^{3k(\delta^3 + 2\delta + 1)}$ at all times. As the main procedure calls the procedures EXTENDOBLIVIOUS and PROJECTOBLIVIOUS $n - \delta + 1$ times each, we conclude that Algorithm 1 runs in $O(n \cdot 5^{\delta}\, 2^{3k(\delta^3 + 2\delta + 1 + (\delta+1)^2)})$ time and uses $O(n \cdot \delta^2\, 2^{3k(\delta^3 + 2\delta + 1 + (\delta+1)^2)})$ space, which are both linear in $n$ for constant $k$ and $\delta$.
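
The exponent in the bound on the size of the candidate set follows from multiplying the two bounds stated in this paragraph. Spelled out as a routine check that uses only the quantities given above, with $T(n)$ introduced here as shorthand for the running time of Algorithm 1:

```latex
\begin{align*}
|\Pi| \cdot |\Sigma|
  &\le 2^{3k(\delta-1)\delta^{2}} \cdot 2^{3k(\delta+1)^{2}}
   = 2^{3k\left(\delta^{3}-\delta^{2}+\delta^{2}+2\delta+1\right)}
   = 2^{3k(\delta^{3}+2\delta+1)}, \\
T(n)
  &= O\!\left(n \cdot 2^{3k(\delta^{3}+2\delta+1)} \cdot 5^{\delta}\, 2^{3k(\delta+1)^{2}}\right)
   = O\!\left(n \cdot 5^{\delta}\, 2^{3k\left(\delta^{3}+2\delta+1+(\delta+1)^{2}\right)}\right).
\end{align*}
```

The second line charges one execution of EXTENDOBLIVIOUS to each candidate subrule at each of the at most $n$ steps, which is how the two exponents add up in the final bound.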
Algorithm 2 works similarly for the inertial dynamics $I$. In fact, we just extend the subrule technique by testing subrules for each possible set of input and corresponding output nascent configurations, for each seed-target configuration pair and each time step. This multiplies the memory and time complexities by $2^{k 5^{\delta-1}}$.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).