Circuit topology for bottom-up engineering of molecular knots

The art of tying knots is exploited in nature and occurs in multiple applications ranging from being an essential part of scouting programs to engineering molecular knots. Biomolecular knots, such as knotted proteins, bear various cellular functions and their entanglement is believed to provide them with thermal and kinetic stability. Yet, little is known about the design principles of naturally evolved molecular knots. Intra-chain contacts and chain entanglement contribute to folding of knotted proteins. Circuit topology, a theory that describes intra-chain contacts, was recently generalized to account for chain entanglement. This generalization is unique to circuit topology and not motivated by other theories. In this paper, we systematically analyze the circuit topology approach to a description of linear chain entanglement. We utilize a bottom-up approach, i.e., we express entanglement by a set of 4 fundamental structural units subjected to 3 (or 5) binary topological operations. All knots found in proteins form a well-defined, distinct group which naturally appears if expressed in terms of these basic structural units. Prime knots, which are viewed by knot theory as undecomposable, are also made of these structural units connected in some specific way. In turn, this kind of connection shows the fundamental reason why prime knots cannot be decomposed in the rigorous sense of knot theory. We believe that such a detailed, bottom-up understanding of the structure of molecular knots should be beneficial for molecular engineering.

Linear chains, such as proteins and nucleic acids, demonstrate an immense structural diversity owing, in part, to a myriad of possible chain configurations which appear as various knots [1], slip-knots [2], and loops, and are believed to be of relevance to the biological function of these molecules [3,4]. The three-dimensional structure of linear molecular chains is commonly described in terms of knot theory [5], which is a powerful and rigorous mathematical framework. The approach is generic and applicable to any linear chain, not limited to biological molecules. In terms of knot theory, a knot is a one-dimensional topological circle embedded into three-dimensional space; it is a continuous structure without free ends. In other words, in order to turn a linear chain into a knot, one has to join the chain ends. While discussing chains and knots, it might be convenient to think of a rope which we will tie and tangle.
The most basic, "undecomposable" knots are called prime knots. Some of them are shown in Fig.1, where the capital number specifies the number of crossings in the minimal crossing projection and the subscript is assigned in order to distinguish between knots with the same number of crossings. Here, the number of crossings in each knot cannot be decreased but can easily be increased by, for example, twisting some loops or threading the rope through a loop. Knots do not change upon deformations which do not break the rope, i.e., which do not break the continuity of the knot. Such deformations can be expressed via a sequence of specific deformations performed on a knot projection which are called Reidemeister moves.
The resulting structure could look very different from the original prime knot, but is perceived as equivalent by knot theory. This is one of the major ideological differences between knot theory and molecular engineering. Knot theory is designed for other purposes, namely to capture topological invariance under ambient isotopy (i.e., whether two knots can be deformed into each other), while in the case of molecular engineering even minor changes in the shape of the chain might matter. For example, slip-knots, which are very common in proteins and crucially important for their proper functioning, are ignored by knot theory. However, the nature of slip-knots is geometric rather than topological, and therefore knot theory has to ignore them. In our work, we aim to develop a theory which would serve molecular engineering. All basic molecular engineering operations should have clear and intuitive representations. Hence, having basic structural units to build up a chain seems convenient. For example, it is well known that by cutting the loop of a slip-knot, the chain is reduced to a trefoil. Our theory should (and will) give a clear analytical visualization of this process.
In this paper, we will consider chains with different geometrical shapes. In order to make sure that two chains cannot be deformed into each other, we will join their ends to form a mathematical knot and calculate the so-called Alexander polynomial. Its definition and calculation can be found in any knot theory textbook or in our previous paper [6]. If two knots have different Alexander polynomials, then these knots must be different. The converse statement is usually correct, but not always.
An application of knot theory to proteins is centered around a search for prime knots in a spatial protein structure. The only knots which have been found in proteins so far [7] are 3_1, 4_1, 5_2, and 6_1. What is the fundamental reason for this choice?
What property exactly separates these knots from other knots? The answer to the second question is known. The knots found in proteins can be formed following the so-called twisted hairpin folding mechanism [8] outlined below. This mechanism is a phenomenological explanation and does not identify a fundamental difference between knots in terms of knot theory. In principle, a topological theory alone is not able to provide such a reason because the physical properties of the chain must matter. In our theory, the twisted hairpin mechanism appears naturally as part of the formalism. A few years ago, the concept of circuit topology was suggested in order to account for intra-chain contacts [9][10][11], which are also very important for proteins. Very recently, circuit topology was generalized to account for chain entanglement as well, focusing on applicability to real-life molecules [6]. The new framework is still in its development stage and lacks a certain rigor, especially in comparison to the very well-developed knot theory.
In this study we attempt to strengthen the foundation of generalized circuit topology and demonstrate that this theory uses a "natural and inherent" language for describing entanglement. To demonstrate this, we will, among other things, rediscover some known results which emerge naturally as an internal part of circuit topology. We believe it will be useful for molecular engineering and will help puzzle out the design principles of naturally evolved protein knots.
to-right or right-to-left, we see the same structure. Each structure is called an s-contact, or a soft contact, and cannot be untied by pulling the chain ends. The term "contact" was coined to stress the similarity to intra-chain, non-entanglement contacts which can also be treated by circuit topology [6] but are not considered in the present paper. Each contact has two contact sites. (In the case of intra-chain contacts, contact sites are the two chain segments which are linked together.) An s-contact is supposed to be contained between its contact sites, i.e., contact sites are the boundaries of the structural unit as viewed while moving along the chain. We define contact sites to be located where the chain passes through the loops, so that the entangled segment is located in between. What happens to the chain outside the structural unit (i.e., on either side of the contact sites) is irrelevant. Since s-contacts should be considered as three-dimensional entities, the exact position of contact sites depends on many parameters [6], e.g., on the knot tightness, but it always represents the knot structure. Also, contact sites can migrate along the chain during a chain deformation.
This uncertainty is essential in order to be able to catch the fixed topological structure of a flexible, mobile chain. Contact sites are depicted with a red ball on several structures shown in Fig.2. Contacts should be given names. We usually use capital letters, such as contact A, contact B, etc. In Fig.2a, each chain has one contact. If we move along the chain from any end and write down contact sites we encounter, we will get AA, which is a code for one contact.
How can we distinguish between the four different s-contacts shown in the figure? Fig A −o A, A −e A. This notation is called the string notation of circuit topology. It codes a chain entanglement as a string of letters. One of the advantages of this notation is the ability to apply combinatorial analysis directly to a description of entanglement. Also, note that the attributes introduced above are universal and do not depend on the chain orientation, i.e., in which direction we move along the chain, left-to-right or right-to-left. projections used by knot theory and can be useful in building a link between knot theory and circuit topology.
So far, we identified 4 stable "basic units" of chain entanglement, and called them s-contacts. This should make sense on an intuitive level because any messy blob of a rope is held together by loops hooking to each other, which is the essence of entanglement. Also note that if we flip only one crossing in any chain from Fig.2, the s-contact will disappear and that chain will untie. Let us consider s-contacts in the view of knot theory. To form a knot from a rope, one has to join its ends. To make a rope from a knot, one has to cut the knot somewhere. A e A corresponds to 3_1. This knot can be right-handed, as in other. The sequence of corresponding moves is shown in Fig.3b Otherwise, it will not occur, rendering the molecular knot stable. We will discuss it in Section 3.2. However, note in Fig.2d that A +o A and A −o A look like mirror reflections of each other.
Such a flip of symmetry matters in proteins, so we must retain it for molecular engineering purposes.
We claim that any knot can be considered as a set of s-contacts of only 4 kinds, connected according to the rules discussed in the next sections. However, s-contacts might not be easy to spot, even in a case as simple as the 4_1 knot. Where exactly are the s-contacts in the prime knots shown in Fig.1? We will break it down below, but at the current state of our theory, it is easier to go in the opposite direction, i.e., to tie s-contacts on a rope and then identify the resulting knot.

II. SPX CONFIGURATIONS OF S-CONTACTS
One s-contact has two contact sites and appears in the string notation as AA where  contain different kinds of s-contacts and therefore cannot be deformed into each other.
The transition from Fig.4c is important in the context of protein folding and has been studied in the literature [12]. Knot theory cannot distinguish these two configurations because they correspond to the same knot [6]. However, in real molecules such a transition between these configurations requires energy and once again comes down to the question of stability; the transition might be very probable or might never happen, depending on the physical properties of the chain. Circuit topology aims at addressing this question and provides consideration of different levels of structural stability.
SP configurations are similar to the notion of a connected sum used in knot theory.
To form a connected sum of two knots, one should cut each knot and merge together the resulting ends, which is the procedure demonstrated in Fig.4. configurations, see Fig.3. In X configuration, loops from one s-contact are connected to loops from another s-contact, and hence are not free. Therefore, contacts A +o A and A −o A are not identical and lead to different knots when they are parts of X configuration.
Let us consider the transition between S and P configurations by looking at the illustrations in Fig.5 and see again that it does not work for X configuration (C configuration will be considered in the next section). Also, the single deformation shown in Fig.4c is obvious only in case of two s-contacts. What if there are more s-contacts? What is the general rule?
Topologically, we can stretch the chain but we should not break it. When two loops are joined into an s-contact, they are connected and cannot be separated. In contrast to loops, one single s-contact can be moved along the chain freely. So, we take the blue contact from the right top corner of Fig.5   matter how many times the rope wraps around the large loop, the second contact will not be formed. Indeed, this "spiral" around the large loop will not be stable and will be easily untied by pulling the rope ends apart.
Each of the four s-contacts has an Alexander polynomial of degree 2. Two s-contacts in any of the SPX configurations have an Alexander polynomial of degree 4. n s-contacts in any SP configuration have an Alexander polynomial of degree 2n, because it is a product of the Alexander polynomials of single s-contacts. It is reasonable to expect that the Alexander polynomial degree scales the same way for X configurations as well. Indeed, Fig.6 shows a clear pattern of Alexander polynomials, depending on the kind of s-contacts in cross. The simplest pattern appears for positive even contacts. A +e A corresponds to t^2 − t + 1; A +e B +e AB corresponds to t^4 − t^3 + t^2 − t + 1. One can predict that A +e B +e C +e ABC has the Alexander polynomial t^6 − t^5 + t^4 − t^3 + t^2 − t + 1. The corresponding prime knot, 7_1, is not drawn here, but can be easily deduced from the pattern. A +e A is shown in Fig.2c. Then the right end of the chain makes one circle around the horizontal segment and forms A +e B +e AB from Fig.6.
Another similar circle around the horizontal segment will lead to A +e B +e C +e ABC. A simple calculation shows that this chain indeed has the Alexander polynomial we expected. It would be interesting to further investigate the relationship between Alexander polynomials and s-contacts (the systematic representation in Fig.5 might be helpful), but it is beyond the scope of the present paper. Here we only hypothesize that such a relation exists. One should, however, note that the chains in Fig.6 have different numbers of crossings and correspond to prime knots which in turn have yet other numbers of crossings; nevertheless, they all have Alexander polynomials of the same degree. We attribute it to the pattern we outlined.
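The degree counting above can be checked with a few lines of polynomial arithmetic. The following is a minimal sketch, not part of the original analysis: the helper names `poly_mul` and `alternating_poly` are hypothetical, polynomials are stored as coefficient lists with the highest power first, and the value t^2 − 3t + 1 for the 4_1 knot is the standard textbook value rather than one quoted in the text.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (highest power first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def alternating_poly(n):
    """Coefficients of t^(2n) - t^(2n-1) + ... - t + 1.

    This is the pattern described in the text for n positive even
    s-contacts in cross (n = 1, 2, 3 give 3_1, 5_1, 7_1).
    """
    return [(-1) ** k for k in range(2 * n + 1)]


# Single s-contacts (degree 2).
TREFOIL = [1, -1, 1]        # t^2 - t + 1, quoted in the text for A +e A
FIGURE_EIGHT = [1, -3, 1]   # t^2 - 3t + 1, standard value for 4_1 (assumption: not quoted in the text)

# Two s-contacts in series or parallel: the polynomial is a product, so the degree is 4.
series_pair = poly_mul(TREFOIL, TREFOIL)
print(series_pair, "degree", len(series_pair) - 1)

# Three positive even s-contacts in cross: the predicted degree-6 pattern.
print(alternating_poly(3))
```

Under these assumptions, a product of n single-contact polynomials always has degree 2n, while the cross pattern reaches the same degree without being a product.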
Despite having the same Alexander polynomials, the configurations from Fig.6 do not visually resemble the prime knots from Fig.1. This occurs because of the way the loops intertwine in cross configurations. How can we be extra sure that they are actually the same, i.e., that they can be continuously deformed into each other? One way is to deform them both into the loop representation from Fig.5. However, such manipulations require many-step, major deformations which are hard to follow and, quite frankly, tedious to draw.
More importantly, this would have no practical use. Indeed, we aim at describing proteins and other linear macromolecules. The 4 s-contacts (+3_1, −3_1, +4_1, −4_1) are to be found and identified automatically by a computer, not by the naked eye. s-contacts are not obvious in prime knots. Fig.7a shows the equivalence of the 6_2 knot from Fig.1 and A +e B +o AB from Fig.6. Surprisingly, it requires only a minor deformation. 5_1 and 6_3 can be treated similarly. 5_2 and 6_1 will be discussed in the next section.
C configuration might seem to be a special case of X, and some of their properties are indeed similar, but not identical. What makes C resemble X rather than P and S is that C cannot be turned into another configuration (unlike P and S).
As clear from is not an s-contact and will be considered in the next sub-section. Hence, the notation can be simplified: (A +e B +e )AB → A +2e A, where the digit signifies the number of s-contacts.
Note that C configuration is not symmetric. If the rope is read from another end, the full string notation should be used, e.g., A +e B +e (AB). This simplified notation is used in Fig.1 because, unlike chains, mathematical knots are closed structures and need to be cut to become chains. No matter where a prime knot is cut, the resulting linear chain is always the same. Using a simplified notation is desirable because it stresses that concerted s-contacts are, in essence, only one contact: they can be untied by one move unhooking only one loop. Alexander polynomials of concerted contacts, Fig.8, are of degree two, which corresponds to one s-contact. The coefficient of t^2 shows the number of s-contacts in the concerted contact, e.g., 3t^2 − 5t + 3, which is A 3o A, consists of 3 s-contacts. This is in line with the conjecture made in the previous section arguing that the Alexander polynomial degree scales with the number of s-contacts for all SP, X and C configurations. SP is similar to a connected sum, and the corresponding scaling of the Alexander polynomial is well known in knot theory. C corresponds to a special class of knots called twist knots, for which the scaling was proved as well. X, as far as we know, does not have a direct counterpart in knot theory, and so the scaling conjecture for X has not been proven (or even considered) yet.
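Since C configurations are identified here with twist knots, the role of the leading coefficient can be illustrated with the standard twist-knot formula from knot theory. This is a sketch under stated assumptions: the function name `twist_knot_alexander` is hypothetical, the formula is the textbook one for a twist knot with n half-twists (it is not derived in this paper), and coefficients are listed from t^2 down to the constant term.

```python
def twist_knot_alexander(n_twists):
    """Alexander polynomial [a, b, a] (meaning a*t^2 + b*t + a) of the twist knot
    with n_twists half-twists, using the standard knot-theory formula.

    The leading coefficient a equals ceil(n_twists / 2), which the text
    interprets as the number of s-contacts in the concerted contact.
    """
    if n_twists < 1:
        raise ValueError("need at least one half-twist")
    a = (n_twists + 1) // 2
    # Middle coefficient: -n for odd n, -(n + 1) for even n.
    b = -n_twists if n_twists % 2 else -(n_twists + 1)
    return [a, b, a]


for n in range(1, 5):
    coeffs = twist_knot_alexander(n)
    print(n, "half-twists:", coeffs, "-> s-contacts:", coeffs[0])
```

The degree stays at two for every n, and only the coefficients grow, which is the behaviour attributed to concerted contacts above.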
Concerted contacts are closely related to each other. If we twist a rope once, we form a loop. By threading the rope through this loop, we get A e A. If we twist a rope twice and thread it, we get A o A, see Fig.8. If we twist a rope three times and thread it, we get A 2e A.
Twisting four times gives A 2o A, etc. This is the twisted hairpin mechanism [7,8] which leads to a formation of twist knots. All the knots found in proteins can be tied by this mechanism.
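The twisting rule just described is regular enough to write down as a small lookup. The sketch below is illustrative only: `concerted_label` is a hypothetical helper, signs (chirality) are omitted, the labels for more than four twists are an extrapolation of the "etc." in the text, and the knot names are given only for the four cases the paper itself identifies.

```python
# Knot names stated in the paper for the first four twist counts.
KNOWN_KNOTS = {1: "3_1", 2: "4_1", 3: "5_2", 4: "6_1"}


def concerted_label(n_twists):
    """String-notation label of the concerted contact obtained by twisting
    the rope n_twists times and threading it through the loop.

    Odd twist counts give even ('e') contacts, even counts give odd ('o')
    contacts; the multiplicity is ceil(n_twists / 2) and is omitted when 1.
    """
    if n_twists < 1:
        raise ValueError("need at least one twist")
    multiplicity = (n_twists + 1) // 2
    parity = "e" if n_twists % 2 else "o"
    prefix = str(multiplicity) if multiplicity > 1 else ""
    return f"A {prefix}{parity} A"


for n in range(1, 5):
    print(n, "twist(s):", concerted_label(n), "=", KNOWN_KNOTS[n])
```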
In other words, concerted and only concerted contacts have been found in proteins. Here we would like to point out that C configuration does not arise as a result of some artificial mechanism of twisting the rope, but it comes out from the bottom-up consideration of possible arrangements of contact sites of s-contacts, which is more fundamental and general.
The swirling part of concerted knots can be seen in Fig.1, especially for 5_2 and 6_1. However, their s-contacts are still not so easy to spot.

B. Slip-knots
As we stated above, (A +e B −e )AB is impossible. What if we try to tie it anyway? After all, we are interested in considering all possible configurations. Fig.9 shows the treatment of even contacts, i.e., A e A. Odd contacts, i.e., A o A, can be treated similarly. In order to form a negative contact along with a positive contact, one has to reverse the direction of the chain.
In Fig.9 it is marked with a green ball which is located where the chain passes through the s-contact. The left part of the rope up to the green ball is the same in all configurations.
Let us consider it. Contact A (red balls) consists of two loops. The second red ball shows where the rope passes through the large loop, closing (or "fixing") contact A. Then, the rope passes through the large loop again (green ball), so potentially it might be a sign of another s-contact. However, in this case no s-contact is formed because it does not match any structure from Fig.2. In fact, we end up with a structure similar to the first or the last ones in Fig.2c but with the middle crossing flipped. So, we have an event worth noticing (passing through the large loop) but there is no s-contact. In the string notation, this event is shown as a subscript, where the sign indicates the direction in which the rope passes the s-contact: "+" if it coincides with the positive direction of the s-contact, "−" otherwise.
Such a subscript forms a loop, but this loop is not fixed, so it does not form an s-contact.
To some extent, subscripts can be viewed as half of an s-contact. In other words, if the rope passes through a loop and fixes it, it creates an s-contact; otherwise it is an unknot. If the rope passes through an s-contact and fixes it, it creates another s-contact; otherwise it creates a subscript. Subscripts appear only when the rope passes through an s-contact. For example, in Fig.6 the rope passes through various loops, especially in configurations with different symmetries of contacts. But it does not create subscripts because those loops are not s-contacts. S-contacts are stable, i.e., cannot disappear or be untied, while subscripts are not stable.
In any configuration shown in Fig.9, the rope segment between the second red and green balls forms a slip-knot which can be untied by pushing it back through contact A (red balls), which would eliminate contact A and the corresponding subscript. In terms of knot theory, this transition is a sequence of Reidemeister moves. Hence, knot theory does not see subscripts and slip-knots. Why does circuit topology ignore unstable loops but consider unstable subscripts? It is because one has to find a balance between unstable and irrelevant.
Loops are very flexible structures which can appear and disappear easily. Subscripts, even though they can disappear, are more stable than loops and are observed in proteins as slip-knots. Whether a subscript is "metastable" or not depends on the physical properties of the rope and the size of the corresponding s-contact. A related issue in the context of stability is that, while forming a subscript, we do not specify where exactly the s-contact is pierced by the rope. To some extent, here the s-contact is considered as a loop, not as two loops. So far we deem this sufficient because here the s-contact works only as a restriction of the motion of the rope. What part of the s-contact exactly restricts the motion is less relevant.
In Fig.8 we found that some combinations of s-contacts are impossible to be concerted.
However, in Fig.9 we see that these impossible configurations can be described by subscripts.
So, concerted configurations and configurations with subscripts are complementary to each other. Also, in laying out the formalism, we try to build an analogy between intra-chain (non-entanglement) contacts and s-contacts, which describe entanglement. Since a subscript is similar to half of an s-contact, contact A (red ball) and the subscript (green ball) can be vaguely considered as being in series and sharing a contact site via the slip-knot.
Hence, configurations with subscripts are analogous to concerted series configurations of intra-chain contacts. In other words, we found that entanglement and intra-chain contacts have the same set of configurations in terms of circuit topology, namely series, parallel, and cross configurations for both intra-chain contacts and s-contacts, concerted parallel for intra-chain contacts vs. concerted s-contacts, and concerted series for intra-chain contacts vs. subscript configurations for s-contacts. This analogy matters for the completeness of the circuit topology description of entanglement. Note that in the case of intra-chain contacts, concerted contacts are more similar to SP configurations, while in the case of entanglement, concerted contacts are more similar to X configuration.
As stated above, we want circuit topology to be able to describe molecular operations easily and intuitively. This can be achieved by means of string notation. Let us consider Fig.9 because it has both kinds of concerted contacts. Let us cut it at the loop of the slip-knot, i.e., between red and green balls. In string notation it looks like only once through the right part (due to contact B). One should be careful with this last statement because the right part has no s-contacts, so one cannot strictly pass through it.
However, the right part swirls around the left part, forming a kind of a tunnel which the left part passes through. It can be visualized in Fig.9.

IV. CONCLUSION
Both knot theory and circuit topology aim to describe entanglement. Knot theory considers any entangled chain as a connected sum of prime knots [13]. Prime knots cannot be divided; they are undecomposable. Circuit topology splits any entangled chain (including prime knots) into basic structural units called s-contacts, and lists simple rules for how s-contacts can be put together. These rules can be considered as binary operations defined on s-contacts. There are 3 main operations (SPX) which put two s-contacts in series (S), in parallel (P), or in cross (X); and two auxiliary operations, which make s-contacts concerted (C), or add subscripts (Sub). X, C and Sub cannot be changed, while S and P can be turned into each other as long as no other operation is present. This gives rise to an interesting algebra of these operations. We found in Section 2 that contacts as a whole can be dragged along the string, which explains the transition between S and P. What would happen if other operations were present? Let us consider ACABCB. Contacts A and B are in series, but they can never be brought into parallel because they cannot be dragged along the string. The dragging is blocked by contact C. Indeed, if we want to drag contact A, we would also have to drag everything locked between the letters "A", i.e., the letter "C". But it is not the whole contact, hence such a drag is forbidden. In our previous work [6] we introduced the notion of circuits.
A circuit is a segment of a string which consists only of pairs of letters and subscripts of the same letters. In other words, a circuit can be isolated from other contacts. By "isolated" we mean "can be put in series". Circuits can be dragged along the string. Obviously, circuits can consist of several circuits, e.g., AABCBC consists of AA and BCBC. If a circuit does not contain smaller circuits, then it is undecomposable and hence corresponds to a prime knot.
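The definition of a circuit translates directly into a check on substrings. The following is a minimal sketch under simplifying assumptions: each contact letter occurs exactly twice, subscripts and contact attributes (sign, parity) are ignored, and the function names `is_circuit`, `proper_subcircuits` and `is_undecomposable` are hypothetical.

```python
from collections import Counter


def is_circuit(segment):
    """A segment is a circuit if every letter in it appears as a complete pair."""
    return len(segment) > 0 and all(n == 2 for n in Counter(segment).values())


def proper_subcircuits(code):
    """All contiguous segments of `code`, other than `code` itself, that are circuits."""
    n = len(code)
    return [
        code[i:j]
        for i in range(n)
        for j in range(i + 2, n + 1)
        if (i, j) != (0, n) and is_circuit(code[i:j])
    ]


def is_undecomposable(code):
    """True if `code` is a circuit that contains no smaller circuit."""
    return is_circuit(code) and not proper_subcircuits(code)


print(proper_subcircuits("AABCBC"))   # the circuits AA and BCBC from the text
print(is_undecomposable("ACABCB"))    # contact C blocks any separation of A and B
print(is_undecomposable("AABB"))      # two contacts in series are decomposable
```

The brute-force scan over all substrings is chosen for clarity; for the short strings considered here its quadratic cost is irrelevant.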
It would be interesting to further investigate this algebra and the detailed construction of prime knots out of circuits. For example, we said that SP looks similar to a connected sum. Why? It is because the circuit AABB consists of smaller circuits, namely AA and BB, and hence AABB is not a prime knot, which implies that it must be a connected sum of prime knots. In principle, coding entanglement as a string of letters offers the advantage of being able to apply combinatorial analysis (even before considering the algebra of circuit topology operations). In this paper, we employed it only to a very limited extent in order to count the number and kind of possible s-contacts (A +e A, A +o A, A −o A, A −e A), see Fig.2, and all possible configurations of pairs of s-contacts, Fig.5. Indeed, two s-contacts cannot have more than 2+2=4 loops, and we considered all configurations consisting of 2, 3, and 4 loops. This pair-wise consideration is sufficient to code entanglement, i.e., to specify the unique string corresponding to the chain, but, as we just saw with ACABCB, the dynamics of the chain, i.e., the mobility of s-contacts, can be affected by other contacts, so that larger scale structures such as circuits have to be considered.
So far we have considered only chains consisting of a small number of s-contacts. This might be sufficient when it comes to molecular engineering since all the knots found in proteins so far consist of only 1 or 2 s-contacts, Fig.1. While listing these knots, we did not specify their chirality because it does not lead to any topological distinction but only flips all the crossings in the knot. However, their chirality is sometimes reported in the literature [7], namely the knots in proteins are A +e A, A −e A, A o A, A −2e A, A +2o A, which are +3_1, −3_1, 4_1, −5_2, +6_1. As said above, A o A or 4_1 is achiral, hence one cannot specify its sign as long as it is not in cross with other s-contacts. So, this list contains all single s-contacts (A +e A, A −e A, A o A) and two configurations of two concerted s-contacts (A −2e A, A +2o A). Why does this list not contain A +2e A, A −2o A? It has been agreed upon [8] that topology cannot answer this question because it is related to the chemical structure of a protein chain. Also, it might be the case that these two configurations do exist but have not been found yet. All 5 of these knots consist of concerted s-contacts only (single s-contacts are considered as a limiting case of concerted).
The physical reason behind this is still unknown and lies beyond pure topology and the scope of this paper, though some speculations can be made. In order to tie a concerted structure, one has to thread a chain through a loop only once, whereas other configurations (knots) require two threading events, thereby making them more complicated to tie. Another reason might be related to the 3D shape of the chain. To tie a concerted structure, one has to twist the chain a few times in order to form the spiral-like shape, see Fig.8. Such a shape might be natural for proteins and induce less stress on the chain than other shapes. In other words, the twisting motion can be done automatically by the chain itself in order to attain the preferable spiral-like shape. Circuit topology might be a convenient approach to work with such problems because it can be naturally generalized to account for relevant physical properties. Indeed, circuit topology differentiates between stable configurations (s-contacts), metastable configurations (subscripts, i.e., slip-knots), and unstable configurations (single loops). Each kind of s-contact possesses its own energy, and a transition between s-contacts requires some energy (maybe in the form of an entropy penalty). By building up knots out of s-contacts, one can analytically estimate the energetic complexity of various transitions.
All the illustrations shown so far came from the attempt to consider all possible configurations consisting of 1 and 2 s-contacts. It is a bottom-up approach in which we combine the "basic units" and see which knots we end up with. Let us now go in the opposite direction.
We will consider a fairly complicated knot and break it down to s-contacts. As mentioned above, it is a tedious procedure which should be done by a computer, not by the naked eye.
On the other hand, it helps to visualize and appreciate how s-contacts work in real life. We chose to consider knot 9_46 because it has the same Alexander polynomial as knot 6_1 from Fig.1. In this paper, we use Alexander polynomials only to distinguish between knots while developing our approach. Alexander polynomials work very well, but fail in some rare cases.
Let us see if our circuit topology can catch the difference between 6_1 and 9_46. Fig.10a shows a sequence of moves to deform 9_46 to a more eye-friendly representation with one large loop.
All the moves are in 3D. The string notation is (A −2o C −o )B +e +B−B ABC. Fig.10b color-codes the s-contacts. Every rope segment trapped by a loop restricting its motion gives rise to an s-contact site or to a subscript. Notice the use of the simplified notation for C configuration and the treatment of the subscripts originating from the loop passing through contact B (marked in dash). S-contacts B and C are in parallel. However, they cannot be deformed to be in series because their cross relation to contact A does not allow any transformation. So, circuit topology clearly differentiates between 6_1 and 9_46. Note that 9_46 contains the same s-contact, A 2o A, which comprises 6_1. Also note that 6_1 and 9_46 contain a different number of s-contacts, hence the scaling of the Alexander polynomial with the number of s-contacts does not hold in this case. The main reason for this is the presence of subscripts, which are not a part of knot theory (see Fig.9 where the pattern is broken as well). Whether chains with mixed operations (SP and X and C) follow the same scaling is unclear.
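The pairwise relations quoted for 9_46 can be read off mechanically from the order of contact sites. The sketch below rests on an assumption about how to read the string: after stripping attributes, subscripts and parentheses, the contact sites are taken to appear in the order ACBABC. The function name `relation` is hypothetical, and each letter is assumed to occur exactly twice.

```python
def relation(code, x, y):
    """Classify two contacts in a site string as 'series', 'parallel' or 'cross'.

    series:   both sites of one contact precede both sites of the other (AABB)
    parallel: one contact is nested inside the other (ABBA)
    cross:    the sites interleave (ABAB)
    """
    sites_x = [i for i, c in enumerate(code) if c == x]
    sites_y = [i for i, c in enumerate(code) if c == y]
    if len(sites_x) != 2 or len(sites_y) != 2:
        raise ValueError("each contact must have exactly two sites")
    # Order the two contacts so that the first one opens first.
    (a1, a2), (b1, b2) = sorted([tuple(sites_x), tuple(sites_y)])
    if a2 < b1:
        return "series"
    if b2 < a2:
        return "parallel"
    return "cross"


SITES_9_46 = "ACBABC"  # assumed site order of the 9_46 string, attributes and subscripts stripped

for pair in [("B", "C"), ("A", "B"), ("A", "C")]:
    print(pair, relation(SITES_9_46, *pair))
```

With this reading, B and C come out parallel while both are in cross with A, which is the arrangement the text gives as the reason they cannot be brought into series.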
We hope we have demonstrated how circuit topology can be used to describe simple