Routes towards Novel Collagen-Like Biomaterials

Abstract: Collagen plays a major role in providing mechanical support within the extracellular matrix and thus has long been used for various biomedical purposes. For example, it is able to replace damaged tissues without causing adverse reactions in the receiving patient. Today's collagen grafts are mostly made of decellularized and otherwise processed animal tissue and therefore carry the risk of unwanted side effects and limited mechanical strength, which makes them unsuitable for some applications, e.g., within tissue engineering. In order to improve collagen-based biomaterials, recent advances have been made to process soluble collagen through nature-inspired silk-like spinning processes and to overcome the difficulties in providing adequate amounts of source material by manufacturing collagen-like proteins through biotechnological methods and peptide synthesis. Since these methods also open up possibilities to incorporate additional functional domains into the collagen, we discuss one of the best-performing types of collagen-like proteins, which already have additional functional domains in the natural blueprint, the marine mussel byssus collagens, providing inspiration for novel biomaterials based on collagen-silk hybrid proteins.


Introduction
Collagens are extracellular, fibrous structural proteins fulfilling a variety of functions related to providing mechanical support. Collagen is historically defined to be a component of the extracellular matrix (ECM) and further characterized by its primary and secondary structure, as well as the potential for hierarchical self-arrangement that can lead to large, complex assemblies. However, several collagens have been described that do not constitute a component of an ECM; for this review, as a simplification, all proteins that contain collagen-like structural elements will be regarded as collagens.
As is often the case in biology, most definitions tend to become blurred around the edges: since a highly diverse class of collagens is used by mollusks as major load-bearing structures within threads (i.e., mussel byssus [1]), and even another class of natural protein fibers, namely silk, sometimes contains collagen-like motifs [2], it can, depending on the context, be worthwhile to look at collagen as a "sibling" of silk. Both collagen and silk have long been used as suture materials, confirming their suitability for biomedical applications, and have since been developed into a vast variety of biomaterials with remarkable properties. In this review, we give an insight into the current approaches of collagen research for use in medicine and show recent advances that utilize knowledge gained from the examination of different silk-like structures to yield novel collagen-based materials.

Molecular Structure of Collagen
Collagen is typically found in all animals, occurring early in evolution in several simple multicellular organisms such as sponges (Poriferae) [3] and Cnidariae [4]; however it is not present in plants and fungi, where its role has been taken over by polysaccharides such as cellulose and chitin. While it has been shown that collagen plays a crucial role in the formation of the animal extracellular matrix [5], collagen-like proteins have also been identified in several bacteria [6], where they mask the bacterial cell from animals' immune receptors and facilitate the formation of biofilms as well as the adhesion to host tissue.
Not taking into account several exceptions from the rule, such as collagen-based insect silks which are spun from specialized glands [2], collagens in higher eukaryotes are generally not secreted outside the respective tissue and contribute to and often largely define its mechanical and biochemical characteristics. While several classes of collagens are fibrous, especially those abundantly found in bone, skin, tendon, cartilage and connective tissues (types I, II, III, V, XI, XXIV and XXVII) [7], most people associate collagen with gelatin (made from type I collagen).
While the extraction of gelatin from collagen-rich tissues involves partial hydrolysis of the protein, as well as the partial or complete unfolding of the triple helix due to thermal denaturation, some mild extraction methods for triple helical collagen exist (see below) [8,9].
Although based on the same polypeptide, gelatin and collagen have vastly different properties, which largely stem from the fact that mature collagen has lost its ability to self-assemble into discrete fibrils, and therefore, once the triple helix has been denatured, restructures into an amorphous gelatin network.
Recent advances in both the recombinant production of synthetic collagen-like proteins and new, mild extraction and purification techniques out of natural sources, as well as the advanced processing of these extracts have opened a range of possibilities allowing the production of collagen-based materials with remarkable mechanical and biological properties. Many new methods to process collagen are biomimetic and some are inspired by the natural fiber spinning process of silk-producing animals [10][11][12].
Collagen fibers are hierarchically assembled structures that can be readily identified by their amino acid sequence [13]. A specific consensus sequence has been identified for collagen which follows the pattern (GXY)n, with the X-position often being occupied by proline residues and the Y-position often by 4-hydroxyproline residues. On a genetic level, the high glycine and proline content of collagen-encoding genes creates GC-rich patches, which makes the identification of such genes from genomic sequences difficult, so that most known collagen sequences were obtained by cross-referencing cDNA libraries with Edman-based peptide sequencing [14,15].
The (GXY)n consensus sequence can be explained by the spatial requirements for the formation of the triple-helical collagen tropomolecule [16]: since it is a right-handed triple helix with one turn every three amino acids and a very narrow diameter of 1.6 nm, steric hindrance from bulky amino acid side chains must be avoided at every position whose residue points inwards for the molecule to fold, which explains the occurrence of glycine as every third residue. This glycine is essential, and mutations or deletions of the consensus motif have been shown to have a strong destabilizing effect on the collagen molecule, often described as "nicks" due to the localized breaking of the rigid helix and the resulting angular flexibility [17,18].
The amino acid distribution of the X- and Y-positions is less stringent but, for most animal collagens, contains a comparably high amount of proline and 3- or 4-hydroxyproline, since these residues are slightly more stable than other amino acids in their cis-configuration, which is closer to the ideally required angle within the helix and thereby increases the thermodynamic stability of the fibril. The hydroxyl group of hydroxyproline, which is added post-translationally by specific enzymes (e.g., proline-4-hydroxylase, P4H), further increases the helical stability by allowing the formation of intermolecular hydrogen bonds between the α-chains involved in helix formation. Insufficiently hydroxylated collagen helices tend to partially unfold, thereby destabilizing the collagen fibrils and tissues based thereon, resulting in the pathological condition known as scurvy [19]. Some collagen-like proteins, such as streptococcal bacterial collagens [6][20][21][22][23] and sawfly cocoon silk [2], do not contain hydroxyproline and instead stabilize the triple helix via the formation of intermolecular salt bridges between charged amino acids.
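The (GXY)n pattern described above lends itself to a simple sequence scan. The following is a minimal sketch, not a validated annotation tool: it assumes one-letter amino acid codes, uses "O" for 4-hydroxyproline (a common peptide-chemistry convention rather than a standard code), and the function names and the cutoff of three consecutive triplets are hypothetical choices made for illustration.

```python
import re
from collections import Counter


def find_gxy_runs(seq, min_triplets=3):
    """Return (start, run) for each stretch of >= min_triplets G-X-Y repeats.

    min_triplets is an illustrative cutoff, not a value from the literature.
    """
    pattern = re.compile(r"(?:G[A-Z]{2}){%d,}" % min_triplets)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


def xy_composition(run):
    """Count the residues found in the X and Y positions of an in-frame run."""
    x_counts = Counter(run[1::3])  # second residue of every triplet
    y_counts = Counter(run[2::3])  # third residue of every triplet
    return x_counts, y_counts


if __name__ == "__main__":
    # Toy sequence: five Gly-Pro-Hyp triplets between unrelated residues.
    toy = "MKT" + "GPO" * 5 + "AAA"
    for start, run in find_gxy_runs(toy):
        x, y = xy_composition(run)
        print(start, len(run) // 3, dict(x), dict(y))
```

Replacing a single glycine, as in the toy sequence "GPOGPOAPOGPOGPO", breaks the run into pieces shorter than the cutoff, so the scan returns nothing; this mirrors, in a very simplified way, why a missing glycine interrupts the consensus motif.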
Since collagen is a highly abundant class of proteins that fulfills a variety of roles, it also contains a highly diverse group of subclasses. The main variances between collagen subtypes are not necessarily large differences in the collagen helix, but the presence of non-collagenous functional domains flanking the collagen core domain, which provide additional functionalities and physicochemical properties to the molecule.
The mollusk byssus threads, produced by mussels of the families Pinnidae, Mytilidae and Dreissenidae, serve as a holdfast structure and have extraordinary mechanical and chemical properties, being able to resist the harsh environment of the intertidal zone in which these species reside. They combine features of structured, collagen-based block copolymers with the concept of crystalline silk-like domains within an amorphous matrix. The byssus thread is produced in a specific secretory organ, the so-called mussel foot, which, in addition to its main anchoring function, serves as the main sensory and locomotive organ in the young mussel. The soluble byssus precursor proteins are secreted into a groove within this gland, where they are believed to be mixed and molded into the required shape by the flexible mussel foot. After successful attachment to a suitable surface via the formation of an adhesive plaque, they quickly form an insoluble fiber upon opening of the groove as they come into contact with the seawater [24].
The "sibling" material silk typically comprises distinct, specialized proteins (fibroins) [25], and they strongly differ from fibers within the ECM in that they have usually been spun by the animal, i.e., processed from a highly-concentrated precursor solution within dedicated glands that can very tightly control the necessary parameters influencing fiber formation, such as drawing speed, pH and the concentration of salts and metal ions [26].

Motivation
The aim of this review is to give an insight into the current state of natural and synthetic fibrous collagens regarding their use as surgical sutures in wound repair and tissue engineering. Applying an understanding of the established processing of silk to the processing of collagen and collagen-like materials might be a way to overcome the shortcomings of the current manufacturing of collagen-based biomaterials.

Collagen as a Biomaterial
Due to the similarity of the collagen sequence and content within the extracellular matrix of different vertebrate species, animal-derived collagen is an attractive material for biomedical applications, because it is often compatible with the existing extracellular matrix of a receiving patient regardless of source or processing. Since the triple-helical fold typically does not induce an immune response, collagen is non-antigenic, as well as non-toxic, biodegradable (the rate of degradation is controllable via chemical crosslinking), can be formulated in a variety of shapes and forms, and is chemically modifiable to fulfill many specific purposes [27].
Collagen-based biomaterials are used as bacteriostatic shields and barriers, as sponges and pellets for tissue regeneration and accelerated blood coagulation, as gels for sustained drug delivery, as absorbable surgical sutures (catgut), as well as grafts for tissue engineering and replacement of tendons, bones, blood vessels and skin [28].
Most of the collagen used for these purposes has been extracted from animal sources and consists largely of the most abundant type I collagen. During the natural aging of collagen, intermolecular crosslinks with other molecules inside the ECM are formed, which makes the extraction of high-purity single molecule fibrils difficult especially from adult animals, and increases the cost of products that require pure, soluble tropocollagen in its non-hydrolyzed triple helical form. Therefore, many formulations allow for a variable degree of crosslinking or hydrolysis and proteolytic degradation, although both of these processes increase the complexity and variability of the handling and processing of extracted collagen between batches.
Having access to a source of collagen-like materials that offer the same kind of biocompatibility but show no batch-to-batch variability in their biophysical and chemical properties would immensely support the establishment of collagen-based products.

Towards Spun Collagen Fibers
Surgical sutures, historically known as catgut [29], have been a major use of collagen fibers in biomedical applications, since they are readily resorbed into the surrounding tissue over time and thereby promote wound healing and tissue regeneration. Catgut collagen is prepared from the decellularized small intestine of sheep and is therefore a fully matured and cross-linked collagen network that has been reshaped into fibers by cutting and stripping. Its mechanical properties, although poor compared to those of silks and synthetic polymers in terms of toughness and ultimate stress, are still well suited for its purpose: given that the catgut suture closely resembles the original structure of the intestinal endothelium, its elastic modulus is very similar to that of most internal soft-tissue ECMs, so it shows comparable deformation when stressed, which in turn reduces the strain on the suture itself.
Nevertheless, the biomechanical properties of living tissue in the human body span several orders of magnitude both in elasticity and toughness [30], with, for instance, an aortic valve having a low ultimate stress of 0.3-0.8 MPa, compared to that of skin (1-20 MPa) or tendon (50-100 MPa). Ruptured cartilage, tendon and bone, even though they mostly consist of type I collagen, cannot typically be sutured with catgut, since the material does not have the required tensile strength and toughness, and instead require the use of synthetic materials with lower biocompatibility and no biodegradability [31].
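The spread of tissue strengths quoted above can be captured in a small lookup table. The sketch below is only an illustration of how a candidate fiber might be screened against these ranges; the function name and the screening rule (the fiber must reach the upper bound of a tissue's ultimate stress) are assumptions made here, and a real selection would also have to consider elastic modulus, toughness and degradation behavior.

```python
# Ultimate stress ranges in MPa, as quoted in the text above.
TISSUE_ULTIMATE_STRESS_MPA = {
    "aortic valve": (0.3, 0.8),
    "skin": (1.0, 20.0),
    "tendon": (50.0, 100.0),
}


def tissues_covered(fiber_ultimate_stress_mpa):
    """List tissues whose upper ultimate-stress bound the fiber meets or exceeds.

    The "meet the upper bound" rule is a hypothetical screening criterion.
    """
    return [
        name
        for name, (_, upper) in TISSUE_ULTIMATE_STRESS_MPA.items()
        if fiber_ultimate_stress_mpa >= upper
    ]


if __name__ == "__main__":
    # Hypothetical fiber strengths, chosen only to exercise the lookup.
    for strength in (0.5, 25.0, 100.0):
        print(strength, tissues_covered(strength))
```

Under this rule, a hypothetical 25 MPa fiber would cover the aortic valve and skin ranges but not tendon, which illustrates why a single collagen fiber grade is unlikely to serve all tissues.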
Researchers have therefore begun to investigate processing methods in order to improve the mechanical properties of collagen-based fibers and thereby tailor the material to the requirements of the tissue it is intended to be used in. One major improvement was the wet-spinning of extracted or recombinantly produced, soluble collagen [10,12,32], which demonstrated the possibility of producing fibers from this source material outside of a biological system. Since the mechanical properties were still poor, these materials were mostly used to demonstrate their applicability for in vitro tissue engineering. However, due to the high cost of soluble full-length collagen, no commercial products have been made available so far.
Another recent approach [11] used a microfluidic system to produce mechanically highly stable fibers from acid-soluble type I collagen (Figure 1). The authors suggest that the small diameter and the directional laminar shear forces in the spinning channel help in axially aligning the collagen fibrils in a fashion comparable to the fibril formation from tropocollagen during biosynthesis, thereby yielding a product with mechanical characteristics similar to those of tendon.
All of these approaches use the intrinsic property of collagen triple helices to be soluble but still folded in dilute acetic acid and therefore require an independent first step inducing the folding of the collagen triple helix from the α-chain precursors. As a result of this limitation, no conditions such as high temperature or denaturing solvents, which would melt the triple helix, can be used during production, processing and sterilization. It is, therefore, highly desirable to find collagen-like proteins that intrinsically possess the ability to form soluble triple helices without requiring complex biological systems and mild conditions for their manufacture.

Collagen-Analogues as Basis for Future Biomaterials
As mentioned above, biomaterials based on natural collagens today are still mostly decellularized animal tissues that have been mechanically or chemically processed into the desired morphologies. This approach carries the intrinsic risk of adverse immunological reactions in the receiving patient due to allergies towards the graft [33], as well as the spreading of viral or prion-based contaminants [34].
The currently used methods for the extraction of soluble collagen from tissue include the use of mild solvents (phosphate buffered salt solution; dilute acetic acid) [9], which yield full length triple helices that still contain the terminal telopeptides. The best yields are gained when collagen is extracted from young tissues, such as calf skin and rat tail. Another option employs proteases such as pepsin, which utilizes the enzyme's unspecific activity to digest all non-collagen-like domains and therefore removes telopeptides, which results in higher yields even from more strongly cross-linked tissues [35].
Recently, collagen-based scaffolds for tissue engineering have been produced from solubilized and purified collagen that often has been blended with other polymers due to the limited availability of the soluble collagen source material [36,37].
When purified collagen is required for the production of mechanically stable biomaterials such as grafts for tendon replacement, the trade-off between batch-to-batch homogeneity and yield will always be a restricting factor. While large amounts of animal tissue are available for the extraction of collagen not intended for clinical use, the current sources of human collagen are limited to scarcely available extractable tissues (such as dermis) [8] and biotechnologically produced collagen made from human fibroblast cell culture (e.g., CosmoDerm) [38].
To overcome these limitations, different approaches have been made to find substitutes for natural vertebrate collagen which have adequate properties for tissue engineering, regenerative medicine and other biomedical applications.
Collagen mimetic peptides were originally used in determining the structure, stability and folding kinetics of collagen and collagen-like sequences [39][40][41], which provided remarkable insight into the role of each amino acid within the (GXY)n-repetitive sequence and led to attempts at stabilizing short triple-helical peptides by incorporating non-canonical amino acids, such as 4-fluoroproline [42] or N-isobutylglycine [43]. Furthermore, these experiments identified the reason for the comparably slow folding of nucleated triple helices to be based upon the slow cis/trans-isomerization rates of residues in the X- and Y-positions [44][45][46].
Using these peptides, researchers also noted that the formation of gelatin-like networks could be largely avoided by incorporating amino- or carboxyterminal nucleation sites that would promote helix formation within the cross-linked trimer at a much higher rate than between non-nucleated single-chain constructs [44,46]. Possible nucleation sites can be TRIS-scaffolds [47,48], 1,2,3-propan-tricarboxylic-acid scaffolds [49], collagen III derived cystine knots [50] and bacterial or bacteriophage-derived protein domains [20,51].
All of these approaches use peptide synthesis, which is typically limited to a maximal peptide length of 30-50 amino acids. Since most collagen sequences are strictly water soluble but tend to form gels at higher molecular weights, peptide synthesis suffers from problems with limited solubility and aggregation as the molecular weight (MW) increases. Native chemical ligation has been used to increase the chain length after synthesis [52], which increased the thermal stability of the product but reduced the MW homogeneity of the resulting proteins.
Biotechnological collagen production is non-trivial and suffers from the complex requirements of collagen synthesis (Figure 2a). In vivo, human type I collagen is assembled within fibroblasts from two pro-α1(I) and one pro-α2(I) protocollagen chains, both around 1400 amino acids long, to form a procollagen triple helix. Both the α-collagens and the procollagen are enzymatically modified by prolyl-3-, prolyl-4- and lysyl-oxidase, after which the N- and C-terminal telopeptides are cleaved by specific proteases to yield the 300 nm long tropocollagen triple helix. These tropocollagen trimers can then assemble into collagen fibrils during secretion and further form macroscopic collagen fibers with varying degrees of chemical crosslinking [53,54]. The most common bacterial production host, E. coli, is unable to produce full-length collagens due to size constraints, but has successfully been utilized in the production of "collagen-like proteins" [55], which are truncated proteins derived from human collagen. Nevertheless, E. coli has no intrinsic apparatus to catalyze the aforementioned secondary modifications, so the products are unstable, mostly due to the lack of 4-hydroxyproline. A newly discovered bacterial P4H found in Bacillus anthracis [56,57] might be able to overcome this limitation in future experiments. In addition, recent advances have been made in coexpressing eukaryotic proline-4-hydroxylase in Origami-type E. coli [58], whose cytosol provides conditions similar to those of the endoplasmic reticulum (ER) and therefore allows the activity of the transgenic P4H complex, as well as in the yeast P. pastoris [59,60], which, being a eukaryote, has the necessary organelles for early-stage collagen assembly and the ability to produce high-MW proteins, although only with poor yields and varying degrees of hydroxylation [61,62].
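The relation between chain length and helix length mentioned above can be checked with a back-of-the-envelope calculation. The sketch below assumes an axial rise of about 0.286 nm per residue for the collagen triple helix; this is a commonly cited literature value that is not given in this review, and the function names are hypothetical.

```python
# Assumed axial rise per residue of the collagen triple helix, in nm.
# This value is an outside assumption, not taken from the text.
RISE_PER_RESIDUE_NM = 0.286


def helix_length_nm(n_residues, rise_nm=RISE_PER_RESIDUE_NM):
    """Estimate the length of a fully triple-helical chain of n_residues."""
    return n_residues * rise_nm


def residues_for_length(length_nm, rise_nm=RISE_PER_RESIDUE_NM):
    """Estimate how many residues per chain span a helix of length_nm."""
    return round(length_nm / rise_nm)


if __name__ == "__main__":
    # 300 nm is the tropocollagen length quoted in the text.
    print(residues_for_length(300.0))  # -> 1049
    # 30-50 residues is the peptide synthesis limit quoted in the text.
    print(helix_length_nm(30), helix_length_nm(50))
```

Under this assumption, a 300 nm helix corresponds to roughly 1050 residues per chain, noticeably fewer than the roughly 1400 residues of the pro-chains, which is consistent with terminal segments being cleaved before the tropocollagen is formed. The same estimate suggests that chemically synthesized peptides of 30-50 residues form helices only on the order of 10 nm long.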
Interestingly, bacterial collagen-like proteins have been identified [21], which function as a virulence factor in Streptococcus pyogenes: The Scl1 and Scl2 proteins are secreted and carboxyterminally anchored within the cell wall of the bacterium, where a terminal domain induces trimerization and, thereby, nucleation resulting in the formation of stable triple helical collagen rods (Figure 2b). The proteins then attach to collagen-binding proteins in the extracellular matrix and shield the bacterium from the immune system of e.g. vertebrates. On a structural basis, these proteins are remarkable, because they contain no non-canonical amino acids, and instead of utilizing hydroxyproline, stabilize the collagen helix by forming salt bridges via charged amino acids between the α-chains [22]. In addition, the trimerizing V-domain has been shown to be an effective collagen nucleation inducer, even for non-streptococcal collagens and collagen-mimetic peptides [20].
These bacterial collagen-like proteins can be recombinantly produced in high yields in common bacteria such as E. coli. When the immunogenic V-domain is proteolytically cleaved and the remaining collagen cross-linked to increase the thermal stability, they have been shown to be non-immunogenic when implanted into mice for up to 6 weeks [63].
However, it is unclear whether long-term presence and degradation of bacterial collagen-like proteins exposes the host to short, immunogenic peptides, since several examples are known that show different immunogenicity between triple-helical and denatured collagen [64][65][66][67].
Nevertheless, since these proteins circumvent all the problems associated with the production of eukaryotic collagens in bacterial hosts mentioned earlier, they are strong candidates for future materials based thereon; however, their low melting point of 36-38 °C requires the presence of a non-collagenous folding domain which has been shown to be strongly immunogenic [63] and necessitates other means of stabilization, such as glutaraldehyde crosslinking, once this domain has been removed. Increasing the stability of the Scl2 protein by multimerization does not significantly increase its melting point [68]. Therefore, while the low temperature stability of the Scl proteins is of no consequence for the production of three-dimensional scaffolds [69][70][71], it hinders the spinning of stable collagen fibers.
The three collagen-like proteins found in sawfly silk (SfC A-C) [2] have been recombinantly produced in a similarly successful fashion, but so far they have not been investigated as potential future biomaterials.
Engineered collagens (eCols) are closely related to collagen mimetic peptides but are produced biotechnologically in E. coli. A model collagen consisting of (GPP)50, produced at high yields, was found to be sufficiently stable to be used in common applications even without containing hydroxyproline, and, when an aminoterminal nucleation site was introduced, demonstrated the ability to form collagen triple helices rather than gelatin networks [72]. The mentioned eCol has been successfully used as a matrix for the fixation of collagen-mimetic peptides without the use of crosslinking agents, and could be used as a substratum in cell culture, showing its applicability for tissue engineering.
In addition, fully synthetic genes also allow the direct incorporation of integrin-specific cell adhesion sites, such as the RGD motif and more collagen-typical adhesion sites such as GFOGER-like sequences [73]. When processed into more tightly packed structures such as fibrils and fibers, direct integrin mediated cell adhesion has been shown to be impaired [74], and thus will possibly require more complex solutions to mediate the adhesion between eCol-fibrils and the cells within the ECM. One recently described group of proteins, the so-called COLIBRIs (COLlagen INtegrin BRIdging proteins) [75,76], which form a bridge between natural fibrillar collagen and integrin including the well-known proteins fibronectin and von Willebrand factor, might be able to fulfill this role. Thus, despite the promising results, the development of eCols is still in its early stages.
Synthetic collagen-based block copolymers are artificially designed constructs and combine some of the approaches mentioned so far. When short collagen-like (GPP)n-peptides are positioned around a hydrophilic unstructured core domain and expressed in P. pastoris, a stable gelatin-like hydrogel can be produced with high yields [77,78]. Since these structures, like engineered collagens, are based on synthetic genes, the addition of short peptide sequences such as cell adhesion motifs or calcification sites is trivial. When the same authors replaced the randomized central sequence with a B. mori-inspired silk sequence and the (GPP)n-motif with a hydrophilic (GXY)n-polymer containing a large amount of charged amino acids in the X and Y positions, they obtained a hybrid material that formed micelles and fibrils depending on pH, thereby confirming that rational design on the basis of silk and collagen is possible [79].
However, the most successful example of a collagen-based block copolymer is provided by nature: the class of collagens, the so-called preCols, found in the byssus that marine mussels produce as a holdfast structure.

Mussel Byssus-Silk, Collagen or Both?
Among the byssus-producing mussels, the blue mussel, Mytilus edulis, and the closely related Mediterranean mussel, Mytilus galloprovincialis, are the best characterized. These species produce a bundle of threads which connects the soft mussel to the substrate, and the threads show a mechanical gradient with increasing stiffness from the proximal to the distal portion of the fiber. This allows the byssus to dissipate high amounts of mechanical energy without causing radial stress between the parts with highly different elastic moduli, thereby withstanding the mechanical challenges caused by the tidal currents without damaging the soft mollusk's organs and tissues.
The byssus thread itself is rather complex and contains a large set of matrix proteins, called mussel foot proteins (mfps), which, in addition to contributing to the chemical and physical properties of the fibrous portion of the thread, form an adhesive plaque that is just as remarkable as the rest of the byssus, since it provides strong and durable underwater adhesion to a variety of substrates with different chemical compositions. These somewhat amorphous matrix proteins interact with the main load-bearing structures of the byssus, which are called preCols because of their collagen-like structure.
The three known preCols of M. edulis are block-copolymers which consist of a central collagen-like core domain that is surrounded by flanking domains [80].
Even though the preCols differ strongly from vertebrate collagens in that they are homotrimers and do not undergo the same kind of propeptide processing, their central collagen-like core domain has been described to be most closely related to that of fibrillar collagens (type I-III) [81]. It has to be noted, however, that most of the mechanical properties of the preCols are attributed to the flanking domains, which make up a significant portion of every preCol [14,81,82].
Depending on the type of preCol, these flanking domains contain motifs that are similar to other structural proteins: preColD contains silk-fibroin-like flanks, the flanks of preColP show similarity to elastin and preColNG contains plant cell wall-like sequences, which include features of alanine-rich β-sheets as well as glycine-rich helices (Figure 3a,b). While the macroscopic byssus threads of M. edulis and M. galloprovincialis differ slightly in size and in their mechanical properties, the preCols are closely related on a molecular level and only differ in short inserts/deletions and point mutations [83].
In addition to the flanking domains, the preCols all contain terminal domains which are rich in histidine and 3,4-dihydroxyphenylalanine (DOPA, post-translationally oxidized tyrosine) which form sacrificial metal ion mediated complexes that can absorb mechanical stress and reassemble over time without taking damage [84]. In addition, they undergo a slow quinone-based tanning, thereby forming irreversible crosslinks with other proteins in the byssus and providing a redox-dependent mechanism further stabilizing the byssus [85,86].
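The block architecture of the preCols described above can be summarized as a small data structure. This is only a schematic sketch: the class and field names are hypothetical, the symmetric N- to C-terminal order is inferred from the description of a central core surrounded by flanks and terminal domains, and no domain lengths or sequences are implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreCol:
    """Schematic block layout of a byssal preCol (names are illustrative)."""

    name: str
    flank_type: str  # structural protein that the flanking domains resemble

    def domains(self):
        """Return the block order from one terminus to the other."""
        flank = f"{self.flank_type} flank"
        terminus = "His/DOPA-rich terminus"
        return [terminus, flank, "collagen-like core", flank, terminus]


# Flank assignments as stated in the text.
PRECOLS = [
    PreCol("preColD", "silk-fibroin-like"),
    PreCol("preColP", "elastin-like"),
    PreCol("preColNG", "plant cell wall-like"),
]

if __name__ == "__main__":
    for precol in PRECOLS:
        print(precol.name, " | ".join(precol.domains()))
```

Writing the three proteins this way makes the shared scaffold explicit: only the flank type changes between preColD, preColP and preColNG, while the collagen-like core and the His/DOPA-rich termini are common to all.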
One remarkable observation is the fact that the previously mentioned mechanical gradient corresponds to the abundance of preColP and preColD within the respective section of the byssus thread. The proximal section of the byssus, which connects to the stem within the mussel and has a Young's modulus of around 50-80 MPa, is rich in preColP, while the distal portion, with a stiffness of E_i = 500-600 MPa, contains high amounts of preColD [83]. The third mussel byssus collagen, preColNG, has a constant concentration throughout the thread (Figure 4).
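The size of the stiffness gradient follows directly from the two modulus ranges quoted above. The short sketch below computes the bounds of the distal-to-proximal stiffness ratio; the function name is hypothetical, and treating the quoted ranges as hard limits is a simplifying assumption.

```python
# Modulus ranges in MPa, as quoted in the text above.
PROXIMAL_MPA = (50.0, 80.0)
DISTAL_MPA = (500.0, 600.0)


def ratio_bounds(low_range, high_range):
    """Return the smallest and largest ratio high/low for two (min, max) ranges."""
    smallest = high_range[0] / low_range[1]
    largest = high_range[1] / low_range[0]
    return smallest, largest


if __name__ == "__main__":
    lo, hi = ratio_bounds(PROXIMAL_MPA, DISTAL_MPA)
    mid = (sum(DISTAL_MPA) / 2) / (sum(PROXIMAL_MPA) / 2)
    print(lo, hi)         # -> 6.25 12.0
    print(round(mid, 2))  # -> 8.46
```

The distal portion is therefore roughly 6 to 12 times stiffer than the proximal portion, with a midpoint ratio of about 8.5.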
With the preCols making up between 70% and 90% of the protein fraction of the byssus thread, it stands to reason that these flanking domains have a high influence on the mechanical properties of the fiber, and for biotechnologically produced preColD it could be shown that β-sheet crystals could be induced within the protein by ethanol treatment in a similar fashion as with materials made of silk fibroin or biotechnologically produced spider silk proteins [87].
Investigation of the biotechnologically produced cwCT-domain derived from the carboxyterminal flank of preColNG showed that this protein undergoes reversible structural transitions between random-coil and β-hairpin structures when in contact with lipid vesicles, suggesting that the flanking domains of this preCol can be triggered by external stimuli during byssus formation and thereby greatly influence the overall properties of the thread in a switchable fashion [88]. As other authors have noted [89], this suggests that the distal portion of the mussel byssus is likely a β-sheet crystalline particle reinforced polymer matrix, providing a nearly 10-fold increase in stiffness and a threefold increase in breaking stress and toughness compared to that of the proximal portion of the fiber, which can be viewed as a fiber reinforced polymer with dedicated elastic domains.
Figure caption: The proximal part of the mussel byssus is elastic and contains fibrils of preColP embedded in a matrix of PTMP1 (proximal thread matrix protein 1). This matrix protein allows the fibrils to slide freely and act as within a fiber-reinforced composite material. The distal portion of the mussel byssus has a much higher stiffness, in addition to the ability to dissipate large amounts of mechanical energy without taking irreversible damage. The content of matrix protein in the distal thread is much lower and the collagen domains, instead of reinforcing an amorphous matrix, are tightly packed around silk-like crystalline β-sheets. Modified with permission from [89] (Elsevier 2014).

Hierarchical Assembly and Structure of preCols in the Byssus
During biosynthesis, the preCol monomers form homotrimers of their unfolded α-chains before being transported into storage vesicles. During this stage, the rod-like trimers further assemble into hexagonal higher-order 7 + 1 structures, in which they are stored until byssus neogenesis triggers secretion (Figure 3) [90,91]. The morphology of these pre-secreted structures suggests that the collagen domain has a large influence on the early stages of assembly, whereas the other functional groups, such as the flanking domains and the His/DOPA-rich termini, are kept inactive until they are molded into the final byssus thread within the mussel foot groove. This activation is most likely a result of the preCols coming into contact with a different chemical environment (oxidizing, high pH), the sudden presence of matrix proteins which have been shown to interact with preCols [92], and the influence of mechanical stimuli introduced by contractions within the mussel foot.
As mentioned above, the proximal section of the mussel byssus differs from classical silk in that it comprises a dedicated mix of matrix proteins which embed the load-bearing, fibrous preCols. In this portion of the byssus, matrix proteins, in particular the proximal thread matrix protein 1 (PTMP1) with its two von Willebrand factor-like domains (which makes up about 30% of the proximal byssus), strongly interact with the preCol assemblies and are thought to have a great influence on the higher-order assembly of preColP and preColNG [93]. The current understanding is that PTMP1 builds a soft matrix with the ability to bind and lubricate the rod-like preCols during deformation, thereby forming a classical fiber-reinforced polymer with the ability to mitigate shear-induced damage to the comparably stiff collagen fibrils, while still allowing enough movement for the flanking domains to exercise their elastic behavior (Figure 4).
Given that the distal portion of the fiber contains more than 90% (w/w) preCols, however, many of the physical attributes of the distal byssus must stem from the intrinsic ability of preColD to form stable assemblies without requiring interaction with other mussel foot proteins. Most of its properties can therefore be explained by the hierarchical arrangement of preCols, which is a result of the block-like character of the protein. Instead of being a fiber-reinforced matrix, the distal byssus thread is more like a particle-reinforced block-copolymer, with the β-sheet crystals formed by the preColD flanks embedded in a densely packed collagen matrix (Figure 3).
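The distinction between a fiber-reinforced and a particle-reinforced composite can be made more tangible with the textbook rule-of-mixtures bounds for composite stiffness. The following is only a minimal sketch and not a model of the byssus: the function name, the parameter names and all numerical values are illustrative assumptions that do not stem from the cited work, and the % (w/w) values given in the text are mass fractions that cannot be used directly as volume fractions.

```python
def stiffness_bounds(e_reinforcement, e_matrix, v_reinforcement):
    """Return (voigt, reuss) bounds for the modulus of a two-phase composite.

    voigt: iso-strain upper bound (load parallel to continuous fibers)
    reuss: iso-stress lower bound (load perpendicular to the fibers)
    The stiffness of a particle-reinforced material is expected to lie
    between these two bounds.
    """
    if e_reinforcement <= 0 or e_matrix <= 0:
        raise ValueError("moduli must be positive")
    if not 0.0 <= v_reinforcement <= 1.0:
        raise ValueError("volume fraction must lie between 0 and 1")
    v_matrix = 1.0 - v_reinforcement
    voigt = v_reinforcement * e_reinforcement + v_matrix * e_matrix
    reuss = 1.0 / (v_reinforcement / e_reinforcement + v_matrix / e_matrix)
    return voigt, reuss


if __name__ == "__main__":
    # Hypothetical placeholder values (arbitrary units), NOT measured data.
    upper, lower = stiffness_bounds(e_reinforcement=10.0, e_matrix=1.0,
                                    v_reinforcement=0.5)
    print(f"Voigt (upper) bound: {upper:.2f}")
    print(f"Reuss (lower) bound: {lower:.2f}")
```

The gap between the two bounds illustrates why the arrangement of the stiff phase (continuous fibrils versus dispersed crystals), and not only its proportion, matters for the stiffness of the thread.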
This would explain the threefold higher prevalence of defects in the collagen sequence ("nicks") of preColD compared to preColP: instead of being the ultimate load-bearing structure within a soft matrix (as in the proximal byssus thread), the collagen domain of preColD is the flexible component in a particle-reinforced system during low-stress situations and therefore absorbs small amounts of force before stronger mechanical loads result in the gradual unfolding of sacrificial bonds within the His/DOPA-termini and, subsequently, the β-crystalline flanks [94].

Chemical Modifications of Byssus Proteins
The mussel byssus collagens are based on the canonical amino acids but, after being translocated into the ER, undergo two kinds of chemical modification that have a strong influence on the stability of the resulting material.
As in most kinds of collagen, the repetitive (GXY) sequence often contains 4-hydroxyproline at the Y position because of the stabilizing effect of the resulting hydrogen bond with adjacent collagen chains. The hydroxylation is catalyzed within the ER by a mussel prolyl-4-hydroxylase (P4H) [95], which, due to the low solubility of its α-subunit, has not been well characterized [96]. As in other collagens, the absence of hydroxyproline manifests itself in a lower stability of the collagen helix and thereby a lower melting temperature, a less compact collagen assembly and a higher tendency to unfold under destabilizing conditions.
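The (GXY) repeat pattern described here, together with the sequence defects ("nicks") mentioned for preColD, can be illustrated with a short sequence scan. This is a heuristic sketch under stated assumptions: the function name and the example sequences are hypothetical and do not represent real preCol sequences, the greedy re-synchronization on the next glycine is a simplification, and a proline at the Y position is only counted as a candidate for hydroxylation, not as proof of it.

```python
def scan_gxy(seq):
    """Greedy scan of a one-letter protein sequence for Gly-X-Y triplets.

    Returns (triplets, interruptions, pro_at_y):
      triplets      - number of G-X-Y triplets consumed
      interruptions - number of stretches that break the repeat
      pro_at_y      - triplets with proline at Y (hydroxylation candidates)
    Trailing residues that cannot form a full triplet count as an
    interruption, so whole collagen domains should be passed in.
    """
    seq = seq.upper()
    i = 0
    triplets = interruptions = pro_at_y = 0
    in_break = False
    while i < len(seq):
        if seq[i] == "G" and i + 3 <= len(seq):
            triplets += 1
            if seq[i + 2] == "P":
                pro_at_y += 1
            i += 3
            in_break = False
        else:
            # Count each contiguous non-repeat stretch only once,
            # then move on until the next glycine re-starts the frame.
            if not in_break:
                interruptions += 1
                in_break = True
            i += 1
    return triplets, interruptions, pro_at_y


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any database entry.
    for name, s in [("perfect", "GPPGPPGPP"), ("insertion", "GPPAGPP")]:
        print(name, scan_gxy(s))
```

Comparing the interruption count per triplet between two collagen domains gives a rough, sequence-level measure of how often the repeat is broken.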
The second preCol-modification is the oxidation of tyrosine by a tyrosinase, an ER-resident oxidase that produces DOPA from tyrosine within the His/DOPA-rich termini of the protein. When oxidized, these residues, in addition to the metal-chelation and tanning mentioned earlier, are also responsible for many of the covalent connections to other mussel foot proteins via crosslinks based on lysine, cysteine and histidine residues [80].
Although not yet observed for the preCols, many of the adhesive properties of the plaque can be attributed to the presence of DOPA within mfp-3 [97], which is modified by the same enzyme.
These secondary protein modifications make it hard to investigate byssus proteins, mostly because these highly cross-linked proteins cannot be extracted from the thread and because preCols and preCol analogs with high levels of proline hydroxylation are difficult to produce biotechnologically.

preCol-Based Biomaterials
Extraction of preCols without partial hydrolysis is not possible from the byssus because of the high degree of crosslinking, while the extraction of soluble proteins from the mussel foot is not feasible due to low yields [24]. While efforts have been made to use reconstituted mussel byssus as functional biocompatible matrices that retain some of the properties of the natural threads, mainly the ability to modify their mechanical behavior upon metal-ion binding, the overall mechanical performance is not comparable to that of pristine byssus threads [98].
The length of >1000 amino acids as well as the collagen-typical posttranslational processing challenge biotechnological production in the same way as for other full-length collagens. Furthermore, it was shown that the expression of the isolated flanking domains [88] of most preCols is only possible in E. coli when they are combined with stabilizing tags, such as SUMO (small ubiquitin-like modifier) [99].
Despite these challenges, it was recently possible to express and purify non-hydroxylated preColD in the yeast P. pastoris, which showed the ability to form fibrils and which could be post-treated to selectively convert the unstructured flanking domains into β-sheets [87]. While it is unlikely that biotechnologically produced, full-length mussel byssus collagens will directly be used as biomaterials for tissue engineering due to concerns regarding possible immunogenic properties, their block-like assembly could add a variety of functional domains to the toolkit of synthetic biology towards the design of tailored collagen-like materials for biomedical applications. Furthermore, it might be possible to rationally design block-copolymers which resemble the overall structure of preColD, but consist of domains that have independently been shown to be fully biocompatible.

Conclusions
The major drawback in creating collagen-based fibers from soluble precursors is the limited availability of extractable source material: while the abundance of cheap, collagen-rich animal tissues readily compensates for the low yields of some methods such as acetic acid extraction, the scarcity of human tissue for collagen production often justifies the use of pepsinization, which improves extraction yield but simultaneously reduces product homogeneity. Even though several biomaterials can be manufactured from partially degraded tropocollagen or even gelatin-like polypeptides, the production of high-performance materials, such as fibers spun using a microfluidic technology and showing tendon-like mechanical properties, has only been possible with highly homogeneous, soluble collagen containing intact, non-cross-linked telopeptides.
The biotechnological production of natural collagens is currently limited by the large size of the proteins and the high level of posttranslational modification (e.g., cleavage of prepeptides, proline and lysine hydroxylation, prearrangement into nanofibrils) that cannot be emulated by most production hosts, while collagen-mimetic peptides produced by solid phase synthesis are even more constrained regarding the size of the resulting product.
Future collagen-like proteins used in biomedical applications will most likely overcome these problems by incorporating a variety of strategies to simplify the requirement for posttranslational modifications, or to omit them completely, as nature does in bacterial collagens and insect silk collagens. Other possible strategies might be to modify the collagen protein sequence so that it does not require non-canonical amino acids and is still structurally stable, as well as to utilize stabilizing domains inspired by a variety of organisms, such as flanking domains derived from the mussel byssus preCols, which will likely be incorporated in a block-copolymer-like fashion and provide specifically engineered functionalities.
In conclusion, examples are presented for the successful use of synthetic biology and ab initio design, combined with biotechnological protein production, as well as the applicability of processing methods inspired by the processing of natural and artificial silks, to create novel types of high-performance, biocompatible, collagen-based materials.