Reversed Proteolysis—Proteases as Peptide Ligases

Historically, ligase activity by proteases was theoretically derived due to their catalyst nature, and it was experimentally observed as early as around 1900. Initially, the digestive proteases, such as pepsin, chymotrypsin, and trypsin, were employed to perform in vitro syntheses of small peptides. Protease-catalyzed ligation is more efficient than peptide bond hydrolysis in organic solvents, representing control of the thermodynamic equilibrium. Peptide esters readily form acyl intermediates with serine and cysteine proteases, followed by peptide bond synthesis at the N-terminus of another residue. This type of reaction is under kinetic control, favoring aminolysis over hydrolysis. Although only a few natural peptide ligases are known, such as ubiquitin ligases, sortases, and legumains, the principle of proteases as general catalysts could be adapted to engineer some proteases accordingly. In particular, the serine proteases subtilisin and trypsin were converted to efficient ligases, which are known as subtiligase and trypsiligase. Together with sortases and legumains, they turned out to be very useful in linking peptides and proteins with a great variety of molecules, including biomarkers, sugars, or building blocks with non-natural amino acids. Thus, these engineered enzymes are a promising branch for academic research and for pharmaceutical progress.


Introduction
A large amount of data on proteases cleaving all sorts of peptide bonds is available in scientific publications and textbooks. About 100 years ago, the prototypic mammalian food-digesting enzymes were already being investigated, such as pepsin, an aspartic protease, as well as the serine proteases trypsin, chymotrypsin, and elastases, followed by carboxypeptidases, which belong to the metalloproteinases. Later on, it turned out that blood coagulation including fibrinolysis and the innate immune response of the complement system depend on trypsin-like serine proteases. In the following decades, more proteases of new types were discovered in all living organisms, e.g., the eponymous cysteine protease papain from the plant Carica papaya. Although their major task is always the cleavage of one or more peptide bonds, their physiological roles range from the unspecific digestion of proteins in food to the processing of different protein substrates with varying specificity. Many diverse substrates are found in different tissues under healthy conditions or even in cancer cells, whereas some proteases exhibit exclusive specificity for a single cut in only one protein, as is the case for several viral proteases. All these proteases or peptidases are catalysts, which reduce the required activation energy of a given reaction; in the case of proteolysis, the reaction always follows the thermodynamically favored direction of cleavage. However, as pure catalysts, they are all capable of catalyzing the reverse or backward reaction. This concept was already suggested in 1898 by van 't Hoff for trypsin as a potential catalyst in protein synthesis, linking its own cleavage products again, while the experimental phenomenon was reported by Sawjalow in 1901 as plastein formation [1,2]. Later studies confirmed that pepsin, papain, or chymotrypsin synthesized new polypeptides with an average of 40 amino acids, the insoluble plastein, from protein hydrolysates [3].
Since all living cells possess a highly efficient protein synthesis machinery with the ribosomes, there is usually little need for enzymes as catalysts in peptide ligation.
Figure 1. [...] Cyclotides are 28 to 37 amino acids (aa) long. Ubiquitinylation and sumoylation require the specialized ligases E1, E2, and E3, which comprise various enzymes in both pathways. E1 utilizes ATP hydrolysis for the formation of thioesters with the C-terminus of ubiquitin (U) or SUMO (S) and transfers them to E2 as thioesters. Eventually, up to 1000 specialized E3 ligases, such as the numerous really interesting new gene (RING) finger domain ligases, form isopeptide bonds between U/S and Lys side-chains of target proteins. Whereas polyubiquitinylated proteins are degraded by proteasomes, several sentrin-specific proteases (SENPs) cleave the SUMO tags in regulatory processes. (B) Synthetic approaches using various proteases and their engineered variants.
Legumains are lysosomal caspase-related cysteine proteases, which are widespread among eukaryotes, with a marked specificity for Asn in a wider range around neutral pH or Asp at acidic pH as P1-residue of the scissile bond [25]. In particular, plant legumains have gained additional interest in recent years, as some members are distinct ligases, which are important in the synthesis of cyclic plant peptides [26]. Another special field in which peptide ligation is thought to play a role is the interaction of a variety of proteases with inhibitors that are first cleaved and then religated in the active site. Eventually, common proteases and optimized, engineered variants such as trypsiligase and subtiligase can be used as peptide synthetases under nearly physiological conditions or in organic solvents, which increases the tendency for peptide ligation (Figure 1B) [19].

Historical Outline of Developments in the Last Century
The very first experiments demonstrating that proteases can be used for the synthesis of peptide bonds were published by Bergmann and Fraenkel-Conrat [27,28]. A comprehensive overview of the field until the 1980s is given by Fruton [29]. To the early pioneers, it was obvious that thermodynamics disfavored peptide bond formation of charged N- and C-termini based on the zwitterionic nature of the amino acids themselves. However, as oligopeptides are often less soluble in aqueous buffers compared with the charged or more polar precursors, their removal from the solution, e.g., by precipitation, can shift the equilibrium toward the polypeptide product. Kinetic parameters of the peptidolytic hydrolysis reaction such as kcat and KM are useful indicators for the efficiency of the reverse reaction, the peptide ligation. In particular, KM can be used to assess at least partially the efficiency of the synthesis, whereas calculation of the ligation kcat is more complicated, due to the linking of two substrates. As catalysts, proteases accelerate the forward and backward reaction and the eventual reaching of the equilibrium. Since the chemical equilibrium constant for dipeptide bond formation of Xaa-Gly ranges from about 0.001 to 0.005 L/mol and the average free energy ∆G ≈ 5.0 kJ/mol, the hydrolysis reaction is favored in aqueous solution [3,30]. Bordusa gives a good overview of the two basic approaches in peptide synthesis catalyzed by proteases, namely thermodynamic or equilibrium control and kinetic control; see Figure 2 [31]. In addition, N-protected amino acids and peptides, esters of the C-terminal moiety, and various other precursors of amino acids were used as substrates of protease-peptide ligases. As a general rule of thumb, it turned out that charged residues are better ligated in organic solvents, while esters and N-terminally protected residues are suitable for aqueous solutions, when trypsin is used as catalyst.
The esters of the carboxylate moiety form proper acyl intermediates as in the hydrolysis reaction. Similar observations were made for chymotrypsin in systematic analyses, which confirmed that an acetylated N-terminus is favorable for the ester component, such as Ac-NH2-Phe-CO-OEt. However, it was noticed that serine proteases, such as bovine β-trypsin, α-chymotrypsin, and porcine plasmin, not only cleave the reactive bond Lys15-Ala16 of the bovine pancreatic trypsin inhibitor (BPTI) but also resynthesize it until the thermodynamic equilibrium is reached [32].
Figure 2. Thermodynamic and kinetic control of peptide bond synthesis with protease catalysts. (A) Thermodynamic or equilibrium control dominates in catalysis by natural ligases or in organic media with low water content that shifts the chemical equilibrium. (B) Kinetic control is mostly achieved by synthetic precursors of peptides or proteins, containing a derivative of the C-terminal carboxylate with a good leaving group, such as carboxy esters of the residue with methanol (MeOH) and various other alcohols, as well as corresponding thioesters and thiols as leaving moieties. The formation of an acyl intermediate with serine or cysteine proteases is required [33].
Among serine proteases, the bacterial subtilisin BPN′ was quite successful in various ligation reactions. Eventually, an engineered version, the so-called subtiligase, was widely used as peptide ligating enzyme. Cysteine proteases such as carboxypeptidase Y from yeast or papain from the papaya plant showed similar results according to their specificity. Essentially, metalloproteinases, e.g., thermolysin (see below), and aspartic proteases, such as pepsin, rennin, or cathepsin D, synthesized peptide bonds in a similar manner as the above-mentioned cysteine and serine proteases, according to their own specificity, but with different kinetics, which depend on their underlying basic reaction mechanisms [34].
Then, in the late 1970s, progress was made with the first spectacular example of porcine-derived desalanine-B30-insulin, which was generated by cleavage with Zn2+-dependent carboxypeptidase A and coupled to Thr-OBut by trypsin catalysis to generate human insulin, whereby Thr-OBut was present in 50-fold excess in a dimethylformamide (DMF)/ethanol mixture [35]. Subsequently, this trypsin-catalyzed transpeptidation was commercially exploited to produce humanized insulin from the porcine variant by Novo A/S [36]. A similar approach succeeded in the papain- and chymotrypsin-catalyzed synthesis of opioid pentapeptides with N-tBOC amino acids as precursors or corresponding esters and phenylhydrazides as nucleophiles [37]. Another outstanding example is the industrial production of the food sweetener aspartame by ligation of amino acids with thermolysin, shifting the equilibrium by high molar excess of the reactants Z-Asp-COOH and H2N-Phe-OMe and precipitation of the insoluble product [38].
The basic requirements and limitations of protease-catalyzed ligation were well known in the 1990s: aqueous solutions in the pH range of 5 to 9 favor hydrolysis, mostly due to the thermodynamic barrier set by the charged terminal carboxylate and ammonium groups, resulting in ∆G ≈ 2.1 kJ/mol, derived from Ksynthesis of a dipeptide with one N/C-protecting group on the amino acids [39]. The first calorimetric analyses in the 1950s had already found ∆G ≈ 1.8 kJ/mol for a corresponding chymotrypsin-catalyzed peptide bond formation from Bz-Tyr-COOH and H2N-Gly-CONH2 [40]. However, the thermodynamic equilibrium can be largely shifted to peptide synthesis by using organic solvents, which reduce the ionization, as was demonstrated for a similar reaction to Z-Trp-Gly-CONH2 in mixtures of water and 85% butanediol or other organic solvents, such as dimethyl sulfoxide (DMSO) [41]. On the other hand, some organic solvents reduce the activity of proteolytic enzymes to a great extent, while others, in particular polyalcohols such as glycerol, stabilize them. Thermodynamic control of the equilibrium in aqueous solution may utilize the excess of one reactant or removal of the product, as demonstrated by the precipitation of the aspartame salt of Z-L-Asp-L-Phe-COOMe with D-Phe-COOMe, which forms by catalysis with thermolysin [42]. The special case of intramolecular peptide bond synthesis in the soybean trypsin inhibitor (SBTI) was already described in 1969, showing tryptic cleavage of the Arg63-Ile64 bond, the subsequent removal of Arg63, the ligation of a new Lys63 by carboxypeptidase B catalysis, and peptide bond synthesis with trypsin to an SBTI containing Lys63-Ile64 [43]. In general, biphasic aqueous-organic media are a way to circumvent the problem of inactivation of enzymes; e.g., thermolysin or pepsin are present in the water phase, whereby the ligation products accumulate in the organic phase [44].
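The relation between the free energy values quoted above and the equilibrium constant of synthesis can be made concrete with a short calculation. The following Python sketch assumes the standard relation ∆G = −RT ln K with a dimensionless K and a temperature of 298.15 K, neither of which is specified in the text; the function names are illustrative only.

```python
import math

R_KJ = 8.314462618e-3  # molar gas constant in kJ/(mol*K)


def k_from_delta_g(delta_g_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """Equilibrium constant of synthesis from the free energy change (kJ/mol)."""
    return math.exp(-delta_g_kj_per_mol / (R_KJ * temperature_k))


def delta_g_from_k(k_synthesis: float, temperature_k: float = 298.15) -> float:
    """Free energy change (kJ/mol) from the equilibrium constant of synthesis."""
    return -R_KJ * temperature_k * math.log(k_synthesis)


if __name__ == "__main__":
    # Free energy values quoted in the text for protected dipeptide synthesis.
    # The temperature of 298.15 K is an assumption of this sketch.
    for dg in (2.1, 1.8):
        print(f"dG = {dg} kJ/mol -> K_synthesis = {k_from_delta_g(dg):.2f}")
```

Under these assumptions, a positive ∆G always corresponds to K below 1, so the equilibrium lies on the side of hydrolysis unless it is shifted by solvent composition, reactant excess, or product removal.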
Another route to peptide synthesis follows kinetic control in aqueous solution, involving peptidic carboxylate ester precursors with leaving groups X, resulting in acyl intermediates with serine or cysteine proteases, followed by aminolysis at higher basic pH, which ensures an excess of nucleophilic α-NH2 groups [39,45]. These reactions of the type R1-COX + H2N-R2 → R1-CO-NH-R2 + X are not so much influenced by the thermodynamic equilibrium and its Ksynthesis constant but rather by the velocities of the competing aminolytic and hydrolytic reactions (i.e., the ratio vA/vH), which are both catalyzed by the protease. As the reaction starts with a burst, the maximum yield can be obtained by inactivation of the protease at acidic pH, while an excess of acyl donor precursor seems favorable [46]. Nevertheless, examples of mixed thermodynamic and kinetic control were reported, such as the synthesis of the tripeptide Z-Ala-Phe-Leu-NH2 by chymotrypsin, which was present in aqueous reverse micelles as activity-enhancing microreactors [47].
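The competition between aminolysis and hydrolysis of the acyl intermediate can be illustrated with a minimal partitioning model. The Python sketch below is a simplification under stated assumptions, not a description of any cited experiment: only the deprotonated amine acts as nucleophile, hydrolysis is treated as pseudo-first-order because water is in large excess, and secondary hydrolysis of the product is ignored. The partition constant (kH/kA) and all numerical values are hypothetical.

```python
def free_amine_fraction(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch fraction of the amine nucleophile that is deprotonated."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def aminolysis_fraction(nucleophile_molar: float, pka: float, ph: float,
                        partition_constant_molar: float) -> float:
    """Fraction of acyl-enzyme converted to peptide, i.e. vA / (vA + vH).

    partition_constant_molar is kH / kA: the free-amine concentration at
    which aminolysis and hydrolysis proceed at the same rate.
    """
    free_amine = nucleophile_molar * free_amine_fraction(pka, ph)
    return free_amine / (free_amine + partition_constant_molar)


if __name__ == "__main__":
    # All numbers below are hypothetical and chosen only for illustration.
    for ph in (6.0, 8.0, 10.0):
        frac = aminolysis_fraction(0.1, pka=8.0, ph=ph,
                                   partition_constant_molar=0.05)
        print(f"pH {ph:4.1f}: aminolysis fraction = {frac:.2f}")
```

In this simplified picture, raising the pH above the pKa of the nucleophile or increasing its concentration shifts the ratio vA/vH toward aminolysis, which matches the qualitative statement about higher basic pH.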
Ideally, the protease specificity ensures few side products without the need for protecting groups and, in particular, the proper stereochemistry of natural proteins, consisting exclusively of L-amino acids, in contrast to organic synthesis with the ever-present risk of racemization and increasingly lower yields with each additional peptide residue [48]. Basically, any natural protease has the potential to synthesize peptide bonds according to its specificity for the amino acid residues around the scissile or to-be-ligated bond. An overview of the most frequently used proteases as ligases is displayed with respective catalytic and specificity-determining components in Figure 3. However, some proteases turned out to be more suitable under kinetic control, in particular serine and cysteine proteases that form acyl intermediates. Interestingly, the S1 specificity plays a crucial role in enhancing the nucleophilicity of carboxyamide precursors such as H2N-Arg-CO-NH2 with acyl intermediates formed by Mal-Phe-OMe esters with α-chymotrypsin, while the known S3 specificity for Arg side chains might be further exploited [49,50]. Experiments in the late 1980s and early 1990s focused on the synthesis of physiologically relevant peptides and small hormones; e.g., the cysteine protease papain and α-chymotrypsin were used to generate the δ-sleep inducing peptide from three tripeptides [48,51]. Further examples are the ligation of two segments to gonadotropin-releasing hormone by chymotrypsin and human growth-hormone releasing factor (GRF) by trypsin, respectively [52,53]. In addition, enzymatic modifications of polypeptides and proteins, which were obtained from recombinant expression, have been performed. Thus, it was possible to convert GRF(1-29)-COOH into the more active GRF(1-29)-CO-NH2 with NH4OAc/NH3 in a 90% 1,4-butanediol solution by trypsin [54].
In addition, the zinc metalloprotease thermolysin from Bacillus thermoproteolyticus and the mammalian serine protease elastase were successful catalysts in coupling amino acid amides, such as Tyr-CO-NH2, to human neuropeptide Y (1-35) [55]. Similar modifying syntheses were performed with the serine proteases carboxypeptidase Y from yeast, V8/endoproteinase Glu-C (Staphylococcus aureus), and the Glu/Asp-specific endoprotease (GSE, Bacillus licheniformis) [56–58]. The previously mentioned enzymes were utilized in the more demanding semisyntheses of larger proteins. An outstanding example is the semisynthesis of the Hemoglobin A apo form, which was linked by V8 protease catalysis at positions 30-31 in 30% iso-propanol, resulting in the 141-residue full-length chain and subsequent reconstitution with a heme group [59]. Tryptic cleavage at Lys15-Ala16 and removal of the dipeptide Ala16-Arg17 by aminopeptidase K allowed incorporating the dipeptides Gly-Arg, Ala-Lys, and Leu-Arg in functional BPTI variants [60].
Further steps toward true ligases were the chemical conversion of the serine protease subtilisin from Bacillus subtilis to thiolsubtilisin and selenolsubtilisin, in which the catalytic Ser221 was replaced by cysteine and selenocysteine, respectively (Figure 4A) [61,62]. While the Cys221 variant displayed a strong enhancement of the ligase character in 1:1 mixtures of water and dimethylformamide (DMF), the Sec221 variant appeared to be an even better acyl transferase, increasing aminolysis with respect to Ser and Cys subtilisin by factors of 1000 and 10, respectively [63]. However, the catalytic Sec221 is prone to oxidation and forms the seleninic acid R-SeO2H, as observed in the crystal structure, which most likely impedes applications of selenolsubtilisin as ligase [64]. Recombinant engineering of subtilisin led to the thiolsubtilisin variant Pro225Ala, with a 450-fold increase of the ratio of aminolysis to hydrolysis, while additional mutations increased the stability in DMF and aqueous solution 50- to 100-fold [65,66]. Further studies of modified subtilisin variants resulted in the so-called subtiligases, which are discussed in the following Section 3.1.
In particular, the group around Jakubke demonstrated that low temperatures in aqueous solution, with an optimum below −10 °C in the frozen state, favored the formation of peptide bonds by α-chymotrypsin and β-trypsin in kinetically controlled reactions [67,68]. Similarly, porcine elastase synthesized peptide bonds according to its P1-specificity for Ser, Ile, and Val, while a bacterial Glu/Asp-specific endopeptidase synthesized Glu/Asp-Xaa bonds around −15 °C [69,70]. In addition, the group found significant ligase activity by the zymogens trypsinogen and chymotrypsinogen, e.g., with increased binding of nucleophiles in kinetically controlled reactions [71].

Developments in the 21st Century
Some routes employing trypsin and chymotrypsin as ligases were followed further, as in kinetically controlled reactions in the frozen state, resulting in the suppression of hydrolysis and an increase of yields [68]. Established protease-ligases such as subtilisin and V8 protease from Staphylococcus aureus, which is trypsin-like with P1-Glu specificity, were used to demonstrate that macromolecular crowding agents, such as polyethylene glycols (PEGs) and dextran, enhanced the ligase reaction for the synthesis of triose-phosphate isomerase [72]. Similarly, subtilisin Carlsberg catalyzed the condensation of a 15-residue glycopeptide from a tripeptide as acyl acceptor and a 12-mer as acyl donor in a mixture of water and DMF (1:9) [73].

Subtiligase
Bacterial subtilisins were already structurally characterized in the early 1970s as subtilisin BPN′ from Bacillus amyloliquefaciens, which exhibits the α/β-hydrolase fold of serine proteases [74]. A good example demonstrating that subtilisin BPN′ can ligate peptide bonds is the resynthesis of the scissile bond in chymotrypsin inhibitor 2 (Figure 4B) [75]. The significantly reduced hydrolytic activity of subtilisin BPN′ in organic solvent mixtures, such as 50% DMF, was explained by corresponding crystal structures, with His64 rotated by 180° around the Cβ-Cγ bond, thereby disrupting the catalytic triad with Asp32 and Ser221 [76]. Engineered thermostable variants became increasingly important due to their capacity for protein degradation in laundry detergents [77]. As already mentioned, subtilisin variants gained interest as protein ligases from the mid-1990s onwards. Thus, the subtilisin BPN′ double mutant Ser221Cys/Pro225Ala allowed synthesizing full-length Ribonuclease A (124 residues) from six esterified peptides in aqueous solution, with a ligation efficiency of about 70% in each step and final milligram yields [78]. Early applications were the ligation of various proteins preferentially at their N-terminus with biotin or PEG-linked peptides, i.e., their esterified precursors [24]. Among the precursor peptides, glycolate phenylalanyl esters and benzyl thioesters could be linked with free amino termini of peptides by subtiligases [79]. Phage display resulted in additional mutations, such as Met124Leu/Ser125Ala, which increased the activity nearly 3-fold (Figure 4C) [80]. It was pointed out that subtiligase favors thioester binding associated thiolysis with a broad substrate specificity over hydrolysis, in particular at low pH around 4.5 [81]. Subtiligase facilitated N-terminal tagging for proteomics, either combined with click chemistry-based derivatization for nanomolar concentration quantification of products or with high affinity binding of biotin/avidin for the analysis of human blood proteins [82,83].
Moreover, subtiligase-catalyzed labeling contributed to the investigation of the so-called α-aminome, whereby a database of proteolysis in healthy and apoptotic cells was established [84]. Mutants of subtiligase improved the efficiency in ligating N-terminally Cys-free peptides to protein thioesters and showed a significantly higher ligation rate for Glu at P1′ with the Tyr217Lys mutant (Figure 4C) [85]. Recently, an extensive engineering and selection process yielded more than 70 subtiligase mutants with differential specificity around the scissile/ligated bond, allowing for the N-terminal extension of most P1-P1′ combinations of amino acid residues [86,87]. Known as peptiligase or omniligase-1, a subtilisin variant ensured the cyclization of θ-defensins or retrocyclins, antimicrobial and fungicidal peptides with 18 residues, which are expressed in Old World monkeys, but not in the great apes or in humans [88,89]. Thymoligase is a further engineered and structurally determined peptiligase, which can synthesize the therapeutic polypeptide thymosin-α1 (28 aa long) with yields > 90% [90]. It deviates from subtiligase by several mutations such as an Asn225, while Asn156 and Asp166 at S1 and Arg217 at S1′ shift the specificity toward basic P1 and acidic P1′ residues.

Sortases
Although several bacterial transpeptidases were known before, the characterization of sortase A from Staphylococcus aureus in 1999 was a milestone for enzymes that can act as proteases and peptide ligases [91]. Sortase A cleaves polypeptides between the threonine and the glycine of the Leu-Pro-Xaa-Thr-Gly (LPXTG) motif, i.e., the sorting signal, and transfers the acyl intermediates to the bacterial cell wall. Thus, the switch from protease to ligase activity is integrated in the overall transpeptidase function. The NMR structure revealed a β-barrel structure, apparently with a catalytic dyad consisting of Cys184 and His120, and an activity-stimulating Ca2+-binding site (Figure 5A) [92]. X-ray crystallography provided a corresponding ligand-free enzyme structure and a substrate complex (Figure 5B,C) [93]. It turned out that nearly all Gram-positive bacteria possess one or more sortases of different classes: while sortase A (SrtA) uses LPXTG, SrtB has a different sorting motif, e.g., NPQTN in S. aureus, whereas SrtC and SrtD appear to be more specialized, such as targeting LPNTA (Figure 5C) [94].
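The recognition and cleavage rule for the SrtA sorting signal can be sketched in a few lines of Python. The sketch only encodes what is stated above, namely the LPXTG motif and cleavage between Thr and Gly; the function name and the example sequence are hypothetical, and real substrate recognition depends on more than the bare motif.

```python
import re
from typing import List, Tuple

# LPXTG sorting signal in one-letter code; X stands for any residue.
LPXTG = re.compile(r"LP[A-Z]TG")


def sortase_a_sites(sequence: str) -> List[Tuple[int, str, str]]:
    """Return (motif_start, acyl_part, leaving_part) for each LPXTG motif.

    The acyl part ends with ...LPXT and would form the acyl intermediate;
    the leaving part starts with the Gly of the motif.
    """
    seq = sequence.upper()
    sites = []
    for match in LPXTG.finditer(seq):
        cut = match.start() + 4  # cleavage between Thr and Gly
        sites.append((match.start(), seq[:cut], seq[cut:]))
    return sites


if __name__ == "__main__":
    # Hypothetical tagged construct, used only to illustrate the rule.
    for start, acyl_part, leaving_part in sortase_a_sites("MKHHHHHHLPETGGSAA"):
        print(start, acyl_part, leaving_part)
```

Other sorting motifs mentioned in the text, such as NPQTN for SrtB, could be handled the same way by swapping the pattern, with the caveat that the cleavage position would have to be taken from the respective literature.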

Early studies demonstrated that SrtA could ligate polypeptides, e.g., enhanced green fluorescent protein (eGFP) was linked to dimers and higher oligomers, and it could attach PEGylated peptides to eGFP [95,96]. Moreover, peptide nucleic acids were ligated to other polypeptides, and peptides were conjugated to ethanolamine-linked glycosylphosphatidylinositol, corresponding to natural GPI anchors [97,98]. Apart from the artificial sortase-specific linker, various chimeric proteins were built from domains that kept their full functionality, such as GFP bound to immunoglobulin G (IgG) molecules [99]. It was demonstrated that GFP "sortagging" of various protein targets was possible in the cytosol of yeast and HEK293 cells [100]. The ligation reaction catalyzed by SrtA was exploited in various approaches, among them the generation of chimeric proteins that were linked to peptidic precursors for click reactions or of an immunotoxin consisting of a Fab fragment and gelonin, which was directed against αHer2 as tumor-specific target, resulting in strong cell killing activity [101,102]. Using LPET or Gly-rich motifs, SrtA was capable of ligating cyclic SFTI-1 and cyclotide variants, with proper disulfide formation [103]. SrtA conjugated camelid nanobodies in high yields to labels for single-photon emission computed tomography (SPECT) with indium 111, positron emission tomography (PET) with gallium 68, and the fluorescent dye Cy5 for fluorescence reflectance imaging (FRI) [104].
Similarly, SrtA ligation and corresponding labeling facilitated NMR studies of the PSD-93 and -95 oligomers and their interactions with PDZ3/SH3-GK, which are important for the assembly of the mega-N-methyl-D-aspartate receptor synaptic signaling complex at glutamatergic synapses [105]. Furthermore, SrtA was part of an immobilization strategy for highly specific receptor proteins on sensor chips used in biolayer interferometers, as well as for immobilization on magnetic nanoparticles, employed in GFP-label single-molecule fluorescence spectroscopy [105,106]. Triple to hepta mutants of sortase showed enhanced performance for in vivo labeling and domain linking due to Ca2+-independent activity [107,108].

Legumains
Whereas the caspase-related legumains or asparaginyl endopeptidases (AEPs) of animals cleave various protein targets in lysosomes at acidic pH with a distinct specificity for P1-Asn and Asp residues, their plant counterparts diverged into proteases and enzymes that preferentially ligate peptide bonds (Figure 6A,B) [25,109]. The latter reaction is favored when cyclic polypeptides are formed, such as the prototypic sunflower trypsin inhibitor (SFTI-1), which comprises 14 residues and an internal disulfide bridge [110]. Similarly, several plant families produce cyclotides, comprising 28 to 37 residues and a so-called cysteine knot, which is formed by three internal disulfides [111]. Both cyclotides and cyclic SFTI molecules appear to be host defense molecules with antimicrobial and insecticidal properties, which have gained interest as potential new pharmaceutical drugs [112]. The cyclization takes place after the cleavage of the C-terminal pro-peptide with a P1-Asn from the cyclotide precursor protein through a transpeptidation reaction, which is even catalyzed in foreign cyclotides by AEPs from Nicotiana benthamiana [113]. The transpeptidation reaction involves an acyl-transfer step from the acyl-AEP intermediate to the N-terminal residue of the cyclotide domain. In the context of plant legumains, some publications occasionally use the term "butelase"; however, these enzymes are closely related, true AEP/ligases of the legumain family [114].
Figure 6. Legumain ligases. (A) Human legumain structure as surface representation. The catalytic Cys189 is labeled, as well as the specificity pockets from S4 to S2′ (PDB 4AW9). Residues that shape these pockets are colored as hydrophobic (green), basic (blue), and acidic (red). (B) Active site of plant γ-legumain from A. thaliana, shown without the so-called LSAM domain (PDB 5NIJ). The catalytic triad of Cys189, His148, and Asn42 is depicted as ball-and-stick models, as well as residues that shape the S1 subsite (Glu187, Asp231, Arg44, while His45 was omitted) and the S1′ pocket (Glu190). (C) Plant γ-legumain displayed as an acyl intermediate model based on the chloromethyl ketone complex (PDB 5OBT). The P1-Asn is well accommodated by the charged residues in the S1 pocket. The catalytic water is depicted as a red sphere, whereby its hydrolytic reactivity depends on the presence of residues such as a Val in the prime side of human legumain. In plant legumains, it can be replaced by a Gly, resulting in a larger hydrophobic region, bordered by Val182, Tyr190, and Tyr192, favoring peptide ligase activity.
As already known for human legumain, in which an Asp near the active site is converted to a succinimide in the backbone, a shift from acidic pH ≤ 5 to neutral pH ≈ 7 suffices to favor the cyclization activity in plant AEPs as well [115,116]. For both human and plant legumains, the major factor of the switch from protease activity to ligase activity is neutral pH; however, this is predominantly dependent on the respective tissue expression. Based on structures of the proform and mature γ-legumain from Arabidopsis thaliana, functional assays and molecular dynamics suggested a hydrophobic residue in position 184 (Gly in human legumain) near the catalytic water as a major mechanistic determinant of the ligase, enhanced by an interplay with the P2′-residue of the substrate (Figure 6C) [117,118]. A comparison of several plant AEPs identified a loop close to the active site as an additional ligase enhancer, the so-called marker of ligase activity (MLA) [119]. Structural studies on AEPs from the Violaceae family identified the determinants of the ligase activity near the specificity subsite S1 as LAD1, including the central gatekeeper residue Gly(184) or Ile/Val/Cys/Ala in AEP-ligases, and near S2 and S1 as LAD2, which is mostly a Gly-Pro pair, respectively [120]. Although first approaches toward engineered AEP ligases for special peptide targets by plant and E. coli expression have been made, their potential seems limited to the special case of cyclic peptides with certain sequence requirements [121]. Recently, it was shown that the predominantly proteolytic β-legumain from Arabidopsis can easily ligate linear precursors for SFTI variants, corroborating the general idea of proteases catalyzing both the forward and the backward reaction [122]. However, a computational approach using quantum mechanics/molecular mechanics (QM/MM) calculations for human legumain indicated that two different backward reactions of the ligation are possible [123].

Trypsin and Trypsiligase
The interaction of trypsin with the above-mentioned inhibitors, e.g., SFTI, demonstrates the potential of trypsin as a ligase in one of the best-studied examples. An NMR study confirmed the very rigid structure of SFTI bound to the active site of trypsin, exhibiting only marginal conformational deviations with respect to the crystal structure (Figure 7A,B) [124]. It was also shown that trypsin resynthesizes the open P1-P1′ bond of the acyclic SFTI-1 [5,6] permutant, while it cleaves this scissile bond in cyclic SFTI-1, until in both cases, the equilibrium is reached with a ratio of 9:1 (cyclic/open) [125]. Interestingly, polymer-supported trypsin and chymotrypsin successfully linked the linear precursors of various chemically synthesized cyclotides, which in turn inhibited the free proteases in the picomolar range [126].
Figure 7. (A) The prototypic bovine β-trypsin structure (PDB 1TLP). The catalytic triad and specificity-determining residues are shown as ball-and-stick models. (B) Trypsin-bound sunflower trypsin inhibitor-1 (SFTI-1) in the active site with a scissile bond that is religated in the thermodynamic equilibrium (PDB 1SFI). The major specificity determinant is P1-Lys, while several prolines and an internal disulfide rigidify the cyclic inhibitor. (C) Trypsiligase derived from rat trypsin II with the four crucial mutations (orange) depicted as ball-and-stick side-chains (PDB 4NIV). The crystal structure shows a zymogen-like conformation, with large disordered regions around the catalytic Ser195, the S1 pocket and the 148-loop. Only the Lys60Glu mutation could be confirmed by electron density. The mutant residues Asn143His, Tyr151His, and Asp189Lys were modeled according to the positions in mature bovine trypsin.
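As a worked illustration of the 9:1 (cyclic/open) equilibrium reported above for SFTI-1, the apparent equilibrium constant and the corresponding standard free energy difference can be estimated as follows. The temperature is not stated in the text, so T = 298 K is an assumption, and the symbols K, x and ΔG° are introduced here only for illustration.

```latex
K = \frac{[\text{cyclic}]}{[\text{open}]} = \frac{9}{1} = 9, \qquad
x_{\text{cyclic}} = \frac{K}{1+K} = 0.9
\\
\Delta G^{\circ} = -RT \ln K
= -\left(8.314\ \mathrm{J\,mol^{-1}\,K^{-1}}\right)\left(298\ \mathrm{K}\right)\ln 9
\approx -5.4\ \mathrm{kJ\,mol^{-1}}
```

Under this assumption, the religated cyclic form is favored by only a few kJ/mol, which is consistent with both the cyclic and the open species remaining detectable at equilibrium.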
A first step toward a trypsin-based ligase was the trypsin mutant Asp189Glu, with a significantly improved ligase activity, followed by the more efficient triple mutant Lys60Glu/Asp189Ser/Asp194Asn of trypsin [127,128]. In addition, changes at the oxyanion hole of trypsin reduced the hydrolytic activity and improved the ligase efficiency, in particular for the mutant Gln192Pro with protein substrates [129]. The anionic rat trypsin II quadruple mutant Lys60Glu/Asn143His/Glu151His/Asp189Lys, termed trypsiligase, adopts a zymogen-like state with a disordered activation domain in the absence of ligands, whereas an active and ordered conformation is induced by inhibitor binding (Figure 7C) [130]. Similar to sortases and bacterial transglutaminases, trypsiligase catalyzes N-terminal linking of antibody Fab fragments in high yields [131]. The proteolytic activity of this trypsin variant is strictly limited to the cleavage sequence Tyr-Arg-His, which has a relatively rare occurrence of 0.5% in the human proteome, allowing for the successful N- and C-terminal ligation of Fab fragments with PEG and carboxyfluorescein [132]. A recent overview was published as a book chapter with detailed protocols for the usage of trypsiligase in polypeptide and protein synthesis [133].
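The restriction of trypsiligase to the Tyr-Arg-His sequence suggests a simple check of whether a target protein carries the motif at all. The following is a minimal sketch in Python. The function names `find_motif` and `fraction_with_motif` and the toy sequences are hypothetical and not taken from the cited work. How the 0.5% figure was derived is not stated here, so the second function shows only one possible way to define such a frequency.

```python
from typing import Dict, List

# One-letter code for Tyr-Arg-His, the recognition sequence named in the text.
MOTIF = "YRH"


def find_motif(sequence: str, motif: str = MOTIF) -> List[int]:
    """Return 1-based start positions of every (possibly overlapping) motif hit."""
    seq = sequence.upper()
    positions: List[int] = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to residue number
        start = seq.find(motif, start + 1)
    return positions


def fraction_with_motif(proteins: Dict[str, str], motif: str = MOTIF) -> float:
    """Fraction of sequences containing the motif at least once (assumed definition)."""
    if not proteins:
        return 0.0
    hits = sum(1 for seq in proteins.values() if find_motif(seq, motif))
    return hits / len(proteins)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteins.
    toy = {
        "tagged_construct": "MKTAYRHGSLE",
        "untagged_construct": "MKTAGSLEQQ",
    }
    for name, seq in toy.items():
        print(name, find_motif(seq))
    print(fraction_with_motif(toy))
```

In practice, the dictionary would be filled from a proteome FASTA file. The scan reports only where the motif starts and makes no assumption about the exact position of cleavage or ligation within it.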

Conclusions
Protease-derived peptide ligases cannot compete with recombinant expression in the area of protein synthesis. Nevertheless, they gain utility as increasingly valuable tools in combined approaches of organic polypeptide synthesis, which generates building blocks of up to 50 residues, and the covalent linkage by various ligases. An overview of the most relevant protease-ligases with in vitro reactions is shown in Table 1. In addition, ligases are most efficient in a variety of protein-modifying reactions, by attaching natural and synthetic peptides, molecular labels, glycans, membrane anchors, reactive organic molecular moieties, drugs, or even protein domains.
Table 1. Structural and functional parameters of ligating proteases. PDB codes, ligated bond (often involving modified amino acids or peptides), reaction conditions, and yields are shown. The listed enzymes were regularly used to catalyze peptide bond formation and for linking proteins with modifying molecules of various types. 1 HFIP, hexafluoroisopropyl alcohol; 2 the yield is estimated from a chromatogram; 3 Selenolsubtilisin C derived from subtilisin Carlsberg transfers a cinnamoyl acyl intermediate to H2N-Gly; 4 yield estimated from a 4-times higher rate constant compared to thiolsubtilisin.
The incorporation of 4-fluorohistidine as a catalytic residue in RNase based on subtiligase ligation is one of the most stunning historical examples of the usefulness of protein ligases [78]. Nowadays, the rapidly evolving field of biological click chemistry is based on the apparently unlimited usage of non-natural amino acids (nnaa or ncaa for non-canonical) with numerous applications ranging from biophysics to analytical and pharmaceutical research. In combination with protease-derived ligases, the tedious recombinant expression of proteins for the incorporation of nnaas might be facilitated or circumvented.
It remains to be seen how useful the more advanced, engineered protease-ligases can become outside labs performing basic research, such as in biotechnological and pharmaceutical applications. At least in theory, sortases, subtiligases, and trypsiligases should allow the modification of any protein that contains the required amino acid sequences for the specific ligation reaction. Thus, labels for monitoring and warheads for disabling disease-related enzymes could be linked to highly specific therapeutic proteins in an easy and elegant manner.
Funding: This study was supported by the Austrian Science Fund (FWF) with the D-A-CH grant I 3877-B21.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this review article.