Structural and Functional Modeling of Artificial Bioactive Proteins

Abstract: A total of 32 synthetic proteins designed by Michael Hecht and co-workers was investigated using standard bioinformatics tools for the structure and function modeling. The dataset consisted of 15 artificial α-proteins (Hecht_α) designed to fold into 102-residue four-helix bundles and 17 artificial six-stranded β-sheet proteins (Hecht_β). We compared the experimentally-determined properties of the sequences investigated with the results of computational methods for protein structure and bioactivity prediction. The conclusion reached is that the dataset of Michael Hecht and co-workers could be successfully used both to test current methods and to develop new ones for the characterization of artificially-designed molecules based on the specific binary patterns of amino acid polarity. The comparative investigations of the bioinformatics methods on the datasets of both de novo proteins and natural ones may lead to: (1) improvement of the existing tools for protein structure and function analysis; (2) new algorithms for the construction of de novo protein subsets; and (3) additional information on the complex natural sequence space and its relation to the individual subspaces of de novo sequences. Additional investigations on different and varied datasets are needed to confirm the general applicability of this concept.


Introduction
Proteins are molecules fundamental to life, and their biological function is driven by their structure [1,2]. The modeling of the protein structure and function relationship is important and challenging [1,2]. Recent progress in protein biochemistry and biophysics has enabled the construction of artificial (de novo) proteins with specific properties [1-3]. The predominant part of the possible protein sequences and structures not tested by evolution may be evaluated by de novo protein design, which could provide solutions to new protein-structure/function targets [2-6]. The goal of designing de novo proteins is to construct molecules that structurally and functionally mimic natural proteins and to discover new structure-function relationships compared with those found in nature [2,6].
The sequence space of proteins is huge and complex [1,2]. It has evolved in time influenced by the evolutionary processes of selection and mutation [1,2]. By contrast, the subsets of de novo designed protein sequences, including the one tested in this research, are limited, internally consistent, of high sequence identity and artificially designed for a specific purpose. There is no standardized dataset of artificial proteins for the testing of bioinformatics algorithms, in contrast to the naturally-occurring proteins [7]. However, there are several sets of well-characterized artificial proteins that may be used for the testing of algorithms concerning protein structure and function [1-6]. In our research, we tested standard bioinformatics methods using the de novo protein subset of Hecht et al. [3,6,8-12]. It represents a well-characterized and sufficiently large subset of 15 synthetic α- and 17 β-proteins (Hecht_α and Hecht_β) [3,6,8-13]. A dataset of this size was sufficient for this study due to the small number of variables analyzed in comparison to the number of members of each class (Hecht_α and Hecht_β) [14]. For the structural characterization of the dataset, Hecht and co-workers used size-exclusion chromatography, NMR and circular dichroism spectroscopy (CD), X-ray crystallography, electrospray mass spectrometry (ESMS), differential scanning calorimetry (DSC) and several other methods [3,6,9,12,15-21]. Michael Hecht and his co-workers [8-10,22] were the first to design functional de novo protein structures, as well as simple algorithms based on binary polar (p) and nonpolar (n) amino acid patterning. They showed that complex molecular structures could be made using amino acid polarity patterns pnppnnp and pnpnpnp that define stable α-helices and β-strands, respectively [8-10]. Hecht_α proteins were not explicitly designed to be functional, but recent studies have shown that they also provide the biological functions necessary to maintain cell growth where genes encoding enzymes essential for amino acid biosynthesis have been deleted [6,11]. They have been named SynRescue proteins [3,6,11].
Information 2017, 8, 29; doi:10.3390/info8010029; www.mdpi.com/journal/information
The existing methodology for protein structure-function analysis has been derived and tested using natural proteins, so that, as a rule, the construction patterns for different de novo protein subsets have been extracted from the natural sequence space [1,2]. Until recently, due to the small number of non-natural proteins, it was not possible to analyze or test the standard bioinformatics methods on sufficiently large datasets. The overall goal of the paper is to investigate whether standard bioinformatics algorithms are applicable for efficient and accurate structure-function modeling of the synthetic protein subsets Hecht_α and Hecht_β. The applicability of the presented methodology will be discussed considering de novo protein subsets recently reported by Woolfson and co-workers [4,5,23-25] and Baker and co-workers [1,26].

Results and Discussion
The de novo protein design of Hecht et al. [9,10,12] is based on the empirical observation that the second base of the nucleotide triplet of the genetic code (Table A1) specifies amino acid polarity, i.e., the second U/T of the codon specifies nonpolar amino acids in the selected protein α-helices and β-sheets, while the second A of the codon specifies polar amino acids (Section 3.1). The dataset of 32 newly-designed α- and β-proteins (Hecht_α and Hecht_β) consisting of well-defined and stable structures is the first sufficiently large subset of synthetic sequences that can be used for the testing of standard bioinformatics models and algorithms derived from the natural protein sequences (Tables S1 and S2) [3,6,9,12,13].
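The relation between the second codon base and residue polarity can be checked directly against the standard genetic code. The following is a minimal sketch, not the design procedure of Hecht et al.; the helper name `residues_by_second_base` is hypothetical, and the codon table is the standard one written in T, C, A, G order.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (first, second and third base each cycle through T, C, A, G).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def residues_by_second_base(base):
    """Return the set of residues (or '*' for stop) encoded by codons N-base-N."""
    return {aa for codon, aa in CODON_TABLE.items() if codon[1] == base}


# Second base T (U in RNA): exclusively nonpolar residues.
print(sorted(residues_by_second_base("T")))
# Second base A: polar residues; stop codons ('*') are removed for display.
print(sorted(residues_by_second_base("A") - {"*"}))
```

Note that the second-A group also contains two stop codons (TAA and TAG), so a practical library would need to exclude them; how this is handled is outside the scope of this sketch.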
First, we will investigate: 1. computational techniques to detect periodicity in Hecht_α helices and Hecht_β sheets and hydrophobicity values assigned to the individual amino acids along different protein structural segments [27,28]; 2.
We will also analyze specific functional aspects of the artificial α-protein sequences that arise from the ligand-receptor interplay of their 3D structure and the reactive patterns of their natural ligands (heterogens) [37-39]. This second aspect of the protein structure-function relationship will be inspected utilizing: 1. the protein virtual spectroscopy technique, i.e., the informational spectrum method (ISM), based on the amino acid electron-ion interaction potential (EIIP) [40-44]; 2. the 3DLigandSite method that uses predicted Hecht_α and Hecht_β protein structures and the ligands present in homologous natural structures to predict ligand binding sites [38,39].

Spectral Analysis of Hecht_α and Hecht_β Proteins
We inspected the periodicity in Hecht_α (Table S1) and Hecht_β (Table S2) protein structures using the method of Cornette et al. [27], which is based on the results of 38 published hydrophobicity scales compared for their ability to identify the characteristic 3.6 residue period of α-helices (Table A2). As suggested by the authors, we applied the normalized PRIFT scale since this technique maximizes the amphipathic index of the Fourier transform [27]. In addition to the Fourier sequence analysis, we also performed an alternative least-squares spectral analysis (LSSA) that, for short peptide components, provides a more reliable estimate of periodicity [27].
Both techniques of spectral analysis led to the same result and confirmed that one pronounced frequency peak at the position x = 0.28 characterizes all Hecht_α proteins (Table 1). Hecht_β proteins are characterized by another pronounced frequency peak at a different position, x = 0.45, which enables simple and accurate virtual spectroscopy screening of both artificial protein classes. Moreover, the peak positions of both artificial protein classes presented in Table 1 and Figure 1 are in marked agreement with previously-published results for natural proteins [27,45].
The value of 0.28 (i.e., 101°) that we measured for Hecht_α proteins is in close agreement with the finding of Eisenberg et al. [45] that 157 segments of α-helix exhibited a peak at 100°. The α-structural repeat of 3.6 residues/turn, approximated by the polar and nonpolar residue pattern pnppnnp [9], is obtained from the dominant peak value as 360°/101° ≈ 3.6 (calculated according to Cornette et al. [46]). For Hecht_β proteins, we identified one distinct frequency peak at 0.45 (i.e., 162°), which is confirmed by the maximum peak at 160° reported by Eisenberg et al. [45] for the average of 220 strands of β-structure. Therefore, it is not surprising that the methods of Eisenberg et al. [45] and Cornette et al. [27] predict the same peak positions for the tested proteins.
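The conversion between peak frequency, angle per residue and residues per turn can be illustrated with a small sketch. It is not the PRIFT-based procedure of Cornette et al.: the sequence is an idealized repeat of the binary pattern pnppnnp with an assumed 0/1 encoding, and the function names are hypothetical.

```python
import cmath
import math


def power_spectrum(values):
    """Return (frequency, power) pairs of the mean-centred DFT for k = 1..N//2."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * j / n) for j, v in enumerate(centred)
        )
        spectrum.append((k / n, abs(coeff) ** 2))
    return spectrum


def peak_frequency(values):
    """Frequency (cycles per residue) with the highest power."""
    return max(power_spectrum(values), key=lambda fp: fp[1])[0]


def angle_and_period(frequency):
    """Convert cycles/residue into (degrees per residue, residues per turn)."""
    return 360.0 * frequency, 1.0 / frequency


# Idealized helix: four repeats of the pattern, nonpolar = 1, polar = 0 (assumed encoding).
helix = [1.0 if c == "n" else 0.0 for c in "pnppnnp" * 4]
f_helix = peak_frequency(helix)
print(f_helix, angle_and_period(f_helix))

# Peak positions reported in the text for Hecht_α and Hecht_β proteins.
print(angle_and_period(0.28))
print(angle_and_period(0.45))
```

For the idealized pattern the dominant frequency is 2/7 cycles per residue (two turns per seven residues), which is close to the value of 0.28 reported above for the real sequences.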

[Table 1: Synthetic Proteins; Frequency Peak]

It is worth mentioning that although large datasets of natural proteins may be used to extract characteristic peaks for α- and β-structures, the procedure may not always work well for the natural proteins on an individual basis [27,45]. The absence of distinct peaks in one natural α- and one natural β-protein (of experimentally-determined structure [13,22]) is shown in Figure A3 (proteins 1cc5 and 1amg-2-AS, respectively). By contrast, all Hecht_α and Hecht_β proteins can be predicted individually (Table 1 and Figure 1), which enables fast structural screening by means of the computational techniques presented and the testing of new bioinformatics methods and algorithms for structural predictions. This is because distinct polarity patterns encode the well-defined artificial structure of all Hecht_α and Hecht_β proteins [3,9].

Hydropathy Analyses of Hecht_α and Hecht_β Proteins
The hydropathy of Hecht_α and Hecht_β proteins was investigated using a sliding block based on the Kyte-Doolittle scale (Table A2) [50]. This method sums up the hydrophobicity values of amino acid residues. It is often used for identifying surface-exposed regions, as well as transmembrane regions, depending on the size of the sliding block used, e.g., a short window of 7-9 is used for the exposed regions and a large window of 19-21 for the transmembrane regions [28,50,51].
Figure 2 shows the position of three predicted surface-exposed regions (P1, P2 and P3) of Hecht_α and Hecht_β proteins based on three-point moving average values of the nine amino acid sliding block.
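The sliding-block calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact pipeline used for Figure 2: the Kyte-Doolittle values are the published scale, while the toy sequence and the function names are hypothetical, and the most hydrophilic window is simply taken as a candidate exposed region.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=9):
    """Sum the hydropathy values inside each sliding block of `window` residues."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) for i in range(len(scores) - window + 1)]


def moving_average(values, span=3):
    """Smooth a profile with a `span`-point moving average."""
    return [sum(values[i:i + span]) / span for i in range(len(values) - span + 1)]


# Hypothetical toy sequence: hydrophobic, hydrophilic, hydrophobic stretches.
toy = "LLIVLLAVL" + "DKNGSDEKR" + "LLIVALLVI"
window, span = 9, 3
profile = moving_average(window_hydropathy(toy, window), span)

# Index of the most hydrophilic smoothed window and its central residue (0-based).
most_exposed = min(range(len(profile)), key=profile.__getitem__)
center = most_exposed + window // 2 + span // 2
print(most_exposed, center, toy[center])
```

Using the window sum rather than the window mean only rescales the profile, so the positions of the extrema are unchanged.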
Detected peaks of Hecht_α proteins predict combinatorial turn positions and possible antigenic sites. Figures 2a and 3a,b show that out of three predicted sites, the first and the third sites (P1 and P3) locate two important amino acids, 25 and 77, which precede residues at positions 26 and 78; these latter presumably stabilize the dimeric structure of Hecht_α SynRescue proteins with charged or hydrogen bonding groups [3]. This is valid for all 15 Hecht_α SynRescue proteins presented in Figure 2a and confirms the results of Murphy, Greisman and Hecht regarding Hecht_α-protein structure [3]. The presumed interactions between P1 and P3 regions indicate that there could be an antigenic site of SynRescue proteins accessible at the interaction-free P2 position, which is predicted by several methods to be in the vicinity of region 53-59 (Figure 3a-c), i.e., near the turn between helix 2 and helix 3 [3,6], while the amino acids at or near positions 26 and 78 could influence dynamic structures that fluctuate between monomeric and dimeric states [3]. Hydropathy analyses of Hecht_α and Hecht_β proteins based on the PRIFT scale of Cornette et al. [27] predicted the same bioactive sites as the Kyte-Doolittle scale (Figure 2).
A typical 3D structure of Hecht_β protein (#17) is presented in Figure 4. The analysis of Hecht_β proteins suggests the existence of three surface-exposed regions corresponding to turn 1 (P1), turn 4 (P2) and turn 5 (P3). These positions are predicted to be antigenic regions (Figure 4b). The predicted probability of antigenicity [31] and solubility upon overexpression in Escherichia coli [35,36] for all Hecht_α and Hecht_β proteins is presented in Table A3. Those methods enable fast and simple virtual screening for desirable properties [31,35,36]. All 17 Hecht_β proteins and 13 of 15 Hecht_α proteins were predicted to be soluble, using the SOLpro and Periscope methods (Table A3) [35,36]. As for Hecht_α SynRescue proteins, the SOLpro method predicted that two of them, SynIlvA2 and SynFes2, were insoluble (Table A3). Periscope, a recently-published method for the quantitative prediction of soluble protein expression in the periplasm of Escherichia coli, predicted SynIlvA2 and SynFes2 to be soluble. Regarding the solubility-function relationship, there was no significant difference in the bioactivity between SynIlvA2 and SynIlvA1 (Table A3, data by Fisher et al. [6]). The same is valid for SynFes2 and SynFes6, since both of them accumulated iron successfully [6]. The given data imply that the solubility prediction should be used with caution.
The negative correlation between SOLpro predicted solubility and ANTIGENpro predicted antigenicity was significant for Hecht_α SynRescue proteins (r = −0.596, p = 0.019), but it was insignificant for Hecht_β proteins (r = 0.242, p = 0.349). A very similar result was observed for the correlation between Periscope predicted solubility and ANTIGENpro predicted antigenicity, significant for Hecht_α SynRescue proteins (r = −0.795, p = 0.0004) and insignificant for Hecht_β proteins (r = 0.095, p = 0.717). Solubility has an impact on antigenicity, because poorly soluble or insoluble antigens tend to form aggregates [53]. Large, insoluble aggregates are more immunogenic than small, soluble molecules [53]. Specific modifications of the analyzed molecules may be obtained by amino acid mutations [6,20]. For example, when lysine mutations are introduced at the ends of the Hecht_β protein #45 in order to disfavor fibrillar structure formation and prevent oligomerization [20], the predicted solubility increases slightly, and the spectral peak indicates a small shift (Figure A4).
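The reported significance levels can be cross-checked from the correlation coefficients and the class sizes (15 Hecht_α and 17 Hecht_β proteins). The sketch below assumes a two-tailed Student's t-test with n − 2 degrees of freedom, which the text does not state explicitly; the function names are hypothetical and the tail area is obtained by plain numerical integration.

```python
import math


def t_statistic(r, n):
    """t value for a Pearson correlation r observed on n paired samples."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(r, n, steps=2000):
    """Two-tailed p-value, integrating the density on [0, |t|] with Simpson's rule."""
    df = n - 2
    t = abs(t_statistic(r, n))
    h = t / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 1.0 - 2.0 * area


for label, r, n in [
    ("SOLpro vs ANTIGENpro, Hecht_α", -0.596, 15),
    ("SOLpro vs ANTIGENpro, Hecht_β", 0.242, 17),
    ("Periscope vs ANTIGENpro, Hecht_α", -0.795, 15),
    ("Periscope vs ANTIGENpro, Hecht_β", 0.095, 17),
]:
    print(label, round(t_statistic(r, n), 3), round(two_tailed_p(r, n), 4))
```

Under these assumptions the computed p-values are consistent with those quoted above.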

Virtual Spectroscopy and 3D Ligand Binding Prediction of Hecht_α (SynRescue) Proteins
The key goal in synthetic biology is to design and produce novel proteins with a specific structure and function [3]. Screening of the third-generation computational libraries for de novo sequences that function in vivo yielded several Hecht_α sequences, termed SynRescue proteins [3,6], that rescue conditionally lethal mutants of Escherichia coli (auxotrophs) [3,6,11]. From a practical standpoint, it would be desirable not only to construct and test a large number of structural patterns in proteins, but also to predict the functional characteristics of the newly-designed molecules. To address this problem, we analyzed 15 SynRescue protein sequences using the informational spectrum method (ISM) [40-44]. This is a virtual spectroscopy method for structure-function analysis of proteins based on the amino acid electron-ion interaction potential (EIIP) in the Rydberg energy units [40-44]. According to the underlying theory, the highest peaks found using this type of analysis represent hot spots, i.e., bioactive parts, of the protein molecule [40-44].
Table 2 presents the results of virtual spectroscopy for seven Hecht_α SynRescue proteins that save Escherichia coli auxotrophs with disrupted amino acid enzyme pathways for serine (SerB), glutamate/glutamic acid (GltA) and isoleucine (IlvA).
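The core of the ISM calculation can be sketched in a few lines. This is a simplified illustration, not the software used for Table 2: the EIIP values are the commonly tabulated ones (in Ry), the sequences are hypothetical toy strings, the function names are assumptions, and the cross-spectrum is reduced to a product of amplitudes for sequences of equal length.

```python
import cmath
import math

# Electron-ion interaction potential (EIIP) of the 20 amino acids, in Ry.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def informational_spectrum(seq):
    """Return (frequency, power) pairs of the DFT of the EIIP signal, k = 1..N//2."""
    signal = [EIIP[aa] for aa in seq]
    n = len(signal)
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * j / n) for j, v in enumerate(signal)
        )
        spectrum.append((k / n, abs(coeff) ** 2))
    return spectrum


def dominant_frequency(seq):
    """Frequency of the highest peak in the informational spectrum."""
    return max(informational_spectrum(seq), key=lambda fp: fp[1])[0]


def cross_amplitude(seq_a, seq_b):
    """Product of spectral amplitudes for two sequences of equal length (simplified)."""
    assert len(seq_a) == len(seq_b)
    spec_a = informational_spectrum(seq_a)
    spec_b = informational_spectrum(seq_b)
    return [(fa, math.sqrt(pa * pb)) for (fa, pa), (_, pb) in zip(spec_a, spec_b)]


# Hypothetical toy sequences, not members of the Hecht_α library.
print(dominant_frequency("LD" * 20))
original = "MKLDEAFRKLLDEIHNGS"
mutant = "MALDEAFRKLLDEIHNGS"  # K2A substitution
shift = max(
    abs(p1 - p2)
    for (_, p1), (_, p2) in zip(
        informational_spectrum(original), informational_spectrum(mutant)
    )
)
print(shift)
```

Because the EIIP values of K and A differ by only 0.0002 Ry, a K to A substitution leaves the spectrum almost unchanged, so mutations of this kind cannot be resolved by this type of analysis.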

SynSerB Rescue Proteins
In their recent study, Digianantonio and Hecht [11] describe the mechanism by which SynSerB3 (Figure 5), a novel regulatory protein discovered in a library of Hecht_α SynRescue sequences, rescues knockout strains of Escherichia coli. The newly-constructed protein SynSerB3 provides the necessary function to maintain bacterial cell growth under conditions of serB gene deletion, which encodes phosphoserine phosphatase, an enzyme essential for serine biosynthesis [6,11]. The important conclusion made by Digianantonio and Hecht is that de novo proteins, based on the binary coding patterns of the amino acid polarity and a library of Escherichia coli sequences, can be used to drive adaptive changes in the gene expression [11]. However, to ensure the rescue function of the artificially-designed SynSerB protein sequences, Hecht and co-workers transformed a large library of 1.5 × 10^6 binary patterned de novo sequences into strains of Escherichia coli containing survival gene deletions [6,11].
Our results in Table 2, Figures 5 and 6 show that higher growth rates of auxotrophic/SerB knockout Escherichia coli strains [6] exerted by SynSerB3 and SynSerB1 Hecht_α proteins are characterized by two dominant frequency peaks at positions 0.45 (F/L92) and 0.15 (L30) of the periodograms and cross-amplitude (Figure 5a,b and Figure 6). By contrast, lower growth rates of auxotrophic/SerB knockout Escherichia coli strains [6] exerted by SynSerB4 and SynSerB2 proteins are characterized by the absence of one or both of the frequency peaks and both cross-amplitude peaks, respectively (Figure 5c,d and Figure 6).
The frequency peaks, i.e., hot spots, of SynSerB3 correspond to the first nonpolar residue of helix 2 (L30) and the central nonpolar residue of helix 4 (F92) [3]. The same results were obtained by least-squares spectral analysis (LSSA), as presented in Table 2. Consequently, ISM analysis of de novo protein sequences might be a useful supplementary procedure for the evaluation of potential bioactive sites and selection/virtual screening of novel protein nano-building blocks possessing specific functional characteristics [21].
Additional information related to the possible bioactive sites of the SynRescue protein may be obtained with 3DLigandSite. The method uses predicted de novo protein structures and the ligands present in homologous natural structures to predict ligand binding sites [38,39]. The reactive patterns of the presumed natural ligands (heterogens) may be particularly useful for reconstructing the biochemical pathways of the auxotrophs that the novel proteins could rescue. As an example, the predicted ligand binding sites of the higher activity rescue protein SynSerB3 are given in Figure 7.
Hecht and co-workers showed that their de novo α-helical proteins frequently exhibit biological functions including heme binding and peroxidase, esterase and lipase activities [6,10,54,55]. In addition to enzymes, some specific cofactors are involved in metabolic reactions, e.g., adenosine triphosphate (ATP), nicotinamide adenine dinucleotide (NAD), nicotinamide adenine dinucleotide phosphate (NADP), flavin adenine dinucleotide (FAD), and Fe-protoporphyrin IX (HEM, i.e., heme) [56,57].
When the binding sites of the higher activity rescue protein SynSerB3 in Figure 7 are compared to the lower activity SynSerB2 mutant, Figure 9 clearly shows that:

• in the SynSerB2 mutant, heterogen FMN binds to binding site 2 instead of SF4/Mg (this region is located between structurally-important stabilizing amino acid positions 26 and 78 [3] of SynRescue proteins, Figure 9a,b);
• FMN interaction with binding site 2 shifts the binding of SF4 to binding site 1, but the additional interaction with HEM (heme) and Fe2+ is missing (region 16-31, Figure 9c,d);
• the heterogen B12 in the SynSerB2 mutant disrupts the binding of ATP at binding site 4 (amino acid positions 4-7, Figure 8a,b);
• the heterogen FAD in the SynSerB2 mutant disrupts the binding of NAD at binding site 5 (amino acid positions 82-102, Figure 8c,d).
The binding assays determined that nearly 66% of the Hecht_α protein sequences of the third generation library bind heme (approximately half at a relatively high level) [10,54]. Of the 80% of the proteins bound that exhibited peroxidase activity, 60% exhibited hydrolase activity and 36% lipase activity [54]. The enzyme activity was up to 10^6-times faster than the uncatalyzed reaction for peroxidase and up to 10^3-times faster than the uncatalyzed reaction for hydrolase and lipase [54]. Hydrolase activities rely on the protein alone, i.e., the enzymatic activity may be found in the absence of a cofactor, as well [10,54]. It is important to note that nearly 30% of heme-binding proteins exhibited some level of enzymatic activity for all functions [54]. In the absence of heme cofactor, esterase and lipase activities were reported in 30% and 20% of the third generation of Hecht_α de novo proteins, respectively, although at lower rates than for natural (evolved) enzymes [10,54].
Using the 3DLigandSite prediction method, the specific binding site was located for several of the cofactors (ATP, NAD, FAD, HEM, SF4). Heme binding is relatively easy to achieve with the de novo design since many peptides and proteins have been designed to bind heme [54]. The interaction of SynSerB3 with heme and Fe2+ in the aa region 16-31 was predicted by the COBEpro method (the epitope/exposed region NDRKNLH, aa 25-31) and by ISM (L30); Scheme S3 and Table 2, respectively. The list of predicted continuous epitopes for all Hecht_α proteins is shown in Schemes S1-S15, using the COBEpro method [32]. It remains to be determined whether the binding of a specific heterogen (e.g., heme) to a stabilizing region (e.g., P1) may destabilize and modify the SynSerB3 structure and make other sites of the molecule available to other bioactive heterogens. According to Hecht et al., moderately active de novo heme proteins can serve as starting points for the laboratory-based enzyme evolution and the development of molecules with improved activity [55].
As shown in Table 2 and Figures 7-9, different mutations at the specific positions of the SynSerB rescue protein may account for the bioactivity of SynSerB3 and the inactivity of SynSerB2.
Metabolic enzymes and transcription cofactors participate in transcriptional regulation and represent a direct link between cellular metabolism and regulated gene expression [56,57]. They play an important role in the production of the proteins that are necessary for cellular function, metabolism and gene expression. The variety of biological functions exerted by the SynRescue protein group in Escherichia coli auxotrophs may derive from the ability of these synthetic proteins to bind different cofactors. The results presented demonstrate that combining the ISM and 3DLigandSite methods might be a useful filter for the virtual selection of molecules with desirable properties.
IS-based phylogenetic analysis (ISTREE) [58] of the SynSerBRescue protein clustering and standard phylogenetic analysis of the protein sequences [59,60] provided the same information regarding the directed in vitro evolution of Escherichia coli synthetic rescue proteins for serine, i.e., SynSerBRescue (Figure A5).The evolution of synthetic rescue proteins is visualized in Figure A5, from the least active member SynSerB2 (closely related to the parent WA20 structure) to the most active members SynSerB3 and SynSerB1, distant from WA20.
This result confirms that at the level of phylogenetic/molecular evolution analysis, two different analytical methods, amino acid electron-ion interaction potential (ISTREE) and amino acid molecular homologies (TreeDyn), render identical conclusions.

SynIlvA Rescue Proteins
Another important example of further synthetic biology investigation is the auxotrophic/IlvA knockout Escherichia coli strains that have disrupted encoding of threonine deaminase. This enzyme catalyzes the first step in the production of isoleucine from threonine. Fisher et al. [6] report that functional SynIlvA1 rescue proteins lose the ability to save auxotrophs upon K to A mutation at amino acid position 42. The A42 mutants of SynIlvA1 lose the ability to rescue Escherichia coli auxotrophs despite the fact that there is no clear difference in the information spectrum (Figure 10; EIIP values of K and A are almost identical at 0.0371 and 0.0373, respectively). The 3D and secondary structures, as modeled by the Phyre2 and 3DLigandSite methods, are also identical (Figure 11 and Figure A6, respectively). In A42 mutants, the iron/sulfur cluster SF4 [61] necessary for the deaminase function remains intact at positions E26, S27 and L71, i.e., in the vicinity of the structurally important stabilizing region E26 and K78 (Figure 11b) [3]. However, the 3DLigandSite method shows marked differences in heterogen Fe binding at the amino acid positions 9, 96 and 12, which is present in biologically-active SynIlvA1 (Figure 11c,d) and absent in the inactive A42 mutants of SynIlvA1. This seems to be in line with the reported functional promiscuity of SynIlvA1, "which was originally selected for its ability to rescue the isoleucine auxotroph Δilv but also rescues the Δfes auxotroph, which is essential for the accumulation of iron" [3]. Therefore, the lack of iron heterogen at specific positions of the molecule could be relevant for the loss of A42 mutant function in SynIlvA1. At this point, it seems reasonable to investigate further the use of the 3DLigandSite method for evaluating the impact of individual mutations on protein function and the directed evolution of novel SynRescue proteins [55].

SynFes and SynGltA Rescue Proteins
Fes gene is not involved in biosynthetic pathways. It functions in iron acquisition by encoding enterobactin esterase, which cleaves the iron-bound enterobactin siderophore [6]. This allows cells to acquire iron in iron-limited environments [6]. Fisher et al. report that cells expressing the SynFes6 and SynFes2 rescue proteins accumulate six- and 10-fold more iron than control cells, respectively [6]. The difference in SynFes6 and SynFes2 iron accumulation could be ascribed to three facts detected by the 3DLigandSite method:
• in addition to the positions 13 (Fe/Fe²⁺) and 49 (Fe²⁺) shared by SynFes6 and SynFes2, SynFes2 has two extra Fe heterogen binding positions at amino acid sites 64 and 96;
• at position Q49 in SynFes6, FAD and B12 could disrupt the binding of other heterogens (Fe²⁺, ATP, NAD, GAL, MAN and GLC), which is not the case for SynFes2;
• SynFes2 has two additional binding sites for heterogen Ca at positions 1 and 49.
Table 3 shows the results of analysis for SynFesRescue proteins using EIIP information spectrum (ISM) analysis. The SynFes2 and SynFes6 rescue proteins that accumulate iron show two distinct frequency peaks at positions 26 (turn 1) and 37 (helix 2) (Table 3 and Figure 12a). Position 26 belongs to the structurally important stabilizing part of SynRescue proteins [3]. SynFes1 protein does not have the second peak at position 37, i.e., in the molecular EIIP spectrum, the peak of a nonpolar residue close to the central part of helix 2 is missing. Other members of the SynFesRescue family, SynFes3-5 and SynFes7/8, have one distinct peak at different positions: SynFes3 and SynFes5 at the helix 1 position and SynFes4 and SynFes7/8 at the helix 3 position (Table 3, Figure 12). In a similar way to SynSerBRescue (Figure A5), ISM-based phylogenetic analysis of the SynFesRescue clustering gives results comparable to those of standard phylogenetic/molecular evolution analysis of homologous sequences (Figure A7). This suggests that a combined application of the 3DLigandSite and ISM methods is a useful step in the characterization of synthetic proteins.
The solubility parameter presented in Table A3 might also influence the structural-functional behavior of the artificial sequences, e.g., bioactive and less soluble SynFes2 was reported as forming an extended dimer similar to WA20, which was not the case for the more soluble SynGltA1 that forms a very weakly-associating dimer or an extended monomer [3].
Hecht et al. [62] have recently shown that SynGltA protein acts as a rescuer of Escherichia coli cells deleted for gltA gene. Deletion of this gene disables the citric acid cycle, and the rescue protein SynGltA restores it [62]. ISM virtual spectroscopy, based on electron-ion interaction potential, predicted multiple bioactive sites located in all four helices of the SynGltA rescue protein (Figure 13). Another method, 3DLigandSite prediction in Figure 14, locates binding sites in SynGltA for several metabolic cofactors (ATP, NAD, FAD) that are of importance for the citric acid cycle [57]. These findings are in line with the observation of Hecht et al. [62] that non-natural rescue proteins recover energy metabolism by activating alternative metabolic pathways.

Virtual Spectroscopy and 3D Structure Prediction of Hecht_β Proteins
Like the Hecht_α dataset, the Hecht_β dataset of de novo protein sequences was analyzed using the informational spectrum method (ISM). This virtual spectroscopy method is useful for structure/function analysis of proteins and the identification of functional protein domains [40-44]. The method is also applicable to the assessment of the biological effects of mutations. Frequency peaks of the EIIP periodograms denote the important parts of the molecules. ISM analysis of the Hecht_β dataset, presented in Figure 15, shows that the sequences cluster into five different subgroups, each having a distinct peak at a different part of the β-protein. Those peaks identify important regions that predict bioactive epitopes (Figures 2b and 4a,b, Table A4 and Schemes S16-S32).
The hydropathy of Hecht_α and Hecht_β proteins was also investigated using the sliding block based on the Kyte-Doolittle scale. Figure 2b shows that Hecht_β proteins are characterized by N-terminus and C-terminus epitopes and that the central part in the vicinity of turn 3 is buried (Figures 2b and 4a,b). ISM analysis in Figure 15 complements the Kyte-Doolittle method for epitope detection (e.g., for N-terminus P1 detection in Figure 2b) and offers two simple rules for the antigenic site location, as follows:
1. If there are two N-terminus epitopes, then peak 1 (0.11/aa14) and peak 2 (0.21/aa26) are in the vicinity of the epitope 1 and epitope 2 ends, respectively. If a Hecht_β protein has only one N-terminus epitope, peak 1 (0.11/aa14) is the central part of the antigenic site, and peak 2 (0.21/aa26) is at the epitope end.
2. At the C-terminal end, there are also two possible epitopes. The peak 0.42 (aa53) is located within epitope 1, situated at the very end of the protein sequence (Figure 15, Table A4 and Schemes S16-S32). It corresponds to the antigenic region P3 (aa53-56) determined by the Kyte-Doolittle method (Figure 2b). The peaks 0.32 (aa40) and 0.37 (aa47) are within C-terminus epitope 2 and correspond to the region P2 (aa42-45) predicted by the Kyte-Doolittle method (Figure 2b).
A typical example of epitope location is protein #17 presented in Figure 4b. The results of ISM spectral analysis of Hecht_β proteins were tested by COBEpro detection of continuous epitopes given in Schemes S16-S32 and Table A4. The results are consistent with the fact that ISM virtual spectroscopy detects bioactive protein regions [40-44].
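The sliding-block hydropathy analysis used in this section can be illustrated with a minimal sketch. It assumes the standard published Kyte-Doolittle values and interprets the procedure as 9-residue block averages followed by a 3-point moving average; the function names are hypothetical, and the sketch is not the ExPASy ProtScale implementation.

```python
# Kyte-Doolittle hydropathy values (standard published scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def moving_average(values, window):
    """Return the mean of every full window of the given length."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydropathy_profile(sequence, block=9, smooth=3):
    """Average hydropathy over sliding blocks, then smooth the block means.

    Assumption: 'block' residues per sliding block and a 'smooth'-point
    moving average, which is one reading of the procedure in the text.
    """
    raw = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return moving_average(moving_average(raw, block), smooth)


def most_hydrophilic_index(profile):
    """Index of the lowest profile value (candidate surface-exposed region)."""
    return min(range(len(profile)), key=profile.__getitem__)
```

Low (negative) profile values point to hydrophilic, potentially surface-exposed stretches, whereas high values point to buried regions.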

Structural and Functional Characterization of Hecht_α and Hecht_β Proteins
As discussed by Woolfson et al. [2,5], de novo protein design is closely related to the synthetic biology approach to producing standard sets of polypeptide components, which are designed to solve problems across different biological systems. If properly standardized, those components can be applied in a modular manner to different biochemical problems [5].
The results of virtual spectroscopy, hydropathy analysis and structure-function modeling based on the Hecht_α and Hecht_β protein dataset imply that the proposed methods could be used for the virtual screening of artificial proteins. Additional investigations on different and varied datasets are needed to confirm the general applicability of this concept. The structure elucidation of proteins using NMR and crystallography is a slow and expensive process. It is estimated that the cost of determining each new structure is on the order of $100,000 [63]. The number of known protein sequences is about 400 times larger than the number of experimentally-determined structures, and the number of new sequences grows much faster than the number of structures [64]. However, the cost of computer modeling is much lower (on average $10 per compound [63]), which explains why the computational methods for protein structure and function prediction are important.
Our analysis of the structural-functional relationships and directed evolution of Hecht_α and Hecht_β proteins in Escherichia coli is in line with the new approach of Petoukhov [65] "for modeling genetically specified structures and processes in living organisms using mathematical tools of the theory of resonances". The analysis of the physico-chemical properties of amino acids related to the codon information values and transition-probability distributions in short-term evolution, as discussed by Jiménez-Montano et al. [66], could additionally contribute to a better understanding of how de novo-designed proteins can drive adaptive changes in gene expression in order to provide life-sustaining regulatory functions [11].

Protein Datasets
The α-protein dataset consisted of 15 de novo artificial proteins constructed by Hecht et al. (Hecht_α) [6], using a combinatorial library of Escherichia coli sequences designed to fold into 102-residue 4-helix bundles (Table S1). The synthetic genes were made using degenerate DNA codons [3,6]:
• [...] was used to encode six polar residues (H, Q, N, K, D, E) and
• NTN (N = A, T, C, G) was used to encode five nonpolar residues (F, L, I, M, V).
Neutral amino acids, with the exception of alanine (A) and cysteine (C), were occasionally used, according to the specificity of the helix/turn protein design [9]. The amino acid septapeptide pattern pnppnnp, consisting of polar (p) and nonpolar (n) residues, served to approximate an α-structural repeat of 3.6 residues per turn [9]. The list of α-sequences is given in Table S1.
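The relation between a degenerate codon and the residues it encodes, and the binary polar/nonpolar pattern of a sequence, can be checked with a short sketch. It uses the standard genetic code and IUPAC nucleotide codes; the function names and the example heptapeptide are hypothetical and serve only as an illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Subset of the IUPAC degenerate nucleotide codes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "V": "ACG"}

POLAR = set("HQNKDE")      # polar residues listed in the text
NONPOLAR = set("FLIMV")    # nonpolar residues listed in the text


def encoded_residues(degenerate_codon):
    """Sorted list of residues encoded by a degenerate codon such as 'NTN'."""
    choices = (IUPAC[base] for base in degenerate_codon)
    return sorted({CODON_TABLE["".join(c)] for c in product(*choices)})


def polarity_pattern(sequence):
    """Map residues to 'p' (polar), 'n' (nonpolar) or '-' (neither set)."""
    return "".join(
        "p" if r in POLAR else "n" if r in NONPOLAR else "-"
        for r in sequence
    )
```

For example, `encoded_residues("NTN")` returns `['F', 'I', 'L', 'M', 'V']`, and the illustrative heptapeptide `KLEELLK` maps to the pattern `pnppnnp`.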

Spectral Analysis
The periodicity in de novo α-helical and β-sheet protein structures presented in Table 1 was investigated using the normalized PRIFT method of Cornette et al. [27], which is based on the results of 38 published hydrophobicity scales compared for their ability to identify the characteristic periods of helices/turns (Table A2). The informational spectrum method (ISM) based on electron-ion interaction potential (EIIP) was used to analyze the bioactivity of de novo α- and β-proteins (Tables 2 and 3) [40-44]. The values of EIIP for 20 amino acids are given in Table A2.
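As a minimal sketch of the single-series ISM calculation, the sequence can be converted into a numerical signal of EIIP values and its discrete Fourier power spectrum computed. The EIIP values below are the commonly tabulated ones (they agree with the K and A values quoted in the text) and are an assumption here, since Table A2 is not reproduced; the function names are hypothetical.

```python
import cmath

# Commonly tabulated EIIP values (Ry); assumed here, not copied from Table A2.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def informational_spectrum(sequence):
    """Return (frequency, power) pairs for frequencies in (0, 0.5].

    The zero-frequency (mean) term is skipped; power is the squared
    magnitude of the discrete Fourier coefficient of the EIIP signal.
    """
    signal = [EIIP[residue] for residue in sequence]
    n = len(signal)
    spectrum = []
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * cmath.pi * k * m / n)
            for m, x in enumerate(signal)
        )
        spectrum.append((k / n, abs(coefficient) ** 2))
    return spectrum


def dominant_frequency(sequence):
    """Frequency of the highest peak in the informational spectrum."""
    return max(informational_spectrum(sequence), key=lambda fp: fp[1])[0]
```

For a strictly alternating toy sequence such as `DLDLDLDL`, `dominant_frequency` returns 0.5, i.e., a period of two residues. The direct summation is quadratic in sequence length, which is acceptable for proteins of about 100 residues.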
Least-squares spectral analysis (LSSA) estimates a frequency spectrum using a least-squares fit of sinusoids to data samples [68-70]. The method gives results similar to those of Fourier spectral analysis, but is more resistant to noise and is appropriate if the time series is long enough to contain at least four cycles [68]. The frequency axis is in units of 1/(x unit). The power axis is in units proportional to the square of the amplitudes of the sinusoids present in the data [68,69].
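The idea of fitting sinusoids by least squares can be sketched as follows. For each trial frequency, a cosine and a sine term are fitted to the mean-centered series, and the explained sum of squares is reported as the power. This is a simplified illustration under the assumption of evenly spaced samples (x = 0, 1, 2, ...), with a hypothetical function name; it is not the software used in the study.

```python
import math


def lssa_power(series, frequencies):
    """Explained sum of squares of a least-squares sinusoid fit per frequency."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    powers = []
    for f in frequencies:
        cos_col = [math.cos(2 * math.pi * f * t) for t in range(n)]
        sin_col = [math.sin(2 * math.pi * f * t) for t in range(n)]
        cc = sum(c * c for c in cos_col)
        ss = sum(s * s for s in sin_col)
        cs = sum(c * s for c, s in zip(cos_col, sin_col))
        yc = sum(y * c for y, c in zip(centered, cos_col))
        ys = sum(y * s for y, s in zip(centered, sin_col))
        det = cc * ss - cs * cs
        if abs(det) < 1e-12:
            # Sine column vanishes (e.g. f = 0 or f = 0.5): fit cosine only.
            powers.append(yc * yc / cc)
        else:
            # Solve the 2x2 normal equations for the amplitudes a and b.
            a = (yc * ss - ys * cs) / det
            b = (ys * cc - yc * cs) / det
            powers.append(a * yc + b * ys)
    return powers
```

A possible usage is to scan a grid such as `[k / 100 for k in range(1, 51)]` and take the frequency with the largest power as the characteristic peak.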

Hydrophobicity Profiles
Surface-exposed regions of de novo α- and β-proteins presented in Figure 2 were identified using the Kyte-Doolittle scale (Table A2) [28,50]. The analyses were based on 3-point moving average values of the 9 amino acid sliding blocks (Figure 2

• The surface/solvent accessibility of amino acids in an amino acid sequence was predicted with the NetSurfP server (E = exposed, B = buried, Figure 3a) [30].
• The Phyre2 server could not predict the 3D structure of the Hecht_β proteins because the models were insufficiently reliable. The confidence was considered too low (<70%) for submission to 3DLigandSite [37]. The FOLDpro method was used for protein fold recognition and template-based 3D structure prediction (Figure 4a, Figure A4) of all β-proteins [52]. The protein 2D and 3D structures were presented using Unipro UGENE software [71]. PDB files of the #17 and #45 models are supplied as Schemes S33-S37.
• The informational spectrum-based phylogenetic analysis in Figures A5a and A7a was done with the ISTREE web service tool [58] and the phylogenetic analysis in Figures A5b and A7b with the Phylogeny.fr platform (TreeDyn) [59,60].

Conclusions
De novo proteins designed by Hecht and co-workers [6,9,12] represent a structurally and functionally well-characterized subset of α-helical and β-sheet proteins. This dataset may be successfully used both for testing the current methods for the analysis of artificially-designed molecules based on the specific binary patterns of amino acid polarity and for developing new ones.
The comparative investigations of the bioinformatics methods on the datasets of both de novo and natural proteins may lead to:
1. improvement of the existing tools for protein structure and function analysis,
2. new algorithms for the construction of de novo protein subsets and
3. additional information on the complex natural sequence space and its relation to the individual subspaces of de novo sequences.

Figure 1. Characteristic peaks of Hecht_α protein SynSerB3 and Hecht_β protein #17 were determined using the method of Cornette et al. [27]. (a) Hecht_α protein SynSerB3 exhibits the peak at x = 0.28 (Fourier spectral analysis); (b) SynSerB3 exhibits an identical peak (x = 0.28) with the least-squares spectral analysis; (c) Hecht_β protein #17 exhibits the peak at x = 0.45 (Fourier spectral analysis); (d) #17 exhibits an identical peak (x = 0.45) with the least-squares spectral analysis. The amino acids belonging to the detected hot spots are marked in red.

Figure 2. (a) Surface-exposed regions (P) of 15 Hecht_α proteins identified using the Kyte-Doolittle scale; (b) surface-exposed regions (P) of 17 Hecht_β proteins identified using the Kyte-Doolittle scale.

clearly shows that:
• in the SynSerB2 mutant, the heterogen FMN binds to binding site 2 instead of SF4/Mg (this region is located between the structurally-important stabilizing amino acid positions 26 and 78 [3] of SynRescue proteins, Figure 8a,b);
• FMN interaction with binding site 2 shifts the binding of SF4 to binding site 1, but the additional interaction with HEM (heme) and Fe²⁺ is missing (region 16-31, Figure 8c,d);
• the heterogen B12 in the SynSerB2 mutant disrupts the binding of ATP at binding site 4 (amino acid positions 4-7, Figure 9a,b);
• the heterogen FAD in the SynSerB2 mutant disrupts the binding of NAD at binding site 5 (amino acid positions 82-102, Figure 9c,d).

Figure 11. Heterogens present in the predicted binding sites of SynIlvA1 rescue protein using the 3DLigandSite method. (a) Binding site 1: the heterogen is FMN; (b) binding site 2: the heterogen is SF4; (c) binding site 3: the heterogen is Fe²⁺; (d) binding site 4: the heterogen is Fe. The K42 → A42 mutant does not have binding sites 3 and 4.

Figure 12. Analysis of SynFes proteins using the informational spectrum method (ISM). Frequency peaks and periodogram values of SynSerB sequences were determined using cross-spectrum (bivariate Fourier) (a) and single series Fourier analysis (b).

Figure 13. Analysis of rescue SynGltA1 protein using the informational spectrum method (ISM). Frequency peaks in the periodograms of the SynGltA1 sequence were determined using single series Fourier analysis.

Figure 14. Heterogens present in the predicted binding sites of SynGltA1 protein using the 3DLigandSite method. (a) Binding site 1: heterogens are B12 and ATP; (b) binding site 2: the heterogen is SF4; (c) binding site 3: the heterogen is FMN; (d) binding site 4: heterogens are NAD, ADP and Mg; (e) binding site 5: heterogens are FAD and Ca.

Figure A2. Characteristic frequency peaks of de novo genetically-encodable disulfide-rich peptides and mutants designed by Baker and co-workers [26] were determined using the PRIFT/LSSA method of Cornette et al. [27]. (a) α-peptide HHH_06 and its mutants exhibited an α-peak at the position 0.25; (b) β-peptide EEE_EEE_02 and its mutants exhibited a β-peak at the position 0.45; (c) mixed class peptide EEHE_02 exhibited two small peaks at positions 0.09 and 0.44. Its mutants EEHE_02_0005 and EEHE_02_0008 were characterized by an additional peak at the position 0.5. Mixed class structures/mutants (c) had a different distribution of the peaks than all α-peptides (a) and all β-peptides (b).

Information 2017, 8, 29

Figure A3. Characteristic peaks of one natural α-protein and one natural β-protein were determined using the PRIFT method of Cornette et al. [27]. (a) Natural α-protein 1cc5 did not exhibit the typical α-peak at x = 0.28 (Fourier spectral analysis); (b) α-protein 1cc5 did not exhibit the typical α-peak at x = 0.28 when the alternative method of least-squares spectral analysis was used; (c) natural β-protein 1amg-2-AS did not exhibit the typical β-peak at x = 0.45 (Fourier spectral analysis); (d) natural β-protein 1amg-2-AS did not exhibit the typical β-peak at x = 0.45 when the alternative method of least-squares spectral analysis was used.

Table 2. Prediction of the bioactive hot spots in the Hecht_α SynSer, SynGlt and SynIlv rescue proteins. LSSA, alternative least-squares spectral analysis.

Table 3. Prediction of bioactive hot spots in Hecht_α SynFes rescue proteins.
) [28,67,68]. The ExPASy-ProtScale software tool of the ExPASy SIB Bioinformatics Resource Portal was used to compute and represent the amino acid profiles produced by the scale [28].

Solubility, Antigenicity, Surface Accessibility, 2D/3D and Tree Structure Predictions

Table A3. Predicted probability of antigenicity and solubility upon overexpression in Escherichia coli for 32 Hecht_α and Hecht_β proteins.

Table A4. Predicted continuous epitopes of the Hecht_α and Hecht_β proteins.