Criteria for Engineering Cutinases: Bioinformatics Analysis of Catalophores

: Cutinases are bacterial and fungal enzymes that catalyze the hydrolysis of natural cutin, a three-dimensional inter-esteriﬁed polyester with epoxy-hydroxy fatty acids with chain lengths between 16 and 18 carbon atoms. Due to their ability to accept long chain substrates, cutinases are also effective in catalyzing in vitro both the degradation and synthesis of several synthetic polyesters and polyamides. Here, we present a bioinformatics study that intends to correlate the structural features of cutinases with their catalytic properties to provide rational basis for their effective exploitation, particularly in polymer synthesis and biodegradation. The bioinformatics study used the BioGPS method (Global Positioning System in Biological Space) that computed molecular descriptors based on Molecular Interaction Fields (MIFs) described in the GRID force ﬁeld. The information was used to generate catalophores, spatial representations of the ability of each enzymatic active site to establish hydrophobic and electrostatic interactions. These tools were exploited for comparing cutinases to other serine-hydrolases enzymes, namely lipases, esterases, amidases and proteases, and for highlighting differences and similarities that might guide rational engineering strategies. Structural features of cutinases with their catalytic properties were correlated. The “catalophore” of cutinases indicate shared features with lipases and esterases.


Introduction
Cutinases (EC 3.1.1.74) belong to the serine hydrolases super-class, together with lipases, esterases, proteases, and amidases. All the class members share the typical Ser-His-Asp catalytic triad and the oxyanion hole [1]. The natural substrate of cutinase is cutin, which, by covering all aerial surfaces of higher plants, is one of the main polymeric components of the plant cuticle [2,3]. Cutin is a three-dimensional insoluble hydrophobic polyester composed of hydroxyl and hydroxyepoxy fatty acids (usually carrying one to three hydroxyl groups), with saturated palmitic (C16) or stearic (C18) acids being the most common components [4]. In analogy to lipases and esterases, cutinases in vitro catalyze ester hydrolysis in aqueous environment, whereas they mediate esterifications and transesterifications of large and small substrates in low-water media (Table 1) [2,5].
The scientific literature reports numerous examples of applications of cutinase from Humicola insolens (HiC), cutinase 1 from Thermobifida cellulosilytica (Thc_cut1) and especially lipase B from Candida antarctica (CaLB) in polyesters and polyamides synthesis and modification [6][7][8][9]. The enzymatic biocatalyzed synthesis leads to polymers of moderate molecular weight [10], applicable in the biomedical, cosmetic and pharmaceutical sectors [11,12]. The processes are scalable and can be carried out at mild temperatures (50-80 • C) in solvent-free conditions, using bio-based substrates as well renewable immobilization carriers. The unique selectivity and activity of enzymes represent a powerful tool for improving the sustainability of polymers while achieving new products with controlled structures and advanced technical properties.
On the other hand, various cutinases were reported to be able to modify or degrade [13] different synthetic polyesters [14] and polyamides [15], such as diphenyl phthalate, butyl benzyl phthalate, di-(2-ethylhexyl)-phthalate [2,[16][17][18], poly(ethylene terephthalate) (PET), and bio-based polyesters as poly(butylene succinate) (PBS), poly(butylene adipate-co-terephthalate) (PBAT), poly(ethylene 2,5-furandicarboxylate) (PEF) and poly(lactic acid) (PLA) [19] (Table 1). Enzymatic hydrolysis of PET showed to occur at rather low rates, which are heavily affected by the crystallinity of the polymer. The access of the active site of the cutinase to the insoluble substrate is apparently one of the rate-limiting steps, because the widening of the areas surrounding the active sites of cutinases from Fusarium solani and Thermobifida fusca led to an increase of the rates of hydrolysis of PET and its oligomers. In addition, the binding of the cutinase to the surface of PET also seems to limit the rate of hydrolysis [20].
Our previous computational investigations of two cutinases employed in polycondensation reactions, HiC and Thc_cut1, have disclosed the hydrophobicity of their surface, a feature shared by lipase enzymes and correlated to the hydrophobicity of their natural substrates [17]. Thc_cut1 and HiC have superficial active sites, accessible to solvents and substrates, whereas most lipases have the catalytic triad buried in deep funnel shaped pockets [17]. Molecular dynamics simulations also indicated that Thc_cut1 lacks structural domains undergoing conformational changes that might cause the occlusion of the active site [21] as in the case of most lipases. The wider and more accessible active site of Thc_cut1, along with its efficiency under milder and less strict anhydrous conditions as compared to CaLB [22], represent crucial features for the development of strategies for polyester synthesis and modification, as well as for the design of optimized hydrolases starting from the Thc_cut1 scaffold.
Through the last forty years, high computational cost quantomechanics (QM) methodologies have been used to understand the physicochemical properties of enzymes [23]. For this reason, enzymatic systems have been studied mostly at the level of the catalytic residues, while combining molecular mechanics (MM) techniques to handle the remaining part of the protein structure, disregarding all the fine-grained electronic interactions and taking into account only the laws of classical mechanics to describe the atoms [24]. However, this approach has some limitations because of the excessive simplification of the system while it does not consistently reduce the time required for performing the calculations [25].
On that respect, molecular descriptors can quantify these factors and correlate them to the physicochemical properties of the catalytic site. Various types of molecular descriptors have been developed distinguished based on the specific type of interaction they focus on [26,27]. The underlying short-and long-range interactions are usually quantified based on molecular mechanics or based on group-additive models [28]. Although some descriptors might also employ quantum-mechanical calculations or machine learning [29], here we will restrict our analysis to local descriptors.
In this framework, the present study intends to rationalize the reaction site of a set of nine cutinases, thus gathering knowledge useful for guiding rational engineering and screening towards tailored catalytic activities applicable to the synthesis, modification and degradation of polyesters and polyamides. Nine cutinase enzymes were compared with 42 serine-hydrolases selected among proteases, lipases, esterases and amidases by using the BioGPS method (Global Positioning System in Biological Space) that computed molecular descriptors based on Molecular Interaction Fields (MIFs) described in the GRID force field [30][31][32]. The MIFs were exploited for the construction of catalophores able to visually represent similarities of the nine cutinases in terms of active site shape, and their ability to establish electrostatic and hydrophobic interactions. Thc_cut1 and CaLB were analyzed in detail to understand their functional relationships within the respective serine hydrolases sub-classes.

Cutinases Versusother Ser-Hydrolases: Generation of the Catalophores
The active sites of nine cutinases, five fungal and four bacterial reported in Table 1, were analyzed. The structural differences between bacterial and fungal cutinases refer to the N-and C-terminal regions and to disulfide bonds, since bacterial cutinases have one disulphide bond whereas fungal cutinases feature two or three disulfide bonds [5]. The bioinformatics analysis exploited the BioGPS methods that computed molecular descriptors based on Molecular Interaction Fields (MIFs) described in the GRID force field [30]. The GRID procedure calculates the interaction energies between a small chemical probe (i.e., a functional group) and all the nodes of a three-dimensional grid that spans the target molecule. The output is a pseudo-MIF, namely a three-dimensional matrix of interaction energy values. Although GRID molecular descriptors [32] were originally developed for drug-design applications [33], in the context of biocatalysis, they can extract information on the relevant structural and electrostatic surrounding able to stabilize the relevant transition states of enzymes differing in their catalytic properties. Conversely, the method does not require the modelling of the relevant transition state, which would request a computational demanding quanto-mechanic approach [22]. Table 1. Cutinase enzymes analyzed in the present study and residues composing their catalytic machinery, which were also used for cutinases structural alignment. The nine enzyme cavities were identified by the FLAP algorithm (Fingerprints for Ligands and Proteins) and stacked taking the catalytic triad as a reference [30]. The active sites were described in the GRID force field using four chemical probes, namely functional groups able to establish interactions with the active site of enzymes [25]: (i) H probe describing the shape of the cavity surrounding the active site, (ii) DRY probe (aromatic carbon) describing nonpolar interactions with the chosen cavity (  Figure 2). PseudoMIFs are defined as a particular type of MIFs where the molecular interaction fields are a representation of electron density fields centered on the atoms of the above-mentioned probes instead of the atoms of the enzyme and, as stated before, could also be seen as the complementary counterpart of MIFs. As a visual example, Figure 1 reports the MIFs calculated for Thc_cut1 with the probes assessing hydrophobic and electrostatic interactions. The spatial representation of MIFs generated by each different probe is referred as catalophore. The same GRID-based MIFs and the corresponding catalophores were generated also for 42 enzymes of five sub-families of serine-hydrolases reported in Table S1 of Electronic Supplementary Information (ESI). complementary counterpart of MIFs. As a visual example, Figure 1 reports the MIFs calculated for Thc_cut1 with the probes assessing hydrophobic and electrostatic interactions. The spatial representation of MIFs generated by each different probe is referred as catalophore. The same GRID-based MIFs and the corresponding catalophores were generated also for 42 enzymes of five sub-families of serine-hydrolases reported in Table S1 of Electronic Supplementary Information (ESI). In order to catch finer structural details able to distinguish between cutinases and lipases, we compared Thc_cut1 and CaLB. Namely the two hydrolases we have previously investigated and compared in the polycondensation of adipic acid with different diols. [22]. A preliminary bioinformatic analysis disclosed remarkable similarities between the two enzymes, whereas experimental data indicated that pressure, temperature and water activity, exert different effects on their efficiency. The catalophores of Thc_cut1 are dissected in Figure 2, where this cutinase is compared to CaLB catalophores and the shared regions of the corresponding MIFs are highlighted.  Figure 2 indicates that the active sites of CaLB and Thc_cut1 present similar wide regions able to establish hydrophobic interactions (blue grids) but in Thc_cut1 they are more extended, most probably due to the bulkiness of its natural substrate cutin. Notably, the two active sites differ consistently in terms areas able to engage H-bonds and in this case Thc_cut1 can donate H-bonds in a wider area, which is a feature connected to the presence of polar hydroxyl groups on cutin chains.

Microbial
The catalophores were computed for all 51 hydrolases and then processed separately according to the different sub-classes to obtain the representative catalophores of cutinases, lipases, esterases, amidases and proteases. The result was a series of three-dimensional maps of the more frequent physical-chemical features of the enzyme active sites for each processed sub-family ( Figure 3). In order to catch finer structural details able to distinguish between cutinases and lipases, we compared Thc_cut1 and CaLB. Namely the two hydrolases we have previously investigated and compared in the polycondensation of adipic acid with different diols [22]. A preliminary bioinformatic analysis disclosed remarkable similarities between the two enzymes, whereas experimental data indicated that pressure, temperature and water activity, exert different effects on their efficiency. The catalophores of Thc_cut1 are dissected in Figure 2, where this cutinase is compared to CaLB catalophores and the shared regions of the corresponding MIFs are highlighted. Figure 2 indicates that the active sites of CaLB and Thc_cut1 present similar wide regions able to establish hydrophobic interactions (blue grids) but in Thc_cut1 they are more extended, most probably due to the bulkiness of its natural substrate cutin. Notably, the two active sites differ consistently in terms areas able to engage H-bonds and in this case Thc_cut1 can donate H-bonds in a wider area, which is a feature connected to the presence of polar hydroxyl groups on cutin chains. The catalophores were computed for all 51 hydrolases and then processed separately according to the different sub-classes to obtain the representative catalophores of cutinases, lipases, esterases, amidases and proteases. The result was a series of three-dimensional maps of the more frequent physical-chemical features of the enzyme active sites for each processed sub-family ( Figure 3).
The analysis of the catalophores of cutinases (Figure 3e,h,m) indicates large shared areas, suggesting a wide homogeneity of the active sites in terms of ability of establishing both electrostatic and hydrophobic interactions. A similar trend is visible also for the lipases (Figure 3a,f,i), whereas the catalophores of amidases show a very low homogeneity in terms of hydrophobic features ( Figure 3d) and H-bond donor ability (O probe, Figure 3i). Amidases have a O-probe catalophore very restricted to an area in the proximity of catalytic Ser and His, and a minor conserved spot in the oxyanion hole ( Figure 3i). The same feature is conserved in all serine-hydrolase sub-classes with the exception of esterases ( Figure 3).
More information on the cutinases sub-class can be extracted by quantitatively evaluating the S-score for each cutinase ( Figure 4) [30]. The crystal structure of eight cutinases of the data set are available (Table 1). To enable full comparison with previous UPCA models we employed the same Thc_Cut1 model as in Refs [6,17]. The model was validated by noting that its RSMD of 0.90 Å(and backbone RMSD of 0.38 Å) with respect to the subsequently published 5LUI (the crystal structure of Thermobifida cellulosilytica cutinase [42] is consistent with that found across the PDB database for pairs of identical proteins [43]. The S-Score was calculated for each enzyme in the dataset and represents the weighted average of the contribution of the components (H, N1, O and DRY) in determining the entity of the superimposition of the MIFs of a particular enzyme with the ones of the pharmacophoric pseudomolecule.
The S-score of the single enzyme provides quantitative information on the similarity between its active site enzyme and the molecular interaction fields of the global pharmacophoric pseudomolecule generated by the algorithm for the nine cutinases. A large S-score value for a certain enzyme, implies that there are significant discrepancies when compared to the other elements in the same class. The largest S-score deviation was computed for the bacterial cutinase from Saccharomonospora virdis (structure 4WFI), which represents a known exception since there is no experimental evidence of its ability to hydrolyze cutin [41]. The smallest S-score deviation was obtained for the fungal 1AGY (Fusarium solani) cutinase (0.05). The S-score was also computed for all other hydrolases sub-family and complete data are reported in ESI, Figure S1. The origins of diversity and similarity within each hydrolase sub-class follows a similar trend for all classes (ESI, Figure S1b), since the ability to donate H-bonds (mapped by the O probe) and the shapes are the less conserved properties within each class, whereas the largest contribution to the homogeneity inside each class is given by the ability to accept H-bonds followed by the hydrophobic interactions. Catalysts 2021, 11, x FOR PEER REVIEW 6 of 17 More information on the cutinases sub-class can be extracted by quantitatively evaluating the S-score for each cutinase (Figure 4) [30]. The crystal structure of eight cutinases of the data set are available (Table 1). To enable full comparison with previous UPCA models we employed the same Thc_Cut1 model as in Refs [6,17]. The model was validated by noting that its RSMD of 0.90 Ǻ (and backbone RMSD of 0.38Ǻ) with respect to the subsequently published 5LUI (the crystal structure of Thermobifida cellulosilytica cutinase [42] is consistent with that found across the PDB database for pairs of identical proteins [43].  The S-Score was calculated for each enzyme in the dataset and represents the weighted average of the contribution of the components (H, N1, O and DRY) in determining the entity of the superimposition of the MIFs of a particular enzyme with the ones of the pharmacophoric pseudomolecule.
The S-score of the single enzyme provides quantitative information on the similarity between its active site enzyme and the molecular interaction fields of the global pharmacophoric pseudomolecule generated by the algorithm for the nine cutinases. A large Sscore value for a certain enzyme, implies that there are significant discrepancies when compared to the other elements in the same class. The largest S-score deviation was computed for the bacterial cutinase from Saccharomonospora virdis (structure 4WFI), which represents a known exception since there is no experimental evidence of its ability to hydrolyze cutin [41]. The smallest S-score deviation was obtained for the fungal 1AGY (Fusarium solani) cutinase (0.05). The S-score was also computed for all other hydrolases sub-family and complete data are reported in ESI, Figure S1. The origins of diversity and similarity

Catalophore Features of Cutinases in Comparison to Lipases/Esterases: H-Bonding Pattern
Experimental evidence indicated that cutinases share features both with esterases and lipases since are able to hydrolyze synthetic esters and emulsified triglycerides. Similarly to lipases, cutinases can also catalyze esterification and transesterification reactions in systems having low water activity [17]. To clarify the behavior of cutinases, the single probe components of their catalophores were compared to the other hydrolases classes ( Figure 5). It must be underlined that in a previous study [30] we demonstrated that the position of the catalytic residues is remarkably conserved in all Ser-hydrolases, therefore we used the residues of CaLB as a spatial reference in all the figures. The first important observation is that cutinases appear to be esterases endowed with an expanded ability to establish H bond donor interactions in a wider area of the active site (Figure 5a). All the catalophores of cutinases are wider in comparison to esterases and cutinases and they appear able to establish interactions through a wider pattern of polar or hydrophobic chemical groups conserved in a distinct location inside the active sites of the cutinases here analyzed.
The second observation is based on the comparison of cutinases with lipases (Figure 5b), showing how the catalophores of cutinases have distinct features as compared to lipases, since cutinases have the distinct ability to engage H-bonds in more peripheral areas. This property might be ascribed to the complex chemical nature of cutin, composed of hydroxyl and hydroxyepoxy fatty acids carrying one to three hydroxyl groups, and able to engage H-bonds. hydroxyl and hydroxyepoxy fatty acids carrying one to three hydroxyl groups, and able to engage H-bonds. Figure 5. Superimposition of the catalophores of (a) cutinases (black) and esterases (green); (b) cutinases (pink) and lipases (blue). Reference residues composing CaLB catalytic triad are highlighted.

Considerations on Thc_cut1 and CaLB Catalophoric Features in the Context of All Ser-Hydrolases
Considering their applicative interest in the synthesis and degradation of polyesters, Thc_cut1 and CaLB were compared with the catalophores of the different serine-hydrolases classes (Supplementary Figures S3 and S4).
The analyses allowed to identify the largest similarity between Thc_cut1 and the cutinases catalophores, especially in terms of its wide ability to accept H-bonds (N1 probe). The comparison between CaLB MIFs with the catalophores of the different serine-hydrolases classes (ESI, Figure S4) allowed to determine that the H-bond acceptor ability of CaLB corresponds largely to the features of esterases and cutinases, with minor differences in regions located far from the catalytic triad. CaLB differentiates from lipases catalophores mainly in the ability to donate and accept H bonds in the proximity of the His residue of the catalytic triad ( Figure 6).
Although the largely documented lack of a lid domain in CaLB makes this enzyme already an atypical lipase [44], it must be noted that the present analysis accounts only for the active sites of enzymes. Therefore, CaLB presents other features that must be responsible of the distinct behavior of CaLB among lipases.

Considerations on Thc_cut1 and CaLB Catalophoric Features in the Context of All Ser-Hydrolases
Considering their applicative interest in the synthesis and degradation of polyesters, Thc_cut1 and CaLB were compared with the catalophores of the different serine-hydrolases classes ( Supplementary Figures S3 and S4).
The analyses allowed to identify the largest similarity between Thc_cut1 and the cutinases catalophores, especially in terms of its wide ability to accept H-bonds (N1 probe). The comparison between CaLB MIFs with the catalophores of the different serine-hydrolases classes (ESI, Figure S4) allowed to determine that the H-bond acceptor ability of CaLB corresponds largely to the features of esterases and cutinases, with minor differences in regions located far from the catalytic triad. CaLB differentiates from lipases catalophores mainly in the ability to donate and accept H bonds in the proximity of the His residue of the catalytic triad ( Figure 6).

UPCA Model of BioGPS Descriptors
The description of the 51 different active sites by means of catalophores provides massive information on the distinct structural and electrostatic features of the enzyme active sites. The following step was the reduction of the complexity of the pseudo-MIFs by computing the BioGPC descriptors (see Materials and Methods for the detailed procedure) and then by analyzing statistically the information contained in the descriptors by means of Unsupervised Pattern Cognition Analysis (UPCA) [45,46]. Finally, the serine-Ser hydrolases were sorted and grouped into clusters where structural properties, explained by the BioGPS descriptors, are correlated with catalytic functions (e.g., protease, lipase, esterase, cutinase and amidase catalytic activity. The statistical model correlates the molecular descriptors with linearly dependent variables (Principal Components), and it visualizes the information in two-dimensional a mono-dimensional space, enabling the interpretation of the starting raw data and the quantitative evaluation of similarities among different active sites. Accordingly, the UPCA models gather macro-groups of enzymes based on their similarity, which might refer to one single probe (ESI, Figures S6-S9) or all probes collectively (global score, Figure 7). Therefore, the active sites are compared using a virtual chemical ligand and not looking at the features of the single amino acids, making the analysis of protein sequences or structural alignment unnecessary. On this basis, with reduced computational and time costs, the enzyme sub-classes were compared without the need of relying on precedent fundamental knowledge [31]. The global-score plot reported in Figure 7 indicates that all cutinases fall in a region at the interface between lipases and esterases. They share common hydrophobicity properties (Figure 7), with the exception of 3DCN and 4WFI cutinases located between the esterase and amidase clusters. Figure 7 discloses how their difference is ascribable prevalently to a lower hydrophobicity of the active site, as confirmed by the experimental studies of their specificity towards short chain triacyl glycerols [34]. Interestingly, fungal lipases CaLB (1TCA), Humicola lanuginosa (1DTE), Candida rugosa (CRL) fall within the esterases clusters, similarly as fungal cutinases (1AGY, 3QPD, Hic, 4PSE), since the fungal lipases differ from the bacterial and mammalian lipases in terms of electrostatic and hydrophobic properties of the active site. Although the largely documented lack of a lid domain in CaLB makes this enzyme already an atypical lipase [44], it must be noted that the present analysis accounts only for the active sites of enzymes. Therefore, CaLB presents other features that must be responsible of the distinct behavior of CaLB among lipases.

UPCA Model of BioGPS Descriptors
The description of the 51 different active sites by means of catalophores provides massive information on the distinct structural and electrostatic features of the enzyme active sites. The following step was the reduction of the complexity of the pseudo-MIFs by computing the BioGPC descriptors (see Materials and Methods for the detailed procedure) and then by analyzing statistically the information contained in the descriptors by means of Unsupervised Pattern Cognition Analysis (UPCA) [45,46]. Finally, the serine-Ser hydrolases were sorted and grouped into clusters where structural properties, explained by the BioGPS descriptors, are correlated with catalytic functions (e.g., protease, lipase, esterase, cutinase and amidase catalytic activity. The statistical model correlates the molecular descriptors with linearly dependent variables (Principal Components), and it visualizes the information in two-dimensional a mono-dimensional space, enabling the interpretation of the starting raw data and the quantitative evaluation of similarities among different active sites. Accordingly, the UPCA models gather macro-groups of enzymes based on their similarity, which might refer to one single probe (ESI, Figures S6-S9) or all probes collectively (global score, Figure 7). Therefore, the active sites are compared using a virtual chemical ligand and not looking at the features of the single amino acids, making the analysis of protein sequences or structural alignment unnecessary. On this basis, with reduced computational and time costs, the enzyme sub-classes were compared without the need of relying on precedent fundamental knowledge [31]. The global-score plot reported in Figure 7 indicates that all cutinases fall in a region at the interface between lipases and esterases. They share common hydrophobicity properties (Figure 7), with the exception of 3DCN and 4WFI cutinases located between the esterase and amidase clusters. Figure 7 discloses how their difference is ascribable prevalently to a lower hydrophobicity of the active site, as confirmed by the experimental studies of their specificity towards short chain triacyl glycerols [34]. Interestingly, fungal lipases CaLB (1TCA), Humicola lanuginosa (1DTE), Candida rugosa (CRL) fall within the esterases clusters, similarly as fungal cutinases (1AGY, 3QPD, Hic, 4PSE), since the fungal lipases differ from the bacterial and mammalian lipases in terms of electrostatic and hydrophobic properties of the active site.  The global score plot, which accounts for all types of probes used, show cutinases grouped between esterases and lipases. The two cutinases most widely applied so far in polyester synthesis and degradation, namely Thc_cut1 (Cut1) and cutinase from Humicola insolens (HiC in the plot, PDB: 4OYY) within the cutinases cluster despite the fact that they have only 9% sequence identity (Figure 7). This indicates how difficult it is to extract some rationale about cutinase common features on the basis of a primary structure comparison only. A more extended discussion of UPCA-BioGPS models for amidases, proteases, esterases and lipases has been previously reported [30].
It is interesting to note how the H-bond acceptor probe (O probe) clearly distinguishes between lipases and cutinases ( Figure 7). Therefore, the ability of active sites to donate H-bonds is the property that differentiates most evidently the various sub-classes of serine-hydrolases, when considering the distance of the clusters of lipases, amidases, and proteases. Although lipases, amidases, and esterases do not appear tightly grouped as in the case of proteases, each sub-class is widely distanced across the plot, with the exception of cutinases which are clearly gathered within the esterases cluster (Figure 7).
On the contrary, the shape plot ( Figure 7) reports a very scattered pattern for all hydrolases, indicating that the active site cavities have very different shapes, which are not directly correlated to the substrate they accept nor the reaction they catalyze, suggesting that the substrate specificity alone is insufficient to set clear boundaries to each sub-class.

Guidelines for Engineering Strategies Emerging from the Bioinformatics Analysis
In the context of polymer chemistry, the rational engineering of cutinases would be of interest both for the synthesis of new functional polymers but also for the biodegradation of recalcitrant polyesters and polyamides. Natural evolution conferred to cutinase active sites wide accessibility as compared to all other serine hydrolases able to catalyze the hydrolysis of ester or amide bonds, and that feature makes the scaffold of these hydrolases the most promising starting point for designing new enzymes able to attack or synthetize bulky polymers.
The present bioinformatic analysis is focused exclusively on the active sites of the enzymes and the comparison of the active 9 cutinases with 42 different hydrolases indicate a remarkable homogeneity of cutinases in terms of ability of establishing an extended network of hydrophobic and electrostatic interactions. On the contrary, the shape of the active site does not appear as a preserved feature despite the fact that they share cutin as natural substrate. The latter, however, is a hydrophobic polyester with variable chains differing in size and branching.
In the perspective of engineering new substrate specificity in cutinase scaffolds for the synthesis of aromatic or aliphatic polyesters, the catalophores of cutinases (Figure 3) act as a reference, since new mutants must preserve the electrostatic and hydrophobic features summarized by the corresponding catalophores, especially in the regions responsible for the catalytic mechanisms, namely the catalytic triad and the oxyanion hole.
Notably, so far CaLB showed to be the most effective enzyme in catalyzing the polycondensation of a wide range of aromatic and aliphatic diols and diacids [8] and the comparison of the catalophores of CaLB and Thc_cut1 (Cut1) indicates that hydrophobicity of the active site is quite similar whereas the cutinase is able to establish H bonds in a wider area. This feature is confirmed also in the global analysis presented in the UPCA plots ( Figure S5) that indicate how the O probe, mapping the ability of the active sites to donate H bonds, is the most effective in catching the differences between CaLB and Thc_cut1 (Cut1). Therefore, a strategy for the enlargement of substrate specificity of Thc_cut1 could be the mitigation of the H bond donor ability in the peripherical areas of its active site.
Concerning the task of engineering amidase activity in the active sites of serinehydrolases, this is a topic widely discussed in the recent literature and it assumes a new relevance in the context of the studies for the biodegradation of polyamides [25,30,47].
The catalophores of amidases clearly show very limited preserved structural and physical-chemical features. Interestingly, the O-probe discriminate very effectively the ami-dases from the other hydrolases sub-classes, even though the similarities of amidases are restricted to the area close to the catalytic triad. Therefore, the environment created by an extended network of electrostatic interactions in the active sites of cutinases might interfere with the establishment of the correct interactions responsible for the recognition and hydrolysis of amides.
It must be noted that the synthesis and the hydrolysis of the amide bond is catalyzed by CaLB, although at very low reaction rate [47] and other factors, not related to the active site features, can affect the ability of a hydrolase to accept and transform an amide. For instance, a polyamide is more polar than a polyester and the hydrophobic superficial area surrounding the entrance of the active site of cutinases and lipases will prevent the approach of polar molecules [17,22,44,48]. Although such structural factors are not accounted by the present bioinformatics analysis, the fact that lipase from Candida rugosa (PDB code: 1CRL) has a scarcely hydrophobic active site and differs significantly from all other lipases (Figure 7 and Figure S5) suggests that its known efficiency in transforming triglycerides is not connected to the active site features. Rather, the presence of hydrophobic superficial regions [44] favors the partition of substrates inside the active sites.

Dataset Composition
The dataset of serine-hydrolases was composed of 42 structures as used in a previous study [30] with the addition of 9 protein structures belonging to the class of cutinases (E.C. 3.1.1.74). All the crystal structures were retrieved from the Protein Data Bank (PDB). To ease comparison with previous published work we employed the model as in Refs [6,17] for Cut1. The model was validated by structural alignment with 5LUI. To work only on protein data and discarding the other components of the structures (like water, duplicated protein chains, inhibitors, etc.) the PyMOL software was used for pre-processing, leading to protein structures ready to be analyzed by BioGPS. The composition of the whole dataset is determined by 5 different enzyme classes, distinguished by their E.C. number: lipases (serine hydrolases defined as triacylglycerol lipase; E.C

Structure Alignment Procedure
PyMOL [49] alignment was carried out on alpha-carbons of the catalytic triad's and oxyanion hole's residues using the function pair fit. The target residues for lipases, amidases, esterases and proteases were retrieved from our previous study [30]. In the case of cutinases, the aligned residues are reported in Table 1.
In the particular case of the 3DCN (Glomerella cingulata cutinase) structure, the acidbase catalyst His 204 was not considered during the alignment procedure because its position in the catalytic machinery is very unusual, as reported by Nyon et al. [35]. Therefore, including this residue in the alignment would have turned out in a misalignment of the whole active site due to the perturbative effect of that His residue.
The enzyme active sites were analyzed by calculating the GRID molecular interaction fields to provide a detailed description of molecules. GRID probes H, O, N1 and DRY (which represent, respectively, the shape, H-bond donor and acceptor capabilities, hydrophobic interactions) were used to investigate the protein structure.
On this basis, with reduced computational and time costs, entire enzyme sub-families and mutants can be efficiently compared without the need of relying on whatsoever precedent knowledge.

Catalophore Calculation
FLAPpharm [45]-the algorithm used for this purpose-offers the possibility of aligning autonomously protein structures before the catalophore calculation, using as a target the whole working cavity defined on the enzyme surface that comprises active site and surrounding area. This kind of workflow turned out to be ineffective due to the fact that maximum cavity overlapping could not correspond to efficient catalytic residues superimposition. For this reason, structure alignment was user-guided according to the procedure described above.
The preliminary mapping of the working cavity on enzymes' surface that defines the working-cavity. FLAPsite algorithm-using the GRID force field-was exploited for this purpose. After proper structures alignment, the centroid of the cavity was placed in the catalytic serine of the protein structure, and two values defining cavities' shape and size (parameters called extension and thickness) were sat to 9 and 9, a value slightly higher than the one used in our previous work. Cavities' size was boosted to achieve a broad mapping of the catalophoric zones. Some other attempts of cavity generation with different size and shape were performed, leading to irregular shaped cavities or to accessory lobes that had to be discarded.
FLAPsite algorithm was executed for each enzyme and the output cavities were saved as separate pdb structures consisting of a set of points with specific coordinates.
After cavity definition, FLAPpharm catalophore calculation was performed for each sub-class of hydrolases and for some mixed datasets, always skipping the algorithm-based alignment step. The output of the algorithm is composed by the pseudoMIFs for each enzyme. The molecular interaction field is a high density of information object and can be used for different purposes. Positive interaction energy corresponds to unfavorable target-probe interaction or when the probe is positioned inside the Van der Waals sphere of the atoms of the target. Conversely, using the GRID points where interaction is more favorable (negative energy values) it is possible to analyze the macroscopic similitudes among different target objects. A pdb structure containing the catalophore MIFs (common fields). The outputs-deriving from GRID-based force field mapping with determination of type and energy of interactions-are presented in terms of the probes employed: H probe takes into account active site shape; O probe reveals the H-bond donor capabilities; N1 probe the H-bond acceptor ones; DRY probe evaluates the hydrophobic interactions.
Together with the pdb files generated by the algorithm and containing the grid cages, a scoring file containing values called "S-score" global [50] and split in the probe components values is generated. Where necessary, FLAPpharm command-line was used to compare pairwise protein structures in terms of similarity and differences of molecular descriptors, i.e., intersection and difference MIFs.
All the pdb files obtained were processed in PyMOL using the gridiso plugin. For the pseudopotential field of the MIFs, represented in terms of interaction energies, an energy cut-off of −0.1 kcal/mol was empirically applied in order to obtain a reasonable size of the visualized grid cages.

BioGPS Model
The BioGPS (Global Positioning System in Biological Space) model was obtained as already published [30] using BioGPS software version 16.01 (Molecular Discovery Limited, Borehamwood, Hertfordshire WD6 4PJ, United Kingdom. BioGPS is based on the "Common Reference Framework" composed by two main steps: the characterization of the protein active sites and their comparison. With the pseudo-MIF procedure, the mapped properties are considered as electron-density-like fields centered on each atom, corresponding to specific probe types (i.e., the interaction energies coming from GRID N1 probe were centered on carbonyl oxygen as H-bond acceptor). Afterwards, the algorithm reduces the complexity of the pseudo-MIFs selecting a number of representative points using a weighted energy-based and space-coverage function. For each active site the algorithm generated all possible combinations of four points; each combination is termed "quadruplet". Moreover, the function includes the geometrical information into each quadruplet. All possible quadruplets for each mapped active site were generated and stored into a bio-fingerprint (bitstring) that constitutes the Common Reference Framework. For catching similarities and differences between two or more active sites, the algorithm compares their Common Reference Frameworks using an "all against all" approach where each enzyme active site is compared with itself and with all the other enzyme active sites. The algorithm searches for similar quadruplets and then overlaps the corresponding 3D structures. At the end, the algorithm generates a set of Tanimoto scores [51] (BioGPS descriptors) represented by square matrixes, namely a series of probe scores (one for each original GRID probe) together with a global score. The descriptors are calculated for a given superposition by directly comparing the overlapping volumes of the pseudo-MIFs.

UPCA Analysis
The information contained in the BioGPS descriptors was then statistically analyzed by means of Unsupervised Pattern Cognition Analysis (UPCA) [45,46]). Unsupervised Pattern Cognition Analysis (UPCA), a well-established algorithmic platform for performing systematic analysis of data sets, was used to reduce the dimensionality of data. The UPCA algorithm converts a set of correlated descriptors into a new set of linearly independent variables (orthogonal transformation) called Principal Components (PCs). Principal Components are a linear combination of the original correlated variables. The first Principal Component (PC1) is calculated in order to maximize the variance of the object in the dataset. The following principal components are calculated to maximize the variance in the data that is not explained by the previous PC yet. [22,30]. The global score of the Pattern Cognition Analysis groups the objects on the firsts two Principal Components (PCs), (PC1 explains 14% of the variance and PC2 = 9%). The remaining variance has to be considered as noise or diversity that is not explained by this model. It is important to underline that UPCA is not a regression analysis, therefore it just analyzes the already existent correlation of variables without searching for the maximum correlation. Therefore, the remaining variance is most probably ascribable to heterogeneous substrate specificity, which appears as a predominant cause of variability inside Ser-hydrolase superfamily.

Conclusions
Catching the features that differentiate each sub-class would be of interest for introducing desired catalytic properties in optimized enzymatic scaffolds. On that respect, cutinases look particularly promising for applications in polymer chemistry because of their ability of hydrolyzing bulky esters thanks to superficial and accessible active sites. The present bioinformatics analysis provides insights on the environment present in the active sites in terms of electrostatic and hydrophobic interactions, in agreement with the concept that electrostatic interactions play the predominant role in the stabilization of the transition states of reactions here considered [52].
The distinction between the sub-classes of serine-hydrolases appears to be based on several criteria, determined by complex evolution drivers. Overall, the ability to accept H-bond in certain regions of the active sites appears to be the most largely preserved feature within each hydrolase sub-class. On the contrary, the ability to donate H-bonds (O-probe) appears to be a less preserved structural feature, in all sub-classes. Amidases are the group of serine-hydrolases that has the lowest extent of similarities in their active sites.
However, the behavior of enzymes such as lipase from Candida rugosa suggest that not all features responsible for the activity and specificity of an enzyme can be confined to the active site, since partition phenomena play a major role and the access of the substrate to the active site must not be given for granted. Therefore, in some cases the rational strategies should be driven by the mitigation of specific physical-chemical features of the enzymes.