An In-Silico Comparative Study of Lipases from the Antarctic Psychrophilic Ciliate Euplotes focardii and the Mesophilic Congeneric Species Euplotes crassus: Insight into Molecular Cold-Adaptation

Cold-adapted enzymes produced by psychrophilic organisms have elevated catalytic activities at low temperatures compared to their mesophilic counterparts. This is largely due to amino acid changes in the protein sequence that often confer increased molecular flexibility in the cold. Comparison of structural changes between psychrophilic and mesophilic enzymes often reveals the basis of molecular cold adaptation. In the present study, we performed an in-silico comparative analysis of 104 hydrolytic enzymes belonging to the family of lipases from two evolutionarily close marine ciliate species: the Antarctic psychrophilic Euplotes focardii and the mesophilic Euplotes crassus. By applying bioinformatics approaches, we compared amino acid composition and predicted secondary and tertiary structures of these lipases to extract information relevant to cold adaptation. Our results not only confirm the importance of several previously recognized amino acid substitutions for cold adaptation, such as the preference for small amino acids, but also identify some new factors correlated with the secondary structure that are possibly responsible for enhanced enzyme activity at low temperatures. This study emphasizes the subtle sequence and structural modifications that may help to transform mesophilic into psychrophilic enzymes for industrial applications by protein engineering.


Introduction
Temperature is one of the most important factors limiting the distribution and abundance of life on Earth. Excessively high temperatures break covalent bonds and ionic interactions between molecules, denature proteins, and destroy cell structures, with severe consequences for living organisms adapted to temperate environments [1]. Conversely, low temperatures reduce biochemical reaction rates, inactivate enzymes, and induce the formation of ice crystals that damage cell structures [2]. During the past decades, the extensive discovery of life in extreme thermal environments has changed our understanding of the limits of life. The compatibility of these organisms with their habitat temperature is ultimately determined by their underlying genetic architecture: the organisms, as well as all their cell components, must be thermally adapted to the local environment [3].
The largest proportion of the biomass on Earth is generated in the cold (≤5 °C). This is mainly due to the great number of microorganisms in the oceans and in other cold biomes such as high alpine soils, terrestrial glaciers, perennially ice-covered lakes, and polar sea ice and ice sheets, in addition to the seasonally cold habitats [4]. Therefore, cold adaptation in the microbial world should be expected. To some extent, cell-specific adaptation strategies of cold-adapted organisms have been identified. For example, it is known that to maintain cell membrane fluidity, psychrophiles increase the number of unsaturated bonds in fatty acids to introduce steric constraints that change the packing of the lipid bilayers [5]. In addition, microbes sometimes secrete ice-nucleating proteins and antifreeze-like proteins, which impair ice crystal formation in the cells [6]. Most importantly, psychrophiles synthesize enzymes that work efficiently at low temperature [7].
In the last decades, research on enzymes produced by psychrophiles has grown rapidly, as they have tremendous potential for industrial application [7,8]. Psychrophilic enzymes are often characterized by high activities and reaction rates at low temperatures, and by decreased thermal stability compared to their mesophilic and thermophilic counterparts [9]. The efficient catalytic rates of psychrophilic enzymes are achieved largely by changes in amino acid distribution and composition that confer increased molecular flexibility [9].
To date, many genomes from psychrophilic prokaryotes have been sequenced, and exciting outcomes have been reported for various bacterial and archaeal species [10-13]. With the aid of these sequence data, it is possible to identify molecular cold adaptation on a global scale [10]. However, psychrophilic eukaryotic microorganisms, including many protozoa, have been largely overlooked in these analyses.
In a previous study [20], we reported a complete sequence comparison of a pair of E. focardii and E. crassus patatin-like lipases in order to identify residues for site-directed mutagenesis to transform the psychrophilic enzyme into its mesophilic counterpart. In the present study, we performed a comparative study of 104 hydrolytic enzymes belonging to three different lipase families from E. focardii and E. crassus. By applying bioinformatics approaches, we compared amino acid composition in relation to the secondary and tertiary structures to extract information relevant to cold adaptation. Our results not only confirm the importance of several previously recognized amino acid substitutions for cold adaptation [9], but also identify some new factors in the secondary structure that are possibly responsible for enhanced enzyme activity in cold environments.

Lipase Sequence Characterization and Analysis
By analyzing the complete genome sequences, we identified 46 lipases from E. focardii and 58 lipases from E. crassus, which constitute the dataset for this investigation. Of the 46 lipases from E. focardii, 9 were determined to be patatin-like phospholipases, 29 αβ-hydrolase associated lipases, and 8 esterase lipases. Of the 58 lipases from E. crassus, 17 were identified as patatin-like phospholipases, 28 as αβ-hydrolase associated lipases, and 13 as esterase lipases (summarized in Table S1). The sequence alignments revealed a degree of similarity in the range of 53-73% between the two Euplotes species. Similarity is particularly high at the level of the conserved motifs reported in Table S2. The amino acid composition of the ORFs of the three lipase families also appeared very similar (Table S3).

Amino Acid Composition Preferences
To detect trends in amino acid composition, E. focardii and E. crassus lipases were aligned and compared. The final alignments comprised 37,556 multiple aligned amino acid sites. Despite the high level of conservation of the amino acid frequencies, there were some differences in composition that may be indicative of cold adaptation (Figure 1): the largest increases in amino acid frequency in E. focardii were observed for Ser (1.43%) and Ala (1.32%) residues. In contrast, the largest decreases in amino acid frequency in E. focardii were observed for Glu (1.07%) and Leu (2.04%). From this dataset, the frequencies of individual amino acids and property groups were also computed (Table 1). Although the frequencies of individual amino acids are fairly similar in lipases from both species (Figure 1), the p-values in Table 2 indicate that some residues, such as Ala, Asp, and Ser, are significantly preferred in E. focardii with respect to E. crassus. On the other hand, Pro, Glu, and Leu residues were significantly less favored in E. focardii. When comparing the frequencies of occurrence of amino acid property groups, we observed that the tiny and small amino acid groups were significantly preferred in E. focardii, whereas Glu residues were significantly avoided, as shown by the corresponding p-values in Table 1.

Secondary Structural Elements
The amino acid compositions of the E. focardii and E. crassus lipases in the predicted secondary structural elements (see Section 4 Materials and Methods) are summarized in Table 2. Taken together, the total number of residues in α-helices, β-sheets, or random coils was similar in both species (p-value > 0.05, data not shown). However, the amino acids Glu and Leu show significantly lower frequencies in the α-helices of E. focardii lipases (Table 2A). Furthermore, in the coil regions of E. focardii lipases, the frequencies of Ala, Asp, Gly, and Ser are significantly higher, whereas that of Pro is significantly lower (Table 2B). Except for an increase in the frequency of Ile, the β-sheets of E. focardii lipases did not show any significant changes compared to those of E. crassus (Table 2C).
Considering the biochemical properties of residues, there were fewer aliphatic and charged amino acids in the α-helices of the psychrophilic Euplotes (Table 2A). Except for a preference for small amino acids in the coil regions of E. focardii lipases, there were no other significant changes (Table 2B). The β-sheet regions of E. focardii lipases did not show any significant change compared to those from E. crassus (Table 2C).

Specific Amino Acid Substitutions
To better understand the individual contributions of the amino acid changes, we calculated the log-odds scores (LOS) using the equations described in Section 4 Materials and Methods. Table 3 reports the LOS_E. focardii values computed using Equation (1) (the LOS_E. crassus scores calculated using Equation (2) showed similar results and are therefore not reported in the table). Positive and negative values in Table 3 indicate substitutions that are favored or avoided, respectively, with the magnitude reflecting the strength of the bias. For example, the substitution of E. crassus Ala residues with Tyr in E. focardii is strongly avoided, with a LOS of −11.30. In contrast, the substitution of E. crassus Lys with Ser in E. focardii is highly favored, with a LOS of 9.81. In conclusion, the values in Table 3 indicate that substitutions that increase the amount of Glu, Phe, Lys, and Tyr are avoided in E. focardii lipases with respect to those from E. crassus, whereas those that increase Ala, Asp, Gly, Ser, and Thr are favored.
We also analyzed amino acid substitutions in the light of the three-dimensional structures of the three lipases. To simplify this analysis, we compared a single representative member from each E. focardii and E. crassus lipase family, obtained as described under Section 4 Materials and Methods. Figures 2-4 report the superimposition of the E. focardii (light blue) and E. crassus (green) patatin-like phospholipases, αβ-hydrolases, and esterases, respectively. These superimpositions do not reveal significant structural differences in terms of the RMSD of the protein backbones, including the active sites (residues in yellow in Figures 2-4, unboxed panels). However, specific amino acid substitutions, highlighted in violet in the 3D structures, can be responsible for different interactions within or between adjacent β-sheets that may affect the conformation of these enzymes (Figures 2-4, boxed panels). In general, we found a reduction in the number and/or strength of weak bonds in the E. focardii lipases (Table S4, in bold), in particular for ionic and van der Waals (VdW) interactions.
Figure 2. Three-dimensional structures of patatin-like phospholipases. The distances between aligned C-alpha atom pairs are shown as a color spectrum, with blue indicating the minimum pairwise RMSD and red the maximum. Active-site amino acids are shown as yellow sticks. In the boxes, the amino acid differences between E. focardii (in light blue) and E. crassus (in green) are shown in violet. The models were obtained by a threading method using the I-Tasser web server.
Modifications in both patatin-like phospholipases (Figure 2), such as E. focardii Lys130, Lys177, and Thr163 and E. crassus Asp65 and Glu150, may increase the number of salt bridges or ionic interactions. However, only in E. crassus may the Tyr260 and Phe31 residues give rise to additional π-π stacking interactions through their aromatic side chains, increasing the rigidity of the enzyme.
The E. crassus αβ-hydrolase shows the amino acid substitutions Gly361/Gln368, which can produce additional H-bonds and VdW interactions that stabilize the β-sheet (Figure 3).
Finally, the E. focardii esterase shows the Gly72/Thr72 and Asp137/Arg137 substitutions, which can produce additional VdW interactions and an H-bond, respectively. However, these substitutions are located mainly in the loop rather than in the β-sheet (Figure 4), and therefore have little effect on the structural conformation of this esterase.

Euplotes Lipases Codon Usage
We previously reported that the E. focardii genome is A/T rich [23,24], and we proposed that this A/T predilection may be a consequence of cold adaptation: an A/T-rich genome composition can facilitate DNA strand separation and access of the polymerases to their template, and hence favor DNA replication and transcription. To investigate whether the A/T predilection biased codon usage in E. focardii with respect to E. crassus, we examined the codon composition of three representative ORFs from each lipase family (Table S5). This analysis revealed that both Euplotes species prefer codons with low GC content, although the tendency is much stronger in E. focardii.

Discussion
The objective of this study was to perform an in-silico comparison of putative lipases in two Euplotes species, in which E. focardii represents a psychrophilic organism and E. crassus its mesophilic counterpart. The lipases from these two Euplotes species fall into three main families: patatin-like phospholipases, αβ-hydrolase associated lipases, and esterase lipases.
Taking advantage of bioinformatics approaches, we systematically analyzed the compositional variation and substitution preferences of amino acids in these lipase families, which may help to unravel the potential mechanism of molecular cold adaptation. Additionally, given that lipases are of special commercial interest, this study will contribute to the protein engineering of mesophilic lipases to render them psychrophilic, or vice versa. The analysis of proteins from two phylogenetically close organisms that belong to the same taxonomic group reduces the number of amino acid changes due to genetic divergence, which has been an obstacle in previous similar studies. The analysis was performed at different levels: "in-silico" characterization, amino acid composition, Student's t-test and, finally, substitution patterns in the orthologous lipase proteins.
Previous attempts have been made to identify amino acid composition or amino acid substitution patterns. Gianese et al. [25] compared homologous structures from 7 and 21 different enzymes; Sadeghi et al. [26] compared 60 thermophilic structures and sequences with their mesophilic homologs. Furthermore, the distributions of structural parameters between 13 pairs of psychrophilic and mesophilic proteins were also reported [27]. However, these studies are limited by the relatively small number of protein sequences, taken from a wide variety of organisms. Several large-scale studies have also compared thermophilic organisms with different growth temperatures to gain closer insight into protein thermostability at high temperatures. Some of these studies have focused on comparisons within closely related lineages: two mesophilic Corynebacterium species with slightly different optimum growth temperatures, and two closely related hyperthermophilic genera [28]. These works have detected general factors of cold adaptation. However, a large-scale comparative analysis between a strictly psychrophilic microorganism and a closely related mesophilic congeneric species was missing.
In this study, differences between E. focardii and E. crassus lipases in percentage amino acid composition were found. Individual residue compositions, combined with the substitution patterns in the orthologous proteins of the two species adapted to different temperatures, showed that in the psychrophilic E. focardii lipases there was a significant preference for small amino acids such as Ala, Asp, Gly, Ser, and Thr and a significant avoidance of Pro, Glu, Phe, Lys, and Leu residues (Tables 1 and 3). This residue selection is directly correlated with cold adaptation, since it is well known that small residues increase molecular flexibility, which facilitates enzyme conformational changes during catalysis at low temperatures. Given that aliphatic amino acids are important in maintaining the conformational stability and rigidity of mesophilic enzymes, their strong avoidance in the helix regions of E. focardii lipases (Table 2A) can be interpreted in the same way. Contrary trends are observed for the aromatic amino acids Phe and Tyr, which favor the formation of aromatic-aromatic interactions, making molecules more rigid. However, increased exposure of hydrophobic residues to the solvent enhances protein solvation, which is considered a characteristic of cold-adapted enzymes [29]. In addition, Pro is a highly rigid residue that increases the stability of the protein structure [30]. Moreover, Glu and Leu residues tend to favor and stabilize the formation of helical structures [31] and therefore tend to decrease molecular flexibility. Finally, the charged amino acids, which are known to contribute to the ion-pair electrostatic interactions that maintain conformational stability at the protein surface [29], are also significantly avoided in the coil regions of E. focardii lipases (Table 2C).
The amino acid substitution pattern with LOS scores indicated the most biased amino acid substitution pairs (Table 3). In terms of involvement in significant (|LOS| ≥ 5) substitution pairs, Ala is the most favored residue in E. focardii lipases, as Ala is ambivalent and can be located either inside or on the outside of the molecule. Likewise, Ala lacks a gamma-carbon, which contributes to the formation of α-helices, and increases the number of residues with small steric hindrance. This analysis also revealed which substitutions are preferred in the psychrophilic lipases (shown in bold in Table 3). It confirms the tendency to replace rigid amino acids such as Trp, Phe, Lys, and Tyr with small ones, i.e., Ala, Asn, Ser, and Asp. From the analysis of the 3D structures, we found a reduction in the number and/or strength of weak bonds in the E. focardii lipases. This reduction of weak bonds seems to be necessary to achieve an appropriate flexibility of the whole enzyme structure or of crucial parts of it [31,32]. It is interesting to note that, in this specific case, the E. focardii lipase families appear to adopt a common strategy that is compatible with the preservation of both the structural characteristics and molecular flexibility. In conclusion, the results of our analysis are in agreement with those previously reported but provide more information related to secondary and tertiary structures. Our analysis provides a basis for the rational design of protein mutations in enzyme engineering to broaden the spectrum of enzyme activity.

Sequence Collection and Analysis
The lipase genes (104 genes) retrieved from the E. focardii genome [24] were searched against the E. crassus genome using local BLAST [33,34] in order to identify homologues. Both genomes are available in the NCBI database under accession nos. MJUV00000000.1 and MECR00000000.1, respectively. These sequences were aligned using the T-Coffee multiple sequence alignment program. All alignments were inspected and verified manually against a minimum cut-off of 60% identity with all other sequences. No attempt was made to remove paralogs. The corresponding amino acid sequences of E. focardii were extracted in 58 final alignments.
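As an illustration of the 60% identity cut-off, the following minimal Python sketch computes the percent identity of one aligned pair. It is not the procedure used in this study: the function names, the toy fragments, and the choice to count identity over ungapped columns only are assumptions made for the example.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: identity is counted over ungapped columns only; other
    denominators (e.g., the full alignment length) are equally common.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def passes_identity_cutoff(aligned_a: str, aligned_b: str, cutoff: float = 60.0) -> bool:
    """True if the pair reaches the minimum identity (60% in the text)."""
    return percent_identity(aligned_a, aligned_b) >= cutoff


# Toy aligned fragments, invented for illustration (not real Euplotes sequences).
ef_fragment = "MKSA-LGSAD"
ec_fragment = "MKEALLGPAD"
identity = percent_identity(ef_fragment, ec_fragment)
print(f"{identity:.2f}% identity, kept: {passes_identity_cutoff(ef_fragment, ec_fragment)}")
```

In this toy pair, 7 of the 9 ungapped columns are identical, so the pair would be retained under a 60% threshold.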

Analysis of Amino Acid Composition
To estimate and compare the amino acid composition of psychrophilic and mesophilic lipases, EMBOSS Pepstats (https://www.ebi.ac.uk/Tools/seqstats/emboss_pepstats/) was used. The amino acids were divided into 12 property groups: acidic (Asp and Glu); aliphatic (Ile, Leu, and Val); aromatic (His, Phe, Trp, and Tyr); basic (Arg, His, and Lys); charged (Arg, Asp, Glu, His, and Lys); hydrophilic (Asp, Glu, Lys, Asn, Gln, and Arg); hydrophobic (Ala, Cys, Phe, Ile, Leu, Met, Val, Trp, and Tyr); neutral (Gly, Gln, His, Ser, and Thr); non-polar (Ala, Cys, Gly, Ile, Leu, Met, Phe, Pro, Val, Trp, and Tyr); polar (Arg, Asn, Asp, Glu, Gln, His, Lys, Ser, and Thr); small (Ala, Cys, Asp, Gly, Asn, Pro, Ser, Thr, and Val); and tiny (Ala, Cys, Gly, Ser, and Thr). Some amino acids are included in more than one property group. The sum of the frequencies of the amino acids that fall in each property group was calculated for psychrophilic and mesophilic lipases and compared. The composition data were then analyzed, and a Student's t-test was applied to confirm significant differences between the two data sets.
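The property-group frequencies and the t statistic can be sketched in Python as follows. The group memberships are taken from the list above; everything else is an assumption for illustration: the function names are hypothetical, the test is assumed to be an unpaired, pooled-variance Student's t-test (the text does not state whether samples were paired), and the p-value, which requires the t-distribution, is left out.

```python
from math import sqrt
from statistics import mean, variance

# Property groups as listed in the text (one-letter amino acid codes).
PROPERTY_GROUPS = {
    "acidic": "DE",
    "aliphatic": "ILV",
    "aromatic": "HFWY",
    "basic": "RHK",
    "charged": "RDEHK",
    "hydrophilic": "DEKNQR",
    "hydrophobic": "ACFILMVWY",
    "neutral": "GQHST",
    "non-polar": "ACGILMFPVWY",
    "polar": "RNDEQHKST",
    "small": "ACDGNPSTV",
    "tiny": "ACGST",
}


def group_frequencies(seq: str) -> dict:
    """Percentage of residues in `seq` that fall into each property group."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return {
        name: 100.0 * sum(seq.count(aa) for aa in members) / len(seq)
        for name, members in PROPERTY_GROUPS.items()
    }


def student_t(sample_a, sample_b):
    """Unpaired pooled-variance t statistic and degrees of freedom.

    Assumption: equal variances and independent samples; each sample needs
    at least two values.
    """
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Toy sequences, invented for illustration (not real Euplotes lipases).
psychro = ["MSAGSTAD", "ASGSADTS", "GSAASDTA"]
meso = ["MELKLPEF", "LEKLPELY", "KELLPEFK"]
tiny_psychro = [group_frequencies(s)["tiny"] for s in psychro]
tiny_meso = [group_frequencies(s)["tiny"] for s in meso]
t_value, dof = student_t(tiny_psychro, tiny_meso)
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

The t statistic would then be compared with the t-distribution at the given degrees of freedom to obtain a p-value, for example with a statistics package.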

Secondary Structure Prediction
The most common secondary structures in proteins are α-helices, β-sheets, and random coils. This analysis was intended to determine the distribution of structural parameters between 58 pairs of psychrophilic and mesophilic proteins in order to elucidate the parameters contributing to the enzymes' specific activity at low temperature. For this purpose, secondary structural elements in protein sequences were predicted using PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/). PSIPRED is a highly reliable secondary structure prediction method with a reported prediction accuracy of ~83%. The resulting predictions were used to compute the frequencies of different amino acids and property groups of residues in the three major secondary structural regions: helix (H), strand (E), and coil (C). The composition data were then analyzed, and a Student's t-test was applied to confirm significant differences between the two data sets.
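A minimal Python sketch of the per-region composition step is given below. It assumes that the prediction has already been reduced to a per-residue string of H, E, and C states of the same length as the sequence; the function name and the toy input are hypothetical, and parsing of the actual PSIPRED output files is not shown.

```python
from collections import Counter


def composition_by_state(seq: str, states: str) -> dict:
    """Amino acid percentages within helix (H), strand (E), and coil (C).

    `states` is a per-residue string of H/E/C labels aligned to `seq`.
    Returns {state: {amino_acid: percent}}; a state with no residues maps
    to an empty dict.
    """
    if len(seq) != len(states):
        raise ValueError("sequence and state string must have the same length")
    counts = {"H": Counter(), "E": Counter(), "C": Counter()}
    for aa, state in zip(seq.upper(), states.upper()):
        if state not in counts:
            raise ValueError(f"unexpected secondary structure state: {state!r}")
        counts[state][aa] += 1
    result = {}
    for state, counter in counts.items():
        total = sum(counter.values())
        result[state] = {aa: 100.0 * n / total for aa, n in counter.items()} if total else {}
    return result


# Toy input, invented for illustration (not a real prediction).
sequence = "MAELLKGSPG"
predicted = "CHHHHHCCCC"
freqs = composition_by_state(sequence, predicted)
print(freqs["H"])  # composition of the helical residues A, E, L, L, K
print(freqs["C"])  # composition of the coil residues M, G, S, P, G
```

Per-region percentages computed in this way for each lipase could then be compared between the two species with the same t-test used for the overall composition.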

Amino Acid Substitution Bias
All lipase sequences from E. focardii were searched against the genome data set of E. crassus and vice versa, using BLASTP with an expectation value cutoff of 10⁻³ and considerable length coverage. For each lipase sequence in the query Euplotes species, the pairwise BLAST alignment with its best-hit homolog in the subject Euplotes species was selected. The pairwise alignments (without gapped regions) were passed to a custom Perl script to calculate amino acid substitution counts between the two lipases from the respective species. The substitution counts were normalized to the total number of amino acids present in each homolog pair from the two species and finally to all the pairs. The resulting substitution frequencies were then used to calculate two types of log-odds scores (LOS) using Equations (1) and (2), adapted from [35], where F(X_E. focardii → Y_E. crassus) represents the normalized frequency of amino acid X in E. focardii substituted by amino acid Y in E. crassus. The LOS values were calculated by using background substitution frequencies among the E. focardii and/or E. crassus lipases in the denominator. The LOS therefore indicate the pattern of substitutions that are predominantly due to thermal adaptation, minimizing the effect of substitutions due to speciation events during evolution.
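The counting and scoring steps can be illustrated with the following Python sketch. It is not a reimplementation of the Perl script or of Equations (1) and (2): the generic form ln(observed frequency / background frequency), the use of the natural logarithm, the per-pair normalization followed by averaging over pairs, the pseudocount, and all function names are assumptions made for the example.

```python
from collections import Counter
from math import log


def substitution_frequencies(pairs):
    """Normalized substitution frequencies from ungapped aligned pairs.

    `pairs` is a list of (query_seq, subject_seq) tuples of equal length.
    Counts of X -> Y (X != Y) are divided by the pair length, then averaged
    over all pairs (an assumed reading of the normalization in the text).
    """
    if not pairs:
        raise ValueError("no aligned pairs given")
    totals = Counter()
    for query, subject in pairs:
        if len(query) != len(subject) or not query:
            raise ValueError("each pair must be non-empty and of equal length")
        counts = Counter((x, y) for x, y in zip(query, subject) if x != y)
        for key, n in counts.items():
            totals[key] += n / len(query)
    return {key: value / len(pairs) for key, value in totals.items()}


def log_odds(observed, background, pseudocount=1e-6):
    """ln(observed / background) for every substitution seen in either set.

    The pseudocount (an assumption) avoids division by zero and log(0).
    """
    keys = set(observed) | set(background)
    return {
        key: log((observed.get(key, 0.0) + pseudocount)
                 / (background.get(key, 0.0) + pseudocount))
        for key in keys
    }


# Toy data, invented for illustration.
between_species = substitution_frequencies([("KAKA", "SASA")])  # K -> S twice
background = {("K", "S"): 0.25}  # hypothetical background frequency
scores = log_odds(between_species, background)
print(scores[("K", "S")])  # positive: K -> S is over-represented
```

Under this generic form, a positive score marks a substitution that occurs more often between the species than in the background, and a negative score marks one that occurs less often.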

Tertiary Structure Prediction and Codon Usage Estimation
The three-dimensional structures of the E. crassus and E. focardii αβ-hydrolase and esterase lipases were obtained by homology modeling using the PDB structure files 1K8Q [36] and 6A0W [37], respectively, as templates. The sequence identities between the Euplotes lipases and the templates were 31.34% and 30.91% for E. crassus and 32.88% and 29.53% for E. focardii, respectively. The structures of the patatin-like phospholipases were obtained by a threading method using the I-Tasser web server [38], since the sequence identities with the best templates were lower than 25%. All obtained structures were finally energy-minimized using the steepest descent algorithm (until the maximum force was < 1000.0 kJ/mol/nm) of the GROMACS tools [39], analyzed (to predict non-covalent interactions within the protein) using the RING 2.0 web server [40], and rendered using PyMOL (The PyMOL Molecular Graphics System, version 2.4.1, Schrödinger, LLC).
Codon frequency (per thousand) was estimated from three representative sequences from the three lipase families of each species using http://genomes.urv.es/CAIcal/.
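For illustration, codon frequency per thousand and the GC content at the third codon position can be computed with a few lines of Python. This sketch is independent of the web server used here; the function names and the toy ORF are hypothetical, and trailing bases that do not complete a codon are simply ignored.

```python
from collections import Counter


def split_codons(orf: str) -> list:
    """Split an in-frame ORF into codons, ignoring any incomplete final codon."""
    orf = orf.upper().replace("U", "T")
    usable = len(orf) - len(orf) % 3
    return [orf[i:i + 3] for i in range(0, usable, 3)]


def codon_per_thousand(orf: str) -> dict:
    """Frequency of each codon per 1000 codons of the ORF."""
    codons = split_codons(orf)
    if not codons:
        raise ValueError("ORF contains no complete codon")
    counts = Counter(codons)
    return {codon: 1000.0 * n / len(codons) for codon, n in counts.items()}


def gc3_fraction(orf: str) -> float:
    """Fraction of codons with G or C at the third position."""
    codons = split_codons(orf)
    if not codons:
        raise ValueError("ORF contains no complete codon")
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


# Toy ORF, invented for illustration: ATG AAA AAA GGC TAA
toy_orf = "ATGAAAAAAGGCTAA"
print(codon_per_thousand(toy_orf)["AAA"])  # AAA is 2 of 5 codons
print(gc3_fraction(toy_orf))               # third positions: G, A, A, C, A
```

A lower GC fraction at the third codon position in one species than in the other would be in line with a stronger preference for A/T-ending codons.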