Molecular Sciences
Identification and Molecular Modeling of a Family 5 Endocellulase from Thermus caldophilus GK24, a Cellulolytic Strain of Thermus thermophilus

The genome of T. caldophilus GK24 was recently sequenced, assembled into 14 contigs totaling approximately 2.3 megabase pairs (Mbp), and annotated. In the current study, we identified a unique 13.7 kbp DNA region containing the endocellulase gene of T. caldophilus GK24; this region does not appear to be present in the complete genome sequences of the closely related strains T. thermophilus HB27 and HB8. Congo red staining revealed a cellulose-degrading phenotype in strain GK24 that distinguished it from other closely related Thermus strains. These results showed that strain GK24 is an aerobic, thermophilic, cellulolytic eubacterium belonging to the T. thermophilus group. To understand how T. caldophilus GK24 produces cellobiose, a three-dimensional model of its endocellulase, TcCel5A, was generated based on known crystal structures. Using this model, we carried out a flexible docking study with cellotetraose.

This general classification is still widely accepted, although several observations suggest that it oversimplifies a complex set of cellulase interactions. For example, not all endoglucanases appear to act synergistically with exoglucanases. It has recently been suggested that the concepts of processive and non-processive hydrolysis are needed when discussing the action of cellulases [27]. Processive cellulases hydrolyze a cellulose chain by continuously removing cellobiose units while remaining bound to the substrate. Non-processive cellulases hydrolyze a cellulose chain once at a random position, dissociate from the substrate, and re-bind for another cycle of hydrolysis, resulting in a rapid decrease in polymer length. The processive/non-processive classification is therefore based on whether the enzyme carries out successive cycles of hydrolysis without dissociating from the substrate, or dissociates and re-binds for each cycle. In contrast, the endo/exo classification is based on where the cellulase binds the chain: internally (endo-cellulases) or at one of the chain ends (exo-cellulases).
All glycosyl hydrolases catalyze stereoselective hydrolysis: the configuration at the anomeric centre is either inverted or retained upon hydrolysis of the glycosidic linkage. Details of both reaction mechanisms are described by Tomme et al. [12]. Retaining enzymes, but not inverting enzymes, also have transglycosylation activity [28]. The catalytic residues of both types of enzyme can be tentatively identified by chemical modification or site-directed mutagenesis, in conjunction with three-dimensional (3D) structural analysis. X-ray crystal structures of more than 20 cellulases and xylanases, representing several glycosyl hydrolase families (families 5, 6, 7, 8, 9, 10, 11, 12, 45, and 48), have been reported or are under active study [29].
In this report, we identified a new thermophilic cellulolytic strain, Thermus caldophilus GK24, as a source of a novel thermostable endocellulase. We describe a model of the Thermus caldophilus GK24 Family 5 endocellulase TcCel5A in complex with cellotetraose (G4). The results provide insight into the structural basis of complex formation between TcCel5A and G4 and into the molecular interactions that stabilize it.

Sequencing of a whole-genome shotgun library of Thermus caldophilus GK24
A random shotgun library was constructed in the pGEM-T Easy vector (Promega). Genomic DNA was sonicated or fragmented with a Nebulizer (cat. 7025-05, Invitrogen) to generate fragments of 4 to 5 kb. E. coli cells were transformed with the T. caldophilus GK24 genomic library, and individual colonies were picked using a Q-Pix colony picker (Genetix) and cultured. Plasmid DNA was prepared with solid-phase extraction kits optimized for DNA. Sequencing reactions were carried out with the BigDye sequencing kit and run on an ABI 3700 Genetic Analyzer, according to the manufacturer's instructions. Gaps in the genomic sequence were closed by combinatorial PCR and primer walking [38].

Data Processing, Sequence Assembly, and Annotation
Initially, the raw shotgun sequence data were assembled using the Phred/Phrap/Consed packages [39,40]. Bases were called with Phred, and the sequence data, together with the quality value for each base, were assembled into the whole genome with Phrap. The resulting contigs were visualized and inspected in Consed. Gaps were then closed by manual editing and with the Autofinish function of Consed [41,42]. ORF identification, BLAST searches, and annotation were performed on an in-house bioinformatics server running the appropriate software. We compared the T. caldophilus GK24 genome [43] with the complete genomes of T. thermophilus HB27 [44] and HB8 [45] using the MUMmer 3.0 package [46] and in-house Perl-based analysis tools.
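Phrap weights each base by its Phred quality value Q, which encodes the estimated probability P that the base call is wrong as Q = -10 log10 P (so Q = 20 corresponds to a 1% chance of error). The Python sketch below illustrates that conversion and the expected number of errors in a stretch of read; the function names and the example quality values are illustrative assumptions, not part of the Phred/Phrap/Consed distribution.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality value corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def expected_errors(quals: list[float]) -> float:
    """Expected number of miscalled bases: the sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in quals)


if __name__ == "__main__":
    # Hypothetical quality values for five bases of one read.
    read_quals = [40, 40, 30, 20, 10]
    for q in read_quals:
        print(f"Q{q}: P(error) = {phred_to_error_prob(q):g}")
    print(f"Expected errors in this stretch: {expected_errors(read_quals):.4f}")
```

Because the scale is logarithmic, a single low-quality base (here Q10) dominates the expected error count of the whole stretch.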

Detection of endocellulase activity
Six Thermus strains were screened for endocellulase activity on agar plates containing 1% CM-cellulose using the Congo red staining method [47]. Endocellulase activity was detected as a clear halo around a bacterial colony. The strains used in this study were T. aquaticus YT1 (ATCC 25104), T. filiformis Wai33 A1 (ATCC 43280), T. thermophilus HB8 (ATCC 27634), T. thermophilus HB27 (ATCC BAA-163), T. flavus AT62 (ATCC 33923), and T. caldophilus GK24. Cells were grown aerobically at 70 °C with shaking in Castenholz medium (ATCC medium 461) and plated on the same medium containing 2% agar (Difco). Where indicated, the medium was supplemented with 0.1% carboxymethyl-cellulose (CMC). The agar plates were immersed in 0.5% (w/v) Congo red for 15 min and destained with 1 M NaCl for 15 min. Zones of activity appeared as yellow halos on a red background. The plates were then treated with 0.1 N HCl, which turns the background dark blue, and photographed.

Purification of the enzyme
Ammonium sulfate was added to 80% saturation to the culture supernatant from a 3,000 ml culture of T. caldophilus GK24. The precipitate was collected by centrifugation (13,000 × g, 4 °C, 30 min), dissolved in a small volume of 20 mM acetate buffer, and brought to 20% saturation with ammonium sulfate. This enzyme solution was applied to a Butyl-Toyopearl 650M column (4.5 × 6.4 cm; Tosoh Co.) equilibrated with 20 mM acetate buffer containing ammonium sulfate at 20% saturation. The column was washed with 1,000 ml of the same buffer, and the enzyme was eluted with a linear gradient of ammonium sulfate from 20% to 0% saturation (total volume, 1,000 ml). The active fractions were pooled, and the protein was precipitated with ammonium sulfate at 80% saturation. The precipitate was collected by centrifugation, dissolved in a small volume of 20 mM acetate buffer, and desalted on a Bio-Gel P-4 column (2.5 × 20 cm; Bio-Rad Laboratories Japan) equilibrated with 20 mM acetate buffer. The desalted enzyme solution was applied to a DEAE-Toyopearl 650M column (4.5 × 3.2 cm) equilibrated with 20 mM acetate buffer. The column was washed with 500 ml of the same buffer, and the enzyme was eluted with a linear gradient of 0 to 0.5 M NaCl.
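The wash volumes in this protocol are easier to compare when expressed as column volumes (CV). The Python sketch below computes the bed volume of a cylindrical column and converts a buffer volume into CV; it assumes that dimensions such as "4.5 × 6.4 cm" give the inner diameter followed by the bed height, which the protocol does not state explicitly, and the function names are illustrative.

```python
import math


def bed_volume_ml(diameter_cm: float, height_cm: float) -> float:
    """Bed volume of a cylindrical column in ml (1 cm^3 = 1 ml)."""
    radius = diameter_cm / 2
    return math.pi * radius ** 2 * height_cm


def column_volumes(volume_ml: float, diameter_cm: float, height_cm: float) -> float:
    """Express a buffer volume as a multiple of the column bed volume."""
    return volume_ml / bed_volume_ml(diameter_cm, height_cm)


if __name__ == "__main__":
    # Column sizes and wash volumes from the protocol, assuming diameter x bed height.
    print(f"Butyl-Toyopearl: {bed_volume_ml(4.5, 6.4):.1f} ml bed, "
          f"1,000 ml wash = {column_volumes(1000, 4.5, 6.4):.1f} CV")
    print(f"DEAE-Toyopearl: {bed_volume_ml(4.5, 3.2):.1f} ml bed, "
          f"500 ml wash = {column_volumes(500, 4.5, 3.2):.1f} CV")
```

Under that assumption, the Butyl-Toyopearl bed is about 102 ml and the DEAE-Toyopearl bed about 51 ml, so both washes correspond to roughly 10 CV.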
The active fractions were pooled, and the protein was precipitated with ammonium sulfate at 80% saturation. The precipitate was collected by centrifugation, dissolved in a small volume of 20 mM acetate buffer, and applied to a Superdex 75 pg 16/60 column (Pharmacia LKB Biotechnology) equilibrated with 20 mM acetate buffer (pH 5.5) containing 150 mM NaCl. A Pharmacia FPLC system (pump, P-500; liquid chromatography controller, LCC-500; single-path UV monitor, UV-1; fraction collector, FRAC-100; Pharmacia LKB Biotechnology) was used. The enzyme was eluted with the same buffer. The active fractions were pooled, and the protein was again precipitated with ammonium sulfate at 80% saturation. The precipitate was collected by centrifugation, dissolved in a small volume of 5 mM acetate buffer, and desalted on a Bio-Gel P-4 column (1.5 × 20 cm) equilibrated with 5 mM acetate buffer.
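Each precipitation step above is specified as a target percentage of ammonium sulfate saturation. One common way to turn that into a weight of solid salt is the empirical relation g = 533 (S2 - S1) / (100 - 0.3 S2), which gives grams per litre needed to go from S1% to S2% saturation; it is usually quoted for about 20 °C, and tables for 0 to 4 °C give somewhat lower amounts. The Python sketch below applies that relation as an illustration under this assumption; it is not a calculation reported by the authors, and the function names are hypothetical.

```python
def ammonium_sulfate_g_per_l(s1: float, s2: float) -> float:
    """Grams of solid (NH4)2SO4 per litre to go from s1% to s2% saturation.

    Assumed empirical relation: g = 533 * (s2 - s1) / (100 - 0.3 * s2),
    usually quoted for about 20 degrees C.
    """
    if not 0 <= s1 < s2 <= 100:
        raise ValueError("saturations must satisfy 0 <= s1 < s2 <= 100")
    return 533 * (s2 - s1) / (100 - 0.3 * s2)


def grams_to_add(volume_ml: float, s1: float, s2: float) -> float:
    """Scale the per-litre amount to the actual solution volume."""
    return ammonium_sulfate_g_per_l(s1, s2) * volume_ml / 1000


if __name__ == "__main__":
    # First step of the purification: 3,000 ml of supernatant taken from 0% to 80%.
    print(f"{grams_to_add(3000, 0, 80):.0f} g")
```

The denominator accounts for the increase in solution volume as salt dissolves, which is why going from 0% to 80% needs more than four times the salt needed for 0% to 20%.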

Activities of enzyme against various substrates
The following substrates were used: alkali-swollen cellulose (ASC), carboxymethyl-cellulose (CMC; average degree of polymerization, 500; Wako Pure Chemical), insoluble cellooligosaccharide (ICOS), cellotriose, cellotetraose, cellopentaose, and cellohexaose. Each substrate was adjusted to 0.5% in 100 mM acetate buffer (pH 7.0). A reaction mixture of 0.1 ml substrate solution and 0.1 ml enzyme solution was incubated at 70 °C for 30 min, and enzyme activity was measured by the Somogyi-Nelson method. One unit of enzyme activity was defined as the amount of enzyme that released 1 µmol of reducing sugar (as glucose equivalents) per min. Cellotriose, cellotetraose, cellopentaose, and cellohexaose were prepared by the method of Miller [56].
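With the unit definition above (1 U releases 1 µmol of glucose-equivalent reducing sugar per min), activity follows from a Somogyi-Nelson absorbance once it has been converted to µmol with a glucose standard curve. The Python sketch below fits that curve by least squares and converts a reading into U/ml of enzyme solution for the stated 30 min assay with 0.1 ml of enzyme; the standards and the sample reading are invented for illustration, and the blank is assumed to have been subtracted already.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def units_per_ml(absorbance: float, slope: float, intercept: float,
                 minutes: float = 30.0, enzyme_ml: float = 0.1) -> float:
    """U/ml = umol glucose equivalents / reaction time (min) / enzyme volume (ml)."""
    umol = (absorbance - intercept) / slope
    return umol / minutes / enzyme_ml


if __name__ == "__main__":
    # Hypothetical glucose standards (umol per tube) and their absorbances.
    glucose_umol = [0.0, 0.1, 0.2, 0.3]
    absorbance = [0.0, 0.2, 0.4, 0.6]
    slope, intercept = fit_line(glucose_umol, absorbance)
    # Hypothetical blank-corrected sample reading of 0.3.
    print(f"{units_per_ml(0.3, slope, intercept):.3f} U/ml")
```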

Measurement of the optimum temperature
The optimum temperature for enzyme activity was determined as follows: the enzyme was added to 0.5% cellohexaose in 100 mM acetate buffer (pH 7.0), and activity was measured after a 30 min reaction at each of a range of temperatures.
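Activities measured over a temperature series are usually reported relative to the highest value, and the optimum is the temperature that gives that maximum. The Python sketch below does this bookkeeping; the activity values are invented placeholders, not the measurements behind this study, and the function names are illustrative.

```python
def relative_activity(activity_by_temp: dict[float, float]) -> dict[float, float]:
    """Express each activity as a percentage of the maximum activity."""
    peak = max(activity_by_temp.values())
    return {t: 100.0 * a / peak for t, a in activity_by_temp.items()}


def optimum_temperature(activity_by_temp: dict[float, float]) -> float:
    """Temperature at which the measured activity is highest."""
    return max(activity_by_temp, key=activity_by_temp.get)


if __name__ == "__main__":
    # Placeholder data: temperature (degrees C) -> activity (U/ml).
    data = {50: 0.21, 60: 0.35, 70: 0.48, 80: 0.40, 90: 0.12}
    print("optimum:", optimum_temperature(data), "C")
    for t, rel in relative_activity(data).items():
        print(f"{t} C: {rel:.0f}%")
```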

Homology modeling of T. caldophilus GK24 Endocellulase
Homology modeling is the method of choice when a clear sequence relationship exists between the target protein and at least one protein of known structure. The approach assumes that two proteins with related amino acid sequences will have similar tertiary structures [48]. Homologous sequences were searched and aligned using FASTA [49] and ClustalW [50]. The sequence identity between TcCel5A and the reference protein, AcCel5A from Acidothermus cellulolyticus (PDB code 1ECE), was 35%. Some residues at the N- and C-termini of TcCel5A were omitted from the model because no suitable template existed for these fragments and they lie far from the catalytic domain. The 3D structure of TcCel5A was built with MODELER [51-53], an implementation of automated comparative modeling based on satisfaction of spatial restraints [52-54]. The initial model was revised by refining loops and side-chain rotamers, checking bonds, and adding hydrogen atoms, and was then optimized by molecular mechanics (MM) and molecular dynamics (MD) simulations with the CVFF force field. For energy minimization, 20,000 iterations of steepest descent were performed, followed by conjugate gradient minimization until the gradient converged to 0.005 kcal·mol⁻¹·Å⁻¹. The final model was then obtained by MD simulations using the Discover 3 program [55] in the Insight II package (Biosym Technologies), with the TIP3P explicit water model. The model was solvated with a 10 Å water cap around its center of mass. Finally, the full protein was minimized by conjugate gradient until the root-mean-square (RMS) energy gradient was below 0.001 kcal·mol⁻¹·Å⁻¹, which improved the quality of the initial model.
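The minimization protocol above starts with steepest descent, which is robust when a starting model has large forces from bad contacts, and then switches to conjugate gradients, which converge much faster near a minimum, stopping once the RMS gradient falls below a threshold. The Python sketch below reproduces only that two-stage strategy on a toy two-variable quadratic energy E(x) = 0.5 xᵀAx - bᵀx; the matrix, starting point, step counts, and tolerance are illustrative assumptions with no relation to the CVFF force field or to Discover.

```python
import math

# Toy energy E(x) = 0.5 * x^T A x - b^T x with a symmetric positive-definite A.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u: list[float], v: list[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def gradient(x: list[float]) -> list[float]:
    """dE/dx = A x - b."""
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]


def rms(g: list[float]) -> float:
    """Root-mean-square of the gradient components (the convergence measure)."""
    return math.sqrt(dot(g, g) / len(g))


def steepest_descent(x: list[float], n_steps: int) -> list[float]:
    """Fixed number of steepest-descent steps with exact line search (valid for a quadratic)."""
    for _ in range(n_steps):
        g = gradient(x)
        if rms(g) == 0.0:
            break
        alpha = dot(g, g) / dot(g, matvec(A, g))
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x


def conjugate_gradient(x: list[float], tol: float, max_iter: int = 100) -> list[float]:
    """Linear conjugate gradient until the RMS gradient is below tol."""
    r = [-gi for gi in gradient(x)]  # residual = negative gradient
    p = r[:]
    for _ in range(max_iter):
        if rms(r) < tol:
            break
        ap = matvec(A, p)
        alpha = dot(r, r) / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * api for ri, api in zip(r, ap)]
        beta = dot(r_new, r_new) / dot(r, r)
        p = [rn + beta * pi for rn, pi in zip(r_new, p)]
        r = r_new
    return x


if __name__ == "__main__":
    x = steepest_descent([5.0, -5.0], n_steps=3)  # coarse relaxation
    x = conjugate_gradient(x, tol=0.005)          # refine to the gradient threshold
    print("minimum near", [round(v, 4) for v in x], "RMS gradient", rms(gradient(x)))
```

For this toy surface the exact minimum is A⁻¹b = (1/11, 7/11), which the conjugate-gradient stage reaches in two steps because the problem has two variables.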

Identification of the binding site of TcCel5A
The Binding-Site module of Insight II is a suite of programs for identifying and characterizing protein binding sites and functional residues. We used its ACTIVESITE-SEARCH program [55] to locate cavities in the TcCel5A structure, which were then used to guide the protein-ligand docking experiments. The predicted binding site of TcCel5A was further defined by combining the cavity search with a comparison of the conserved residues in the endocellulases from A. cellulolyticus [32] and C. thermocellum [34].
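In the Results, the binding pocket is defined as the residues with any atom within 4.5 Å of docked G4. The Python sketch below applies that distance criterion to plain coordinate lists; it is not the Insight II ACTIVESITE-SEARCH algorithm, which locates cavities rather than ligand contacts, and the residue labels and coordinates are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def pocket_residues(protein: dict[str, list[Coord]],
                    ligand: list[Coord],
                    cutoff: float = 4.5) -> list[str]:
    """Residues with at least one atom within `cutoff` angstroms of any ligand atom."""
    return [
        residue
        for residue, atoms in protein.items()
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand)
    ]


if __name__ == "__main__":
    # Hypothetical toy coordinates (angstroms).
    ligand_atoms = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    protein_atoms = {
        "R1": [(0.0, 3.0, 0.0)],                   # 3.0 A from the first ligand atom
        "R2": [(5.0, 4.0, 0.0), (9.0, 9.0, 0.0)],  # 4.0 A from the second ligand atom
        "R3": [(10.0, 10.0, 0.0)],                 # far from the ligand
    }
    print(pocket_residues(protein_atoms, ligand_atoms))  # ['R1', 'R2']
```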

Molecular modeling of cellotetraose docking into the TcCel5A binding site
During molecular docking, molecules are fitted together in a favorable configuration to form a complex. The 3D structure of cellotetraose (G4) was generated with the BUILDER program [55], and its geometry was optimized. To capture the interaction of TcCel5A with G4, automated molecular docking was performed with the Affinity program [55]. A combination of Monte Carlo and simulated annealing procedures for docking a guest molecule into a host was used to find the optimal structures of the TcCel5A-G4 complex, based on the energy of the complex. During docking, the bulk of TcCel5A (all atoms outside the specified binding site) was held rigid, whereas the binding-site atoms and the G4 atoms were allowed to move. The docked TcCel5A-G4 complexes were then ranked by interaction energy combined with the quality of the geometric fit, and the selected complexes were used as starting conformations for energy minimization and geometry optimization to obtain the final models of the TcCel5A-G4 interaction.

We identified a unique cellulolytic activity, and the putative endocellulase gene, in T. caldophilus GK24. Sequence analysis of the 14 contigs comprising the ~2.3 Mbp T. caldophilus GK24 genome revealed a 13.7 kbp DNA region that was absent from the complete genome sequences of the closely related strains T. thermophilus HB27 and HB8 [44,45]. Within this 13.7 kbp region (Figure 1), we identified 15 potential open reading frames (ORFs), which appeared to encode enzymes involved in carbohydrate metabolism. Four of the ORFs were predicted to encode a regulatory protein for sugar metabolism (DeoR), glucosamine-6-phosphate deaminase (GlmD), a putative sugar kinase (SugK), and a sugar ABC transporter permease (SugT). One ORF appeared to encode a β-1,4-glucanase homologous to the Family 5 glycosyl hydrolases. Of particular interest, one of the putative ORFs was predicted to encode a β-glucanase homologue. Based on these sequence data, we examined the degradation of cellulosic material by T. caldophilus GK24 using Congo red staining. Strain GK24 showed a cellulose-degrading phenotype that distinguished it from the other closely related Thermus strains tested (T. thermophilus HB27 and HB8, T. flavus AT62, T. aquaticus YT1, and T. filiformis) (Figure 2). Thus, the unique cellulolytic phenotype of T. caldophilus GK24 correlated with its unique genotype.
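The 13.7 kbp region was recognized as GK24 sequence with no counterpart in HB27 or HB8. Given alignment intervals on a GK24 contig (for example, coordinates parsed from whole-genome comparison output), such regions are the stretches left uncovered once all alignments are merged. The Python sketch below shows that interval logic; the contig length, alignment intervals, and minimum-length threshold are hypothetical, and this is not the authors' MUMmer/Perl pipeline.

```python
Interval = tuple[int, int]  # 1-based, inclusive coordinates on the GK24 contig


def merge_intervals(intervals: list[Interval]) -> list[Interval]:
    """Merge overlapping or adjacent alignment intervals."""
    merged: list[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered_regions(contig_length: int, aligned: list[Interval],
                      min_length: int) -> list[Interval]:
    """Stretches of at least min_length bp (min_length >= 1) not covered by any alignment."""
    gaps: list[Interval] = []
    pos = 1  # first position not yet known to be covered
    for start, end in merge_intervals(aligned):
        if start - pos >= min_length:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)
    if contig_length - pos + 1 >= min_length:
        gaps.append((pos, contig_length))
    return gaps


if __name__ == "__main__":
    # Hypothetical alignments of one GK24 contig against HB27 and HB8 combined.
    hits = [(1, 25_000), (20_000, 31_000), (44_701, 60_000)]
    for s, e in uncovered_regions(60_000, hits, min_length=10_000):
        print(f"unique region {s}-{e} ({e - s + 1} bp)")
```

Pooling the alignments from both reference genomes before looking for gaps means a region is reported only if it is absent from both.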

Purification and Activity of TcCel5A against various substrates
The culture supernatant of T. caldophilus GK24 was fractionated on a Butyl-Toyopearl 650M column (4.5 × 6.4 cm). The pooled active fractions were further purified on DEAE-Toyopearl 650M and Superdex 75 pg 16/60 columns. The molecular mass of TcCel5A was estimated to be about 44 kDa by SDS-PAGE (Figure 3A). TcCel5A showed the highest activity against cellotetraose; its activities against the various substrates are summarized in Table 1. The degradation of ASC by TcCel5A was analyzed by thin-layer chromatography, which showed that TcCel5A produced cellobiose and cellotriose (Figure 3B).
Effect of temperature on the enzyme activity of TcCel5A.
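Molecular masses from SDS-PAGE are normally read from a standard curve in which log10 of the marker masses is approximately linear in relative mobility (Rf). The Python sketch below fits that line and interpolates an unknown band; the marker masses are a typical low-range set and all Rf values are invented, so the output is illustrative only and is not a re-analysis of Figure 3A.

```python
import math


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_kda(rf: float, markers: list[tuple[float, float]]) -> float:
    """Interpolate a band's mass from (Rf, kDa) markers via log10(kDa) = slope * Rf + intercept."""
    rfs = [r for r, _ in markers]
    logs = [math.log10(kda) for _, kda in markers]
    slope, intercept = fit_line(rfs, logs)
    return 10 ** (slope * rf + intercept)


if __name__ == "__main__":
    # Hypothetical marker lane: (Rf, molecular mass in kDa).
    markers = [(0.10, 97.4), (0.25, 66.2), (0.45, 45.0),
               (0.62, 31.0), (0.80, 21.5), (0.95, 14.4)]
    print(f"band at Rf 0.46: about {estimate_kda(0.46, markers):.0f} kDa")
```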

Homology modeling of TcCel5A
Eight endocellulases homologous to TcCel5A were identified and aligned using FASTA [49] and ClustalW [50] (Figure 4). Among all proteins in the Protein Data Bank (PDB), AcCel5A had the highest sequence identity with TcCel5A (35%), so AcCel5A was used as the template for modeling the 3D structure of TcCel5A. All amino acid side chains of TcCel5A were set using the AUTO_ROTAMER program [55], which uses the rotamer library proposed by Ponder and Richards [57]. To eliminate steric clashes and reach a stable conformation, 20,000 iterations of energy minimization and 150 ps of molecular dynamics simulation were performed. The variation of potential energy with time during the 150 ps simulation is plotted in Figure 5(A). The potential energy fell rapidly during the first 20 ps and then decreased only slightly, with very small fluctuations between successive steps. Because the simulation had approached equilibrium by 150 ps, the conformation at 150 ps was chosen as the final 3D structure for further analysis. The final structure was checked with the PROFILE-3D and PROSTAT programs [54]. The PROFILE-3D results are presented in Figure 5(B): all residues scored positive, indicating that their positions in the TcCel5A model are reasonable. PROSTAT was used to check dihedral angles, bond lengths, and bond angles in the structures of TcCel5A and AcCel5A, and the main results are listed in Table 2. Ramachandran analysis showed that 96.7% of the residues of AcCel5A and 95.6% of those of TcCel5A fell within favored regions of the Ramachandran plot. Of the 308 residues examined in TcCel5A, eight bond angles deviated from the reference values, whereas in AcCel5A two bond angles out of 317 residues examined exceeded the reference values. The RMS deviation (RMSD) of Cα atoms between TcCel5A and AcCel5A was 0.61 Å, which is within a reasonable range. Overall, comparison with AcCel5A and the assessments above indicate that the final 3D model of TcCel5A is reliable.

The final structural model of TcCel5A is presented in Figure 6(A). It contains eight α-helices and eight β-strands, and, as seen in Figure 6, its predicted fold is typical of Family 5 endocellulases. The active-site residues include the nucleophile Glu312 at the C-terminus of β-strand 7 and the general acid/base pair asparagine (Asn199)-glutamate (Glu200) at the C-terminus of β-strand 4. During the MD simulations, the most variable region of TcCel5A was the activation loop between β-strands 4 and 5, suggesting that this loop may be the most variable part of Family 5 endocellulases and a candidate region for determining substrate selectivity.
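The Cα RMSD quoted above is the square root of the mean squared distance between paired Cα atoms. The Python sketch below computes it for two coordinate lists that are assumed to be already superimposed and paired residue by residue through the sequence alignment; a full comparison would first perform an optimal superposition (for example with the Kabsch algorithm), which is omitted here, and the coordinates are toy values.

```python
import math

Coord = tuple[float, float, float]


def ca_rmsd(model: list[Coord], template: list[Coord]) -> float:
    """RMSD (angstroms) between paired, pre-superimposed C-alpha coordinates."""
    if len(model) != len(template) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, template))
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    # Toy example: every model atom is displaced by 0.5 A along z.
    template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = [(x, y, z + 0.5) for x, y, z in template]
    print(f"RMSD = {ca_rmsd(model, template):.2f} A")  # RMSD = 0.50 A
```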

Figure 6. (A) The final structure of the TcCel5A-G4 complex. (B) Active site residues of TcCel5A that bind G4.

Identification of the binding site of TcCel5A
TcCel5A folds into a typical (α/β)8 barrel, with the C-terminal ends of β-strands 4 and 7 contributing to the substrate-binding cleft. There were no significant differences in domain orientation between the AcCel5A-G4 complex and the TcCel5A-G4 complex. This conservation of sequence and structure between endocellulases from two different species is not surprising, given that their main biological functions are similar, and we would therefore expect G4 to bind to both structures in a similar manner. Endocellulases that bind G4 have the classic 4/7 superfamily fold [58-60], which includes the activation loop and contains the key catalytic residues for cellulose hydrolysis, i.e., the asparagine/glutamate acid/base pair and the glutamate at the termini of β-strands 4 and 7, respectively. This is also a characteristic feature of the sugar-linkage binding site [29,61,62]. To identify residues involved in the interaction between the active site of TcCel5A and its substrates, we defined the binding pocket as the residues with any atom within 4.5 Å of G4. The binding site was searched using the Binding-Site software [55], which was also used to guide the TcCel5A-G4 docking experiment; the residues forming the binding pocket of TcCel5A are displayed in Figure 6(B) and listed in Table 3. The binding pocket is formed by Asn199, Glu200, Ala245, Phe246, Trp247, Tyr271, Val275, and Tyr276 from the C-terminal loop between β-strands 4 and 5, and by Glu312, Trp343, Asn348, Ser349, Gly350, Asp351, and Thr352 from the C-terminal loop between β-strands 7 and 8. Other endocellulases have similar overall structures and almost superimposable catalytic clefts that include the 4/7 superfamily cellulose-binding domain [29,61,62]. Based on the sequence alignment, Asn161/Glu162/Glu282 in AcCel5A correspond to Asn199/Glu200/Glu312 in TcCel5A. With the exception of Thr149, Ala245, Phe246, and Asn348, the residues forming the binding site in TcCel5A are conserved in AcCel5A.

As described above, cellotetraose (G4) was selected as the docking ligand. G4 is an important endocellulase substrate, and its hydrolysis plays an important role in producing cellobiose. G4 consists of four glucose residues, Glc1-Glc2-Glc3-Glc4. Its 3D structure was generated using the BUILDER program, and its geometry was further optimized with the DISCOVER 3 program. Hydrogen bonds play an important role in the structure and function of biological molecules, particularly in enzymatic catalysis. The hydrogen bonds of the TcCel5A-G4 complex are listed in Table 3; there were 29 hydrogen bonds between G4 and TcCel5A. Glc3 formed hydrogen bonds through its O2 atom at the C2 position with the side-chain amide group of Asn199. The O2 hydroxyl group of Glc3 also formed two hydrogen bonds with the side-chain carboxyl groups of Glu200 and Glu312. The O3, O4, and O6 hydroxyl groups of Glc3 formed four hydrogen bonds with His148, Trp343, and Asp351 of TcCel5A, and the O2 and O6 hydroxyl groups of Glc2 formed three hydrogen bonds with Trp247 and Val275. To determine the key residues of the binding pocket in this structural model, the total interaction energy between G4 and each individual amino acid of TcCel5A was calculated. Together with each residue's distance from the substrate, this energy indicates the relative importance of every residue. Residues with total interaction energies below -1.0 kcal/mol are listed in Table 4.
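The per-residue interaction energies in Table 4 sum, for each TcCel5A residue, the nonbonded energy between its atoms and all G4 atoms. The Python sketch below shows that decomposition with a generic Coulomb plus 12-6 Lennard-Jones pair term and the -1.0 kcal/mol reporting cutoff; the functional form, the single shared ε and σ, the constant dielectric of 4, and the charges and coordinates are all simplifying assumptions, not the CVFF parameters used in Discover.

```python
import math
from collections import defaultdict

COULOMB_K = 332.0636  # kcal*angstrom/(mol*e^2), approximate

Coord = tuple[float, float, float]
ProteinAtom = tuple[str, float, Coord]  # (residue label, partial charge in e, xyz)
LigandAtom = tuple[float, Coord]        # (partial charge in e, xyz)


def pair_energy(q1: float, q2: float, r: float,
                epsilon: float, sigma: float, dielectric: float) -> float:
    """Coulomb + 12-6 Lennard-Jones energy (kcal/mol) for one atom pair at distance r."""
    coulomb = COULOMB_K * q1 * q2 / (dielectric * r)
    sr6 = (sigma / r) ** 6
    return coulomb + 4.0 * epsilon * (sr6 * sr6 - sr6)


def per_residue_energies(protein: list[ProteinAtom], ligand: list[LigandAtom],
                         epsilon: float = 0.1, sigma: float = 3.0,
                         dielectric: float = 4.0) -> dict[str, float]:
    """Sum pair energies between each residue's atoms and every ligand atom."""
    totals: dict[str, float] = defaultdict(float)
    for residue, q_p, xyz_p in protein:
        for q_l, xyz_l in ligand:
            r = math.dist(xyz_p, xyz_l)
            totals[residue] += pair_energy(q_p, q_l, r, epsilon, sigma, dielectric)
    return dict(totals)


def key_residues(totals: dict[str, float], threshold: float = -1.0) -> list[str]:
    """Residues below the threshold, most favorable (most negative) first."""
    return sorted((res for res, e in totals.items() if e < threshold), key=totals.get)


if __name__ == "__main__":
    # Hypothetical atoms: a charged residue near the ligand and a neutral one far away.
    protein = [("R1", -0.5, (0.0, 0.0, 0.0)), ("R2", 0.0, (0.0, 8.0, 0.0))]
    ligand = [(0.4, (4.0, 0.0, 0.0))]
    totals = per_residue_energies(protein, ligand)
    print({k: round(v, 3) for k, v in totals.items()}, key_residues(totals))
```

In this toy case the electrostatic term dominates for the charged residue, mirroring the way strongly charged residues such as Glu200 and Glu312 stand out in a per-residue decomposition.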
Glu312, Glu200, Asn199, Asp351, His148, and Trp343 had strong interaction energies with G4, with Glu312 having the strongest (-35.434 kcal/mol). Another key residue was Glu200, which interacted strongly with G4 via a hydrogen bond, with an interaction energy of -13.466 kcal/mol. Both Glu200 and Glu312 also made strong electrostatic interactions with G4, indicating that they play important roles in the interaction between G4 and TcCel5A. Asn199, Asp351, His148, and Trp343 in TcCel5A also appeared to be important determinants of binding, as they exhibited strong interactions with G4. As Tables 3 and 4 illustrate, this type of structure-based analysis can guide the selection of candidate sites for further experimental studies and site-directed mutagenesis.

a Nearest distance (Å) between cellotetraose and the residue of T. caldophilus GK24 Cel5A (TcCel5A).
b Contact surface area (Å²) between the substrate and the residue of TcCel5A.
c Hydrophilic-hydrophilic contact (hydrogen bond).
d Aromatic-aromatic contact.
e Hydrophobic-hydrophobic contact.
f Hydrophobic-hydrophilic contact (destabilizing contact).
+/- indicates the presence/absence of a specific contact. * indicates residues contacting the ligand through their side chain (including Cα atoms).

Based on previous work [16,23,24,28,29,30], we propose the following catalytic mechanism for TcCel5A, in which Glu312 acts as the nucleophile. As shown in Figure 7, G4 is stably bound in the center of the TcCel5A active site through 29 hydrogen bonds (Table 3). During the glycosylation step, Glu312 attacks C1 of Glc3, displacing the aglycone in an inverting step assisted by proton transfer to the glycosidic oxygen. This produces an intermediate with two hydrogen bonds: one between the carboxyl group of Glu312 and the glycosidic oxygen of the Glc3-Glc2 linkage, and the other between the carboxyl group of Glu200 and the C2 hydroxyl group of Glc3. The system then passes through an unstable transition state, followed by formation of a covalent glycosyl-enzyme (O2(Glc3)-OE1(Glu312)) intermediate. The second step of the reaction is deglycosylation. The carboxylate group of Glu200 abstracts a proton from a water molecule, while the water oxygen attacks C1 of Glc3 with the assistance of Asn199 and Glu200. As the bond between Glc3 and Glu312 breaks, the proton from water is transferred to an oxygen atom of Glu200. The charge of the system is then balanced, and hydrolysis is complete. The final step involves the attack of a water molecule, assisted by Asn199 and Glu200, to release free cellobiose with overall retention of the anomeric configuration. These steps proceed via transition states with considerable oxocarbonium-ion character.

Conclusions
In this work, we identified the putative cellulase gene of T. caldophilus GK24 through whole-genome shotgun sequencing and demonstrated cellulase activity by Congo red staining. We also purified the cellulase produced by T. caldophilus GK24. In substrate-specificity assays, TcCel5A showed the highest activity against cellotetraose (G4). TcCel5A also showed high activity on CMC, and its products from ASC were cellobiose and cellotriose. From these results, we concluded that TcCel5A is an endo-type cellulase. We constructed a 3D structural model of TcCel5A using the Insight II Homology module and refined it by energy minimization and molecular dynamics simulations. The refined structure was then used in a docking experiment with the substrate G4, which yielded a model of the TcCel5A-G4 complex. The results indicated that several conserved amino acid residues in TcCel5A play important roles in maintaining a functional protein conformation and are directly involved in binding cellotetraose. Identifying the specific interactions between TcCel5A and G4, in particular the hydrogen bonds that are central to enzymatic catalysis, is important for understanding how TcCel5A binds G4.

Figure 1.Linear comparison of the genomes of (A) T. thermophilus HB27 and T. caldophilus GK24 and (B) T. thermophilus HB8 and T. caldophilus GK24.The blue lines represent similar protein sequences (BLASTP search, > 60% similarity) between organisms.Gene organization in the 13.7 kbp region, including TcCel5A (blue arrow), of the T. caldophilus GK24 genome.Arrows indicate open reading frames, and their predicted translation products are indicated above the arrows.Black arrows indicate open reading frames of previously characterized genes (deoR, glmD, sugK, and sugT).Blue arrows indicate uncharacterized open reading frames found in the genome of T. caldophilus GK24.

Figure 5. (A) The potential energy of TcCel5A as a function of simulation time (ps).(B) Evaluation of the final structure of TcCel5A by PROFILE-3D.

Table 1. Activity of TcCel5A against various substrates.
a N.D., not detected.

Table 2. Dihedral angles, bond lengths, and bond angles of AcCel5A and TcCel5A, as checked by the PROSTAT program.

Table 3. Hydrogen bonds between TcCel5A and cellotetraose.
a Atom types: I, hydrophilic (N or O that can both donate and accept hydrogen bonds); II, acceptor (N or O that can only accept a hydrogen bond); III, donor (N that can only donate a hydrogen bond).
b Dist, distance (Å) between the cellotetraose and TcCel5A atoms.
c Surf, contact surface area (Å²) between the cellotetraose and TcCel5A atoms.