Secretome Analysis of Thermothelomyces thermophilus LMBC 162 Cultivated with Tamarindus indica Seeds Reveals CAZymes for Degradation of Lignocellulosic Biomass

Secretome analysis allows the identification of proteins, especially carbohydrate-active enzymes (CAZymes), secreted by different microorganisms cultivated under different conditions. CAZymes are divided into five classes containing different protein families. Thermothelomyces thermophilus is a thermophilic ascomycete and a source of many glycoside hydrolases and oxidative enzymes that aid in the breakdown of lignocellulosic materials. The secretome analysis of T. thermophilus LMBC 162 cultivated by submerged fermentation using tamarind seeds as a carbon source revealed 79 proteins distributed among the five diverse classes of CAZymes: 5.55% auxiliary activity enzymes (AAs); 2.58% carbohydrate esterases (CEs); 20.58% polysaccharide lyases (PLs); and 71.29% glycoside hydrolases (GHs). Among the identified GH families, 54.97% are cellulolytic, 16.27% are hemicellulolytic, and 0.05% are classified as other. Furthermore, 48.74% of CAZymes have carbohydrate-binding modules (CBMs). Based on relative abundance, only thirteen proteins comprise 92.19% of the identified secreted proteins, and these are probably the main proteins responsible for the efficient degradation of the bulk of the biomass: cellulose, hemicellulose, and pectin.


Introduction
Plant biomass consists of proteins, lignin, holocellulose (a fraction composed of cellulose fibers wrapped in hemicellulose-pectin), ash, salts, and minerals [1]. The increase in agro-industrial activity has led to the accumulation of many lignocellulosic residues, such as wood and various agro-industrial residues around the world [2][3][4][5]. The economic interest in these residues has increased significantly in recent years since they are renewable and cheap, having the potential to produce and generate chemicals and bioenergy [6][7][8]. The conversion of lignocellulosic biomass into ethanol and other chemical compounds can be performed using a multi-enzyme system acting in synergism [9,10], and it is of fundamental importance to study different microorganisms and understand the secretion of the enzymes of interest that can be applied to these processes [11].
The analysis of the fungal secretome has gained great visibility since, through these studies, it is possible to know the proteins secreted by different microorganisms, especially carbohydrate-active enzymes (CAZymes), grown under different conditions [12][13][14][15]. Based on their protein sequence similarities and three-dimensional folding structure, CAZymes are classified into several hundred different enzyme protein families [16]. These enzymes are involved in many biological processes, and they are responsible for the degradation, synthesis, and modification of carbohydrates [17,18].
Thermophilic fungi are a promising source of new enzymes for cost-effective industrial applications, including abundant thermostable enzymes for biomass degradation and generation of chemicals and biofuels [19][20][21]. Among them, a fungus that is described in the literature as a source of many CAZymes, especially glycoside hydrolases (GHs) and oxidative enzymes that aid in the breakdown of lignocellulosic materials, is the thermophilic ascomycete Thermothelomyces thermophilus (formerly Myceliophthora thermophila) [22][23][24][25]. This filamentous fungus has been shown to be safe for large-scale production processes and can utilize cost-effective sources of plant biomass [26], such as waste from the fruit pulp industry, especially tamarind seeds [27].
Tamarind (Tamarindus indica L.) is a fruit plant native to equatorial Africa, India, and Southeast Asia and grows in tropical and subtropical regions, with an ideal average temperature of 25 °C [28]. The fruit consists of pulp and seeds with a hard coating. Seeds constitute 30-40% of the fruit, with a large proportion being an agricultural by-product [29]. According to Gonçalves et al. [30], the tamarind seed composition is 1.82 ± 0.01% ash, 33.07 ± 1.40% lignin, 33.31 ± 3.56% cellulose, and 10.45 ± 1.45% hemicellulose, which makes the seeds a promising substrate for the detection of CAZymes.
Due to their constitution, tamarind seeds have been utilized for the cultivation of microorganisms to produce microbial enzymes that cleave lignocellulosic biomass or as substrates in assays to test enzymatic activity [31,32]. In addition, these seeds are rich in xyloglucan, which corresponds to about 40% of their dry mass [33]. In this context, this study aimed to elucidate the secretome profile and categorize the CAZymes, by function and family, of the filamentous fungus T. thermophilus LMBC 162 cultivated by submerged fermentation using tamarind seeds, a residue from the fruit pulp industry, as a carbon source [28]. From these data, relative abundance was used to determine the main proteins responsible for the degradation of the bulk of the biomass: cellulose, hemicellulose, and pectin.

Maintenance of the Fungus and Culture Medium
The fungus T. thermophilus LMBC 162 used in this work was isolated in Ribeirão Preto, SP, Brazil. Its identification and deposit in GenBank with the accession code MK559967.1 was described by Contato et al. [31]. The thermophilic microorganism was maintained by inoculating its spores in potato dextrose agar medium (PDA) (Sigma-Aldrich, Saint Louis, MO, USA), with successive transfers in glass tubes containing the same medium and incubation at 40 °C. Afterward, the tubes were kept under refrigeration for up to 30 days.

Submerged Cultivation of T. thermophilus LMBC 162 for Protein Secretion Induction
The submerged cultivation was performed according to Contato et al. [30]. The fungus was grown in test tubes and suspended in sterile distilled water, its spores were counted under a microscope using a Neubauer chamber, and a suspension with 10^6-10^7 spores/mL was prepared. The suspension was inoculated into 125 mL Erlenmeyer flasks with 25     O (0.16%), distilled water q.s. (100 mL) (5.0 mL); yeast extract (0.1 g); carbon source (1.0 g); distilled water q.s. 100 mL) [34]. The medium was supplemented with 1% (w/v) of tamarind (Tamarindus indica, Fabaceae) seeds, which were previously pretreated (boiled in water, dried, and ground to 20 mesh) to secure the sanitary quality of the seeds and avoid the growth of other associated fungi. The Erlenmeyer flasks were incubated at 40 °C under static conditions for 72 h, the best conditions for protein induction described by Contato et al. [31], who showed that these conditions were substantially better for enzyme production than shaken cultivation and than shorter (24 and 48 h) or longer (96 h) cultivation times. The cell biomass was filtered with the aid of a vacuum pump, and the filtrates were used as enzymatic extracts. The cultivation was performed in triplicate.
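As a rough illustration of the spore count, the sketch below converts Neubauer chamber counts into spores/mL. It assumes a standard improved Neubauer chamber, in which each large 1 mm × 1 mm square under a 0.1 mm deep coverslip holds 10^-4 mL. The counts, the dilution factor, and the function name are hypothetical and are not taken from this work.

```python
def spores_per_ml(counts_per_large_square, dilution_factor=1.0):
    """Estimate spores/mL from counts in large (1 mm^2) Neubauer squares.

    Assumes a chamber depth of 0.1 mm, so each large square holds 1e-4 mL.
    """
    if not counts_per_large_square:
        raise ValueError("at least one square count is required")
    mean_count = sum(counts_per_large_square) / len(counts_per_large_square)
    # 1 / 1e-4 mL = 1e4 squares per mL; scale back by the dilution used.
    return mean_count * 1e4 * dilution_factor


# Hypothetical counts from four large squares of a 10-fold diluted suspension.
concentration = spores_per_ml([52, 47, 55, 46], dilution_factor=10)
print(f"{concentration:.2e} spores/mL")
```

With these illustrative counts the estimate falls inside the 10^6-10^7 spores/mL range used for the inoculum.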

Protein Quantification
The proteins obtained in the crude extract after cultivation were quantified using the Bradford method [35]. Reactions were prepared with 160 µL of Bradford's reagent and 40 µL of the enzymatic extracts and incubated for 5 min at room temperature. The absorbance was measured on a spectrophotometer (Shimadzu, Kyoto, Japan) at a wavelength of 595 nm, using bovine albumin as standard. The results were expressed in µg of protein/mL.

Sample Processing
The supernatant of T. thermophilus cultivated in tamarind seeds under submerged cultivation was collected by filtration after 72 h, concentrated by ultra-filtration (10,000 MWCO, PES membrane, Vivaspin, Littleton, CO, USA), rinsed twice with 5 mL of sodium acetate buffer 50 mM pH 5.0, and the proteins were separated using SDS-PAGE electrophoresis [36].

Characterization of the T. thermophilus LMBC 162 Using Liquid Chromatography-Tandem Mass Spectrometry (LC-MS/MS)
For secretome peptide mapping, the concentrated T. thermophilus LMBC 162 secretome was analyzed by reducing SDS-PAGE on 12% separation gels. For secretome LC-MS/MS analysis, 15-20 µg of identified secretome proteins were loaded onto an SDS-PAGE gel. Preparative PAGE gel electrophoresis was used to separate the secretome proteins from the complex carbohydrate and phenolic species accumulated in the supernatant. Proteins were briefly electrophoresed into the PAGE separating gel; electrophoresis was terminated after the bromophenol blue tracking dye had migrated 2-3 cm into the separating gel, the gel was stained with Coomassie blue, and the entire protein banding profile was excised and processed for LC-MS/MS [37]. Isolated gel bands were reduced with 10 mM Tris(2-carboxyethyl)phosphine for 1 h at room temperature, alkylated with 10 mM 2-iodoacetamide for 1 h at room temperature in the dark, and digested overnight at 37 °C with 8 µg/mL trypsin (Promega V5072, Madison, WI, USA) (relation proteins in bands: trypsin of 10 µg:mL trypsin/LysC) using 25 mM ammonium bicarbonate buffer, pH 8.0. Peptides were extracted from the gel segments with three sequential extractions at room temperature using 0.3 mL, 0.2 mL, and 0.2 mL of 0.5% trifluoroacetic acid, respectively. After intermittent vortex mixing, they were dried in a SpeedVac (ThermoFisher Scientific, Waltham, MA, USA). Finally, samples were desalted by solid phase extraction using C18 pipet tips, following the manufacturer's recommendations (Agilent P/N A57003100, Agilent Technologies, Santa Clara, CA, USA).
The desalted peptides were redissolved in 0.1% aqueous formic acid and injected onto a 75-micron × 50 cm capillary HPLC column packed with 2-micron C18 particles (Thermo P/N 164942, ThermoFisher Scientific, Waltham, MA, USA). Peptides were separated using a 60 min gradient of formic acid/acetonitrile with a flow rate of 250 nL/min and ionized in a Nanospray Flex (ThermoFisher Scientific, Waltham, MA, USA) ion source using stainless-steel emitters connected to a quadrupole-Orbitrap mass spectrometer (Fusion model, ThermoFisher Scientific, Waltham, MA, USA). Peptide ions were analyzed using a "high-low" "top-speed" data-dependent MS/MS strategy, wherein peptide precursors were analyzed at high resolution in the Orbitrap sector, selected for MS/MS using the quadrupole sector, and fragmented by HCD in the ion routing multipole, followed by analysis of the fragment ions in the ion trap sector. The MS/MS parameters used in the experiments were: ion spray voltage (1900 V); capillary temperature (300 °C); mass range in full MS mode (375-1575 m/z); resolution setting for full mass MS scan, AGC target value, and maximum injection time (120,000 nominal resolution, 4 × 10^5 ions, and 50 ms, respectively); number of peptides selected to be fragmented in each duty cycle (data-dependent acquisition limited only by cyclic rate, set at 5 s); value of normalized collision energy (32% HCD energy); resolution settings for MS/MS acquisition, AGC target value, and maximum injection time (MS/MS analysis in the ion trap sector using a rapid scan rate, 5 × 10^4 ions, and dynamic injection timing, wherein the system maximizes the injection time available relative to the stated cycle time to maximize sensitivity, respectively); charges of precursor ions excluded (below +2 or above +6 were excluded); and dynamic exclusion time (dynamic exclusion was set to 45 s).
Each sample was analyzed twice by LC-MS/MS, and the two RAW data files were specified as a single sample for database searching using MaxQuant (version v2.0.1.0, Max-Planck-Institute of Biochemistry, Planegg, Germany) [38]. Spectra were searched against a database of 18,464 T. thermophilus protein sequences downloaded from NCBI on 27 May 2022, using Thermothelomyces as a genus search term. Search results were annotated using Python version v3.11 (Python Software Foundation, Wilmington, DE, USA) by transferring annotations from related curated proteins at UniProt (https://www.uniprot.org/, accessed on 14 December 2022) to the NCBI T. thermophilus IDs. Sequences with a false discovery rate (FDR or q-value) greater than 0.00 were removed from the analysis. Finally, we identified conserved CAZy domains using Hidden Markov Model (HMM) profiles available on the dbCAN2 web platform (https://bcb.unl.edu/dbCAN2/index.php, accessed on 14 December 2022). Only domains with e-values < 10^−17 and coverage > 0.35 were considered.
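The domain filter described above can be sketched in a few lines. This is only an illustration: it assumes the e-value cutoff acts as an upper bound (a hit is kept when its e-value is below 10^−17, the usual convention for HMM searches) and that coverage is the covered fraction of the HMM profile. The record layout, names, and rows are hypothetical and are not the dbCAN2 output format.

```python
from dataclasses import dataclass

E_VALUE_MAX = 1e-17   # upper bound on the HMM hit e-value (assumed direction)
COVERAGE_MIN = 0.35   # minimum covered fraction of the HMM profile


@dataclass
class DomainHit:
    protein_id: str
    family: str
    e_value: float
    coverage: float


def keep_hit(hit):
    """Return True when a hit passes both the e-value and coverage cutoffs."""
    return hit.e_value < E_VALUE_MAX and hit.coverage > COVERAGE_MIN


# Hypothetical hits, for illustration only.
hits = [
    DomainHit("protein_A", "GH7", 1e-120, 0.98),  # passes both cutoffs
    DomainHit("protein_B", "AA9", 3e-12, 0.80),   # e-value too weak
    DomainHit("protein_C", "PL1", 1e-20, 0.30),   # coverage too low
]
kept = [h for h in hits if keep_hit(h)]
print([h.protein_id for h in kept])
```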

Analysis of Secretome Protein Composition
To characterize the secretome of T. thermophilus LMBC 162, the supernatants of cultures were collected and analyzed using LC-MS/MS, searching against a database of Thermothelomyces sequences downloaded from the NCBI. The identified proteins were annotated by searching the T. thermophilus sequences against curated protein sequences available in the UniProt/Swiss-Prot database. Our analysis identified 79 proteins in the T. thermophilus LMBC 162 secretome (all non-anchored extracellular proteins). Quantifying these proteins through their relative abundance based on the iBAQ value (sum of all the peptide intensities divided by the number of observable peptides of a protein), we found five diverse classes of CAZymes: 5.55% auxiliary activity enzymes (AAs); 2.58% carbohydrate esterases (CEs); 20.58% polysaccharide lyases (PLs); and 71.29% glycoside hydrolases (GHs), which were 54.97% cellulolytic GHs, 16.27% hemicellulolytic GHs, and 0.05% classified as other GHs. Furthermore, 48.74% of CAZymes have carbohydrate-binding modules (CBMs) (Figure 1). These values are consistent with others shown in the literature for other filamentous fungi [24,39].
Auxiliary activity (AA) enzymes.
The AAs are families of catalytic proteins that are potentially involved in plant cell degradation through an ability to help the original glycoside hydrolase, polysaccharide lyase, and carbohydrate esterase enzymes to gain access to the carbohydrates comprising the plant cell wall [40]. Among the 17 auxiliary activity enzymes, we observed eight (8) lytic polysaccharide monooxygenases (LPMOs), with the majority being from the AA9 CAZy domain (Table 1). By relative abundance (iBAQ value), the AAs correspond to 5.55% of the proteins detected in the secretome analysis. Other AA CAZy domains found are AA3, AA5, AA7, AA8, AA12, and AA13. These results corroborate studies that profiled the secretome of this microorganism cultivated on other carbon sources and also verified the presence of these oxidative enzymes [22,24].
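The quantification can be illustrated with a short sketch of the iBAQ definition given above and of relative abundance as each protein's share of the total iBAQ. The peptide intensities and protein names are hypothetical. The class percentages are those reported in the text, with the share of other GHs read as 0.05%.

```python
def ibaq(peptide_intensities, n_observable_peptides):
    """iBAQ: summed peptide intensities over the number of observable peptides."""
    return sum(peptide_intensities) / n_observable_peptides


def relative_abundance(ibaq_by_protein):
    """Express each protein's iBAQ as a percentage of the total iBAQ."""
    total = sum(ibaq_by_protein.values())
    return {name: 100.0 * value / total for name, value in ibaq_by_protein.items()}


# Hypothetical proteins: (peptide intensities, observable peptides).
ibaq_values = {
    "protein_A": ibaq([600.0, 300.0], 3),
    "protein_B": ibaq([100.0], 1),
    "protein_C": ibaq([200.0, 200.0], 4),
}
print(relative_abundance(ibaq_values))

# Consistency check on the reported class shares (values from the text).
class_share = {"AA": 5.55, "CE": 2.58, "PL": 20.58, "GH": 71.29}
gh_share = {"cellulolytic": 54.97, "hemicellulolytic": 16.27, "other": 0.05}
print(round(sum(class_share.values()), 2))  # the four classes sum to 100%
print(round(sum(gh_share.values()), 2))     # the GH subgroups sum to the GH share
```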

Carbohydrate esterases (CEs).
CEs catalyze de-O- or de-N-acylation by removing ester decorations from carbohydrates. They are important biocatalysts for the bioconversion of plant biomass and the saccharification of plant cell wall polysaccharide fractions that have not gone through an alkaline pretreatment or another process that would destroy the ester linkages [41]. In this study, seven (7) CEs were found, corresponding to 2.58% of the proteins of the T. thermophilus LMBC 162 secretome cultivated using submerged fermentation with tamarind seeds (Table 2). These values are consistent with those shown by Rocha et al. [42], who found a relative abundance of 2% CEs in the Trichoderma harzianum secretome when cultivated on sugarcane bagasse, and with the study by Machado et al. [39], who obtained 3.4% of esterases in the Trametes versicolor secretome cultivated on microcrystalline cellulose. Among the CEs identified in this work, two stood out, together accounting for 2.29% relative abundance, that is, 88.75% of the CEs. They are an acetylesterase CE16, an enzyme that catalyzes the conversion of acetate esters and water into alcohols and acetate [43], and a pectinesterase CE8 (accession number G2QLD0) that catalyzes the de-esterification of pectin into pectate and methanol [44]. They correspond to 1.19% and 1.10% of relative abundance, respectively. Other CE CAZy domains found are CE1, CE3, CE5, and CE12.
a Hypothetical molecular weight of the proteins; b BLAST E-value is the number of expected hits of similar quality (score) that could be found just by chance; c the iBAQ corresponds to the sum of all the peptide intensities divided by the number of observable peptides of a protein.

Polysaccharide lyases (PLs).
PLs are a group of enzymes that cleave uronic acid-containing polysaccharide chains via a β-elimination mechanism to generate an unsaturated hexenuronic acid residue and a new reducing end [45]. In this work, six (6) PLs were identified, which correspond to 20.58% of the secretome. One protein from the PL1 family stands out, with accession number G2QH79 in the UniProt/Swiss-Prot database and a hypothetical molecular weight of 34 kDa (Table 3). This PL alone corresponds to 19.95% of relative abundance, which is 96.94% of all identified PLs. These values are higher than the 9% seen by Verma et al. [46] in the secretome of the fungal phytopathogen Ascochyta rabiei and the 7% seen by Rubio et al. [47] for Aspergillus nidulans. However, they are consistent with the study by dos Santos et al. [24], who used another strain of Myceliophthora thermophila cultivated on lignocellulosic residues. Other PL CAZy domains found are PL3 and PL4. The classes of CEs and PLs are mainly responsible for the degradation of pectin, one of the main components of the cell wall of plants [48].
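The share of a single protein within its class follows directly from the two relative abundances. The short check below uses the values reported in the text; the function name is an illustrative choice.

```python
def share_within_class(protein_percent, class_percent):
    """Percentage of a class's relative abundance contributed by one protein."""
    return 100.0 * protein_percent / class_percent


# PL1 protein (19.95% of the secretome) within all PLs (20.58% of the secretome).
pl1_share = share_within_class(19.95, 20.58)
print(f"{pl1_share:.2f}% of all identified PLs")
```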

Cellulolytic glycoside hydrolases (GHs).
The GHs are enzymes that catalyze the hydrolysis of the glycosidic linkage of glycosides, leading to the formation of a sugar hemiacetal or hemiketal and the corresponding free aglycon [49]. The GHs that cleave sugars from cellulose are named cellulolytic glycoside hydrolases, and those that cleave sugars from hemicellulose are named hemicellulolytic glycoside hydrolases. Of the CAZymes found in the secretome of T. thermophilus LMBC 162, forty-nine (49) are GHs, corresponding to 71.29% in relative abundance. Among them, the majority are those that break down cellulose. They account for 54.97% of the CAZymes produced in the secretome (Table 4). One cellobiohydrolase GH7 and a glucoside hydrolase from the GH7 CAZy domain are the main proteins found. They have the accession numbers G2Q665 and G2QNN8 in the UniProt/Swiss-Prot database and hypothetical molecular weights of 56 kDa and 49 kDa, respectively. Other GH CAZy domains found are GH6, GH15, GH31, and GH45. The value of cellulolytic GHs found is similar to those seen by Machado et al. [39], who found 48.1% for Phanerochaete chrysosporium and 48.0% for T. versicolor.

Hemicellulolytic glycoside hydrolases (GHs).
The hemicellulolytic GHs found (Table 5) correspond to 16.27% in relative abundance, once again equivalent to values reported by others [24,39]. The hemicellulolytic GHs belong to the CAZy domains GH2, GH3, GH10, GH11, GH16, GH26, GH27, GH43, GH47, GH55, GH62, GH74, GH76, GH79, GH92, GH93, GH125, GH131, and GH135. The hemicellulolytic GHs with the highest relative abundance are a xylanase GH10 with a hypothetical molecular weight of 45 kDa and accession number G2QJ91 in the UniProt/Swiss-Prot database, with 2.11% of relative abundance; a xylanase GH11 with a hypothetical molecular weight of 24 kDa and accession number G2Q4M3, with a relative abundance of 2.59%; a xyloglucanase GH74 with a hypothetical molecular weight of 79 kDa and accession number G2QHR7, with 5.39% of relative abundance; and an exo-α-L-1,5-arabinanase GH93 with a hypothetical molecular weight of 42 kDa and accession number G2Q5Q6, with a relative abundance of 2.30%. The GH74 found in this work showed a hypothetical molecular weight corresponding to the xyloglucanase found by Berezina et al. [50], who expressed a GH74 xyloglucanase from M. thermophila in Pichia pastoris. Another relevant factor is that this secretome was obtained from cultivation on tamarind seeds, which are rich in xyloglucan [32], which likely explains why the GH74 was the hemicellulolytic GH produced with the highest relative abundance. Regarding the GH93 family, it is known to hydrolyze linear α-1,5-L-arabinan [51].
In the analysis of T. thermophilus LMBC 162 secreted proteins belonging to the hemicellulolytic GH family, two uncharacterized proteins were determined. The GH131 family are β-glucanases that exhibit activity for a wide range of β-glucan polysaccharides, including laminarin, curdlan, lichenan, and cellulosic derivatives [52], while the GH135 family comprises fungal glycoside hydrolases with the ability to degrade the fungal heteropolysaccharide galactosaminogalactan [53].

Carbohydrate binding modules (CBMs).
CBMs are protein domains found in CAZymes, whose main role is to recognize and bind specifically to carbohydrates. This binding results in different functions, such as increased hydrolysis of insoluble substrates, bringing the catalytic domain closer to the substrate, polysaccharide structure disruption, and cell surface protein anchoring [61]. In the secretome of T. thermophilus LMBC 162, sixteen (16) CAZymes have CBMs (Table 7). In sum, CBMs are present in 48.74% of the proteins found in the T. thermophilus secretome profile, a value similar to those shown in the literature for other microorganisms [39].
Observing the iBAQ/total iBAQ values as a percentage, it is possible to state that only thirteen (13) proteins comprise 92.19% of the identified proteins secreted, and they are probably the main proteins responsible for the degradation of the bulk of the biomass: cellulose, hemicellulose, and pectin (Table 8). The most abundant protein in the secretome of T. thermophilus LMBC 162 (31.43%) was a cellobiohydrolase, as in the secretome of Trichoderma reesei RUT C30, where a cellobiohydrolase is also the most abundant protein [12]. However, the presence of other enzymes, such as β-xylanase, lytic polysaccharide monooxygenase, and pectinesterase, was also reported. Nevertheless, one of the limitations of shotgun proteomics is incomplete sequence coverage when using only one protease. Therefore, there is a possibility that other proteins, such as small proteins that yield few theoretical peptides on digestion, were not detected [62].
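The selection of the few proteins that dominate the secretome can be sketched as a ranking by relative abundance followed by a cumulative sum. The abundances, the threshold, and the function name below are hypothetical and only illustrate the procedure; they are not the values from Table 8.

```python
def most_abundant_until(abundance_percent, threshold):
    """Rank proteins by relative abundance and keep adding them until the
    cumulative share first reaches the threshold (both in percent)."""
    ranked = sorted(abundance_percent.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for protein, share in ranked:
        selected.append(protein)
        cumulative += share
        if cumulative >= threshold:
            break
    return selected, cumulative


# Hypothetical relative abundances (%), for illustration only.
abundance = {"p1": 40.0, "p2": 25.0, "p3": 15.0, "p4": 12.0, "p5": 8.0}
selected, covered = most_abundant_until(abundance, threshold=90.0)
print(selected, covered)
```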

Conclusions
The secretome analysis of T. thermophilus LMBC 162 cultivated by submerged fermentation with tamarind seeds, an abundant residue from the fruit pulp industry, reveals seventy-nine (79) CAZymes diversified into the five classes of the CAZy database: 5.55% AAs; 1.48% CBMs; 2.58% CEs; 20.58% PLs; and 70.55% GHs, which are 54.97% cellulolytic GHs, 15.51% hemicellulolytic GHs, and 0.05% classified as other GHs. Among them, sixteen (16) CAZymes have CBMs, protein domains found in CAZymes whose main role is to recognize and bind specifically to carbohydrates. In sum, CBMs are present in 48.74% of the proteins found in the T. thermophilus secretome profile, a value similar to those shown in the literature for other microorganisms. Based on relative abundance, only thirteen (13) proteins comprise 92.19% of the identified proteins secreted, and they are probably the main proteins responsible for the degradation of the bulk of the biomass: cellulose, hemicellulose, and pectin. The findings of this work indicate that tamarind seeds are a residue option for the identification and production of lignocellulosic CAZymes.

Institutional Review Board Statement:
The research meets all ethical guidelines, including adherence to the legal requirements of the study countries.

Table 1. Cont.
a Hypothetical molecular weight of the proteins; b BLAST E-value is the number of expected hits of similar quality (score) that could be found just by chance; c the iBAQ corresponds to the sum of all the peptide intensities divided by the number of observable peptides of a protein.

Table 6. LC-MS/MS secretome analysis for glycoside hydrolases (GHs) that break down other components.
a Hypothetical molecular weight of the proteins; b BLAST E-value is the number of expected hits of similar quality (score) that could be found just by chance; c the iBAQ corresponds to the sum of all the peptide intensities divided by the number of observable peptides of a protein.

Table 8. Proteins comprising 92.19% of the identified secreted proteins, classified according to their relative abundance (iBAQ/total iBAQ) and the part of the biomass they degrade: cellulose, hemicellulose, or pectin.
a Hypothetical molecular weight of the proteins; b the iBAQ corresponds to the sum of all the peptide intensities divided by the number of observable peptides of a protein.