Genomic Analysis to Elucidate the Lignocellulose Degrading Capability of a New Halophile Robertkochia solimangrovi

Robertkochia solimangrovi is a proposed marine bacterium isolated from mangrove soil. So far, the study of this bacterium has been limited to taxonomy. In this report, we performed a genomic analysis of R. solimangrovi that revealed its lignocellulose degrading ability. Genome mining of R. solimangrovi revealed a total of 87 lignocellulose degrading enzymes. These enzymes include cellulases (GH3, GH5, GH9 and GH30), xylanases (GH5, GH10, GH43, GH51, GH67, and GH115), mannanases (GH2, GH26, GH27 and GH113) and xyloglucanases (GH2, GH5, GH16, GH29, GH31 and GH95). Most of the lignocellulolytic enzymes encoded in R. solimangrovi were absent from the genome of Robertkochia marina, the closest member of the same genus. Furthermore, the current work also demonstrated the ability of R. solimangrovi to produce lignocellulolytic enzymes to deconstruct oil palm empty fruit bunch (EFB), a lignocellulosic waste found abundantly in the palm oil industry. The metabolic pathway taken by R. solimangrovi to transport and process the reducing sugars released by the action of lignocellulolytic enzymes on EFB was also inferred from genomic data. Collectively, genomic analysis coupled with experimental studies indicated that R. solimangrovi is a promising candidate for the seawater-based biorefinery industry.


Introduction
Halophiles are extremophiles that require salt for growth [1]. They are equipped with adaptive mechanisms to survive in harsh osmotic conditions [2][3][4][5][6]. Halophilic microorganisms can be found in coastal and open ocean environments such as marine waters, saline lakes, and mangrove forests [7][8][9][10]. Mangrove forests contain plants that grow at the interface between land and sea [11] and are one of the world's most extensive reservoirs of naturally sequestered carbon, accounting for 30% of blue carbon stored [12,13]. Halophiles within mangrove forests play an important role in the degradation of woody plant material present in mangrove sediments and surfaces [14][15][16][17].
Aside from their ecological role, lignocellulolytic enzymes produced by bacteria, including cellulases, hemicellulases, ligninases and pectinases, are also important for the pre-treatment and saccharification of lignocellulosic biomass in biorefining applications [18]. These enzymes are classified into glycosyl hydrolases (GHs), carbohydrate esterases (CEs), auxiliary activities (AAs) and polysaccharide lyases (PLs) [19]. They work together to degrade complex lignocellulosic plant materials into pentose and hexose sugars that can be utilized as feedstocks for the production of biofuels and chemicals [20][21][22][23]. In Malaysia and Indonesia, oil palm empty fruit bunch (EFB) is a major lignocellulosic waste from the palm oil industry. Extensive efforts are being invested in using such lignocellulosic materials for bioenergy production [24,25].
R. solimangrovi is a halophilic bacterium recently isolated from mangrove soil [26]. So far, within the Robertkochia genus, R. solimangrovi is only the second species reported after R.

Comparative Genomic Analysis
The genome of R. marina, the other species belonging to the Robertkochia genus, was sequenced, assembled, and annotated in the same way as that of R. solimangrovi. The genome can be accessed at DDBJ/EMBL/GenBank and the DOE-JGI Genome Online Database (GOLD) under accession numbers QXMP00000000 and Ga0314139, respectively. The comparative genomic analysis of the two species was performed using Clusters of Orthologous Groups (COGs), genome alignment, homologous genes, and carbohydrate active enzymes (CAZymes). The protein coding genes encoded in both genomes were categorized according to COGs using RPS-BLAST [36] through the WebMGA server [37]. The genome organization of the two Robertkochia spp. was compared by aligning both genomes with Mauve v. 2.4.0 using default parameters [38]. The homologous gene clusters of the two genomes were compared using OrthoVenn2 with the following settings: E-value of 0.05, inflation value of 1.5 and Markov Cluster as the algorithm [39]. The p values for GO terms in overlapping clusters were calculated using the hypergeometric distribution in OrthoVenn2.
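As an illustration of the hypergeometric test mentioned above, the following is a minimal sketch of an upper-tail enrichment probability. OrthoVenn2 performs this calculation internally; the function name, its parameters and the exact tail convention used here are assumptions made for illustration, not a description of the OrthoVenn2 implementation.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric probability P(X >= k).

    N: number of annotated genes in the background (hypothetical input)
    K: number of background genes carrying the GO term
    n: number of genes in the cluster set being tested
    k: number of genes in that set carrying the GO term
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., upper annotated genes.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total
```

A small p value from such a calculation suggests that the GO term occurs in the tested clusters more often than expected by chance.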

CAZyme Screening and Analysis of Lignocellulolytic Genes
Putative CAZyme genes were annotated with the integrated dbCAN2 meta server using default settings, with dbCAN (using HMMER), CAZy (using DIAMOND) and PPR (using Hotpep) as detection methodologies [40,41]. The results obtained were downloaded and organized in R. The CAZymes were retained if they were recognized by at least two of the three methods. The lignocellulolytic genes were further cross-checked against the annotations accessible in the CAZy database [19]. The lignocellulolytic genes were searched against the NCBI non-redundant protein database using BLASTp to assess their similarity to the proteins available in the database. The secondary structure of the proteins encoded by the lignocellulolytic genes was predicted using GOR4 [42]. Additional features of the lignocellulolytic genes were examined using InterPro 77.0 [43].
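The retention rule described above (keep a gene only if at least two detection methods recognize it) can be sketched as follows. The authors organized their results in R; this Python version is only an illustration, and the input layout, gene identifiers and family calls are hypothetical.

```python
def retain_consensus(predictions, min_tools=2):
    """Keep genes assigned a CAZyme family by at least `min_tools` tools.

    predictions: dict mapping gene ID -> dict of tool name -> family (or None)
    Returns a dict mapping each retained gene ID to its list of family calls.
    """
    kept = {}
    for gene, calls in predictions.items():
        hits = [family for family in calls.values() if family]
        if len(hits) >= min_tools:
            kept[gene] = hits
    return kept


# Hypothetical gene IDs and tool calls, for illustration only.
example = {
    "gene_0001": {"HMMER": "GH5", "DIAMOND": "GH5", "Hotpep": None},
    "gene_0002": {"HMMER": "GH43", "DIAMOND": None, "Hotpep": None},
    "gene_0003": {"HMMER": "PL1", "DIAMOND": "PL1", "Hotpep": "PL1"},
}
print(sorted(retain_consensus(example)))
```

In this toy input, gene_0002 is dropped because only one tool recognizes it.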

Inoculum Preparation and Lignocellulolytic Enzyme Production
Oil palm empty fruit bunch (EFB) was collected from a palm oil mill located in Johor, Malaysia and was used as the lignocellulosic substrate. The EFB was washed, dried and ground into fibre form (2 mm) before use. A loopful of R. solimangrovi colonies was inoculated into an Erlenmeyer flask containing lignocellulolytic enzyme production medium (pH 7) composed of MgCl2 (5.0 g/L), MgSO4·7H2O (2.0 g/L), CaCl2 (0.5 g/L), KCl (1.0 g/L), NaCl (20.0 g/L), yeast extract (1.0 g/L), peptone (5.0 g/L) and EFB (10.0 g/L) [41], and incubated at 30 °C and 150 rpm in an orbital shaker. When the OD600 exceeded 1.0, a 5% (v/v) bacterial inoculum was transferred into fresh lignocellulolytic enzyme production medium with the same components and incubated under the same conditions for 24 h to 96 h. Microbial growth, indicated by optical density at 600 nm (OD600), was measured every 24 h until 96 h using a spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). A negative control (without bacterial cells) was also prepared. The microbial growth at each time interval was calculated as the OD600 at that time interval minus the OD600 of the control set.

EFB Weight Loss Assessment and Lignocellulolytic Enzyme Activity Assays
The flasks with EFB and cells at each time interval (24 h, 48 h, 72 h, and 96 h) were centrifuged at 4 °C and 4500 rpm for 20 min. The remaining EFB was washed with 1× PBS buffer supplemented with 0.5% (v/v) Tween 20. The EFB was dried at 60 °C and the weight of EFB was measured on an electronic balance until a constant weight was obtained. The weight loss as compared to control (EFB without inoculation) was calculated. The morphological changes of EFB were recorded. The structural changes of EFB were recorded, before and after the incubation, by using a Phenom Pro G5 scanning electron microscope (SEM) (Phenom-World BV) under 600× magnification and 5 kV accelerating voltage.
The supernatant obtained after centrifugation was used as the crude enzyme for assays. The activities of nine lignocellulolytic enzymes were tested every 24 h until 96 h in the absence and presence of salt. Endoglucanase, exoglucanase, β-xylanase and β-mannanase activities were measured through the release of reducing sugars, based on the 3,5-dinitrosalicylic acid (DNS) method [44], with 1% (w/v) carboxymethyl cellulose (CMC) (Merck), Avicel® (Merck), xylan from beechwood (Apollo Scientific) and locust bean gum from Ceratonia siliqua seeds (Sigma) as substrates, respectively. p-Nitrophenol (pNP)-based substrates (5 mM) were used to determine β-glucosidase, β-xylosidase, α-L-arabinofuranosidase, α-galactosidase and β-mannosidase activities: pNP-β-D-glucopyranoside (pNPG) (Merck), pNP-β-D-xylopyranoside (pNPX) (Merck), pNP-α-L-arabinofuranoside (pNP-Ara) (Megazyme), pNP-α-D-galactopyranoside (pNPGa) (Apollo Scientific) and pNP-β-D-mannopyranoside (pNP-βM) (Apollo Scientific), respectively. The reaction mixtures consisted of equal volumes of crude enzyme and substrate in 50 mM sodium phosphate buffer (pH 7) and were incubated at 30 °C for 30 min. For the experiment in the presence of salt, 2% (w/v) NaCl was added to the buffer. The optical density after incubation was measured at 540 nm (for reducing sugar assays using DNS) and 430 nm (for detecting the release of pNP) using a spectrophotometer (Thermo Fisher Scientific). One unit of enzyme activity (U/mL) was defined as the amount of enzyme that liberates 1 µmol of the respective product under assay conditions. Enzyme relative activity (%) was calculated relative to the maximum activity, which was taken as 100%. The experiment was repeated twice, and each set was conducted in biological triplicates with a prepared negative control (without enzymes).
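The relative activity calculation described above can be sketched as below. The function name and the U/mL readings are hypothetical and serve only to show the normalization step, in which the highest activity in a series is set to 100%.

```python
def relative_activity(activities):
    """Express each activity as a percentage of the maximum in the series.

    activities: dict mapping a label (e.g. a time point) -> activity in U/mL
    """
    peak = max(activities.values())
    if peak <= 0:
        raise ValueError("maximum activity must be positive")
    return {label: 100.0 * value / peak for label, value in activities.items()}


# Hypothetical U/mL readings at each time point (not data from this study).
readings = {"24 h": 0.2, "48 h": 0.5, "72 h": 0.4, "96 h": 0.1}
relative = relative_activity(readings)
```

With these illustrative readings, the 48 h time point is the reference and the other time points are scaled against it.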

Statistical Analysis
The biomass weight loss and enzyme activity data were expressed as the mean ± SD. Student's t-test was performed on these data using SPSS v. 26 (SPSS Institute, Chicago, IL, USA) to examine significant differences. A value of p < 0.05 was used as the criterion for statistical significance.
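For readers who want to follow the arithmetic behind the summary statistics, the sketch below computes a mean ± SD and a two-sample Student's t statistic. The analysis in this study was run in SPSS; the text does not state whether the test was paired or unpaired, so the independent-samples, equal-variance form used here is an assumption, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation) for one group."""
    return mean(values), stdev(values)


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance.

    Assumes independent groups with equal variances.
    Returns (t, degrees of freedom).
    """
    na, nb = len(a), len(b)
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * sd_a**2 + (nb - 1) * sd_b**2) / df
    t = (mean_a - mean_b) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```

The p value is then read from the t distribution with the returned degrees of freedom and compared against the 0.05 threshold.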

Genome Features of R. solimangrovi
The general genome features of R. solimangrovi are listed in Table 1. The genome of R. solimangrovi was 4.4 Mbp with a G+C content of 40.72%. The genome of R. solimangrovi is considerably larger than that of its counterpart in the genus, R. marina (3.6 Mbp), while its G+C content is slightly lower than that of R. marina (43.7%). Both species of Robertkochia have a higher G+C percentage (40.7-43.7%) than members of closely related genera such as Joostella marina (33.6%), Galbibacter marinus (37.0%) and Zhouia amylolytica (36.7%) [45][46][47]. Genome annotation of R. solimangrovi assigned 3669 protein coding genes and 51 RNA genes from a total of 3720 genes (Table 1). Notable features of the protein coding genes include the number of hypothetical proteins and putative horizontally transferred genes. A total of 1081 hypothetical proteins were encoded in the genome (29.5%), indicating that nearly one-third of the genes in R. solimangrovi could serve as candidates for new functional exploration. In addition, the 149 genes annotated as horizontally transferred genes are potentially important for the adaptation of R. solimangrovi to its habitat. For example, a β-lactamase that appears to have originated from the archaeon Methanosarcina sp. MTP4 and been transferred to R. solimangrovi could act as a penicillin-binding protein to counter the β-lactam antibiotics produced by other bacteria.

Genome Comparison: R. solimangrovi vs. R. marina
The genomes of the two Robertkochia spp. were compared in terms of COGs, genome alignment, homologous genes, and CAZymes. A total of 77.4% and 76.9% of the protein coding genes of R. solimangrovi and R. marina, respectively, were functionally classified into 21 COG categories (Supplementary Materials Table S1). The composition of genes in COG functional categories appeared to be similar for R. solimangrovi and R. marina. The genes assigned to general function prediction were the most abundant (13.4-13.6%), followed by genes classified under amino acid transport and metabolism (8.3-9.6%) and cell wall/membrane/envelope biogenesis (8.2-8.9%).
The alignment of the R. solimangrovi and R. marina genomes is shown in Figure 1A, with distinctive profiles for the two species. The clustering analysis of the two Robertkochia species via OrthoVenn2 indicated that they shared 2127 clusters, with another 107 paralogous clusters belonging solely to R. solimangrovi (Figure 1B). These paralogous gene clusters were enriched with the GO terms sequence-specific DNA binding (GO:0043565), L-arabinose metabolic process (GO:0046373) and starch catabolic process (GO:0005983).

A higher abundance of CAZymes was observed in the genome of R. solimangrovi as compared to R. marina, with a two-fold higher number of GHs encoded in the genome of R. solimangrovi and no PL detected for R. marina. Further comparison of CAZymes among these two species elucidated a total of 48 unique differences (Figure 2). Out of the differences examined, R. solimangrovi possesses 30 families of CAZymes that are not present in R. marina. Most of the GHs and PLs identified in the genome of R. solimangrovi were not present in R. marina. For instance, R. solimangrovi possesses a series of GH28, GH53, GH88, GH105, GH106, GH127, GH145, GH146, PL1, PL10 and PL22 (Figure 2) that are responsible for pectin degradation [49,50]. Interestingly, a protein that contained a PL1 with a CE8 domain was identified in R. solimangrovi. The combination of PL1 and CE8 in a single protein suggested that the CE8 domain could possibly de-esterify the pectate found in the mangrove environment, which then serves as the substrate for the pectin lyase PL1 domain [51].
The lignocellulolytic genes annotated in R. solimangrovi (Table 2) make up a total of 87 genes that encode enzymes likely to be involved in lignocellulose degradation. The BLASTp search on these lignocellulose degrading genes showed that they shared 42.3-82.5% similarity with proteins from other genera of halophilic origin such as Joostella, Flaviramulus, Fabibacter and Zhouia (Supplementary Materials Table S2).
Among the cellulases of R. solimangrovi (Table 2), the GH5 sub-family 46 and GH9 enzymes are responsible for endoglucanase or exoglucanase activity, while the GH3 and GH30 families usually encode β-glucosidase activity that releases glucose from cellobiose [20]. In other studies, a GH5 sub-family 46 enzyme produced by uncultured microorganisms found in the rumen of cows was described as active towards carboxymethyl cellulose [52,53].
Several annotated lignocellulose degrading enzymes of R. solimangrovi feature additional domains that are suspected to improve enzymatic function. For instance, the fibronectin type 3 domains found at the C-terminus of GH3 and GH43 sub-family 8 of R. solimangrovi (Supplementary Materials Figure S1) could potentially assist in cellulose hydrolysis. In addition, efficient cellulose and hemicellulose degradation also requires the action of non-catalytic CBMs to enhance the association of GHs with the substrate [55]. Several GHs encoded in the genome of strain CL23 contain CBMs, for instance, GH27 with CBM51, GH29 with CBM32 and GH43 sub-family 37 with CBM61. In addition, a GH26 of R. solimangrovi is encoded together with a GT2 in the same gene (Supplementary Materials Figure S1). GT2 has been shown to be involved in the synthesis of polysaccharides such as cellulose and mannan exopolysaccharides [56]. However, the co-occurrence of GH26 and GT2 within the same gene has not been previously reported.

Capability of R. solimangrovi to Deconstruct Oil Palm Empty Fruit Bunch
The presence of lignocellulose degrading genes in R. solimangrovi suggests that this bacterium can deconstruct lignocellulosic materials. Therefore, R. solimangrovi was cultured in a medium with empty fruit bunch (EFB). Structural and weight changes of EFB were monitored. A total of 30.4% EFB biomass weight was lost after 96 h of incubation (Supplementary Materials Figure S2A), a significant amount as compared to the control (only 0.3% biomass weight loss). The EFB after degradation by R. solimangrovi was examined under SEM (Supplementary Materials Figure S2B). The structure of EFB was altered to broken and rough surfaces after 96 h of incubation with R. solimangrovi (Supplementary Materials Figure S2B(ii)) as compared to control which was smooth and intact (Supplementary Materials Figure S2B(i)).
The highest enzyme activity was observed for α-arabinofuranosidase after 24 h of incubation. All tested lignocellulolytic enzymes demonstrated an increase in activity after 48 h of incubation (except for α-arabinofuranosidase). Reduced enzyme activities were seen for all of the tested lignocellulolytic enzymes after 72 h of incubation (Figure 3). Under assay conditions with or without NaCl (2% (w/v)), similar enzyme activities were observed for β-glucosidase, α-arabinofuranosidase, α-galactosidase and β-mannosidase across the 96 h of incubation. The β-xylanase and β-xylosidase showed higher enzyme activities when NaCl was absent from the enzyme assay buffer (at 48 h, 72 h and 96 h of incubation). The proteins encoded by the genes for β-xylanase (GH5 sub-family 13 and GH10) and β-xylosidase (GH43 and GH43 sub-family 31) contain a high proportion of hydrophobic amino acids. This property does not allow the enzymes to counterbalance the hydrophobic interactions strengthened by the presence of salts in the surrounding environment [57]. Therefore, β-xylanase and β-xylosidase showed lower tolerance to saline conditions.
In contrast, higher enzyme activities were seen for endoglucanase and exoglucanase (after 48 h of incubation) when enzyme assays were conducted in the presence of NaCl (2% (w/v)). A higher acidic amino acid composition (glutamic acid and aspartic acid) was found in the sequences of GH5 sub-family 46 and GH9, which likely contributed to the endoglucanase and exoglucanase activities, respectively, in saline conditions. In GH5, there is a total of 76 acidic amino acids as compared to 62 basic amino acids. A similar result was also observed in GH9, which consists of 101 acidic amino acids and 81 basic amino acids. A higher proportion of acidic amino acids enables the enzyme to maintain a stable solvation shell on its structural surfaces and so prevent aggregation in the presence of salt [57]. Furthermore, a significant proportion of random coil was also predicted in GH5 sub-family 46 (45.9%) and GH9 (50.3%) according to GOR4 secondary structure prediction. This may give the enzyme a high degree of flexibility that prevents structural collapse under high-salt conditions [57].
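The acidic versus basic residue comparison reported above (76 versus 62 in GH5, an excess of 14; 101 versus 81 in GH9, an excess of 20) can be reproduced for any protein sequence with a short count. The sketch below is an illustration only: the text does not state which residues were counted as basic, so including histidine alongside lysine and arginine is an assumption, and the example peptide is made up.

```python
ACIDIC = set("DE")   # aspartic acid, glutamic acid
BASIC = set("KRH")   # lysine, arginine, histidine (counting H is an assumption)


def charge_composition(sequence):
    """Count acidic and basic residues in a one-letter protein sequence."""
    seq = sequence.upper()
    acidic = sum(residue in ACIDIC for residue in seq)
    basic = sum(residue in BASIC for residue in seq)
    return {"acidic": acidic, "basic": basic, "excess_acidic": acidic - basic}


# Made-up peptide used only to show the call; not a sequence from this study.
print(charge_composition("MDDEKRAGLE"))
```

A positive excess_acidic value corresponds to the acidic-residue surplus that the text associates with tolerance to salt.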

Potential Sugar Uptake and Metabolic Pathway Taken by R. solimangrovi
Upon the degradation of EFB by the lignocellulolytic enzymes of R. solimangrovi, sugars such as glucose, mannose, galactose, xylose and arabinofuranose are produced. These could be acquired by cells as carbon sources for growth (Figure 4). Several genes that encode transporters are found in the genome of R. solimangrovi. These transporters are glucose/Na+ co-transporters, ABC transporters, MFS transporters and EamA/RhaT family transporters (Supplementary Materials Table S4). Based on KEGG analysis, the fate of the sugar monomers after transport into the cells differs (Figure 4). The D-glucose depolymerized from the cellulose component of the EFB is in the β-form [58]. In order to be utilized by cells, the β-glucose is first phosphorylated to β-glucose-6-phosphate by polyphosphate glucokinase and then converted to its α-form by glucose-6-phosphate isomerase (Supplementary Materials Table S4) [59]. The α-glucose-6-phosphate could then enter the glycolytic pathway for energy generation. Similarly, galactose undergoes a series of conversions into α-glucose-6-phosphate by galactose-1-epimerase, galactokinase, galactose-1-phosphate uridylyltransferase and phospho-sugar mutase (Supplementary Materials Table S4) and subsequently enters the glycolytic pathway [59].

Unlike glucose and galactose, the mannose is phosphorylated and isomerized by carbohydrate kinase and mannose-6-phosphate isomerase into β-fructose-6-phosphate (Supplementary Materials Table S4). Additional steps of conversion for pentose sugars (xylose and arabinofuranose) are required before they can enter the central metabolism. Initially, xylose is isomerized and phosphorylated into xylulose-5-phosphate by xylose isomerase and xylulokinase, respectively (Supplementary Materials Table S4). The xylulose-5-phosphate is then transformed by transketolase into β-fructose-6-phosphate in the pentose phosphate pathway and enters the glycolysis pathway.
Likewise, arabinofuranose is isomerized, phosphorylated, and epimerized into xylulose-5-phosphate by arabinose isomerase, ribulokinase and ribulose-5-phosphate-4-epimerase, respectively (Supplementary Materials Table S4), and eventually enters glycolysis.
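The sugar routes described above can be summarized in a small lookup structure, which makes it easier to see which sugars converge on the same intermediate. This is only a reading aid built from the enzyme names given in the text; the dictionary layout and the helper function are hypothetical and not part of the KEGG analysis.

```python
# Each sugar maps to (enzymes named in the text, first phosphorylated
# intermediate named in the text). Transketolase later converts
# xylulose-5-phosphate into beta-fructose-6-phosphate.
ROUTES = {
    "glucose": (
        ["polyphosphate glucokinase", "glucose-6-phosphate isomerase"],
        "alpha-glucose-6-phosphate",
    ),
    "galactose": (
        ["galactose-1-epimerase", "galactokinase",
         "galactose-1-phosphate uridylyltransferase", "phospho-sugar mutase"],
        "alpha-glucose-6-phosphate",
    ),
    "mannose": (
        ["carbohydrate kinase", "mannose-6-phosphate isomerase"],
        "beta-fructose-6-phosphate",
    ),
    "xylose": (
        ["xylose isomerase", "xylulokinase"],
        "xylulose-5-phosphate",
    ),
    "arabinofuranose": (
        ["arabinose isomerase", "ribulokinase",
         "ribulose-5-phosphate-4-epimerase"],
        "xylulose-5-phosphate",
    ),
}


def sugars_by_entry_point(routes):
    """Group sugars by the intermediate through which they reach central metabolism."""
    grouped = {}
    for sugar, (_enzymes, intermediate) in routes.items():
        grouped.setdefault(intermediate, []).append(sugar)
    return {intermediate: sorted(sugars) for intermediate, sugars in grouped.items()}
```

Grouping the routes this way shows that the two hexoses glucose and galactose share one entry point, while the two pentoses share another.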
Taken together, to the best of our knowledge, this is the first genomic study of R. solimangrovi to reveal its lignocellulose degrading capacity. Genomic analysis coupled with experimental studies revealed its potential to degrade lignocellulosic EFB from the oil palm industry, which could be valuable for seawater-based biorefining. Future research could investigate the transcriptomic and proteomic response of R. solimangrovi to different lignocellulosic wastes to reveal the enzymes that are highly expressed in a saline environment.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes13112135/s1, Figure S1: Domain organization of GH43 subfamily 28 (A), GH3 (B) and GH26 with GT2 (C) annotated in the genome of Robertkochia solimangrovi. SP, signal peptide; FN3, fibronectin type 3 domain; TM, transmembrane helix; Figure S2: Oil palm empty fruit bunch (EFB) deconstruction by R. solimangrovi as indicated by total biomass weight loss (A) and EFB structural changes (B). Scanning electron micrographs of EFB structure before strain inoculation (i) and after 96 h of incubation (ii); Table S1: Classification of protein coding genes according to Clusters of Orthologous Groups (COGs) for R. solimangrovi and R. marina; Table S2: Similarity of lignocellulose degrading genes of R. solimangrovi to other related sequences based on BLASTp search; Table S3: Putative horizontally transferred genes of R. solimangrovi related to lignocellulose degradation, inferred from genomic data; Table S4: List of potential genes involved in uptake and metabolism of sugars, inferred from the genome of R. solimangrovi.