Comparative Genomic Analysis of Agarolytic Flavobacterium faecale WV33T

Flavobacteria are widely dispersed in a variety of environments and produce various polysaccharide-degrading enzymes. Here, we report the complete genome of Flavobacterium faecale WV33T, an agar-degrading bacterium isolated from the stools of Antarctic penguins. The sequenced genome of F. faecale WV33T represents a single circular chromosome (4,621,116 bp, 35.2% G + C content), containing 3984 coding DNA sequences and 85 RNA-coding genes. The genome of F. faecale WV33T contains 154 genes that encode carbohydrate-active enzymes (CAZymes). Among the CAZymes, seven putative genes encoding agarases have been identified in the genome. Transcriptional analysis revealed that the expression of these putative agarases was significantly enhanced by the presence of agar in the culture medium, suggesting that these proteins are involved in agar hydrolysis. Pangenome analysis revealed that the genomes of the 27 Flavobacterium type strains, including F. faecale WV33T, tend to be very plastic, and Flavobacterium strains are unique species with a tiny core genome and a large non-core region. The average nucleotide identity and phylogenomic analysis of the 27 Flavobacterium-type strains showed that F. faecale WV33T was positioned in a unique clade in the evolutionary tree.


Introduction
The genus Flavobacterium consists of approximately 100 species isolated from diverse environmental sources, such as fresh water, sea ice, soil, and sediments [1]. Some species in the genus Flavobacterium have attracted interest for their ability to produce valuable enzymes that can be utilized as biocatalysts in bioremediation or wastewater treatment [2,3]. In addition, some species synthesize carbohydrate-active enzymes (CAZymes), including agarases [4], cellulases [5], and xylanases [6]. These enzymes are widely used as biocatalysts for the bioproduction of biofuels and biochemicals from renewable sources. In particular, the utilization of agar, one of the most abundant polysaccharides in nature and a major component in the cell walls of red algae, has attracted interest in the cosmetic, pharmaceutical, and food industries [7,8]. Agar comprises two main polysaccharides: agarose and agaropectin [9]. Agarose is formed by the repetition of β-D-galactose and 3,6anhydro-α-L-galactose. Agaropectin has the same basic building blocks as agarose, but the hydroxyl groups of the 3,6-anhydro-α-L-galactose units are partially substituted by sulfoxy, methoxy, or pyruvate residues [10]. Agarases are a group of glycoside hydrolases that digest agar into diverse oligosaccharides, and are classified into αand β-agarases according to their cleavage patterns [11]. The α-agarases cleave the α-1,3 linkages of agarose to form agaro-oligosaccharides, whereas β-agarases cleave the β-1,4 linkages of agarose to produce neoagaro-oligosaccharides [12]. Phylogenetically diverse marine bacteria, isolated mostly from seawater and marine sediments, including Alteromonas [13], Pseudoalteromonas [14], Pseudomonas [15], and Vibrio [16], have been reported to produce agarases with diverse catalytic activities and biotechnological application potentials [17]. 2 of 12 In the previous study, we isolated and characterized a novel species, Flavobacterium faecale WV33 T , from the stools of Antarctic penguins [18]. Notably, F. faecale WV33 T showed agarolytic activity on agar plates, indicating the presence of agarase-encoding genes. As there is no available genome information for the F. faecale WV33 T -type strain, sequencing its genome is essential to clone and characterize a variety of CAZymes, including agarases, at the molecular level, thus enriching our understanding of the abundance and distribution of CAZymes in metabolically diverse Flavobacterium strains. In addition, phylogenomics and pan-genomic analyses [19,20] will uncover the evolutionary information and biotechnological potential of F. faecale WV33 T . Here, we describe the complete genome sequence of the agarolytic bacterium, F. faecale WV33 T . Based on genome annotation analysis, seven putative genes encoding agarases were identified and their transcriptional expression was analyzed using quantitative RT-PCR. Phylogenomics, average nucleotide identity (ANI), and pan-genomic analyses of the 27 representative genomes of Flavobacterium type strains, including F. faecale WV33 T , were also performed.

Genome Assembly of Flavobacterium Faecale WV33 T
We determined that the genome of F. faecale WV33 T consisted of a 4,621,116 bp (4.6 Mbp) circular chromosome with a G + C content of 35.2% (Table 1). In total, 3984 coding DNA sequences (CDSs) were predicted, along with 18 rRNA and 67 tRNA genes, resulting in a gene density of 885 genes/Mb ( Figure 1). The identified 2573 CDSs in the genome were classified into functional categories based on the clusters of orthologous genes (COG) designation [21] (Table 2) and are presented in the circular map with color codes (Figure 1).

Mining for Gene Encoding Agarases
The presence of genes encoding agarases was first investigated in the complete genome of F. faecale WV33 T since F. faecale WV33 T showed agarolytic activity. Seven putative genes encoding agarases (agar.3, agar.162, agar.2965, agar.3018, agar.3061, agar.3068, and agar.3154; see sequences in Figure S1) were identified in the genome of F. faecale WV33 T . The amino acid sequence analysis of these putative agarases revealed that agar.162 and agar.3154 were closely related (forming one cluster), whereas agar.3068 was phylogenetically the farthest from them ( Figure 2A). Next, the mRNA expression of the putative agarase genes was investigated under inducible conditions (agar vs. glucose-supplemented media). As the two nucleotide sequences of the agar.162 and agar.3154 genes were Figure 1. Circular representation of the genome of F. faecale WV33 T . From the outer to inner circle: predicted protein-coding sequences (colored by COG categories) on the plus strand, predicted proteincoding sequences (colored by COG categories) on the minus strand, RNA genes (tRNAs, blue; rRNAs, red), GC content (blue/black), and GC skew (red/black). NC in color codes of COG represents no classified category.

Mining for Gene Encoding Agarases
The presence of genes encoding agarases was first investigated in the complete genome of F. faecale WV33 T since F. faecale WV33 T showed agarolytic activity. Seven putative genes encoding agarases (agar.3, agar.162, agar.2965, agar.3018, agar.3061, agar.3068, and agar.3154; see sequences in Figure S1) were identified in the genome of F. faecale WV33 T . The amino acid sequence analysis of these putative agarases revealed that agar.162 and agar.3154 were closely related (forming one cluster), whereas agar.3068 was phylogenetically the farthest from them ( Figure 2A). Next, the mRNA expression of the putative agarase genes was investigated under inducible conditions (agar vs. glucose-supplemented media). As the two nucleotide sequences of the agar.162 and agar.3154 genes were highly similar (a local similarity of 99% and a global similarity of 77%) and thus hampered the gene-specific primer design (Table S1) for evaluating individual gene mRNA expression, agar.162 and agar.3154 were excluded from the mRNA expression analysis. Quantitative RT-PCR (RT-qPCR) analysis revealed that the mRNA levels of all five putative agarases were significantly induced, but to different degrees, by the presence of agar in the medium compared to the expression observed when grown in a medium containing glucose ( Figure 2B). In particular, the mRNA expression level of agar.3061 was 14-fold higher in the medium containing agar than in the medium containing glucose.

CAZymes of Flavobacterium Faecale WV33 T
Since Flavobacterium strains are known to have diverse carbohydrate-active enzymes (CAZymes), we predicted CAZymes in the genome of F. faecale WV33 T against the db-CAN database [25], using three CAZyme annotation programs (HMMER, eCAMI, and DIAMOND). The results of the CAZyme analysis revealed that the genome of F. faecale WV33 T contained 154 genes encoding CAZymes (Table 3) predicted by all three annotation programs (high accuracy): the most abundant CAZyme was glycoside hydrolase (GH, 88), followed by glycosyltransferase (GT, 48), polysaccharide lyases (PL, 10), carbohydratebinding module (CBM, 5), and carbohydrate esterase (CE, 3), accounting for 33 CAZyme genes per Mbp in the genome. Notably, these high numbers of the predicted CAZymes correlated with the number of COG functional categories ([G] Carbohydrate transport and metabolism, 183). When the CAZymes predicted by one or two annotation programs were included (low accuracy), the total number of CAZymes reached 311.

ANI Analysis of the 27 Representative Genomes of Flavobacterium Strains
To assess the relationships among bacterial strains via the identity/similarity values of the homologous regions of the target genomes [26,27], we analyzed the ANI values of 27 representative genomes of Flavobacterium strains, including F. faecale WV33 T (Table 4), using pyani [28] with four algorithms: a MUMmer (ANIm), BLASTN (ANIb), legacy BLAST (ANIblastall), and alignment-free algorithm tetranucleotide signature frequencies (TETRA) (https://github.com/widdowquinn/pyani, accessed on 6 July 2021) after retrieving whole genome sequences from the NCBI database (https://www.ncbi.nlm.nih.gov/refseq/, ac-  [29]. Notably, the Flavobacterium strains showed relatively broad ranges of ANI values, irrespective of the pyani algorithms used in this study, indicating that the Flavobacterium genomes are divergent in sequences.

Pangenome Analysis of the 27 Representative Genomes of Flavobacterium Strains
As the pangenome is regarded as the whole genomic repertoire of phylogenetically related microorganisms [30], pangenome analysis is frequently used to study the genomic diversity of microorganisms and discriminate the core, accessory, and unique genes present in pangenomes [20]. As 27 genomes of Flavobacterium strains are divergent in size (Table 4) and ANI values (Figure 3), it is worthwhile analyzing the pangenomes of the Flavobacterium strains to elucidate their genome evolution and variability. Therefore, the whole genome sequences of the 27 Flavobacterium strains were examined using Roary [31] with default parameters, except for the blastp identity cutoff value. In the range of 70-100% blastp identity cut-offs, the maximum number of clusters reached 85,318 at 95% identity cut-off, and the maximum number of core (conserved) genes was 294 at 80% identity cutoff (Table S2). The combined core (294) and soft core (57) genes (≥80% identity) were estimated to be 351, and the total gene clusters were 57,805 (Table S2), suggesting that Flavobacterium strains tend to be very plastic and have a small core genome. The number of pangenome genes increased steadily up to 57,805 with the addition of each further genome sequence ( Figure 4A) and the number of core genes rapidly decreased and plateaued at 294 with the addition of a new genome ( Figure 4B). In addition, a gene presence/absence matrix plot ( Figure 4C) suggested that the Flavobacterium-type strains have a large and open pangenome.   were estimated to be 351, and the total gene clusters were 57,805 (Table S2), suggesting that Flavobacterium strains tend to be very plastic and have a small core genome. The number of pangenome genes increased steadily up to 57,805 with the addition of each further genome sequence ( Figure 4A) and the number of core genes rapidly decreased and plateaued at 294 with the addition of a new genome ( Figure 4B). In addition, a gene presence/absence matrix plot ( Figure 4C) suggested that the Flavobacterium-type strains have a large and open pangenome.

Phylogenomic Analysis of the 27 Representative Genomes of Flavobacterium Strains
Phylogenomics can reconstruct the evolutionary history of microorganisms at the genomic level with whole genomes or large fractions of genomes [32], and thus help infer the phylogenetic relationships of relevant microorganisms and gain insights into the molecular evolution mechanism [19]. Therefore, we analyzed the phylogenomics of 27 representative genomes of Flavobacterium strains (Table 4) using GToTree [33]. The phylogenomic analysis showed that F. faecale WV33 T was positioned close to F. crassostreae and F. commune ( Figure 5), similar to the same position analyzed in the pangenome analysis ( Figure 4C). We further analyzed the presence/absence of the genes encoding agarases across the 27 representative genomes of Flavobacterium strains with a protein family (Pfam) accession number (PF00722) ( Figure 5) using GToTree. Pfam:PF00722 (https://pfam.xfam. org/, accessed on 6 July 2022) corresponded to the GH family 16 (GH16) of the agarase in the CAZyme classification (http://www.cazy.org/, accessed on 6 July 2022). Notably, 18 genomes contained GH16 (PF00722) at different abundance: the highest abundance (12 genes) was in F. faecale WV33 T , followed by F. nackdongense (5 genes), F. endoglycinae (4 genes), F. album (4 genes), and F. jumunjinense (4 genes).

Discussion
The complete genome of F. faecale WV33 T consists of an approximately 4.6 Mb circular chromosome with a gene density of 885 genes/Mb. The genome size is similar to the aver age size of 4 Mb, with a minimum of 2.8 Mb (F. psychrophilum) and a maximum of 6.1 Mb (F. johnsoniae). Notably, [G] Carbohydrate transport and metabolism and [E] Amino acid transport and metabolism were relatively highly abundant in the COG category of the

Discussion
The complete genome of F. faecale WV33 T consists of an approximately 4.6 Mb circular chromosome with a gene density of 885 genes/Mb. The genome size is similar to the average size of 4 Mb, with a minimum of 2.8 Mb (F. psychrophilum) and a maximum of 6.1 Mb (F. johnsoniae). Notably, [G] Carbohydrate transport and metabolism and [E] Amino acid transport and metabolism were relatively highly abundant in the COG category of the genome of F. faecale WV33 T , similar to the other Flavobacterium strains. In particular, the abundance of [G] Carbohydrate transport and metabolism was further supported by the results of the CAZyme analysis (Table 3). Flavobacterium strains produce different types of carbohydrate-active enzymes, including polysaccharide-degrading enzymes. This is the case for F. faecale WV33 T , the sequenced genome of which contains seven putative agaraseencoding genes. Transcriptional analysis revealed that five of these putative agarases were significantly enhanced by agar, suggesting that the putative agarases could be agarmetabolizing proteins. A biochemical and enzymatic study of the purified putative agarases of F. faecale WV33 T would elucidate the mechanism of agar-depolymerization. Among the 27 representative genomes of Flavobacterium strains, 18 genomes of Flavobacterium strains contained the GH16 family. Notably, F. faecale WV33 T contained 12 genes of GH16s, which was the highest abundance obtained in the 18 Flavobacterium strains. Therefore, F. faecale WV33 T could be used as a whole cell catalyst or microbial source for biocatalysts for the production of biomaterials of industrial importance.
Pangenome analysis revealed that Flavobacterium strains tend to be highly plastic and have a small core genome. Therefore, Flavobacterium strains are unique species with a small core genome and an open pangenome. This suggests that extensive gene gain/loss has occurred in this genus during evolutionary events. To address the extensive gene gain/loss events, additional studies, including mobile genetic elements, are required. Combined ANI, pangenome, and phylogenomic analyses showed that F. faecale WV33 T was positioned in a unique clade in the tree. As only 27 complete genomes of Flavobacterium strains were utilized in this study, a large-scale comparative genomic study with high quality and complete genomes of Flavobacteria can enhance the evolutionary information of Flavobacterium strains and our understanding of how the flavobacterial genome evolved to adapt to different environmental niches.

Strains and Culture Conditions
F. faecale WV33 T was aerobically cultured in Luria-Bertani (LB; 10 g/L tryptone, 5 g/L yeast extract, and 5 g/L NaCl) medium at 30 • C on a rotary shaker at 250 rpm. For transcriptional expression analysis of the putative agarases, F. faecale WV33 T was grown in LB medium supplemented with 1 g/L sliced solidified agar or 1 g/L glucose as a control.

Genome Sequencing and Assembly
Genomic DNA of F. faecale WV33 T was extracted using a genomic DNA extraction kit (Macrogen, Korea) with RNase A treatment. The genome of F. faecale WV33 T was sequenced in single-molecule real-time (SMRT) cells using Pacific Biosciences (PacBio) RS II SMRT sequencing technology (PacBio, Menlo Park, CA, USA). After the sub-read filtering of raw data, 78,126 long reads and 812,749,150 base pairs with a genome coverage of 176 folds were generated and assembled de novo using a Canu v1.3 assembler [34]. The overlapping regions at both ends of a contig were identified and trimmed to generate a unique stretch on both ends using Circlator [35]. Open reading frames (ORFs) were predicted by comparing the data obtained using the RAST server (https://rast.nmpdr.org/, accessed on 8 January 2020), Prodigal 2.6.3 [36], and Glimmer 3.2 [37] analysis tools. The tRNA and rRNA genes were predicted using tRNAscan-SE v1.21 [38] and RNAmmer v1.2 [39], respectively. Functional predictions were based on RPS-BLAST searches (E-value < 10 −3 ) against the non-redundant GenBank protein database (www.ncbi.nlm.nih.gov/protein, accessed on 8 January 2020), clusters of orthologous groups (COG) database (www.ncbi.nlm.nih.gov/ COG, accessed on 9 January 2020), and KEGG database (www.genome.ad.jp/kegg, accessed on 9 January 2020). A graphical circular map of the genome was constructed and visualized using Circos v0.67 [40].

Quantitative Reverse Transcription PCR
The total RNA of F. faecale WV33 T was extracted using the easy-BLUE TM Total RNA extraction kit (iNtRON Biotechnology, Seongnam, Korea) and then treated with DNase I (Sigma-Aldrich, Saint Louis, MO, USA) at 37 • C for 30 min. The transcriptional expression levels of the five putative agarases of F. faecale WV33 T were determined using quantitative RT-PCR (RT-qPCR) using gene-specific primers (Table S1). Briefly, total RNA (1 µg) was subjected to cDNA synthesis using a ReverTra TM Ace qPCR RT Kit (Toyobo, Osaka, Japan). The qRT-PCR was performed on a Rotor-Gene (Qiagen, Hilden, Germany) with a SensiFAST TM SYBR No-ROX Kit (Bioline, Taunton, MA, USA), and the products were quantified using the comparative Ct (∆∆Ct) method. The gene encoding gyrB was used as the reference gene.

CAZyme Annotation
To analyze the CAZyme-related genes in F. faecale WV33 T , the sequenced genome of F. faecale WV33 T was subjected to a FASTA format to run_dbcan3 v3.04, which is the standalone version of the dbCAN2 annotation tool [25].

Phylogenomics Analysis of Flavobacterium Strains Using GToTree
A phylogenetic tree of the 27 Flavobacterium strains was constructed using GToTree v1.6.34 [33]. Ninety single-copy gene sets of Bacteroidetes were used to construct the tree at the species level. Psychrobacillus glaciei (GCA_008973485.1) was used as an outgroup to root the tree.

Conflicts of Interest:
The authors declare no conflict of interest.