Whole-Genome Analysis of Starmerella bacillaris CC-PT4 against MRSA, a Non-Saccharomyces Yeast Isolated from Grape

Starmerella bacillaris is often isolated from environments associated with grapes and winemaking. S. bacillaris has many beneficial properties, including the ability to improve the flavor of wine, the production of beneficial metabolites, and biocontrol activity. S. bacillaris CC-PT4 (CGMCC No. 23573) was isolated from grape and can inhibit methicillin-resistant Staphylococcus aureus (MRSA) and adapt to harsh environments. In this paper, the whole genome of S. bacillaris CC-PT4 was sequenced and bioinformatics analyses were performed. The genome was assembled into five scaffolds with a total size of 9.45 Mb and a GC content of 39.5%. The strain was predicted to contain 4150 protein-coding genes, of which two encode killer toxins and one encodes lysostaphin. It also contains genes encoding F1F0-ATPases, Na(+)/H(+) antiporters, a cation/H(+) antiporter, ATP-dependent bile acid permease, major facilitator superfamily (MFS) antiporters, and stress response proteins, which may help S. bacillaris CC-PT4 adapt to bile, acid, and other stressful environments. Proteins related to flocculation and adhesion were also identified in the genome. antiSMASH predicted two secondary metabolite biosynthesis gene clusters, whose products may have antimicrobial effects. Furthermore, S. bacillaris CC-PT4 carries genes associated with pathogenicity and drug resistance. Overall, the whole-genome sequencing and analysis of S. bacillaris CC-PT4 in this study provide valuable information for understanding the biological characteristics of this strain and for its further development.


Introduction
The non-Saccharomyces yeast Starmerella bacillaris is commonly found in grape, winemaking, and related environments [1,2]. The species was first isolated from botrytis-affected wine in Napa Valley (USA) in 2002; it was subsequently isolated from white wine in Zemplin, Hungary, and described as a novel species named Candida zemplinina [3,4]. After analysis of isoenzyme profiles, 26S rDNA restriction profiles, and 26S rDNA sequences, it was finally classified as S. bacillaris [5].
S. bacillaris is highly osmotolerant: it tolerates high concentrations of sugar and ethanol, and it produces glycerol, low levels of ethanol, and a variety of flavor compounds during fermentation [4,6]. Furthermore, S. bacillaris is fructophilic whereas Saccharomyces cerevisiae is glucophilic, which enables the two species to coexist for a long time during fermentation [7]. Many studies have found that mixed fermentation of S. bacillaris with S. cerevisiae improves the aroma of wine [8][9][10].
Chemical fungicides were the traditional method to inhibit pathogens, but repeated use of these compounds usually leads to various adverse effects, such as drug resistance [11]. However, biocontrol agents are a potential alternative to reduce the use of chemical fungicides. S. cerevisiae, S. boulardii, and Kluyveromyces marxianus have all been shown to have biocontrol effects [12][13][14]. S. bacillaris has also been shown to have biocontrol potential to

Yeast Strain and Growth Conditions
The strain S. bacillaris CC-PT4 (CGMCC No. 23573) was isolated from grapes in our previous study [17]. S. bacillaris CC-PT4 was inoculated in a 250 mL flask containing 100 mL yeast peptone dextrose broth medium and cultured at 30 °C for 24 h. Afterward, cell pellets were harvested by centrifugation at 3000 rpm for 10 min, followed by DNA extraction and sequencing.

Genome Sequencing and Assembly
The S. bacillaris CC-PT4 genome was sequenced on the Illumina NovaSeq platform (2 × 150 bp paired-end reads) and the PacBio Sequel platform with continuous long read sequencing (Personalbio, Shanghai, China). After sequencing, Falcon and CANU [23] were used to assemble the sequencing data into contigs and scaffolds, and Pilon v1.18 [24] was used to correct the assembly. Finally, BUSCO (Benchmarking Universal Single-Copy Orthologs, v3.0.2) [25] was used to evaluate the completeness of the genome assembly.
Functional annotation of protein-coding genes was performed by searching against databases including NCBI nr, eggNOG, KEGG, Swiss-Prot, GO, P450, TCDB, Pfam, PHI, CAZy, and DFVF. Genes involved in stress adaptation, killer toxin, lysostaphin, and drug resistance were searched for in the annotation of the S. bacillaris CC-PT4 genome. Secreted proteins and membrane proteins were predicted by SignalP and TMHMM. Secondary metabolite biosynthesis gene clusters in the S. bacillaris CC-PT4 genome were predicted using antiSMASH [34], and the predicted gene clusters were annotated with the MIBiG database [35]. The flocculation- and adhesion-related genes of S. bacillaris CC-PT4 were analyzed according to a reported method [36]. First, flocculation- and adhesion-related protein sequences were downloaded from the Saccharomyces Genome Database (SGD); these sequences were then compared with the S. bacillaris CC-PT4 genome by BLAST (e-value < 1 × 10⁻⁵, identity > 30%).
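The BLAST thresholds stated above (e-value < 1 × 10⁻⁵, identity > 30%) can be applied to tabular BLAST output as in the following minimal sketch. It assumes the standard 12-column tabular format (percent identity in column 3, e-value in column 11); the gene identifiers, the example rows, and the choice of SGD proteins as queries are hypothetical and are not taken from the study.

```python
from typing import Iterable, List, NamedTuple


class BlastHit(NamedTuple):
    query: str       # assumed: SGD flocculation/adhesion protein
    subject: str     # assumed: predicted gene in the CC-PT4 genome
    identity: float  # percent identity
    evalue: float


def parse_hits(lines: Iterable[str]) -> List[BlastHit]:
    """Parse lines in the standard 12-column tabular BLAST format."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(BlastHit(fields[0], fields[1],
                             float(fields[2]), float(fields[10])))
    return hits


def filter_hits(hits: Iterable[BlastHit],
                max_evalue: float = 1e-5,
                min_identity: float = 30.0) -> List[BlastHit]:
    """Keep hits with e-value < max_evalue and identity > min_identity."""
    return [h for h in hits
            if h.evalue < max_evalue and h.identity > min_identity]


# Hypothetical example rows (not real results from the paper).
EXAMPLE_LINES = [
    "FLO1\tgene_A\t45.2\t300\t160\t4\t1\t300\t10\t309\t1e-40\t250",
    "FLO11\tgene_B\t28.0\t200\t140\t4\t1\t200\t5\t204\t1e-20\t90",
    "YPS1\tgene_C\t55.0\t50\t22\t1\t1\t50\t3\t52\t0.01\t30",
]

if __name__ == "__main__":
    for hit in filter_hits(parse_hits(EXAMPLE_LINES)):
        print(hit.query, hit.subject, hit.identity, hit.evalue)
```

In this example only the first row passes both thresholds: the second fails the identity cutoff and the third fails the e-value cutoff.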

Nucleotide Sequence Accession Number
This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession JAPDUH000000000. The version described in this paper is version JAPDUH010000000.

Genome Sequencing and Assembly
The S. bacillaris CC-PT4 whole-genome sequence was finally assembled into five scaffolds with a total length of 9,451,933 bp. The longest scaffold was 4,159,713 bp, the shortest scaffold was 8503 bp, the N50 was 4,154,241 bp, and the GC content was 39.50% (Supplementary Table S1). The circular graphical map of the S. bacillaris CC-PT4 genome is shown in Figure 1.
BUSCO was used to evaluate the completeness of the genome assembly. The software estimates completeness from the evolutionary information of single-copy orthologous genes expected to be present in all species of the fungal lineage. Completeness was assessed as the percentage of single-copy genes that aligned completely to the genome sequence out of the total number of single-copy genes. In the S. bacillaris CC-PT4 genome, 63.3% of the BUSCO single-copy genes were completely aligned, and 33.0% were not mapped (Supplementary Table S2).
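The N50 reported above can be illustrated with a short sketch. The total length, the longest scaffold, the N50 scaffold, and the shortest scaffold are taken from the text; the lengths of the two remaining scaffolds are not given in this section, so the split used below (1,000,000 bp and 129,476 bp, which sum to the remainder) is a hypothetical placeholder. The resulting N50 does not depend on how that remainder is split.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that scaffolds of length >= L
    together contain at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-empty input


# Reported values: 4159713 (longest), 4154241 (N50), 8503 (shortest).
# The two middle values are hypothetical and only sum to the remainder.
SCAFFOLDS = [4159713, 4154241, 1000000, 129476, 8503]

if __name__ == "__main__":
    print("total length:", sum(SCAFFOLDS))
    print("N50:", n50(SCAFFOLDS))
```

Because the two largest scaffolds together exceed half of the 9,451,933 bp total, the second-largest scaffold length is the N50.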

Genomic Functional Element Profiling
The strain S. bacillaris CC-PT4 was predicted to have 286 tandem repeats and 1020 interspersed repeats scattered across the genome (Supplementary Table S3). The predicted non-coding RNAs in the S. bacillaris CC-PT4 genome are shown in Supplementary Table S4. Non-coding RNA (ncRNA) mainly includes tRNA, rRNA, snoRNA, microRNA, siRNA, snRNA, exRNA, piRNA, scaRNA, and lncRNA. S. bacillaris CC-PT4 had 123 tRNAs, four rRNAs, and seven other non-coding RNAs. A total of 4150 genes were predicted in the genome, accounting for 63.70% of the total genome length, with an average gene length of 1450.8 bp. There were 4266 exons, an average of about one exon per gene, accounting for 63.44% of the total genome length (Supplementary Table S5).
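As a quick consistency check on the gene statistics above, the genic fraction of the genome can be recomputed from the gene count, the average gene length, and the total assembly length. This is only an arithmetic sketch using the numbers reported in the text; the function and variable names are illustrative.

```python
def genic_fraction_percent(n_genes: int, mean_gene_len: float,
                           genome_len: int) -> float:
    """Percentage of the genome covered by genes, assuming no overlap."""
    return 100.0 * n_genes * mean_gene_len / genome_len


N_GENES = 4150         # predicted genes
MEAN_GENE_LEN = 1450.8  # average gene length in bp
GENOME_LEN = 9451933    # total assembly length in bp
N_EXONS = 4266          # predicted exons

if __name__ == "__main__":
    pct = genic_fraction_percent(N_GENES, MEAN_GENE_LEN, GENOME_LEN)
    print(f"genic fraction: {pct:.2f}%")
    print(f"exons per gene: {N_EXONS / N_GENES:.2f}")
```

The recomputed fraction agrees with the reported 63.70%, and the exon count corresponds to roughly 1.03 exons per gene.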
The NCBI nr database is a non-redundant protein database whose goal is to provide a comprehensive dataset representing the complete sequence information of any species [37]. Annotation results from this database include species information. In this study, the species information of the 3538 S. bacillaris CC-PT4 genes annotated in NCBI nr was counted, and the top 25 species accounted for 3074 genes (Figure 2). However, only ten annotated genes with a sequence identity of at least 97% matched the genus Starmerella; nine of these matched S. bacillaris, of which four had 100% sequence identity.
Further, eggNOG was used to annotate the functions of the predicted S. bacillaris CC-PT4 proteins, and cluster analysis was carried out. The classification of the 3185 genes annotated by eggNOG is shown in Figure 3. There were 499 genes with no clear function, which may be related to the lack of research on S. bacillaris and the lack of reference genes. Among genes with a clear functional classification, the most abundant category was translation, ribosomal structure, and biogenesis (321 genes); carbohydrate transport and metabolism, lipid transport and metabolism, and amino acid transport and metabolism comprised 146, 108, and 145 genes, respectively. In contrast, few genes were annotated to extracellular structures and nuclear structure (four genes). There were 57 genes related to secondary metabolites biosynthesis, transport, and catabolism. In addition, no cell-motility-related genes were found, which is consistent with the fact that yeast is a non-motile microorganism [38].
The S. bacillaris CC-PT4 genome had 2610 genes annotated in KEGG, which were divided into eight categories and fifty subcategories. The results are shown in Figure 4. Among them, genes related to genetic information processing were the most abundant, followed by signaling and cellular processes.
Each entry in the Swiss-Prot database has detailed annotations. All sequence entries have been carefully verified by experienced molecular biologists and protein chemists using computer tools and the relevant literature. The number of annotations of the S. bacillaris CC-PT4 genome in the Swiss-Prot database was 3363.
The GO database divides gene functions into molecular function, cellular component, and biological process, and a gene can be annotated with multiple GO terms. The results of the GO annotation of the S. bacillaris CC-PT4 genome are shown in Figure 5. The genome had 4109 gene annotations for molecular function, 8560 for cellular component, and 10,108 for biological process in the GO database.

Secondary Metabolite Biosynthesis Gene Clusters
Two secondary metabolite biosynthesis gene clusters were predicted in the S. bacillaris CC-PT4 genome by antiSMASH (Figure 6): a non-ribosomal peptide synthetase (NRPS) cluster and a terpene cluster. The genes in these clusters were annotated with the MIBiG database, and the annotation with the highest BLAST score for each gene is shown in Table 3.

Killer Toxin and Lysostaphin Encoding Genes
Killer toxin is a toxic protein secreted by yeast that can kill other yeasts but has no killing effect on the producing strain [39]. Lysostaphin specifically hydrolyzes the pentaglycine crosslinks of S. aureus peptidoglycan, resulting in lysis of S. aureus, and has the same bacteriostatic effect on drug-resistant S. aureus [40]. According to the annotation results of the S. bacillaris CC-PT4 genome, two genes encode killer toxins and one gene encodes lysostaphin (Table 4).

Adaptation to Stress Analysis
Genes involved in stress adaptation were searched for in the eggNOG, Swiss-Prot, and NCBI nr annotations of the yeast genome. The results showed that the S. bacillaris CC-PT4 genome contains many genes that may facilitate adaptation to harsh conditions, including genes for pH stress resistance, bile stress resistance, oxidative stress resistance, ionic and heavy metal stress resistance, heat stress resistance, and other stress resistance (Supplementary Table S6).
Predicted results of secondary metabolite biosynthesis gene clusters in Starmerella bacillaris CC-PT4.

Drug Resistance Gene Analysis
According to the annotation results of the S. bacillaris CC-PT4 genome, 30 genes related to drug resistance were identified in this strain (Supplementary Table S8).

SPI1, SUN4, UTR2, YPS1, YPS3 (Supplementary Table S7).

Pathogenicity Analysis
The numbers of genes in each PHI phenotype mutation category predicted from the S. bacillaris CC-PT4 whole genome are shown in Figure 8. A total of 1013 genes were annotated through the PHI database; the largest category was reduced virulence, with 545 genes, while no genes were assigned to enhanced antagonism. Further analysis found 34 genes related to human disease (Table 5).
DFVF is a comprehensive online database of fungal virulence factors. The S. bacillaris CC-PT4 whole genome was compared with the DFVF database, and 529 annotation results were obtained. Of these annotated genes, 443 also appeared in the annotation results of the PHI database, and 20 of them were related to human disease (Table 5).

Discussion
S. bacillaris is often isolated from grapes and winemaking environments, improves the flavor of wine, and acts as a biocontrol agent against fungi [15,41,42]. Our recent study showed that S. bacillaris CC-PT4 also inhibits MRSA and tolerates harsh environments, such as acids and bile salts [17]. In this study, the whole genome of S. bacillaris CC-PT4 was sequenced and analyzed. The genome size of the strain was 9.45 Mb, and the GC content was 39.5%. GC content is a feature used in microbial taxonomic descriptions [43]. The genome sizes of the S. bacillaris type strain CBS 9494, S. bacillaris FRI751, and S. bacillaris PAS13 were 9.3 Mb, 9.3 Mb, and 9.4 Mb, respectively, and their GC contents were 39.4%, 39.4%, and 39.45%, respectively [22,44,45]. The genome size and GC content of S. bacillaris CC-PT4 are close to those of these strains, which is consistent with its assignment to S. bacillaris.
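For readers who want to reproduce a GC-content figure such as the 39.5% discussed above, the calculation can be sketched as follows. This is a generic illustration, not the pipeline used in the study; it assumes GC content is computed over unambiguous bases only (A, C, G, T), and the example sequence is made up.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Hypothetical toy sequence; ambiguous bases (N) are ignored.
    print(gc_content("ATGCGCATNN"))
```

For a multi-scaffold assembly, the same function can be applied to the concatenation of all scaffold sequences so that each base is weighted equally.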
The completeness of the genome assembly was assessed in this study as the percentage of single-copy genes that aligned completely to the genome sequence out of the total number of single-copy genes. As a result, 63.3% of the BUSCO single-copy genes could be completely aligned in the genome of this strain. This may be due to a lack of data in the database or to accelerated genome evolution in the strain [46]. Research has also shown that S. bacillaris strains isolated from brewing environments are very diverse at the genetic level and have acquired a large number of genes of foreign origin during evolution [21,47].
The S. bacillaris CC-PT4 genome was predicted to contain 4150 protein-coding genes, and several databases were used to annotate these genes. According to the NCBI nr annotation, the species with the most matching genes was Wickerhamiella sorbophila, with 2432 genes; that is, 68.74% of the annotated S. bacillaris CC-PT4 genes showed homology with Wickerhamiella sorbophila, whereas only nine genes (0.25%) were annotated to S. bacillaris. Molecular phylogeny based on whole-genome data has shown that Starmerella and Wickerhamiella are closely related and belong to the same evolutionary branch [48]. In addition, the small number of annotations to S. bacillaris may reflect the lack of S. bacillaris protein data in the NCBI database: there were 4740 reference sequences of Wickerhamiella sorbophila protein in NCBI RefSeq but only 15 of Starmerella bacillaris protein (https://www.ncbi.nlm.nih.gov/protein/, accessed on 26 October 2022).
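The species shares quoted above (2432 of 3538 nr-annotated genes, or 68.74%, and 9 of 3538, or 0.25%) can be tallied from best-hit descriptions as in the sketch below. It assumes that each nr hit title ends with the source organism in square brackets, which is the usual nr convention; the example titles are hypothetical, and this is not presented as the authors' script.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Organism name in trailing square brackets, e.g. "chitinase [Genus species]".
_ORGANISM = re.compile(r"\[([^\[\]]+)\]\s*$")


def organism_of(title: str) -> Optional[str]:
    """Return the bracketed organism at the end of a hit title, if any."""
    match = _ORGANISM.search(title)
    return match.group(1) if match else None


def species_shares(titles: Iterable[str]) -> Dict[str, float]:
    """Percentage of hits per organism, ignoring titles without one."""
    counts = Counter(o for o in map(organism_of, titles) if o is not None)
    total = sum(counts.values())
    return {sp: round(100.0 * n / total, 2) for sp, n in counts.items()}


# Hypothetical best-hit titles (not real annotation output).
EXAMPLE_TITLES = [
    "hypothetical protein [Wickerhamiella sorbophila]",
    "ATP synthase subunit [Wickerhamiella sorbophila]",
    "chitinase [Starmerella bacillaris]",
    "unnamed protein product",
]

if __name__ == "__main__":
    print(species_shares(EXAMPLE_TITLES))
    # Shares recomputed from the counts reported in the text.
    print(round(100.0 * 2432 / 3538, 2), round(100.0 * 9 / 3538, 2))
```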
CAZy prediction in the S. bacillaris CC-PT4 genome found that glycosyl transferases (GTs) were the most abundant class, with 44 genes, followed by glycoside hydrolases (GHs), with 28 genes; no polysaccharide lyases (PLs) were found. GH enzymes have the potential to hydrolyze complex carbohydrates, and GTs are important for surface structures recognized by the host immune system. Higher numbers of GTs and GHs suggest that the strain has potential in defense against pathogens and in immune stimulation [49]. Moreover, 17 of these CAZy enzymes were secreted proteins, of which two are GTs and fifteen are GHs (Supplementary Table S9). Among them, scaffold1.t1602 and scaffold2.t307 belong to GH18, a family that can degrade the fungal cell wall. Additionally, scaffold2.t307 was annotated by Swiss-Prot as killer toxin subunits alpha/beta, which can interact with the cell walls of sensitive cells and block their growth. Killer toxin subunit alpha is a potent chitinase, and the GH18 family also includes chitinases, so scaffold2.t307 was annotated consistently by the CAZy and Swiss-Prot databases. This suggests that scaffold2.t307 can damage the fungal cell wall.
S. bacillaris CC-PT4 has been shown to inhibit the growth of methicillin-resistant Staphylococcus aureus (MRSA) [17]. In this study, Swiss-Prot annotation of the S. bacillaris CC-PT4 genome identified a gene encoding lysostaphin (scaffold2.t1859) (Table 4). Moreover, several bacteriostatic substances were found in the annotation results of the secondary metabolite biosynthesis gene clusters in the S. bacillaris CC-PT4 genome (Table 3). The compound predicted for scaffold1.t210 and scaffold1.t749 was ustilagic acid, which has a broad inhibitory effect against both bacteria and fungi [50]. Further, the compound annotated for scaffold1.t750, squalestatin S1, has antifungal effects [51]. The compounds annotated for scaffold1.t212 were Sch-47554 or Sch-47555, both of which also have antifungal effects [52].
It has been shown that S. bacillaris CC-PT4 is tolerant of harsh conditions such as temperature, bile salts, and acids [17]. This study found that the S. bacillaris CC-PT4 genome contains several genes that may help the strain adapt to such conditions. There were seven F1F0-ATPases, two Na(+)/H(+) antiporters, one cation/H(+) antiporter, and multiple proton ATPases and related subunits in the yeast genome (Supplementary Table S6). F1F0-ATPase, cation/H(+) antiporter, and Na(+)/H(+) antiporter export protons from the cytoplasm and are considered to be the main factors regulating intracellular pH, increasing the resistance of the strain to acid [49,53]. Bile salts have toxic effects, mainly destroying the cell membrane and cell wall and inducing DNA damage and oxidative stress [54,55]. Production of bile salt hydrolase is an effective way to cope with the toxicity of bile salts, but bile salt tolerance can also be improved by actively excreting bile salts and by expressing genes that maintain the cell wall, the cell membrane, and the general stress response [54]. No genes encoding bile salt hydrolase were found in the S. bacillaris CC-PT4 genome, but there were genes encoding ATP-dependent bile acid permease (Supplementary Table S6). In addition, there are major facilitator superfamily (MFS) antiporters in the S. bacillaris CC-PT4 genome (Supplementary Table S8), which can not only expel drugs from cells but also eliminate substances such as bile salts. These genes may contribute to the resistance of S. bacillaris CC-PT4 to bile salt stress. Genes involved in oxidative stress resistance, such as superoxide dismutase, were also found in the S. bacillaris CC-PT4 genome (Supplementary Table S6). Magnesium transporter, zinc transporter, and metal resistance protein genes for ionic and heavy metal stress resistance were found as well (Supplementary Table S6). A number of heat shock proteins were also identified, consistent with the heat tolerance of S. bacillaris CC-PT4. Moreover, the S. bacillaris CC-PT4 genome contains genes encoding general stress response protein, DNA repair protein, cell wall integrity and stress response component 4, etc., which may help this strain cope with harsh environments.
By BLAST analysis, five genes in the S. bacillaris CC-PT4 genome were found to encode flocculation proteins (FLO1, FLO5, FLO9, FLO10, FLO11). However, in some cases one gene had sequence identity with multiple fragments of a flocculation protein, or with several flocculation proteins (Figure 7). This is because flocculation proteins contain repetitive sequence fragments, and an increase or decrease in the number of these repeating units also affects the adhesion properties of flocculation proteins [36]. Additionally, FLO1, FLO5, FLO9, and FLO10 have sequence homology [56]. Flocculation proteins mediate cell-cell adhesion, whereas adhesion more generally refers to interaction with foreign surfaces. In the S. bacillaris CC-PT4 genome, 65 genes associated with adhesion were found.
The safety-related genes of S. bacillaris CC-PT4, including drug resistance and pathogenicity genes, were also analyzed. Based on the annotation results of protein-coding genes, 30 genes related to drug resistance were identified, including ABC multidrug transporter, multidrug resistance protein, MFS transporter, MFS multidrug transporter, and MFS antiporter (Supplementary Table S8). ABC multidrug transporters may increase resistance to azoles. The MFS transporter family is a multidrug efflux system that can transport a variety of structurally unrelated compounds out of cells, including cycloheximide and azoles, making strains resistant to many compounds [57,58]. Previous research has also shown that S. bacillaris CC-PT4 is resistant to fluconazole and itraconazole but sensitive to amphotericin B [17]. Based on the annotations from the PHI and DFVF databases, some virulence factors may be present in the S. bacillaris CC-PT4 genome, so this strain should be used with caution and further studies should be conducted.

Conclusions
In this study, the whole genome sequence of S. bacillaris CC-PT4 was assembled and bioinformatics analyses were performed. The genome size of the strain was 9.45 Mb and the GC content was 39.50%. Further, 4150 protein-coding genes were predicted and annotated using several bioinformatics databases. The annotation results revealed many genes related to stress adaptation, secondary metabolism, antibacterial function, and safety, including two secondary metabolite biosynthesis gene clusters, two genes encoding killer toxins, and one gene encoding lysostaphin. Overall, the whole genome sequence of S. bacillaris CC-PT4 helps to better understand the characteristics of this strain and supports its further mining and application.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jof8121255/s1, Table S1: Statistics of genome assembly results, Table S2: Integrity assessment of genome assembly, Table S3: Statistics of repetitive sequences in the S. bacillaris CC-PT4 genome, Table S4: Statistics of non-coding RNAs, Table S5: Summary of predicted genes, Table S6: Stress responsive genes in S. bacillaris CC-PT4 genome, Table S7: S. bacillaris CC-PT4 whole-genome BLAST alignment with flocculin and adhesion proteins, Table S8: Drug resistance related genes in S. bacillaris CC-PT4 whole-genome, Table S9: List of carbohydrate-active enzyme genes in secreted proteins.