The Complete Genome of Brucella Suis 019 Provides Insights on Cross-Species Infection

Brucella species are the most important zoonotic pathogens worldwide and cause considerable harm to humans and animals. In this study, we presented the complete genome of B. suis 019 isolated from sheep (ovine) with epididymitis. B. suis 019 has a rough phenotype and can infect sheep, rhesus monkeys and possibly humans. The comparative genome analysis demonstrated that B. suis 019 is closest to the vaccine strain B. suis bv. 1 str. S2. Further analysis associated the rsh gene to the pathogenicity of B. suis 019, and the WbkA gene to the rough phenotype of B. suis 019. The 019 complete genome data was deposited in the GenBank database with ID PRJNA308608.


Introduction
Brucella is a genus of Gram-negative bacteria. They are small (0.5 to 0.7 by 0.6 to 1.5 µm), non-encapsulated, flagellated, facultatively intracellular coccobacilli [1]. Brucella causes the brucellosis in wild and domestic animals, even when transmitted from human to human. The brucellosis can have a considerable impact on human and animal health, as well as on economics, especially in developing countries where rural income relies largely on livestock breeding and dairy products [2]. The genus Brucella is generally classified into 10 species, which are Brucella abortus, Brucella melitensis, Brucella suis, Brucella ovis, Brucella canis, Brucella neotomae, Brucella pinnipedialis, Brucella ceti, Brucella microti, and Brucella inopinata, based on host preference and phenotypic characteristics [3,4]. Among these species, B. melitensis, B. suis and B. canis distribute more widely and virulently.
The strain named B. suis 019 infected sheep (ovine), rhesus monkeys and possibly humans. The 019 strain was first discovered in the 1980's when the sheep epididymitis, usually caused by the B. ovis, broke out widely in the province of Xinjiang, China. After that, the 019 strain was isolated from the semen of sick sheep (ovine) and initially identified as a B. ovis strain by the serological and bacteriological tests [5]. Then, this identification was confirmed by the biochemical tests [6]. Later, the significant differences between the 019 strain and the other B. ovis strains were found through a series of experiments. The animal experiments proved the 019 strain infected rhesus monkeys and caused damage to many organs [7]. The molecular biological experiments showed some featured genes of the 019 strain were quite different from those of B. ovis [8]. In 2010, Wang et al. revealed that there were significant differences between the 019 strain and the 63/290 reference strain on both DNA and amino acid levels and concluded that the 019 strain was a unique local strain to Xinjiang [9]. However, the taxonomic status and infection mechanism of the 019 strain were still confusing.
In 2013, we assembled the draft genome of B. suis 019 using 90 bp Next-generation sequencing (NGS) technology and performed the comparative genomic analysis to reveal that the 019 strain belongs to B. suis and is far from B. ovis or B. melitensis. Although the B. suis 019 draft genome made effective progress, the draft genome missed some important information, e.g., genomic structure variation or rearrangement. Since pathogenic bacteria often exhibit a high degree of genomic rearrangement [10], we assembled the complete genome of B. suis 019 using the 250 bp NGS technology with Sanger sequencing confirmation. We also compared the B. suis 019 complete genome with the other 15 Brucella complete genomes to reach two research goals: 1) to confirm the taxonomic status of B. suis 019 strain based on the complete genome analysis; 2) to associate B. suis 019 strain's rough phenotype and pathogenicity to some sequence features on the genome level.

Complete Genome Sequencing, Assembly and Annotation
The raw NGS data contained 2ˆ688,568 paired reads with the length of 251 bp. After removing low quality regions, adapters and viral sequences, a total of 1,368,448 cleaned reads were produced for genome assembly. Using the cleaned reads, 14 and 6 scaffolds were assembled for chromosome 1 and 2. Then, we used the PCR plus Sanger sequencing to fill the gaps (Methods), producing the B. suis 019 complete genome (80ˆdepth) containing two chromosomes with the length 2,098,391 bp and 1,204,433 bp, respectively (Supplementary file 1). The assembled B. suis 019 complete genome has a total sequence length of 3,302,824 bp, which is 3717 bp longer than the total length of the draft genome. This complete genome has the GC content 57.27%, which is very close to the GC content 57.28% of the draft genome. We predicted 1972 and 1119 proteins for 019 chromosome 1 and 2 (Supplementary file 2). Compared to the predicted 3529 ORFs using the draft genome, 3091 is closer to the total protein number of the other Brucella complete genomes ( Table 1). All of the predicted proteins were annotated by the NCBI NR database and the Gene Ontology terms (Supplementary file 3). These proteins were predicted to involve 125 KEGG metabolism pathways (Supplementary file 4). * These data were not used in this study. Chr1_len is the length of chromosome 1. Chr2_len is the length of chromosome 2. Gen# is the total gene number on this chromosome. Chr1_ID and Chr2_ID is the RefSeq or Genbank Accession Number.

Phylogenetic Analysis
Using 2,537 homologous genes from 51 Brucella genomes including the B. suis 019 draft genome (Methods), Phylogenetic Tree 1 was built to show six well-supported clades including B. melitensis, B. abortus, B. ovis, Brucella from marine mammals, B. suis and canis, and others ( Figure 1A). These results confirmed that B. suis 019 belongs to the B. suis and is closest to B. suis bv. 1 str. S2 ( Figure 1B). The vaccine strain B. suis S2, which had been developed in China in the 1970's, was effective for oral vaccination of sheep, goats, cattle and pigs [11]. B. suis S2 has been widely used for prevention of animal brucellosis in China over many years.
Using chromosome 1 and 2 sequences from 16 Brucella complete genomes (Table 1), we built Phylogenetic Trees 2 and 3, separately ( Figure 1C,D). Phylogenetic Trees 2 and 3 had congruence with Phylogenetic Tree 1 on three points. The first point was that B. suis 019 belongs to B. suis species rather than B. ovis. The second point was that the debated B. melitensis biovar Abotus 2308 belongs to the B. abortus and is not a biovar of B. melitensis. The last point was that the B. ovis is far from other species, which confirmed a previous study. In that study, Foster et al. used 20,154 SNPs from 13 Brucella genomes to show that most Brucella species had diverged from their common B. ovis ancestor in the past 86,000 to 296,000 years, which preceded the domestication of their livestock hosts [12].
Using 16 complete genomes, we found the discrepancy between Phylogenetic Trees 2 and 3. Phylogenetic Tree 2 revealed a well supported topology that placed B. ovis, B. microti, B. canis, B. suis, B. pinnipedialis, B. abortus and B. melitensis in different clades ( Figure 1C). Meanwhile Phylogenetic Tree 3 classified B. ovis and B. pinnipedialis into one clade and did not classify B. suis and B. canis into two well separated groups as Phylogenetic Tree 2 did. The latter phenomenon, named the paraphyly of B. suis, was discovered in two previous studies [12]. The results in these study suggested the paraphyly of B. suis could be attributed to chromosome 2 (Phylogenetic Tree 3). To further investigate the relationship between different Brucella species, we conducted a collinear analysis of 16 complete genomes to provide more detailed information on the genomic regions.

Comparative Genome Analysis Using 16 Complete Genomes
Chromosome sequences from B. suis 019 and the other 15 Brucella species were aligned using the Mauve software (Methods). Mauve identified nine and seven locally collinear blocks (LCBs) for the 019 chromosome 1 and 2 (Supplementary file 5). LCBs are conserved segments that appear to be internally free from genome rearrangements [13]. Compared to the smaller LCB size in genomes of other genura (e.g., 78 LCBs in Yersinia [10] and 243 LCBs in Shewanella [14]), the large LCB size in including three LCBs was observed on the B. abortus S19, 2308 and 9-941 genome. A 3952 bp inversion was observed on the B. ovis 25840 genome. These results supported a previous conclusion that Brucella chromosome 2 is more dynamic, perhaps owing to its hypothesized origin as a plasmid [12].

Comparative Genome Analysis Using 16 Complete Genomes
Chromosome sequences from B. suis 019 and the other 15 Brucella species were aligned using the Mauve software (Methods). Mauve identified nine and seven locally collinear blocks (LCBs) for the 019 chromosome 1 and 2 (Supplementary file 5). LCBs are conserved segments that appear to be internally free from genome rearrangements [13]. Compared to the smaller LCB size in genomes of other genura (e.g., 78 LCBs in Yersinia A 765,784 bp inversion including three LCBs was observed on the B. abortus S19, 2308 and 9-941 genome. A 3952 bp inversion was observed on the B. ovis 25840 genome. These results supported a previous conclusion that Brucella chromosome 2 is more dynamic, perhaps owing to its hypothesized origin as a plasmid [12]. Since B. suis 019 is closest to B. suis S2 among all the B. suis species ( Figure 1A), we conducted a CDS syntenic analysis between the B. suis 019 complete genome and the B. suis S2 draft genome. The results showed a good conservation of synteny and collinearity between B. suis 019 and the B. suis S2 genome ( Figure 4). Then, we blasted all the CDS sequences of B. suis S2 to the B. suis 019 complete genome. All of the 3230 CDS sequences of B. suis S2 could be covered by the hits from the B. suis 019 genome, 99.54% (3215/3230) of which have the identity 100% to the query sequences. We also blasted all the CDS sequences of B. suis 019 to the B. suis S2 draft genome. All of the 3091 CDS sequences of B. suis 019 were able to be covered by the hits from the B. suis S2 genome, 99.78% (3084/3091) of which have the identity 100% to the query sequences. The comparison between B. suis 019 and the B. suis S2 showed there were eight genes absent in the B. suis 019 ( Table 2) and 19 genes with significant mutation between B. suis S2 and B. suis 019 (Supplementary file 6). Among 19 genes, four genes have a single copy in the B. suis 019 complete genome (Table 2). Particularly, we found that a 21 bp nucleotide deletion in the rsh gene resulted in a seven amino acid deletion QKRASGD. Based on search results from the NCBI website, we reported this as a new mutation of rsh gene.
Since B. suis 019 is closest to B. suis S2 among all the B. suis species ( Figure 1A), we conducted a CDS syntenic analysis between the B. suis 019 complete genome and the B. suis S2 draft genome. The results showed a good conservation of synteny and collinearity between B. suis 019 and the B. suis S2 genome ( Figure 4) (Table 2). Particularly, we found that a 21 bp nucleotide deletion in the rsh gene resulted in a seven amino acid deletion QKRASGD. Based on search results from the NCBI website, we reported this as a new mutation of rsh gene. The product of rsh named GTP pyrophosphokinase rsh (EC: 2.7.6.5) is a mediator of the stringent response that coordinates a variety of cellular activities in response to changes in nutritional abundance. Rsh is required for Brucella to express the type IV secretion system VirB, a major virulence factor of Brucella and therefore plays a role in adaptation to low-nutrient environments. This was evidenced using the Rsh deletion mutants in B. suis and B. melitensis in a previous study [15]. Comparative transcriptional analysis between B. suis 1330 wild-type and Δrsh mutant showed the Rsh-dependent up-regulation of 198 genes and down-regulation of 181 genes, which together account for 11.6% of the genome [16]. The Rsh protein (Uniprot: Q8CY42) with a length of 750 AA (amino acids) has four Pfam domains, HD_4 (residues 26-176), RelA_SpoT (residues 235-346), TGS (residues 392-451) and ACT_4 (residues 669-747). The seven amino acid deletion (residues 37-43) belongs to the HD_4 domain. We used the PredictProtein server (https://www.predictprotein.org) to analyze the The product of rsh named GTP pyrophosphokinase rsh (EC: 2.7.6.5) is a mediator of the stringent response that coordinates a variety of cellular activities in response to changes in nutritional abundance. Rsh is required for Brucella to express the type IV secretion system VirB, a major virulence factor of Brucella and therefore plays a role in adaptation to low-nutrient environments. This was evidenced using the Rsh deletion mutants in B. suis and B. melitensis in a previous study [15]. Comparative transcriptional analysis between B. suis 1330 wild-type and ∆rsh mutant showed the Rsh-dependent up-regulation of 198 genes and down-regulation of 181 genes, which together account for 11.6% of the genome [16]. The Rsh protein (Uniprot: Q8CY42) with a length of 750 AA (amino acids) has four Pfam domains, HD_4 (residues 26-176), RelA_SpoT (residues 235-346), TGS (residues 392-451) and ACT_4 (residues 669-747). The seven amino acid deletion (residues 37-43) belongs to the HD_4 domain. We used the PredictProtein server (https://www.predictprotein.org) to analyze the rsh properties and predicted that all of the deleted seven amino acids had the coil secondary structures. These seven amino acids were predicted in the disorder regions. Moreover, the Glutamine (Q) on residue 37 was predicted as a protein binding region. Compared to the other three single copy genes (Table 2), the rsh gene with the seven amino acid deletion is more likely to be associated to the acquired pathogenicity of the B. suis 019.
Genes 2016, 7, 7 6 of 13 rsh properties and predicted that all of the deleted seven amino acids had the coil secondary structures. These seven amino acids were predicted in the disorder regions. Moreover, the Glutamine (Q) on residue 37 was predicted as a protein binding region. Compared to the other three single copy genes (Table 2), the rsh gene with the seven amino acid deletion is more likely to be associated to the acquired pathogenicity of the B. suis 019.   rsh properties and predicted that all of the deleted seven amino acids had the coil secondary structures. These seven amino acids were predicted in the disorder regions. Moreover, the Glutamine (Q) on residue 37 was predicted as a protein binding region. Compared to the other three single copy genes (Table 2), the rsh gene with the seven amino acid deletion is more likely to be associated to the acquired pathogenicity of the B. suis 019.    Gene-ID has two formats. The format "BSS2_xxxx" is the tag name in the GenBank data CP006961.1 (B. suis S2 chromosome 1) and CP006962.1 (B. suis S2 chromosome 2). The format "chrx_xxxx" is the gene ID of B. suis 019 complete genome (Supplementary files 1 and 2). The first eight genes are absent in the B. suis 019 complete genome. The other four genes have significant mutation from B. suis S2 to B. suis 019.

Beta-Ketoadipate Pathway and Lipopolysaccharide
One important characteristic of Brucella is that it shares two pathways with the soil microorganisms. These two pathways, beta-ketoadipate pathway and homoprotocatechuate pathway are widely distributed among diverse soil microorganisms and play a central role in the processing and degradation of plant-derived aromatic compounds. The B. suis 1330 genome (GenBank: AE014291.4 and AE014292.2) includes these two intact pathways, the genes of which are located on chromosome 2 [17]. The homoprotocatechuate pathway includes eight protein-coding genes (AE014292.2: BRA1155-BRA1162). These genes numbered from chr2_1076 to chr2_1084 (Table 3) were found in B. suis 019 in the same order on chromosome 2 with sequence identity 100%. The beta-ketoadipate pathway includes 12 protein-coding genes (AE014292.2:BRA0636-BRA0647). One previous study showed that at least 1 of the 12 genes carried by every Brucella genome except B. suis 1330 has become a pseudogene and 12 genes are completely missing in B. suis ATCC 23445 [18]. These 12 genes numbered from chr2_443 to chr2_454 (Table 3) were found in B. suis 019 in the same order on chromosome 2 with sequence identity 100%.  Lipopolysaccharide (LPS) is the major structural component of the outer membrane of gram-negative bacteria. It is composed of a lipid core, a core oligosaccharide, and a distal O-polysaccharide (O-PS) side chain [19]. The presence of the intact O-PS produces smooth phenotypes in B. melitensis, B. suis and B. abortus, while the absence or disruption of O-PS produces rough phenotypes in B. canis and B. ovis with the lipid core and the core oligosaccharide. An unexpected finding on the B. suis 019 was its rough morphology. Several studies indicated specific genes are important for the development of the smooth phenotype in Brucella [19]. Until now, 19 genes have been indicated as being important in producing smoothness. The disruption of 13 genes, Pgm (BR0058), WbkD (BR0510), WbkF (BR0511), Wzm (BR0520), Per (BR0521), Gmd (BR0522), WbkA (BR0529), ManB (BR0537), WbkE (BR0540), Wa** (BR0615), WboA (BR0981), WboB (BR0982) and ManBcore (BRA0348), resulted in a rough phenotype in B. melitensis and six genes, WbkC (BR0517), Wzt (BR0519), ManC (BR0538), ManA (BR0539), LpcC (BR1503) and ManCcore (BRA0347), were identified as playing roles that had not been fully determined (Table 3). Based on the alignment results, we found that the WbkA gene was disrupted in the B. suis 019 complete genome with the other 18 genes 100% identical to their orthologs in B. suis 1330. Previous studies have demonstrated that spontaneous excision of the WbkA glycosyltranferase gene was a cause of dissociation of smooth to rough Brucella [20]. Therefore, we proposed that the disruption of the WbkA gene resulted in the rough B. suis 019.

Sample Preparation, DNA-seq Library Construction
The B. suis 019 strain was obtained from the key laboratory of prevention and control of animal disease, Shihezi University. This bacteria had been originally isolated from the sperm of sheep in the province of Xinjiang in China in 1983, then processed by freeze-drying for long-term preservation. In this study, B. suis 019 was cultured at 37˝C for 72 h using the streak plate method. Single bacterial colonies were inoculated in the TS broth at 37˝C with shaking for 48 h. Bacteria were collected by centrifugation at 10,000 rpm for 1 min and washed twice with sterile deionized water. Total genomic DNA was extracted and purified by the GENEray™ Bacteria Whole Genome DNA Extraction Kit GK1072 (Generay Biotech, Shanghai, China). The DNA purity and concentration was measured by a NanoDrop™ spectrophotometer. DNA fragmentation was conducted using an ultrasound machine. DNA fragments of around 500 bp size were separated and collected using Agarose Gel Electrophoresis. Finally, one DNA library was constructed using the Illumina TruSeq™ (Illumina, San Diego, CA, USA) DNA Sample Prep Kits for the draft genome sequencing. The same procedure was conducted to construct another DNA library for the complete genome sequencing by a different experimenter one year later.

Draft Genome Sequencing, Assembly and Annotation
The DNA-seq library was sequenced using the Illumina HiSeq™ 2000 platform. De novo assembly of the B. suis 019 draft genome was performed using the SOAPdenovo 1.05 [21]. Gene prediction was performed using the software Glimmer 3.02 [22]. The raw NGS data contained paired reads with the left read length of 90 bp and right read length of 70 bp. We produced a total of 330 M bp cleaned NGS data, roughly covering 100 fold (100ˆ) of the B. suis 019 draft genome. The assembled B. suis 019 draft genome has a total sequence length of 3,299,107 bp with the GC content 57.28%. This assembly produced 30 scaffolds and 722 contigs (Methods). The scaffold N50 and contig N50 is 259,978 bp and 7677 bp, respectively. We predicted 3,529 ORFs with the average length of 804 bp. This data was submitted to the GenBank WGS database with ID ANOZ00000000.

Complete Genome Sequencing, Assembly and Annotation
The DNA-seq library was sequenced using the Illumina HiSeq™ 2000 platform. After removing low quality regions, adapters and viral sequences, the cleaned reads were produced for genome assembly using the software Fastq_clean [23]. De novo assembly of the B. suis 019 genome was performed using the Celera Assembler version 8.1 [24] to produce scaffolds with the default setting. Blastn [25] searching was conducted against the NCBI bacterial genome database with the scaffolds to find the best matched genome B. ceti TE28753-12 as the reference genome. Based on the reference sequence NC_022907.1 and NC_022908.1, we applied the LASTZ and Chain/Net to order the scaffolds on two chromosomes, respectively. The gaps within and between the scaffolds were closed with the GapFiller [26]. Gene prediction was performed using the software Prodigal 2.60 [27]. All putative genes were annotated based on the NCBI NR database. Functional categorization by Gene Ontology (GO) terms and KEGG pathway annotation was carried out based on the best 20 blastx hits from the NR database using the Blast2GO software [24].

Phylogenetic Analysis Using Homologous Genes
Using all the annotated genes in the B. suis 1330 genome as a reference, we aligned the genes of the other 50 genomes to the reference genes using the blastn software. Taking the sequence identity 70% as threshold, we obtained a total of 2537 homologous genes from the alignment results. These homologous genes were linked into 51 super homologous sequences. The length of the super homologous sequences is about 2,226,048 bp covering more than 2/3 of the Brucella genome. The multiple alignment of 51 super homologous sequences was implemented using ClustalW 2.0 [28]. At last, a phylogenetic tree was built using the UPGMA (unweighted pair-group method with arithmetic means) method in the software MEGA 5.0 [29].

Comparative Genome Analysis Using 16 Complete Genomes
Removing two genomes due to their abnormal genome size (ATCC 23445) or GC content (A13334), we used B. suis 019 with 15 other complete genomes for the analysis ( Table 1). The phylogenetic trees for chromosomes 1 and 2 were constructed using the software Mauve 2.4.0 with the default setting [13]. The collinear analysis and result display was also conducted using Mauve 2.4.0. To clearly display the collinear analysis result in Mauve, we rotated 15 genome sequences to roughly align the collinear regions to start from similar regions on the chromosome.

Conclusions
In this study, we presented the complete genome of B. suis 019 and conducted comparative genome analysis. B. suis 019 was identified to be closest to the vaccine strain B. suis bv. 1 str. S2. Based on further analysis results, we associated the rsh gene to the pathogenicity of B. suis 019, and the WbkA gene to the rough phenotype of B. suis 019.
Supplementary Materials: The following are available online at www.mdpi.com/link.