Strains Associated with Two 2020 Welder Anthrax Cases in the United States Belong to Separate Lineages within Bacillus cereus sensu lato

Anthrax-causing members of Bacillus cereus sensu lato (s.l.) pose a serious threat to public health. While most anthrax-causing strains resemble B. anthracis phenotypically, rare cases of anthrax-like illness caused by strains resembling “B. cereus” have been reported. Here, whole-genome sequencing was used to characterize three B. cereus s.l. isolates associated with two 2020 welder anthrax cases in the United States, which resembled “B. cereus” phenotypically. Comparison of the three genomes sequenced here to all publicly available, high-quality B. cereus s.l. genomes (n = 2890 total genomes) demonstrated that genomes associated with each case effectively belonged to separate species at the conventional 95% average nucleotide identity prokaryotic species threshold. Two PubMLST sequence type 78 (ST78) genomes affiliated with a case in Louisiana were most closely related to B. tropicus and possessed genes encoding the Bps exopolysaccharide capsule, as well as hemolysin BL (Hbl) and cytotoxin K (CytK). Comparatively, a ST108 genome associated with a case in Texas was most closely related to B. anthracis; however, like other anthrax-causing strains most closely related to B. anthracis, this genome did not possess Bps-, Hbl-, or CytK-encoding genes. Overall, results presented here provide insights into the evolution of anthrax-causing B. cereus s.l.

In addition to its ability to produce anthrax toxins and a polyglutamate capsule, the B. anthracis species, as defined by the United States Food and Drug Administration's Bacteriological Analytical Manual (FDA BAM; referred to hereafter as the "historical B.

B. cereus s.l. Strains with "B. cereus"-like Phenotypic Characteristics were Responsible for Welder Anthrax Cases in Louisiana and Texas
The genomes of three B. cereus s.l. isolates associated with two human clinical welder anthrax cases were sequenced in this study (Table 1 and Supplementary Table S1) [35]. Anthrax toxin-encoding genes cya, lef, and pagA were detected in all three genomes sequenced here (Table 2 and Supplementary Table S1). Notably, three separate single-or multi-locus sequence typing (SLST and MLST, respectively) methods were able to differentiate the two isolates associated with Patient F's case in Louisiana from the isolate associated with Patient G's case in Texas (i.e., eight-group panC phylogenetic group assignment, PubMLST seven-gene MLST, rpoB allelic typing; Table 2 and Supplementary Table S1). PubMLST's seven-gene MLST scheme for "B. cereus" [38,39] assigned the Louisiana and Texas isolates to sequence types (STs) ST78 and ST108, respectively (Table 2). To avoid potential taxonomic ambiguities associated with B. cereus s.l. species names (discussed in detail below), the three genomes sequenced here will be discussed largely within the context of their PubMLST STs, as the PubMLST framework is well-established [38,39] and likely interpretable to most readers.  34 Male Welder Fatal a Data from human clinical cases described by Dawson et al. [35]; b Patient F is a Mississippi resident who had recently worked as a welder in Louisiana [35]; c Patient F recovered after receiving anthrax antitoxin [35]; d Isolate BacLA2020b was recovered from a soil sample taken at Patient F's worksite in Louisiana via an investigation conducted by the U.S. Centers for Disease Control and Prevention (CDC) [35]; e NA, not applicable for environmental isolates.  [40]; all genomes were assigned to Group III using the seven-group scheme developed by Guinebretiere et al. (https://www.tools.symprevius.org/bcereus/; accessed on 20 February 2022) [41,42]; b Sequence type (ST) assigned using PubMLST's seven-gene multi-locus sequence typing (MLST) scheme for "B. cereus" [38,39] and BTyper3 v3.2.0 [40]; c rpoB allelic type (AT) [43] assigned using the original BTyper (BTyper v2.3.4) [44]; d Selected virulence factors detected in each genome using BTyper3 v3.2.0 [40]; presence and absence of virulence factors are denoted by "+" and "−", respectively. Virulence factors were first detected using default thresholds (70% amino acid identity and 80% coverage); virulence factor absence was confirmed by lowering the minimum coverage threshold to 0%; e Each of anthrax toxin-encoding cya, lef, and pagA were detected in all three genomes sequenced here; f None of poly-γ-D-glutamic acid (polyglutamate Cap) capsule-encoding capBCADE were detected in any genome sequenced here; g All of exopolysaccharide (Bps) capsule-encoding bpsXABCDEFGH were detected in BacLA2020a and BacLA2020b; one bpsE-like hit was detected in BacTX2020a (79% amino acid identity, 95% coverage); h All of hyaluronic acid (Has) capsule-encoding hasABC were detected in all three genomes sequenced here; i Closest species type strain genome relative to all validly published and effective B. cereus s.l. species (n = 26; accessed on 20 February 2022) via average nucleotide identity (ANI) and in silico DNA-DNA hybridization (DDH). ANI values were calculated using FastANI v1.31 [18] and BTyper3 v3.2.0 [40]; DDH values were calculated using the Genome-to-Genome Distance Calculator v3.0 (accessed on 20 February 2022) [45,46]; j Genome Taxonomy Database (GTDB) species assigned using GTDB Release 05-RS95 (17 July 2020) and GTDB-Tk v1.3.0 [47,48]; k Species and biovars assigned using the 2020 Genomospecies-Subspecies-Biovar (GSB) nomenclatural framework for B. cereus s.l. [6] and BTyper3 v3.2.0 [40]; multiple taxonomic labels are listed (separated by a semi-colon), as all genomes sequenced here can be referenced using shorted biovar notation.
For one taxonomic classification approach, the three genomes sequenced here were compared to the type strain genomes of all validly published and effective B. cereus s.l. species (n = 26; accessed on 20 February 2022) using (i) average nucleotide identity (ANI) values calculated via FastANI [18], JSpeciesWS [52], and OrthoANI [53], and (ii) in silico DNA-DNA hybridization (DDH) values calculated using the Genome-to-Genome Distance Calculator (GGDC; Table 2 and Supplementary Table S1). Using all ANI-and DDH-based methods, both ST78 genomes associated with Patient F's case in Louisiana were most closely related to the type strain genome of B. tropicus ( Figure 1 and Table 2), a species proposed in 2017 [54]. Similarly, using the increasingly popular Genome Taxonomy Database (GTDB) framework (which itself employs ANI-based methods) [50], both ST78 genomes were assigned to GTDB's B. tropicus species (Figure 1, Table 2, Supplementary Figure S1 and Table S1). However, using FastANI and OrthoANI, both ST78 genomes shared >95% ANI (widely accepted as the standard species threshold for prokaryotes) [3,18] with the type strain genomes of both B. tropicus and B. paranthracis. Conversely, using DDH, neither ST78 genome shared ≥70% DDH (also viewed as a species threshold) [45] with any B. cereus s.l. species type strain genome (i.e., both genomes shared 69.70% DDH with B. tropicus). However, the associated GGDC confidence intervals spanned the 70% species threshold (GGDC confidence interval 66.70-72.60% DDH).
Using all ANI-and DDH-based methods, the ST108 genome associated with Patient G in Texas was most closely related to B. anthracis, sharing >95% ANI and >70% DDH with the type strain genome ( Figure 1, Table 2 and Supplementary Table S1). Congruent with this, the ST108 genome sequenced here was assigned to GTDB's B. anthracis species ( Figure 1, Table 2, Supplementary Figure S1 and Table S1). However, using FastANI and OrthoANI, the ST108 genome additionally shared >95% ANI with the type strain genome of B. paranthracis. Additionally, despite being most closely related to B. anthracis, the ST108 genome sequenced here did not belong to the clonal, historical B. anthracis lineage typically associated with anthrax illness (Figure 1). Members of the historical B. anthracis lineage typically responsible for anthrax illness share ≥99.9% ANI with each other [6,18], but the ST108 genome sequenced here shared 97.5% ANI with the B. anthracis species representative genome (via FastANI; Table 2 and Supplementary Table S1).
These results can be contrasted with taxonomic assignment using a standardized, B. cereus s.l.-specific genomospecies-subspecies-biovar framework (referred to hereafter as the "2020 GSB" framework; Table 2 and Supplementary Table S1). All three genomes sequenced here were assigned to the B. mosaicus genomospecies (i.e., a genomospecies that encompasses all panC Group III genomes and most panC Group II genomes) [40], as well as biovar Anthracis, because all genomes possessed anthrax toxin-encoding genes cya, lef, and pagA ( Figure 1, Table 2, Supplementary Figure S1 and Table S1). Within the 2020 GSB framework, all three genomes can be referred to as B. mosaicus biovar Anthracis (full notation) or B. anthracis (shortened biovar notation; Figure 1, Table 2, Supplementary Figure  S1 and Table S1). Notably, none of the genomes sequenced here were assigned to subspecies anthracis within the 2020 GSB framework, as they did not share ≥99.9% ANI with the B. anthracis species representative genome and thus did not belong to the clonal, historical B. anthracis lineage typically associated with anthrax illness (Figure 1 and Supplementary Table S1) [6].
Overall, the ST78 and ST108 genomes from the Louisiana and Texas cases, respectively, belonged to different lineages within B. cereus s.l., even though all isolates possessed anthrax toxin-encoding genes and resembled "B. cereus" phenotypically (per the FDA BAM; Figure 1, Table 2, Supplementary Figure S1 and Table S1). The ST108 genome associated with Patient G in Texas shared 94.6% ANI and 50.5% in silico DDH with the ST78 genomes associated with Patient F's case in Louisiana (via FastANI and GGDC, respectively; GGDC confidence interval 55.7-61.3%). Thus, each of these two major anthrax-causing lineages are discussed separately in detail below.
tively, belonged to different lineages within B. cereus s.l., even though all isolates possessed anthrax toxin-encoding genes and resembled "B. cereus" phenotypically (per the FDA BAM; Figure 1, Table 2, Supplementary Figure S1 and Table S1). The ST108 genome associated with Patient G in Texas shared 94.6% ANI and 50.5% in silico DDH with the ST78 genomes associated with Patient F's case in Louisiana (via FastANI and GGDC, respectively; GGDC confidence interval 55.7-61.3%). Thus, each of these two major anthraxcausing lineages are discussed separately in detail below. . The heatmap to the right of the phylogeny denotes the following (from left to right): (i) whether a genome was sequenced in this study or publicly available ("Origin"); (ii) selected PubMLST lineages assigned using sevengene multi-locus sequence typing, to which the three genomes sequenced in this study were assigned ("MLST"); (iii) whether a genome possessed two or more anthrax toxin-encoding genes (cya, lef, pagA) or not ("Anthrax"); (iv) whether a genome possessed three or more cereulide synthetase (emetic toxin)-encoding genes (cesABCD) or not ("Emetic"); (v) whether a genome possessed four . The heatmap to the right of the phylogeny denotes the following (from left to right): (i) whether a genome was sequenced in this study or publicly available ("Origin"); (ii) selected PubMLST lineages assigned using seven-gene multi-locus sequence typing, to which the three genomes sequenced in this study were assigned ("MLST"); (iii) whether a genome possessed two or more anthrax toxin-encoding genes (cya, lef, pagA) or not ("Anthrax"); (iv) whether a genome possessed three or more cereulide synthetase (emetic toxin)-encoding genes (cesABCD) or not ("Emetic"); (v) whether a genome possessed four or more polyglutamate capsule-encoding genes (capBCADE) or not ("Cap"); (vi) whether a genome possessed six or more Bps exopolysaccharide-encoding genes (bpsXABCDEFGH) or not ("Bps"); (vii) whether a genome possessed two or more hyaluronic acid capsule-encoding genes (hasABC) or not ("Has"); (viii) whether a genome possessed three or more hemolysin BL diarrheal enterotoxin-encoding genes (hblABCD) or not ("Hbl"); (ix) whether a genome possessed cytotoxin K-encoding cytK-2 or not ("CytK-2"); (x) the panC Group to which each genome was assigned (using BTyper3 and an eight-group scheme; "panC"). The phylogeny was rooted along the outgroup genome (omitted for readability), with branch lengths reported in substitutions per site.

Anthrax-Causing ST78 Genomes May Possess Genes Encoding the Bps Exopolysaccharide
Capsule, as Well as Enterotoxins Hbl and CytK-2 The two ST78 genomes associated with Patient F's case in Louisiana (Table 1) were nearly identical. Genomes BacLA2020a and BacLA2020b, which were derived from Patient F and a soil sample from Patient F's worksite, respectively, shared >99.99% ANI (via FastANI, JSpeciesWS, and OrthoANI) and were identical in terms of pan-genome element presence/absence. Further, the two genomes differed by a single core SNP (identified via Snippy) in a gene annotated as sporulation kinase E. Notably, genes encoding the Bps exopolysaccharide capsule (bpsX-H) associated with some "atypical" anthrax-causing strains [2,27] were detected in the two ST78 genomes sequenced here (Figures 1 and 2, Table 2 and Supplementary Table S1). In addition to possessing all genes encoding the Bps exopolysaccharide capsule, the two ST78 genomes sequenced here possessed hyaluronic acid (HA) capsule-encoding hasABC, as well as genes encoding nonhemolytic enterotoxin, hemolysin BL, and cytotoxin K variant 2 (Nhe, Hbl, and CytK-2, encoded by nheABC, hblABCD and cytK-2, respectively; Figures 1 and 2, Table 2 and Supplementary Table S1). Genes encoding the poly-γ-D-glutamic acid (polyglutamate) capsule typical of anthraxcausing B. anthracis [2] (as defined by the FDA BAM) were not detected in either ST78 genome sequenced here (Figures 1 and 2, Table 2 and Supplementary Table S1).
When compared to all publicly available B. cereus s.l. genomes (n = 2890 total genomes, accessed on 20 March 2021; Supplementary Table S2), the two ST78 genomes sequenced here shared >99.9% ANI with four B. cereus s.l. genomes, all of which harbored anthrax toxin-encoding genes and belonged to ST78 (Table 3 and Supplementary Table S3). While the two ST78 genomes sequenced here differed by a single core SNP, each differed from publicly available ST78 genomes by 35-1002 core SNPs (mean and median of 217.2 and 63.5 core SNPs, respectively, n = 8 total ST78 genomes; Supplementary Figure S2). Based on core SNP distances and phylogenetic topology, the publicly available B. cereus s.l. genome most closely related to the ST78 genomes sequenced here was that of B. cereus s.l. strain 03BB87 (National Center for Biotechnology Information [NCBI] RefSeq Assembly Accession GCF_000789315.1), a ST78 strain isolated from a fatal welder anthrax case that occurred in a human patient in Lubbock, Texas in 2003 (35-36 core SNPs relative to the ST78 genomes sequenced here; Table 3, Supplementary Figure S2 and Table S3) [37].
Like the two ST78 genomes sequenced here, all six publicly available ST78 genomes possessed anthrax toxin-encoding cya, lef, and pagA (8/8 total ST78 genomes, 100.0%;  Tables S1 and S2). Two ST78 genomes did not possess Bps exopolysaccharide capsule-encoding genes (Figures 1 and 2). The first of these is the only ST78 genome reportedly isolated from outside of the U.S. (i.e., BC-AK, which was reportedly isolated from a kangaroo in China; Figure 3): this was the only genome not assigned to GTDB's B. anthracis species that possessed polyglutamate capsule-encoding genes typical of B. anthracis ( Figure 1 and Supplementary Figure S1). The second of the two Bps-negative ST78 genomes originated from Florida (U.S.); this genome possessed neither Bps exopolysaccharide nor polyglutamate capsule-encoding genes (Figures 1-3, Table 3 and Supplementary Figure S1). assigned using PubMLST's seven-gene multi-locus sequence typing (MLST) scheme for "B. cereus" [38,39] and BTyper3 v3.2.0 [40]. PubMLST Clonal Complex (CC) assignments are denoted in square brackets, where applicable; e Poly-γ-D-glutamic acid capsule-encoding genes capBCADE (cap) or exopolysaccharide Bps capsule-encoding genes bpsXABCDEFGH (bps) detected in each genome using BTyper3. Hyaluronic acid capsule-encoding genes (hasABC) were detected in all genomes. NA, not applicable, as neither cap nor bps were detected in the genome.

Figure 2.
Maximum likelihood phylogeny constructed using core genes detected among 55 genomes assigned to the Genome Taxonomy Database (GTDB) B. tropicus species, plus GTDB B. paranthracis outgroup genome B. cereus s.l. strain AH187 (NCBI RefSeq Assembly Accession GCF_000021225.1; omitted for readability). A predicted anthrax toxin gene gain event among PubMLST Sequence Type 78 (ST78) genomes is denoted by a black arrow. The heatmap to the right of the phylogeny denotes the following (from left to right): (i) whether a genome was sequenced in this study or publicly available ("Origin"); (ii) selected PubMLST lineages assigned using seven-gene multi-locus sequence typing, to which genomes sequenced in this study were assigned ("MLST"); (iii) whether a genome possessed two or more anthrax toxin-encoding genes (cya, lef, pagA) or not ("Anthrax"); (iv) whether a genome possessed four or more polyglutamate capsule-encoding genes (capBCADE) or not ("Cap"); (v) whether a genome possessed six or more Bps exopolysaccharide-encoding genes (bpsXABCDEFGH) or not ("Bps"); (vi) whether a genome possessed two or more hyaluronic acid capsule-encoding genes (hasABC) or not ("Has"). The phylogeny was rooted along the outgroup genome (omitted for readability), with branch lengths reported in substitutions per site. For complete ancestral state reconstruction results, see Supplementary Figure S3. omitted for readability). A predicted anthrax toxin gene gain event among PubMLST Sequence Type 78 (ST78) genomes is denoted by a black arrow. The heatmap to the right of the phylogeny denotes the following (from left to right): (i) whether a genome was sequenced in this study or publicly available ("Origin"); (ii) selected PubMLST lineages assigned using seven-gene multi-locus sequence typing, to which genomes sequenced in this study were assigned ("MLST"); (iii) whether a genome possessed two or more anthrax toxin-encoding genes (cya, lef, pagA) or not ("Anthrax"); (iv) whether a genome possessed four or more polyglutamate capsule-encoding genes (capBCADE) or not ("Cap"); (v) whether a genome possessed six or more Bps exopolysaccharide-encoding genes (bpsXABCDEFGH) or not ("Bps"); (vi) whether a genome possessed two or more hyaluronic acid capsule-encoding genes (hasABC) or not ("Has"). The phylogeny was rooted along the outgroup genome (omitted for readability), with branch lengths reported in substitutions per site. For complete ancestral state reconstruction results, see Supplementary Figure S3.

ST108 and Other Anthrax-Causing Members of the ST365 Clonal Complex Belong to B. Anthracis at Conventional Species Thresholds
Like the ST78 genomes associated with Patient F's case in Louisiana, the ST108 genome associated with Patient G in Texas possessed anthrax toxin-encoding cya, lef, and pagA, as well as HA capsule-encoding hasABC and Nhe-encoding nheABC (Figures 1 and  4, Table 2 and Supplementary Figure S1). However, unlike the ST78 genomes, the ST108 genome sequenced here did not possess genes encoding the Bps exopolysaccharide capsule, nor did it possess hbl or cytK-2 ( Figures 1 and 4, Table 2 and Supplementary Figure  S1). Based on ANI values calculated relative to all publicly available B. cereus s.l. genomes, the most closely related publicly available genome shared only 99.1% ANI with the ST108 Four of the six publicly available ST78 genomes were closed genomes; thus, the location of anthrax-associated virulence factors could be reliably evaluated. Of the four publicly available closed ST78 genomes, anthrax toxin-encoding genes were located on a plasmid, with HA capsule-encoding genes located on the same plasmid (Supplementary Table S3). However, interestingly, in one closed ST78 genome, anthrax toxin-and HA capsule-encoding genes were detected within the closed chromosome (i.e., in the anthrax-causing B. cereus s.l. strain 03BB87 chromosome, NCBI Nucleotide Accession NZ_CP009941.1; Supplementary Table S3). Three of four closed ST78 genomes possessed Bps exopolysaccharide capsule-encoding genes, which were located on an additional plasmid (i.e., not the same plasmid as the anthrax toxin-and HA-encoding genes; Supple-mentary Table S3). The sole ST78 genome that possessed polyglutamate capsule-encoding genes also harbored them on an additional plasmid (Supplementary Table S3).
Overall, within the GTDB B. tropicus species (i.e., the GTDB species to which all ST78 genomes were assigned), ST78 genomes were the only genomes in which anthrax toxin-, HA capsule-, Bps capsule-, and/or polyglutamate capsule-encoding genes were detected ( Figure 2). Furthermore, ST78 was the only lineage not assigned to GTDB's B. anthracis species, in which anthrax toxin-, HA capsule-, and/or polyglutamate capsule-encoding genes were detected (Figures 1 and 2, Supplementary Figure S1, Tables S1 and S2).

ST108 and Other Anthrax-Causing Members of the ST365 Clonal Complex Belong to B. anthracis at Conventional Species Thresholds
Like the ST78 genomes associated with Patient F's case in Louisiana, the ST108 genome associated with Patient G in Texas possessed anthrax toxin-encoding cya, lef, and pagA, as well as HA capsule-encoding hasABC and Nhe-encoding nheABC (Figures 1 and 4, Table 2 and Supplementary Figure S1). However, unlike the ST78 genomes, the ST108 genome sequenced here did not possess genes encoding the Bps exopolysaccharide capsule, nor did it possess hbl or cytK-2 (Figures 1 and 4, Table 2 and Supplementary Figure S1). Based on ANI values calculated relative to all publicly available B. cereus s.l. genomes, the most closely related publicly available genome shared only 99.1% ANI with the ST108 genome sequenced here and differed by 19,425 core SNPs via FastANI and Snippy, respectively (i.e., NCBI RefSeq Assembly Accession GCF_013343075.  Branch colors and clade labels differentiate genomes that are members of the clonal, historical B. anthracis lineage (darker pink) from genomes that are not (lighter pink). The heatmap to the right of the phylogeny denotes the following (from left to right): (i) whether a genome was sequenced in this study or publicly available ("Origin"); (ii) selected PubMLST lineages assigned using sevengene multi-locus sequence typing, to which the genomes sequenced in this study were assigned ("MLST"); (iii) whether a genome possessed two or more anthrax toxin-encoding genes (cya, lef, pagA) or not ("Anthrax"); (iv) whether a genome possessed four or more polyglutamate capsuleencoding genes (capBCADE) or not ("Cap"); (v) whether a genome possessed two or more hyaluronic acid capsule-encoding genes (hasABC) or not ("Has"); (vi) the panC Group to which each genome was assigned (using BTyper3 and an eight-group scheme; "panC"). The phylogeny was rooted along the outgroup genome (omitted for readability), with branch lengths reported in substitutions per site. For complete ancestral state reconstruction results, see Supplementary Figure S4.

Anthrax-Causing B. cereus s.l. Lineages Differ in Pan-Genome Composition
The core-and pan-genome sizes of all B. cereus s.l. species (defined using GTDB or the 2020 GSB taxonomy) that contained anthrax-causing strains were compared to those of the clonal, historical B. anthracis lineage typically associated with anthrax illness ( Figure  5). Intuitively, the historical B. anthracis lineage had both (i) a larger core genome and (ii) a smaller pan-genome pool size than the species defined using GTDB and the 2020 GSB . The heatmap to the right of the phylogeny denotes the following (from left to right): (i) whether a genome was sequenced in this study or publicly available ("Origin"); (ii) selected PubMLST lineages assigned using seven-gene multi-locus sequence typing, to which the genomes sequenced in this study were assigned ("MLST"); (iii) whether a genome possessed two or more anthrax toxin-encoding genes (cya, lef, pagA) or not ("Anthrax"); (iv) whether a genome possessed four or more polyglutamate capsule-encoding genes (capBCADE) or not ("Cap"); (v) whether a genome possessed two or more hyaluronic acid capsule-encoding genes (hasABC) or not ("Has"); (vi) the panC Group to which each genome was assigned (using BTyper3 and an eight-group scheme; "panC"). The phylogeny was rooted along the outgroup genome (omitted for readability), with branch lengths reported in substitutions per site. For complete ancestral state reconstruction results, see Supplementary Figure S4.
Overall, all anthrax toxin gene-harboring B. cereus s.l. genomes that (i) did not belong to the clonal, historical B. anthracis lineage typically associated with anthrax illness and (ii) did not belong to ST78 were assigned to PubMLST's ST365 clonal complex (CC), including the ST108 genome sequenced here (Figures 1, 3 and 4, Table 3, Supplementary  Figure S1 and Supplementary Tables S1 and S2). Notably, all ST365 CC genomes belonged to GTDB's B. anthracis species and were most closely related to the B. anthracis species representative genome via ANI and in silico DDH (Figures 1 and 4, Table 3 and Supplementary Tables S1 and S2); however, based on user-submitted species names in NCBI, many of these strains reportedly resembled "B. cereus" and "B. thuringiensis" (Supplementary  Tables S2 and S3).
Of all 30 ST365 CC genomes, five possessed anthrax toxin-encoding genes, as well as HA capsule-encoding hasABC (16.7% of 30 total ST365 CC genomes; Figures 1, 3 and 4). Notably, anthrax toxin gene-harboring members of the ST365 CC were considerably diverse: anthrax toxin gene presence within the ST365 CC was predicted to have been the result of at least two anthrax toxin gene acquisition events (Figure 4 and Supplementary Figure S4). Of the five ST365 CC genomes in which anthrax toxin-encoding genes were detected, four possessed genes encoding the polyglutamate capsule typical of anthrax-causing B. anthracis (80% of anthrax toxin gene-harboring ST365 CC genomes; Figures 1 and 4 and Tables 2 and 3). The only anthrax-causing ST365 CC genome that did not possess polyglutamate capsule-encoding genes was the ST108 genome sequenced here (Figures 1 and 4 and Table 2). Two ST365 CC genomes, as well as one ST778 genome, possessed polyglutamate capsule-encoding genes, but did not possess anthrax toxin-encoding genes (Figures 1 and 4, Supplementary Table S2). Unlike anthrax toxin gene-harboring members of ST78, anthrax toxin gene-harboring ST365 CC genomes did not possess hbl or cytK-2 ( Figure 1 and Supplementary Figure S1).
Of the four publicly available closed anthrax toxin gene-harboring ST365 CC genomes, anthrax toxin-, polyglutamate capsule-, and HA capsule-encoding genes were all plasmidencoded (Supplementary Table S3). In one genome (i.e., anthrax-causing B. cereus s.l. strain CI), anthrax toxin-and HA capsule-encoding genes were located on one plasmid, and polyglutamate capsule-encoding genes were located on a separate plasmid (NCBI Nucleotide Accession NC_014331.1 and NC_014332.1, respectively; Supplementary Table S3). However, interestingly, in the remaining three closed genomes, all anthrax toxin-, polyglutamate capsule-, and HA capsule-encoding genes were located on a single plasmid (Supplementary Table S3).

Anthrax-Causing B. cereus s.l. Lineages Differ in Pan-Genome Composition
The core-and pan-genome sizes of all B. cereus s.l. species (defined using GTDB or the 2020 GSB taxonomy) that contained anthrax-causing strains were compared to those of the clonal, historical B. anthracis lineage typically associated with anthrax illness ( Figure 5). Intuitively, the historical B. anthracis lineage had both (i) a larger core genome and (ii) a smaller pan-genome pool size than the species defined using GTDB and the 2020 GSB taxonomy, including GTDB's B. anthracis species (Figures 1 and 5 Figure S1).

and Supplementary
When the pan-genomes of anthrax toxin gene-harboring B. cereus s.l. genomes were compared, the anthrax toxin gene-harboring (i) historical B. anthracis, (ii) ST78, and (iii) ST365 CC lineages each differed significantly based on pan-genome orthologous gene cluster presence/absence (Bonferroni-corrected ANOSIM and PERMANOVA p < 0.05; Figure 6). Among anthrax toxin gene-harboring (i) historical B. anthracis, (ii) ST78, and (iii) ST365 CC lineage genomes, a total of 191, 542, and 59 orthologous gene clusters were present in all anthrax toxin gene-harboring genomes within the respective lineage but absent from all other anthrax toxin gene-harboring B. cereus s.l. genomes, respectively (Supplementary Tables S4-S6). Likewise, a total of 39, 180, and 31 orthologous gene clusters were absent from all anthrax toxin gene-harboring genomes within the respective lineage but present in all other anthrax toxin gene-harboring B. cereus s.l. genomes, respectively (Supplementary Tables S7-S9). When the pan-genomes of the three aforementioned anthrax toxin gene-harboring lineages were compared to the pan-genome of B. mosaicus as a whole (as defined by the 2020 GSB framework; n = 894 total B. mosaicus genomes, plus outgroup panC Group IV genome with NCBI RefSeq Assembly Accession GCF_006094295.1), only anthrax toxin gene-harboring ST78 had orthologous gene clusters, which were exclusively present (i.e., present in all anthrax toxin gene-harboring ST78 genomes and absent from all other B. mosaicus genomes, plus the outgroup genome; n = 19 orthologous gene clusters, Supplementary taxonomy, including GTDB's B. anthracis species (Figures 1 and 5 and Supplementary Figure S1).  (Supplementary Tables S4-S6). Likewise, a total of 39, 180, and 31 orthologous gene clusters were absent from all anthrax toxin gene-harboring genomes within the respective lineage but toxin gene-harboring ST78, 15 could not be assigned a function (78.9%; Supplementary Table S10). The remaining four orthologous gene clusters included (i) a deaminase with a toxin-deaminase domain (NCBI Protein Accession AIY73376.1); (ii) a response regulator (NCBI Protein Accession WP_001970204.1); (iii) a predicted restriction HNH family endonuclease (NCBI Protein Accession WP_001978396.1); (iv) a phosphorylase superfamily response regulator (NCBI Protein Accession WP_001970206.1; Supplementary Table S10).
Previously, we have shown that anthrax-causing B. cereus s.l. strains with "B. cereus"like phenotypic characteristics (per the FDA BAM) belong to two separate species at the conventional 95% ANI species threshold [3,6,40]: (i) one lineage that is most closely related to the B. anthracis species type strain genome, but itself is distinct from the clonal, historical B. anthracis lineage most commonly attributed to anthrax disease, and (ii) one lineage that is most closely related to the B. tropicus species type strain genome. Here, we showed that both anthrax-causing "B. cereus"-like lineages are represented among B. cereus s.l. strains responsible for welder anthrax cases in the U.S. This is despite the fact that both anthrax-causing lineages queried here manifested in similar symptoms (severe pneumonia) in similar patients (male welders in their thirties) in the same year (2020) in the same U.S. region (the Gulf Coast). Both lineages of anthrax-causing B. cereus s.l., which do not belong to the clonal, historical B. anthracis lineage, have been recently responsible for severe welder anthrax and cutaneous anthrax in the U.S. Gulf Coast region.
Using all publicly available B. cereus s.l. genomes, we further showed that, presently, all anthrax-causing B. cereus s.l. genomes that do not belong to the clonal, historical B. anthracis lineage belong to one of two PubMLST lineages: ST78 and the ST365 CC, which represent anthrax-causing "B. cereus"-like strains that are most closely related to the B. tropicus and B. anthracis species type strain genomes, respectively. Notably, these two lineages could be largely (albeit imperfectly) divided based on the type of capsule produced, as the Bps exopolysaccharide capsule first identified in strain G9241 [22] was only present in anthraxcausing ST78 strains from the U.S. (Tables 2 and 3). However, not all anthrax-causing ST78 strains possessed genes encoding the Bps exopolysaccharide capsule. For example, ST78 strain BC-AK, which had been reportedly isolated from a kangaroo in China, possessed genes encoding the polyglutamate capsule typical of B. anthracis (Figures 2 and 3). These results indicate that potential virulence-associated mechanistic differences between and within the anthrax-causing ST78 and ST365 CC lineages may exist, and geography may play a role in shaping these differences; however, future research is needed to elucidate this.

Recommendations for Effective and Unambiguous Communication of B. cereus s.l. Taxonomy for Anthrax-Causing Strains
Numerous nomenclatural frameworks exist for species-level taxonomic classification of B. cereus s.l. strains; recently, these frameworks have been comprehensively reviewed [3]. Currently, (i) there is no single, standardized list of B. cereus s.l. species accepted/used by all; (ii) there is no standardized way to assign B. cereus s.l. strains to species; and (iii) there is no standardized method for defining novel B. cereus s.l. species [3]. Furthermore, some taxonomic frameworks may not be accessible to all users. For example, researchers who do not have access to WGS cannot use some WGS-based nomenclatures [3].
Recently, we have hypothesized that the lack of standardized methods for delineating B. cereus s.l. species may lead to potentially dangerous misclassifications of a given strain's virulence potential, particularly among anthrax-causing strains [3,6]. For example, as demonstrated here, the genomes of all anthrax-causing members of ST78 are most closely related to the type strain genome of B. tropicus, a species proposed in 2017 [54]; this is despite the fact that anthrax-causing ST78 strains have been isolated from human clinical cases for over a decade before B. tropicus was published as a novel species [15,22,29,31,54]. Conversely, anthrax-causing members of the ST365 CC are most closely related to B.
To ensure that taxonomic assignments of anthrax-causing B. cereus s.l. strains are interpretable and unambiguous, there are several steps that readers can take, regardless of which taxonomic method they choose to employ [3]. First of all, it is recommended that readers explicitly detail the methods and protocols that were used to assign species names to strains, including the versions and access dates of all software, databases, and protocols used [3]. Secondly, readers should avoid making assumptions of virulence potential based solely on species names, particularly when the methods used to assign strains to species are unknown. For example, it is recommended to avoid treating user-assigned species names in NCBI as "ground-truth" taxonomic assignments for B. cereus s.l. strain genomes [3,55]. Third, readers may opt to employ a standardized SLST and/or MLST method to allow for interpretation across taxonomies [3] (e.g., the ST78 and ST108 genomes sequenced here can be referred to as such, independent of species names). Finally, readers may opt to adopt a standardized nomenclatural framework that employs the use of "biovar terms" [3,6], such as "biovar Anthracis". For example, within the 2020 GSB framework, any anthrax-causing B. cereus s.l. strain can be referred to as B. cereus s.l. biovar Anthracis or B. anthracis [3,6]. Overall, readers can avoid potential taxonomic ambiguities by (i) explicitly and transparently describing the methods and protocols used for taxonomic assignment, and (ii) interpreting taxonomic labels assigned using unknown or unspecified methods with extreme caution.

Whole-Genome Sequencing in Combination with Epidemiological, Microbiological and Clinical Data May Improve Surveillance of Anthrax Cases in the Future
While rare, anthrax-causing B. cereus s.l. strains that resemble "B. cereus" (per the FDA BAM) represent a serious public health threat, due to the severity of the disease they cause [2,35]. Furthermore, these strains may represent a serious occupational hazard to welders and other metalworkers, as numerous welder anthrax cases among workers have been reported [23,31,[35][36][37]56]. It is essential that efforts are undertaken to prepare for and respond to the severe clinical cases caused by these organisms.
To that end, WGS has tremendous potential to improve surveillance of anthrax-causing members of B. cereus s.l. [6,9], as it has improved surveillance and source tracking efforts for other pathogens [57][58][59]. Here, WGS confirmed that an anthrax-causing ST78 genome derived from an inhalation anthrax case associated with a welder was effectively identical to an environmental ST78 genome collected from a soil sample taken at the welder's worksite [35]. These two effectively identical genomes could be distinguished from all publicly available genomes, including those of other anthrax-causing ST78 strains. However, it is essential to note that the link between the environmental and human clinical ST78 genomes sequenced in this study would not have been established without prior epidemiological, clinical, and microbiological efforts, including: (i) linking the clinical case to the worksite associated with the case, and (ii) isolating the organism responsible from a soil sample collected from the worksite [35]. Epidemiological and clinical data (e.g., patient symptoms, patient history), plus microbiological data (e.g., a microbe's ability to produce a particular toxin, microbial growth temperature), have previously been important for linking B. cereus s.l. strains to human illness cases [60]; members of B. cereus s.l. are widespread throughout the environment and may potentially be incorrectly linked to a clinical case [1,14,15,60,61]. Hence, it is likely that WGS will be most valuable for elucidating anthrax-causing B. cereus s.l. cases and outbreaks when used in combination with epidemiological, clinical, and microbiological data.
Finally, future WGS-based studies of anthrax-causing B. cereus s.l. strains that do not belong to the historical B. anthracis lineage typically associated with anthrax illness will benefit from increasingly available B. cereus s.l. WGS data and associated metadata [6]. For example, international efforts to share and make WGS data publicly available can decrease the amount of time required to solve outbreaks, as well as the public health burdens imposed by the pathogens that cause them [62]. While the amount of publicly available B. cereus s.l. genomes is increasing, efforts to sequence B. cereus s.l. genomes are lagging far behind other bacterial pathogens (e.g., Salmonella enterica, Listeria monocytogenes) [63]. Future studies will be able to leverage a greater amount of WGS data and associated metadata to provide further insights into the evolution of anthrax-causing B. cereus s.l. and potentially identify novel, emerging lineages prior to human infection.

Clinical Case Information and Environmental Isolate Acquisition
Isolates associated with human clinical cases were acquired as described previously [35]. Additionally, as part of an epidemiologic investigation, samples were collected at both patients' worksites and homes to identify possible sources of infection [35]. Acquisition of the environmental isolate included in this study (Table 1) was initiated by inoculating 5 g of soil in 15 mL of heart infusion broth (HIB). The soil and HIB were vortexed and sonicated each for 30 s, respectively. This process was repeated 2 additional times. The soil and HIB mixture was then heated in a water bath for 30 min at 65 • C. After allowing the mixture to settle for several minutes, the supernatant and a 1:10 dilution of the supernatant in HIB was plated on R and F Anthracis Chromogenic Agar (R and F Products, Downers Grove, IL, USA). If individual colonies could not be determined from 1:10 dilution plates, additional dilutions were generated and plated. Suspect colonies were isolated for identification by polymerase chain reaction (PCR) and WGS.

Whole-Genome Sequencing
DNA was extracted using the Promega Maxwell 16 and the Maxwell 16 Cell DNA Purification Kit (Promega Corporation, Madison, WI, USA), which was used to generate a draft genome sequence by using the Nextera FLEX Kit (https://www.illumina.com) for library preparation and the iSeq 100 instrument (Illumina) with a 2 × 151-bp kit.
The core genome alignment produced by Panaroo was queried using snp-sites v2.5.1 (Wellcome Sanger Institute, Cambridge, UK) [78], which was used to identify core SNPs and constant sites within the nucleotide alignment (using the "-c" and "-C" options, respectively). The resulting core SNPs were supplied as input to IQ-TREE v1.5.4 (IQ-TREE Development Team, http://www.iqtree.org/about/, accessed on 20 July 2022) [79], which was used to construct a maximum likelihood (ML) phylogeny using the following parameters: (i) the general time reversible (GTR) nucleotide substitution model [80], (ii) one thousand replicates of the ultrafast bootstrap approximation [81], and (iii) an ascertainment bias correction to account for the sole use of variant sites (via the "-fconst" option, using constant sites output by snp-sites). All aforementioned steps were performed a second time, with the addition of an outgroup genome (i.e., the genome of panC Group IV B. cereus sensu stricto species type strain ATCC 14579; NCBI RefSeq Assembly Accession GCF_006094295.1).
All aforementioned steps (i.e., core-and pan-genome analyses, plus ML phylogeny construction with and without an outgroup) were additionally repeated, with the omission of genomes that did not have a year of isolation reported in their associated BioSample [63,82] (Supplementary Table S2). Each resulting phylogeny was rooted and time-scaled using LSD2 v1.4.2.2 (Thu-Hien To, Ås, Norway) [83] and the following parameters: (i) tip dates corresponding to the year of isolation associated with each genome; (ii) constrained mode (c), with the root estimated using constraints on all branches (-r as); (iii) variances calculated using input branch lengths (-v 1); (iv) 1000 samples for calculating confidence intervals for estimated dates (-f 1000); (v) a sequence length of 5,500,000. For phylogenies that included the outgroup genome, the outgroup was used to root the tree. For phylogenies that did not, the root was estimated using LSD2. Each resulting time-scaled phylogeny, plus the associated orthologous gene cluster presence/absence matrix produced by Panaroo, were supplied as input to Panaroo's "panaroo-fmg" command, which was used to estimate the pangenome size under the Finite Many Genes (FMG) model [84] with 100 bootstrap replicates.
Non-metric multidimensional scaling (NMDS) was performed, using the Panaroo pan-genome orthologous gene cluster presence/absence matrix produced among the 176 anthrax toxin-gene harboring B. cereus s.l. genomes as input (see above). Orthologous gene clusters present in all genomes were removed from the alignment, yielding a matrix of 4445 orthologous gene clusters variably present among the 176 genomes. The resulting matrix was supplied as input to the metaMDS command in the vegan v2.5-7 package (Jari Oksanen, Helsinki, Finland) [85] in R v4.1.2 (R Core Team, https://www.r-project.org/, accessed on 20 July 2022) [86], which was used to perform NMDS using a Jaccard distance metric, two dimensions, and a maximum number of 100,000 random starts; a solution was reached in <100 runs. The resulting scores were plotted in R using ggplot2 v3.3.5 (Hadley Wickham, Houston, TX, USA) [87]. The same pan-genome presence/absence matrix was supplied to the anosim and adonis functions in the vegan R package, which were used to conduct analysis of similarity (ANOSIM) [88] and permutational analysis of variance (PERMANOVA) [89] tests, respectively. Each test used lineage membership as a grouping factor (i.e., whether a genome belonged to ST78, the ST365 CC, or the historical B. anthracis lineage), a Jaccard distance metric, and 10,000 permutations. Each pairwise lineage combination was additionally tested (eight total tests; raw p < 0.001 for all tests, all ANOSIM R = 1.0, PERMANOVA R 2 = 0.68-0.93). A Bonferroni correction was applied to correct for multiple comparisons.

Anthrax Toxin Gene Presence Ancestral State Reconstruction
The ancestral states of internal nodes within each of the following phylogenies as they related to anthrax toxin gene presence/absence were estimated: (i) the 326-genome GTDB B. anthracis phylogeny and (ii) the 55-genome GTDB B. tropicus phylogeny. Briefly, the presence or absence of two or more of cya, lef, and pagA within each genome was treated as a binary state, and stochastic character maps were simulated on each phylogeny using the make.simmap function in the phytools v1.0-1 R package (Liam J. Revell, Boston, MA, USA) [90] and the following parameters: (i) the all-rates-different (ARD) model in the ape v5.6-1 R package (Emmanuel Paradis, Montpellier, France); (ii) root node prior probabilities for anthrax toxin gene presence or absence set to either "equal" (i.e., 0.5), or "estimated" (i.e., root node prior probabilities were estimated using the make.simmap function); (iii) 100 or 1000 simulations (for B. anthracis and B. tropicus, respectively). Results were plotted using the densityMap function in the phytools v1.0-1 R package ( Supplementary  Figures S3 and S4).  Table S1), plus all publicly available B. mosaicus genomes (per the 2020 GSB framework) using FastANI (Supplementary Table S2). Due to the fact that all ST78 genomes were closely related (ANI >99.8%), core SNPs were identified among all ST78 genomes using the default SNP calling pipeline implemented in Snippy v4.4.0 (Torsten Seemann, Melbourne, Australia) [91], with the closed chromosome of anthrax-causing ST78 B. cereus s.l. strain 03BB87 used as a reference (NCBI Nucleotide Accession NZ_CP009941.1). Gubbins v2.4.1 (Wellcome Sanger Institute, Cambridge, UK) [92] was used to remove recombination from the resulting core genome alignment, and snp-sites v2.5.1 [78] was used to query the resulting recombination-free alignment for core SNPs (using the "-c" option). Pairwise core SNP distances between all ST78 genomes were calculated using the "dist.gene" function in the ape v5.6-1 package [93,94] in R v4.1.2 [86]. The resulting core SNPs were additionally supplied as input to IQ-TREE, which was used to construct a ML phylogeny as described above, using the optimal nucleotide substation model selected using ModelFinder [95] (i.e., the K3Pu model) [96]. The resulting phylogeny was viewed in FigTree v1.4.4 (Andrew Rambaut, Edinburgh, Scotland, UK) [97]. All aforementioned steps were repeated, with the omission of the most distantly related ST78 genome (NCBI RefSeq Assembly Accession GCF_002117465.1; Supplementary Figure S2).
Snippy was additionally used to identify SNPs between the two ST78 genomes sequenced here, which were associated with Patient F's case in Louisiana (i.e., BacLA2020a and BacLA2020b; Table 1). The trimmed paired-end reads associated with the genome of strain BacLA2020a were mapped to the BacLA2020b reference genome; the resulting filtered Variant Call Format (VCF) file produced by Snippy was manually inspected for the presence of SNPs. These steps were repeated, using the trimmed paired-end reads associated with BacLA2020b as input and the BacLA2020a genome as a reference.

Conclusions
Here, we used WGS to characterize three B. cereus s.l. isolates associated with two separate welder anthrax cases that occurred in 2020 among male welders in their thirties in two U.S. Gulf Coast states. All isolates resembled "B. cereus" phenotypically (per the FDA BAM). However, by most contemporary metrics used to delineate bacterial species, the isolates associated with each of these cases belonged to separate species (e.g., using the GTDB taxonomy, B. anthracis and B. tropicus). Furthermore, the isolates differed in terms of the virulence factors they possessed: the ST78 genomes from Louisiana possessed genes encoding the Bps alternative exopolysaccharide capsule, as well as enterotoxin-encoding hbl and cytK-2, while the ST108 genome from Texas did not. Using all publicly available B. cereus s.l. genomes, we predict that members of B. cereus s.l. have gained anthrax toxin-encoding genes at least four separate times, specifically: (i) at least once among the historical, clonal B. anthracis lineage typically associated with anthrax toxin production; (ii) at least twice within the ST365 CC; and (iii) at least once among ST78. Overall, WGS has the potential to improve surveillance of anthrax-causing B. cereus s.l. However, future B. cereus s.l. isolation, WGS, and metadata collection efforts will be essential for gaining further insights into the evolution of anthrax-causing B. cereus s.l. in the U.S. and around the world.  Genomospecies-Subspecies-Biovar taxonomy), plus the panC Group IV outgroup genome.
Author Contributions: C.K.M. and C.B.K. characterized isolates and extracted DNA, J.E.G. performed whole-genome sequencing, J.E.G. and C.A.G. performed initial genetic analyses (e.g., MLST). L.M.C. performed computational analyses. L.M.C., C.A.G., Z.P.W. and J.K. conceptualized the study and co-wrote the manuscript. All authors have read and agreed to the published version of the manuscript. Data Availability Statement: NCBI accession numbers for all genomes used in this study are available in Supplementary Tables S1-S3. Genomes sequenced in this study are available under NCBI BioProject Accession PRJNA849252.