Clostridium perfringens is a pathogen of humans and animals and is responsible for a wide range of enterotoxigenic and histotoxic diseases that vary in both symptoms and severity. The disease capability of particular strains is due to the production of toxins and extracellular enzymes with specialised roles in pathogenesis. The presence or absence of six major toxins is used to classify C. perfringens
isolates into seven different toxin types, A-G (Table 1
]. Toxin typing is used as an indicator of disease-causing capability, as some toxins are strongly associated with disease in certain hosts, such as NetB (type G) with necrotic enteritis in chickens and enterotoxin (type F) with food poisoning. However, toxin typing does not account for the full toxin repertoire a strain may be capable of producing and therefore lacks the high resolution afforded by whole genome sequencing (WGS). Furthermore, the clostridial toxin typing system does not account for strain clonality and is inappropriate for inference of evolutionary relationships [2
] as many of the toxins are encoded on large plasmids [3
] and capable of horizontal gene transfer [4
Excluding the six genes used for toxin typing of strains, a further 16 toxins and enzymes have been described in C. perfringens
including sialidases (NanI, NanJ, NanK), hyaluronidases (NagH, NagI, NagJ, NagK),
collagenase (Kappa), Beta2 (consensus and atypical variants), TpeL, Delta, BecAB/CPILE and NetE, NetF, NetG. Many of these toxins are recent discoveries, such as NetB, NetE, NetF, NetG and BecAB, further demonstrating the importance of host-specific toxins that lie outside the currently defined mechanisms of disease and the subsequent toxin typing framework [7
Pore-forming toxins are commonly associated with disease in C. perfringens
. The pore-forming toxins are comprised of a single protein that forms multimeric complexes. Each type of pore-forming toxin has distinct domain structures. Six beta-barrel pore-forming toxins have been characterised in C. perfringens
including beta toxin, delta toxin, NetB, NetE, NetF and NetG [10
]. Each of these proteins has a unique amino acid sequence with a shared domain structure which includes a signal sequence followed by a leukotoxin/hemolysin domain. This domain is shared with toxins from other species including Staphylococcus, Bacillus,
and other Clostridium
The remaining known pore-forming toxins, including alpha toxin (phospholipase C), perfringolysin O (theta), epsilon toxin and enterotoxin, contain distinctly different functional domains [14
]. Phospholipase C is a hemolysin with sphingomyelinase activity and phospholipase activity [17
]. Perfringolysin O is a pore-forming cholesterol-dependent cytolysin [18
]. Epsilon toxin and enterotoxin both belong to the aerolysin-like toxin family but are comprised of distinctly different amino acid sequences and protein structures [19
]. While phospholipase C and perfringolysin O are chromosomally encoded toxins, epsilon toxin and enterotoxin are encoded on mobile genetic elements [5
The binary toxins in C. perfringens
(iota toxin and binary enterotoxin (BEC)) are usually composed of an enzyme component (Ia) and a binding component (Ib) [20
]. Ib binds to a receptor on targeted cells and Ia is translocated into the cytosol of the cells. Ia ADP-ribosylates actin, resulting in cell rounding and death [20
]. Another binary toxin with ADP-ribosylation activity in C. perfringens
is BecAB/CPILE [9
]. Binary toxins are present in other closely related species such as Clostridioides difficile
, Clostridium spiroforme
and Clostridium botulinum
], and have been shown to be associated with disease in these bacteria.
Due to the specificity of many C. perfringens
isolates to particular animal hosts and disease outcomes, and the diverse pan-genome of the species [4
], we hypothesize that only a subset of the potential toxins encoded by C. perfringens
have been identified. The aim of this study was to bioinformatically identify novel virulence-associated genes in previously characterised C. perfringens
strains to inform and refine future studies, as well as to narrow down potential drug or vaccine targets. Here, we describe seven novel protein sequences homologous to known toxins, together with the mobile genetic elements carrying the encoding genes; these were identified from the whole genome sequences of a diverse set of previously characterised C. perfringens
isolates. Our study demonstrates the value of reanalysing publicly available WGS data and of collating large WGS datasets for use in comparative genomic analysis.
In this study we identified seven potential toxin homologs related to the beta, delta, epsilon and iota toxins, together with their corresponding putative mobile genetic elements and chromosomal insertions. The toxin homologs were defined by the sequence identity and domain structure they share with known toxins from C. perfringens and other species. The discovery of novel protein sequences similar to previously characterised C. perfringens toxins suggests that considerable genetic and toxin diversity remains to be discovered in this bacterium. Advances in throughput and the continuing reduction in the cost of WGS have made it a readily accessible tool for deeper exploration of bacterial genome structure and content. We demonstrate its use as a screen for putative virulence factors, based on domain sequence analysis of a large pool of isolates for which genome data are available.
Four new leukotoxin domain-containing proteins (DlpA, LdpA, LdpB and LdpC), two epsilon domain-containing proteins (EdpA and EdpB) and the iota-like protein (IlpAB) were identified in this study. Three of these toxin homologs, IlpA/IlpB, DlpA and EdpB, were found exclusively in type A turkey isolates. Three of the four turkey isolates that carried the plasmid-encoded IlpAB also carried DlpA integrated into the chromosome. Two other toxin homologs (LdpB and LdpC) were predominantly identified in isolates from turkeys suffering from necrotic enteritis, but were also identified in isolates from other sources. Although only a small number of turkey isolates were included in this study (n=13), screening future turkey isolates for these factors may reveal more about the prevalence of these genes and the potential mechanism of virulence in turkeys.
EdpA was found in two isolates from human blood and three different sources of contaminated food, suggesting a possible association with human disease, although considerably more sampling is required before statistical significance could be reached. The wide geographical range of these isolates (France and New York) suggests that the plasmid present in these strains may be widely dispersed.
While most of the toxin homologs described in this study were each found on a single conserved class of plasmid, the gene encoding LdpB was found on two different plasmids: a beta2-encoding plasmid in a single turkey isolate and two chicken isolates, and a tetracycline resistance plasmid in an isolate from an unknown source in France. A similar pattern is seen for enterotoxin and beta2 toxin, which are encoded on multiple different plasmids; however, their co-location with tetracycline resistance is not commonly observed. Both LdpB-encoding plasmids have a pCW3-like backbone, which they share with the IlpAB plasmid.
The epsilon domain proteins EdpA and EdpB were found to be encoded on a pCP13-like backbone plasmid. The EdpA-encoding plasmid was found intact in five different isolates, while the EdpB-encoding plasmid was found in a single isolate. In contrast, the epsilon toxin is found on a pCW3-like plasmid [5
]. These results demonstrate that similar toxins can be found encoded on a diverse range of C. perfringens
plasmids with different backbones or large variable regions.
Conjugative plasmids play a very important role in C. perfringens
]. A single strain can encode up to four different toxin plasmids, with a single plasmid encoding up to three toxin genes [2
]. C. perfringens
encodes two different classes of large plasmids, pCP13 and pCW3 [3
], both of which have been demonstrated to be conjugative [27
]. This study has identified six new plasmids, two pCP13-backbone plasmids and four pCW3 backbone plasmids. The location of toxin homologs on conjugative plasmids, sometimes co-localised with other virulence genes and tetracycline resistance, was also observed here. This study therefore provides further support that a significant contribution to the genetic diversity of C. perfringens
is plasmid mediated and involves unique variable regions that carry the toxin homologs alongside many other genes on each plasmid.
Thresholds for protein clustering and annotation of coding sequences are important for pan-genome analysis and identification of putative new proteins. Setting sequence identity thresholds too low can result in different toxins being clustered together. For example, reducing thresholds below 85% results in netB
being clustered together, hence the error in a previous study claiming netB
is present in the dog and horse isolates [25
], which has since been corrected as it is clear now that both netB
are established as two different proteins [25
The discovery of four newly identified leukotoxin domain-containing proteins, DlpA, LdpA, LdpB and LdpC, emphasises the diversity of this class of protein in C. perfringens
. With the recent discovery of NetB, NetE, NetF and NetG [8
], as well as increased functional work to define the mechanism of action of the toxins, including delta toxin, it has been shown that leukotoxin domain proteins in C. perfringens
are largely responsible for virulence and pathogenesis in multiple diseases [8
]. Characterisation of two epsilon domain proteins, and of another protein sequence with similarity to the clostridial binary toxins, suggests that there is also a high degree of genetic variability within these toxin classes, and not just within the leukotoxin domain proteins.
The most widely published method for investigating C. perfringens
isolates from outbreaks is the use of diagnostic PCR for the toxins used in the typing scheme as well as cpb2
]. This study has shown that toxin diversity may be much greater than previously recognised, and that restricting diagnostics to PCR may miss key information regarding C. perfringens
pathogenesis. We suggest the use of whole genome sequencing for C. perfringens
diagnostics and virulence investigations, in particular from diverse animal sources, as it can provide a more complete and accurate source of information, particularly on new mechanisms of virulence and associations of genetic elements with particular hosts.
This analysis has used publicly available genomic data to identify seven novel putative toxin proteins with striking similarities to characterised toxins and has localised the genes of most of them to plasmids. Further investigation, particularly of the expression and function of these proteins in animal hosts or cell lines, is required before conclusions can be drawn about their role as C. perfringens toxins.
4. Materials and Methods
The DNA sequences analysed in this study were obtained from two sources: FASTA files of published genomes were downloaded from the NCBI genome database, and genomes for which only unassembled and unannotated sequence reads were publicly available were downloaded from the NCBI Sequence Read Archive (SRA). Where available, metadata (disease association, year of isolation and country of isolation) were collected for all genomes. For details of the isolates used in this study, refer to Supplementary Table S1
For the genomes that required assembly, reads were assembled using SPAdes v3.12.0 at default settings. All genomes were annotated with Prokka v1.13.2, and protein clustering was performed using BLASTp v2.7.1 and CD-HIT as implemented in Roary v3.12.0 [38
] with a minimum percentage identity of 90% (-i 90) and no splitting of paralogs (-s). Protein sequences were examined for functional domains using the NCBI Conserved Domain Database search and Pfam [39
], and signal sequences were screened using SignalP 4.1 [40
]. Sequence homologs were also searched using profile hidden Markov models implemented in HMMER v3.2.1 (hmmer.org) [41
]; hmmbuild was used to create a profile for each toxin type (leukotoxin, epsilon and binary toxin) from a multiple sequence alignment of protein sequences of that type. Each profile was then used with hmmsearch to search the pan-genome for protein sequences with significant matches, with results written in tabular format (--tblout).
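The profile search described above can be sketched roughly as follows. This is an illustration only, not the script used in the study: the file names, the E-value cutoff of 1e-5 and the helper names `hmmer_commands` and `parse_tblout` are assumptions introduced here. The parser relies on the whitespace-delimited HMMER target table, in which the first column is the target name, the third is the query name, and the fifth and sixth are the full-sequence E-value and score.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    target: str    # protein from the pan-genome
    query: str     # name of the toxin profile
    evalue: float  # full-sequence E-value
    score: float   # full-sequence bit score


def hmmer_commands(msa: str, hmm: str, proteins: str, table: str) -> List[List[str]]:
    """Return the two HMMER command lines: build a profile, then search with it.

    All four arguments are hypothetical file paths chosen by the caller.
    The commands are only constructed here, not executed.
    """
    return [
        ["hmmbuild", hmm, msa],
        ["hmmsearch", "--tblout", table, hmm, proteins],
    ]


def parse_tblout(text: str, max_evalue: float = 1e-5) -> List[Hit]:
    """Parse --tblout text and keep hits at or below an assumed E-value cutoff."""
    hits = []
    for line in text.splitlines():
        # Skip blank lines and the '#' comment lines HMMER writes as headers.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        hit = Hit(fields[0], fields[2], float(fields[4]), float(fields[5]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    # Most significant hits first.
    return sorted(hits, key=lambda h: h.evalue)
```

Each command list could be passed to `subprocess.run(cmd, check=True)` once per toxin profile, and the resulting table read from disk and handed to `parse_tblout`. The cutoff is a placeholder; the threshold for a "significant" match would need to be chosen for the dataset at hand.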
Maximum likelihood trees were based on alignments of the novel toxin homologs with representative protein sequences obtained from NCBI. Protein sequences were aligned using Clustal Omega [42
], and maximum likelihood inference was performed in IQ-TREE [43
]. Trees were inferred using the LG+F+G4 model with rapid bootstrapping (-bb 2000) and non-parametric bootstrap (-v) [43
] and visualised in FigTree. Heatmaps were produced from a percent identity matrix based on Clustal Omega alignments, with values rounded to the nearest whole number. Heatmap colours correspond to the following percent identity ranges: dark red, 80–100%; light red, 60–79%; orange, 40–59%; bright yellow, 30–39%; pale yellow, 20–29%; and white, <20%.
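The colour binning used for the heatmaps can be expressed as a small lookup. The sketch below is an illustration under stated assumptions rather than the code used to draw the figures: the function names are hypothetical, halves are assumed to round upwards, and white is assumed to cover every rounded value below 20%.

```python
import math
from typing import List

# Lower bound (inclusive, after rounding) of each colour band, highest first.
BINS = [
    (80, "dark red"),
    (60, "light red"),
    (40, "orange"),
    (30, "bright yellow"),
    (20, "pale yellow"),
]


def heatmap_colour(identity: float) -> str:
    """Map a percent identity (0-100) to its heatmap colour band."""
    if not 0 <= identity <= 100:
        raise ValueError("percent identity must lie between 0 and 100")
    # Round to the nearest whole number, with halves rounded up (assumption).
    rounded = math.floor(identity + 0.5)
    for lower, colour in BINS:
        if rounded >= lower:
            return colour
    return "white"  # assumed to cover all rounded values below 20


def colour_matrix(matrix: List[List[float]]) -> List[List[str]]:
    """Apply the colour bands to every cell of a percent identity matrix."""
    return [[heatmap_colour(value) for value in row] for row in matrix]
```

Rounding before binning matters at the band edges: under these assumptions a value of 79.5 falls in the dark red band, whereas 79.4 falls in the light red band.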
Plasmid assembly was performed de novo
using SPAdes v3.12 for those genomes assembled from reads. Contigs were scaffolded against the closed plasmids pCW3 and pCP13, which served as references to assist with gap closure and repeat resolution. Reads were mapped back to contigs for error correction using Pilon [45
] and to ensure gap closure of plasmid contigs. Plasmid contigs containing the genes of interest were extracted from the genomes and aligned against reference plasmids using tBLASTx or BLASTn to investigate similarity (BLAST 2.7.1+) [46
]. Schematics of alignments were produced using Easyfig v2.2.2 [49
]. Plasmid sequences were deposited in GenBank under the BioProject accession PRJNA508810; the accession numbers for each plasmid are as follows: MK285071 for pCPNY83906550-1, MK285059 for pCPT1, MK285071 for pCP16SBCL1142-1, MK285060 for pCPT6-1, MK285061 for pCP16SBCL648-1 and MK285057 for pCPT84-1. The accession numbers for chromosomal regions are as follows: MK285064 for T84 dlpA
region, MK285058 for NCTC3182 cpd
region, and MK285056 for 16SBCL571 ldpA
region. Sequences of toxin homologs were deposited in GenBank under the following accession numbers: MK285070 for edpA
, MK285055 for edpB
, MK285066 for dlpA
, MK285067 for ldpA
, MK285068 for ldpB
, MK285063 for ldpC
, MK285069 for ilpA
and MK285065 for ilpB