Genomic Analysis of Corynebacterium diphtheriae Strains Isolated in the Years 2007–2022 with a Report on the Identification of the First Non-Toxigenic tox Gene-Bearing Strain in Poland

Infections caused by non-toxigenic Corynebacterium diphtheriae have been reported every year in Poland since 2004, with the ST8 biovar gravis strains being most commonly isolated. This study analyzed thirty strains isolated between 2017 and 2022 and six previously isolated strains. All the strains were characterized using classic methods in terms of species, biovar level, and diphtheria toxin production, as well as by means of whole genome sequencing. The phylogenetic relationship based on SNP analysis was determined. The number of C. diphtheriae infections has been rising in Poland every year with a maximum of 22 cases in the year 2019. Since 2022, only the non-toxigenic gravis ST8 (most common) and mitis ST439 (less common) strains have been isolated. An analysis of the genomes of the ST8 strains showed that they had many potential virulence factors, such as adhesins and iron-uptake systems. The situation rapidly changed in 2022 and strains from different STs were isolated (ST32, 40, and 819). The ST40 biovar mitis strain was found to be non-toxigenic tox gene-bearing (NTTB), with the tox gene inactivated due to a single nucleotide deletion. Such strains were previously isolated in Belarus. The sudden appearance of new C. diphtheriae strains with different STs and the isolation of the first NTTB strain in Poland indicate that C. diphtheriae should be classified as a pathogen of special public health concern.


Introduction
Corynebacterium diphtheriae constitutes an etiologic agent of diphtheria, a life-threatening disease involving local infections of the respiratory tract and other mucus membranes, complicated by the effects of produced diphtheria toxin that causes early damage to the heart muscle fibers, nerve demyelination, and necrosis. Notably, diphtheria can also be caused by the C. ulcerans and C. pseudotuberculosis strains, but it is less common. Diphtheria was a serious public health problem in the past, but it has largely been brought under control thanks to the introduction of compulsory vaccination against the disease in the 1940s. However, the vaccine used globally contains only diphtheria toxoid and does not prevent infection caused by non-toxigenic strains [1]. Diphtheria toxin is the most important virulence factor of the C. diphtheriae strains and, because of that, non-toxigenic strains have been regarded as non-pathogenic. In most cases, such strains lack the tox gene that encodes the toxin. A specific group of non-toxigenic strains is made up of non-toxigenic tox genebearing (NTTB) strains that have the tox gene in their genome, although inactivated by different mechanisms [2].

Results
In Poland, like other European countries, only diphtheria (toxin-positive) cases are reported to the national epidemiological system, which is why there are no large pools of epidemiological data relating to analyzed cases. All the primary data relating to the analyzed strains are shown in Table 1. An analysis of basic clinical data shows that, among 30 tested strains isolated between 2017 and 2022, 13 (43%) were isolated from blood, 9 (30%) were isolated from wounds, 4 from respiratory tracts, and one from an eye. In three cases, there is a lack of information about the source or site of infection. In five cases, the strains were isolated from homeless people, while in another five cases, there was no address information on the diagnostics study commission. The patients' ages ranged from 17 to 90 years old with the median at 53.5. The geographical distribution of the analyzed strains with all the basic data is shown in Figure 1 and in the Microreact project available online (https://microreact.org/project/dWfYQoVtmA6Rqah76tN3KS-corynebacteriumin-poland-2017-2022-with-background, accessed on 13 December 2022) [11].
A phenotypic analysis shows that 20 strains isolated since 2017 belonged to the C. diphtheriae gravis biotype. Additionally, five previously isolated strains (years 2007-2015) also belonged to the C. diphtheriae gravis biovar. Only 10 belonged to the C. diphtheriae mitis biovar. Four of these mitis strains were isolated this year (only until August). For all the 30 strains from the years 2017-2022, sequenced on GridION, it was possible to assemble the genomes into one complete and circular chromosome, without any plasmids in most strains. All the genomes have a similar size, with an average of 2,444,112 bp. One mitis strain has a bigger genome (2,506,552 bp), while another mitis strain has a smaller genome (2,362,113 bp). Both of these outlier strains were isolated in 2022.
All 25 C. diphtheriae gravis biovar strains were negative in terms of diphtheria toxin, both in the ELEK test and in PCR. The sul1 gene was present in 14 strains and it was the only resistance gene found in these strains. A classic seven-gene MLST assay determined that twenty-four strains (96%) belonged to ST8 and one belonged to ST32. An analysis of genes coding different virulence factors showed that the analyzed gravis ST8 strains were better equipped than the mitis strains and had all the verified genes present in their genome (except the diphtheria toxin gene, even when searching with low similarity and length thresholds). The C. diphtheriae biovar gravis ST32 strain had neither the DIP0543 (neuraminidase) nor DIP2093 (putative fimbrial adhesin) genes, nor the whole SpaD-type pili gene cluster. All the results are shown in Table 2. Moreover, all the tested mitis strains were negative in terms of diphtheria toxin in an ELEK test, but one strain isolated in 2022 was positive in PCR tests. Eight strains belonged to ST439, one belonged to ST819, and the NTTB strain belonged to ST40. No antimicrobial resistance determinants were found in these strains. In general, the mitis strains had fewer virulence factor determinants with the lack of the SpaH-type pili cluster, DIP2093 (putative fimbrial adhesin), and DIP0543 (neuraminidase). In the mitis strains, the SpaA-type and the SpaD-type pili clusters are slightly different (around 85-96% of sequence identity). Additionally, strains from ST439 lack the spaB and spaD genes. In comparison, the whole SpaD-type pili gene cluster and the spaC gene (part of the SpaA-type pili cluster) were not detected in the other analyzed mitis STs.  All 25 C. diphtheriae gravis biovar strains were negative in terms of diphtheria toxin, both in the ELEK test and in PCR. The sul1 gene was present in 14 strains and it was the only resistance gene found in these strains. A classic seven-gene MLST assay determined that twenty-four strains (96%) belonged to ST8 and one belonged to ST32. An analysis of genes coding different virulence factors showed that the analyzed gravis ST8 strains were better equipped than the mitis strains and had all the verified genes present in their genome (except the diphtheria toxin gene, even when searching with low similarity and length thresholds). The C. diphtheriae biovar gravis ST32 strain had neither the DIP0543 (neuraminidase) nor DIP2093 (putative fimbrial adhesin) genes, nor the whole SpaD-type pili gene cluster. All the results are shown in Table 2.
Moreover, all the tested mitis strains were negative in terms of diphtheria toxin in an ELEK test, but one strain isolated in 2022 was positive in PCR tests. Eight strains belonged to ST439, one belonged to ST819, and the NTTB strain belonged to ST40. No antimicrobial resistance determinants were found in these strains. In general, the mitis strains had fewer virulence factor determinants with the lack of the SpaH-type pili cluster, DIP2093 (putative fimbrial adhesin), and DIP0543 (neuraminidase). In the mitis strains, the SpaA-type and the SpaD-type pili clusters are slightly different (around 85-96% of sequence identity). Additionally, strains from ST439 lack the spaB and spaD genes. In comparison, the whole SpaD-type pili gene cluster and the spaC gene (part of the SpaA-type pili cluster) were not detected in the other analyzed mitis STs. More detailed analyses were performed on the sequence of the tox gene of the C. diphtheriae ST40 strain. It was found that the diphtheria toxin gene was complete, but a single nucleotide deletion at position 55 (deletion of a single guanidine) resulted in an early frameshift, and a disruption of the gene was found. The same single-nucleotide deletion was found when compared to all the strains isolated previously in Belarus. An attempt was made to estimate the frequency of the occurrence of this mutation by searching for it using the BLASTn software, and only four more of such sequences deposited in the GenBank database (strains isolated in the UK, Belgium, Russia, and Australia, according to data in the GenBank) were found.
A complete dendrogram based on the SNP analysis is shown in Figure 1. A phylogenetic analysis confirmed that all the gravis strains that belonged to ST8 were closely related. The five oldest strains can be distinguished as a separate, closely related subgroup. This should be analyzed carefully because of the fact that these strains were sequenced only using the Illumina platform and, as a result, a different assembling algorithm.   SNPs between other strains). These strains were obviously isolated in different periods of time, which can explain such a number of SNPs observed.
All the whole genome sequences were deposited in the GenBank public database as part of BioProject No. PRJNA873913.

Discussion
The last diphtheria case in Poland was recorded in 2000, and since 2004, only infections caused by non-toxigenic C. diphtheriae have been observed every year (data from the Department of Bacteriology and Biocontamination Control of National Institute of Public Health NIH-NRI). As shown in Figure 2, before the COVID-19 pandemic, the number of such infections increased over the years, with a maximum of 22 cases in 2019. These data confirm that non-toxigenic C. diphtheriae should be treated as a re-emerging pathogen. During the COVID-19 pandemic, the number of many different infections significantly decreased, and a similar decrease can be observed in the C. diphtheriae cases. Determining whether that was because of a lower number of infections or the shift of diagnostic scope towards SARS-CoV-2 detection causing many different infections to remain undetected is obviously difficult. The non-toxigenic gravis strains from ST8 are predominant in Poland and their recorded number has been stable over the years. Between 2011 and 2015, only C. diphtheriae strains from this biovar were isolated in Poland. Since 2016, the mitis biotype strains have also been observed every year but at a much lower number (one to four cases per year), while before 2022, all these mitis strains belonged to ST439. Therefore, the picture of the C. diphtheriae strains isolated in Poland before 2022 looks rather monotonous and it was limited to only two biovars and two sequence types. These data correspond to a previous analysis performed by Czajka et al. [12], where the non-toxigenic gravis ST8 strains were also definitely the most frequently observed. This suggests these strains can be endemic to Poland. In contrast, in neighboring Germany, 20 different STs were identified among 76 strains from 2016 to 2017, but the most commonly identified were also ST8 (54%) and ST439 (7%), as well as ST130, which has not been observed in Poland (13%) [3]. Interestingly, ST8 C. diphtheriae was not recorded in Germany until 2015. In a comparable period of time, in Austria, around four to eight cases of Corynebacterium infections were observed every year (a maximum of thirteen cases in 2014), but the strains The non-toxigenic gravis strains from ST8 are predominant in Poland and their recorded number has been stable over the years. Between 2011 and 2015, only C. diphtheriae strains from this biovar were isolated in Poland. Since 2016, the mitis biotype strains have also been observed every year but at a much lower number (one to four cases per year), while before 2022, all these mitis strains belonged to ST439. Therefore, the picture of the C. diphtheriae strains isolated in Poland before 2022 looks rather monotonous and it was limited to only two biovars and two sequence types. These data correspond to a previous analysis performed by Czajka et al. [12], where the non-toxigenic gravis ST8 strains were also definitely the most frequently observed. This suggests these strains can be endemic to Poland. In contrast, in neighboring Germany, 20 different STs were identified among 76 strains from 2016 to 2017, but the most commonly identified were also ST8 (54%) and ST439 (7%), as well as ST130, which has not been observed in Poland (13%) [3]. Interestingly, ST8 C. diphtheriae was not recorded in Germany until 2015. In a comparable period of time, in Austria, around four to eight cases of Corynebacterium infections were observed every year (a maximum of thirteen cases in 2014), but the strains were much more diverse (mostly the belfanti, mitis, and gravis biovars) [13]. An additional MLST analysis of the Austrian strains confirmed this high diversity (34 different STs among 57 strains were identified) with no ST8 and only four ST439 strains. An analysis performed on the other side of Poland, in neighboring Belarus, showed that ST5 and ST8 C. diphtheriae were most frequently recorded [14]. Furthermore, in this study, ST5 included strains assigned to the belfanti, gravis, and mitis biovars and were non-toxigenic, whereas ST8 strains were toxigenic. Previously, such toxigenic ST8 C. diphtheriae strains were often isolated during the epidemic in the 1990s in the former Soviet Union (FSU) and were still isolated both in Belarus and Russia in the post-epidemic period [14,15]. Moreover, Borisova et al. [15] showed that, in Russia, most of the toxigenic strains isolated there in the years 2002-2012 belonged to the gravis biovar and ST8. These data suggest a spread of pathogenic ST8 C. diphtheriae from Eastern to Western Europe, correlated with the loss of the tox gene and the transformation to a non-toxigenic strain due to the high vaccination rate in Poland and other Western European countries [3,16].
An analysis of different virulence factors showed that the C. diphtheriae ST8 strains are better equipped, which may somehow explain their epidemiological success and stable existence in Poland. In these strains, many genes and gene clusters involved in adhesion were found. The SpaA-type pili have been shown to interact with the pharyngeal epithelial cells, while the SpaD-type and SpaH-type pili are responsible for adhesion to the lung and laryngeal epithelial cells [17][18][19]. A complete DIP2093 gene was also found that was shown to encode collagen-binding proteins [20], as well as DIP1621 and DIP1281, which both could play a role in adhesion to epithelial cells [21,22]. These data correspond to the analysis of Belarusian strains and the authors' hypothesis about their greater abilities to adhere to and invade host cells [14].
On the other hand, such conclusions should be made with caution, as most of such analyses use C. diphtheria NCTC 13129 strains as a reference, and this strain belongs to the ST8 gravis biovar and was isolated in Russia during the epidemic in the 1990s. In our analysis, some virulence genes and clusters, such as the SpaA-type pili cluster or SpaD-type pili cluster, had low sequence similarity (around 85-96%) with a lack of some genes in the clusters. Some of these genes were also previously found to be pseudogenes [23]. These differences suggest that some of these virulence determinants have different genetic organization and characteristics within this biovar.
In 2022, the invariable situation in Poland changed and an increased number of infections with strains belonging to other STs and/or the mitis biotype has been reported. Our analysis revealed that, during that year, the first ST40 NTTB, ST32, and ST819 strains appeared. The reason for this sudden variation remains unclear and, because of the lack of proper epidemiological data, can be only speculative. The main hypothesis is associated with the war in Ukraine that began on 24 February 2022 (just before the first different strains were isolated on 17 May) and the related significant migration of Ukrainian refugees to the whole territory of Poland. This hypothesis could especially explain the emergence of the ST40 NTTB strain, which has the eastern link, with strains isolated previously in Belarus.
Unfortunately, there is no clear information on the ST819 strains in the literature, with only one such strain reported in the Corynebacterium BigsDb by the Bavarian Health and Food Safety Authority (LGL), which suggests that this ST was previously recorded in Germany. On the contrary, the ST32 gravis strains have previously been widely reported in Europe [13,24,25], but also worldwide, e.g., in Canada [26] and Australia [27]. The ST32 strain described herein was found in a patient in Szczecin, located in the western part of Poland, close to the German border, and it is possible that it was transferred from the west. It was suggested that the spreading success of this clone is due to its superior adherence properties [19]. However, in this study, the SpaD-type pili gene cluster, or RS23695, which seems to encode collagen-binding proteins, was not found [20], but, on the contrary, all these genes were found in ST8 strains.
The NTTB strains require special attention due to the potential possibility of reverting the tox gene to the functional version by a spontaneous mutation or homologous recombination between different corynebacteriophages. That is why there has been a special focus on the ST40 NTTB strain isolated in 2022. As previously mentioned, Grosse-Kock et al. [14] found such strains in Belarus in the years 1996-1999 and in 2007. Moreover, Zakikhany et al. [2] found one such strain in the UK, but in this case, it was isolated not from a human, but from a cat. A comparative analysis of the tox gene sequence has shown that the Polish ST40 strain has the same single nucleotide deletion in the tox gene in position 55 as the strains isolated in Belarus and in the UK. Such point deletions seem to often be the cause of a frameshift and tox gene inactivation in the NTTB strains. The same point mutation was found in strains belonging to different STs in Australia [28]. Moreover, a Russian analysis of the NTTB strains isolated in the years 1994-2002 suggests that the deletion of one nucleotide usually inactivates the tox gene [29]. There are also different causes of tox gene inactivation in the NTTB strains, including mutations in the dtxR gene, which encodes the diphtheria toxin regulator, and in the promoter region of the tox gene [2,28].
The main limitation of this study is a lack of precise epidemiological data because of the fact that infections caused by non-toxigenic strains are not reported in the national epidemiological systems. As a result, it is, for example, not clear how the NTTB strain ST40 appeared in a Polish citizen from Katowice, which is a city in the Silesia region (geographically located in the south of Poland), far from the Polish eastern border.
In conclusion, the sudden appearance of new C. diphtheriae strains with different STs and the isolation of the first NTTB strain in 2022 in Poland indicate that C. diphtheriae should be a pathogen of special public health concern, and effective national surveillance is essential.

Bacterial Strains
In this study, 16 randomly selected C. diphtheriae strains isolated in Poland in the years 2017-2020 (4 strains from each year) were used. These strains were selected from all the strains sent for routine diagnostics and collected in the Department of Bacteriology and Biocontamination Control of the National Institute of Public Health NIH-NRI in Warsaw, which should represent all strains isolated from clinical samples in Poland. Additionally, 7 strains isolated in 2021 and 7 strains in 2022 (up to August) were sequenced, which constituted all strains collected during these years. Additionally, as a background, 6 previously isolated strains were analyzed. Out of these strains, 5 were isolated between 2007 and 2015 in Poland, while one strain was toxigenic and was isolated from a patient with diphtheria in the 1990s.

Species Verification and Toxin Identification
The strains were identified as C. diphtheriae using Gram staining; colonies' morphology on Columbia agar with 5% sheep blood, Clauberg agar, and Tinstale agar; and a biochemical assay using the API Coryne system (BioMerieux). Diphtheria toxin production was tested using the ELEK test according to the WHO Manual [30]. The occurrence of the tox geneencoding diphtheria toxin was tested with PCR according to Pallen et al. [31] for the active part of the toxin and Hauser et al. [32] for the whole toxin gene.

Whole Genome Sequencing
Whole genome sequencing was performed in different periods of time, separated by the pandemic period. As a result, different sequencing platforms were used for older and newer strains. Sequencing paired-end libraries of the 6 oldest strains were performed using the Illumina Nextera XT kit and were sequenced on the Illumina NextSeq 500 platform. Raw reads were assembled using SpaDES 3.11.0 [33] and CLC Genomics Workbench and merged with CISA [34]. The library preparation and sequencing were performed in the Biobank Lab, University of Łódź.
The sequencing of the 16 strains isolated in the years 2017-2020 was performed using the Illumina and Oxford Nanopore Technologies (ONT) platforms. Illumina paired-end libraries were performed using the Illumina DNA Prep kit, while whole genome sequencing was carried out on the MiSeq instrument. ONT libraries were performed using Rapid Barcoding Kit 96 and sequenced for 26 h on GridION using R9.4 Flow Cells and the Super Accurate basecalling algorithm. Hybrid assembly was performed using the CLC Genomics Workbench and the NanoForms online server [35]. The last 14 strains were sequenced using only the ONT GridION platform, as described previously. Assembly was performed using the CLC Genomics Workbench and the NanoForms online server. For further analysis, sequences assembled with NanoForms were used.

Phylogenetic Analysis
Whole genome sequences were deposited in BIGSdb-Pasteur for further cgMLST analysis. wgSNPs analysis was performed for all the analyzed strains using CSI Phylogeny 1.4, available on the CGE website [41]. In the next stage of the analysis, five sequences of Belarus strains ST40 were added for a broader analytical context. Because of the fact that the sequences came from different sequencing platforms, only FASTA files were used for comparison.
Author Contributions: T.W. was involved in the concept design of the study, most of the laboratory analyses (including WGS), data analysis, and writing the manuscript. K.Z. was involved in the concept of the study, laboratory works (including phenotypic analysis and WGS), and revising the manuscript. A.A.Z. was involved in the concept design of the study, the revising of the manuscript, and study founding organization. All authors have read and agreed to the published version of the manuscript.