Escherichia coli O-Antigen Gene Clusters of Serogroups O62, O68, O131, O140, O142, and O163: DNA Sequences and Similarity between O62 and O68, and PCR-Based Serogrouping

The DNA sequence of the O-antigen gene clusters of Escherichia coli serogroups O62, O68, O131, O140, O142, and O163 was determined, and primers based on the wzx (O-antigen flippase) and/or wzy (O-antigen polymerase) genes within the O-antigen gene clusters were designed and used in PCR assays to identify each serogroup. Specificity was tested with E. coli reference strains, field isolates belonging to the target serogroups, and non-E. coli bacteria. The PCR assays were highly specific for the respective serogroups; however, the PCR assay targeting the O62 wzx gene reacted positively with strains belonging to E. coli O68, which was determined by serotyping. Analysis of the O-antigen gene cluster sequences of serogroups O62 and O68 reference strains showed that they were 94% identical at the nucleotide level, although O62 contained an insertion sequence (IS) element located between the rmlA and rmlC genes within the O-antigen gene cluster. A PCR assay targeting the rmlA and rmlC genes flanking the IS element was used to differentiate O62 and O68 serogroups. The PCR assays developed in this study can be used for the detection and identification of E. coli O62/O68, O131, O140, O142, and O163 strains isolated from different sources.

products to spot the array and labeled long PCR products for hybridization [10]. Lin et al. [11] performed PCR assays targeting the wzx and wzy genes of ten Shiga toxin-producing E. coli (STEC) serogroups, and then used the Luminex system to identify the ten serogroups through binding of the PCR products to fluorescent microspheres conjugated to specific DNA probes for each of the ten serogroups. Furthermore, multiplex assays can be designed to detect specific pathogenic E. coli serogroups targeting O-antigen gene cluster sequences and virulence genes [7,12]. Use of the Luminex system (Luminex, Austin, TX, USA) employing monoclonal antibodies coupled to carboxylated magnetic microbeads to simultaneously detect E. coli serogroup O157, as well as Shiga toxin 1 and Shiga toxin 2, has also been reported [13]. A review by DebRoy et al. [7] provides information on E. coli O-antigen gene clusters and methods used for O-group determination.
There are a number of E. coli pathotypes, consisting of various E. coli O-groups, that have been isolated from animals and that can cause illness in humans and animals. Enteropathogenic E. coli (EPEC) O142 has been isolated from infant stools, patients with diarrhea, and piglets [14-16]. Shiga toxin-producing E. coli (STEC) O62 was isolated from pork, and this serogroup has also been described as an enteroaggregative E. coli [17,18]. Verocytotoxin-producing E. coli (VTEC, also known as STEC) O163 has been associated with cases of hemolytic uremic syndrome [19,20]. In addition, E. coli O163 was isolated from animals, including cows [21], lamb [22], goats, sheep [23], and pigs [24]. E. coli O131 was associated with pigs with post-weaning diarrhea in China [25], and E. coli O140 was associated with broiler chickens with dermatitis [26] and piglets with diarrhea [27].
Various molecular serotyping approaches could be used to identify E. coli O-groups, including the use of the Luminex® system, DNA microarrays, or the BioMark™ real-time PCR array system (Fluidigm Corporation, South San Francisco, CA, USA), and others. Using some of these approaches, O-group determination could be coupled with simultaneous identification of virulence genes specific for certain E. coli pathotypes. However, to accomplish this, definitive determination of the O-antigen gene cluster sequences of all of the E. coli O-groups and of strains identified as untypeable by serotyping is needed. The objectives of this study were to determine the DNA sequence of the O-antigen gene clusters of E. coli serogroups O62, O68, O131, O140, O142, and O163, analyze the sequence data, and identify unique regions that are suitable targets for PCR assays to identify these serogroups. This work provides essential information for the application of molecular methods to differentiate E. coli serogroups, which is critically needed for accurate identification of E. coli and for epidemiological investigations of disease outbreaks.

DNA Sequencing and Gene Annotation
Genomic DNA was isolated using the DNeasy Tissue Kit (Qiagen Inc., Valencia, CA, USA) according to the manufacturer's instructions. Long PCR assays were performed to amplify the O-antigen gene clusters using the Expand Long Template PCR system (Roche Applied Science, Mannheim, Germany) and the JUMPSTART (named for Just Upstream of Many Polysaccharide-associated gene STARTs) and GND (6-phosphogluconate dehydrogenase gene) primer set targeting sequences that flank the E. coli O-antigen gene clusters as described previously [12]. However, some modifications were made to the JUMPSTART and GND primer sequences; the modified sequences were: JUMPSTART primer 5'-CATGGTAGCTGTAAAGCCAGGGGCGGTAGCGTG-3'; GND primer 5'-CATGCTGCCATACCGACGACGCCGATCTGTTGCTTKGACA-3' (Integrated DNA Technologies, Coralville, IA, USA). The long PCR conditions were as described previously [9]. The long PCR products were verified on 0.8% agarose gels and purified according to instructions in the QIAquick PCR Purification Kit (Qiagen Inc., Valencia, CA, USA). The long PCR products were sequenced by the methods described below.
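The GND primer above contains the IUPAC ambiguity code K (G or T), so the synthesized oligonucleotide is a mixture of sequences. The following is a minimal Python sketch, not part of the original methods, that expands a degenerate primer into its unambiguous variants; the function name `expand_degenerate` is hypothetical and introduced only for illustration.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences as given in the text (5' to 3').
JUMPSTART = "CATGGTAGCTGTAAAGCCAGGGGCGGTAGCGTG"
GND = "CATGCTGCCATACCGACGACGCCGATCTGTTGCTTKGACA"

for name, seq in (("JUMPSTART", JUMPSTART), ("GND", GND)):
    variants = expand_degenerate(seq)
    print(f"{name}: {len(seq)} nt, {len(variants)} variant(s)")
    for variant in variants:
        print("  ", variant)
```

A check of this kind can be useful before searching a primer against assembled sequences, since each variant has to be matched separately.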
DNA integrity was verified using a Bioanalyzer 2100 (Agilent, Palo Alto, CA, USA), and DNA concentration was quantified using a QuantiFluor fluorometer (Promega, Madison, WI, USA). For sequencing with the Roche/454 GS FLX instrument (Roche, 454 Life Sciences, Branford, CT, USA), the O-antigen gene clusters were amplified from 40 ng of genomic DNA isolated as described above, with eight-bp sample-specific bar-coded primers using 2.5 units of AccuPrime Taq DNA Polymerase High Fidelity (Invitrogen, Carlsbad, CA, USA) in a 50-µL reaction buffer containing 200 nM primers, 200 nM dNTP, 60 mM Tris-SO4, 18 mM (NH4)2SO4, 2.0 mM MgSO4, 1% glycerol, and 100 ng/µL bovine serum albumin (New England BioLabs, Ipswich, MA, USA). PCR was performed using the following cycling profile: initial denaturing at 95 °C for 2 min followed by 25 cycles of 95 °C for 30 s, 50 °C for 30 s, and 72 °C for 120 s. Bar-coded amplicons were generated from each sample separately, purified using an Agencourt AMPure XP kit (Beckman Coulter Genomics, Danvers, MA, USA), and quantified using a QuantiFluor fluorometer. Bar-coded amplicons from individual samples were pooled in equal mass (molar) ratios. The purified bar-coded amplicon library was further verified and quantified using a BioAnalyzer 2100 (Agilent) and subjected to sequencing using the Roche/454 GS FLX. Illumina HiSeq 2000 (San Diego, CA, USA) sequencing was performed as described by Djikeng et al. [28] using long PCR products. The sequence reads generated from the Illumina, Roche/454 GS FLX, and the Sanger sequencing method using the 3730 DNA Analyzer (Applied Biosystems, Foster City, CA, USA) (see below) were each first assembled separately. The sequence data and the generated contigs were then combined and assembled into the final O-antigen clusters using CLC Genomics Workbench 4.6.1 (CLC bio, Aarhus, Denmark) and Sequencher version 5.1 software (Gene Codes Corporation, Ann Arbor, MI, USA).
Some additional details on the sequencing strategy and contig assembly were as described by Djikeng et al. [28]. To confirm the sequences of each of the O-antigen gene clusters, the long PCR products were resequenced using a 3730 DNA Analyzer (Applied Biosystems, Foster City, CA, USA) using primers designed from different regions along the gene clusters, and gene annotation was performed as described previously [9]. The HMMTOP program [29] was used to identify potential transmembrane helices from the amino acid sequences.
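The pooling of bar-coded amplicons in equal molar ratios described above can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' procedure: it uses the common approximation of 660 g/mol per base pair of double-stranded DNA, and the sample names, concentrations, amplicon lengths, and target amount are hypothetical values chosen only to show the arithmetic.

```python
# Approximate average molar mass of one base pair of dsDNA (g/mol).
AVG_BP_MASS = 660.0


def pmol_per_ul(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/uL) to pmol/uL."""
    return conc_ng_per_ul * 1000.0 / (AVG_BP_MASS * length_bp)


def pooling_volumes(samples, target_pmol):
    """Volume (uL) of each sample that contributes target_pmol to the pool.

    samples maps a sample name to (concentration in ng/uL, length in bp).
    """
    return {
        name: target_pmol / pmol_per_ul(conc, length)
        for name, (conc, length) in samples.items()
    }


# Hypothetical fluorometer readings and amplicon lengths.
samples = {
    "sample_A": (66.0, 10000),
    "sample_B": (33.0, 12000),
}

for name, volume in pooling_volumes(samples, target_pmol=0.1).items():
    print(f"{name}: {volume:.1f} uL")
```

Because longer amplicons contain fewer molecules per nanogram, equal-mass and equimolar pools differ whenever amplicon lengths differ between samples.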

PCR Specificity Testing
E. coli reference strains [1] and field strains belonging to serogroups O62, O68, O131, O140, O142 and O163, and non-E. coli bacteria were grown overnight on tryptic soy agar (TSA) plates at 37 °C. Single colonies were picked and resuspended in 100 µL of Tris-EDTA buffer (pH 8.0) and heated at 100 °C for 10 min. The suspension was centrifuged at 10,000× g, and the supernatant containing genomic DNA was used for the PCR reactions.
The PCR primers (Table 1) were designed from the wzx, wzy, and rmlA/rmlC regions of the targeted O-serogroups. The PCR reaction mix (20 µL total volume) consisted of template DNA (1 µL), 300 nM of each primer, and 10 µL of the Power SYBR® Green PCR master mix containing Taq Polymerase (Life Technologies, Carlsbad, CA, USA). Real-time PCR reactions were conducted using an AB 7300 Real-Time PCR system (Applied Biosystems). The PCR cycling conditions consisted of an initial denaturation for 10 min at 95 °C followed by 40 cycles at 95 °C for 15 s and 60 °C for 1 min. Reaction mixtures without template DNA and without primers served as negative controls. Data were analyzed using 7300 system SDS software (Applied Biosystems, Foster City, CA, USA).
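The reaction composition and cycling profile above translate into simple arithmetic. The following Python sketch works through it; the 10 µM primer stock concentration is an assumption (the text gives only the final 300 nM concentration), the function names are hypothetical, and the run-time estimate ignores instrument ramp times.

```python
def reaction_mix(total_ul=20.0, template_ul=1.0, master_mix_ul=10.0,
                 primer_final_nm=300.0, primer_stock_um=10.0, n_primers=2):
    """Per-reaction volumes (uL) for the SYBR Green PCR described in the text.

    primer_stock_um is an assumed working-stock concentration.
    """
    # C1 * V1 = C2 * V2, with the stock converted from uM to nM.
    primer_ul = primer_final_nm * total_ul / (primer_stock_um * 1000.0)
    water_ul = total_ul - template_ul - master_mix_ul - n_primers * primer_ul
    if water_ul < 0:
        raise ValueError("components exceed the total reaction volume")
    return {
        "template_ul": template_ul,
        "master_mix_ul": master_mix_ul,
        "primer_each_ul": primer_ul,
        "water_ul": water_ul,
    }


def run_time_min(hold_s=600, cycles=40, step_s=(15, 60)):
    """Programmed hold plus cycling time in minutes (ramp times excluded)."""
    return (hold_s + cycles * sum(step_s)) / 60.0


mix = reaction_mix()
for component, volume in mix.items():
    print(f"{component}: {volume:.2f}")
print(f"programmed run time: {run_time_min():.0f} min")
```

With a different stock concentration only `primer_stock_um` needs to change; the water volume adjusts so that the total remains 20 µL.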

Results and Discussion
DNA sequences obtained from the E. coli O-antigen gene clusters of serogroups O62, O68, O131, O140, O142 and O163 contained 9 to 12 ORFs (Figure 1 and Appendix Tables A1-A6), all in the same transcriptional direction from galF to gnd. The deduced amino acid sequences from these ORFs were used to search the NCBI database for an indication of their possible functions. Gene names were assigned on the basis of the Bacterial Polysaccharide Gene Nomenclature system (http://sydney.edu.au/science/molecular_bioscience/BPGD/).
The polysaccharide structures of the E. coli O142 and O68 O-antigens have been determined [31-33]. The proposed functions of the genes in the O-antigen gene clusters of E. coli O142 and O68 correlate well with the identified O142 and O68 polysaccharide structures [32,33].

Sugar Transferase Genes
Genes encoding sugar transferases were identified based on their similarity to known sugar transferases. As shown in Figure 1 and Appendix Tables A1-A6, O62, O68, O140, and O163 each contained three sugar transferases, whereas O131 and O142 contained five and four sugar transferases, respectively. These ORFs have a high degree of sequence variation (30%-60% amino acid similarity), which is consistent with previous studies [30].

O-Antigen Processing Genes
All six O-antigen gene clusters contained the wzx and wzy genes located in different regions within the gene clusters (Appendix Tables A1-A6). Analysis using the HMMTOP program [29] indicated that all six Wzx proteins contained 12 transmembrane helices, whereas the Wzy proteins contained 10 transmembrane helices, with the exception of the Wzy protein from O142, which contained 13 transmembrane helices.

Development of PCR Assays to Identify E. coli O62/O68, O131, O140, O142, and O163 Serogroups
Primers were designed targeting the wzx and/or wzy genes from the above E. coli serogroups (Table 1), and they were used in PCR assays to determine specificity for each serogroup against 174 E. coli standard strains, as well as field E. coli strains serotyped as O62, O68, O131, O140, O142 and O163 isolated from humans, animals, food, or water. Sixteen non-E. coli strains (see Experimental Section for the list of non-E. coli strains) were also included as negative controls for specificity testing. PCR assays targeting the wzx/wzy genes showed high specificity for each serogroup with no amplification of wzx/wzy genes from other E. coli serogroups and no amplification of DNA of other bacterial genera. All of the field isolates serogrouped as E. coli O131, O140, O142 and O163 were positive by PCR for the corresponding serogroup with 100% accuracy. However, the O62 wzx PCR assay also gave a positive result with the O68 reference strain (Table 2). This is not surprising, since our sequencing data also demonstrated that the wzx sequences of O62 were identical with those of O68 (Appendix Tables A1 and A2). The field strains of E. coli O62 (n = 2) and O68 (n = 6) also exhibited positive PCR results with the wzx primers of O62.
Table footnote a: Although two strains were positive using both O62 and O68 antisera, similar to the O62 reference strain, one strain did not show the presence of the insertion element found in the O62 reference strain by PCR; therefore, one strain could be either O62 or O68.

Acquisition of the IS1 Element in E. coli O62 and Evolutionary Implications and Differentiation of Serogroups O62 and O68
Analysis of the O-antigen gene clusters of E. coli O62 and O68 showed that they are almost identical, except that E. coli O62 contained an IS element insertion (ORF4), 748 bp in size, at the end of the rmlA gene. ORF4 (insB) encodes a transposase that is identical to IS1 transposition proteins in Shigella flexneri 2b (Appendix Table A1). The IS1 element in E. coli O62 is inserted within the third codon from the end of the rmlA gene, resulting in a truncated protein ending with an R (arginine) in place of K (lysine), and in comparison with E. coli O68 the last two amino acids are missing (Figure 1). IS1 is a common mobile genetic element that usually generates an 8- to 9-bp target duplication upon integration [34]. In addition, the IS1 element contains 23-bp imperfect terminal repeats, which are characteristic of IS elements [35,36]. The IS1 element is widely distributed in prokaryotic genomes, is highly mobile, and can be a source of genome rearrangements [37,38]. The IS elements present in E. coli O24 seemed to play important roles in the assembly of the O24 O-antigen gene cluster by mediating lateral gene transfer and gene inactivation [5]. The high level of similarity between the O-antigen gene clusters of the E. coli O62 and O68 reference strains suggests that the O-antigen gene clusters are very closely related and may be derived from a common ancestor.
To differentiate E. coli O62 and O68, primers were designed targeting the rmlA and rmlC region flanking the IS element from O62 (Table 1). The predicted PCR products for O62 and O68 are 1969 bp and 1172 bp, respectively. These primers were used in PCR assays to differentiate two O62 and six O68 field strains (as determined by serotyping) in our strain collection. All six O68 strains were positive for O68 using the rmlA/C PCR (i.e., lacked the IS element), and they were positive only for O68 by serotyping (Table 3). One of the two strains that were originally serotyped as O62 was positive for O68 according to the PCR assay targeting rmlA/C (i.e., lacked the IS element) (Table 3); however, this strain was also positive for O68 by serotyping, similar to the pattern for O62 strains, which are positive by serotyping for both O62 and O68. Therefore, this strain either should be re-assigned as a variant of O68 or is actually O62 but does not carry the IS element. The fact that so few field strains belong to serogroup O62 in a collection of approximately 70,000 strains at the E. coli Reference Center at the Pennsylvania State University, collected over the last fifty years, suggests that this O-group is not commonly found in animals, humans, and the environment. Our data show that the O-antigen gene clusters of E. coli O62 and O68 are very similar. The high similarity between O62 and O68 likely results in antisera cross-reaction, which is an important problem in traditional serotyping. It is puzzling, however, that the antiserum prepared against O62 does not cross-react with O68, whereas antiserum against O68 cross-reacts with O62. To accurately serotype O62/O68 strains, it is important to first perform serotyping, followed by the PCR assay using the rmlA/C primers flanking the IS element for strains that are O62 positive by serotyping.
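The two-step workflow described above (serotyping first, then the rmlA/C PCR for O62-positive strains) can be summarized as a small decision routine. This Python sketch is an illustration only, not the authors' software: the predicted product sizes (1969 bp and 1172 bp) come from the text, whereas the function names, the 100-bp size tolerance, and the assumption that the product size is estimated separately (for example, on an agarose gel) are hypothetical.

```python
O62_PRODUCT_BP = 1969  # predicted rmlA/C product with the IS element (O62)
O68_PRODUCT_BP = 1172  # predicted rmlA/C product without the IS element (O68)


def call_is_element(amplicon_bp, tolerance_bp=100):
    """Interpret an estimated rmlA/C product size; tolerance is an assumption."""
    if amplicon_bp is None:
        return "no product"
    if abs(amplicon_bp - O62_PRODUCT_BP) <= tolerance_bp:
        return "present"
    if abs(amplicon_bp - O68_PRODUCT_BP) <= tolerance_bp:
        return "absent"
    return "unexpected size"


def assign_serogroup(o62_antiserum_pos, o68_antiserum_pos, amplicon_bp=None):
    """Combine serotyping with the rmlA/C PCR for O62-positive strains."""
    if not o62_antiserum_pos:
        # Strains reacting only with O68 antiserum are called O68 by serotyping.
        return "O68" if o68_antiserum_pos else "not O62/O68 by serotyping"
    is_call = call_is_element(amplicon_bp)
    if is_call == "present":
        return "O62"
    if is_call == "absent":
        return "ambiguous: O68 variant or O62 lacking the IS element"
    return f"unresolved ({is_call})"


print(assign_serogroup(True, True, 1969))
print(assign_serogroup(True, True, 1172))
print(assign_serogroup(False, True))
```

The ambiguous branch mirrors the field strain discussed above, which reacted with both antisera but lacked the IS element.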

Conclusions
The O-antigen gene cluster sequences for six E. coli serogroups have been determined, and thus PCR primers can be designed for unique regions within the gene cluster sequences to develop genetic-based methods for serotyping, which are more specific than traditional serotyping. The PCR assays designed in the current study could potentially be used for rapid diagnostic screening for the E. coli serogroups. Since serotyping results are often ambiguous and sometimes may not be able to distinguish the serogroups, PCR assays in conjunction with serotyping may be able to circumvent these problems and distinguish the serogroups more accurately.