1. Introduction
Foodborne pathogens, such as Escherichia coli, Salmonella spp., Campylobacter spp., and Listeria monocytogenes, remain a major cause of disease globally [1]. The World Health Organization estimates that each year, one in ten people worldwide will be sickened by a foodborne pathogen, and 420,000 people will die [2]. In the United States (U.S.), an estimated 48 million people become ill each year [3]. Additionally, pathogen contamination of food is a significant economic burden estimated to cost the world economy USD 110 billion [2] and the U.S. economy USD 17 billion annually [4]. During 2021, over 15 million pounds of meat were recalled in the U.S., and Shiga toxin-producing Escherichia coli (STEC) was the cause of two of those recalls, totaling 300,096 pounds [5]. Infections by STEC have been increasing since 2018 and have an incidence rate of 5.7 per 100,000 people [6]. An STEC infection generally causes diarrhea and vomiting but may result in severe diseases such as hemorrhagic colitis or hemolytic uremic syndrome [7].
The isolation and identification of STEC as an adulterant in meat by the U.S. Department of Agriculture Food Safety and Inspection Service (USDA FSIS) is achieved through a combination of culturing, molecular methods, O typing, and matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry [8]. Food samples are considered adulterated if they test positive for the stx and eae genes and one of the seven targeted O antigens commonly associated with STECs isolated from symptomatic patients [8]. The stx1 and stx2 genes encode Shiga toxins 1 and 2, which cause cytotoxicity in the host, and expression of eae produces intimin, which mediates enterocyte colonization [7]. The serotype most frequently associated with foodborne STEC outbreaks is O157:H7 [6]. The serotype of E. coli is determined by the H antigen, which is present on the flagella, and the O antigen, which is present on the outer membrane [9]. The current culture-based identification process takes at least four days to complete [8]. Whole-genome sequencing could reduce the amount of time and labor needed for foodborne pathogen identification.
Advances in whole-genome sequencing technology have led to third-generation, or long-read, sequencing that could significantly reduce the amount of time needed to identify foodborne pathogens compared to the current culture-based methods. Oxford Nanopore Technologies’ MinION device sequences RNA or DNA by detecting changes in electrical current as the strands of nucleic acid pass through nanopores on a flow cell [10]. The long reads generated facilitate genome assembly [11], while their real-time analysis allows pathogen detection to be accomplished in hours instead of days [12]. The small, portable sequencers allow whole-genome analysis to be conducted outside of traditional laboratories, and the cost is generally lower than second-generation sequencing. Additionally, significant progress has been made to reduce errors in nanopore sequencing, and raw read accuracy has improved to >99.9% [13]. A previous in silico study by our research group also suggested that long-read sequencing would be practical for testing food for E. coli and L. monocytogenes contamination after growth enrichment [14].
The goal of this project was to evaluate the potential of long-read whole-genome sequencing for STEC detection. The objectives included establishing optimal sequencing parameters, determining the limit of detection of all STEC virulence genes of interest in pure culture and STEC-inoculated ground beef, and assessing the ability of software-controlled enrichment and depletion of specific genomic material to enhance the detection of STEC in inoculated meat.
2. Materials and Methods
2.1. Bacterial Culturing
Escherichia coli O157:H7 (ATCC 43895; American Type Culture Collection, Manassas, VA, USA) was grown in tryptic soy broth (Oxoid Limited, Hampshire, UK) modified with novobiocin (mTSB; RPI Corporation, Mount Prospect, IL, USA) overnight at 42 ± 1 °C according to USDA’s Food Safety and Inspection Service procedures [8]. The optical density at a wavelength of 600 nm (OD 600) was measured with a DeNovix DS-11 FX+ spectrophotometer (DeNovix Inc., Wilmington, DE, USA) to determine bacterial concentration.
2.2. STEC Inoculated Ground Beef
Ground beef spiked with E. coli O157:H7 was processed according to USDA FSIS methods [8]. The following treatments were included: media only; uninoculated meat; 1 × 10⁷ cfu mL⁻¹ E. coli; and ground beef inoculated with 1 × 10⁵ cfu g⁻¹, 1 × 10⁶ cfu g⁻¹, or 1 × 10⁷ cfu g⁻¹ of E. coli. Treatments were prepared by placing 1 ± 0.01 g of ground beef on one side of a sterile 7 oz Whirl-Pak® (Austin, TX, USA) strainer bag (except for the media-only and E. coli-only controls) and then diluting 1:4 with mTSB. Each bag was stomached for 120 s with a Bag Mixer (Spiral Biotech Inc., Norwood, MA, USA). One experiment without enrichment was conducted with triplicate samples, and one experiment was conducted with enrichment with triplicate samples. The bags in the enrichment experiment were incubated statically for 24 h at 42 ± 1 °C. The samples were filtered through a 40 µm cell strainer (Greiner Bio-One North America Inc., Monroe, NC, USA) and then centrifuged at 130× g for 10 min. The supernatant was retained. An aliquot was plated on rainbow agar (Biolog Inc., Hayward, CA, USA) modified with potassium tellurite (Thermo Fisher Scientific, Waltham, MA, USA), novobiocin (RPI), and cefixime (RPI) (mRBA) to determine the concentration of E. coli. The remaining supernatant was centrifuged at 3400× g for 20 min. The supernatant was discarded, and the pellet was washed in 1 mL of phosphate-buffered saline (PBS; Boston Bioproducts Inc., Ashland, MA, USA) and then centrifuged at 12,000× g for 1 min. The supernatant was discarded, and the pellet was retained for DNA extraction.
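The plate count on mRBA converts to a concentration through the usual dilution arithmetic. The following is a minimal Python sketch of that calculation; the function name, colony count, dilution, and plated volume are hypothetical values chosen for illustration and are not data from this study.

```python
def cfu_per_ml(colonies: int, dilution: float, volume_plated_ml: float) -> float:
    """Estimate cfu per mL of the undiluted sample from one plate count.

    dilution is the fraction of the original sample present in the plated
    dilution (for example, 1e-4 for a 10^-4 dilution).
    """
    if colonies < 0 or dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("colonies must be >= 0; dilution and volume must be > 0")
    return colonies / (dilution * volume_plated_ml)


# Hypothetical plate: 52 colonies from 0.1 mL of a 10^-4 dilution.
estimate = cfu_per_ml(52, 1e-4, 0.1)
print(f"{estimate:.2e} cfu/mL")
```

Converting this value to cfu per gram of beef additionally requires the total volume and sample mass in the bag, which depend on how the 1:4 dilution is interpreted.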
2.3. DNA Extractions
The DNA from pure cultures of E. coli was extracted with a Monarch High-Molecular-Weight DNA Extraction Kit (New England BioLabs Inc., Ipswich, MA, USA) according to the manufacturer’s instructions. The pellets from the ground beef experiments were extracted using a Qiagen DNeasy PowerFood Microbial Kit (Qiagen, Germantown, MD, USA) according to the manufacturer’s instructions. DNA concentration and quality measurements were taken with a DeNovix DS-11 FX+ spectrophotometer.
2.4. MinION Sequencing
Libraries of E. coli O157:H7 DNA were prepared using a Field Sequencing Kit (SQK-LRK001, Oxford Nanopore Technologies [ONT], Oxford, UK) according to the manufacturer’s instructions. A flow cell check was performed prior to sequencing to ensure that enough pores were available for sequencing. A MinION Mk1B or Mk1C (ONT) was used with R9 flow cells (ONT). The optimal sequencing run time for the DNA extracted from a pure culture of E. coli O157:H7 was determined by testing the following time points in triplicate: 1 h, 2 h, 4 h, and 6 h. The limit of detection (LOD) for identifying target virulence genes in E. coli O157:H7 DNA was determined by testing the following DNA concentrations in triplicate: 400 ng, 200 ng, 100 ng, 50 ng, 25 ng, 12.5 ng, 6.25 ng, 3.125 ng, 1.56 ng, 0.78 ng, and 0.39 ng. The 400 ng concentration was sequenced with a 1 h run duration, and the 200 ng concentration was sequenced with a 2.5 h duration. The duration of the sequencing run for the remaining concentrations was determined using nonlinear regression to approximate the amount of time needed to obtain 400 k reads, which was the average number of reads generated during a 1 h run time with optimal (400 ng) DNA input. The DNA from spiked ground beef experiments was sequenced for 24 h. In some of the meat sequencing runs, the software-controlled depletion of the Bos taurus (domestic bovine) genome or the enrichment of the E. coli O157:H7 genome was employed. The reference B. taurus genome (NCBI Accession #NC037338.1) was uploaded into the MinKNOW software (ONT, version 22.03.6), and software-controlled depletion was enabled. When software-controlled enrichment was enabled, the E. coli O157:H7 reference genome (NCBI Accession #NC002695.5) was uploaded into the MinKNOW software.
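The DNA inputs listed for the LOD experiment follow a twofold dilution series starting at 400 ng. A short Python sketch that regenerates the series is given here for reference; the function name is hypothetical, and the three lowest values in the text appear to be the exact halvings rounded to two decimal places.

```python
def twofold_series(start_ng: float, levels: int) -> list[float]:
    """Return `levels` DNA inputs, each half of the previous one."""
    return [start_ng / 2**i for i in range(levels)]


# Eleven levels starting from the recommended 400 ng input.
series = twofold_series(400.0, 11)
for amount in series:
    print(f"{amount:g} ng")
```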
2.5. Quantitative Real-Time Polymerase Chain Reaction
Quantitative real-time polymerase chain reaction (qPCR) was used to confirm the presence of the fliC, stx, eae, and rrsC genes from E. coli O157:H7 following established USDA FSIS protocols [15]. A StepOne Real-Time PCR System (Applied Biosystems, Waltham, MA, USA) or QuantStudio 5 Real-Time PCR System (Applied Biosystems) was used for qPCR.
2.6. Data Analysis
Sequencing data were basecalled in real-time or post-run with MinKNOW software using fast or high-accuracy basecalling and a minimum read length filter of 1 kb. The FastQ files were imported into Geneious Prime software (version 2023) and aligned to an E. coli O157:H7 reference genome (NCBI Accession #NC002695.5) using Minimap2 (version 2.24). The target genes, namely fliC, eae, stx1a, stx1b, stx2a, stx2b, rrsC, wzx, and wzy, were searched for in the alignment, and the number of times each gene was detected was recorded. The mean, standard deviation, or standard error of triplicate runs were determined for sequencing run parameters, target gene detection, and qPCR Ct values.
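The gene-counting and summary steps can be illustrated with a small script. The sketch below is not the Geneious Prime workflow used in this study; it assumes alignments exported in minimap2's PAF format against a single reference sequence, and the gene names, gene coordinates, read records, and replicate counts are all hypothetical placeholders.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical gene intervals (0-based start, end) on the reference.
# These are placeholders, NOT real E. coli O157:H7 coordinates.
GENES = {"geneA": (1000, 2200), "geneB": (5000, 5900)}


def count_gene_hits(paf_lines, genes, min_fraction=1.0):
    """Count alignments that cover at least min_fraction of each gene."""
    counts = {name: 0 for name in genes}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed records
        # PAF columns 8 and 9 hold the target start and end.
        t_start, t_end = int(fields[7]), int(fields[8])
        for name, (g_start, g_end) in genes.items():
            overlap = min(t_end, g_end) - max(t_start, g_start)
            if overlap > 0 and overlap / (g_end - g_start) >= min_fraction:
                counts[name] += 1
    return counts


def summarize(values):
    """Return mean, sample standard deviation, and standard error."""
    sd = stdev(values)
    return mean(values), sd, sd / sqrt(len(values))


# Hypothetical alignments: read1 spans geneA, read2 covers part of geneA,
# read3 spans geneB.
paf = [
    "read1\t3000\t0\t3000\t+\tref\t10000\t500\t3500\t2900\t3000\t60",
    "read2\t1500\t0\t1500\t-\tref\t10000\t1500\t3000\t1400\t1500\t60",
    "read3\t2000\t0\t2000\t+\tref\t10000\t4800\t6800\t1900\t2000\t60",
]
print(count_gene_hits(paf, GENES))
print(summarize([30, 24, 36]))  # hypothetical triplicate detection counts
```

Requiring full coverage of the gene (`min_fraction=1.0`) is a conservative choice made for this sketch; a lower threshold would also count reads that span only part of a gene.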
4. Discussion
The results of this study suggest that long-read whole-genome sequencing has the potential to shorten the time needed for E. coli O157:H7 identification. Optimal sequencing parameters were established to provide the highest quality and quantity of sequencing data from pure cultures of E. coli O157:H7 and ground beef inoculated with E. coli O157:H7. The default settings in the MinKNOW software were sufficient, except for the minimum read length setting. The minimum read length was set to 1 kb to remove any reads shorter than that length from the sequencing data output. Reads longer than 1 kb were more likely to span enough of the gene for positive identification, and all targeted virulence genes were longer than 1 kb, except for the stx genes. However, stx1a and stx1b are adjacent on the chromosome, as are stx2a and stx2b, and the combined lengths of the genes exceed 1 kb (Table S1). The generation of short reads is likely due to the use of the Field Sequencing Kit for library preparation, which uses a transposase to cleave template DNA and add adapters for sequencing [16]. The Field Sequencing Kit was selected because it is fast (10 min) and requires minimal extra equipment (a heat block), which makes it better suited for use in on-site testing at a meat processing plant. However, the transposase can generate smaller fragments, especially in DNA that is already sheared. Short reads of less than 1 kb represented 10–50% of the total reads in the timed runs and had to be filtered out in the post-sequencing analysis. Setting the minimum read length at 1 kb in the MinKNOW software removed the short reads and eliminated this post-analysis step. Other library preparation kits are available that do not use a transposase, which would likely reduce the proportion of smaller fragments, but it would be at the expense of time and portability.
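The post-sequencing length filter described above amounts to dropping every read shorter than the threshold. A minimal Python sketch of that step follows; it is an illustration rather than the MinKNOW implementation, the function name and example records are hypothetical, and it assumes well-formed four-line FASTQ records with no blank lines.

```python
def filter_fastq_by_length(lines, min_length=1000):
    """Yield (header, sequence, quality) for reads of at least min_length bases."""
    it = iter(lines)
    # Consume the input four lines at a time: header, sequence, '+', quality.
    for header, seq, _plus, qual in zip(it, it, it, it):
        seq = seq.strip()
        if len(seq) >= min_length:
            yield header.strip(), seq, qual.strip()


# Hypothetical records: one 1.2 kb read and one 300 bp read.
records = [
    "@read1", "A" * 1200, "+", "I" * 1200,
    "@read2", "A" * 300, "+", "I" * 300,
]
kept = list(filter_fastq_by_length(records))
print([header for header, _seq, _qual in kept])
```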
Sequencing data can be basecalled using Guppy, the integrated basecaller in the MinKNOW software, with three different models: fast, high accuracy, or super accuracy. A comparison of the fast and high-accuracy models was conducted with the run-time and meat sequencing experiment data. The super-accuracy model required a large amount of computing power and needed to be run on a high-performance computing cluster. Therefore, it was not included in the comparison because it would be impractical for on-site testing. Basecalling was typically completed in real time with the fast model, while the high-accuracy model took over 24 h to complete. In the run-time experiments, an average of 98.63% of reads aligned to the E. coli O157:H7 reference genome, and high-accuracy basecalling improved that figure by only 0.65%. In the meat experiments, fewer reads aligned to the reference genome due to the presence of bovine DNA, but high-accuracy basecalling only improved the average percent of reads aligned from 16.31% to 16.66%. Target gene identification was not significantly improved for the run-time or meat experiments either. While the use of higher-accuracy models would be important in sequencing experiments designed to find small genetic changes, such as single-nucleotide polymorphisms (SNPs) [17], that level of accuracy was not needed to identify the virulence genes of interest. Therefore, the fast model was used because data were basecalled in real time and could be immediately analyzed to provide results on foodborne pathogen presence more quickly. The recent release of an accelerated basecaller, Dorado, by ONT, doubles the basecalling speed [18]. This could make higher-accuracy models more practical for real-time analysis in future studies.
A 1 h sequencing run time was sufficient to detect the virulence genes of interest in DNA extractions from pure cultures of E. coli O157:H7. However, variability was high between the independent replicate samples in both the timed runs and the limit-of-detection sequencing runs (Table 1). Other studies have also noted issues with inter-run variability [19,20]. Efforts were made to reduce variability between runs by using the same DNA extraction and having the same technician perform the experiments. The primary variable between runs was the flow cell (R9.4.1). Flow cells have a total of 2048 nanopores, but the number of pores available for sequencing varied between flow cells and was lower if the flow cell was being reused. However, an analysis of the timed runs and the meat runs found no correlation between pore availability and the number of reads generated or data produced. The variability between runs, regardless of run time, prompted the selection of a 3 h run time for DNA extracted from pure cultures to ensure sufficient data generation despite potential variability in flow cell performance. High-quality genome assemblies require 30× coverage to ensure that the entire genome is sequenced and to distinguish errors from sequence variations [21]. The virulence genes of interest were detected an average of 30 times during 1 h runs and 58 times during 2 h runs, suggesting that a 3 h run time would generate sufficient coverage despite potential variability.
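The link between read yield and the 30× target can be made explicit with the standard expected-coverage estimate. The symbols are introduced here for illustration only: N is the number of reads mapping to the target genome, \bar{L} is their mean length, and G is the genome size.

```latex
\[
C = \frac{N\,\bar{L}}{G}
\qquad\Longrightarrow\qquad
N_{30\times} = \frac{30\,G}{\bar{L}}
\]
```

As a rough illustration, taking G ≈ 5.5 Mb for an E. coli O157:H7 chromosome and a hypothetical mean read length of 2 kb gives about 82,500 mapped reads for 30× coverage. Both numbers are assumptions made for this example, not values measured in this study, and average depth does not guarantee that every gene is spanned by that many individual reads.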
All E. coli have an indistinguishable core genome that has genes for housekeeping, metabolic, and transport functions [22]. The rrsC gene is a core gene, which is why it can only be used for identification to the species level. The accessory genome of E. coli contains genes that characterize the pathogenicity of specific pathotypes [22]. Therefore, to confirm the O157:H7 serotype, eae, fliC, stx, wzx, and wzy need to be identified in a sample. Serial dilutions of E. coli O157:H7 DNA were tested to determine the limit of detection for all virulence genes of interest. Some of the virulence genes could be detected in the lowest concentration tested, 0.39 ng, but the lowest concentration at which all virulence genes were detected in each triplicate was 12.5 ng. The recommended DNA input for the Field Sequencing Kit is 400 ng, and extracting DNA from a bacterial culture or meat sample provides sufficient DNA to input the recommended concentration into the library preparation. However, we anticipated the potential need to sequence DNA from only one or a few colonies isolated on selective agar, which would yield lower concentrations of DNA than extraction from a bacterial culture or meat sample. The ability to detect genes of interest from a colony would save time by eliminating the need to culture it to a higher concentration to meet the recommended DNA input concentration for the sequencing kits.
The identification of the E. coli O157:H7 virulence genes of interest was difficult in the ground beef matrix due to the high prevalence of bovine DNA. In ground beef inoculated with 10⁵ or 10⁶ cfu g⁻¹ E. coli O157:H7, only the rrsC gene was detected. All virulence genes of interest, except stx2b, were identified in the 10⁷ cfu g⁻¹ E. coli O157:H7-inoculated ground beef, but most genes were detected fewer than 10 times. Software-controlled enrichment and depletion improved the detection of the virulence genes, but it did not increase it enough to detect all of the genes needed to positively identify E. coli O157:H7. The infectious dose of E. coli O157:H7 is in the tens of cfu [23]; therefore, a testing method needs to be able to detect low concentrations of a pathogen to protect consumers. The inability to detect all virulence genes of interest, even at fairly high inoculum concentrations, prompted the testing of growth-enriched samples. In the growth-enriched samples, all virulence genes of interest were detected >100 times. Software-controlled enrichment further increased detection, but depletion generally decreased detection. This is likely because the concentration of E. coli DNA was higher than the bovine DNA after the growth enrichment, making depletion unnecessary. A previous in silico study conducted to determine the practicality of using long-read sequencing for foodborne pathogen detection indicated that growth enrichment would be necessary [14], and the results of the current study confirm that growth enrichment will be necessary to ensure the detection of very low concentrations of E. coli O157:H7 contamination in ground beef using sequencing.
Variability was noted in the number of times specific genes were detected in the samples, and this is likely due to differences in the gene copy number and the presence of nonpathogenic E. coli. The genes eae [24], fliC [25], wzx, and wzy [26] are single-copy genes. Generally, there is only one copy of each stx subtype gene, as well, but more than one copy can be present [27]. These genes were typically identified in lower numbers than rrsC, of which there are multiple copies on the chromosome [28]. Additionally, in the uninoculated ground beef controls, the rrsC gene of E. coli, which is specific to the species level, was detected. However, none of the target virulence genes were identified. These results indicated the presence of nonpathogenic E. coli in the ground beef samples, which would also increase the concentration of the rrsC gene but not that of the virulence genes, since they are only found in pathogenic strains. As discussed above, the accessory genes that define pathotypes are of primary interest in sequencing data to distinguish between nonpathogenic and pathogenic strains [22]. Selective growth enrichment prior to sequencing can amplify pathogenic bacteria to ensure that their detection is not masked by nonpathogenic strains.