Flesh ID: Nanopore Sequencing Combined with Offline BLAST Search for the Identification of Meat Source

Detecting the animal species in meat products is crucial to prevent adulteration and unintended contamination during processing, as well as to avoid allergic reactions and conflicts with religious dietary rules. The gold standard is real-time PCR, which covers only a limited number of targets. In this study, we established a rapid sequencing protocol that identifies animal species within hours. Sequencing was performed by nanopore sequencing and data analysis via an offline BLAST search. The whole procedure was conducted in a mobile suitcase lab. In line with national and international regulations, the developed assay detected adulteration of pork meat with 0.1% of horse, chicken, turkey, cattle, sheep, duck, rabbit, goat, and donkey. The test could be used on-site as a rapid and mobile detection system to determine contamination of meat products.


Introduction
Undeclared or incorrectly declared species in food can lead to considerable health risks or violate religious taboos [1]. In addition to the religious prohibition of some animal species from consumption, certain animal species pose a high health risk for consumers [2]. Adulteration of meat products with exotic meat increases the risk of introducing emerging infectious diseases. Moreover, many incidents of mixing expensive meat with cheaper meat have been reported [3]. In the horse meat scandal of 2013, the food processing industry processed horse meat and offered it for sale to customers under an incorrect declaration [4]. The fundamental problem remains that inspecting supplied or processed meat poses increasing challenges to the food industry and to official food control [5]. So far, the origin of meat can only be clarified by very specific analytical methods such as immunological assays or DNA-based amplification technologies [6,7]. Mass spectrometry has emerged as a rapid and accurate method for identifying the meat source [8,9]. The gold standard for authenticating meat and meat products is real-time PCR [10]. Recently, many isothermal amplification assays have been established [11,12]. However, these molecular tests are limited to a few targets and cannot identify species outside that narrow target range. As a consequence, delayed diagnosis, false negative results, and increased costs may occur. Metagenomic sequencing is a promising solution to overcome these limitations [13]. A number of assays relying on high-throughput second-generation or Next Generation Sequencing (NGS) technologies have been developed to detect and genetically characterize animal species [14,15]. However, challenges remain: dependence on PCR-based amplification, cumbersome end-point result analysis, logistic demand, cost, limited applicability in the field, and restriction to laboratory settings.
Fourth-generation sequencing such as nanopore technology offers a promising alternative: a feasible, field-deployable, and rapid sequencing option with real-time data acquisition [16,17]. The technology has been applied to the identification of fish species, but the bioinformatics remained a great obstacle [18]. Therefore, we aimed to evaluate the performance of metagenomic sequencing based on nanopore technology in detecting animal species in meat, and to develop a user-friendly offline BLAST search for data analysis.

Meat Samples
Vacuum-packed meat of pig, cattle, sheep, horse, chicken, turkey, duck, and rabbit was purchased from a local supermarket. Standard genomic DNA of goat and donkey was provided by Eurofins GeneScan Technologies GmbH (Freiburg, Germany). According to the German Federal Office of Consumer Protection and Food Safety, 0.1% is considered the lower limit of detection for meat contamination (German Food and Feed Code §64 (LFGB)) [12]. Therefore, using a sensitive balance, 50 mg of pork meat was spiked with 0.5 mg of cattle, sheep, horse, chicken, turkey, duck, and rabbit meat. Since meat from goat and donkey was not available, genomic DNA of both species equivalent to 10³ molecules was added to the mix. Total nucleic acid was extracted from the meat mixture using an alkaline lysis protocol adapted from Girish et al. [19]. Briefly, 200 µL of lysis buffer (200 mM NaOH) was applied to the meat sample and incubated for one hour at room temperature; the mix was then neutralized by adding 400 µL of Tris-HCl (0.04 M, pH 7.5). The amount of DNA was measured using the Qubit 2.0 Fluorometer (Invitrogen, Carlsbad, CA, USA). The sample was adjusted to a DNA amount of 200 ng/3.75 µL for sequencing. One sample containing only pig genomic material was used as a background control. Figure 1 is a schematic presentation of the study procedure.
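The adjustment to 200 ng in 3.75 µL follows directly from the measured concentration. The short Python sketch below illustrates the arithmetic only; the function name and the example concentration of 80 ng/µL are hypothetical and not taken from the study.

```python
def dilution_for_library(conc_ng_per_ul, target_ng=200.0, final_volume_ul=3.75):
    """Return (sample_volume_ul, diluent_volume_ul) for the target DNA input.

    conc_ng_per_ul is the measured DNA concentration (for example a
    fluorometer reading); the defaults mirror the 200 ng / 3.75 uL input
    stated in the text.
    """
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    sample_volume = target_ng / conc_ng_per_ul
    if sample_volume > final_volume_ul:
        # Below about 53.3 ng/uL the target mass does not fit in 3.75 uL.
        raise ValueError("sample too dilute for the requested input")
    return sample_volume, final_volume_ul - sample_volume


# Hypothetical reading of 80 ng/uL: 2.5 uL sample plus 1.25 uL diluent.
sample_ul, diluent_ul = dilution_for_library(80.0)
```

Under these assumptions, any extract below roughly 53.3 ng/µL (200 ng divided by 3.75 µL) would need to be concentrated before it can supply the full input.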

Sequencing Library Preparation
For library preparation, the Rapid Sequencing Kit (SQK-RAD004) and a Flongle from Oxford Nanopore Technologies (Cambridge, UK) were used. A total of 200 ng DNA of the meat mixture was incubated with rapid adapters at 30 °C for one minute. During the incubation, the DNA was fragmented by the transposon and the sequencing adapters and barcodes were attached. To avoid unnecessary transposon activity and the production of very short DNA fragments, the mixture was then incubated at 80 °C for one minute. The sequencing buffer and loading beads were prepared as instructed by the manufacturer. It is important to note that the mix must be loaded onto the Flongle by attaching the filter tip of a 200 µL automatic pipette to the sample port and then rotating the volume adjustment knob clockwise. Pushing the fluid with the plunger can destroy the nanopore membrane [20].

Sequencing
Sequencing was conducted on the MinION device fitted with the Flongle adapter and flow cell. Data acquisition and basecalling were carried out in real time by the MinKNOW software. The equipment and software were purchased or downloaded from Oxford Nanopore Technologies (Cambridge, UK). Sequencing was performed for up to 48 h at a voltage of −180 mV.

Data Analysis
For data analysis, all generated data files in FASTQ format were transferred to the software Geneious 10.2.3. The sequences of all available chromosomes of horse, chicken, turkey, cattle, sheep, duck, rabbit, and goat were downloaded from the NCBI database (Table 1). For donkey, only a shotgun reference was available (GCA_001305755.1), and for pig, the sequence with accession number NC_010443.5 was used. The accuracy of the selected database for the offline BLAST search was tested using reference sequences of six additional animal species (dog, NC_002008.4; impala, NC_020675; lion, CM_018460.1; bison, NC_12346; camel, NC_009849.1; Japanese quail, NC_003408.1). The speed of species identification was measured by analyzing sequence data generated after 0.5, 1, 3, and 9 h of the sequencing run.
Table 1. Number of sequencing hits to each chromosome of eight animal species. The sequencing data obtained after 0.5, 1, 3, 9, and 18 h of the high-accuracy and the fast basecalling runs were analyzed to identify the time after which all species are correctly detected, using chromosome 1 of sheep, goat, horse, and duck, chromosome 2 of chicken and cattle, chromosome 3 of turkey, and chromosome 7 of rabbit. The number of hits for each species and the pairwise identity to the reference sequences are shown.

Data Acquisition
The MinKNOW software saved FASTQ sequence files directly to the computer hard disk. All raw sequence data files are freely available at https://doi.org/10.5281/zenodo.4034907. In total, 34,811 reads were collected after 48 h of the high-accuracy sequencing run. All reads shorter than 900 bases were deleted to avoid including short, inaccurate sequences (Figure 2). Importantly, the "What's in my pot" workflow of the Epi2Me software (Oxford Nanopore Technologies, Cambridge, UK) did not identify any of the meat species. Therefore, an offline BLAST search using Geneious software had to be established.
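The length filter described above was applied in Geneious; as a rough illustration of the same step outside that software, the following Python sketch drops reads shorter than 900 bases from four-line FASTQ text. The function names are hypothetical, and the parser assumes well-formed records without wrapped sequence lines.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    cleaned = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(cleaned) - 3, 4):
        header, seq, _plus, qual = cleaned[i:i + 4]
        yield header, seq, qual


def filter_by_length(records, min_length=900):
    """Keep only reads with at least min_length bases."""
    return [rec for rec in records if len(rec[1]) >= min_length]


# Toy input: one 899-base read (dropped) and one 900-base read (kept).
toy = [
    "@read_short", "A" * 899, "+", "I" * 899,
    "@read_ok", "C" * 900, "+", "I" * 900,
]
kept = filter_by_length(read_fastq(toy))
```

With real data, the `lines` argument could simply be an open FASTQ file handle.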

Offline BLAST-Search
The filtered reads of the high-accuracy run were searched with BLAST against all selected sequences (Table 1). For the BLAST search, the fast, high-similarity Megablast program was chosen, with only a query-centric alignment and a maximum of one hit per read. The e-value was set to 1 × 10⁻¹⁰⁰. All expected animal species could be identified (Figure 3; Table 1). As anticipated, the highest number of hits was assigned to the pig reference sequence, while the poultry species produced fewer hits than the mammalian species. No explanation was found for the lower number of hits for chicken.
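The search itself was run inside Geneious. A comparable search could plausibly be set up with the NCBI BLAST+ command-line tools instead, which is a swapped-in alternative and not the procedure used here. The Python sketch below lists a `blastn` call with matching settings and counts one best hit per read from tabular output; all file names, the database name `flesh_id_db`, and the reference identifiers in the example are hypothetical.

```python
from collections import Counter

# Hypothetical file and database names; the options are standard BLAST+ flags.
# blastn expects a FASTA query, so the reads are assumed to be converted first.
BLASTN_CMD = [
    "blastn", "-task", "megablast",
    "-query", "filtered_reads.fasta",
    "-db", "flesh_id_db",
    "-evalue", "1e-100",
    "-max_target_seqs", "1",
    "-max_hsps", "1",
    "-outfmt", "6",
    "-out", "hits.tsv",
]


def count_hits(tabular_lines, reference_to_species):
    """Count the best hit per read from BLAST tabular (outfmt 6) lines.

    Default outfmt 6 columns: query id, subject id, percent identity, ...,
    e-value, bit score (12 columns in total).
    """
    best = {}  # read id -> (bit score, subject id)
    for line in tabular_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or empty lines
        read_id, subject_id = fields[0], fields[1]
        bitscore = float(fields[11])
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, subject_id)
    return Counter(reference_to_species.get(s, s) for _, s in best.values())


# Toy example with made-up identifiers and scores.
rows = [
    "read1\trefA\t85.0\t1000\t100\t50\t1\t1000\t1\t1000\t0.0\t900",
    "read1\trefB\t82.0\t900\t110\t52\t1\t900\t1\t900\t0.0\t700",
    "read2\trefB\t84.0\t950\t90\t40\t1\t950\t1\t950\t0.0\t800",
]
counts = count_hits(rows, {"refA": "pig", "refB": "horse"})
```

Keeping only the highest-scoring hit per read mirrors the "maximum of one hit per read" setting described above.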

Identification of Sequencing Speed
To determine the required duration of the sequencing run, the reads produced within the first 30 min, one, three, and nine hours of the high-accuracy basecalling run were analyzed. Surprisingly, the reads produced in the first hour, searched against chromosome 1 of pig, sheep, goat, horse, and duck, chromosome 2 of chicken and cattle, chromosome 3 of turkey, and chromosome 7 of rabbit, were sufficient to identify all ten animal species in the meat mixture. The pairwise identity between the hits and the corresponding reference sequences ranged from 80.33 to 85.96% (Table 2).
To validate the results, the sequencing run was repeated using the fast basecalling model. In total, 76,363 reads were obtained. All species were identified within one hour of sequencing except chicken, which was first detected after 9 h (Table 2).
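The checkpoint analysis can be expressed compactly: for each time point, collect the species with at least one hit among the reads produced up to then, and report the first time point at which every expected species is present. The Python sketch below assumes each hit is already paired with the time since run start in hours; how that time is obtained is left open, and the function name and toy data are hypothetical.

```python
def first_complete_checkpoint(hits, expected_species, checkpoints_h=(0.5, 1, 3, 9)):
    """Return the earliest checkpoint (hours) at which all species have a hit.

    hits is an iterable of (hours_since_run_start, species) pairs.
    Returns None if some species is still missing at the last checkpoint.
    """
    expected = set(expected_species)
    hits = list(hits)
    for checkpoint in sorted(checkpoints_h):
        seen = {species for t, species in hits if t <= checkpoint}
        if expected <= seen:
            return checkpoint
    return None


# Toy data, not study results: the last species appears after 5 h.
toy_hits = [(0.2, "pig"), (0.4, "horse"), (5.0, "chicken")]
result = first_complete_checkpoint(toy_hits, ["pig", "horse", "chicken"])
```

In this toy case the function returns 9, because the last species only appears between the 3 h and 9 h checkpoints.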

Table 2. Identification speed of the sequencing runs. The sequencing data obtained after 0.5, 1, 3, 9, and 18 h of the high-accuracy and the fast basecalling runs were analyzed to identify the time after which all species are correctly detected. The number of hits for each species and the pairwise identity to the reference sequences are shown.

Database Specificity
To assess how accurately the database identifies possible meat adulteration, the whole mitochondrial reference genomes of five unrelated animal species (dog, impala, camel, bison, and Japanese quail) and one shotgun sequence of the lion genome were chosen as a negative database. An offline BLAST search with this database on the sequence reads of the high-accuracy and fast basecalling runs assigned no hits to these sequences. This indicates high specificity of the offline BLAST search. The sequence reads (total: 31,344) of the sample containing only background pig genomic material were assigned only to the reference sequence of Sus scrofa and not to any other animal species, which again indicates the accuracy of the offline BLAST search.

Discussion
To identify animal species in meat products, nanopore sequencing was combined with an offline BLAST search. The DNA was extracted in one hour using alkaline lysis buffer. Library preparation took 10 min and the sequencing run up to 48 h. The offline BLAST search in Geneious was completed in less than 20 min.
Oxford Nanopore developed two basecalling options, the high-accuracy (flip-flop basecalling) model and the fast model. While high-accuracy basecalling produces a higher raw read quality at a basecalling speed of 4.4 k bases/s, the fast model runs at 36 k bases/s, which results in a lower raw read accuracy [21]. In our experiment, fast basecalling collected about twice as many reads, but both methods produced similar sequence accuracy (Figure 3; Table 2). The data of both basecalling methods led to the identification of all animal species in the meat mixture. The only difference was the speed: all species were identified after one hour in the high-accuracy sequencing run, whereas nine hours were needed with fast basecalling (Table 2), because chicken was detected later while the genomes of all other animals behaved the same. We did not find a possible reason for the lower performance of the chicken genome in nanopore sequencing, which remains an open question.
Oxford Nanopore offers a range of online data analysis tools. Sequencing data can be uploaded to the cloud-based Epi2Me platform for real-time analysis workflows [22]. Unfortunately, only virus, bacteria, fungi, and archaea sequences can be recognized by Epi2Me. Therefore, for the identification of animal species in meat mixtures, the offline BLAST search database Flesh ID was developed and data analysis was performed in Geneious software. Our database includes reference sequences of various chromosomes of the animal species (Table 1). In other studies, specific genes were selected for the identification process, most commonly mitochondrial genes such as COI [23], cytb [24], 16S [25], and 12S [26]. Those authors performed a PCR amplification step before sequencing, which results in a more complex and time-consuming library preparation. Using nanopore sequencing combined with an offline BLAST search, whole genome sequencing is possible because no amplification step is needed during library preparation. Another advantage is the use of a portable sequencing device, the MinION, which can easily be implemented at the point of need [27]. Data analysis and species identification can also be performed with an online BLAST tool against all available sequences, which, of course, increases the chances of detecting exotic species. However, a powerful internet connection and hardware are required, and the run time will exceed many hours when thousands of long nanopore sequencing reads are analyzed. The offline BLAST search, on the other hand, is limited to the reference sequences of the animal species included in the local database, but the database can be extended according to the end user's needs.
More than three quarters of the sequence reads belong to host genes and the rest to the microbiome. In our study, the large amount of host sequence is an advantage since the aim is to identify the animal species. Nevertheless, many sequence reads either do not pass the thresholds (quality score > 7 and/or e-value = 1 × 10⁻¹⁰⁰) or derive from genome regions other than those in the offline database. This explains the difference between the total reads obtained during sequencing (Figure 2) and the number of aligned reads (Tables 1 and 2).
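To make the quality threshold concrete, the following Python sketch computes a mean read quality from a FASTQ quality string and checks it against a cutoff of 7. It assumes Phred+33 encoding and averages the per-base error probabilities before converting back to a Phred score; whether the sequencing software computes its read score in exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def mean_qscore(quality_string, offset=33):
    """Mean Phred quality of a read, averaged over error probabilities."""
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each quality character to its error probability 10^(-Q/10).
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_quality(quality_string, threshold=7.0):
    """True if the mean read quality exceeds the threshold."""
    return mean_qscore(quality_string) > threshold


# '+' encodes Q10 and '!' encodes Q0 under Phred+33.
good = passes_quality("++++")
bad = passes_quality("!!!!")
```

Averaging error probabilities instead of raw Phred values gives low-quality bases more weight, so a read with a few very poor stretches scores lower than a simple arithmetic mean would suggest.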
Compared to PCR or isothermal amplification assays, the sequencing method is not limited to specific species, as any desired animal species can easily be included in the Flesh ID database. Amplification-dependent assays are designed for specific targets, and a new assay has to be developed for each new species of interest. Moreover, performing several amplification assays for several animal species of interest is time-consuming. The only drawback of sequencing is that quantification is not possible.
Many factors can influence molecular assays for the identification of the meat source. The method of cooking, especially pan frying, may lead to false negative results [28]. In addition, prolonged heat treatment can lead to DNA fragmentation, which is a disadvantage for nanopore sequencing because it relies on long DNA reads [29].

Conclusions
In this study, rapid alkaline lysis was combined with nanopore sequencing technology and an offline BLAST search to identify species in meat mixtures in around 4 h. The whole procedure was conducted in a mobile suitcase lab, which facilitates use at the point of need. However, a highly trained person must operate the developed assay and the price is still high (around 160 Euro per sample). Furthermore, the stability of the reagents must be improved to allow long storage at room temperature. In the long run, sequencing will be the standard of molecular diagnostics, but data analysis and handling are still a great obstacle.