A Simple Method for Assessing Diversity and Dynamics of Microbial Community: Comparison of Dairy Phages from Industrial and Spontaneous Fermentation

Featured Application: Assessing diversity and dynamics of low-complexity microbial communities, e.g., in fermentation processes. Abstract: Background: The dairy industry heavily relies on fermentation processes driven in high proportion by Lactococcus lactis . The fermentation process can be perturbed or even stopped by bacteriophage activity, leading to complete loss of fermentation batch or decreased quality product. The monitoring of the phage diversity and dynamics in the process allows implementing protective measures (e.g., starter rotation) to maintain unperturbed production. Methods: Universal primers were used to amplify sequences of the 936, c2, and P335 Lactococcus phage types. The amplicons were sequenced with the Sanger method and obtained degenerate sequences were analyzed using a simple bioinformatic pipeline in the R environment. Results: The most prevalent phage type is 936, followed by P335, whereas the c2 type is less frequent. Conclusions: Curd cheeses prepared on non-pasteurized milk based on native milk microbiota had a higher diversity of phages distinct from those found in dairy plants. Sanger sequencing of heterogenous amplicons generated on metagenome DNA can be used to assess low-complexity microbiota diversity.


Introduction
The dairy industry heavily relies on fermentation processes driven in high proportion by Lactococcus lactis. The fermentation process can be perturbed or even stopped by bacteriophage activity leading to huge economic losses [1][2][3]. Among bacteriophages infecting L. lactis [4], those belonging to species c2, 936, or P335 are most commonly encountered in dairy plants [5,6]. Most in-house dairy plants' laboratories use traditional, indirect methods for phage detection in their routine analyses. These methods are based on activity tests (assessment of a decrease in acid production or a reduction of an indicator compound). Additionally, standard microbiological methods, i.e., plaque assays or spot tests, are used for direct phage detection [1,7]. Molecular methods (mostly PCR-based) give the advantage of short time from sampling to results, high sensitivity, and independence of sensitive strain [8]. Apart from detection methods, analysis of diversity and dynamics of lactococcal bacteriophages is crucial for preventing the spread of infections in dairy environments. Knowledge of phage strains presence in milk or dairy environment may allow to plan starter culture rotation.
Analysis of microbial community composition and dynamics especially in the case of unculturable bacteria or viruses always posed a challenge. Especially, lack of knowledge on growth requirements or undetermined permissive host is a limiting factor. Today in the era of next-generation sequencing (NGS), the problem can be solved both in the determination of community composition [9] and its functional profile [10]. However, NGS is still an expensive method. Its ability to produce a massive output of data in a relatively short time can sometimes be a disadvantage. The relatively low price per sample can be achieved only by filling the whole throughput (lane, chamber, or chip of the sequencer) with samples. In case of low complexity samples or targeted approaches to use the whole throughput, extensive barcoding is necessary. It is often necessary to collect a large number of samples to cover the whole throughput of a chip to run the sequencing. The NGS technique was already successfully used to study microbial communities in dairy technologies [11,12]. In this respect, we looked for a technique that may substitute NGS in the case of low-complexity communities. Here, we describe a novel approach that was used to compare the diversity of Lactococcus phages from dairy industrial and spontaneous fermentation processes. The new method is based on well-established and affordable techniques accompanied by simple bioinformatics analysis to study the diversity of the microbial community.

Cheese Samples and DNA Isolation
A series of curd cheese samples derived from Polish cheese production facilities (n = 25) made from cow's milk and spontaneous non-pasteurized milk fermentations from in-house cheese production (n = 14) made from cow's (n = 8), goat's (n = 5) or sheep's (n = 1) milk were collected in 2017-2018. Metagenomic DNA was isolated from the samples of cheeses using Genomic Mini AX Food Kit (A&A Biotechnology, Gdańsk, Poland), according to the kit procedure, and used as the template for PCR reaction.

Sequence Analysis
The sequencing trace files (.ab1) were inspected for sequencing quality using Sequence Scanner Software 2 (Applied Biosystems, Warsaw, Poland). The sequences were submitted to BLAST service analysis [14] to recover the best-matched sequences set from the NCBI database. The sequence set obtained from BLAST results was used to build a local database and reference matrix composed of sequence distance measures. The measures were calculated in the local alignment of the ambiguous sequence separately to all sequences from the BLAST result sequences set. A principal component analysis (PCA) plot further simplified the interpretation of the community diversity. The bioinformatics analyses were performed in R [15] using DECIPHER [16] and PCAtools [17] libraries.

Phage Diversity Analysis between Samples
To compare the diversity of Lactococcus phages between samples, we performed PCR amplification in separate PCR reactions with single type-specific primers pair for each of Appl. Sci. 2021, 11, 8915 4 of 8 the phage types. In the case in which several samples of metagenomic DNA were obtained from spontaneous fermentation, multiple amplicons were generated with the use of single pair of type-specific primers. Such amplicons were purified from gel slabs cut out from electrophoretic separation gel before submission to sequencing. Inspection of sequencing trace files indicated ambiguous nucleotide positions (data not shown) that resulted from sequencing of heterogenic amplicons obtained on metagenomic DNA. We performed multiple alignments of the obtained sequences with DECIPHER library [16] and with a substitution matrix accounting for ambiguous nucleotides for all IUPAC nucleic acid codes based on match and mismatch parameters (match = 5, mismatch = −4) and calculated a distance matrix for each sequence set with Jukes-Cantor substitution model used for distance correction [18]. Each element of the distance matrix corresponds to the dissimilarity between two sequences in the sequence set.
Comparison of the phylogenetic trees generated for each of the Lactococcus phage types (Figures 2-4) indicated that the P335 type phages from dairy plant cheeses cluster together, whereas the phages from spontaneous fermentations are more divergent.

Phage Diversity Analysis between Samples
To compare the diversity of Lactococcus phages between samples, we performed PCR amplification in separate PCR reactions with single type-specific primers pair for each of the phage types. In the case in which several samples of metagenomic DNA were obtained from spontaneous fermentation, multiple amplicons were generated with the use of single pair of type-specific primers. Such amplicons were purified from gel slabs cut out from electrophoretic separation gel before submission to sequencing. Inspection of sequencing trace files indicated ambiguous nucleotide positions (data not shown) that resulted from sequencing of heterogenic amplicons obtained on metagenomic DNA. We performed multiple alignments of the obtained sequences with DECIPHER library [16] and with a substitution matrix accounting for ambiguous nucleotides for all IUPAC nucleic acid codes based on match and mismatch parameters (match = 5, mismatch = −4) and calculated a distance matrix for each sequence set with Jukes-Cantor substitution model used for distance correction [18]. Each element of the distance matrix corresponds to the dissimilarity between two sequences in the sequence set.
Comparison of the phylogenetic trees generated for each of the Lactococcus phage types (Figures 2-4) indicated that the P335 type phages from dairy plant cheeses cluster together, whereas the phages from spontaneous fermentations are more divergent.  Lactococcus phage types (M-molecular weight marker, NTC-non-template control).

Phage Diversity Analysis between Samples
To compare the diversity of Lactococcus phages between samples, we performed PCR amplification in separate PCR reactions with single type-specific primers pair for each of the phage types. In the case in which several samples of metagenomic DNA were obtained from spontaneous fermentation, multiple amplicons were generated with the use of single pair of type-specific primers. Such amplicons were purified from gel slabs cut out from electrophoretic separation gel before submission to sequencing. Inspection of sequencing trace files indicated ambiguous nucleotide positions (data not shown) that resulted from sequencing of heterogenic amplicons obtained on metagenomic DNA. We performed multiple alignments of the obtained sequences with DECIPHER library [16] and with a substitution matrix accounting for ambiguous nucleotides for all IUPAC nucleic acid codes based on match and mismatch parameters (match = 5, mismatch = −4) and calculated a distance matrix for each sequence set with Jukes-Cantor substitution model used for distance correction [18]. Each element of the distance matrix corresponds to the dissimilarity between two sequences in the sequence set.
Comparison of the phylogenetic trees generated for each of the Lactococcus phage types (Figures 2-4) indicated that the P335 type phages from dairy plant cheeses cluster together, whereas the phages from spontaneous fermentations are more divergent.

Global Phage Diversity Analysis
The BLAST search of the obtained phage sequences resulted in a hit list of Lactococcus phage sequences of high sequence identity. However, the degenerate nucleotide codes are treated by the BLAST algorithm as mismatches in nucleotide alignment [14]. The hit lists for all of the sequence searches were combined, and duplicate records were removed to build a local database. The local database was used to calculate the reference matrix of sequence distance measures with DECIPHER library [16], with substitution matrix accounting for ambiguous nucleotides for all IUPAC nucleic acid codes, as described previously. This way, we created sample composition-independent measures for global phage diversity analysis. The global distance measures were used for principal component analysis (PCA) to visualize differences and similarities in phage-type composition in analyzed samples. Sequences derived from BLAST search hits used for distance measures cluster tightly together (Figure 5), and metagenome-derived sequences form on the plot distant, more dispersed clusters separate for dairy plant and spontaneous fermentation cheeses with single outliers DP02(c2), SF01(936), and SF06(P335).

Global Phage Diversity Analysis
The BLAST search of the obtained phage sequences resulted in a hit list of Lactococcus phage sequences of high sequence identity. However, the degenerate nucleotide codes are treated by the BLAST algorithm as mismatches in nucleotide alignment [14]. The hit lists for all of the sequence searches were combined, and duplicate records were removed to build a local database. The local database was used to calculate the reference matrix of sequence distance measures with DECIPHER library [16], with substitution matrix accounting for ambiguous nucleotides for all IUPAC nucleic acid codes, as described previously. This way, we created sample composition-independent measures for global phage diversity analysis. The global distance measures were used for principal component analysis (PCA) to visualize differences and similarities in phage-type composition in analyzed samples. Sequences derived from BLAST search hits used for distance measures cluster tightly together ( Figure 5), and metagenome-derived sequences form on the plot distant, more dispersed clusters separate for dairy plant and spontaneous fermentation cheeses with single outliers DP02(c2), SF01(936), and SF06(P335).

Discussion
Our study indicated a higher prevalence of the Lactococcus phage types detected in curd cheeses from Polish dairy plants than previous reports [6,19], which indicates that the problem of phage prevalence in dairy plant environments persists. We also surveyed in-house prepared cheeses from non-pasteurized milk based on spontaneous fermentation resulting from native milk microbiota. Based on our findings, we can assume that the native milk microbiota contains a much higher diversity of phages and L. lactis strains (as sources of prophages). Especially in the case of c2 and P335 phage types, we obtained nonspecific size amplicons that were of phage origin. The P335 nonspecific amplicons might be derived from prophages of P335-type or phage DNA remnants in chromosomes of L. lactis. The higher diversity of spontaneous fermentation cheeses can be explained by the sensitivity of some phages to thermal treatment [20]. We did not find any correlation between milk type and phage diversity measures.
Here, we described also a simple and easy way to study the diversity and dynamics of a low-complexity microbial community with well-established and affordable techniques and a simple bioinformatics pipeline. Previously reported methods employed laborious and susceptible strains, requiring methods involving phage genomic DNA isolation and sequencing [20], low-resolution sequence analysis with RFLP [6,21], or MLST [22]. Our method involved PCR amplification of metagenomic DNA using universal primers hybridizing to conserved regions spanning variable sequences (e.g., in 16S rDNA, ITS, etc.). The obtained pool of amplicons was sequenced with the Sanger method, resulting in a read-out sequence with ambiguous bases at several positions. Care had to be taken to submit amplicons of the same length for Sanger sequencing to prevent shifts of homologous sequence regions or the possibility of taking into account only selected parts of the homologous regions proximal to the sequencing primer. In case of large differences in amplicon length, electrophoretic separation and gel purification could solve the problem. The resulting sequence corresponds to the consensus sequence, as it would be generated by bioinformatics sequence alignment of several homologous sequences. The sequences from different samples can be aligned, then the distance between them calculated with DECI-PHER library [16] using the substitution matrix accounting for ambiguous nucleotides for all the IUPAC nucleic acid codes. Such analysis allows comparing samples composition or dynamics (e.g., time series) in form of phylogenetic tree or PCA plots. To generate the PCA plots on a global scale (sample composition-independent context), a local database of homologous sequences is needed. The local database can be easily prepared by submitting a sequence to BLAST search and downloading the BLAST hits. The BLAST algorithm accepts the IUPAC ambiguity codes; however, it treats them as mismatches in the alignment. Therefore, a significant amount of information is lost. The sequence set built on the BLAST hits is used to build a local database and a reference matrix composed of sequence distance measures. The measures were calculated in the local alignment of the ambiguous sequence separately to all sequences from the BLAST recovered sequences set. A phylogenetic tree was generated showing the most similar samples. A principal component analysis plot further simplified the interpretation of samples diversity or dynamics.
The metagenome DNA from dairy plant and spontaneous fermentation in-house prepared curd cheeses analyzed for diversity within major types of Lactococcus phages revealed the distinct composition of phages detected in cheeses from dairy plant and in-house productions and higher diversity of phages from spontaneous fermentation on non-pasteurized milk, as compared to starter culture-based productions.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/app11198915/s1, a text file with the script for bioinformatic analyses in R environment (applsci.R), a text file with the sequences in FASTA format (File_S1.fas), and data objects for R environment (image_applsci.Rdata).