Soil pH, Calcium Content and Bacteria as Major Factors Responsible for the Distribution of the Known Fraction of the DNA Bacteriophage Populations in Soils of Luxembourg

Bacteriophages participate in soil life by influencing bacterial community structure and function, biogeochemical cycling and horizontal gene transfer. Despite their great abundance, diversity, and importance in microbial processes, they remain little explored in environmental studies. The influence of abiotic factors on the persistence of bacteriophages is now recognized; however, it has been mainly studied under experimental conditions. This study aimed to determine whether the abiotic factors well-known to influence bacteriophage persistence also control the natural distribution of the known DNA bacteriophage populations. To this end, soil from eight study sites including forests and grasslands located in the Attert River basin (Grand Duchy of Luxembourg) were sampled, covering different soil and land cover characteristics. Shotgun metagenomics, reference-based bioinformatics and statistical analyses allowed characterising the diversity of known DNA bacteriophage and bacterial communities. After combining soil properties with the identified DNA bacteriophage populations, our in-situ study highlighted the influence of pH and calcium cations on the diversity of the known fraction of the soil DNA bacteriophages. More interestingly, significant relationships were established between bacteriophage and bacterial populations. This study provides new insights into the importance of abiotic and biotic factors in the distribution of DNA bacteriophages and the natural ecology of terrestrial bacteriophages.


Introduction
Soil is a complex compartment where biodiversity and interactions between microbes, animals and plants contribute to its life and optimal functioning. Generally, plants provide organic matter as well as nutrients to soil (e.g., nitrogen, carbon, phosphorous) and are responsible for its structure through the development of the rhizosphere [1,2]. Soil-dwelling animals help to degrade and mix organic matter throughout the soil and participate in the recycling of nutrients from dead biomass [3,4]. Microbes, including fungi, archaea and bacteria, play a key role in organic matter decomposition, element cycling and nutrient availability to plants [5][6][7]. This microbial activity, particularly due to the quantity, diversity and functional roles of prokaryotes, is considered central to soil processes [8][9][10]. Viruses are also biological entities fully part of soil biodiversity with essential but relatively unexplored functions [11].
Since the beginning of research on viral diversity in soils in the 1970s [11,12], knowledge of soil viruses has been expanded well, especially for bacteriophages (bacteriainfecting viruses), whose abundance can reach up to 10 10 particles per gram of soil [13,14].

Studied Area
The Attert River basin area reaches 310 km 2 and covers a wide range of bedrock geologies and land use [40], ultimately being representative of Luxembourg's main physiographic characteristics. Its northern boundary lies on schist bedrock (belonging to the Ardennes massif; Oesling region), while its southern part exhibits alternating sedimentary layers of marls and sandstone (representative of the eastern limit of the Paris basin; Gutland region). The Attert River originates near Nobressart (Belgium) and extends over 38 km from west to east, joining the Alzette River basin in Bissen, Luxembourg. The study is focused on the Luxembourg side of the Attert River basin ( Figure 1, Table 1) where 8 sub-basins have been selected compiling 3 forested sites and 5 grassland sites. All of them are located near one of the tributaries of the Attert River. the natural distribution of the known DNA bacteriophage populations in the studied soil samples, and to analyse the role of bacterial populations in this phenomenon. Two main hypotheses were tested: (i) abiotic factors (i.e., pH, cations) well known to affect bacteriophage survival also have an impact on their diversity and distribution in terrestrial environments, and (ii) the bacterial community present in the soil influences the relationship between bacteriophages and soil properties.

Studied Area
The Attert River basin area reaches 310 km 2 and covers a wide range of bedrock geologies and land use [40], ultimately being representative of Luxembourg's main physiographic characteristics. Its northern boundary lies on schist bedrock (belonging to the Ardennes massif; Oesling region), while its southern part exhibits alternating sedimentary layers of marls and sandstone (representative of the eastern limit of the Paris basin; Gutland region). The Attert River originates near Nobressart (Belgium) and extends over 38 km from west to east, joining the Alzette River basin in Bissen, Luxembourg. The study is focused on the Luxembourg side of the Attert River basin ( Figure 1, Table 1) where 8 subbasins have been selected compiling 3 forested sites and 5 grassland sites. All of them are located near one of the tributaries of the Attert River. Figure 1. Map of the Attert River basin with the location of the eight sampling sites (modified from Geoportail.lu). Brown colours: forest sites; green colours: grasslands. With an oversimplified scheme, the plain black line represents the border between the Luxembourg and Belgian sides of the basin while the dashed red line is the limit between Oesling and Gutland regions. The Attert River is highlighted in pink, the affluents of the Attert River are in blue and the affluents of affluents, in orange.  49.7347405 (E) 5.888961 Figure 1. Map of the Attert River basin with the location of the eight sampling sites (modified from Geoportail.lu). Brown colours: forest sites; green colours: grasslands. With an oversimplified scheme, the plain black line represents the border between the Luxembourg and Belgian sides of the basin while the dashed red line is the limit between Oesling and Gutland regions. The Attert River is highlighted in pink, the affluents of the Attert River are in blue and the affluents of affluents, in orange. The first forest site is the Weierbach sub-basin located in the northwest of the Attert River basin, in the Oesling region. The two other forest sub-basins are Retgenbusch and Daerent, located on the same tributary of the Attert River basin in the southwestern part of the river (Gutland region). All forest sites have distinct vegetation types; Weierbach is a beech forest, while Daerent is an oak forest and Retgenbusch is a spruce forest.
The first grassland site, Hueschterbach, is the only sub-basin located in the Oesling region and is characterised by schist bedrock. The four other grassland sites are in the Gutland region: Mollbach a sub-basin located in the southwest of the Attert River basin; Pall located in the north of the Mollbach sub-basin, for which two distinct sites (Pall 1 and Pall 2) were chosen on the tributary of the same name; the sub-basin Koulbich located in the northwest of the Attert River basin is known for pasture activities of barley and wheat cultures.

Sample Collection
The field campaign was conducted on 12 September 2019. The topsoil was collected for all selected sites cited above. Five to seven replicates were randomly sampled within a delimited surface of 25 m 2 , with a soil auger (3 cm in diameter) on the first 20 cm of depth. To exclude the intra-site spatial variability and to ease laborious shotgun metagenomic and bioinformatic analyses, 5 to 7 soil replicates were pooled altogether to set up one sample per site, representing around 300 g per site [41][42][43]. All soil samples were split into two parts, either for physicochemical analyses or stored at −20 • C for molecular analyses.

Soil Characterisation
The characterisation of the soil properties includes first the pH measurements made on 10 g of soil according to the ISO 10,390 [44] for pH H2O and pH KCl and to VDLUFA A5.1.1 [45] for pH CaCl2 . Then, the nutrients dosing of phosphorus (P 2 O 5 ) and potassium (K 2 O) was performed according to the VDLUFA A6.2.1.1 [46] using 5 g of soil with 100 mL of Ca-acetate-lactate at pH = 4.1. On the other hand, the dosing of magnesium (Mg) and sodium (Na) was conducted following the VDLUFA A6.2.1.7 [47], using 10 g of soil with 100 mL of CaCl 2 (0.01 M). In addition, the carbon-to-nitrogen ratio (C/N) was calculated after quantifying the total organic carbon (C org ) and the total nitrogen (N tot ) both on 300 g of soil, according to ISO 10,694 [48] and ISO 13,878 [49], respectively. The cation exchangeable capacity (CEC) and cation saturation rate (S/T) were analysed according to the ISO 23,470 [50] on 2.5 g of soil. From the extract CoHex, calcium (Ca 2+ ), magnesium (Mg 2+ ), potassium (K + ) and sodium (Na + ) CoHex were analysed by ICP-OES (inductively coupled plasma-optical emission spectrometry). Finally, the soil granulometry was determined by a sieving method on 10 g of soil mixed with 100 mL of H 2 O 2 and 10 mL HCl [51].

Viral Elution and Concentration
After the optimisation of the elution and concentration protocol (Supplementary Materials, Section S1), 5 g of soil, previously stored at −20 • C, was placed at room temperature and then mixed with 15 mL of the eluent solution, i.e., 3% beef extract and 0.05 M glycine (at pH = 9.5) and shaken vigorously using an orbital shaker for 30 min at 250 rpm (rotation per minute) at 4 • C. The samples were then centrifuged (Eppendorf 5801R) at 5000× g for 15 min on a fixed rotor at 4 • C. The supernatant was collected and underwent a 0.22 µm Millex ® sterile filter unit with a Millipore Express ® PES membrane (SLGP033RS, Merck Millipore Ltd., Tullagreen, Ireland), followed by ultracentrifugal filtration using an Amicon Ultra-15 50 kD (UFC905024, Merck, Darmstadt, Germany), upon the recommendations of the manufacturer. After centrifugation at 5000× g for 15 min on a fixed rotor at 4 • C, the concentrate solutions, with a volume varying from 0.3 to 3 mL depending on the site, were finally transferred into 2 mL cryotubes and stored at 4 • C.

Viral-like Particles (VLPs) Counting
From the viral concentrates obtained after the viral elution and concentration steps, the abundances of VLPs were determined by flow cytometry (i.e., a benchtop FACSCalibur), after the samples were divided into 3 technical replicates. Briefly, after homogenisation by centrifugation, the viral sample was fixed using glutaraldehyde (0.25% final concentration) and mixed with autoclaved and <0.02 µm filtered TE buffer (0.1 mM Tris and 1 mM EDTA, From the total extracted DNA, library preparations were completed using the Nextera DNA Flex Kit (Illumina, San Diego, CA, USA) according to the manufacturer's instructions. Paired-end sequence reads were generated using the Illumina NovaSeq 600 (2 × 150 bp).

Reference-Based Bioinformatics
The raw sequencing reads were quality controlled and Illumina adapters were trimmed with Trimmomatic 0.39 [53]. Assemblies were performed separately for each sample using SPAdes v3.15.4 [54] with the "meta" option and default parameters Taxonomic classification of contigs was carried out against the complete non-redundant (nr) microbial protein sequence database (released version of April 2022) from NCBI using MMseqs2 v13.45111 [55] and the easy-taxonomy workflow that determines ancestors by searching translated ORFs (translation Table 11) [56]. Quality-filtered reads from each metagenome were mapped back to the assembled contigs using minimap2 v2.24 with the "sr" preset [57] and sorted and indexed with SAMtools v1.15.1 [58]. Finally, metagenomic profiles were exported as comma-separated value (CSV) files. Metagenomic abundances were calculated based on the number of reads mapping to each contig. Mapped reads were divided by contig length and the total number of mapped reads per sample [59,60].

Statistical Analyses
Statistical analyses were performed with RStudio, version 1.4.1106, and R 4.0.4 [61]. Figures were generated using the package 'ggplot2'. Regarding the ecological diversity, both α and β-diversities of bacteria and DNA bacteriophages were studied. The α-diversity is the average number of distinct species groups at a local scale, considering the richness of a population [62]. As popular α-diversity indexes, the Shannon index (H) and the species richness (S) were calculated for each site, using the R package 'vegan' [63]. Then, the evenness (E), which defines the distribution of the abundance of the species in a community, was calculated as follows [64,65]: The β-diversity is all changes in the species composition of a population between several habitats in a region. This diversity considers both the richness and evenness of populations [66]. Using Bray-Curtis dissimilarities, a multivariate method, the RDA (redundancy analysis), was performed to compare the different sites as a function of the bacteriophage populations at the genus level, through the analysis of their variation from one another. To build the RDA, a selection of the 10 most abundant genera was first performed, as rare genera might generate noise in the analysis). Then, abiotic factors (i.e., sand, clay, silt content, soil pH, magnesium concentration, carbon-to-nitrogen ratio and Ca 2+ concentration) were selected by removing the variables for which strong correlations (>r = 0.80) were found to avoid co-variations, using the function 'pairs'. In addition, all abiotic factors that significantly constructed the RDA were kept (i.e., pH and Ca 2+ ). The determination of the explanatory variables for the RDA was conducted using 'vegan' and correlations of the abiotic factors with each axis were defined using the package 'psych'. Permutational multivariate analysis of variance (PERMANOVA) was conducted to establish the impact of some abiotic factors on the distribution of the DNA bacteriophages. From the calculation of the Bray-Curtis dissimilarities to the PERMANOVA and RDA, the package 'vegan' was used to conduct these analyses. Finally, a co-correspondence analysis (CoCA) was performed to assess the co-variation of the distribution of the total bacterial and bacteriophage genera obtained from the metagenomic analyses, among the different sites. This analysis is based on contingency table data and highlights co-similarities between two groups of variables (here, bacterial species data and DNA bacteriophage genera data). CoCA was performed using the package 'coca', combined with a Mantel test using Pearson correlations.

Soil Description
The granulometric analysis of the soil revealed similar textures between the sites (Table 2). Indeed, selected sites are mainly silty, especially Mollbach and Retgenbusch, for which the silt content is around 52-58%. However, Weierbach and Pall 1 are clay loam with approximately 42% silt, while Daerent and Huechterbach are sandy loam with 53% sand. Concerning Pall 2, the proportion for the different textures are rather equal (28% clay, 38% silt, 34% sand). Koulbich is a loam soil, with a similar proportion of silt and sand (≈40%) but little clay (19%). The gravimetric water content (GWC) was the highest for Weierbach (24.6%) and Pall 1 (20.9%) and the lowest for Koulbich (10.8%).
The physical-chemical properties of soils are reported in Table 2. Within the forest sites, the soils of Weierbach and Retgenbusch are highly acidic (respectively, pH = 4.1 and 4.5) and their respective cation exchange capacity and calcium concentration are quite low (CEC of Retgenbusch = 2.6 milliequivalent per 100 g (meq per 100 g) and Ca 2+ = 1.6 meq per 100 g, CEC of Weierbach = 8.5 meq per 100 g and Ca 2+ = 2.1 meq per 100 g). Their nutrient content was also relatively poor, especially for phosphorus (P 2 O 5 ) and potassium (K 2 O), suggesting that these soils are not very fertile. In addition, the cation saturation rate (S/T) of Retgenbusch is superior to 85% (S/T = 101%), which can explain the high calcium content in the soil, while the soil of Weierbach shows a regular calcium intake (20% ≤ S/T = 42% ≤ 85%). The soil of Weierbach is distinguished by a slow organic matter decomposition (carbon-to-nitrogen ratio or C/N = 18 and high C org content), displayed by a need for nitrogen to ensure an efficient organic decomposition. The soil of Daerent is fertile (CEC = 17.1 meq per 100 g, pH = 6.3, K 2 O = 21 mg per 100 g of dry soil) and has a maximum decomposition rate (C/N = 10). Regarding the grassland sites, Pall 1, Pall 2 and Mollbach soils are all fertile (CEC = 23.7, 17.2, 14.3 meq per 100 g, respectively) with a high concentration of calcium (Ca 2+ = 16.6, 15.1 and 13.1 meq per 100 g), while Koulbich and Hueschterbach are not quite (CEC = 7.4 and 8.5 meq/100 g, respectively) with lower calcium concentrations (Ca 2+ = 5.8 and 8.7 meq per 100 g). However, all of them are saturated in calcium, with a cation saturation rate superior to 100%. Finally, the grassland soils have a rapid decomposition (C/N ≈ 8.5) with a maximum organic decomposition for Pall 1 (C/N = 9.8) and have a favourable pH (pH ≈ 6-6.5) and mineral nutrient for plant growth. Table 2. Physical-chemical properties of the eight studied soils: GWC: gravimetric water content, C org : organic carbon, N tot : total nitrogen, CEC: cation exchangeable capacity, S/T: cation saturation rate, Ca 2+ : calcium concentration.

VLPs Enumeration
The average concentration of VLPs found in every soil was about 10 10 VLPs per gram of dry soil ( Figure 2). The highest concentrations were recorded for the soils of Retgenbusch, Hueschterbach, Daerent and Pall 2 with approximately 1.5 × 10 10 VLPs per gram of dry soil. The lowest concentration was for the soil of Weierbach with 3.3 × 10 9 VLPs per gram of dry soil. and displayed a 0.48 log difference with the lowest VLPs concentration (after the Weierbach), Koulbich and 0.67 log with Hueschterbach, the highest VLPs concentration. Finally, Koulbich showed on average a lower concentration of 0.18 log compared to all the other sites.

VLPs Enumeration
The average concentration of VLPs found in every soil was about 10 10 VLPs per gram of dry soil ( Figure 2). The highest concentrations were recorded for the soils of Retgenbusch, Hueschterbach, Daerent and Pall 2 with approximately 1.5 × 10 10 VLPs per gram of dry soil. The lowest concentration was for the soil of Weierbach with 3.3 × 10 9 VLPs per gram of dry soil. and displayed a 0.48 log difference with the lowest VLPs concentration (after the Weierbach), Koulbich and 0.67 log with Hueschterbach, the highest VLPs concentration. Finally, Koulbich showed on average a lower concentration of 0.18 log compared to all the other sites.

Taxonomic Classification
The statistics of the shotgun metagenomic sequencing are reported in Table 3.

Taxonomic Classification
The statistics of the shotgun metagenomic sequencing are reported in Table 3. The shotgun sequencing produced between 40 to 57 million paired reads per sample generating an average of 6.2 million contigs after assembly. From 75 to 99% of the total raw reads mapped back to the contigs. Of these mapped reads, between 0.01% and 0.04% were virus-affiliated reads (Supplementary Table S1), corresponding to a total of 37 known viral families for all sites included. Among these families, 13 DNA bacteriophage families were identified and represented between 89% and 96% of all virus-affiliated reads ( Figure 3A). The DNA bacteriophage composition was strongly characterised by the Caudovirales order with between 75% and 93% of the total metagenomics identification. This high proportion was mainly due to the high abundance of Siphoviridae (13.9% to 49%), Myoviridae (5% to 12%) and Podoviridae (4% to 10%). Caudovirales gathers all the identified DNA bacteriophage families, except for Autolykiviridae (unclassified order) in Daerent and Mollbach soils, Microviridae (Petitvirales) and Tectiviridae (order Kalamavirales) both present in every site. By investigating each site, Siphoviridae was found to always be the most identified family, especially in the soils of Pall 1, Hueschterbach, Mollbach, Koulbich and Pall 2 (with, respectively, 49.58%, 44.83%, 35.15% and 34.80%) while its abundance was the lowest for Retgenbusch (13.9%). Regarding Myoviridae and Podoviridae, the former was highly abundant in Koulbich (10.65%), Retgenbusch (10.9%) and Weierbach (12.3%) while the latter was rather highly present in the soils of Weierbach (10.2%) and Mollbach (10.3%). It is noteworthy that most of the assigned DNA bacteriophages from the libraries belonged to the unclassified Caudovirales (between 29% and 51%). Of the two ssDNA bacteriophage families, i.e., Inoviridae and Microviridae, both were detected by the metagenomics with a low abundance in all sites (between 0.09% and 1.6% for the former and between 0.06% and 1% for the latter), while all the previously mentioned families are dsDNA bacteriophages. Saphexavirus (Siphoviridae), while in Retgenbusch, Borockvirus (Myoviridae) was the most abundant genus. Pall 1 was found to be mainly represented by Scapunavirus (Siphoviridae) followed by Bingvirus (Siphoviridae). Scapunavirus was also found in a slightly higher abundance in Daerent and Hueschterbach. Koulbich and Pall 2 did not display high abundances for specific genera, but still showed the presence of Fromanvirus (Siphoviridae) besides Lessievirus.  The DNA bacteriophage α-diversity of the known microbial fraction of each soil was investigated using the Shannon index, species evenness and species richness ( Table 4). All sites displayed similar diversity (around H = 2) where Retgenbusch was the least diverse soil (H = 2.31) and Weierbach the most one (H = 2.83). Indeed, the soil of Weierbach presented the lowest number of DNA bacteriophage species (S = 91) and the highest evenness (E = 0.63). A one-way analysis of variance (one-way ANOVA) was performed on the Shannon index to analyse the hypothetical impact of the selected soil properties on the DNA bacteriophage diversity within the sites. As a result, none of the soil properties showed a significant effect on the DNA bacteriophage alpha-diversity. Table 4. Shannon diversity index (H), evenness (E) and richness (S) on genus level of DNA bacteriophage from the known fraction of the population from the soil of the studied sites. To visualise the composition of the DNA bacteriophage population, a heatmap on the log-transformation abundances, at the genus level, was performed on the 10 most abundant genera, all sites included ( Figure 3B). Genus Lessievirus (Podoviridae) was detected in every site, and more specifically in the soil of Weierbach, for which Mieseafarmvirus (Myoviridae) was also present. The soil of Mollbach was specifically represented by Saphexavirus (Siphoviridae), while in Retgenbusch, Borockvirus (Myoviridae) was the most abundant genus. Pall 1 was found to be mainly represented by Scapunavirus (Siphoviridae) followed by Bingvirus (Siphoviridae). Scapunavirus was also found in a slightly higher abundance in Daer-ent and Hueschterbach. Koulbich and Pall 2 did not display high abundances for specific genera, but still showed the presence of Fromanvirus (Siphoviridae) besides Lessievirus. The DNA bacteriophage α-diversity of the known microbial fraction of each soil was investigated using the Shannon index, species evenness and species richness ( Table 4). All sites displayed similar diversity (around H = 2) where Retgenbusch was the least diverse soil (H = 2.31) and Weierbach the most one (H = 2.83). Indeed, the soil of Weierbach presented the lowest number of DNA bacteriophage species (S = 91) and the highest evenness (E = 0.63). A one-way analysis of variance (one-way ANOVA) was performed on the Shannon index to analyse the hypothetical impact of the selected soil properties on the DNA bacteriophage diversity within the sites. As a result, none of the soil properties showed a significant effect on the DNA bacteriophage alpha-diversity. Table 4. Shannon diversity index (H), evenness (E) and richness (S) on genus level of DNA bacteriophage from the known fraction of the population from the soil of the studied sites.

β-Diversity: Known DNA-Bacteriophage Diversity between Sites
The β-diversity analysis allowed comparing the DNA bacteriophage subset populations from one site to another. The PERMANOVA test (Table 5) performed on the DNA bacteriophage genera abundancy did not reveal a significant effect of the vegetation (i.e., forest and grasslands) on the distribution of the DNA bacteriophages (p-value = 0.075, α = 0.05). Table 5. PERMANOVA tests to analyse the influence of the selected abiotic factors on the distribution of the known DNA bacteriophage genera (α = 0.05). However, soil pH and calcium concentration were associated with the differences in the DNA bacteriophage compositions between the sites (p-value = 0.003 and 0.008, respectively, α = 0.05). A redundancy analysis (afterwards called RDA; Figure 4) was carried out to analyse the variations of bacteriophage genera depending on the soil properties of the sites. The RDA was built over the two first dimensions, covering together 67.5% of the fitted variation. Among all the abiotic factors, soil pH was the explanatory variable that had a significant weight on the construction of the RDA (p-value = 0.008, α = 0.05). On the one hand, the axis RDA1 was significantly correlated at 88.4% with soil pH (p-value < 0.01, α = 0.05) and at 79.2% with calcium cation (p-value < 0.01, α = 0.05). The RDA2, on the other hand, was not significantly correlated with any of the tested soil properties.

Environmental Variables p-Value
Significance: ** p < 0.01. However, soil pH and calcium concentration were associated with the differences in the DNA bacteriophage compositions between the sites (p-value = 0.003 and 0.008, respectively, α = 0.05). A redundancy analysis (afterwards called RDA; Figure 4) was carried out to analyse the variations of bacteriophage genera depending on the soil properties of the sites. The RDA was built over the two first dimensions, covering together 67.5% of the fitted variation. Among all the abiotic factors, soil pH was the explanatory variable that had a significant weight on the construction of the RDA (p-value = 0.008, α = 0.05). On the one hand, the axis RDA1 was significantly correlated at 88.4% with soil pH (p-value < 0.01, α = 0.05) and at 79.2% with calcium cation (p-value < 0.01, α = 0.05). The RDA2, on the other hand, was not significantly correlated with any of the tested soil properties.  Along the RDA1 axis, both soil pH and calcium concentration explained the sites distributions. Indeed, Weierbach and Retgenbusch were displayed apart from the other sites, as they were the most acidic soils (pH = 4.1 and 4.5, respectively) and had the lowest concentration in calcium (Ca 2+ = 21 and 16 positive charges per kg of soil, respectively). The RDA showed a particular correlation between the sites of Weierbach and Retgenbusch and the genus Borockvirus. On the contrary, Pall 1, Huescherbach and Daerent presented the highest soil pH (pH = 6.8, 6.4 and 6.3, respectively) with relatively high concentrations of calcium (Ca 2+ = 166, 126 and 87 positive charges per kg of soil, respectively). In terms of DNA bacteriophage genera, these soil properties (i.e., pH and Ca 2+ ) showed a negative correlation with Bingvirus and Scapunavirus. Finally, Mollbach showed a particular distribution, related to the magnesium content, as indeed its soil presented the highest concentration of magnesium (Mg = 52 mg/100 g of soil). Additionally, Saphexavirus displayed a positive correlation specifically to the magnesium concentration.

Known Bacterial Populations and Connections with the Known DNA-Bacteriophage Communities
As reported in Supplementary Table S1, bacterial sequences represented 74 to 77% of the total microbial sequences identified in the libraries. First, bacterial populations of all sites were mainly composed of the following phyla, Acidobacteria, Actinobacteria, Proteobacteria and Verrucomicrobia (see Supplementary Figure S1). As for the DNA bacteriophages, the taxonomical composition of the bacterial populations was then analysed at the genus level by selecting the most abundant bacteria (abundance above 1000 mapped reads per species, all sites included). Retgenbusch and Weierbach showed similar patterns in bacterial genera with the lowest soil bacterial diversity (H = 2.78 and 2.89, respectively) (see Supplementary Table S2) and similar bacterial composition as is visible in Figure 5.
sites were mainly composed of the following phyla, Acidobacteria, Actinobacteria, Proteobacteria and Verrucomicrobia (see Supplementary Figure S1). As for the DNA bacteriophages, the taxonomical composition of the bacterial populations was then analysed at the genus level by selecting the most abundant bacteria (abundance above 1000 mapped reads per species, all sites included). Retgenbusch and Weierbach showed similar patterns in bacterial genera with the lowest soil bacterial diversity (H = 2.78 and 2.89, respectively) (see Supplementary Table S2) and similar bacterial composition as is visible in Figure 5.  In addition, these two sites showed differences in bacterial compositions from the other sites. Indeed, Regtenbusch and Weierbach showed higher abundance for the genera Trebonia, Edaphobacter and Mycobacterium, for which the relative abundances were lower in the other soils. Regarding the other genera, with the exception of Streptomyces and Bradyrhizobium, all of them showed high abundance for all sites except for Regtenbusch and Weierbach. Finally, Streptomyces and Bradyrhizobium, were detected in every soil with the highest abundance reported for the latter. The differences in bacterial community composition between the sites were studied through a PERMANOVA test, performed on the bacteria genera abundance. As noticed for the DNA bacteriophages, the vegetation (i.e., forest and grasslands) did not significantly impact the bacteria distribution (p-value = 0.121, α = 0.05). In addition, soil pH and calcium concentration also displayed significant effect on the bacterial compositions between the sites (p-value = 0.006 and 0.002, respectively, α = 0.05). Contrary to the observations on the DNA bacteriophages, magnesium concentration played a significant role on the bacterial composition (p-value = 0.027, α = 0.05). As for the DNA bacteriophage investigation, RDA was carried out to analyse the variations of bacterial genera depending on the same selected soil properties of the sites (Figure 6). The RDA was built over the two first dimensions, covering together 93.77% of the fitted variation. Among all the abiotic factors, soil pH was the sole explanatory variable that had a significant weight on the construction of the RDA (p-value = 0.01, α = 0.05). On one hand, the axis RDA1 was significantly correlated at 92.16% with soil pH (p-value < 0.01, α = 0.05) and at 79.21% with calcium (p-value < 0.01, α = 0.05). On the other hand, the RDA2 was not significantly correlated with any of the tested soil properties. Weierbach and Retgenbusch were found apart from all the other sites, and both related to only Edaphobacter and Mycobacterium.
As DNA bacteriophages and bacteria are known for being exclusively linked to one another, the covariation of these two communities was analysed using a symmetric CoCA. This ordination method was built over the two first dimensions that covered 67.4% of the total covariation (Figure 7). It allows exploring the distribution of the whole known bacterial and DNA bacteriophage subset populations at the genera level by analysing both simultaneously. The patterns for both bacterial and DNA bacteriophage CoCA were similar and confirmed the distinct group formed by Weierbach and Retgenbusch. The latter was displayed offset from the other sites while Daerent, Pall 1, Pall 2, Mollbach and Hueschterbach were similar to each other on both bacterial and bacteriophage CoCA. A Mantel test using Pearson correlation carried on DNA bacteriophage and bacterial genera confirmed a strong positive correlation (R-value = 0.75, significance = 0.001).
concentration played a significant role on the bacterial composition (p-value = 0.027, α = 0.05). As for the DNA bacteriophage investigation, RDA was carried out to analyse the variations of bacterial genera depending on the same selected soil properties of the sites ( Figure 6). The RDA was built over the two first dimensions, covering together 93.77% of the fitted variation. Among all the abiotic factors, soil pH was the sole explanatory variable that had a significant weight on the construction of the RDA (p-value = 0.01, α = 0.05). On one hand, the axis RDA1 was significantly correlated at 92.16% with soil pH (p-value < 0.01, α = 0.05) and at 79.21% with calcium (p-value < 0.01, α = 0.05). On the other hand, the RDA2 was not significantly correlated with any of the tested soil properties. Weierbach and Retgenbusch were found apart from all the other sites, and both related to only Edaphobacter and Mycobacterium. As DNA bacteriophages and bacteria are known for being exclusively linked to one another, the covariation of these two communities was analysed using a symmetric CoCA. This ordination method was built over the two first dimensions that covered 67.4% of the total covariation (Figure 7). It allows exploring the distribution of the whole known bacterial and DNA bacteriophage subset populations at the genera level by analysing both simultaneously. The patterns for both bacterial and DNA bacteriophage CoCA were similar and confirmed the distinct group formed by Weierbach and Retgenbusch. The latter

Discussion
In our study, a viral concentration of approximately 10 9 to 10 10 VLPs per gram o soil was found, which is consistent with previous works [14,68,69]. Although VLP c are not always considered to be a good representative of virus concentrations, the c performed here were carried out in the same manner as previously on freshwater ments using the FACSCalibur FCM table. This method was previously validated and vided similar VLP counts to those obtained by EFM [70]. A good correlation was fo highlighting the high sensitivity of the device and thus, allowing obtaining unambig signatures for VLPs by discriminating noise from VLPs (virus-like particles) to PLPs karyote-like particles). It is noteworthy, however, that flow cytometry analysis doe

Discussion
In our study, a viral concentration of approximately 10 9 to 10 10 VLPs per gram of dry soil was found, which is consistent with previous works [14,68,69]. Although VLP counts are not always considered to be a good representative of virus concentrations, the counts performed here were carried out in the same manner as previously on freshwater sediments using the FACSCalibur FCM table. This method was previously validated and provided similar VLP counts to those obtained by EFM [70]. A good correlation was found, highlighting the high sensitivity of the device and thus, allowing obtaining unambiguous signatures for VLPs by discriminating noise from VLPs (virus-like particles) to PLPs (prokaryote-like particles). It is noteworthy, however, that flow cytometry analysis does not allow identifying viruses or discriminating bacteriophages from the rest of the VLPS [71]. To draw a complete picture of the viral community, a taxonomic identification of the known DNA bacteriophage populations was complementarily performed through shotgun metagenomics and reference-based bioinformatics. The taxonomy was assigned using the nr database released in April 2022; however, it is noteworthy that the viral taxonomy built by the International Committee on Taxonomy of Viruses (ICTV) is nowadays updated according to the genetic similarities between bacteriophages, including reclassification of the families observed in the current study, such as Siphoviridae, Myoviridae and Podoviridae.
Our study showed Caudovirales as the most frequently identified order in all sites, while Siphoviridae, Podoviridae and Myoviridae were found as the most abundant families observed within the identified DNA bacteriophages, which is consistent with most of the literature that studied viral soil diversity [13,26,29]. However, the high proportion of these three families is rather the reflection of the taxonomy made available on the databases up to now, where most of the bacteriophage species are classified into these families [72]. For this reason, it is important to consider low taxonomic levels in diversity studies. In the current study, the viral genera obtained from the libraries did not display specific patterns related to the ecosystems (i.e., forests vs. grasslands). The three selected forest sites presented different vegetal covers (i.e., beech, spruce and oak) with diverse types of soil textures. The forest sites might not be sufficiently similar to constitute a common pattern in DNA bacteriophage diversity. However, several patterns in the known DNA bacteriophage distribution are distinguishable by considering the specific soil parameters. The soil pH as well as the calcium concentration significantly drove the ecological distribution of the known DNA bacteriophage communities among the different studied soils. More interestingly, the distribution of both DNA bacteriophage and bacterial communities in diverse types of soil were found to be strongly linked to one another and their distributions were explained by the same abiotic factors, i.e., soil pH and calcium concentration. Additionally, a complementary effect of magnesium concentration solely on the bacterial distribution was detected. The differences in microbial diversity were highlighted especially for the soils of Weierbach and Retgenbusch. Indeed, these two sites did not only display an acidic pH (around 4) with low calcium availability (<5 meq per 100 g of soil), but also low abundance and diversity of the known microbial populations at the genus level. While studying the known DNA bacteriophage populations, these same three soils were represented by the significant presence of one specific DNA bacteriophage genus. Indeed, Saphexavirus, Borockvirus and Lessievirus were highly abundant in the soil of Mollbach, Retgenbusch and Weierbach, respectively, compared to the other studied soils. Some studies reported a dominance of small ssDNA viruses rather than dsDNA-tailed bacteriophages in the soils [73][74][75]. In the present study, ssDNA bacteriophages (Microviridae and Inoviridae) were also detected, but in low frequency in comparison to dsDNA bacteriophages. Most ssDNA bacteriophages detected from environmental metagenomes belonged to the Microviridae family, for which all members infect a narrow range of hosts [76]. However, most ssDNA bacteriophages remain scarcely characterised within the taxonomy as Inoviridae represents 11% and Microviridae 2% of the ICTV database [77], reflecting a potential bias in the determination of their abundance in the soil samples.
Calcium is an important component participating in soil life. It plays a role in the soil structure, firstly as a binding agent in the aggregation of soil particles, secondly as a macronutrient in plant nutrition and thirdly as a factor facilitating soil permeability [78]. Then, calcium has a biological role in promoting an optimal environment to enhance soil microorganism activities [79]. Finally, calcium is highly related to soil pH. Indeed, soils with low pH generally display low calcium availability. As high concentrations of hydrogen ions displace calcium concentration from binding sites, it results in the leaching of "free" calcium, particularly in the more porous soils [80,81]. Such soils, such as the ones of Weierbach and Retgenbusch, i.e., low pH and calcium concentration, did display a low microbial diversity among the bacterial and DNA bacteriophage populations characterised in this study. On the contrary, Pall 1, Pall 2 and Daerent with much higher calcium concentrations (between 10 and 20 meq per 100 g of soil) and an ideal range of soil pH (between 6 and 7) showed higher microbial diversity. In laboratory experiments, divalent cations (such as calcium and magnesium) were found to interact with bacteriophage MS2 by increasing its attachment to the surface of river natural organic matter [82]. In soil, adsorption onto surface particles is the most important process implied in the bacteriophages-soil interactions [11]. This interaction is highly dependent upon the type of cations found in the environment. Calcium cation (positively charged) is highly involved in the adsorption of clay-humus complex (CHC), negatively charged on its surface, through electrostatic forces [83,84]. Bacteriophages can then adsorb onto the CHC by the presence of the cations on the surface. Indeed, bacteriophages are charged on their surface; however, this charge is highly dependent on pH. At neutral pH (7), most bacteriophages are negatively charged and can interact with the available cations [85]. Reducing the pH will induce some bacteriophage surfaces to turn into positive charges, according to their isoelectric point (pI). Once positive, bacteriophages can compete with cations to bind with the clay charges of the CHC. Indeed, bacteriophages tend to be more retained in soils rich in clay. The latter has a lower specific surface, due to large particles and a greater ion-exchange capacity, which highly adsorb bacteriophages [86]. Therefore, the combination between calcium cation and pH can play a role in the bacteriophage dynamic in soils. However, not only did these factors show a significant effect on the DNA bacteriophage distribution, but they also did on the bacterial distribution. Additionally, our results showed a strong positive correlation between both bacterial and DNA bacteriophages distribution among the sites, suggesting a strong interaction. Therefore, soil pH seems like the most significant factor in leading the known DNA bacteriophage distributions, directly or/and indirectly through the shaping of bacterial communities. Calcium cation availability might rather play a role in the dynamic of DNA bacteriophages, through their adsorption with the soil particles.
The advances in metagenomics and bioinformatics have enabled deeper investigations for the identification of viruses [87,88]; however, some technical issues remain in the analysis of soil viruses [89]. As viruses tend to adsorb on soil particles, virus elution and concentration steps are often required before performing metagenomics. Nevertheless, this step generally ends in low recovery efficiencies [90,91], as noticed in our results during the optimisation of the elution and concentration protocol. Indeed, even though the viral recovery was satisfactory to conduct a viral quantification using flow cytometry, it resulted in low DNA yield after nucleic acid extraction. Therefore, it affected the good quality of viral sequences obtained after conducting the shotgun metagenomics, leading us to carry out the DNA extraction directly from the soil samples. As the analysis is solely relying on virus-affiliated reads that can be obtained from total DNA metagenomes, few viral contigs were obtained with N50 comprising between 200 and 300 per library. This can limit the interpretation of relationships between viruses and other microbes within the same samples.
Much remains to be understood about the diversity, dynamics and roles of bacteriophages in soils and metagenomics approaches have only recently enabled increasing our knowledge, particularly under in situ conditions. In laboratory experiments, some physicochemical properties, especially pH and ion strength, influenced bacteriophage inactivation [92,93]. Our results seemed to show the effect of these similar factors on bacteriophage diversity in soils; however, it is noteworthy that the study was focused on the known fraction of the microbial populations in the diverse selected soils. The OTU-based approach is less dependent on reference databases but works by clustering sequences according to similarity thresholds (often set at a sequence similarity level above 97%) [94][95][96]. This approach does not identify the taxonomic species but is rather a proxy for them and has shown to be phylogenetically incoherent [94][95][96]. Therefore, further analyses using complementary molecular and bioinformatics methods, including untargeted approaches, could be applied to obtain a more complete picture of bacteriophage dynamics. Additionally, the investigation of the influence of certain abiotic factors on bacteriophage diversity based on what has already been characterised could open perspectives for the use of these species as, for instance, biotechnologies or in the development of new culture models for laboratory experiments.

Conclusions
In this study, the natural diversity and distribution of DNA bacteriophage populations in soils were explored as well as their relationship with the soil-dwelling bacterial populations. Through the complementary use of metagenomics, reference-based bioinformatics and multidimensional analyses, we identified the known fraction of DNA bacteriophage communities in forest and grassland soils and deciphered the influence of some soil properties and microbial communities on the DNA bacteriophage occurrence. Both soil pH and calcium content were observed to influence the distribution of DNA bacteriophage and bacterial populations. A strong positive correlation between DNA bacteriophage and bacterial communities was confirmed, suggesting in addition to soil pH and calcium content, bacteria are also involved. Therefore, our results add to the growing body of evidence that bacteriophages are likely to play key roles in soil ecology.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/microorganisms10071458/s1, Figure S1: Bar charts of the relative abundance (in %) of bacterial phylum (selected genera with an abundance above 100 occurrences) from the known fraction of all studied sites; Figure S2: Per base sequence quality from FastQC for read forward 1 (A) and read reverse 2 (B) of the sequencing ot the Weierbach soil; Table S1: Percentages of relative abundances identified for each taxonomic domain; Table S2: Shannon diversity index (H), evenness (E) and species richness (S) of bacteria populations in the soil of the studied sites; Table S3: Recovery of bacteriophage ΦX174 comparing three methods for mechanical viral detachment from the Weierbach soil; Table S4: Recovery of bacteriophage ΦX174 from the Weierbach soil comparing six different eluent solutions; Table S5: Recovery of bacteriophage ΦX174 from the Weierbach soil after concentration step, comparing Amicon with two different cut-off points (i.e., 30 kDa and 50 kDa); Table S6: Recovery of bacteriophage ΦX174 at each step of the elution and concentration protocol for each study site; Table S7: Summary of each step from the DNase treatment to the library preparation from viral concentrates and soil sample of the Weierbach site. References [97][98][99][100][101][102] are cited in the supplementary materials.