Analysis of Major Bacteria and Diversity of Surface Soil to Discover Biomarkers Related to Soil Health

The discovery of biomarkers for assessing soil health requires the exploration of organisms that can explain the core functions of soil and identification of species with major roles in these functions. However, identifying specific keystone markers within the soil microbiota is challenging. Next-generation sequencing (NGS)-based molecular-biological methods have revealed information on soil biodiversity; however, whether this biodiversity is related to soil health remains unclear. In this study, we performed NGS on grassland surface soil to compare the prokaryotic and eukaryotic genetic diversity to determine the chemical soil quality and examined markers associated with soil health. Microorganisms associated with the nitrogen cycle, bioremediation, plant pathogenicity, antibiotic production, and material degradation showed potential for use as markers. To propose a framework for soil health assessment, we not only used traditional indicators, such as chemical and physical measures, but also assessed metagenomics data of soil by land use to identify the major factors influencing the microbial structure in soil. Moreover, major keystone species were identified. Furthermore, the microbial genetic diversity of generally healthy surface soil, such as forests, farmland, and parks, was determined. These findings provide basic data for exploring soil health-related biomarkers.


Introduction
Living organisms in soil, including the soil microbiota, play important roles in supporting life on earth. Climate changes and anthropogenic threats to soil such as intensive agriculture can greatly affect soil functions. Compared to soil indicators based on physical and chemical measures that can be used to assess soil quality, bioindicators remain controversial. Although these indicators typically have key functions and important regulatory roles (known as keystone species), identifying specific keystone markers within the vast functional redundancies of the soil microbiota remains challenging [1].
Next-generation sequencing (NGS)-based molecular techniques have been used to study soil microbial diversity and soil microorganisms useful for producing high-quality plants (agricultural soil environment and agriculture-related microorganisms). These studies reported the bacterial and fungal diversity according to the specific treatment of field soil; microbial diversity according to soil depth on a poplar farm; association between soil depth and native and exotic plant species; and comparison of the soil microbiota in natural and re-seeded grassland [2][3][4][5]. The results suggested that the soil depth is a Toxics 2022, 10, 117 2 of 12 major factor affecting the networking between the microbiota structure and abiotic factors, including interactions with fungi at approximately 1 m below the soil surface and microbial diversity at depths of 0-20 and ≥21 cm. The results also demonstrated that the surface soil has important effects on plant growth.
Other studies suggested that the following groups of indicators can be used as indicators of the health and quality of soil based on NGS metadata: (i) Microorganisms beneficial to plants [nitrogen-fixing bacteria-symbiotic (Rhizobia etc.) or plant-associated bacteria (Azospirillium and Paenibacillius etc.); phosphate-solubilizing bacteria-Pseudomonas and Bacillus etc.; and bacteria inducing induced systemic resistance in plants and fungi forming beneficial symbionts with plants-arbuscular mycorrhizal and ectomycorrhizal fungi]; (ii) microorganisms harmful to plants (Fusarium genus is plant pathogen and related markers are assessed as factors negatively affecting plant growth); (iii) other potential genetic markers [anti-pathogen compounds that block pathogenic microorganisms and indole acetic acid which promotes plant growth via production of plant growth hormones]; and (iv) soil microorganisms related to nutrient cycling [nitrogen fixation (nifH), nitrification (amoA), denitrification (nir, nor), N immobilization (glutamine synthase-encoding gene), N mineralization (protease-encoding genes), organic C mineralization (β-glucosidase-encoding genes), carbon dioxide fixation (RUBISCO-encoding genes), and organic P mineralization (acid and alkaline phosphomonoesterase)] [6]. Other ecological services performed by the soil microbiota include regulation of biogeochemical cycles, retention and delivery of nutrients to primary producers, maintenance of soil structure and fertility, bioremediation of contaminants, supply of clean drinking water, flood and drought mitigation, erosion control, regulation of atmospheric trace gases, pest and pathogen control, and regulation of plant production through secondary metabolites (non-nutritive biochemical substances).
Specific genes, taxa, or groups with principles based on such functionalities may be useful as indicators [7]. However, some microorganisms may appear at different times and locations and may vary in the presence of different plant species. For example, when Carex arenaria was cultivated in ten different types of soil, the diversity of rhizobacteria was more similar to that in bulk soil compared to the diversity of rhizobacteria in other soil types [8,9]. Examining the key functions based on soil microorganism metadata may lead to the identification of markers in multi-function soil beyond the single-microorganism level (co-occurrence of specific microbial taxa), enabling the use of network connectivity as an indicator of soil health. Moreover, sampling at different times and locations may be more important than assessing all DNA from the soil microbiota. To develop a framework for soil health assessment, we used traditional indicators, such as chemical and physical measures, as well as molecular-biological metagenomics data of soil by land use to identify the major factors influencing the microbial structure in soil. Moreover, we identified major keystone species according to land use.

Sampling
The study area was comprised of forests, agricultural lands, and residential districts. Forty-five samples were collected in 15 samples of the forest and paddy soil in Chungcheong province in southwest Korea (Yesan, Geumsan, Gongju, Okcheon, and Boeun) and 15 samples from residential districts in Sejong and Daejeon ( Figure 1).

Physico-Chemical Parameter Analysis
Soil sampling and analysis were performed according to the guidelines of the National Academy of Agricultural Science in Korea (NAAS, 2010) [10]. The collected soil samples were filtered through a 2-mm (10 mesh) sieve after air-drying. The soil texture was determined using the micro-pipette method [11]. The pH and electrical conductivity of the samples were determined for a 1:5 soil:water (w/v) suspension using a pH meter and conductivity meter (MP220, Mettler Toledo, Columbus, OH, USA) [4]. The cation exchange capacity was analyzed using the 1 M CH 3 COOH extraction method [2]. The soil organic Toxics 2022, 10, 117 3 of 12 content was measured as described by Walkley and Black. Effective phosphoric acid was determined using the Bray No.1 method with molybdenum blue dye and measured on a UV spectrophotometer (UV-1800, Shimadzu, Kyoto, Japan) [2]. The inorganic nitrogen content, as NH 4 -N and NO 3 -N, was determined using a QuikChem automated ion analyzer (QuikChem 6000 Series, Lachat Instruments, Milwaukee, WI, USA), after extraction with 2 M KCl [3].

Physico-Chemical Parameter Analysis
Soil sampling and analysis were performed according to the guidelines of the National Academy of Agricultural Science in Korea (NAAS, 2010) [10]. The collected soil samples were filtered through a 2-mm (10 mesh) sieve after air-drying. The soil texture was determined using the micro-pipette method [11]. The pH and electrical conductivity of the samples were determined for a 1:5 soil:water (w/v) suspension using a pH meter and conductivity meter (MP220, Mettler Toledo, Columbus, OH, USA) [4]. The cation exchange capacity was analyzed using the 1 M CH3COOH extraction method [2]. The soil organic content was measured as described by Walkley and Black. Effective phosphoric acid was determined using the Bray No.1 method with molybdenum blue dye and measured on a UV spectrophotometer (UV-1800, Shimadzu, Kyoto, Japan) [2]. The inorganic nitrogen content, as NH4-N and NO3-N, was determined using a QuikChem automated ion analyzer (QuikChem 6000 Series, Lachat Instruments, Milwaukee, WI, USA), after extraction with 2 M KCl [3].
For phosphatase analysis, 1 g of fresh soil was incubated for 1 h at 37 °C after adding 0.2 mL toluene, 0.025 M p-nitrophenyl phosphate, and 1 mL of modified universal buffer (pH 6.5) to the test tube. Next, 4.0 mL of 0.5 M NaOH and 1 mL of 0.5 M CaCl2 were added to quench the reaction. The filtrate was measured with a UV/vis spectrometer at 400 nm. A calibration curve for both β-glucosidase and phosphatase was generated with 0.1 M Tris buffer mixed with 0.4-1.7 µ g p-nitrophenol. Soil enzyme activity was expressed as µ g pnitrolphenol produced by 1 g dry weight soil/h (Table 1).
For phosphatase analysis, 1 g of fresh soil was incubated for 1 h at 37 • C after adding 0.2 mL toluene, 0.025 M p-nitrophenyl phosphate, and 1 mL of modified universal buffer (pH 6.5) to the test tube. Next, 4.0 mL of 0.5 M NaOH and 1 mL of 0.5 M CaCl 2 were added to quench the reaction. The filtrate was measured with a UV/vis spectrometer at 400 nm. A calibration curve for both β-glucosidase and phosphatase was generated with 0.1 M Tris buffer mixed with 0.4-1.7 µg p-nitrophenol. Soil enzyme activity was expressed as µg p-nitrolphenol produced by 1 g dry weight soil/h (Table 1).

NGS Analysis
DNA was extracted using a DNeasyPowerSoil Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Sequencing libraries were prepared according to the Illumina 16S Metagenomic Sequencing Library protocols to amplify the V3 and V4 region. The input genomic DNA (2 ng) was PCR-amplified in 5× reaction buffer, 1 mM of dNTP mix, 500 nM each universal F/R PCR primer, and Herculase II fusion DNA polymerase (Agilent Technologies, Santa Clara, CA, USA). The cycling conditions for the first round of PCR were as follows:   Illumina MiSeq sequencing was conducted to analyze the bacterial community structures. Finally, All samples were analyzed using Illumina Miseq sequencing (San Diego, CA, USA) by Macrogen, Inc. (Seoul, Korea). Quantitative Insights into Molecular Ecology (QIME) software was used for comparisons at the phylum to species levels through data trimming and analyzing the alpha diversity [13]. Sequence reads were analyzed using QIIME software version. Ambiguous and chimeric sequences were removed, and sequences were classified into operational taxonomic units (OTUs) at 97% similarity using the CD-HIT-OTU program (Macrogen, Inc., Seoul, Korea). The taxonomy of each OTU was assigned based on the NCBI 16S microbial database. Shannon's diversity index, Simpson's diversity index, Chao 1 richness, and Ace richness were calculated in QIIME and used to compare the soil fungal alpha diversity. Venn diagrams of unique and shared OTUs were drawn to highlight the similarities and shared sequences between the different samples.

Nonmetric Multidimensional Scaling and Canonical Correspondence Analysis
Nonmetric multidimensional (NMDS) analysis was performed to determine the patterns of similarity (Bray-Curtis similarity) in the structure of the microbial community between treatments [14]. Canonical correspondence analysis (CCA) was conducted to explore the association of the microbial community composition with the soil characteristics. NMDS analysis and CCA were performed using the "vegan" package in R version 3.2.0 for Windows (The R Project for Statistical Computing, Vienna, Austria). MDS, which is used to express similarity/dissimilarity between samples by plotting points in two-or threedimensional space, is a statistical method for identifying patterns or structures inherent within data through visualization of the proximity between samples. This approach is applied in various fields, including the social and natural sciences, to analyze the similarity/dissimilarity in microbial communities that change according to environmental factors. In MDS, the distance between samples is calculated using the Euclidean distance matrix. In contrast, NMDS is used when data are given on an order scale. When the distance between samples is given in order, the distance is generated by converting the order scale to be the same as the attributes of distance.  Table 2).

Phylum-Level Analysis
As shown in Figure 2, Proteobacteria was the dominant phylum in both agriculture and forest soil samples, followed by Actinobacteria and Acidobacteria. The phyla Firmicutes, Actinobacteria, and Chloroflexi showed relatively higher levels in agriculture soil than in forest soil, with differences of 3.9%, 2.5%, and 2.5%, respectively. In contrast, the phyla Proteobacteria, Acidobacteria, and Verrucomicrobia showed higher levels in forest soil than in agriculture soil, with differences of 5.2%, 3.7%, and 2.5%, respectively. The findings were similar to those reported by Lee et al. (2021) [13]. However, unlike previous studies showing that Acidobacteria was dominant, this phylum showed approximately 3.7% higher levels in forest soil than in agriculture soil in the present study. Acidobacteria may have been affected by factors other than the type of land use, such as regional differences.
In agriculture soil, the phyla Cyanobacteria, Nitrospirae, Bacteroidetes, Gemmatimonadetes, Firmicutes, and Chloroflexi and class Gammaproteobacteria were relatively dominant. In forest soil, the phyla Synergistetes, Planctomycetes, Verrucomicrobia, and Acidobacteria and class Alphaproteobacteria were relatively dominant. The phylum Proteobacteria, which showed a relatively lower level in agriculture soil, and Firmicutes [9.2% with ≥3.9% difference relative to other use (forest 5 findings were similar to those reported by Lee et al. (2021) [13]. However, unlike previous studies showing that Acidobacteria was dominant, this phylum showed approximately 3.7% higher levels in forest soil than in agriculture soil in the present study. Acidobacteria may have been affected by factors other than the type of land use, such as regional differences.

Land Use Major Species Ratio (%) Characteristics Reference
Agriculture Bacillus cucumis 0.6 Gram-positive-staining, aerobic, endospore-forming bacterial strain, isolated from the stem of a cucumber plant, studied in detail for its taxonomic position.

Kittiwongwattana et al., 2015 [19]
Nitrospira moscoviensis 0.8 Gram-negative, non-motile, non-marine, nitrite-oxidizing bacterium was isolated from an enrichment culture initiated with a sample from a partially corroded area of an iron pipe of a heating system in Moscow, Russia.
Ehrich et al., 1995 [20]  Gaiella occulta was the dominant species in both agriculture and forest soil, whereas V. silvestris was the subdominant species in agriculture and residential (park) soil and C. flavus was the subdominant species in forest soil. Gaiella occulta, which was dominant in all types of soil, appeared at a rate of 4.0% in agriculture soil and 4.1% in forest soil, with 104 types of OTUs (Figure 3). Vicinamibacter silvestris, which was subdominant in agriculture and residential (park) soil, showed a prevalence of 3.7% (3.2% in forest soil for top 5 OTUs among species excluding others) and there were 315 types of OTUs. Moreover, C. flavus, which was subdominant in forest soil, showed a rate of 3.5% [1.3% in agriculture for top 10 OTUs among species excluding others] and there were 138 types of OTUs. The dominant species G. occulta was isolated from aquifer mineral water by Albuquerque et al. (2011) [32] and has been reported to be related to nucleic acids (clones) reported in soil and lakes. Vicinamibacter silvestris was isolated and reported in semi-arid soil from subtropical savanna region by Huber et al. (2016) [33], whereas C. flavus was isolated and reported in soil from Australian rye fields and clover pasture by Sangwan et al. (2004) [34]. The dominant and subdominant species may be associated with the environment and plants. Moreover, by examining the characteristics of the dominant microorganisms and bacteria diversity, this information can be used as a major indicator of soil health. However, there are disadvantages to NGS-based research results, and it is considered necessary to perform culture-based follow-up studies.

NMDS and CCA
NMDS and metagenomic patterns of microorganisms by land use were analyzed for CCA between the major dominant species and physicochemical factors.
To increase the accuracy of the relative distance, fitness is expressed as stress values. pH, soil organic matter, soil available phosphorous, soil enzyme, and PHA showed high correlation coefficients with the microbial results by land use. Similar results were found using CCA. Particularly, pH showed major parameter with microorganisms by land use, with high correlation coefficients in NMDS analysis and CCA. These results indicate that microorganisms affecting soil health are closely associated with pH ( Figure 4).

NMDS and CCA
NMDS and metagenomic patterns of microorganisms by land use were analyzed for CCA between the major dominant species and physicochemical factors.
To increase the accuracy of the relative distance, fitness is expressed as stress values. pH, soil organic matter, soil available phosphorous, soil enzyme, and PHA showed high correlation coefficients with the microbial results by land use. Similar results were found using CCA. Particularly, pH showed major parameter with microorganisms by land use, with high correlation coefficients in NMDS analysis and CCA. These results indicate that microorganisms affecting soil health are closely associated with pH ( Figure 4).

Conclusions
To propose a framework construct for soil health assessment, we used not only traditional indicators, such as chemical and physical measures, but also molecular-biological metagenomics data of soil by land use to assess the major factors influencing the microbial structure in soil. We observed close associations in the order of PHA, pH, phosphorous, and soil enzyme among the major indicators for assessment of soil health. Soil organic matter and pH were also significant factors. pH was found to have a major influence on soil health and major microbial communities. The pH and soil microorganism data can be

Conclusions
To propose a framework construct for soil health assessment, we used not only traditional indicators, such as chemical and physical measures, but also molecular-biological metagenomics data of soil by land use to assess the major factors influencing the microbial structure in soil. We observed close associations in the order of PHA, pH, phosphorous, and soil enzyme among the major indicators for assessment of soil health. Soil organic matter and pH were also significant factors. pH was found to have a major influence on soil health and major microbial communities. The pH and soil microorganism data can be used to maintain and manage soil health. Furthermore, metadata on various soil microorganisms should be collected continuously to define markers of multi-functions in soil.