Can We Use Functional Annotation of Prokaryotic Taxa (FAPROTAX) to Assign the Ecological Functions of Soil Bacteria?

: FAPROTAX is a promising tool for predicting ecological relevant functions of bacterial and archaeal taxa derived from 16S rRNA amplicon sequencing. The database was initially developed to predict the function of marine species using standard microbiological references. This study, however, has attempted to access the application of FAPROTAX in soil environments. We hypothesized that FAPROTAX was compatible with terrestrial ecosystems. The potential use of FAPROTAX to assign ecological functions of soil bacteria was investigated using meta-analysis and our newly designed experiments. Soil samples from two major terrestrial ecosystems, including agricultural land and forest, were collected. Bacterial taxonomy was analyzed using Illumina sequencing of the 16S rRNA gene and ecological functions of the soil bacteria were assigned by FAPROTAX. The presence of all functionally assigned OTUs (Operation Taxonomic Units) in soil were manually checked using peer-reviewed articles as well as standard microbiology books. Overall, we showed that sample source was not a predominant factor that limited the application of FAPROTAX, but poor taxonomic identiﬁcation was. The proportion of assigned taxa between aquatic and non-aquatic ecosystems was not signiﬁcantly different ( p > 0.05). There were strong and signiﬁcant correlations ( σ = 0.90–0.95, p < 0.01) between the number of OTUs assigned to genus or order level and the number of functionally assigned OTUs. After manual veriﬁcation, we found that more than 97% of the FAPROTAX assigned OTUs have previously been detected and potentially performed functions in agricultural and forest soils. We further provided information regarding taxa capable of N-ﬁxation, P and K solubilization, which are three main important elements in soil systems and can be integrated with FAPROTAX to increase the proportion of functionally assigned OTUs. Consequently, we concluded that FAPROTAX can be used for a fast-functional screening or grouping of 16S derived bacterial data from terrestrial ecosystems and its performance could be enhanced through improving the taxonomic and functional reference databases.


Introduction
Microbes are known as engines of an ecosystem as their growth and metabolisms drive various biogeochemical cycles and mediate many ecological processes such as decomposing organic compounds, solubilizing mineral substances and promoting plant performance [1]. Moreover, microbes also play important roles in removing toxic pollutants and chemical contaminants. For instance, microorganisms transform aromatic compounds into harmless metabolites or less/nontoxic forms [2]. Other harmful metabolites that are converted by microorganisms include hydrocarbon compounds [3], heavy metals [4] and other chemicals with excessive concentrations in an environment [5][6][7].
Various tools have been developed for the prediction of ecological functions of microbial taxa derived from amplicon-based next-generation sequencing data. These tools allow us to investigate both community and functional composition of microbes. The usefulness of these tools depends on thorough and global data on microbial community and functions, which would provide deeper insight for microbial ecological research and could be a lowcost alternative to metagenomic sequencing [8,9]. As a result, many functional prediction tools were generated for both prokaryotic and eukaryotic microorganisms. For example, FUNGuild is a typical functional prediction tool for fungi, providing guild characteristics of the detected taxa, such as saprotroph, pathogen, decomposer or lichenivorous fungi, based on their taxonomic identity [10]. Other tools such as phylogenetic investigation of communities by reconstruction of unobserved states (PICRUSt) [11,12], pathway prediction by phylogenetic placement (PAPRICA) [13], predicting functional profiles from metagenomic 16S rRNA data (Tax4Fun) [8] and functional annotation of prokaryotic taxa (FAPROTAX) [9] were developed to predict bacterial and archaeal functions. The former three predict the functions based on gene content of detected taxa, whereas the latter is the only tool that uses experimental data on culturable taxa to identify functional groups, metabolic phenotypes or ecologically relevant functions [9]. Furthermore, FAPROTAX may be more preferable for functional prediction of the biogeochemical cycle of environmental samples [9,14,15].
The FAPROTAX is a database that maps bacterial or archaeal taxa to metabolic or ecologically relevant functions (i.e., nitrogen fixation, sulfate respiration or hydrocarbon degradation) using literature on culture representatives. This means that if all cultured members of a taxon can perform a function, the function will be assigned to all members of this taxon (cultured and uncultured). The FAPROTAX provides a python script to convert OTUs tables into functional tables based on taxa identified in a sample and functional phenotype of each taxon in the FAPROTAX database. This database was initially built for a study on marine environments, containing over 80 functions and 4600 taxonomic details of bacteria and archaea from oceans [9]. However, FAPROTAX used standard references (in Bergey's manual of systematic bacteriology [16][17][18][19][20]. The prokaryotes [21] and International Journal of Systematic and Evolutionary Microbiology [22]) for general bacteria and archaea living in both aquatic and terrestrial ecosystems. This implies that the functional assignments are not completely dependent on habitats (aquatic vs. terrestrial systems) and highly dependent on taxonomic information at genus, species or strain levels of a particular bacterial and archaeal taxon. Consequently, many studies have used FAPROTAX for functional annotation of bacteria and archaea on various ecosystems including aquatic [9,[23][24][25][26][27] and terrestrial systems [28][29][30][31], as well as human and animal microbiome [14,32].
Despite the compatibility to the soil system, FAPROTAX is still not widely utilized for functional annotation of soil bacteria and archaea because there has been no effort to test the capability of FAPROTAX in soil. Some important questions remain, for example, (i) what proportions of bacterial and archaeal taxa are living in soil that are successfully annotated using FAPROTAX (as compared to aquatic systems) and (ii) are functional annotations using FAPROTAX accurate in soil systems?
This study aimed to investigate the potential use of FAPROTAX for bacterial functional annotation in non-aquatic ecosystems, specifically in soil. We used both published and newly prepared soil microbial datasets of three ecosystems (mangrove, agriculture and forest). In total, four datasets, including mangrove rhizosphere soil, agricultural bulk soil, agricultural rhizosphere soil and forest soil, were processed. We (i) tested the differences in functional annotated capacity between non-aquatic and aquatic ecosystem using published articles (meta-analysis), (ii) tested the accuracy of FAPROTAX annotation in soil systems by manually checking both appearance and functional performance in soils of all functionally assigned bacterial and archaeal operational taxonomic units (OTUs) with previously published literature and, (iii) tested additional options to improve the functional assignment capacity of FAPROTAX in soils by using relevant references (case study: nitrogen-fixing bacteria in bulk and rhizosphere soils of Trifolium pratense).

Sample Collection
We set up experiments to evaluate the suitability of FAPROTAX to assign functions of soil bacterial and archaeal OTUs. Soil samples from two major terrestrial ecosystems, agricultural grassland and forest were collected. Furthermore, one published dataset on rhizosphere soil of Rhizophora stylosa was also added in this study [33]. Mangrove ecosystems are considered as the interface between aquatic and terrestrial ecosystems [34]; thus, the mangrove soil samples were used as the borderline between the aquatic and terrestrial ecosystems.
In the agricultural grassland ecosystem, bulk soil and rhizosphere soil of Trifolium pratense were taken from five experimental plots of extensively used meadow (ambient treatment) at the Global Change Experimental Facility (GCEF), Germany (51 • 22 60 N, 11 • 50 60 E) [35,36]. These experimental plots are managed as described in Schädler et al. [35], where grasses and legumes growing in the plot are used to feed livestock. In this study, three healthy pratense plants, which represent three subsamples, were taken from each experimental plot. Bulk soil and rhizosphere soil were collected following the protocol as described by Barillot et al. [37]. Briefly, bulk soil attached to the root was first removed by vigorous shaking, then rhizosphere soil was collected by shaking the root in PCR-grade water. Three subsamples of bulk soil and rhizosphere soil collected from the same plot were pooled into one composite sample for each soil fraction. Overall, five composite samples of bulk soil and five composite samples of rhizosphere soil (representing five true replicates) were obtained.
In the forest ecosystem, soil samples were taken from a bamboo-deciduous forest [38] which was dominated by Dendrocalamus membranaceous and Bambusa bambos, in northern Thailand (18 • 32 23 N, 99 • 34 47 E). In detail, five replicated plots (5 × 5 m 2 ) > 20 m apart were selected. Five soil subsamples were collected to 10 cm depth in each plot using an auger with 10 cm in diameter. The subsamples were then pooled into one composite sample. In this study, all composite soil samples were homogenized and sieved (2 mm) to remove stones, roots, macrofauna, and litter.
For the mangrove ecosystem, we used the dataset that has previously been published by Purahong et al. [33]. In detail, this dataset consisted of 5 replicated samples of rhizosphere soil of R. stylosa located in wetland at Pingtung County, southern Taiwan (22 • 26 17.6 N, 120 • 29 29.6 E). Sample collection was described in detail in Purahong et al. [33]. Briefly, five healthy, mature stylosa trees were selected, then four subsamples of rhizosphere soil around each selected tree were collected, pooled and sieved to make a composite sample.
All soil samples, including those from agricultural grassland, deciduous forest and mangrove ecosystems, were kept at −20 • C for further analyses.

DNA Extraction, Sequencing and Taxonomy and Functional Assignment
DNA samples of both agricultural soils were extracted by QIAGEN DNeasy Power-Soil kit, whereas those of forest soil were extracted by NucleoSpin ® Soil DNA extraction kit. PCR amplification of the bacterial V3-V4 regions was conducted using the bacterial primer pair Bact341F (5 -CCTACGGGNGG-CWGCAG-3 ) and Bact785R (5 -GACTACHVGGGTATCTAATCC-3 ) [39]. Amplicons were sequenced using an Illumina MiSeq platform and V3 Chemistry (Illumina). All amplification and sequencing steps were performed at RTL Genomics (Lubbock, TX, USA).
Bioinformatics was proceeded on MOTHUR 1.33.3 [40] following Standard Operating Procedure (SOP) custom analysis workflow. Raw reads, overlapping more than 20 base pairs, were first assembled to generate paired-end reads, then filtered to get high-quality reads, containing at least 200 bp length and having a minimum average quality of 25 Phred score. Chimeric sequences were checked using UCHIME in de novo mode [41], as implemented in MOTHUR and removed from the datasets. The cleaned sequences were clustered at 97% sequence identity and assigned taxonomy using the SILVA 132 database for bacterial 16S rRNA gene [42]. Ecologically relevant functions were then assigned to all detected OTUs using FAPROTAX [9]. In detail, a taxon (e.g., strain, species or genus) was annotated to certain functions if there was a literature report on a culture representative of the taxon that performed the functions. For example, if all cultured species of a genus were previously reported as sulfate reducers, all detected taxa belonging to the genus were also considered as sulfate reducers. All database and assignment instructions are available at http://www.loucalab.com/archive/FAPROTAX/. Lastly, rare OTUs (singletons to tripletons), which could potentially originate from sequencing error, were removed. The remaining reads were normalized to the minimum read count per sample of each dataset. Final OTU tables of all datasets are available in supplementary Tables S1.1-S1.3. The raw sequences are available in the National Center for Biotechnology Information (NCBI) under BioProject accession number: PRJNA646011 for forest and agricultural soils and PRJNA554586 for mangrove soil. The functional assignment percentage (proportion of number of OTUs assigned by FAPROTAX to total detected OTUs in each study) presented in peer-reviewed publications was gathered and divided based on sampling source into 2 main groups, including aquatic and non-aquatic data (Table S3). Differences in the functional assignment percentage over aquatic and non-aquatic data were tested using the Mann-Whitney U test in PAST program v. 2.17c [43]. Shapiro-Wilk and Fligner-Killeen were used to test normality and equality of variances between the two groups.

Taxonomic Identification and FAPROTAX Assignment
The datasets, including twenty soil samples derived from agricultural-bulk soil (n = 5), agricultural-rhizosphere soil (n = 5), forest soil (n = 5) and mangrove soil (n = 5), were used to test the correlation between number of functionally assigned OTUs and number of OTUs identified to different taxonomic ranks (Genus, Order and Phylum). Firstly, the normality of these data sets was analyzed using Shapiro-Wilk test. We detected non-normal distribution in some data sets (p < 0.05); thus, Spearman's rank correlation method was used to test the correlation between number of functionally assigned OTUs and number of OTUs identified to each taxonomic rank. The correlation method was analyzed using stat_cor function in ggpubr package [44]. The simple linear regressions, showing the relationship between the two variables, were plotted with ggscatter function of the ggpubr package. These correlation analyses were run on R (version 3.6.2) [45].

Validation of FAPROTAX on Soil Bacteria
Bacterial taxa provided in a file called "FAPROTAX.txt", which is a database for FAPROTAX analysis (http://www.loucalab.com/archive/FAPROTAX/lib/php/index. php?section=Download), were randomly selected and habitat of the selected taxa was identified using references cited in the database. This action indicated that prokaryotic taxa and functional results in the FAPROTAX database have been obtained not only from aquatic samples but also from non-aquatic environments (an example is presented in Table S4) which lead to the promising application of FAPROTAX in soil samples.
Subsequently, four datasets, including agricultural-bulk soil, agricultural-rhizosphere soil, forest soil and mangrove soil, were used as examples to validate the suitability of FAPROTAX applied to soil samples. In this section, OTUs that were functionally assigned by FAPROTAX were used (Tables S2.1-S2.3). In detail, FAPROTAX assigned OTUs was first checked on the appearance of each taxon in soil habitat using peer-reviewed publications. If a taxon was previously reported in soil systems, we confirmed the appearance by using the word "Yes". On the other hand, if there was no record of a taxon in soil systems, we used the word "No" (Tables S2.1-S2.3: column named "Confirmation living in soil"). Secondly, functional performance in soil systems of the FAPROTAX assigned OTUs was manually checked using a similar procedure as FAPROTAX database (a taxon was assigned to certain functions if there was a literature report on a representative of the taxon that performed the functions). The functional performance was confirmed by using "Yes" when particular OTUs have been reported and performed the function in soil, whereas "No" was used for those with no record of functional performance in soil (Tables S2.1-S2.3: column named "Confirmation on functions in soil habitat"). The performance of FAPROTAX assignment in soil sample was indicated by number of FAPROTAX assigned OTUs that were both available and performed the assigned function in soil.

Additional Literature on Soil Functions
Bacterial taxa that were identified as a driver on phosphate solubilization, potassium solubilization and nitrogen fixation were provided (Table 1). These taxa and functions were overlooked from FAPROTAX database (file named "FAPROTAX.txt" available at FAPROTAX webpage). Subsequently, the advantage of the literature added was tested using soil samples from agricultural bulk soil and rhizosphere soil of T. pratense datasets. In detail, the nitrogen fixation function was manually assigned to a particular OTU in the datasets when that OTU was identified as one of the taxa in Table 1 (nitrogen fixation). Then, the number of manually assigned OTUs was counted and compared with that of OTUs assigned to nitrogen fixation by FAPROTAX.

Assignment Percentage of Aquatic and Non-Aquatic Samples Based on Previous Studies
We found no significant difference between assignment percentage (proportion of FAPROTAX assigned OTUs in total detected OTUs) of aquatic and non-aquatic data in datasets from peer-reviewed publications (Figure 1 and Table S3). The percentage of aquatic samples, including water from hot spring, lake, river, ocean and glacier, varied from 1.87% to 62.65%, while those from other ecosystems, including various soil samples and animal skins, varied from 10.21% to 65.30% (Figure 1).

Assignment Percentage of Aquatic and Non-Aquatic Samples Based on Previous Studies
We found no significant difference between assignment percentage (proportion of FAPROTAX assigned OTUs in total detected OTUs) of aquatic and non-aquatic data in datasets from peer-reviewed publications (Figure 1 and Table S3). The percentage of aquatic samples, including water from hot spring, lake, river, ocean and glacier, varied from 1.87% to 62.65%, while those from other ecosystems, including various soil samples and animal skins, varied from 10.21% to 65.30% (Figure 1).

General Overview of Bioinformatics and FAPROTAX Assignment of Data Derived from Soil Samples
A total of 196,366 reads (on average 19,637 ± 2355 reads per sample) of bacterial 16S rRNA gene were detected in agricultural bulk soil and rhizosphere soil, after quality filtering and chimeric sequence removal. For forest and mangrove soils, 119,433 reads (on

General Overview of Bioinformatics and FAPROTAX Assignment of Data Derived from Soil Samples
A total of 196,366 reads (on average 19,637 ± 2355 reads per sample) of bacterial 16S rRNA gene were detected in agricultural bulk soil and rhizosphere soil, after quality filtering and chimeric sequence removal. For forest and mangrove soils, 119,433 reads (on average 23,886 ± 1980 reads per sample) and 66,263 reads (on average 19,637 ± 2355 reads per sample [33]) were detected, respectively. After rare OTUs were removed, the sequences were normalized to the smallest read numbers per sample, which were 4951, 12,269 and 7086 [33] reads per sample for agricultural soils (bulk soil and rhizosphere soil), forest soil, and mangrove rhizosphere soil, respectively.
Different bacterial OTUs were obtained from agricultural bulk soil (3329), agricultural rhizosphere soil (3365), forest soil (2177) and mangrove rhizosphere soil of R. stylosa (2497). The proportion of OTUs that were identified to different taxonomic ranks and that of functional assignments were presented in Figure 2. Functional assignment capacities of agricultural soils (bulk soil and rhizosphere soil), forest soil and mangrove rhizosphere soil accounted for 28.24-28.42%, 12.48% and 15.86% (Figure 1) and the number of functions assigned to those samples were 34, 36, 37 and 52 functions, respectively ( Figure S1). Predominant functions across all samples belonged to chemoheterotrophy, followed by aerobic chemoheterotrophy (Figure 3). However, when we focused on more specific functions, the result showed differences in dominant functions involved in biogeochemical cycling derived from each ecosystem (Figure 3).
Appl. Sci. 2021, 11, x FOR PEER REVIEW 7 of 18 per sample [33]) were detected, respectively. After rare OTUs were removed, the sequences were normalized to the smallest read numbers per sample, which were 4951, 12,269 and 7086 [33] reads per sample for agricultural soils (bulk soil and rhizosphere soil), forest soil, and mangrove rhizosphere soil, respectively. Different bacterial OTUs were obtained from agricultural bulk soil (3329), agricultural rhizosphere soil (3365), forest soil (2177) and mangrove rhizosphere soil of R. stylosa (2497). The proportion of OTUs that were identified to different taxonomic ranks and that of functional assignments were presented in Figure 2. Functional assignment capacities of agricultural soils (bulk soil and rhizosphere soil), forest soil and mangrove rhizosphere soil accounted for 28.24%-28.42%, 12.48% and 15.86% (Figure 1) and the number of functions assigned to those samples were 34, 36, 37 and 52 functions, respectively ( Figure S1). Predominant functions across all samples belonged to chemoheterotrophy, followed by aerobic chemoheterotrophy (Figure 3). However, when we focused on more specific functions, the result showed differences in dominant functions involved in biogeochemical cycling derived from each ecosystem ( Figure 3).

Correlation between the Number of Functionally Assigned OTUs and Taxonomically Identified OTUs in Each Taxonomic Rank
Significant and positive correlations (σ = 0.90-0.95, p < 0.01, Figure 2c,d) were found between the number of functionally assigned OTUs and number of OTUs assigned at genus and order levels, whereas the correlation between that at phylum level was not statistically significant (σ = 0.11, p = 0.64, Figure 2e). Although strong correlations between the numbers of functionally assigned OTUs and numbers of OTUs assigned at genus and order levels are detected in this study, we noted that a small set of data (20 samples) was used for the correlation analysis. Thus, these correlation results should be interpreted carefully.

Accuracy of Functionally Assigned Bacterial and Archaeal OTUs Based on FAPROTAX in Soil Systems
The result showed that more than 97% of the FAPROTAX assigned OTUs have previously been detected and potentially performed the functions in agricultural (1081 out of 1098 OTUs) and forest soils (265 out of 272 OTUs). On the other hand, only 28.79% (114 out of 396 OTUS) of functionally assigned OTUs detected in Mangrove rhizosphere soil had record of appearance and assigned functional performance in soil. We found that several detected taxa, including but not limited to Demequina, Euzebya, Maribacter, Marinobacter, Muricauda, Desulfatitalea, Desulfopila, were found to potentially perform the assigned functions in an estuary, seawater, sediment, and other marine habitats (Table S2.3). However, it should be noted that mangrove is a unique habitat with a mixture of land and sea, and therefore some aquatic bacteria can possibly be detected.

Impact of Adding Reported Datasets to Functional Assignment Percentage
In total, 17 and 31 OTUs detected in the agricultural bulk and rhizosphere soil, respectively, were functionally assigned to nitrogen fixation by FAPROTAX, while the ad-

Correlation between the Number of Functionally Assigned OTUs and Taxonomically Identified OTUs in Each Taxonomic Rank
Significant and positive correlations (σ = 0.90-0.95, p < 0.01, Figure 2c,d) were found between the number of functionally assigned OTUs and number of OTUs assigned at genus and order levels, whereas the correlation between that at phylum level was not statistically significant (σ = 0.11, p = 0.64, Figure 2e). Although strong correlations between the numbers of functionally assigned OTUs and numbers of OTUs assigned at genus and order levels are detected in this study, we noted that a small set of data (20 samples) was used for the correlation analysis. Thus, these correlation results should be interpreted carefully.

Accuracy of Functionally Assigned Bacterial and Archaeal OTUs Based on FAPROTAX in Soil Systems
The result showed that more than 97% of the FAPROTAX assigned OTUs have previously been detected and potentially performed the functions in agricultural (1081 out of 1098 OTUs) and forest soils (265 out of 272 OTUs). On the other hand, only 28.79% (114 out of 396 OTUS) of functionally assigned OTUs detected in Mangrove rhizosphere soil had record of appearance and assigned functional performance in soil. We found that several detected taxa, including but not limited to Demequina, Euzebya, Maribacter, Marinobacter, Muricauda, Desulfatitalea, Desulfopila, were found to potentially perform the assigned functions in an estuary, seawater, sediment, and other marine habitats (Table S2.3). However, it should be noted that mangrove is a unique habitat with a mixture of land and sea, and therefore some aquatic bacteria can possibly be detected.

Impact of Adding Reported Datasets to Functional Assignment Percentage
In total, 17 and 31 OTUs detected in the agricultural bulk and rhizosphere soil, respectively, were functionally assigned to nitrogen fixation by FAPROTAX, while the additions of 53 and 59 OTUs were assigned the function by a manual search based on reference given in Table 1. Several taxa that could potentially perform nitrogen fixation were assigned in addition to Rhizobium gallicum, the only taxon that was assigned by FAPROTAX (Table 2).

Discussion
Keeping possible biases inherent to molecular technique and next-generation sequencing (NGS) in mind [9] (i.e., PCR, short-read sequences, pan-genome concept of bacterial evolution), many works demonstrated that NGS can be successfully used to improve our understanding of bacterial taxonomic structure and functional profile across aquatic [9,[17][18][19][20][21] and terrestrial ecosystems [22][23][24][25]. One of the most commonly used NGS techniques (Illumina Miseq) offers paired-end reads, which increase almost two times of the non-merging read length (i.e., 580 bp for MiSeq reads of 300 bp with a 20 bp minimal overlap) [52]. In this study, we demonstrated that FAPROTAX was able to assign functions to prokaryotic taxa derived from both aquatic and terrestrial sources, especially soils. Even though aquatic samples tend to gain higher assignment than those of terrestrial samples, no significant difference was found on assignment percentage between different sample sources based on a limited number of publications (Table S3).
Several previous studies used this tool to predict the functions of bacterial/archaeal communities residing in not only aquatic habitats but also in soil samples. For example, FAPROTAX was used to assess the impact of microbial inoculation and fertilizer application on soil bacterial functions involved in for C and N cycles [53][54][55]. Specifically, Gao et al. [53] showed stimulation of denitrification and nitrification in soil after Spartina alterniflora invasion, as well as Li et al. [54] revealed a significant effect of straw incorporation and nitrogen fertilization on hydrocarbon degradation and nitrogen fixation. Similarly, Wang et al. [55] showed an increase of aerobic nitrite oxidation in soil inoculated with multi-species inoculants. Moreover, FAPROTAX was utilized in several studies that aimed to compare the effect between different treatments or a response of site managements [15,28,[56][57][58][59]. For instance, it was used to investigate the effects of bacterial functions after grazing prohibition and in plantation and natural forest [15,28,56]. The increase in denitrification and nitrification functions was found after soil tillage and forest-to-agriculture conversion [57,58]. In addition to soil samples, FAPROTAX has also been helpful in predicting bacterial functions of animals and human microbiome. For examples, FAPROTAX was used to compare the functional richness of bacteria in gallbladder and gut between young and adult rabbits [60], infer biogeochemical processes, especially in nitrogen and manganese cycling occurring on human and mammalian skin [32] and show the impact of environmental factors on the functional diversity of bacteria in frog skin [14]. Although FAPROTAX is unable to reveal functional phenotypes of all taxa in the community, previous studies have shown that it was a helpful tool to highlight functions related to biogeochemical dynamics, especially on N and C cycle in non-aquatic samples and to compare functional profiles among treatments.
Our results also showed that sample source was not a primary factor that limited the application of FAPROTAX, especially in soil samples. A strong positive and significant correlation between the number of functional assigned OTUs and the number of OTUs identified to the order and genus levels, but not with those assigned to phylum level implies that poor taxonomic identification was an important factor that limited FAPROTAX functional assignment in soil samples. Since certain functions are conserved at low taxonomic levels (species, genus or order), OTUs identified only at a phylum level usually cannot be assigned to any function. The study cases on agricultural and forest soil samples reported that more than 97% of functionally assigned OTUs have previously been reported to exist (previously reported on both appearance and functional performance) in soil samples (Tables S2.1-S2.3). We found several studies that confirmed the presence of functionally assigned taxa in soil bacterial communities. For example, taxa related to C-cycle such as Nocardioides, which was previously isolated from soil, was assigned to aromatic compound degradation and confirmed the ability to degrade hexachlorobenzene [61]. Rhodococcus species was also assigned to hydrocarbons degradation and was found to have the ability to utilize various hydrocarbons groups in soil [62]. Similarly, taxa involved in nitrogen cycles, such as Rhizobium gallicum (nitrogen fixation) and Micromonospora aurantiaca (nitrate reduction and cellulolysis), were also found in soil [63,64] (see Tables S2.1-S2.3 for more information). These lines of evidence confirmed that some taxa available in the FAPROTAX database are generally dispersed across different ecosystems, including soil, even though they were originally generated from aquatic samples. Moreover, we showed that bacterial taxa and functional prediction presented in the FAPROTAX database were not only derived from aquatic samples but also from soil, plants and animals (Table S4). These results support our hypothesis that FAPROTAX was compatible with terrestrial ecosystems. On the other hand, lower portion of soil-detected taxa but high portion of marine-detected taxa in mangrove rhizosphere soil was not surprising results because mangrove land is located at the boundary between terrestrial and aquatic ecosystems. Therefore, it is possible to detect both marine and soil bacterial taxa in the area [65]. More importantly, we confirmed that a number of bacterial taxa found in mangrove rhizosphere soil were also found in soil system. Furthermore, we demonstrated that the addition of references on bacterial taxa capable of a particular function could significantly increase the efficiency of the functional assignment. This study showed that adding more references to nitrogen fixation function can increase the number of OTUs assigned to the function by 2-3 times, accounting for a 2% increase in total functional assignment percentage. The absence of the nitrogen fixation function may be due to FAPROTAX being created based on aquatic samples [9]; thus, some terrestrial specialized taxa were neglected from the database. Furthermore, bulk and rhizosphere soils of red clover (Trifolium pratense) are known to colonize by diverse N-fixing bacteria [66]. Consequently, using relevant and up-to-date references of current status of Nfixing bacterial legume symbiosis can strongly increase the number of functional assigned bacterial taxa. Specifically, many important N-fixing bacterial genera, for examples Ensifer, Bradyrhizobium, Microvirga, Mesorhizobium, Devosia, etc., are not included in FAPROTAX as N-fixing bacteria. Therefore, FAPROTAX database still needs to be extended to cover other relevant functions in soils. This study provided soil bacterial taxa capable of nitrogen fixation, phosphate and potassium solubilization, so further study on soil sample could make the most of our work.
Although this study has demonstrated that FAPROTAX was compatible with soil samples, the bias of this tool should be kept in mind. The FAPROTAX has an assumption that if all cultured members of a taxon can perform a function, all members of that taxon (cultured and uncultured) can perform that function. Moreover, since soil ecosystem contains several biogeochemical cycles as well as diverse prokaryotic taxa, it is concerning that FAPROTAX may have low performance (low functional assignment percentage) for certain soil samples. We recommend that further work should add more soil-specific functions and prokaryotic taxa to optimize the performance of this tool. On the other hand, other alternative tools can be applied for bacterial functional annotation in soil, such as PICRUSt [11,12] or Tax4Fun [8]. Notice should be taken that each tool provides different aspects of bacterial functional phenotypes. FAPROTAX presents functional phenotypes as metabolic and ecologically relevant functions which are predicted based on the literature of cultured taxa (i.e., nitrogen fixation, nitrification, hydrocarbon degradation, chitinolysis, cellulolysis, etc.), whereas PICRUSt or Tax4Fun present functions as phenotypes of gene families or enzyme activities which are predicted based on gene content. We simulate results from Tax4Fun to show the functional phenotype of bacteria associated with the rhizosphere soil samples of Trifolium pratense used in this study (Appendix A). The simulated results show various potential enzymes that may be relevant to many soil functions ( Figure A2). In our opinion, each functional annotation tool provides information on different aspects of bacterial functions. Selection of such tools for a particular study should depend on its purposes. If a study focuses on key bacterial functions important for biogeochemical cycling as well as microbemicrobe, plant-microbe and animal-microbe interactions, FAPROTAX can be a suitable choice. On the other hand, if a study targets at changes of gene expression or potential enzyme activity, other tools should be considered, such as PICRUSt or Tax4Fun.
In conclusion, this study presented that FAPROTAX database can be effectively used to predict function of bacteria in soil samples. Even though the database cannot predict function of all detected taxa, it can be beneficial for fast-functional screening or grouping of 16S derived bacterial data from any ecosystem. We further suggested that additional datasets of both the taxonomy and functional references could improve the FAPROTAX database and thereby increase the number of functionally assigned OTUs derived from 16S rRNA.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Functional Phenotypes Derived from Tax4Fun and FAPROTAX. A Simulate Result from Bacterial Communities of Agricultural Rhizosphere of Trifolium pretense
A dataset (OTU table) consisted of bacterial communities in the rhizosphere of Trifolium pratense was selected to simulate the result derived from Tax4Fun. In detail, the Tax4Fun [8] R package, which employs 16S rRNA gene-based taxonomic information, and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database were used to predict the metabolic functional attributes of bacterial communities in the rhizosphere of T. pratense. Tax4Fun converted the SILVA-labelled OTUs into prokaryotic KEGG organisms and normalized these predictions using the 16S rRNA copy number (obtained from the National Center for Biotechnology Information genome annotations). Furthermore, FAPROTAX was also used to predicted bacterial function of the same dataset following the FAPROTAX's instruction (http://www.loucalab.com/archive/FAPROTAX/) [9].
A total of 36 functions were derived from FAPROTAX ( Figure A1). FAPROTAX showed ecologically relevant functions which concern biogeological cycles, such as nitrification, cellulolysis and hydrocarbon degradation, and the interaction of microbe to plants/animals, such as plant pathogen or invertebrate parasites ( Figure A1). On the other hand, Tax4Fun provided a total of 6338 functions which included functions that involved in soil biogeochemical cycles, such as chitinase, beta-glucosidase or acid phosphatase, and not involved in soil system, such as sn-glycerol 3-phosphate transport system ATP-binding protein, queuine tRNA-ribosyltransferase, and DNA-directed RNA polymerase. In order to use Tax4Fun, the researcher might have an overview of functions required for their work, then sort only interesting functions to be investigated. For example, in this case, we gathered 30 enzymes involved in soil system (Table A1) and show the functional profile presented in Figure A2.
However, we believe that both functional annotation tool contributes most benefit, but different aspect for each work. The selection of these tools should depend on the purpose of each study. If a study focuses on the biogeochemical cycle or the interaction of microbe to plants/animals, FAPROTAX would be the best alternatives. Moreover, FAPRROTAX is quite easy to follow as it provides plain functional phenotypes. On the other hand, if a study pays more attention to enzyme activity, Tax4Fun should be a great choice.     Table A1. Tax4Func functions involved in soil system and its description.

Tax4Fun Output
Function's Description