Ecological Trait-Based Digital Categorization of Microbial Genomes for Denitrification Potential

Microorganisms encode proteins that function in the transformations of useful and harmful nitrogenous compounds in the global nitrogen cycle. The major transformations in the nitrogen cycle are nitrogen fixation, nitrification, denitrification, anaerobic ammonium oxidation, and ammonification. The focus of this report is the complex biogeochemical process of denitrification, which, in its complete form, consists of a series of four enzyme-catalyzed reduction reactions that transform nitrate to nitrogen gas. Denitrification is a microbial strain-level ecological trait (characteristic), and denitrification potential (functional performance) can be inferred from trait rules that rely on the presence or absence of genes for denitrifying enzymes in microbial genomes. Despite the global significance of denitrification and associated large-scale genomic and scholarly data sources, there is a lack of datasets and interactive computational tools for investigating microbial genomes according to denitrification trait rules. Therefore, our goal is to categorize archaeal and bacterial genomes by denitrification potential based on denitrification traits defined by rules of enzyme involvement in the denitrification reduction steps. We report the integration of datasets on genome, taxonomic lineage, ecosystem, and denitrifying enzymes to provide context for data investigations of the denitrification potential of microbial strains. We constructed an ecosystem- and taxonomy-annotated denitrification potential dataset of 62,624 microbial genomes (866 archaea and 61,758 bacteria) that encode at least one of the twelve denitrifying enzymes in the four-step canonical denitrification pathway. Our four-digit binary-coding scheme categorized the microbial genomes into one of sixteen denitrification traits, including complete denitrification traits assigned to 3280 genomes from 260 bacterial genera. The bacterial strains with the complete denitrification potential pattern included Arcobacteraceae strains isolated or detected in diverse ecosystems including aquatic, human, plant, and Mollusca (shellfish). The dataset on microbial denitrification potential and the associated interactive data investigation tools can serve as research resources for understanding the biochemical, molecular, and physiological aspects of microbial denitrification, among others. The microbial denitrification data resources produced in our research can also be useful for identifying microbial strains for synthetic denitrifying communities.


Introduction
Microorganisms encode proteins that function in the transformations of useful and harmful nitrogenous compounds in the global nitrogen cycle [1,2]. The major transformations in the nitrogen cycle are nitrogen fixation, nitrification, denitrification, anaerobic ammonium oxidation (anammox), and ammonification [3,4]. Nitrogen cycling is central to ecosystem functioning, including through microbial sources and sinks of nitrogenous compounds [5–7]. This report focuses on the complex biogeochemical process of denitrification which, in its complete form, consists of a series of four enzyme-catalyzed reduction reactions that transform nitrate to nitrogen gas [4,8]. Enwall et al. [9] described denitrification as "an alternative pathway for microorganisms to respire under oxygen-limited conditions, using nitrogen oxides as electron acceptors". Denitrification is a strain-level trait, and denitrification potential (functional performance) can be inferred from trait rules that rely on the presence or absence of genes for denitrifying enzymes such as nitrous oxide reductase [10,11]. Digital categorization of biological knowledge (e.g., denitrification) with representations such as trait rules, ontologies, and controlled vocabularies supports knowledge sharing and discovery across biological domains [12,13]. Microbial genome web portals provide large-scale, taxonomic-strain-level datasets that include annotations of enzymes encoded by microbial genomes [14]. For example, the Integrated Microbial Genomes & Microbiomes (IMG/M) system provides tools to retrieve lists of microbial genomes with specific functional annotation entries such as Enzyme Commission (E.C.) number, Clusters of Orthologous Genes (COG), Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology, and Pfam (protein families and domains) [15]. Furthermore, researchers can download datasets of interest such as genomes or genes annotated with specific KEGG or COG denitrification entries [16]. In our prior research [17], we synthesized downloaded datasets of genomes using binary number representations to categorize the genomes. Karaoz and Brodie [10], in a journal article titled "MicroTrait: a toolset for trait-based representation of microbial genomes", highlighted the need for new data synthesis approaches for microbial trait datasets, where data from microbial genomes are at the core of investigating the environmental roles and functional performance of microorganisms. The 16 categories of denitrification potential, including complete denitrification, were defined by Karaoz and Brodie [10] (Appendix A, Figure A1).
A widely used bioinformatics tool for predictive functional profiling of 16S rRNA gene amplicon sequencing from environmental samples is Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt2) [24,25]. PICRUSt2 prediction of function relies on similarities to genome annotations in IMG/M [15]. In addition, studies have used PICRUSt2 to predict nitrogen-cycling pathways of microbial communities in an ecosystem [26–29]. A limitation of PICRUSt2 is its inability to distinguish strain-level functionality [24,28]. In addition, the ecological trait relevance (e.g., denitrification potential) of the list of KEGG orthologues predicted by PICRUSt2 requires interpretation. Thus, there is an opportunity to provide datasets and interactive data investigation resources that contain strain-level categorization of denitrification potential according to the 16 possible categories of Karaoz and Brodie [10]. Interactive computational resources designed with general-purpose software (e.g., spreadsheet and visual analytics) for investigating denitrification potential datasets can also be useful for the interpretation of denitrification traits predicted by tools such as PICRUSt2, PAPRICA (PAthway PRediction by phylogenetIC plAcement) [30], and Tax4Fun2 [31].
Despite the global significance of denitrification [2,32] and associated large-scale genomic and scholarly data sources, there is a lack of datasets and interactive computational tools for investigating microbial genomes according to denitrification trait rules. Therefore, our goal is to categorize archaeal and bacterial genomes by denitrification potential based on denitrification traits defined by rules of enzyme involvement in the denitrification reduction steps [5,10,11]. This goal is important because denitrification is a taxonomic-strain-level trait and because the denitrification traits of a newly sequenced microbial genome are useful for designing research on the biochemical, molecular, and physiological aspects of microbial denitrification, among others. The abundance of genome sequences of microbial strains has led to a growing interest in synthetic microbial communities (SynComs) or consortia (e.g., synthetic denitrifying communities) for biotechnological, bioengineering, and ecosystem function applications [7,33,34]. A critical initial stage in the design of optimal synthetic microbial communities is identifying the microbial strains that constitute the microbial community [35]. Thus, researchers will benefit from datasets and easy-to-use computational tools for identifying strains for the optimal design of a synthetic microbial community. Therefore, the first objective of our microbial denitrification data investigations was to construct a microbial denitrification potential dataset containing archaeal and bacterial genomes annotated with at least one of the twelve KEGG denitrification enzyme entries. The second objective was to develop interactive computational resources with spreadsheet software and visual analytics software to support investigations of the microbial denitrification dataset.
Since marine invertebrates such as clams, mussels, and oysters can be hosts to microorganisms that contribute to denitrification [36,37], we compiled patterns of presence or absence of denitrifying enzyme genes and the denitrification potential for bacterial genera associated with the Eastern oyster (Crassostrea virginica). One reason for the interest in the denitrification potential of bacteria associated with the Eastern oyster is that oyster aquaculture is associated with low greenhouse gas emissions [38]. Metagenomics sequencing technologies have aided in the identification of archaea and bacteria associated with oyster anatomical parts including the gills, gut, hemolymph, mantle, pallial fluid, stomach, and shell [37,39–43]. In studies with a focus on the denitrification potential of the oyster microbiome [37,42], the bacterial genera identified include Clostridium, Endozoicomonas, Erythrobacter, Mycoplasma, Neptunibacter, Pleurocapsa, Psychrobacter, Pseudomonas, Pseudoalteromonas, Shewanella, Synechococcus, and Vibrio.
We report an ecosystem- and taxonomy-annotated dataset of 62,624 microbial genomes (866 archaea and 61,758 bacteria) that encode at least one of the twelve denitrifying enzymes in the four-step canonical denitrification pathway. The bacterial strains with the complete denitrification potential pattern included Arcobacteraceae strains isolated or detected in diverse ecosystems including aquatic, human, plant, and Mollusca (shellfish). In addition, we developed a set of accessible and easy-to-use data-investigation interfaces (spreadsheet and visual analytics) to support human interaction with the microbial denitrification dataset. The visual analytics interface also includes searches for gene symbols of the denitrification enzymes in scholarly databases. The microbial denitrification data resources can be used for identifying microbial strains for synthetic denitrifying communities (SDCs).

Construction of a Denitrification Potential of Archaeal and Bacterial Genomes Dataset
The construction of the dataset followed the formats used for trait datasets, such as wide table and long table [12]. The archaeal and bacterial genomes (strains) with genes annotated with each of the 12 KEGG Orthology (KO) terms were retrieved from the Integrated Microbial Genomes and Microbiomes (IMG/M) data management and analysis system [15]. We used the uniform resource locator (URL) script that finds genomes with the denitrification KO term (for example, K00376 for nitrous-oxide reductase, nosZ). An alternative approach was to use the IMG/M Find Genes interface to retrieve the genomes with the KO term. Each of the retrieved datasets contained columns for domain (taxonomic domain), status (sequencing status), genome ID (identifier assigned by IMG/M), and genome name (Figure 1). As a data-investigation-ready approach for datasets collected from microbial genome web portals [14], we used binary numbers (0 and 1) of varying length to synthesize the availability of data on functional annotations in microbial genomes [17]. We also adapted an approach that uses binary numbers to synthesize the direction of gene arrangements assigned to genes in microbial genomes [46].
Each dataset was uploaded into Tableau, a visual analytics software [47], and a column "Dataset" with an identical value for all the records was added as a calculated field. For example, we added "01_narG" and "12_nosZ" to the genome lists for narG and nosZ, respectively (Figure 2). We added this additional column to construct the data-investigation-ready dataset consisting of the 12 datasets of genomes. We downloaded the labeled datasets and then combined them in Tableau using the "Append Data from File" feature.
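The labeling and appending steps were performed in Tableau. Purely as an illustration of the same idea outside Tableau, a minimal Python sketch follows. The function name `append_labeled`, the row dictionaries, and the genome identifiers are hypothetical; only the "Dataset" column and labels such as "01_narG" and "12_nosZ" come from the text.

```python
def append_labeled(datasets):
    """Combine per-gene genome lists into one long table.

    `datasets` maps a label such as "12_nosZ" to a list of row dicts.
    Every output row gets a "Dataset" column holding its label, which
    mirrors the calculated field described in the text.
    """
    combined = []
    for label in sorted(datasets):          # "01_narG" sorts before "12_nosZ"
        for row in datasets[label]:
            labeled = dict(row)             # copy so the input is not mutated
            labeled["Dataset"] = label
            combined.append(labeled)
    return combined


# Toy rows with made-up identifiers, for illustration only.
toy = {
    "12_nosZ": [{"Genome ID": "G1", "Genome Name": "Example strain A"}],
    "01_narG": [{"Genome ID": "G1", "Genome Name": "Example strain A"},
                {"Genome ID": "G2", "Genome Name": "Example strain B"}],
}
long_table = append_labeled(toy)
```

The result is a long table in which a genome appears once per gene list that contains it, which is the form needed before pivoting to one row per genome.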

Figure 2.
A screenshot of the design of a visual analytics resource for constructing a dataset of microbial genomes from the dataset retrieved from the bioinformatics resource (IMG/M). The example shown is for nitrous oxide reductase with KEGG Orthology identifier K00376. The filters in the design allow for the display of a dataset with options for taxonomic domain and genome sequencing status.

We constructed an integrated dataset from the 12 datasets in the visual analytics software. The columns in the dataset included those for genome ID and genome name; 12 columns for the denitrification KEGG Orthology terms with entries of "0" (absence of the KO in the genome) or "1" (presence of the KO in the genome); and a column that joined all the KO binary digits to form a 12-digit binary number, which we termed the "Denitrifying Enzymes Pattern". The order of the digits in the binary number reflects the enzymes in the four steps of denitrification: (1) nitrate reductases (narG, narH, narI, napA, napB); (2) nitrite reductases (nirK, nirS); (3) nitric oxide reductases (norB, norC, norV, norW); and (4) nitrous oxide reductase (nosZ). Therefore, the 12th digit is for the presence or absence of the gene for nitrous oxide reductase in a microbial genome.
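The construction of the 12-digit "Denitrifying Enzymes Pattern" can be sketched in a few lines of Python. The gene order follows the text; the function name `enzymes_pattern` and the set-based input are illustrative assumptions, not the authors' implementation, which was done in Tableau.

```python
# Digit order as given in the text: nitrate, nitrite, nitric oxide,
# and nitrous oxide reductase genes.
GENE_ORDER = ["narG", "narH", "narI", "napA", "napB",
              "nirK", "nirS",
              "norB", "norC", "norV", "norW",
              "nosZ"]


def enzymes_pattern(genes_present):
    """Return the 12-digit pattern: '1' if a gene is present, else '0'."""
    present = set(genes_present)
    return "".join("1" if gene in present else "0" for gene in GENE_ORDER)


# Gene set reported in the text for Arcobacter sp. LA11.
pattern = enzymes_pattern({"napA", "napB", "nirS", "norB", "norC", "nosZ"})
print(pattern)  # 000110111001
```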
An additional 4-digit binary number ("Denitrification Pattern") was constructed from the 12-digit "Denitrifying Enzymes Pattern" using the rules for denitrification traits as described by Karaoz and Brodie [10] (Appendix A, Figure A1). For example, the complete denitrification trait or potential (denitrification pattern 1111) involves the presence of the appropriate combinations of enzymes to catalyze each of the four steps of denitrification.
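As a rough sketch of how a 4-digit pattern could be derived from a 12-digit one, the Python below marks a step as present when at least one assumed enzyme combination for that step is fully encoded. The combinations in `STEP_RULES` (NarGHI or NapAB; NirK or NirS; NorBC or NorVW; NosZ) are an assumption made for illustration; the authoritative rules are those of Karaoz and Brodie [10] (Appendix A, Figure A1) and may differ. The function name is hypothetical.

```python
GENE_ORDER = ["narG", "narH", "narI", "napA", "napB",
              "nirK", "nirS",
              "norB", "norC", "norV", "norW",
              "nosZ"]

# ASSUMPTION: each step lists alternative gene groups; a step counts as
# present if every gene of at least one group is present. These groups
# are illustrative and are not copied from the published trait rules.
STEP_RULES = [
    [("narG", "narH", "narI"), ("napA", "napB")],  # step 1: nitrate reduction
    [("nirK",), ("nirS",)],                        # step 2: nitrite reduction
    [("norB", "norC"), ("norV", "norW")],          # step 3: nitric oxide reduction
    [("nosZ",)],                                   # step 4: nitrous oxide reduction
]


def denitrification_pattern(enzymes_pattern):
    """Collapse a 12-digit enzymes pattern into a 4-digit step pattern."""
    if len(enzymes_pattern) != 12 or set(enzymes_pattern) - {"0", "1"}:
        raise ValueError("expected a 12-character string of 0s and 1s")
    present = {g for g, d in zip(GENE_ORDER, enzymes_pattern) if d == "1"}
    digits = []
    for alternatives in STEP_RULES:
        ok = any(all(g in present for g in group) for group in alternatives)
        digits.append("1" if ok else "0")
    return "".join(digits)


print(denitrification_pattern("000110111001"))  # 1111
print(denitrification_pattern("000000111001"))  # 0111
```

Under these assumed rules, a genome with napB but not napA contributes nothing to step 1, because the group requires both genes.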
We used the IMG/M Find Genomes tools to retrieve relevant fields to facilitate taxonomic and ecosystem interpretation as well as research advances, such as the design of synthetic denitrifying communities, using the denitrification potential dataset. The categories of data fields are (1) Genome Taxonomy Database (GTDB) Toolkit assignments (GTDB-Tk Domain, GTDB-Tk Family, GTDB-Tk Genus, GTDB-Tk Order, GTDB-Tk Phylum, GTDB-Tk Species) and (2) ecosystem classes (Ecosystem, Ecosystem Category, Ecosystem Type, Ecosystem Subtype, and Specific Ecosystem). We also derived a column "Genus" from the "Genome Name" column by extracting the text before the first space in the "Genome Name" column. For example, "Pseudomonas aeruginosa PAO1" would be "Pseudomonas".
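The derivation of the "Genus" column amounts to taking the text before the first space. A minimal Python sketch, with a hypothetical function name, is shown below.

```python
def genus_from_genome_name(genome_name):
    """Return the text before the first space of a genome name."""
    return genome_name.strip().split(" ", 1)[0]


print(genus_from_genome_name("Pseudomonas aeruginosa PAO1"))  # Pseudomonas
```

One consequence of this simple rule is that a name with a leading qualifier (for example, one beginning with "Candidatus") yields the qualifier and not the genus, so the derived column is best read alongside the GTDB-Tk Genus field.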

Designs and Implementations of Visual Analytics Resources to Support Human Interaction with the Dataset on the Denitrification Potential of Microbial Genomes
We designed spreadsheet and visual analytics worksheets to include filters and other interaction techniques for interaction with the data in the worksheets. The interaction techniques can support the performance of complex cognitive activities, which are information intensive and involve complex human cognition (mental processes) [48,49]. A catalog of 32 interaction techniques that support the performance of complex cognitive activities (such as knowledge discovery, problem-solving, and decision-making) [48] guided the design and implementation of the visual analytics resources (worksheets and dashboards) in Tableau [47]. The overall design of the visual analytics resource for interacting with the dataset is an enclosure table view that groups the genome names according to a 4-digit and a 12-digit binary number. Each row in the view also has a shape mark to indicate the genome sequencing status. We included filtering and searching interaction techniques in our design to help us identify a subset of the dataset on which to perform complex cognitive activities. In this project, the design of a core visual analytics resource allows for the querying of the dataset using columns such as those for genome name, genome ID, denitrification pattern, denitrification traits, and denitrifying enzymes pattern. An additional feature of the design is the uniform resource locator (URL) action that provides a hyperlink to a web page of Google Scholar, a web search engine for scholarly literature and academic resources.
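The URL action itself is configured in Tableau, and its exact template is not given in the text. As an illustration of what such a link could look like, the Python sketch below builds a Google Scholar search URL for a gene symbol. The function name and the extra search term "denitrification" are assumptions.

```python
from urllib.parse import urlencode


def scholar_url(gene_symbol, extra_terms="denitrification"):
    """Build a Google Scholar search URL for a gene symbol.

    `extra_terms` is an illustrative assumption, not part of the
    authors' described URL action.
    """
    query = f"{gene_symbol} {extra_terms}".strip()
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})


print(scholar_url("nosZ"))
# https://scholar.google.com/scholar?q=nosZ+denitrification
```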

Denitrification Potential Categorization of Bacterial Genera Associated with Eastern Oyster (Crassostrea virginica)
Since oysters are filter feeders and since the gills are the filtering tissue in constant contact with the surrounding water [37,50,51], we designed visual analytics worksheets to categorize, according to denitrification potential, the genomes of a set of bacterial genera (Arcobacter, Bradyrhizobium, Caulobacter, Marinifilum, Pelomonas, Pseudoalteromonas, Pseudomonas, Psychrobacter, and Sphingomonas) associated with oyster gills [52].

Dataset on the Denitrification Potential of Microbial Genomes
The dataset on the denitrification potential of 62,624 microbial genomes (866 archaea and 61,758 bacteria) consisted of 36 variables (columns) from genome annotations and denitrification annotations (Table 1). The Genome ID from the IMG/M system was the unique identifier for each genome. We derived the denitrification annotation categories (denitrification potential and denitrifying enzymes) from the input datasets retrieved from the IMG/M system (Table 2). In the dataset, the gene for nitrous oxide reductase (nosZ), the enzyme for the last step of denitrification, was present in 181 archaeal and 8009 bacterial genomes (Table 2). There were at least 100 archaeal and 2000 bacterial genera as well as 484 twelve-digit denitrification patterns in the dataset. We observed 1021 strains with two, three, four, or five genome sequences. The four strains with five genome sequences in the microbial denitrification potential dataset were the following: (1) Brucella melitensis bv. 1 16 M; (2) Corynebacterium aurimucosum CN-1, ATCC 700975; (3) Escherichia coli EC2; and (4) Pseudomonas aeruginosa DSM 50071. The Supplementary Materials and Data Availability sections of this report provide details on how to access the denitrification potential dataset. The distribution of the 16 denitrification patterns and associated denitrification traits in ecosystems for archaeal and bacterial genomes revealed the potential for complete denitrification by 3280 bacterial genomes (Figure 3). The denitrification potential dataset contained five IMG/M ecosystem annotations (engineered; environmental; host-associated; mixed; and mixed, environmental) assigned to 37,407 of the 62,624 genomes. We verified that the 8190 genomes with the nitrous oxide reduction trait (nosZ) were associated with denitrification patterns 0001 (1079 genomes), 0011 (92 genomes), 0101 (1069 genomes), 0111 (496 genomes), 1001 (756 genomes), 1011 (196 genomes), 1101 (1222 genomes), and 1111 (3280 genomes). The 179 genomes annotated with the Mollusca ecosystem category included 31 genomes with an ecosystem type annotation of oyster (Figure 4).
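The counts reported in this paragraph are internally consistent, which can be checked with simple arithmetic. The short Python check below uses only numbers stated in the text; the variable names are illustrative.

```python
# Genome counts per 4-digit pattern among nosZ-encoding genomes (from the text).
NOSZ_PATTERN_COUNTS = {
    "0001": 1079, "0011": 92, "0101": 1069, "0111": 496,
    "1001": 756, "1011": 196, "1101": 1222, "1111": 3280,
}

nosz_total = sum(NOSZ_PATTERN_COUNTS.values())

# Every pattern listed for nosZ genomes has the fourth digit set.
all_end_in_one = all(p.endswith("1") for p in NOSZ_PATTERN_COUNTS)

print(nosz_total)          # 8190
print(181 + 8009)          # 8190 (archaeal + bacterial nosZ genomes)
print(866 + 61758)         # 62624 (archaeal + bacterial genomes)
```

The eight patterns are exactly the 4-digit binary numbers whose last digit is 1, as expected for genomes that encode nosZ.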

Table 2 (partial; final rows and footnotes).
K12265; nitric oxide reductase FlRd-NAD(+) reductase; norW; 14,224; 14,224
K00376; nitrous-oxide reductase; nosZ; 181 (archaea); 8009 (bacteria); 8190 (total)
1 The Kyoto Encyclopedia of Genes and Genomes (KEGG) database was the source of the identifiers.
2 The list of enzyme genes was obtained from the denitrification trait rules that are based on the presence or absence of a protein family in a microbial genome [10].
3 Data were retrieved from the Integrated Microbial Genomes and Microbiomes (IMG/M) system in November 2023.

Designs and Implementations of Visual Analytics Resources to Support Interaction with the Dataset on the Denitrification Potential of Microbial Genomes
We designed and implemented several visual analytics worksheets and dashboards to support the performance of investigation, knowledge discovery, decision-making, and other complex cognitive activities on the denitrification potential of microbial genomes dataset. A visual analytics worksheet design (Figure 5) allows for interaction with the denitrification potential dataset using the columns in the genome, ecosystem, and denitrification potential categories (Table 1). Based on the taxonomic descriptions in valid publications of microbial strains, microorganisms with the epithet "denitrificans" (meaning denitrifying) can provide a subset of genomes with evidence for denitrification enzymes (for example, "reduces nitrate to nitrogen" as in the description of Sulfuricella denitrificans skB26 [53]). The constructed dataset contains 2 archaeal and 116 bacterial genomes with a genome name containing "denitrificans", assigned to 13 denitrification traits. As shown in Figure 5, 20 genome names were displayed when the interaction filters were (1) environmental ecosystem, (2) a genome name that contained "denitrificans", and (3) a denitrification pattern for complete denitrification of "1111". The "Denitrifying Enzymes Pattern" of "111111111001" for Marinobacter denitrificans JB02H27 [54] lacks the genes for anaerobic nitric oxide reductase flavorubredoxin (norV) and nitric oxide reductase FlRd-NAD(+) reductase (norW). Other genera that have species with "denitrificans" in the species name are Aquitalea, Halomonas, Halospina, Hyphomicrobium, Nisaea, Noviherbaspirillum, Paracoccus, Pseudoalteromonas, Pseudovibrio, Roseobacter, Shewanella, Sulfuricella, Thioalkalivibrio, Thioalbus, and Thiobacillus. We observed that genomes shared 12-digit presence or absence patterns of denitrifying enzymes. For example, the digital categorization process assigned pattern "000111011001" to the genomes of Nisaea denitrificans DSM 18348 and Shewanella denitrificans OS217 (Figure 5).

Figure 5.
A screenshot of a visual analytics resource to support interaction with the dataset on the denitrification potential of archaeal and bacterial genomes, with an emphasis on filtering by ecosystem options. The interaction worksheet provides options and links to external resources (IMG/M website, Google Search, and Google Scholar). The insert box on the left was obtained by clicking the sequencing status symbol associated with Marinobacter denitrificans JB02H27, a bacterium isolated from marine sediment and known to reduce nitrite and nitrate to gaseous nitrogen [54]. The webpage link to the interactive version of the visual analytics resource is available in the Supplementary Materials section.
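Reading a 12-digit pattern back into gene names makes statements such as "lacks norV and norW" easy to verify. The Python sketch below does this for two patterns quoted in the text; the function name `decode_pattern` is hypothetical and the digit order is the one given in the dataset construction section.

```python
GENE_ORDER = ["narG", "narH", "narI", "napA", "napB",
              "nirK", "nirS",
              "norB", "norC", "norV", "norW",
              "nosZ"]


def decode_pattern(enzymes_pattern):
    """Split a 12-digit pattern into (present genes, absent genes)."""
    if len(enzymes_pattern) != len(GENE_ORDER):
        raise ValueError("expected a 12-digit pattern")
    present = [g for g, d in zip(GENE_ORDER, enzymes_pattern) if d == "1"]
    absent = [g for g, d in zip(GENE_ORDER, enzymes_pattern) if d == "0"]
    return present, absent


# Pattern quoted for Marinobacter denitrificans JB02H27.
_, missing = decode_pattern("111111111001")
print(missing)  # ['norV', 'norW']

# Pattern shared by Nisaea denitrificans DSM 18348 and
# Shewanella denitrificans OS217.
shared, _ = decode_pattern("000111011001")
print(shared)   # ['napA', 'napB', 'nirK', 'norB', 'norC', 'nosZ']
```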
Another visual analytics design emphasized the filtering of the dataset by taxonomic classifications. In Figure 6, the view displayed is for filtering the dataset by the GTDB-Tk Genus Roseibium. We filtered the dataset by Roseibium because we observed the annotation of Roseibium genomes with the oyster host-associated ecosystem (Figure 4). The view produced by the interaction contains three Denitrifying Enzymes Patterns (000000111001, 000001011001, and 000110111001) and two types of Denitrification Traits: (1) Nitrite, Nitric Oxide and Nitrous Oxide Reduction only and (2) Complete Denitrification.

Figure 6.
A screenshot of a visual analytics resource to support human interaction with the dataset on the denitrification potential of archaeal and bacterial genomes, with an emphasis on filtering by taxonomic options. The interaction worksheet provides options as well as connections to external resources (IMG/M website, Google Search, and Google Scholar). The insert image with GTDB-Tk taxonomic assignments was obtained by clicking the sequencing status symbol associated with Roseibium aestuarii SYSU M00256-3, a bacterium isolated from an estuary and known to be unable to reduce nitrate [55]. The webpage link to the interactive version of the visual analytics resource is available in the Supplementary Materials section.

Denitrification Potential Categorization of Bacterial Genera Associated with Eastern Oyster (Crassostrea virginica)
The nine bacteria genera associated with the gill tissue of the Eastern oyster whose strains were categorized by patterns of denitrification potential are Arcobacter, Bradyrhizobium, Caulobacter, Marinifilum, Pelomonas, Pseudoalteromonas, Pseudomonas, Psychrobacter, and Sphingomonas. We determined, in three stages of visual analytics views, the distribution of denitrification potential patterns for 2603 genomes from the nine bacteria genera (Figure 7). Our categorization method assigned a complete denitrification pattern to 1331 genomes from four genera (Arcobacter, Bradyrhizobium, Pseudoalteromonas, and Pseudomonas). Furthermore, the following six genomes were assigned to the Mollusca (shellfish) ecosystem category: Arcobacter ellisii LMG 26155, Arcobacter ellisii CECT 7837, Arcobacter venerupis CECT7836, Arcobacter sp. LA11, Pseudomonas alcaligenes OT 69, and Psychrobacter sp. C 20.9. Of these six genomes, only the Arcobacter sp. LA11 genome had the complete denitrification trait (Figure 7), with a denitrifying enzymes pattern of "000110111001" (presence in the genome of napA, napB, nirS, norB, norC, and nosZ). Arcobacter sp. LA11, which was isolated from the gut of the abalone Haliotis discus, has the complete repertoire of genes for nitrogen fixation and denitrification [56]. Pseudoalteromonas denitrificans DSM 6059, a denitrifying marine bacterium [57], has the same denitrifying enzymes pattern as Arcobacter sp. LA11 (Figure 7c).
The findings on Arcobacter genomes with complete denitrification traits (Figure 7a), as well as the recommendation for research on Arcobacter strains and their hosts [56], led us to construct a denitrification potential dataset for 127 genomes taxonomically classified to the bacteria family Arcobacteraceae. The ecosystem classification and counts of genomes according to denitrification potential revealed that Arcobacteraceae strains inhabit engineered, environmental, and host-associated ecosystems (Figure 8). Arcobacteraceae family members are associated with diverse ecosystem categories (including human, animals, plants, wastewater, marine and non-marine aquatic environments, food production, and industrial production).
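The twelve-digit denitrifying enzymes pattern quoted above for Arcobacter sp. LA11 can be read position by position. The following is a minimal sketch of such a decoder, not the authors' implementation. The gene order is inferred from the patterns and gene lists quoted in this article, and the gene at digit 10 (`norV`) is an assumption because no quoted pattern sets that digit; the order defined in the dataset itself should be treated as authoritative.

```python
# Hypothetical decoder for the twelve-digit denitrifying enzymes pattern.
# GENE_ORDER is inferred from patterns quoted in the text; the gene at
# digit 10 ("norV") is an ASSUMPTION and should be checked against the dataset.
GENE_ORDER = (
    "narG", "narH", "narI",  # digits 1-3
    "napA", "napB",          # digits 4-5
    "nirK", "nirS",          # digits 6-7
    "norB", "norC",          # digits 8-9
    "norV", "norW",          # digits 10-11 (digit 10 assumed)
    "nosZ",                  # digit 12
)


def genes_present(pattern: str) -> list[str]:
    """Return the gene symbols whose digit is '1' in a 12-digit pattern."""
    if len(pattern) != len(GENE_ORDER) or set(pattern) - {"0", "1"}:
        raise ValueError(f"expected 12 binary digits, got {pattern!r}")
    return [gene for gene, digit in zip(GENE_ORDER, pattern) if digit == "1"]


if __name__ == "__main__":
    # Pattern quoted in the text for Arcobacter sp. LA11.
    print(genes_present("000110111001"))
```

Applied to "000110111001", the decoder lists the same six genes that the text reports for Arcobacter sp. LA11.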
The digital categorization assigned the 127 Arcobacteraceae genomes to eight of the sixteen denitrification potential traits, including complete denitrification (23 genomes). Among the Arcobacteraceae genomes investigated, only the Aliarcobacter cryaerophilus AZT-1 genome (denitrifying enzymes pattern "000010111000") did not encode the periplasmic nitrate reductase complex NapAB, as only the gene for NapB was present. IMG/M annotated 15 Arcobacteraceae genomes with the ecosystem category of Mollusca; these genomes fell into three denitrification trait categories: complete denitrification (7 genomes); nitrate reduction only (7 genomes); and nitrate, nitrite, and nitric oxide reduction only (1 genome) (Table 3).

Table 3 (excerpt).
Malaciobacter molluscorum F98-3 (mussel)
Nitrite and Nitric Oxide Reduction Only: Poseidonibacter ostreae JOD-M-6 (oyster)
1 The details for each genome are available from the Integrated Microbial Genomes and Microbiomes (IMG/M) system website.

Nitrogen Assimilation, Taxonomic, and Ecosystems Annotations for Genomes with a Complete Denitrification Pattern
We uploaded to the IMG/M system the list of 3280 identifiers ("taxon_oid") for the genomes with a complete denitrification pattern. We then used the IMG/M Find Function tool to identify genomes that have genes for four nitrogen assimilation pathways. The pathways investigated were nitrogen fixation, assimilatory nitrate reduction, assimilatory nitrite reduction, and ammonia assimilation to glutamine [5,10]. There were 369 bacterial genomes that encoded nifH, a biomarker used for identifying nitrogen-fixing bacteria and archaea [58,59] (Table 4). In addition, 3164 of the 3280 bacterial genomes (96.5%) had the glutamine synthetase (glnA) gene (KEGG Entry: K01915) for ammonia assimilation to glutamine. The examples of bacterial strains provided in Table 4 are from an ecosystem perspective, including an example relevant to Mollusca health. Aliiroseovarius crassostreae DSM 16950, a causative bacterium of Roseovarius oyster disease in Eastern oysters (Crassostrea virginica), is an example of the 664 complete denitrifying bacterial strains that encode glnA without evidence for genes of the other nitrogen assimilation pathways investigated. We also searched the IMG/M database for genomes to which we assigned the complete denitrification pattern and that are annotated with the KEGG identifier K01601 (ribulose-bisphosphate carboxylase large chain [EC:4.1.1.39]) for carbon fixation. The genomes of three strains (CECT 5094, CECT 5095, and CECT 5096) of Roseibium album isolated from oysters were among the 695 genomes annotated with the gene for carbon fixation.
Table 4 (excerpt). Aliiroseovarius crassostreae DSM 16950. 1 The Kyoto Encyclopedia of Genes and Genomes (KEGG) database was the source of the identifiers.
Since Arcobacteraceae is a member of the Campylobacterota and since denitrifying Arcobacteraceae strains have been isolated from oysters (Figure 9), we conducted a literature search on the other Campylobacterota families with genomes categorized as having complete denitrification potential. Our search retrieved a publication on nitrous oxide-reducing Campylobacterota isolated from deep-sea hydrothermal environments [62]. The Campylobacterota genera listed in the publication as having strains that are potential nitrous oxide reducers are Nitratifractor, Nitratiruptor, Sulfurimonas, and Sulfurovum. The availability of a comparative set of Campylobacterota genera (Lebetimonas, Nautilia, and Caminibacter) whose strains do not reduce nitrous oxide allowed us to verify the accuracy of the binary data synthesis of the denitrification dataset. In the denitrification dataset, the potential nitrous oxide reducers had "1" while non-nitrous oxide reducers had "0" in the last digit of the twelve-digit denitrifying enzymes pattern and four-digit denitrification pattern. The last digit for the two patterns was "0" for the Lebetimonas, Nautilia, and Caminibacter genomes (Figure 10). Among the Campylobacterota genera that are potential nitrous oxide reducers, Sulfurovum and Sulfurimonas have genomes that encode and those that do not encode nitrous oxide reductase. Additionally, the 33 genomes with the complete denitrification pattern include the one Nitratifractor strain and eight of the Nitratiruptor strains as well as twenty of the Sulfurimonas and four of the Sulfurovum strains (Figure 11).
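The last-digit verification described above can be expressed as a small consistency check. This is an illustrative sketch only: the function name is hypothetical, and it assumes that the last digit of both the twelve-digit and four-digit patterns reports nosZ, as stated in the text.

```python
def encodes_nosz(enzymes_pattern: str, denitrification_pattern: str) -> bool:
    """Return True when the last digit of both patterns is '1'.

    Raises ValueError when the patterns are malformed or when their last
    digits disagree, which would point to a data synthesis error.
    """
    for pattern, length in ((enzymes_pattern, 12), (denitrification_pattern, 4)):
        if len(pattern) != length or set(pattern) - {"0", "1"}:
            raise ValueError(f"expected {length} binary digits, got {pattern!r}")
    if enzymes_pattern[-1] != denitrification_pattern[-1]:
        raise ValueError("last digits of the two patterns disagree")
    return enzymes_pattern[-1] == "1"


if __name__ == "__main__":
    # Pattern pairs quoted elsewhere in this article.
    print(encodes_nosz("000110111001", "1111"))  # Arcobacter sp. LA11
    print(encodes_nosz("111001010010", "1100"))  # Psychrobacter maritimus Pi2-25
```

Raising an error on disagreement, rather than returning False, keeps a synthesis error distinct from a genuine absence of nosZ.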

Searches for Scholarly Articles with Gene Symbols of Enzymes for Denitrification
We designed a visual analytics worksheet that lists the gene symbols and other identifiers for the 12 denitrifying enzymes (Figure 12a). Additionally, the design included uniform resource locator (URL) actions for 16 Google Scholar searches, each combining the prefix text "denitrification" with the gene symbol (e.g., "narG" and "nosZ") of a denitrifying enzyme (Figure 12b). When a researcher selects the Google Scholar URL action, the results will be up to date, with options to retrieve related articles and articles citing the retrieved article. The URL action might also retrieve the context of the search text within the scholarly article. Search texts that include the gene symbol prefixed with negation words (such as "absence", "lack", "lacking", "missing", "no", "not possess", "not with", and "without") can retrieve scholarly articles on incomplete denitrification.
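A search URL of the kind described above can be assembled programmatically. The sketch below is an assumption-laden illustration: the endpoint `https://scholar.google.com/scholar` with a `q` parameter, the function name, and the exact layout of the search text are not taken from the article, which specifies only the "denitrification" prefix, the gene symbol, and the optional negation words.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed Google Scholar search endpoint; the URL actions used in the
# visual analytics worksheet may be configured differently.
SCHOLAR_BASE = "https://scholar.google.com/scholar"


def scholar_url(gene_symbol: str, negation: Optional[str] = None) -> str:
    """Build a search URL from the prefix 'denitrification' and a gene symbol.

    When a negation phrase such as 'absence of' is given, it is joined to the
    gene symbol as a quoted phrase so that the search matches it exactly.
    """
    term = f'"{negation} {gene_symbol}"' if negation else gene_symbol
    query = urlencode({"q": f"denitrification {term}"})
    return f"{SCHOLAR_BASE}?{query}"


if __name__ == "__main__":
    print(scholar_url("nosZ"))
    print(scholar_url("nosZ", negation="absence of"))
```

Because `urlencode` percent-encodes the quotation marks and spaces, the same function can be reused for each gene symbol and negation phrase.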
A Google Scholar search with search text "('absence of nosZ' denitrification)" retrieved 40 results as of 23 March 2024, including an article on the incomplete denitrification trait for 23 Thermus strains associated with terrestrial geothermal environments [63] (Figure 12c). We used the list of strains from the article by Jiao et al. [63] to determine the overlap with the 29 genomes of Thermus strains in the microbial denitrification potential dataset. According to the article, the 23 genomes of Thermus do not encode the gene for nitrous oxide reductase (nosZ). An explanation for the absence is that the nitrous oxide reductase encoded by nosZ is sensitive to oxygen. The absence of the nosZ gene is consistent with the denitrification patterns and denitrification trait assigned by our study (Appendix B, Figure A2). Furthermore, the Thermus genomes absent in our dataset were reported by Jiao et al. [63] as lacking the genes for the denitrification pathway. Thus, the data-investigation interfaces supported knowledge discovery on nosZ biochemical characteristics and evolutionary history through a combination of (1) scholarly searches, (2) the presence or absence of genes for denitrifying enzymes in genomes, and (3) patterns of denitrification traits.
The article by Jiao et al. [63] also notes the presence of nosZ in the genome of a related bacterium, Deinococcus ficus CC-FR2-10. There are six Deinococcus genomes in our microbial denitrification potential dataset, of which Deinococcus ficus CC-FR2-10 encodes the genes for nitrite reductase (nirK) and nosZ. We interpreted the presence of only the nirK and nosZ genes as the denitrification trait of "Nitrite and Nitrous Oxide Reduction Only". The other three Deinococcus genomes assigned to the same denitrification trait in our dataset are Deinococcus enclensis DSM 25127, Deinococcus ficus DSM 19119, and Deinococcus ficus KS 0460. The remaining two Deinococcus genomes (Deinococcus sp. NW-56 and Deinococcus yavapaiensis DSM 18048) have the denitrification trait "Nitrite Reduction Only".
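The trait names quoted above follow a regular pattern that can be generated from the four-digit denitrification pattern. The sketch below is a hypothetical helper rather than the authors' naming code. It assumes that the four digits correspond, in order, to nitrate, nitrite, nitric oxide, and nitrous oxide reduction, and the label returned for "0000" is a placeholder that does not come from the article.

```python
STEPS = ("Nitrate", "Nitrite", "Nitric Oxide", "Nitrous Oxide")


def trait_name(pattern: str) -> str:
    """Translate a four-digit denitrification pattern into a trait label."""
    if len(pattern) != 4 or set(pattern) - {"0", "1"}:
        raise ValueError(f"expected 4 binary digits, got {pattern!r}")
    if pattern == "1111":
        return "Complete Denitrification"
    present = [step for step, digit in zip(STEPS, pattern) if digit == "1"]
    if not present:
        return "No Reduction Step"  # placeholder label, not from the article
    if len(present) == 1:
        joined = present[0]
    else:
        joined = ", ".join(present[:-1]) + " and " + present[-1]
    return f"{joined} Reduction Only"


if __name__ == "__main__":
    for code in ("0101", "0100", "1011", "1111"):
        print(code, trait_name(code))
```

With these assumptions, "0101" and "0100" yield the two trait names reported above for the Deinococcus genomes.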

Denitrification Patterns of Archaeal Genomes
The 866 archaeal genomes were assigned to 9 of the 16 possible denitrification patterns. These nine denitrification patterns were deduced from 52 twelve-digit binary number codes (Table 5 and Appendix C, Figure A3). None of the archaeal genomes had a complete denitrification pattern. The potential for nitrate reduction (represented by "1" in the first three digits of the twelve-digit binary number) was assigned to 43 genomes, including Ferroglobus placidus AEDII12DO (DSM 10642), the only member of the denitrification pattern with the potential for "Nitrate, Nitric Oxide and Nitrous Oxide Reduction Only". The other single-member archaeal denitrification potential categories were (1) "Nitric Oxide Reduction Only" (Candidatus Hydrothermarchaeota archaeon JdFR-18) and (2) "Nitrate and Nitrite Reduction Only" (Candidatus Heimdallarchaeota archaeon LC_3). Among the archaeal genomes investigated, 585 genomes encoded the nitrite reduction trait (represented by the sixth digit and seventh digit in the twelve-digit binary number). The Ferroglobus placidus genome did not encode nirK or nirS for nitrite reduction to produce nitric oxide, consistent with findings from a publication on the genome sequence of the archaeon [64]. In addition, the genome of Ferroglobus placidus had gene annotations for carbon fixation (K01601) and glutamine synthetase (K01915). Table 5 includes references to research on the denitrification potential of the example archaeal genomes. The microbial denitrification dataset contains 21 archaeal genomes (7 genera and 18 unique strains) that encode both nirK and nirS genes for nitrite reduction. The seven Halobacteriota genera are Halobiforma, Halorubrum, Halosolutus, Haloterrigena, Natrinema, Natronomonas, and Salinilacihabitans (Table 6).

Discussion
In this study, we investigated the denitrification potential in the context of taxonomic and ecosystem features for 62,624 microbial genomes (866 archaea and 61,758 bacteria). The dataset constructed includes 181 archaeal and 8009 bacterial genomes with the nitrous oxide reductase gene (nosZ) (Table 2). This fundamental scientific knowledge of archaea and bacteria includes trait knowledge (e.g., complete denitrification), which is needed for machine learning models that scale knowledge at microsites for decision-making at a global scale [74]. Incomplete microbial denitrification that results in the production and emission of harmful nitrous oxide gas is detrimental to the health of humans, animals, plants, and the environment [75]. Nitrous oxide reductase catalyzes the last step of denitrification, which transforms the ozone-layer-depleting nitrous oxide to dinitrogen gas [75-77]. Our research builds on the microTrait categories [10] and the 2019 publication by Albright et al. [5] that reported the presence of annotations for 11 nitrogen cycling pathways in 6384 bacterial and 252 archaeal finished genomes in the IMG/M database. The collection of IMG/M genomes investigated in our study includes three categories of genome sequencing status: draft, finished, and permanent draft. The constructed microbial denitrification potential dataset also includes taxonomic and ecosystem annotations of the genomes. Some strains (e.g., Brucella melitensis bv. 1 16M with the complete denitrification trait) have more than one genome sequence available in IMG/M, allowing the dataset to include biological and technical replicates. This unique denitrification potential dataset is useful for planning and conducting microbiological research on denitrification. The methods implemented in the data investigation can be adapted for traits defined by ecological functions of resource acquisition, resource use, and stress tolerance [10]. An example is the set of microbial genes involved in the resource acquisition function of nitrogen fixation, in which microorganisms convert atmospheric nitrogen gas to biologically available ammonia [59].
The categorization for nitrogen fixation potential can be based on the presence or absence in a genome of a set of six genes (nifH, nifD, nifK, nifE, nifN, and nifB) coding for structural and biosynthetic components, namely NifHDK and NifENB [58].
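The same binary-coding idea can be sketched for the six nitrogen fixation genes named above. This is a hypothetical illustration: the six-digit layout, the function names, and the rule that all six genes must be present for full potential are assumptions, not a scheme defined in the article.

```python
# Order follows the gene list in the text: NifHDK (structural) then NifENB
# (biosynthetic). The six-digit layout itself is an assumption.
NIF_GENES = ("nifH", "nifD", "nifK", "nifE", "nifN", "nifB")


def nif_pattern(genes_in_genome: set[str]) -> str:
    """Encode presence ('1') or absence ('0') of each of the six nif genes."""
    return "".join("1" if gene in genes_in_genome else "0" for gene in NIF_GENES)


def has_full_nif_set(genes_in_genome: set[str]) -> bool:
    """Assumed rule: full potential requires all six genes to be present."""
    return nif_pattern(genes_in_genome) == "1" * len(NIF_GENES)


if __name__ == "__main__":
    structural_only = {"nifH", "nifD", "nifK"}
    print(nif_pattern(structural_only), has_full_nif_set(structural_only))
```

A genome carrying only the structural genes would be coded "111000" under this layout and would not meet the assumed rule for full potential.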
The microbial denitrification dataset allows researchers to retrieve subsets of bacteria or archaea strains with 1 or more of 36 variables (Table 1). A query of the dataset with the keyword "denitrificans" in the "Genome Name" field combined with the environmental ecosystem and the complete denitrification pattern ("1111") retrieved 20 genomes (Figure 5). The possibility for human interaction with the dataset can facilitate the production of evidence by comparison of the digits in the 4-digit binary "Denitrification Pattern" and 12-digit binary "Denitrifying Enzymes Pattern". Digit 6 and Digit 7 in the 12-digit pattern are, respectively, for the presence or absence of the gene for copper-type nitrite reductase (nirK) and the gene for cytochrome cd1-type nitrite reductase (nirS). In the case of aquatic ecosystems, aquatic bacteria inhabit a variety of microhabitats such as diffusion-controlled water phases, colloidal phases, particles, and the living biosphere (oyster tissue, zooplankton, algae, fish, etc.), which are impacted by and also influence abiotic factors within the water and/or tissues they inhabit [78]. The gaseous nitric oxide is an intermediate product of the rate-limiting step of denitrification [79]. The possibility that nitric oxide can be an extracellular signaling molecule between aerobic bacteria (e.g., Phaeobacter inhibens) and algae (e.g., Gephyrocapsa huxleyi) [80] presents a use for our data resources to investigate the denitrification potential of aerobic marine bacteria. Bacterial nirK is expressed in oxygenated marine waters that have detectable nitrite levels and photosynthesizing microorganisms [80]. The microbial denitrification potential dataset contains 49 Phaeobacter genomes from 47 strains, with 43 genomes having evidence of nirK for the reduction of nitrite to nitric oxide.
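The keyword, ecosystem, and pattern query described above can be approximated outside the visual analytics software. The sketch below works under stated assumptions: "Genome Name" and "Denitrification Pattern" are field names quoted in the article, whereas the "Ecosystem" field name, the `query` function, and every row in `ROWS` are hypothetical placeholders rather than records from the dataset.

```python
def query(rows, keyword, ecosystem, pattern):
    """Return rows whose genome name contains the keyword (case-insensitive)
    and whose ecosystem and four-digit denitrification pattern match exactly."""
    keyword = keyword.lower()
    return [
        row
        for row in rows
        if keyword in row["Genome Name"].lower()
        and row["Ecosystem"] == ecosystem
        and row["Denitrification Pattern"] == pattern
    ]


# Placeholder rows for illustration only; these are not dataset records.
ROWS = [
    {"Genome Name": "Examplebacter denitrificans X1",
     "Ecosystem": "Environmental", "Denitrification Pattern": "1111"},
    {"Genome Name": "Examplebacter denitrificans X2",
     "Ecosystem": "Host-associated", "Denitrification Pattern": "1111"},
    {"Genome Name": "Examplebacter marinus X3",
     "Ecosystem": "Environmental", "Denitrification Pattern": "1111"},
    {"Genome Name": "Otherbacter denitrificans Y1",
     "Ecosystem": "Environmental", "Denitrification Pattern": "1100"},
]

if __name__ == "__main__":
    for row in query(ROWS, "denitrificans", "Environmental", "1111"):
        print(row["Genome Name"])
```

Run against the comma-separated export of the dataset instead of the placeholder rows, the same three conditions would be expected to reproduce the query reported in Figure 5, provided the exported field names and ecosystem labels match those assumed here.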
A study of a collection of 249 archaeal genomes (170 Euryarchaeota, 65 Crenarchaeota, and 14 Thaumarchaeota) reported only partial denitrification pathways (nitrite reduced to nitric oxide, nitric oxide to nitrous oxide, and nitrous oxide to nitrogen gas) [5]. In our study of 866 archaeal genomes (Table 5), we found genomic evidence for three denitrification steps for the metabolically versatile Ferroglobus placidus AEDII12DO, a hyperthermophilic, strictly anaerobic chemolithoautotrophic iron oxidizer that belongs to the Archaeoglobaceae family in the phylum Euryarchaeota. The genome sequence of strain AEDII12DO does not have annotations for the nitrite reductases (nirK or nirS) that produce nitric oxide in the second stage of denitrification [64,81]. In cells of aerobic ammonia-oxidizing archaea (AOA), the highly reactive nitric oxide is needed for sustaining aerobic ammonia oxidation activity [82]. We identified 21 archaeal genomes (7 genera and 18 strains) of the phylum Halobacteriota that encode both nitrite reductases (Table 6). In the case of bacterial genomes with both genes, our dataset contains 257 bacterial genomes from at least 57 genera including (1) Methyloprofundus associated with the gills of the mussel Bathymodiolus platifrons [83] and (2) the oligotrophic nitrogen-fixing Bradyrhizobium oligotrophicum S58 [84]. The presence of two types of nitrite reductases could confer on archaea and bacteria the potential to produce nitric oxide in environments of differing salinity: (1) non-saline and low salinity (rivers and freshwater lagoons), (2) slight and moderate salinity (oceans, estuaries, and mangroves), and (3) hypersalinity (salt marshes, hypersaline lakes, and salty ponds) [85]. One of the nitrite reductases may also function beyond denitrification, such as in the colonization of rice roots by Bradyrhizobium oligotrophicum S58 through maintaining swimming motility under fluctuating oxygen conditions in the presence of nitrate [84]. Thus, the type of nitrite reductase encoded in a microbial genome could be predictive of the microbe's ecological functioning [82,85,86].
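Because the article states that digits 6 and 7 of the twelve-digit pattern report nirK and nirS, the nitrite reductase type of a genome can be read directly from its pattern. The helper below is an illustrative sketch with a hypothetical function name and hypothetical return labels.

```python
def nitrite_reductase_type(enzymes_pattern: str) -> str:
    """Classify a genome by digits 6 (nirK) and 7 (nirS) of its 12-digit pattern."""
    if len(enzymes_pattern) != 12 or set(enzymes_pattern) - {"0", "1"}:
        raise ValueError(f"expected 12 binary digits, got {enzymes_pattern!r}")
    has_nirk = enzymes_pattern[5] == "1"  # digit 6, zero-based index 5
    has_nirs = enzymes_pattern[6] == "1"  # digit 7, zero-based index 6
    if has_nirk and has_nirs:
        return "nirK and nirS"
    if has_nirk:
        return "nirK only"
    if has_nirs:
        return "nirS only"
    return "neither"


if __name__ == "__main__":
    print(nitrite_reductase_type("000110111001"))  # Arcobacter sp. LA11
    print(nitrite_reductase_type("111001010010"))  # Psychrobacter maritimus Pi2-25
```

Grouping genomes by this label is one way to tabulate those that encode both nitrite reductases, as was done for the Halobacteriota genera.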
Growing anthropogenic disturbances, including climate change, invasive species, and micro/nanoplastics, are likely influencing microbial communities and impacting microbial processes [87,88]. This dataset will assist researchers in identifying changes in denitrification potentials that may occur with changes in microbial diversity due to disturbance. In addition to the availability of genomic sequences of single microbial isolates, metagenomics sequencing technologies produce data on the microbiome (the collective set of gene sequences from multiple genomes) in a specific habitat and timeframe [89]. Microbiome/metagenomic analyses of ecosystems such as engineered (e.g., wastewater), environmental (e.g., soil and seawater), and host-associated (e.g., oyster) types have revealed constituent microorganisms as well as the enzyme genes for denitrification [37,39-42,90]. We suggest that the data-investigation products (Supplementary Materials) can be useful for producing evidence on the denitrification patterns of taxa identified from microbiome analysis. For example, a microbiome analysis of the Eastern oyster as a function of ploidy and seasons identified a metagenome-associated genome of Psychrobacter maritimus as having the denitrifying enzyme genes narH, narI, nirK, and norB [42]. In our denitrification dataset, Psychrobacter maritimus Pi2-25 has the denitrification pattern "1100" (nitrate and nitrite reduction only) and the denitrifying enzymes pattern "111001010010" (presence of narG, narH, narI, nirK, norB, and norW).
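The relationship between the two patterns reported for Psychrobacter maritimus Pi2-25 can be illustrated with a small rule set. The rules below are assumptions chosen to be consistent with the patterns quoted in this article; they are not the trait rules as defined by the authors. A step is scored "1" only when a full enzyme complex is present (NarGHI or NapAB for nitrate; NorBC, or the assumed digit-10 gene together with NorW, for nitric oxide), when either nirK or nirS is present for nitrite, and when nosZ is present for nitrous oxide.

```python
def denitrification_pattern(enzymes_pattern: str) -> str:
    """Derive a four-digit pattern from a twelve-digit pattern (assumed rules)."""
    if len(enzymes_pattern) != 12 or set(enzymes_pattern) - {"0", "1"}:
        raise ValueError(f"expected 12 binary digits, got {enzymes_pattern!r}")
    (nar_g, nar_h, nar_i, nap_a, nap_b, nir_k, nir_s,
     nor_b, nor_c, digit_10, nor_w, nos_z) = (d == "1" for d in enzymes_pattern)
    steps = (
        (nar_g and nar_h and nar_i) or (nap_a and nap_b),  # nitrate reduction
        nir_k or nir_s,                                    # nitrite reduction
        (nor_b and nor_c) or (digit_10 and nor_w),         # nitric oxide reduction
        nos_z,                                             # nitrous oxide reduction
    )
    return "".join("1" if step else "0" for step in steps)


if __name__ == "__main__":
    print(denitrification_pattern("111001010010"))  # Psychrobacter maritimus Pi2-25
    print(denitrification_pattern("000110111001"))  # Arcobacter sp. LA11
```

Under these assumed rules, the presence of norB and norW without norC does not score the nitric oxide step, which matches the "1100" pattern reported for strain Pi2-25.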
We designed and implemented interactive visualizations in visual analytics software for two main purposes related to microbial denitrification. The first purpose is to provide evidence for microbial denitrification potential by comparing patterns of presence or absence in microbial genomes of denitrifying enzymes for ecologically relevant denitrification trait standards (Figures 5-7). The second purpose is to facilitate personalized and collaborative learning and knowledge exchange on microbial denitrification by connecting to bioinformatics and scholarly resources. The inclusion of hyperlinks in the visual analytics design allows the 62,624 genome names in the denitrification dataset to be searched with search engines and literature databases that are up to date (Figures 5 and 6). A major contribution of our data investigations is a denitrification potential categorized dataset of microbial genomes that supports decision-making on the choice of microbial strains for sustainable microbial denitrification applications. For example, a recent report experimentally combined two denitrifying bacteria strains, Paracoccus denitrificans PD1222 and Ochrobactrum sp. TCC-2, to mitigate nitrous oxide emission and detoxify triclocarban, a widespread broad-spectrum antimicrobial [91]. Our microbial denitrification dataset contains 96 strains of Paracoccus and 77 strains of Ochrobactrum (including those previously classified as the Bacillus genus). The counts of strains with complete denitrification patterns were 43 and 34 for Paracoccus and Ochrobactrum, respectively.
The constructed dataset and accompanying interactive data-investigation resources can help to advance research into the molecular, biochemical, physiological, and microbial aspects of denitrification, among others. The 62,624 genomes displayed 484 of the 4096 possible twelve-digit "Denitrifying Enzymes Patterns". We have provided the microbial denitrification dataset in a variety of data formats (comma-separated values file, spreadsheet, and Tableau views) for further data investigations, research, applications, and education purposes. Following the guidelines for constructing ecological trait datasets [12], the microbial denitrification dataset contains identifiers for connecting to microbial web portals and scholarly resources. The microbial denitrification potential dataset, spreadsheet files, and interactive visual analytics resources are available as online or off-line tools to articulate the value of data. Researchers can incorporate these denitrification-potential data resources into research on the biochemical, molecular, and physiological aspects of denitrification, among others. For example, when a research team is describing a new bacterial or archaeal isolate or genome sequence for publication, researchers could compare the denitrification-potential patterns of the isolate with members of the same genus in our microbial denitrification dataset.
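The size of the pattern space follows from the binary coding: twelve binary digits give 2^12 = 4096 possible enzyme patterns, and four digits give 2^4 = 16 denitrification patterns. The sketch below, with a hypothetical function name and a three-pattern toy input, shows how the number of distinct observed patterns and their share of the possible space could be tallied from a column of pattern strings.

```python
from collections import Counter


def pattern_coverage(patterns, n_digits=12):
    """Return (distinct observed patterns, possible patterns, observed fraction)."""
    observed = Counter(patterns)
    possible = 2 ** n_digits
    return len(observed), possible, len(observed) / possible


if __name__ == "__main__":
    # Toy input built from patterns quoted in this article.
    sample = ["000110111001", "000110111001", "111001010010"]
    print(pattern_coverage(sample))
    # Share of the pattern space reported in the text (484 of 4096).
    print(round(484 / 2 ** 12 * 100, 1))
```

The 484 observed patterns correspond to roughly 11.8% of the 4096 possible twelve-digit patterns.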
Several Arcobacteraceae strains are associated with the Mollusca ecosystem category and include strains with complete denitrification potential (Figure 7). Although there have been discussions on the nomenclature changes and the new genera described, there is a consensus that the Arcobacteraceae family is justified [92,93]. Arcobacteraceae strains have been isolated from diverse habitats including terrestrial, aquatic, animal, food, and human [92-96]. The presence of antimicrobial resistance genes has been documented in strains of Arcobacteraceae [94]. Antimicrobials such as triclocarban that occur with anthropogenic reactive nitrogen sources in the environment can affect the efficiency of denitrification [91]. There is a need to investigate denitrifying Arcobacteraceae for effects of antimicrobials on denitrification rates. In addition, using studies of synthetic denitrifying communities of Shewanella as a guide [7], we suggest investigations into synthetic denitrifying communities of Arcobacteraceae for optimized and stable denitrification in ecosystems.
This data-investigation project has limitations. The datasets used in the project are from different sources, and data providers might update them as new data become available. For example, the bacterial taxonomic classification may be updated or be inconsistent between methods of annotation. To mitigate this limitation, we have included multiple taxonomic sources as well as web links to Integrated Microbial Genomes and Microbiomes (IMG/M). We based the digital categorization of denitrification potential on 12 enzymes in the canonical denitrification pathway. In some cases, we verified the accuracy of the patterns using published studies [62,63] that tested for the presence of denitrification enzymes. However, other factors, such as environmental and genetic factors, can affect the functional performance of the denitrification trait [74,97]. Our principal data source is IMG/M, which holds genomes at varying levels of genome sequence completion (finished, draft, and permanent draft). Therefore, we have included a filter on the genome sequencing status in some views to help researchers decide on the data to use.

Conclusions
Denitrification is a major component of the nitrogen cycle, and its final step reduces harmful nitrous oxide gas to harmless dinitrogen gas. We articulated the denitrification potential in the context of taxonomic classification and ecosystem features for 62,624 microbial genomes (866 archaea and 61,758 bacteria). We recommend denitrification traits of Arcobacteraceae for further research because of (1) the bacteria family's global distribution; (2) associations with humans, animals, plants, and the environment; (3) the presence of antimicrobial resistance genes; (4) the assignment of 127 genomes to eight denitrification traits; and (5) the interaction of some Arcobacteraceae strains with shellfish filter feeders. Finally, the microbial denitrification data resources produced in our research can also be useful for identifying microbial strains for synthetic denitrifying communities.

Figure 1. A screenshot of an Integrated Microbial Genomes and Microbiomes (IMG/M) webpage displaying microbial genomes with annotation for a KEGG Orthology (KO) term identifier. The example shown is for nitrous oxide reductase with KO identifier K00376, retrieving 8193 genomes.

Figure 2. A screenshot of the design of a visual analytics resource for constructing a dataset of microbial genomes from the dataset retrieved from the bioinformatics resource (IMG/M). The example shown is for nitrous oxide reductase with KEGG Orthology identifier K00376. The filters in the design allow for the display of a dataset with options for taxonomic domain and genome sequencing status.

Figure 3. Distribution of denitrification patterns and denitrification traits assigned to a set of 62,624 microbial genomes consisting of 866 archaeal and 61,758 bacterial genomes. "Null" means an absence of annotation.

Figure 4. Distribution of denitrification patterns, denitrification traits, and ecosystem types for 179 bacterial genomes annotated with the host-associated ecosystem and the ecosystem category of Mollusca. The five genomes assigned to the oyster ecosystem type were from four strains of Roseibium album (CECT 5094, CECT 5095, CECT 5096, and CECT 7551) and Ruegeria denitrificans CECT 5091.

Figure 5. A screenshot of a visual analytics resource to support interaction with the dataset on denitrification potential of archaeal and bacterial genomes with an emphasis on filtering by ecosystem options. The interaction worksheet provides options and links to external resources (IMG/M website, Google Search, and Google Scholar). The insert box on the left was obtained by clicking the sequencing status symbol associated with Marinobacter denitrificans JB02H27, a bacterium isolated from marine sediment and known to reduce nitrite and nitrate to gaseous nitrogen [54]. The webpage link to the interactive version of the visual analytics resource is available in the Supplementary Materials section.

Figure 6. A screenshot of a visual analytics resource to support human interaction with the dataset on denitrification potential of archaeal and bacterial genomes with an emphasis on filtering by taxonomic options. The interaction worksheet provides options as well as connections to external resources (IMG/M website, Google Search, and Google Scholar). The insert image with GTDB-Tk taxonomic assignments was obtained by clicking the sequencing status symbol associated with Roseibium aestuarii SYSU M00256-3, a bacterium isolated from an estuary and known to be unable to reduce nitrate [55]. The webpage link to the interactive version of the visual analytics resource is available in the Supplementary Materials section.

Figure 7. Three stages of interactive data investigation for the denitrification potential of bacterial genera associated with the Eastern oyster (Crassostrea virginica). We obtained the list of nine genera from the study of bacteria associated with the gill tissues of the Pacific oyster (Crassostrea gigas) and Eastern oyster [52].

Figure 8 .
Figure 8. Ecosystem classifications and denitrification potential patterns of 127 Arcobacteraceae genomes.The association of Arcobacteraceae with multi-ecosystem habitats including human, animal, plants, and the environment presents a bacteria family for research on synthetic denitrifying communities.

Figure 9. Ecosystem categories assigned to 3280 bacterial genomes with complete denitrification potential. The phyla Campylobacterota and Pseudomonadota have genera associated with Mollusca (shellfish).

Figure 10. Evidence from binary numbering patterns indicating that three Campylobacterota genera (Caminibacter, Lebetimonas, and Nautilia) do not carry the gene encoding nitrous oxide reductase. The last digit of both the "Denitrification Pattern" and the "Denitrifying Enzymes Pattern" is "0".
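The reading of the last digit described in this caption can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' visual analytics implementation: the function names are hypothetical, the assignment of digits one to three to the earlier reduction steps is assumed from the order of the canonical pathway, and the example pattern "1110" is invented for illustration (the caption confirms only that the last digit is "0").

```python
from __future__ import annotations

# Assumed digit order: the four reduction steps of the canonical pathway.
# Only the meaning of the last digit is confirmed by the figure caption.
STEPS = (
    "nitrate reduction",
    "nitrite reduction",
    "nitric oxide reduction",
    "nitrous oxide reduction",
)


def decode_pattern(pattern: str) -> dict[str, bool]:
    """Map a four-digit binary pattern to presence/absence per step."""
    if len(pattern) != 4 or set(pattern) - {"0", "1"}:
        raise ValueError(f"expected four binary digits, got {pattern!r}")
    return {step: digit == "1" for step, digit in zip(STEPS, pattern)}


def lacks_nitrous_oxide_reduction(pattern: str) -> bool:
    """True when the last digit is 0 (no nitrous oxide reductase step)."""
    return not decode_pattern(pattern)["nitrous oxide reduction"]


def is_complete(pattern: str) -> bool:
    """True only for the complete denitrification pattern 1111."""
    return all(decode_pattern(pattern).values())


if __name__ == "__main__":
    print(decode_pattern("1110"))  # hypothetical example pattern
    print(lacks_nitrous_oxide_reduction("1110"))
    print(is_complete("1111"))
```

Keeping the pattern as a string rather than an integer preserves leading zeros, so a pattern such as "0011" remains distinguishable from "11".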

Microorganisms 2024, 31
Figure 11. Genomes of the genera in the phylum Campylobacterota (Nitratifractor, Nitratiruptor, Sulfurimonas, and Sulfurovum) that have the complete denitrification pattern ("1111") in the microbial denitrification potential dataset. The nitrous oxide reductase activity of strains from the taxonomic class Campylobacteria associated with deep-sea hydrothermal vents was reported by Fukushi et al. [62].
fication. A Google Scholar search with the search text "('absence of nosZ' denitrification)" retrieved 40 results as of 23 March 2024, including an article on the incomplete denitrification trait of 23 Thermus strains associated with terrestrial geothermal environments [63] (Figure 12c).
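A search of this kind can be expressed as a link built from the search text. The sketch below is a minimal illustration, assuming the public Google Scholar query endpoint with its `q` parameter; the function name `scholar_url` is hypothetical, and the text does not state how the interactive resource constructs its own links.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the public Google Scholar search endpoint and its "q" parameter.
SCHOLAR_BASE = "https://scholar.google.com/scholar"


def scholar_url(search_text: str) -> str:
    """Return a Google Scholar search URL for the given search text."""
    return f"{SCHOLAR_BASE}?{urlencode({'q': search_text})}"


# Search text quoted in the article.
SEARCH_TEXT = "('absence of nosZ' denitrification)"
url = scholar_url(SEARCH_TEXT)

# Round-trip check: the encoded query decodes back to the original text.
decoded = parse_qs(urlsplit(url).query)["q"][0]

if __name__ == "__main__":
    print(url)
    print(decoded == SEARCH_TEXT)
```

The number of results returned changes over time, so a count such as the 40 results reported above holds only for the stated retrieval date.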

Figure 12. Visual interfaces for selecting and exploring searches for scholarly articles with gene symbols of enzymes for denitrification. (a) The list of functional annotation identifiers and gene symbols for enzymes in the canonical denitrification pathway. Selecting the square for each gene symbol displays the Google Scholar search options. (b) The list of search texts for Google Scholar to retrieve up-to-date journal articles and other scholarly literature. (c) An example of part of the retrieved results for the search text "('absence of nosZ' denitrification)". The selected journal article provides insights into the evolutionary history of the incomplete denitrification pathway of the bacterial genus Thermus.

Author Contributions: Conceptualization, R.D.I. and Y.K.; validation, R.D.I., Y.K., S.E.K. and V.D.T.; investigation, R.D.I., Y.K., S.E.K. and V.D.T.; data curation, R.D.I.; writing (original draft preparation), R.D.I.; writing (review and editing), R.D.I., Y.K., S.E.K. and V.D.T.; visualization, R.D.I.; project administration, R.D.I. and Y.K.; funding acquisition, R.D.I., Y.K., S.E.K. and V.D.T. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by grants from the National Science Foundation (IOS-1901377 and DUE-2142465), the National Institutes of Health (H3Africa Bioinformatics Network U41HG006941), the United States Department of Education Title III Program (P031B170091), and the U.S. Department of Energy Minority Serving Institution Partnership Program (MSIPP) managed by the Savannah River National Laboratory under SRNS contract DE-AC09-08SR22470. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Table 1. Data columns in the microbial denitrification potential dataset, including genome annotations and denitrification annotations. Genome, Ecosystem, and Lineage categories were retrieved from the Integrated Microbial Genomes and Microbiomes (IMG/M) system. Denitrifying enzymes and denitrification potential were derived/calculated in visual analytics software based on the datasets of genomes with KEGG Orthology annotation in the IMG/M system.
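The derivation of a four-digit pattern from gene presence or absence can be illustrated with a short Python sketch. The authors performed this step in visual analytics software, so the code below is a hedged stand-in rather than their method. Three things are assumptions: the gene symbols are a commonly cited subset of denitrification genes and not the article's full list of twelve enzymes, a step is scored "1" when any gene listed for that step is present, and the name `denitrification_pattern` is hypothetical.

```python
from __future__ import annotations

from itertools import product

# Illustrative subset of gene symbols per reduction step (assumption:
# not the article's complete list of twelve denitrifying enzymes).
STEP_GENES = (
    ("nitrate reduction", {"narG", "napA"}),
    ("nitrite reduction", {"nirK", "nirS"}),
    ("nitric oxide reduction", {"norB", "norC"}),
    ("nitrous oxide reduction", {"nosZ"}),
)


def denitrification_pattern(genes_present: set[str]) -> str:
    """Return a four-digit binary string, one digit per reduction step.

    Assumed rule: a step scores 1 if at least one of its genes is present.
    """
    return "".join(
        "1" if genes_present & step_genes else "0"
        for _, step_genes in STEP_GENES
    )


# Four binary digits give 2**4 = 16 possible patterns (traits).
ALL_PATTERNS = ["".join(bits) for bits in product("01", repeat=4)]

if __name__ == "__main__":
    print(denitrification_pattern({"narG", "nirS", "norB", "nosZ"}))
    print(denitrification_pattern({"napA", "nirK", "norB"}))
    print(len(ALL_PATTERNS))
```

Under these assumptions, a genome carrying a gene for every step yields the complete pattern "1111", and a genome lacking nosZ yields a pattern ending in "0".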

Table 2. Functional annotation identifiers, gene nomenclature of enzymes, and counts of genomes associated with the canonical denitrification pathway.

Table 3. Denitrification traits of selected Arcobacteraceae genomes isolated from Mollusca hosts.

Table 4. Distribution of nitrogen assimilation pathways for 3280 bacterial genomes assigned a complete denitrification pattern.