Collection of a Bacterial Community Reconstructed from Marine Metagenomes Derived from Jinhae Bay, South Korea

Abstract: Marine bacteria are known to play significant roles in marine biogeochemical cycles with regard to the decomposition of organic matter. Despite the increasing attention paid to marine bacteria, research has been too limited to fully elucidate the complex interactions between marine bacterial communities and environmental variables. Jinhae Bay, the study area in this work, is the most anthropogenically eutrophied coastal bay in South Korea, and while its physical and biogeochemical characteristics are well described, less is known about the associated changes in microbial communities. In the present study, we reconstructed a metagenomic dataset based on the 16S rRNA gene to investigate temporal and vertical changes in microbial communities at three depths (surface, middle, and bottom) during a seven-month period from June to December 2016 at one sampling site (J1) in Jinhae Bay. Across the bacterial data, Proteobacteria, Bacteroidetes, and Cyanobacteria were predominant from June to November, whereas Firmicutes were predominant in December, especially at the middle and bottom depths. These results show that the composition of the microbial community is strongly associated with temporal changes. Furthermore, the community compositions differed markedly between the surface, middle, and bottom depths in summer, when water column stratification and bottom water hypoxia (low dissolved oxygen level) were strongly developed. Metagenomic data contribute to improving our understanding of the relationships between environmental characteristics and microbial community change in eutrophication-induced and deoxygenated coastal areas.


Summary
Marine environmental variables and bacterial data were collected from three different depths at Jinhae Bay (JB), South Korea, in 2016. The resulting dataset comprises temperature, salinity, and dissolved oxygen, as well as species-level data for 324 different bacteria at three depth layers (surface, middle, and bottom) during a seven-month period from June to December at the J1 sampling site. The bacterial data were quality filtered by removing ambiguous DNA sequences and chimeric sequences, and by denoising. In total, 5,418,926 (83.6%) 16S rRNA quality reads were obtained, and approximately 3 million reads were associated with 55,400 operational taxonomic units (OTUs) (97% identity cutoff). At the phylum level, 11 bacterial phyla were detected, of which Proteobacteria (71%), Bacteroidetes (13%), Cyanobacteria (12%), and Actinobacteria (2%) accounted for 98% of all the OTUs in all the samples, except for those taken from the middle and bottom depths in December. At the class level, with the exception of December, Alphaproteobacteria was predominant in all of the samples (accounting for 64%), followed by Flavobacteriia and Gammaproteobacteria. All bacterial sequence data were deposited in the NCBI. This collection of bacterial composition data will aid in understanding the correlation between environmental characteristics and bacterial community composition in eutrophication-induced and deoxygenated coastal areas.

Study Area
Jinhae Bay (JB) is located in the southeastern coastal area of South Korea (Figure 1) and, since the 1960s, has suffered from anthropogenic eutrophication due to massive nutrient loading from the adjacent large cities, Changwon and Geoje [1]. As a result, JB has experienced anthropogenically derived environmental problems, such as persistent hypoxia, water quality deterioration, and harmful algal blooms [2–5]. Seasonal hypoxia has developed in the bottom waters of JB since the 1970s because of a combination of eutrophication-derived algal blooming (spring) and water stratification (summer) [6,7].

Bacterial Community Compositions
As a measure of bacterial data quality, the QC20 scores of all 21 samples were higher than 97 (minimum: 97.56; maximum: 98.3; average: 97.92). The relative abundance (%) was calculated from each sample's OTU counts. For all 21 samples, the total number of sequence reads was 5,418,926 (83.6%) with an average of 258,044 and a mean of 258,552 quality reads per sample, and 55,400 OTUs were associated with roughly 3 million reads. The bacterial data (Supplementary Table S2) showed that Proteobacteria (mean: 71%) was the most abundant at the phylum level, followed by Bacteroidetes (mean: 13%), Cyanobacteria (mean: 12%), and Actinobacteria (mean: 2%). Additionally, at the class level, Alphaproteobacteria was the most dominant in all the samples, accounting for 64% (except in December), followed by Flavobacteriia and Gammaproteobacteria (Supplementary Table S2).
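The relative abundance calculation described above can be illustrated with a minimal Python sketch. The function name `relative_abundance` and the example counts are hypothetical and are not taken from the dataset; the sketch only assumes that relative abundance is each taxon's count divided by the sample total, expressed as a percentage. It also checks that the reported read total divided by 21 samples gives the stated average of 258,044.

```python
def relative_abundance(otu_counts):
    """Convert one sample's OTU counts into percentages of that sample's total."""
    total = sum(otu_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: 100.0 * count / total for taxon, count in otu_counts.items()}


# Hypothetical counts for a single sample (illustrative only, not real data).
sample = {
    "Proteobacteria": 710,
    "Bacteroidetes": 130,
    "Cyanobacteria": 120,
    "Actinobacteria": 20,
    "Other": 20,
}
percentages = relative_abundance(sample)

# Figures reported in the text: 5,418,926 quality reads across 21 samples.
mean_reads_per_sample = 5_418_926 / 21

if __name__ == "__main__":
    for taxon, pct in percentages.items():
        print(f"{taxon}: {pct:.1f}%")
    print(f"Average quality reads per sample: {mean_reads_per_sample:.0f}")
```

Because the percentages are computed per sample, they are comparable across samples with different sequencing depths.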

Sample Collection
Microbial samples were collected at three different depths (surface, middle, and bottom, where the surface is defined as 0.5 m beneath the surface, the middle is 10 m beneath the surface, and the bottom is 1 m above the seabed) at the J1 sampling site in the central area of JB from June to December 2016 using a Niskin water sampler. The water samples were filtered through a sterile 0.22 µm cellulose ester membrane (Millipore, Ireland) to capture microbial cells. The filtered membrane was then immediately placed into a sterile Petri dish and stored at −20 °C until analysis.

Measurement of Marine Environmental Variables
We conducted a hydrographic survey to collect environmental variables and bacterial community composition data at the J1 station in the central area of JB from June to December 2016. The vertical profiles of temperature, salinity, and dissolved oxygen (DO) were measured with a multi-sensor sonde (YSI; 6600 V). Water samples (21 = 3 depths × 7 months) for microbial DNA analysis were collected at the surface (approximately 0.5 m below the surface), middle (10 m below the surface), and bottom depths (approximately 1 m above the seabed) using a Niskin water sampler (Supplementary Table S1).
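The sampling design (3 depths × 7 months = 21 water samples) can be enumerated with a short Python sketch. The sample identifier format and the names `DEPTHS`, `MONTHS`, and `sample_ids` are assumptions made for illustration; the source does not specify how samples were labeled.

```python
from itertools import product

# Depth layers and sampling months as described in the text.
DEPTHS = ("surface", "middle", "bottom")
MONTHS = ("Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def sample_ids(station="J1", year=2016):
    """Return one hypothetical identifier per month/depth combination."""
    return [
        f"{station}_{year}_{month}_{depth}"
        for month, depth in product(MONTHS, DEPTHS)
    ]


if __name__ == "__main__":
    ids = sample_ids()
    print(len(ids))  # number of month/depth combinations
    print(ids[0], ids[-1])
```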

DNA Extraction and Sequencing
DNA from the filtered membrane was extracted using the DNeasy PowerWater® DNA Isolation Kit (MO BIO Laboratories Inc., Carlsbad, CA, USA) by a combined chemical and mechanical procedure. Roughly 7–9 filtered membrane pieces were added to the PowerWater® Bead Tubes, and the kit's protocol was followed, with the exception of the DNA precipitation time, which was prolonged to 30 min. DNA integrity was confirmed by electrophoresis in an agarose gel (1.2%), and the quantity was estimated by the PicoGreen method (Invitrogen) using Victor 3 fluorometry. For microbial biodiversity analysis, the V3-V4 variable region of the 16S rRNA gene was studied, using primer pairs targeting this region [8]. After 16S rDNA amplification, the multiplexing step was performed with the Nextera XT Index Kit (Illumina). Library quantification was carried out by real-time PCR on a CFX96 real-time system (BioRad, Hercules, CA, USA). Library size was verified using a Bioanalyzer DNA 1000 chip; the expected size was ~300 bp, and the size range of the DNA 1000 kit was 25–1000 bp. After size verification, the libraries were sequenced in a 2 × 300 bp paired-end run (MiSeq Reagent Kit v3, Illumina, CA, USA) on the Illumina MiSeq platform (San Diego, CA, USA).

Bioinformatics Analysis
Sequencing data were processed using QIIME 1.9.1 to assemble paired-end reads into tags according to their overlapping relationship [9–11]. In the pre-processing step, the primers were removed, and then demultiplexing and quality filtering (Phred ≥ 20) were applied [12]. USEARCH7 was used to perform denoising and chimera detection/filtering during operational taxonomic unit (OTU) grouping [13]. Then, the Silva132 and NCBI databases were used to determine the OTUs at 97% similarity using UCLUST, and the open-reference analysis method determined the OTU identifiers [14–16]. The OTU table was normalized by dividing each OTU count by its 16S copy number abundance. After filtering the generated OTU table using the Biological Observation Matrix (BIOM) format [17], the resulting sequences were clustered into OTUs based on a similarity threshold of ≥97% using Python Nearest Alignment Space Termination (PyNAST) [18]. We carried out comparative OTU assignment against the database at the Phylum, Class, Order, Family, Genus, and Species levels separately using RDP classifiers [19,20].
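Two of the steps above, the Phred ≥ 20 quality criterion and the 16S copy number normalization, can be sketched in plain Python. This is an illustration under stated assumptions and not the QIIME or USEARCH implementation: the function names are hypothetical, quality strings are assumed to use the Phred+33 encoding, and the copy numbers in the usage example are invented. A Phred score Q corresponds to an error probability of 10^(−Q/10), so Q = 20 means a 1% chance that a base call is wrong.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


def q20_fraction(quality_string, threshold=20):
    """Fraction of bases in a read whose Phred score is at least `threshold`."""
    scores = phred_scores(quality_string)
    if not scores:
        return 0.0
    return sum(score >= threshold for score in scores) / len(scores)


def normalize_by_copy_number(otu_counts, copy_numbers):
    """Divide each OTU count by that OTU's 16S rRNA gene copy number."""
    return {otu: count / copy_numbers[otu] for otu, count in otu_counts.items()}


if __name__ == "__main__":
    # 'I' encodes Q40, '5' encodes Q20, '+' encodes Q10 under Phred+33.
    print(phred_scores("I5+"))
    print(q20_fraction("II55++"))
    # Hypothetical counts and copy numbers (illustrative only).
    print(normalize_by_copy_number({"OTU_1": 60, "OTU_2": 30},
                                   {"OTU_1": 6, "OTU_2": 1}))
```

Dividing by copy number reduces the bias toward taxa that carry many copies of the 16S rRNA gene, which would otherwise appear more abundant than they are.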