Distribution of Epilithic Diatoms in Estuaries of the Korean Peninsula in Relation to Environmental Variables

This study explores the relationships between environmental factors and the distribution of epilithic diatoms at 161 estuary sites in three coastal areas of the Korean peninsula. We investigated epilithic diatoms, water quality, and land use in the vicinities of the estuaries in May of 2012, 2013, and 2014, because Korea is relatively free from the influence of rainfall at that time of year. We recorded 327 diatom taxa from the study sites, and in terms of species number the assemblage was dominated by members of the Naviculaceae. In terms of abundance, Bacillariaceae accounted for the largest proportion of diatoms, and Nitzschia inconspicua (18%) and N. frustulum (9.6%) were the most dominant species. A cluster analysis based on epilithic diatom abundance suggested that the epilithic diatom communities of Korean estuaries can be classified into four large groups (G) according to geography: Ia (the East Sea watershed), Ib (the eastern watershed of the South Sea), IIa (the West Sea watershed), and IIb (the western watershed of the South Sea). The former two groups, Ia and Ib, showed higher proportions of forest land cover and use, higher species occurrence, lower salinity, lower turbidity, and lower concentrations of nutrients than the latter two groups, while the latter groups, IIa and IIb, had higher proportions of agricultural land cover and use, higher electrical conductivity, higher turbidity, higher concentrations of nutrients, and lower species occurrence. The environmental factors underlying the distribution of epilithic diatoms representative of each region are as follows: dissolved oxygen and forest land cover and use for Reimeria sinuata and Rhoicosphenia abbreviata of the East Sea (ES), salinity and turbidity for Tabularia fasciculata of the West Sea (WS), and biochemical oxygen demand (BOD) and nutrients for Cyclotella meneghiniana of the WS.
On the other hand, the most influential environmental factors affecting the occurrence of the indicator species with the highest indicator values (>60%) in each group were electrical conductivity for Navicula saprophila and Reimeria sinuata of Ia, and turbidity for Encyonema minutum of IIa. Collectively, the distribution of epilithic diatom communities inhabiting Korean estuaries is determined by geographical factors and water quality, which are in turn influenced by land cover and use, and which differ from east to west.


Introduction
The Korean peninsula is surrounded by seas, which can be divided into two parts: the West and South Seas, and the East Sea. The West and South Seas (WS and SS) are characterized by indented coastlines, large tidal ranges, and well-developed tidal flats. The East Sea (ES) is marked by high mountain ranges, high water levels, simple coastlines, and higher water quality. Despite its limited land area, the Korean peninsula has more than 500 estuaries of different sizes. Generally, there are two types of estuaries: enclosed estuaries, in which artificial structures shut off the influx of seawater, and open estuaries, which lack such structures [1]. Recently, structures such as dams and irrigation reservoirs have been built in the West Sea and South Sea areas, leading to a gradual increase in the number of enclosed estuaries on the Korean peninsula [2].
According to geographical aspects and environmental management of the watershed regions, Korean estuaries can be divided into seven areas. The seven areas include the West Han River, the East Han River, the Geum River, the Yeongsan River, the Seomjin River, the Nakdong River, and Jeju Island [3]. The watershed regions of estuaries in Korea consist of forest (39.2%), agricultural land (28.8%), bodies of water (16.4%), and cities (5.8%). Among the seven areas of estuaries, the East Han River area, a region spanning the Han River to the East Sea, is largely dominated by forest area (78%) and agricultural land (11%). In contrast, the Geum River area and the Yeongsan River area have higher proportions of urban and agricultural land due to national reclamation and land development projects of the Korean government [3].
Estuaries are transition zones where freshwater joins seawater, and are dynamic systems where volatile physicochemical changes occur in water temperature, salinity, and nutrients [4,5]. Estuaries are valued not only from an ecological perspective, as habitats in which a variety of organisms live, grow, and spawn, but also from a socio-economic perspective for agriculture, transportation, tourism, and leisure activities [6][7][8]. However, the growth of cities within estuary basins has resulted in a population boom in Korea. Moreover, the development of estuaries and diverse industrial facilities is associated with the destruction of natural environments and ecosystems in estuaries, with subsequent influxes of domestic and industrial wastewater and sewage causing environmental problems [9,10]. Such problems are more marked in enclosed estuaries [1,11,12].
Together with bacteria, diatom communities are major biological factors producing biofilms on various underwater substrates [13]. Like phytoplankton and aquatic plants, diatoms are important primary producers in aquatic food webs and energy flows, as well as a food source for large invertebrates and fish communities [14][15][16]. In addition, diatoms respond more quickly than other biota (macro-invertebrates and fish communities) to physicochemical changes in light intensity, water temperature, salinity, and nutrients, not only in freshwater but also in seawater [17,18]. That is why diatom communities can be used to explain the influences of seawater and water temperature on microalgae communities in both enclosed and open estuaries [19][20][21][22][23][24][25]. In particular, due to their low mobility, epilithic diatom communities have long been used to estimate the cumulative effects of pollutants over long periods of time, and to predict the occurrence and abundance of upstream regulators [26][27][28][29]. Streams have attracted attention as lotic water zones used by investigators to explore factors influencing the distribution of epilithic diatoms, such as landscape, weather, land use/cover, and hydrologic conditions [30][31][32][33].
There is much room for further research on the influence of water quality on epilithic diatom communities and the relationships between the distribution of diatom communities and the environment at both domestic and global scales. The water quality of Korean estuaries and their biological communities, including the phytoplankton, vegetation structures, fish communities, and macro-invertebrates, are frequently studied [34][35][36][37]. However, prior research efforts have focused on the nutrients flowing into streams or on eutrophication and the red tides caused by fish farming businesses within estuary basins [38][39][40][41]. Consequently, there has not been adequate research on epilithic diatom communities in estuaries. In addition, considering that freshwater and seawater mix cyclically in estuaries, investigating the conditions of aquatic ecosystems based on highly mobile phytoplankton, macro-invertebrates, and fish communities is a very limited approach in terms of time and space [42,43]. Estuaries also show lower degrees of complexity and sinuosity than upper streams and rivers, and thus their simple microhabitats (river beds and canopy) allow more accurate interpretations of the environments in which sessile organisms with limited mobility accumulate [44,45]. Therefore, to fully understand Korean estuaries with their large variation in geographical features and size, epilithic diatom communities and environmental factors must be studied.
This study aims to assess the relationships between epilithic diatom communities and environmental factors in 90 highly accessible streams (161 study sites) distributed among all the estuaries in the Korean peninsula during a period that is relatively insensitive to rainfall. Our goals were (i) to characterize the distributional patterns of epilithic diatom communities in estuaries in the Korean peninsula and (ii) to evaluate the effects of environmental factors on the distribution and occurrence of epilithic diatom communities in Korean estuaries.

Ecological Data
This survey was conducted in five watershed regions (161 study sites among 90 estuaries), not including poorly accessible areas such as the West Han River and Jeju Island, in May of 2012, 2013, and 2014 (May is less influenced by rainfall than the rest of the year). For convenience, the survey areas were analyzed in the following order: the East Sea watershed (45 sites), the South Sea watershed (81 sites), and the West Sea watershed (35 sites) (Figure 1). In accordance with the guidelines of a national survey for stream ecosystem health [46], we selected study sites that are under the influence of seawater on land-sea boundaries, excluding those shaded by waterfront vegetation. Fourteen environmental factors were measured at each site. Environmental factors were categorized into two groups: physicochemical factors and land use/cover. Water temperature, dissolved oxygen (DO), pH, electrical conductivity (EC), salinity, and turbidity were measured in situ with a Horiba U-50 multi-parameter water quality analyzer (HORIBA Ltd., Kyoto, Japan). Water samples (2 L) were collected for water-quality analyses in sterile plastic bottles at each field site and transported to the laboratory on ice. Biochemical oxygen demand (BOD) was determined using the relationship BOD = DO₁ − DO₂. The initial dissolved oxygen value (DO₁) was determined on the first day, and the second value (DO₂) was determined after incubation of the same sample at 20 °C for five days in a ventilated culture vessel, according to the Winkler-Azide method. Concentrations of total nitrogen (TN) and total phosphorus (TP) were measured by colorimetric assay using a spectrophotometer (Optizen POP, Mecasys Co., Ltd., Daejeon, Korea), after persulfate digestion and the ascorbic acid reduction approach [47]. Land use/cover was recorded as a percentage of land type (e.g., forest, agriculture, urban) within a 1000 m radius around each study site, according to the United States Environmental Protection Agency (US EPA) guidelines [48].
To quantify chlorophyll-a (Chl-a) and ash-free dry matter (AFDM), we chose more than three natural stones with flat surfaces that had been submerged in water for at least one week at each study site. The surface areas (25 cm²) of the substrates were scrubbed with an iron brush and rinsed with field water on site. The rinse water was then collected in plastic specimen bottles and transported on ice to the laboratory. Chl-a and AFDM of each specimen were analyzed by standard methods after acetone extraction and membrane filtration, respectively [47].
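The BOD relationship stated above can be written as a small helper. This is a minimal sketch that assumes an undiluted sample (no dilution factor) and uses hypothetical readings; the function name `bod5` is ours and does not come from the study.

```python
def bod5(do_initial_mg_l: float, do_final_mg_l: float) -> float:
    """Five-day BOD (mg/L) as the drop in dissolved oxygen over incubation.

    Assumes an undiluted sample, so no dilution factor is applied.
    """
    if do_final_mg_l > do_initial_mg_l:
        raise ValueError("final DO exceeds initial DO; check the readings")
    return do_initial_mg_l - do_final_mg_l


# Hypothetical readings for illustration only (not data from the study).
example_bod = bod5(8.9, 6.4)
```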
To collect epilithic diatom specimens, we used the same method as for the collection of Chl-a and AFDM. The specimens were immediately fixed with Lugol's solution and transported to the laboratory for permanganic acid treatment [49]. The processed specimens were washed with distilled water at least three times and dried at room temperature. Then, a mounting medium was added to the specimens to make permanent preparations. Diatom counts and identifications were performed at 400× to 1000× magnification using a light microscope (Nikon E600, Tokyo, Japan). To determine species abundance, we counted more than 500 diatom frustules using the S-R chamber under an inverted microscope, and then multiplied the total by the proportion of each previously identified species. Epilithic diatoms were identified based on the literature of Krammer and Lange-Bertalot [50,51], and the identified species were classified using the taxonomic system described by Simonsen [52].
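The abundance step described above, in which a total count is apportioned among species by their identified proportions, might be sketched as follows. The function name, the example counts, and the total density are hypothetical and serve only to illustrate the arithmetic.

```python
def species_densities(total_density, slide_counts):
    """Split a total cell density among species by their counted proportions.

    total_density: total diatom density at a site (e.g., cells/cm^2).
    slide_counts:  {species name: frustules identified on the permanent slide}.
    """
    total_counted = sum(slide_counts.values())
    if total_counted == 0:
        raise ValueError("no frustules counted")
    return {
        species: total_density * count / total_counted
        for species, count in slide_counts.items()
    }


# Hypothetical counts (500 frustules) and total density, for illustration only.
counts = {"Nitzschia inconspicua": 300, "Nitzschia frustulum": 150, "other taxa": 50}
densities = species_densities(1.0e6, counts)
```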

Data Analysis
To understand the characteristics of diatom communities in the estuaries of the Korean peninsula, we conducted cluster analyses (CA) including all occurring species. We calculated the spatial patterns of diatom distribution and grouped samples according to similarities in the composition of diatom species using the Ward linkage method with Euclidean distance measures [53]. To observe the ecological characteristics of each community, the dominance index [54], diversity index [55], species richness index [56], and evenness index [57] were calculated per group. We also conducted an analysis of indicator species representative of each group using the indicator species analysis (ISA) method [58], which is a non-hierarchical statistical method. We calculated the indicator value (IndVal) using the relative richness and relative frequency of species occurring at each study site. In this study, a species in a group with an IndVal of more than 25 or with an IndVal greater than five times the IndVal of species in other groups was defined as a good indicator species for that group [59]. Among the diatom data used in CA and ISA, rare species occurring at <5% of all sites were excluded from the multivariate analysis. The remaining data were transformed by natural logarithms to reduce variation, after adding one to all data to avoid the undefined logarithm of zero. The analysis was performed using the PC-ORD software program (MjM Software, Gleneden Beach, OR, USA) [60].
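The data preparation described above (excluding species present at fewer than 5% of sites, then applying ln(x + 1)) can be illustrated with a short sketch. The study performed these steps in PC-ORD; the function and the toy data below are hypothetical.

```python
import math


def filter_and_transform(abundance, min_site_fraction=0.05):
    """Drop rare species and apply ln(x + 1) to the rest.

    abundance: {species: [abundance at each site]}, all lists the same length.
    A species is kept if it occurs at >= min_site_fraction of the sites.
    """
    kept = {}
    for species, values in abundance.items():
        occupied_sites = sum(1 for v in values if v > 0)
        if occupied_sites / len(values) >= min_site_fraction:
            # log1p(v) computes ln(v + 1), so zero abundances map to zero.
            kept[species] = [math.log1p(v) for v in values]
    return kept


# Toy data with 40 sites: sp_a occurs at 25% of sites, sp_b at 2.5%.
toy = {
    "sp_a": [3.0] * 10 + [0.0] * 30,
    "sp_b": [5.0] + [0.0] * 39,
}
prepared = filter_and_transform(toy)
```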
Water 2015, 7, 6702-6718
To understand the influences of environmental factors at each site on the distribution of diatoms, we conducted canonical correspondence analysis (CCA) [61]. CCA can be used to define composite environmental variables that correspond as closely as possible to the major patterns of species occurrence. CCA provides a direct gradient analysis of all species simultaneously in relation to the underlying gradients within the measured environmental variables [62,63]. Axes are derived that are linear combinations of environmental variables. Individual species are related directly to these axes under the assumption of a unimodal species response to the environmental variables. Canonical correspondence analysis is a means not only of directly investigating relationships between modern diatom assemblages and associated environmental variables but also, indirectly, of implementing weighted average calibration. The main advantages of weighted averaging ordinations include the simultaneous ordering of sites and species, rapid computation, and very good performance when species have nonlinear and unimodal relationships to environmental gradients [64]. Meanwhile, a random forest (RF) model, a non-parametric method designed to predict and evaluate the relationship between potential predictor variables and dependent variables [65], was used to identify important environmental factors contributing to the occurrence of each diatom community. To compare the relative significance of each environmental factor, we used the minimum description length (MDL) [66]. To assess the predictability of the model, accuracy rates and the area under the curve (AUC) were calculated. The accuracy rates ranged from 0 to 1, and the AUC values ranged between 0.5 and 1 [67]. Canonical correspondence analysis was conducted using PC-ORD software [60], and the RF model was run using the CORElearn package in the R statistical program [68]. Analysis of variance (ANOVA) was carried out to compare the differences in environmental and biological characteristics among groups defined by cluster analysis (CA). Tukey's multiple comparison tests were used for post hoc comparisons. Pearson's correlation analysis was performed to determine the relationship between the indicator species of each group and environmental factors. These analyses were conducted using SPSS software version 21 (IBM, New York, NY, USA).
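For readers unfamiliar with the ANOVA used to compare groups, the F statistic can be computed as in the sketch below. This is an illustration only: the study used SPSS, the data here are invented, and obtaining a p-value would additionally require the F distribution, which this sketch does not implement.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of samples, one list of numeric values per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented values for three groups, for illustration only.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```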

Characteristics of Epilithic Diatom Communities
A total of 327 diatom taxa from 10 families were recorded at 161 estuary sites in the Korean peninsula during the research period. Nitzschia inconspicua (17.1%) was the most abundant species in the estuaries of Korea, followed by Nitzschia frustulum (9.1%) and Nitzschia fonticola (7.3%). Among the five watershed areas, the Nakdong River area displayed the highest species richness with 207 species, whereas the East Han River area showed the lowest species richness with 140 species.
The diatom communities (except for rare species occurring in less than 5% of sites) in the 161 study sites were classified into two groups (I and II), with four subgroups based on their similarities by cluster analysis as follows: Ia (43 sites), Ib (38 sites), IIa (30 sites), and IIb (50 sites) (Figure 1). The site distribution in each group reflects the characteristics of the geographical locations of the sites. Most sites in Group I were situated in the East Sea and the eastern part of the South Sea, whereas the sites in Group II were located in the West Sea and the western part of the South Sea.
Community indices, except for biomass and the Shannon diversity index (H'), were significantly different among the four subgroups defined by cluster analysis (ANOVA, p < 0.05) (Table 1). The number of species and the richness index (R) showed higher values in Group I (Ia: 2.18 ± 0.09, Ib: 2.11 ± 0.16) (Tukey's test, p < 0.05). Group IIb showed high values for the evenness index (E) (0.73 ± 0.02) but the lowest values for the dominance index (DI) (0.05 ± 0.02) (Tukey's test, p < 0.05). Group IIa had the highest biomass (705.6 ± 178.1 × 10³ cells/cm²), although it was not significantly different from the biomass of the other clusters (Tukey's test, p > 0.05). Meanwhile, the family Naviculaceae had the highest occurrence (>40%) among the 10 diatom families, followed by the Bacillariaceae, Fragilariaceae, and Achnanthaceae families, all of which showed similar rates. Although such patterns did not vary by group, the families Hemidiscaceae (Group Ia) and Entomoneidaceae (Group Ib) occurred in specific groups only (Table 2). In contrast to the family Naviculaceae, which was the most diverse in terms of species, the family Bacillariaceae (37.8%-61.7%) made up the majority of the diatom community in terms of abundance, while the families Naviculaceae, Fragilariaceae, Achnanthaceae, and Thalassiosiraceae accounted for relatively small proportions (Table 1). Such patterns varied by group. Families showing high abundance were Achnanthaceae (21.2%) in Group Ia, Naviculaceae (34.4%) in Group Ib, Bacillariaceae (61.7%) in Group IIa, and Thalassiosiraceae (6.2%) in Group IIb.
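The community indices compared above can be illustrated with a short sketch. The exact formulas of references [54] to [57] are not reproduced in the text, so the forms below are assumptions: Shannon diversity H' = −Σ p ln p, Pielou-type evenness E = H'/ln S, Margalef-type richness R = (S − 1)/ln N, and a dominance index DI taken as the share of the two most abundant species. The study's indices may be defined differently.

```python
import math


def community_indices(counts):
    """Compute assumed forms of H', E, R, and DI for one site.

    counts: number of individuals per species (zeros are ignored).
    """
    present = sorted((c for c in counts if c > 0), reverse=True)
    if not present:
        raise ValueError("no individuals recorded")
    n_total, n_species = sum(present), len(present)

    shannon = -sum((c / n_total) * math.log(c / n_total) for c in present)
    evenness = shannon / math.log(n_species) if n_species > 1 else 0.0
    richness = (n_species - 1) / math.log(n_total) if n_total > 1 else 0.0
    dominance = sum(present[:2]) / n_total  # two most abundant species
    return {"H": shannon, "E": evenness, "R": richness, "DI": dominance}


# Invented counts: four equally abundant species.
indices = community_indices([25, 25, 25, 25])
```

With perfectly even counts such as these, the evenness index reaches its maximum of 1, which is a convenient sanity check on the implementation.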

Differences in Environmental Factors
All environmental factors, except for urban area, were significantly different among groups (ANOVA, p < 0.05) (Figure 2). Group I (Ia and Ib) showed higher proportions of forest area than Group II (IIa and IIb) (Tukey's test, p < 0.05), whereas the proportion of agricultural area was significantly higher in Group II (IIa and IIb) than in Group I (Ia and Ib) (Tukey's test, p < 0.05). Water temperature was the lowest in Group Ia (19.70 ± 0.64 °C) (Tukey's test, p < 0.05). Dissolved oxygen (DO) values were found to be high in Groups Ia (8.91 ± 0.22 mg/L) and Ib (8.62 ± 0.27 mg/L), and the values of pH, BOD, electrical conductivity, TN, and TP were significantly higher in Group II (IIa and IIb) than in Group I (Tukey's test, p < 0.05). In particular, both turbidity (105.87 ± 10.54 NTU) and salinity (7.30 ± 1.15 ‰) were the highest in Group IIa (Tukey's test, p < 0.05). Ash-free dry matter (AFDM) was the lowest in Group Ia (1.78 ± 0.23 mg/cm²), while chlorophyll-a (Chl-a) was the lowest in Group IIa (0.63 ± 0.05 µg/cm²) (Tukey's test, p < 0.05).

Correlations between Diatom Communities and Environment
To clarify the indicator species of each group, we carried out indicator species analysis on 136 taxa that occurred at more than 5% of all sites. Good indicator species with indicator values (IndVal) above 25% were identified in a total of 28 taxa, comprising Group Ia (11 species), Group Ib (4 species), Group IIa (10 species), and Group IIb (3 species). The best indicator species (IndVal > 50%) were Navicula saprophila (67%) and Reimeria sinuata (51%) in Group Ia and Encyonema minutum (60%) in Group IIa. On the other hand, all the indicator species showed low IndVal (<40%) in Group Ib and Group IIb (Table 3). Overall, the indicator species showed high correlations with turbidity and electrical conductivity (EC). Negative correlations were observed in Groups Ia and Ib (with the exception of EC), but positive correlations were observed in Groups IIa and IIb (Table 3). The indicator species in Group Ia (Navicula saprophila, Reimeria sinuata, Gomphonema pseudoaugur, and Fragilaria capucina var. gracilis) showed positive correlations with the proportion of forest area and dissolved oxygen (DO), but negative correlations with water temperature, pH, salinity, turbidity, electrical conductivity (EC), and AFDM. Rhoicosphenia abbreviata, the indicator species in Group Ib, showed positive correlations with the proportion of forest area and values of dissolved oxygen (DO) and chlorophyll-a (Chl-a), but negative correlations with turbidity and electrical conductivity (EC). The remaining indicator species showed strongly negative correlations with turbidity.
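To make the indicator value concrete, the sketch below computes IndVal for one species across groups. It assumes the widely used Dufrêne and Legendre formulation, IndVal = 100 × specificity × fidelity, where specificity is the group's share of the summed mean abundances and fidelity is the fraction of the group's sites where the species occurs. The text does not give the formula, so this form, the function name, and the toy data are assumptions; the study computed IndVal in PC-ORD.

```python
def indval(abundance_by_group):
    """Indicator values (percent) of ONE species for each group.

    abundance_by_group: {group: [abundance of the species at each site]}.
    """
    mean_abundance = {g: sum(v) / len(v) for g, v in abundance_by_group.items()}
    total_mean = sum(mean_abundance.values())
    result = {}
    for group, values in abundance_by_group.items():
        # Specificity: how concentrated the species is in this group.
        specificity = mean_abundance[group] / total_mean if total_mean > 0 else 0.0
        # Fidelity: how consistently it occurs across the group's sites.
        fidelity = sum(1 for x in values if x > 0) / len(values)
        result[group] = 100.0 * specificity * fidelity
    return result


# Toy abundances for one hypothetical species in two groups.
values = indval({"Ia": [4, 6, 0, 10], "IIa": [0, 0, 2, 0]})
```

Under the threshold quoted above, a species would count as a good indicator for a group when its value there exceeds 25.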

Environmental Effects on the Spatial Distribution of Diatoms
We performed CCA to determine the effects of environmental factors on the distribution of epilithic diatoms. The distribution of diatom communities is explained by the two ordination axes in CCA (eigenvalues: 0.308 in Axis-1 and 0.160 in Axis-2). Group Ia is located on the right side of Axis-1, and Group Ib is closer to the origin than Group Ia. Group IIa is positioned on the left side of Axis-1, and Group IIb is located on the bottom of Axis-2. The effects of environmental factors were identified by correlation coefficients (Pearson correlation coefficient, p < 0.05) between environmental factors and each axis. Correlation coefficients are shown as arrows on the CCA ordination (Figure 3c), where the arrow length indicates the magnitude of the correlation value and the arrow direction implies a correlation with each axis. Axis-1 was most highly correlated with turbidity (r = −0.816), followed by dissolved oxygen (r = 0.631), salinity (r = −0.579), proportion of forest area (r = 0.561), and electrical conductivity (r = −0.559). Axis-2 showed a high correlation with BOD (r = −0.535), chlorophyll-a (r = 0.382), salinity (r = 0.380), and total phosphorus (r = −0.364) (Figure 3d).
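The axis correlations reported above are ordinary Pearson coefficients between site scores on an ordination axis and an environmental variable. A minimal sketch of that calculation follows; the site scores and turbidity values are invented, and the ordination itself (done in PC-ORD in the study) is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)


# Invented Axis-1 site scores and turbidity readings (NTU) for five sites.
axis1_scores = [1.2, 0.8, 0.1, -0.6, -1.4]
turbidity = [5.0, 12.0, 30.0, 70.0, 110.0]
r_axis1_turbidity = pearson_r(axis1_scores, turbidity)
```

A negative coefficient, as in this toy case, corresponds to an arrow pointing toward the negative side of the axis.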

Predictions of the Appearance of Diatom Species
The occurrence of species in diatom communities was predicted from environmental factors using a random forest model. The performance measures differed by species, ranging from 0.76 to 0.95 in accuracy and from 0.94 to 1.00 in AUC (Table 4). Among the 136 taxa, four species (Amphora fontinalis, Cocconeis sp., Eunotia minor, Navicula bacillum) exhibited a relatively high predictive power (accuracy: 0.95, AUC: 1.00), whereas Navicula cryptotenella and Achnanthes brevipes showed relatively low prediction accuracy (0.76 and 0.79, respectively). To evaluate the contribution of environmental factors in predicting the occurrence of each diatom species, sensitivity analysis was conducted using the minimum description length (MDL) in RF. The most important factors affecting the occurrence of epilithic diatoms were turbidity (35 species, 26%), electrical conductivity (28 species, 21%), and salinity (14 species, 10%). These factors explained the appearance of 57% of total species in all the diatom communities of each site. Concentrations of TN (12 species, 9%) and TP (11 species, 8%) were also shown to be of relatively high importance for species appearance in diatom communities (Table 4). For the indicator species, the most important factors were electrical conductivity for Navicula saprophila and Reimeria sinuata, which are the indicator species of Group Ia with high indicator values (>60%), and turbidity for Encyonema minutum, which is the indicator species of Group IIa (Table 4).
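The two performance measures quoted above can be computed from a model's predictions as sketched below. This is an illustration under stated assumptions: the labels and scores are invented, the 0.5 classification threshold is our choice, and the AUC is computed with the rank-based (Mann-Whitney) definition rather than with the CORElearn routines used in the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of sites where predicted presence/absence matches observation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen presence (1) receives a
    higher score than a randomly chosen absence (0); ties count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented observations (1 = present) and predicted occurrence probabilities.
observed = [1, 1, 0, 0, 1, 0]
predicted_prob = [0.9, 0.8, 0.3, 0.6, 0.5, 0.2]
predicted = [1 if p >= 0.5 else 0 for p in predicted_prob]  # assumed threshold

model_accuracy = accuracy(observed, predicted)
model_auc = auc(observed, predicted_prob)
```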

Discussion
The present study analyzes the distribution and abundance of diatom communities in estuaries of the Korean peninsula in relation to environmental conditions. A total of 327 diatom taxa were recorded at 161 sites in estuaries in the Korean peninsula during the research period. In comparison to previous studies identifying 284 taxa in 58 sites in 2012 [69] and 171 taxa in 38 sites in 2013 [2], our sample is more diverse, likely because the present study includes more sites than previous studies. With regard to the occurrence of diatom species in each group, Group Ib includes the largest number of diatom taxa (188 species), while Group IIa includes the smallest number of diatom taxa (128 species). In contrast, the abundance of diatoms was highest in Group IIb (2.5 × 10⁷ cells/cm²) and lowest in Group Ib (1.8 × 10⁷ cells/cm²). However, there were no significant differences among groups (ANOVA, p > 0.05).
The overall sample of diatom communities in the estuaries of the Korean peninsula was classified into four subgroups through cluster analysis based on similarities in community composition. These four subgroups were then merged into two groups, Groups I and II, reflecting the geographical locations of the sites. The diatom communities comprising Group I were located near the East Sea and the eastern part of the South Sea, including the East Han River watershed and the Nakdong River watershed. The East Sea watershed maintains high water quality owing to its land use/cover composition, which is dominated by forest (more than 60%), with agricultural land use/cover below 20% [3]. Group II comprises diatom communities from the West Sea and the western part of the South Sea, including the Geum River and Yeongsan River watersheds. The proportion of agricultural land use/cover in Group II (more than 35%) is far higher than the average proportion in the rest of South Korea (28.8%). In addition, large-scale reclamation projects and land development work have resulted in severe water pollution of the streams in the region [3]. In comparison to Group II, the regions of Group I have lower nutrient concentrations, salinity, and turbidity (ANOVA, p < 0.05; Figure 2). This finding agrees with previous studies, which report that forest land cover is negatively correlated with nutrients [70,71]. In addition, the east coastal area of Korea is characterized by a series of ridges with decreasing altitudes from the Taebaek Mountains to the East Sea. Thus, small streams flowing into the East Sea have steep bed slopes and short river channels [72]. Tidal effects are usually weak on the east coast of Korea because of its small tidal range [73]. Meanwhile, Group II has a relatively higher proportion of agricultural land use/cover, more nutrients, higher electrical conductivity, and higher turbidity (Figure 2). In general, high proportions of agricultural land use/cover lead to higher concentrations of zinc, lead, cadmium, and manganese, as well as higher concentrations of nutrients (total nitrogen and total phosphorus), which aggravate water pollution [71,74–76]. Furthermore, the west coastal area of Korea has always had high turbidity levels due to its relatively large tidal range [77].
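The group comparisons above rest on a one-way ANOVA. As an illustration only, the following minimal sketch shows how the F ratio for such a comparison is formed from between-group and within-group sums of squares. The function name and the turbidity values are hypothetical and are not taken from the study; the p value would then be read from the F distribution with the returned degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F ratio, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of measurements per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual sites around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical turbidity readings for four site groups (illustrative only)
turbidity = {
    "Ia": [2.1, 3.0, 2.6, 1.9],
    "Ib": [3.4, 2.8, 3.9, 3.1],
    "IIa": [14.2, 11.8, 16.5, 12.9],
    "IIb": [9.7, 8.4, 11.1, 10.2],
}
f_ratio, df_b, df_w = one_way_anova_f(list(turbidity.values()))
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

A significant F ratio only indicates that at least one group differs; a post hoc procedure such as Tukey's test, as used for Table 1, is still needed to identify which groups differ from each other.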
Indicator species in each group were determined through IndVal analysis, and the influence of environmental conditions on the indicator species was examined. Among them, Reimeria sinuata, Cocconeis placentula var. lineata and Fragilaria rumpens var. fragilarioides in Group Ia and Rhoicosphenia abbreviata in Group Ib were positively correlated with dissolved oxygen (DO) and the proportion of forest land use/cover (Table 3, Figure 3). These saproxenous taxa are generally found in oligotrophic-mesotrophic waters [78,79] with high levels of DO, and all of them occur in oceans, brackish water, or freshwater zones with high electrical conductivity [80]. In addition, saprophilous taxa, such as Cyclotella meneghiniana and Navicula salinarum in Group IIb, are positively correlated with biochemical oxygen demand (BOD) but negatively correlated with the proportion of forest land use/cover (Table 3). These taxa usually occur in polluted streams [81–85]. Tabularia fasciculata and Navicula capitatoradiata of Group IIa are positively correlated with salinity (Table 3). Previous studies [80,86] indicate that these are cosmopolitan species that favor saline brackish water zones or waters of high electrical conductivity.
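For readers unfamiliar with the indicator value, the sketch below computes it in its commonly used form: the product of specificity (the share of a species' mean abundance that falls in a group) and fidelity (the share of sites in that group where the species is present), expressed as a percentage. This is an assumed, simplified formulation for illustration; the function name and abundance values are hypothetical, and the permutation test normally used to assess significance is omitted.

```python
def indval(abundances, groups):
    """Return {group: indicator value in %} for a single species.

    `abundances` holds the species' abundance at each site and `groups`
    holds the group label of each site, in the same order.
    """
    by_group = {}
    for abundance, label in zip(abundances, groups):
        by_group.setdefault(label, []).append(abundance)

    mean_abundance = {g: sum(v) / len(v) for g, v in by_group.items()}
    total_of_means = sum(mean_abundance.values())

    values = {}
    for g, v in by_group.items():
        # Specificity: how concentrated the species is in this group
        specificity = mean_abundance[g] / total_of_means if total_of_means > 0 else 0.0
        # Fidelity: how consistently the species occurs within this group
        fidelity = sum(1 for a in v if a > 0) / len(v)
        values[g] = 100.0 * specificity * fidelity
    return values


# Hypothetical relative abundances of one species at eight sites
site_groups = ["Ia", "Ia", "Ia", "Ib", "Ib", "Ib", "IIa", "IIb"]
site_abundance = [8, 12, 10, 0, 0, 2, 0, 0]
for group, value in indval(site_abundance, site_groups).items():
    print(f"{group}: {value:.1f}%")
```

A species reaches a high value in a group only when it is both largely restricted to that group and present at most of its sites, which is why the two components are multiplied rather than averaged.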
A random forest (RF) model is a non-parametric method for predicting and assessing the relationships between a large number of potential predictor variables and response variables [87]. Random forest models have demonstrated a high capability for modeling ecological problems involving non-linear relationships, and are known for their learning and predictive power as well as their explanatory capacity [88]. In this study, we predicted the occurrence of epilithic diatom communities using an RF model based on differences in environmental factors. The importance of the environmental factors used in the RF model was evaluated using the minimum description length (MDL), which measures the quality of attributes according to their ability to compress the data [66]. According to the MDL-based evaluation of relative importance, physicochemical factors contributed more to the prediction of epilithic diatom communities than other factors such as land use/cover (Table 4). The most important factors affecting the occurrence of epilithic diatoms were turbidity (35 species, 26%), electrical conductivity (28 species, 21%), and salinity (14 species, 10%). Together, these factors explained the appearance of 57% of the total species in the diatom communities of the sites. The concentrations of TN (12 species, 9%) and TP (11 species, 8%) also showed relatively high importance (Table 4). In the short run, turbidity reduces DO by blocking the light required for photosynthesis. In the long run, turbidity promotes the growth of algae as particulate nutrients that flow in with silt gradually dissolve [89,90]. In addition, electrical conductivity plays an important role in driving the structure and composition of diatom communities in the estuaries of the Korean peninsula [91,92].

project for their help in sampling and analyses. The authors also thank the reviewers for their help in improving the scientific quality of the manuscript.

Figure 1. Map of 161 sampling sites for studies of water quality and epilithic diatoms in 90 Korean estuaries from 2012 to 2014; a: Dendrogram of site affinity based on diatom abundance, b: Colors in figures represent the same groups: Ia (blue), Ib (green), IIa (yellow), and IIb (red).


Table 1. Biological characteristics of Korean estuaries between 2012 and 2014. The estuaries were divided into four groups by cluster analysis based on diatom abundance. Lowercase letters (a, b, and c) indicate the results of Tukey's test. The p value is computed from the F ratio in the ANOVA table.

Table 2. Species (N, %) and cell density (D, %) of epilithic diatom communities distributed in Korean estuaries between 2012 and 2014 at the family level. The estuaries were divided into four groups by cluster analysis based on diatom abundance.

Table 3. Correlation coefficients among indicator species and environmental variables in each group of Korean estuaries between 2012 and 2014.

Table 4. List of epilithic diatoms with the first and second most important variables predicting species appearance by a random forest model in Korean estuaries between 2012 and 2014. Ar: accuracy rate, AUC: area under the curve, GI: group indicator.