Artificial Neural Network (ANN) Modeling Analysis of Algal Blooms in an Estuary with Episodic and Anthropogenic Freshwater Inputs

Abstract: The Youngsan River estuary, located on the southwest coast of South Korea, has transitioned from a natural to an artificial estuary since dike construction in 1981 separated the freshwater and seawater zones. This artificial transition has induced changes in the physical properties and circulation within the estuary, which have led to hypoxia and algal blooms. In this study, an artificial neural network (ANN) model was employed to simulate phytoplankton variations, including algal blooms and size fractions based on chlorophyll a, using data obtained by long-term monitoring (2008–2018) of the seawater zone of the Youngsan River estuary. The model was validated through statistical analyses, and the validated model was used to determine the contribution of environmental factors to size-fractionated phytoplankton variations. The statistical validation of the model showed extremely low sum square error (SSE ≤ 0.0003) and root mean square error (RMSE ≤ 0.0173) values, with R² ≥ 0.9952. The accuracy of the model predictions was high, despite the considerable irregularity and wide range of phytoplankton variations in the estuary. With respect to phytoplankton size structure, the contribution of seasonal environmental factors such as water temperature and solar radiation was high for net-sized chlorophyll a, whereas the contribution of factors such as freshwater discharge and salinity was high for nano-sized chlorophyll a, which includes typical harmful algae. Notably, because the Youngsan River estuary is influenced by a monsoon climate, characterized by high precipitation in summer, the contribution of freshwater discharge to harmful algal blooms is predicted to increase during this period. Our results suggest that the ANN model can be an important tool for understanding the influence of freshwater discharge, which is essential for managing algal blooms and maintaining the ecosystem health of altered estuaries.


Introduction
Phytoplankton, as the primary producers in the marine ecosystem food web, influence herbivory levels and fishery production; additionally, they play an important role in the cycling of inorganic nutrients [1,2]. However, red tides caused by excessive algal blooms release toxic substances that affect the aquatic ecosystem as well as humans [3,4]. In addition, algae that are not consumed by herbivores sink to the bottom of the water body and can provide a substrate for aerobic decomposition, resulting in depleted levels of dissolved oxygen (hypoxia) [2,5]. To understand phytoplankton ecology, especially in the case of harmful algal blooms (HABs), it is important to understand both the quantitative biomass variations and the qualitative variations, such as changes in phytoplankton species composition and size structure. Phytoplankton size structure is known to be sensitive to physicochemical changes in the marine ecosystem, and variations in size structure can affect not only water quality but also the marine food web structure and fishery production [6,7].
Thus, analyzing the variations in phytoplankton size structure along with environmental factors is critical to understand and manage HABs as well as the entire aquatic ecosystem.
Turbulence occurs in estuaries due to their binary water layer structure, where freshwater meets seawater in a transition zone [8]. Estuaries, thus, harbor diverse species and are known to be highly productive systems due to nutrient loads from the adjacent land [9]. The production of organic matter per unit area is estimated to be 4- to 10-fold higher than in grainfields, and the ecological value per unit area is reported to be the highest among Earth's ecosystems, at USD 22,832 ha⁻¹ yr⁻¹ [10]. The Youngsan River (YR) estuary, the target estuary in the present study, is located on the southwest coast of South Korea (126°26′-126°18′ E, 34°49′-34°44′ N) and has a temperate and monsoonal climate [11]. A dike was constructed in 1981, and the resulting division between the freshwater and seawater zones transformed the YR estuary from a natural to an artificial estuary, with corresponding changes in water quality and environmental properties [12]. Since dike construction, the freshwater zone has experienced an increase in nutrient concentration and accelerated organic pollution [11], with recent incidences of hypoxia [13]. In the seawater zone, the physical properties and circulation are controlled by tidal forcing in the normal state (negligible freshwater discharge), but the opening of the sluice gate lets in a large quantity of freshwater that alters the physical properties and circulation in the estuary [14]. Moreover, with increased freshwater discharge during the summer season, there is an increase in the abundance of species that cause red tides [15,16].
In general, phytoplankton are affected by bottom-up control, involving nutrients, radiation intensity, water temperature, salinity, and physical circulation (advection and diffusion), and top-down control, which includes predation [17,18]. However, such factors exhibit complex characteristics in an estuary due to the freshwater influx from the river and seawater circulation by tidal forcing. Tidal forcing leads to high-frequency (diel and fortnightly) variations of the estuarine phytoplankton contents, whereas freshwater discharge produces low-frequency patterns, that is, seasonal and annual variations [19,20]. In an altered estuary, such as the YR estuary, where freshwater inflow is intermittent, artificial, and unpredictable, phytoplankton may vary in an irregular, high-frequency, and transient fashion rather than in a regular pattern. Thus, we hypothesized the following: "In the YR estuary with artificial and intermittent freshwater inflow, the variation in phytoplankton size structure is significantly influenced by environmental changes induced by freshwater inflow, including changes in salinity, nutrients, stratification, and turbulence, and these influences are greater than those of seasonal changes such as radiation intensity and water temperature. The responses to such environmental changes vary according to the phytoplankton size structure." To investigate the phytoplankton variations and the regulatory factors in the altered and complex aquatic system of the YR estuary, an artificial neural network (ANN) was applied. An ANN is a model based on a mathematical description of the brain's information-processing mechanisms that underlie human cognition and decision-making. The model is, thus, widely used to describe intricately related problems through learning [21-25].
Furthermore, ANNs are advantageous in that no mathematical relationship or assumption of data distribution is required; response time is relatively brief, and the prediction power improves with data accumulation. Ultimately, ANNs are a useful model in cases where identifying main regulatory factors is difficult due to the intricate connections among the biological, physical, and chemical factors. This describes the situation of phytoplankton variations in an altered estuary.
Recently, ANNs have been utilized to investigate phytoplankton dynamics [26,27]; however, there are only a few published studies of variations in algal size structure. Understanding the mechanism that drives phytoplankton size structure variations is challenging in the case of an altered estuary with artificial and irregular changes in the environment. Accordingly, a study based on long-term (~10 year) data may contribute to the understanding of algal blooms as well as the aquatic ecosystem in an altered estuary, such as the YR estuary that experiences severe anthropogenic changes in the environment. Thus, in the present study, an ANN model based on long-term data collected from 2008 to 2018 was employed to describe the size-dependent changes in the phytoplankton community structure and the complex and simultaneous correlations among environmental factors. Furthermore, the main environmental factors influencing phytoplankton size structure variations were identified using this novel model.

Sampling Site and Data Acquisition
The YR estuary is located in a temperate region (Figure 1) under the influence of a monsoon climate characterized by concentrated precipitation in summer and relatively low precipitation in other seasons, resulting in seasonal variations [11]. The YR estuary dike was constructed in 1981 to secure agricultural lands and aquatic resources by reclaiming the tideland 7 km upstream of the mouth of the estuary. The data on phytoplankton biomass (chlorophyll a) and environmental factors used in the ANN modeling consisted of measurements taken in the period September 2008-January 2018, either monthly or seasonally, at three points between Site A (St. A) near the YR estuary dike and Site C (St. C) in the open sea region. The data were collected by the Coastal Estuarine Research Center, Mokpo National Maritime University. The methods of measuring and analyzing the data are described in the literature [11,28]. To examine the phytoplankton size structure variations, we divided the surface chlorophyll a (total) into net-sized (>20 µm) and nano-sized (≤20 µm) fractions using a 20 µm Nitex mesh.

ANN Structure
We constructed a multilayer ANN model consisting of input, hidden, and output layers. The ANN connects the input layer via the hidden layer to the output layer. The model thus has a feed-forward structure, without direct connections between the input and output layers or between neurons in the same layer. The input layer receives external data and relays it to the hidden layer, which stores the required data based on each feature being analyzed through learning. The output layer produces the final outcome of the learned data. The function applied in transferring data between layers was a sigmoid non-linear activation function producing values between 0 and 1. Because the sigmoid function is a smooth curve rather than a straight line, it is differentiable, which allows the use of a back-propagation algorithm. This algorithm drives learning towards reducing the difference between the predicted and target values by adjusting the weights when they disagree (the generalized delta rule), and it terminates learning when they agree [22,29]. In the present study, the ANN consisted of 14 input groups, 22 hidden groups, and 3 output groups (Figure 2). MATLAB R2021a was used for model implementation.
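The layer structure and the delta-rule update described above can be illustrated with a minimal NumPy sketch. It is not the MATLAB implementation used in the study: the weight initialization, learning rate, full-batch update, and the names `FeedForwardANN` and `train_step` are all assumptions made for illustration. Only the 14-22-3 layout and the sigmoid activation come from the text.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation; maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


class FeedForwardANN:
    """Illustrative 14-22-3 feed-forward network with sigmoid units."""

    def __init__(self, n_in=14, n_hidden=22, n_out=3, seed=0):
        rng = np.random.default_rng(seed)
        # Small random initial weights (assumed; not stated in the text).
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        """Input -> hidden -> output, with no skip connections."""
        self.H = sigmoid(X @ self.W1 + self.b1)
        self.Y = sigmoid(self.H @ self.W2 + self.b2)
        return self.Y

    def train_step(self, X, T, lr=0.5):
        """One full-batch back-propagation step; returns the mean squared error
        measured before the weights are updated."""
        Y = self.forward(X)
        n = X.shape[0]
        # Generalized delta rule: error times sigmoid derivative y * (1 - y).
        delta_out = (Y - T) * Y * (1.0 - Y)
        delta_hid = (delta_out @ self.W2.T) * self.H * (1.0 - self.H)
        self.W2 -= lr * (self.H.T @ delta_out) / n
        self.b2 -= lr * delta_out.mean(axis=0)
        self.W1 -= lr * (X.T @ delta_hid) / n
        self.b1 -= lr * delta_hid.mean(axis=0)
        return float(np.mean((Y - T) ** 2))
```

Calling `train_step` repeatedly on normalized inputs of shape (samples, 14) and targets of shape (samples, 3) corresponds to running learning epochs; the stopping criterion is left to the caller.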

Learning Dataset and Normalization
The learning data used in the ANN model were based on 14 items: freshwater discharge (discharge), water temperature (temperature), salinity, surface and bottom temperature difference (∆T), surface and bottom salinity difference (∆S), transparency, photosynthetically active radiation (PAR), duration of sunshine (DS), solar radiation (SR), NO₂⁻ + NO₃⁻, NH₄⁺, DSi, PO₄³⁻, and dissolved inorganic nitrogen to dissolved inorganic phosphorus ratio (DIN/DIP) (Table 1). Discharge is the total amount of freshwater discharged over the 4 days before sampling. The differences in water temperature and salinity between the surface and bottom samples indicated stratification and turbulence, and the DIN/DIP was used as an indirect indicator of nutrient limitation. The three items of output data were total chlorophyll a (total chl a) to indicate the total phytoplankton biomass [30], and size-fractionated (across 20 µm) biomass as net-sized chlorophyll a (net chl a) and nano-sized chlorophyll a (nano chl a). The output values of the ANN model were obtained through the sigmoid function, which produces values between 0 and 1. To unify the scale and units of the 17 variables (14 input and 3 output items), normalization was applied to the input and output data. Equation (1) was used to convert the values to those between 0.05 and 0.95.
where χ_min is the 5% minimum value, χ_max is the 95% maximum value, and χ is the original data. Examples of the histogram and cumulative distribution function of a normalized and an original input variable are shown in Figure 3.
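As a rough illustration of this kind of rescaling, the sketch below maps a variable linearly so that a lower bound lands on 0.05 and an upper bound on 0.95. The linear form, the function names `normalize` and `denormalize`, and the choice to pass the bounds as arguments are assumptions; the exact form of Equation (1) and the precise meaning of the "5% minimum" and "95% maximum" values may differ.

```python
import numpy as np


def normalize(x, x_min, x_max, lo=0.05, hi=0.95):
    """Linearly rescale x so that x_min -> lo and x_max -> hi (assumed form)."""
    x = np.asarray(x, dtype=float)
    return lo + (hi - lo) * (x - x_min) / (x_max - x_min)


def denormalize(y, x_min, x_max, lo=0.05, hi=0.95):
    """Inverse of normalize: map model-scale values back to original units."""
    y = np.asarray(y, dtype=float)
    return x_min + (y - lo) * (x_max - x_min) / (hi - lo)


# Example with the total chl a range reported in the study (0.18-35.84 µg/L).
scaled = normalize([0.18, 18.01, 35.84], x_min=0.18, x_max=35.84)
```

Keeping the targets inside 0.05-0.95 rather than 0-1 is a common choice with sigmoid outputs, because the sigmoid only approaches 0 and 1 asymptotically.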

Statistical Validation
To determine the fit of the predicted values of the ANN to the actual measured values, the statistical validation suggested by the World Meteorological Organization (WMO) was used. The method involves the sum square error (SSE; Equation (2)) to represent the mean relative error, the root mean square error (RMSE; Equation (3)) to represent the mean absolute error, and R² (coefficient of determination; Equation (4)) as the criteria for accuracy.
where N is the number of samples, X_i is the measurement, X_im is the mean of the measurements, and Y_i is the output.
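The three criteria can be sketched in Python using common textbook definitions, which are assumptions here and may not match Equations (2)-(4) exactly. In particular, the squared error is averaged over N in this sketch; this is an inference from the reported values (0.0173² ≈ 0.0003, so the reported SSE is numerically close to RMSE²), not something the text states. The function name `validation_metrics` is hypothetical.

```python
import numpy as np


def validation_metrics(measured, predicted):
    """Return per-sample squared error, RMSE, and R^2 for two equal-length series."""
    x = np.asarray(measured, dtype=float)   # X_i: measurements
    y = np.asarray(predicted, dtype=float)  # Y_i: model outputs
    resid_sq = (x - y) ** 2
    sse_per_sample = resid_sq.mean()        # squared error averaged over N (assumed)
    rmse = np.sqrt(sse_per_sample)
    # R^2 = 1 - residual sum of squares / total sum of squares about the mean X_im.
    r2 = 1.0 - resid_sq.sum() / np.sum((x - x.mean()) ** 2)
    return {"sse": float(sse_per_sample), "rmse": float(rmse), "r2": float(r2)}


metrics = validation_metrics([0.1, 0.5, 0.9], [0.2, 0.5, 0.8])
```

Because the metrics are computed on normalized values between 0.05 and 0.95, their magnitudes are not directly comparable with errors expressed in µg L⁻¹.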

Environmental Impact Assessment
To examine the influence of the input environmental factors on the output size-fractionated phytoplankton biomass, a single input item was varied between 0.1 and 0.9 while all other input factors were held at 0.5, the midpoint of the normalized range. In this way, the ANN analysis could identify and characterize the environmental factors that influenced phytoplankton biomass. The difference between the maximum and minimum output values obtained over this range was used to determine the relative influence (contribution level) of each environmental factor on size-fractionated phytoplankton biomass. A larger difference indicated a higher influence of the factor on the output phytoplankton biomass.
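This one-factor-at-a-time procedure can be sketched as follows. The function `contribution_levels`, the `predict` callable (any trained model mapping an array of shape (samples, 14) to (samples, 3)), and the number of sweep steps are assumptions for illustration; the text specifies only the 0.1-0.9 range and the 0.5 baseline.

```python
import numpy as np


def contribution_levels(predict, n_inputs=14, lo=0.1, hi=0.9, steps=9, baseline=0.5):
    """Sweep each input from lo to hi with the others fixed at baseline and
    return the max-min range of every output, shaped (n_inputs, n_outputs)."""
    sweep = np.linspace(lo, hi, steps)
    ranges = []
    for i in range(n_inputs):
        X = np.full((steps, n_inputs), baseline)
        X[:, i] = sweep                      # vary only input i
        Y = np.asarray(predict(X))
        ranges.append(Y.max(axis=0) - Y.min(axis=0))
    return np.vstack(ranges)


def toy_model(X):
    """Stand-in for a trained network, used only to show the call pattern."""
    return np.column_stack([2.0 * X[:, 0], X[:, 1] - X[:, 2], np.full(len(X), 0.3)])


contrib = contribution_levels(toy_model)
```

Each row of `contrib` can then be ranked per output column to obtain an ordering of factors. Note that this approach holds all other inputs at a single baseline, so it does not capture interactions between factors.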

Validation of ANN Model
To construct a suitable ANN model for predicting the environmental changes in the YR estuary, data collected through field investigations were used for ANN learning. To implement an optimal model, the number of learning epochs was increased during training; once training reached 3000 epochs, the error had decreased to ≤10% (8.94% at 3000 epochs). It was thus determined that the ANN prediction would be valid after ≥3000 learning epochs; nonetheless, our ANN model was subjected to 20,000 learning epochs, obtaining an error of 4.16%.
For the size-fractionated phytoplankton biomass, the predicted values obtained from the ANN model and the actual measured values were compared (Figure 4). The distributions of measured and predicted values were mostly similar across all sizes and sites. The predicted values of total chl a were slightly lower than the peaks measured at Site A; otherwise, the accuracy was high across all periods and sites. The predicted net chl a was slightly lower than the peaks measured at Sites A and B. The predicted nano chl a differed slightly from the peaks measured at Site A; otherwise, accuracy was high.
The statistical validation of the ANN model indicated high accuracy: SSE 0.0001-0.0003 and RMSE 0.0119-0.0173. In addition, across all size classes of phytoplankton biomass, R² was ≥0.9952 (Table 2).

Relationship between Size-Fractionated Phytoplankton and Environmental Factors
The correlation between the phytoplankton biomass and the environmental factors in relation to phytoplankton size structure (Figure 5) showed a positive influence of freshwater discharge on phytoplankton biomass of all sizes (Figure 5a). In contrast, an increase in water temperature negatively affected phytoplankton biomass of all sizes (especially net chl a) (Figure 5b). An increase in salinity initially had a positive influence on net chl a, but higher salinity negatively influenced phytoplankton biomass of all sizes (Figure 5c). An increase in the surface and bottom ∆T positively affected total chl a and net chl a (Figure 5d). An increase in the surface and bottom ∆S had a positive impact across all sizes of phytoplankton biomass, but high surface and bottom ∆S negatively influenced net chl a (Figure 5e). An increase in transparency positively influenced total chl a and nano chl a (Figure 5f). Increasing PAR and DS negatively influenced total chl a and net chl a (Figure 5g,h). An increase in SR had a positive influence on phytoplankton biomass of all sizes (Figure 5i). The influence of nutrients was dependent on size classes (Figure 5j-l), except for PO₄³⁻, which negatively influenced all sizes of phytoplankton biomass (Figure 5m). Increasing DIN/DIP (phosphorus limitation) positively influenced total chl a and nano chl a but negatively influenced net chl a (Figure 5n).
The influence (contribution level) of each environmental factor on size-fractionated phytoplankton biomass (Figure 6) showed that for total chl a the greatest influence came from SR, followed by water temperature, surface and bottom ∆T, PO₄³⁻, transparency, PAR, DS, freshwater discharge, and surface and bottom ∆S (Figure 6a). For net chl a, the greatest influence came from water temperature, followed by surface and bottom ∆T, SR, transparency, DIN/DIP, NH₄⁺, DSi, DS, and PAR (Figure 6b). For nano chl a, the greatest influence came from DIN/DIP, followed by PO₄³⁻, DSi, SR, salinity, NH₄⁺, and freshwater discharge (Figure 6c).

Applicability of the ANN Model to Algal Blooms in an Altered Estuary
Since the 1990s, the ANN model has served as a useful tool for understanding and predicting the non-linear relationship between phytoplankton and environmental factors [27,31]. Nevertheless, there have been few simulations of algal blooms in seawater regions of an altered estuary, such as the YR estuary, which has irregular and wide variations due to the direct influence of freshwater inflow. In fact, the input data used in the present study showed substantial variations in the distribution of size-fractionated phytoplankton biomass: 0.18-35.84 (total), 0.02-21.02 (net-size), and 0.02-29.22 (nano-size) µg L⁻¹ (Table 1). The substantial variations in phytoplankton biomass were attributed to environmental changes such as the presence or absence of freshwater discharge, differences in discharge quantity, and tidal forcing, in addition to seasonal changes [15,28]. The ANN model in the present study accurately reproduced the observed variations in phytoplankton biomass and size structure in the YR estuary (Figure 4). The statistical validation of the ANN learning outcomes showed that for all predicted values, the SSE (≤0.0003) and RMSE (≤0.0173) were extremely low. Furthermore, the accuracy of the model predictions was high (R² ≥ 0.9952) when comparing the predicted values with the actual measurement data (Table 2). Based on this, the direction and level of contribution of the intricately related and simultaneous environmental factors could be determined. Thus, the ANN model employed here is likely to be a powerful tool for studying irregular and highly variable phenomena, such as algal blooms in an altered estuary.

Factors Influencing Variations in the Phytoplankton Size Structure
The environmental factors influencing phytoplankton growth in coastal seawater regions include water temperature, nutrient levels, and turbulence; however, the most fundamental factor is light [32,33]. Accordingly, in the present study, an increase in transparency, which directly influenced the light available for phytoplankton, was found to contribute to an increase in phytoplankton biomass, especially for net-sized phytoplankton (Figures 5f and 6b). However, although phytoplankton may exhibit the maximum photosynthetic rate under an optimum light intensity, photoinhibition may occur with a reduced photosynthetic rate under a light intensity above the optimum level [34]. Consistent with this, an increase in PAR was shown to adversely affect phytoplankton biomass (total and net-sized), with a particularly significant effect across high PAR levels (Figure 5g). In contrast, an increase in solar radiation, a seasonal factor, significantly contributed to increased phytoplankton biomass (especially total and net-sized), whereas an increase in DS had a negative effect. This indicates the potential influence of seasonal factors on net-sized phytoplankton variations.
Phytoplankton biomass also shows seasonal increases with increased water temperature, owing to the favorable growth conditions indicated by a low pheopigment/chl a ratio [35,36]. However, in the present study, an increase in water temperature negatively affected phytoplankton biomass, especially for net-sized phytoplankton (Figure 5b). Such a negative influence might account for the large-scale algal blooms that occurred during winter in the YR estuary [37], which is consistent with the results of a previous study on the characteristic preference of net-sized phytoplankton for relatively low water temperature compared to smaller phytoplankton [38]. For the nano-sized phytoplankton, the influence of water temperature was present but at an extremely low level, suggesting a potentially greater influence of other environmental factors.
An increase in salinity was observed to adversely affect the biomass of nano-sized phytoplankton (Figure 5c). In other words, a decrease in salinity could potentially contribute to an increase in the biomass of nano-sized phytoplankton. The main factor inducing a sudden decline in salinity in an altered estuary like the YR estuary is freshwater discharge through the sluice gate of the estuary dike. The freshwater zone of the YR estuary is mainly dominated by nano-sized phytoplankton [39], such that their biomass may increase in the seawater zone during periods of low salinity, i.e., upon freshwater discharge. However, in a natural estuary, the freshwater flowing in from the river directly influences the salt distribution in the seawater zone and nutrient concentrations [40]. Notably, the supply of nutrients to the seawater zone plays an important part in increasing primary production by phytoplankton [41,42]. Accordingly, in the present study, an increase in freshwater discharge significantly contributed to the increase in the phytoplankton biomass (especially for nano-sized phytoplankton) (Figures 5a and 6c), and the supply of nutrients such as NH₄⁺ exhibited a similar level of contribution (Figures 5k and 6c). The YR estuary is influenced by a monsoonal climate, characterized by increasing precipitation towards summer when water temperature and solar radiation are high; in addition, there is a proportional increase in the freshwater discharge through the sluice gate [43]. The immediate result of freshwater discharge is a decline in salinity in the seawater zone and an influx of a large quantity of nano-sized phytoplankton present in the discharged freshwater, resulting in an increase in the biomass of nano-sized phytoplankton [15]. Moreover, after the sluice gate is closed, a nano-sized red tide species may cause HABs while salinity recovers [15].
The anthropogenic freshwater inflow thus creates stratified water layers due to the lower salinity and higher water temperature. In the present study, stratification due to high water temperatures and low salinities increased the biomass of nano-sized phytoplankton (Figure 5d,e). However, the semidiurnal and fortnightly tidal forcing cycles in the estuary may induce variations in the direction and intensity of the tidal current, consequently affecting salinity and water temperature. Thus, in the future, short-term variation factors such as tidal forcing should also be considered as environmental factors.
In summary, the environmental factors that significantly contributed to phytoplankton variations in the YR estuary were mostly seasonal factors, including water temperature, solar radiation, and duration of sunshine, as well as factors related to the artificial inflow of freshwater, including discharge, salinity, nutrients, transparency, and the difference between surface and bottom water temperature. The contribution of the environmental factors varied according to phytoplankton size, and seasonal factors generally had a greater influence on the variation of net-sized phytoplankton. In contrast, the most influential factors for the variation of nano-sized phytoplankton, including the red tide species such as Heterocapsa sp. [15], were factors related to freshwater inflow. Taken together, the results of our study demonstrate that the management of anthropogenic freshwater discharge is essential for preventing and controlling HABs in an altered estuarine system, such as the YR estuary, and for managing overall aquatic ecosystem health.

Conclusions
In this study, we applied the ANN model to analyze the influence of environmental factors on variations in size-fractionated phytoplankton biomass. The ANN model was constructed using long-term data collected over 10 years in the YR estuary, which has been altered by the construction and operation of a dike. The results showed that anthropogenic, irregular, and transient freshwater inflow as well as seasonal environmental changes contributed to phytoplankton variations, and that the scale of the influence varied according to phytoplankton size structure. For net-sized phytoplankton, environmental factors related to seasonal changes had a high contribution level. In contrast, for nano-sized phytoplankton, environmental factors related to freshwater inflow showed a high contribution level. Notably, as the YR estuary is under the influence of a monsoonal climate characterized by high levels of precipitation and discharge in summer, the region may encounter HABs of the nano-sized red tide species during this period. Thus, the impact of freshwater inflow is predicted to be higher in this estuary compared to other temperate estuaries. Therefore, to prevent HABs in an altered estuary like the YR estuary, the management of anthropogenic freshwater discharge is essential. Furthermore, based on our results, ANN modeling could be instrumental in investigating algal blooms in environments such as estuaries with irregular and wide variations.