1. Introduction
Excess nitrogen and phosphorus in a water body cause eutrophication [1]. In eutrophic water, cyanobacteria proliferate excessively, a phenomenon known as a cyanobacterial bloom. The harm is considerable. The visible harm is that the water turns green and malodorous, which degrades its appearance and quality. The less visible harm is that cyanobacterial blooms produce toxins [2,3] that poison fish, shrimp, and other aquatic organisms, as well as humans and animals [4,5], causing heavy losses to the farming industry and disrupting normal production and daily life.
Over the past 30 years, harmful algal blooms (HABs) have occurred frequently in China’s coastal waters, causing economic losses of more than CNY 5.9 billion through massive fish and shellfish kills and negative impacts on tourism [6]. Aguilera et al. [7] reviewed the published literature on cyanobacterial blooms and cyanobacterial toxins and found a total of 241 bloom events in Argentina between 1994 and 2014. Gorham et al. [8] found a significant positive correlation between drinking water sources impacted by cyanobacterial blooms and hepatocellular carcinoma incidence rates. If the cyanobacterial bloom concentration can be predicted accurately in advance, prevention and control measures can be deployed early and alternative drinking water sources can be arranged, minimizing the harm that blooms may cause. The prediction of cyanobacterial bloom concentration has therefore attracted sustained research interest.
Accurate prediction of cyanobacterial bloom concentration faces a two-fold challenge. On the one hand, many factors affect the growth of cyanobacteria, such as water temperature, pH, conductivity, and turbidity [9]. The key to solving this problem is to determine how strongly each external factor influences cyanobacterial growth, which can be estimated from the correlations among the sequences of external factors. On the other hand, the growth of cyanobacteria is irregular and easily affected by external factors [10]. Traditional prediction methods include nutrient models based on cyanobacterial growth mechanisms [11,12] and ecodynamic models [13,14]. Nutrient models consider the interaction between changes in algal biomass and nutrients and judge water quality from the resulting changes in cyanobacterial biomass. They are not applicable to waters with a large spatial–geographic extent. Ecodynamic models, such as WASP (Water Quality Analysis Simulation Program), EFDC (Environmental Fluid Dynamics Code), and CE-QUAL-W2 (a two-dimensional hydrodynamic and water quality model) [15], consider the effects of physical, chemical, and biological processes on the water ecosystem and simulate the dynamic changes in algae. These models reflect the growth characteristics and patterns of algae, which is of great significance for understanding and preventing cyanobacterial bloom outbreaks. However, they have a large number of parameters to estimate, require measured data from the water ecosystem for parameter calibration, and depend heavily on the modeler's experience.
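The correlation analysis mentioned at the start of this paragraph can be illustrated with a minimal sketch. It assumes, as one possible reading, that the influence of each factor is scored by the Pearson correlation between its sequence and the chlorophyll-a sequence. The function names `pearson` and `rank_factors` and the toy readings are hypothetical and are not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def rank_factors(factors, target):
    """Return (name, r) pairs sorted by |r|, strongest first."""
    scores = [(name, pearson(series, target)) for name, series in factors.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy readings, not data from the paper.
chlorophyll = [2.0, 4.1, 6.2, 7.9, 10.1]
factors = {
    "water_temperature": [18.0, 20.0, 22.0, 24.0, 26.0],
    "turbidity": [5.0, 3.0, 6.0, 2.0, 4.0],
}
ranking = rank_factors(factors, chlorophyll)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

In this toy example the temperature sequence ranks first because it rises almost linearly with the chlorophyll-a values. Pearson correlation only captures linear association, so a rank-based or lagged measure may be more suitable for real monitoring data.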
Recently, artificial intelligence models have been applied to the prediction of cyanobacterial bloom concentration. Artificial neural networks (ANNs) are well suited to analyzing complex data [16,17,18] and can provide effective solutions to nonlinear problems. For example, Recknagel et al. [19] developed an artificial neural network prediction model using historical data on algal biomass and external driving variables observed in four different freshwater lake systems. Hill et al. [20] developed a detection and prediction system for harmful algal blooms based on a convolutional neural network (CNN) to monitor Mexican waters using short-term remote sensing data. Cho et al. [21] applied long short-term memory (LSTM) networks to predict the concentration of chlorophyll-a (a recognized indicator of algal activity) using daily water quality data as input, which showed a better performance in 4-day and 1-day prediction tasks. These models all demonstrate the strong ability of deep learning methods for algal bloom prediction. However, all of them require a large amount of historical data for training before they become accurate. In some water areas, the amount of monitoring data is relatively small because of an inconvenient location, a late start of monitoring, or frequent sensor failures. It is therefore difficult to train accurate prediction models for water areas with a low data volume.
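To make the idea of 1-day and 4-day prediction tasks concrete, the following minimal sketch shows one common way to frame a daily series as supervised samples. The function name `make_windows`, the window length of 3, and the toy series are assumptions made for illustration; the cited studies do not specify their preprocessing here.

```python
def make_windows(series, window, horizon):
    """Split a series into (input window, target) pairs.

    Each input holds `window` consecutive values; the target is the value
    `horizon` steps after the last input value.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    samples = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        end = start + window
        samples.append((series[start:end], series[end + horizon - 1]))
    return samples

# Hypothetical daily chlorophyll-a readings, not data from the paper.
daily = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
one_day = make_windows(daily, window=3, horizon=1)
four_day = make_windows(daily, window=3, horizon=4)
print(len(one_day), len(four_day))
```

A longer horizon leaves fewer usable samples from the same series, which is one reason why water areas with short monitoring records are hard to model.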
Transfer learning can address the problem described above. Its core idea is to apply knowledge or patterns learned in one task to different but related tasks so that these tasks can be solved more effectively and efficiently. For example, Wu et al. [22] proposed a method combining industry chain information transfer learning with a deep learning model to predict stock quotes, which improved the prediction accuracy for a target stock market index. Grubinger et al. [23] proposed an online transfer learning framework for predicting residential temperatures that significantly improved the prediction accuracy using only a few weeks of data from a newly constructed building. Hu et al. [24] applied transfer learning techniques to predict short-term wind speeds on newly built farms using models trained on data from data-rich farms. These studies demonstrate the effectiveness of transfer learning, especially when only a small amount of data is available.
Based on the idea of transfer learning, Tian et al. [25] presented a transfer-learning-based neural network model for long-term prediction of chlorophyll-a dynamics at short time intervals in an estuary reservoir in eastern China. Unlike [25], we propose a prediction method based on transfer learning to solve the problem of a small amount of data in some water areas. When the amount of data in the target domain is small, the model cannot be trained well using the target-domain data alone. However, the knowledge of cyanobacterial bloom growth is similar across different water areas. Thus, the motivation of this study is to fine-tune a pre-trained model while freezing some of its parameters, so that the cyanobacterial bloom concentration can be predicted across different water regions. In addition, to reduce the effect of differences between water areas, the prediction model for the target domain differs from that of the source domain: it combines a CNN for sequence feature extraction with the fine-tuned model.
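The freeze-and-fine-tune idea can be sketched with a deliberately small example. The code below is not the model of this paper: it replaces the deep network with a toy linear model y = w·x + b, pre-trains both parameters on a hypothetical data-rich source domain, and then fine-tunes only `b` on a hypothetical data-poor target domain while `w` stays frozen. The names `train` and `mse`, the learning rate, and all data are assumptions made for illustration.

```python
def train(params, data, trainable, lr=0.01, epochs=5000):
    """Gradient descent on mean squared error for y = w * x + b.

    Only the parameter names listed in `trainable` are updated; the
    others stay frozen at their current values.
    """
    params = dict(params)  # leave the caller's parameters untouched
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (params["w"] * x + params["b"] - y) * x for x, y in data) / n
        grad_b = sum(2 * (params["w"] * x + params["b"] - y) for x, y in data) / n
        if "w" in trainable:
            params["w"] -= lr * grad_w
        if "b" in trainable:
            params["b"] -= lr * grad_b
    return params

def mse(params, data):
    """Mean squared error of the linear model on (x, y) pairs."""
    return sum((params["w"] * x + params["b"] - y) ** 2 for x, y in data) / len(data)

# Hypothetical domains: the source follows y = 2x + 1 (many samples),
# the target follows y = 2x + 3 (only three samples).
source = [(k / 10, 2 * (k / 10) + 1) for k in range(50)]
target = [(0.5, 4.0), (1.5, 6.0), (3.0, 9.0)]

pretrained = train({"w": 0.0, "b": 0.0}, source, trainable={"w", "b"})
fine_tuned = train(pretrained, target, trainable={"b"})

print("target MSE before fine-tuning:", round(mse(pretrained, target), 4))
print("target MSE after fine-tuning: ", round(mse(fine_tuned, target), 4))
```

The frozen slope carries over what the two domains share, and the three target samples only have to correct the offset. In a deep model the same pattern would typically freeze the early layers and retrain the later ones.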
The main contributions of this paper are as follows: (1) a fused transfer learning model is proposed to predict the cyanobacterial bloom concentration across different water areas; (2) a bidirectional long short-term memory (BiLSTM) network is used to build the source domain model, which extracts long-term dependencies in the sequence to learn knowledge of cyanobacterial bloom growth; (3) a two-branch model is presented for the target domain, where one branch is a CNN for sequence feature extraction and the other branch is the fine-tuned model. In addition, various experiments are conducted on real water quality monitoring data. The experimental results show that the error of the proposed model is lower than that of a model trained on the target domain alone, which demonstrates the effectiveness and efficiency of the proposed model.
This paper is organized as follows:
Section 2 describes the research data and the proposed method;
Section 3 presents the experiments and results, together with a discussion of the generalization ability of the proposed method and its performance for different prediction times;
Section 4 presents the conclusions and possible future research directions.
4. Conclusions
In this paper, deep learning and transfer learning techniques are applied to the prediction of cyanobacterial bloom concentration time series in aquatic systems. A fused transfer learning model is proposed that transfers knowledge from waters with abundant water quality monitoring data to waters with insufficient monitoring data, achieving cross-water prediction of the cyanobacterial bloom concentration. The experiments show that transfer learning lowers the prediction error compared with a model trained on the target domain alone. The potential practical value of this work is that it can reduce the amount of monitoring data that must be collected and, for waters in inconvenient geographic locations, the number of sensors required, saving manpower and material resources. However, the proposed model has limitations: it transfers knowledge from a single source domain only, and its forecast horizon is relatively short. These limitations deserve further study.
The research data in this paper all come from monitoring stations in Taihu Lake. In future work, we will therefore investigate whether the model can cover a wider area, which would be very meaningful if feasible. In addition, we will consider building a fusion model of deep learning and transfer learning that incorporates remote sensing images to predict cyanobacterial bloom concentration sequences with better accuracy.