Chlorophyll-a (Chl-a) concentration is one of the most commonly used water quality parameters. High Chl-a levels indicate a state of eutrophication, most often due to an abundance of nutrients [2]. Eutrophication is known not only to cause the death of aquatic life but also to constitute a threat to public health [3]. The traditional measurement of Chl-a concentration is a prolonged and labor-intensive process. It involves arduous and repeated sample acquisitions from the field and their subsequent laboratory analysis. Moreover, the end results only reflect the water quality at a handful of measurement sites, often at large temporal intervals, and at even fewer sites when the lake is geographically inaccessible [4]. Therefore, developing a computer model that takes as input remote sensing images acquired over the area of interest, and that outputs pixel-level Chl-a estimates, constitutes a highly attractive and efficient alternative, capable of systematically monitoring an entire water body and rendering both field visits and laboratory analysis redundant.
The research question underlying this study is how to develop such a computer model so that it exploits both the scarce labels and the abundant unlabeled data.
1.1. Related Work
The assessment of water quality from remote sensing images in terms of Chl-a concentration constitutes a long-standing challenge, with an abundance of published approaches spanning open oceans [5], coastal waters [6], and inland waters [7]. Chl-a retrieval algorithms can be grouped into two broad categories: semianalytical and semiempirical [8]; for a comprehensive survey, the reader is referred to [9].
Semianalytical approaches rely on radiative transfer theory and model the propagation of light in water through the radiative transfer equation, which connects the inherent optical properties of lake water with its radiance levels [10]. Solving this equation via either look-up tables or inversion methods can lead to the estimation of Chl-a concentration. Although these approaches can be developed with no available field samples and are broadly applicable thanks to their physical foundation [11], they still require ample prior information about the lake’s optically active water constituents. Their main disadvantage, however, is their high sensitivity to atmospheric effects: because the models rely heavily on precise radiance readings, their operational capacity is limited [12].
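The look-up table route mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical `toy_forward_model` that stands in for a real radiative transfer simulation; the function, its band sensitivities, and the Chl-a grid are all invented for illustration and carry no physical meaning. The inversion step itself simply returns the grid concentration whose simulated spectrum is closest to the observed one.

```python
import numpy as np

def toy_forward_model(chla):
    """Placeholder mapping from Chl-a to a 4-band spectrum.

    NOT a radiative transfer model: the band sensitivities are invented.
    """
    chla = np.atleast_1d(chla).astype(float)
    bands = np.array([0.2, 0.5, 1.0, 1.5])  # hypothetical per-band sensitivities
    return np.exp(-np.outer(chla, bands) / 50.0)

def invert_with_lut(observed, chla_grid, lut_spectra):
    """Return the grid Chl-a whose simulated spectrum is nearest (L2) to `observed`."""
    distances = np.linalg.norm(lut_spectra - observed, axis=1)
    return chla_grid[np.argmin(distances)]

# Precompute the table once, then invert any number of pixels against it.
chla_grid = np.linspace(0.0, 100.0, 1001)   # hypothetical grid, step 0.1
lut = toy_forward_model(chla_grid)          # shape (1001, 4)

observed = toy_forward_model(42.0)[0]       # simulated "measurement"
estimate = invert_with_lut(observed, chla_grid, lut)
```

In practice the table would be populated by a physically based simulator, and the observed spectrum would first need atmospheric correction, which is exactly where the sensitivity noted above arises.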
Semiempirical approaches, on the other hand, are more robust against atmospheric effects and do not require prior information about a lake’s physical characteristics [9]. They rely on feature engineering; in other words, each pixel of a (usually optical) remote sensing image is described through a numerical feature vector that is subsequently provided to a statistical or machine learning algorithm with the end goal of developing a regression model for Chl-a estimation. Notable examples include Gaussian processes [13], multilayer perceptrons [14], and support vector regression [15]. Comprehensive comparative studies of such features have been provided in [16]. As features, various linear and nonlinear band combinations can be encountered in the literature [17], involving mostly blue–green and red–near-infrared bands [18], as well as various expertly derived band ratios and indices [19]. Naturally, these approaches require extensive in situ measurements of Chl-a in order to produce models with high estimation accuracy.
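As a minimal sketch of the semiempirical pipeline described above, the following builds a per-pixel feature vector from two band ratios and fits a linear regression on log-transformed Chl-a. The choice of ratios, the log transform, the least-squares regressor, and all data values are assumptions made for illustration; they do not reproduce any specific algorithm from the cited works.

```python
import numpy as np

def band_ratio_features(blue, green, red, nir):
    """Describe each pixel by an intercept term and two band ratios."""
    eps = 1e-6  # guards against division by zero over very dark pixels
    blue_green = blue / (green + eps)
    nir_red = nir / (red + eps)
    return np.column_stack([np.ones_like(blue_green), blue_green, nir_red])

def fit_linear_chla(features, chla):
    """Ordinary least squares on log-transformed Chl-a."""
    weights, *_ = np.linalg.lstsq(features, np.log(chla), rcond=None)
    return weights

def predict_chla(features, weights):
    """Map features back to Chl-a concentration."""
    return np.exp(features @ weights)

# Synthetic reflectances and labels (invented, not real measurements).
rng = np.random.default_rng(0)
blue, green, red, nir = rng.uniform(0.01, 0.2, size=(4, 50))
X = band_ratio_features(blue, green, red, nir)
true_w = np.array([1.0, -0.5, 0.8])       # hypothetical coefficients
chla = np.exp(X @ true_w)                 # noise-free synthetic "in situ" values

w = fit_linear_chla(X, chla)
chla_hat = predict_chla(X, w)
```

Swapping `fit_linear_chla` for a Gaussian process, a multilayer perceptron, or support vector regression leaves the feature step unchanged, which is the sense in which these methods depend on the engineered features.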
The advent of satellites such as Sentinel-2, with shorter revisit times and higher spatial and radiometric resolutions, has not only paved the way for new advances but has also necessitated more elaborate methods to deal with the greater level of image detail [20]. For instance, the results of an extensive comparative study across various band ratios, using Sentinel-2 multispectral images with multiple atmospheric correction processors, have been presented in [16]. Jadidi et al. [21] explored spectra-derived features through color space and coordinate transforms, from images belonging to three sources, while working on data collected from Central European lakes. The combined use of spectral–spatial features obtained through connected morphological operators has also been investigated [22]. One of the most extensive studies is due to Neil et al. [23], who explored 185 inland and coastal aquatic systems at a global level and tested the performance of 48 distinct Chl-a estimation algorithms.
However, the Chl-a estimation performance of semiempirical approaches depends heavily on the effectiveness of the underlying machine learning method. Consequently, the paradigm shift that has occurred in machine learning through the advent of deep learning (DL) [24] is of paramount importance in this context. More specifically, DL has led to groundbreaking performance in various highly challenging computer vision tasks, especially via convolutional neural networks (CNNs) [25]. This new family of algorithms, which enables the design and efficient training of deep neural networks, has rendered the process of feature engineering de facto redundant, as it is now delegated to the network itself. Depending on the quality of the provided data, deep networks often compute features that outperform handcrafted alternatives. For a survey of DL applications in remote sensing, the reader is referred to [26].
It is thus not surprising that DL methods have already been applied to Chl-a estimation. In particular, Peterson et al. [27] developed an artificial neural network (ANN) with six hidden layers, operating on spectral signatures of Landsat-8 and Sentinel-2 images at 30 m resolution, and applied it to the water quality estimation of lakes in the United States. Theirs is one of the first studies to assess the suitability of DL in this context, and their results show increased accuracy and robustness with respect to alternative methods. In addition, Pu et al. [28] reported the first application of CNNs to this problem, where small patches of Landsat-8 images, centered on the pixel under study, were used as network input, with the end goal of classifying the water quality of Chinese lakes. Since, however, Chl-a estimation is inherently a regression problem, networks that directly estimate the Chl-a level of their input have appeared as well [29,30]. Moreover, in an effort to exploit temporal correlations across multitemporal data, long short-term memory networks have also been studied, both individually [31] and together with CNNs [32]. Lastly, a highly comprehensive study on lakes across four continents, using a relatively large dataset of 2943 samples, was presented in [8]. Its authors proposed a single cross-mission solution for both Sentinel-2 and Sentinel-3 sensors, based upon a five-layer mixture density network, and obtained remarkable results.
1.2. Aim and Contribution
As with many remote sensing applications (e.g., semantic segmentation), overfitting due to a shortage of labeled data is a critical problem in this context as well. It is in fact even more acute here, since each Chl-a sample of the ground truth requires a rigorous in situ sample collection and laboratory analysis workflow (as opposed to an expert labeling pixels on a screen); as a result, labeled datasets are often relatively small. This problem is further exacerbated by the notorious appetite of contemporary deep networks for labeled data. Consequently, it is imperative to exploit, to the maximum extent possible, what little data is available.
To this end, this article’s main contribution is a new network structure that learns to estimate Chl-a concentration levels while exploiting both labeled image locations and unlabeled portions of the same visual content. We propose to achieve this via a multitask double-branch CNN, whose main task is Chl-a level regression via training with labeled input, and whose auxiliary task is the classification of unlabeled samples according to their month of acquisition. The proposed approach was tested with multispectral Sentinel-2 images of Lake Balik in northern Turkey, using samples collected over 3 years. The ablation studies conducted show that the inclusion of unlabeled data improves the correlation coefficient of the resulting model by multiple percentage points.
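To make the multitask idea concrete, the sketch below shows one plausible way a regression loss on labeled samples and a month-classification loss on unlabeled samples could be combined into a single training objective. The mean squared error, the softmax cross-entropy, the additive combination, and the `aux_weight` parameter are assumptions for illustration only; the actual architecture and loss are specified in Section 2.2, not here.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def multitask_loss(chla_pred, chla_true, month_logits, month_true, aux_weight=0.5):
    """Main-task regression loss plus a weighted auxiliary classification loss.

    `aux_weight` is a hypothetical hyperparameter balancing the two tasks.
    """
    regression = np.mean((chla_pred - chla_true) ** 2)              # labeled samples
    classification = softmax_cross_entropy(month_logits, month_true)  # unlabeled samples
    return regression + aux_weight * classification

# Toy batch (invented values): 3 labeled samples, 4 unlabeled samples, 12 months.
chla_true = np.array([2.0, 5.5, 9.1])
chla_pred = chla_true.copy()          # perfect regression, so the MSE term is 0
month_logits = np.zeros((4, 12))      # uniform prediction over the 12 months
month_true = np.array([0, 3, 6, 11])

loss = multitask_loss(chla_pred, chla_true, month_logits, month_true)
```

Because the month of acquisition is known for every image pixel, the auxiliary term can be computed without any in situ measurement, which is what allows unlabeled data to contribute to training.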
In the remainder of this article, following an overview of the collected dataset and study area (Section 2.1), we elaborate on the proposed method of Chl-a concentration estimation (Section 2.2), present the results of our regression experiments (Section 3), and discuss our findings, while Section 4 is devoted to concluding remarks.