Article

Deep Meta-Connectivity Representation for Optically-Active Water Quality Parameters Estimation Through Remote Sensing

Electronic Information School, Wuhan University, Luoyu Road 129, Wuhan 430072, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2025, 17(16), 2782; https://doi.org/10.3390/rs17162782
Submission received: 24 June 2025 / Revised: 26 July 2025 / Accepted: 9 August 2025 / Published: 11 August 2025
(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)

Abstract

Monitoring optically-active water quality (OAWQ) parameters faces key challenges, primarily due to limited in situ measurements and the restricted availability of high-resolution multispectral remote sensing imagery. While deep learning has shown promise for OAWQ estimation, existing approaches such as GeoTile2Vec, which relies on geographic proximity, and SimCLR, a domain-agnostic contrastive learning method, fail to capture land cover-driven water quality patterns, limiting their generalizability. To address this, we present deep meta-connectivity representation (DMCR), which integrates multispectral remote sensing imagery with limited in situ measurements to estimate OAWQ parameters. Our approach constructs meta-feature vectors from land cover images to represent the water quality characteristics of each multispectral remote sensing image tile. We introduce the meta-connectivity concept to quantify the OAWQ similarity between different tiles. Building on this concept, we design a contrastive self-supervised learning framework that uses sets of quadruple tiles extracted from Sentinel-2 imagery based on their meta-connectivity to learn DMCR vectors. After the core neural network is trained, we apply a random forest model to estimate parameters such as chlorophyll-a (Chl-a) and turbidity using matched in situ measurements and DMCR vectors across time and space. We evaluate DMCR on Lake Erie and Lake Ontario, generating a series of Chl-a and turbidity distribution maps. Performance is assessed using the $R^2$ and RMSE metrics. Results show that meta-connectivity captures water quality similarities between tiles more effectively than widely used geographic proximity approaches such as those used in GeoTile2Vec. Furthermore, DMCR outperforms baseline models such as SimCLR with randomly cropped tiles. The resulting distribution maps align well with known factors influencing Chl-a and turbidity levels, supporting the method’s reliability.
Overall, DMCR demonstrates strong potential for large-scale OAWQ estimation and contributes to improved monitoring of inland water bodies with limited in situ measurements through meta-connectivity-informed deep learning. The spatiotemporal water quality maps can support large-scale inland water monitoring and early warning of harmful algal blooms.

1. Introduction

Remote sensing of optically-active water quality (OAWQ) using multispectral imagery is an effective strategy for monitoring inland water bodies, providing critical information for managing water resources, assessing pollution, and understanding ecological changes. Since 1978, more than ten optical remote sensing sensors have been launched, enabling a collection of extensive multispectral data from instruments such as MODIS, Landsat 8-OLI, Sentinel-2, and the Chinese HY-1 [1]. The emergence of these remote sensing technologies has significantly advanced OAWQ-related research [2].
Machine learning (ML) methods have been widely adopted for estimating OAWQ parameters from remote sensing data [3,4,5]. These methods are well suited to large-scale OAWQ monitoring. Unlike traditional analytical [6,7] and semi-analytical [8,9] approaches, ML methods do not require real-time atmospheric data or complex measurements of absorption and backscattering, which are often difficult to obtain. Both traditional ML [10,11,12] and deep learning (DL) [13,14,15,16] follow a similar process for OAWQ estimation: satellite images are matched to in situ measurements, an ML model is trained on the matched pairs, and OAWQ parameters are then estimated with this model. However, the accuracy of these models depends heavily on the availability of in situ OAWQ data samples that are spatially and temporally aligned with the remote sensing data. Although DL models generally outperform traditional ML models, they require more extensive and widely distributed in situ data [17,18], which are often scarce due to the high cost and labor-intensive nature of sample collection and analysis.
To address the challenge of limited in situ data, transfer learning (TL) approaches have been developed. Most TL methods, which fall under inductive TL [19,20], involve pretraining a model on one water body and fine-tuning it on another [21,22,23,24]. Fine-tuning, which adjusts some or all layers of the model, has shown improved performance compared to models trained on smaller datasets [19]. However, the success of TL depends on the similarity between the source and target water bodies. When the two are similar, fine-tuning is more likely to succeed [22], but the approach may fail when the water bodies differ significantly [1]. Contrastive self-supervised learning offers a promising solution to the data scarcity problem. This method involves pretraining a general model on a large set of unlabeled images and then fine-tuning it on a specific task with limited labeled data [25]. Inspired by this approach, our research leverages contrastive self-supervised learning to improve OAWQ estimation, particularly in scenarios where in situ data are limited.
DL models [19] for remote sensing of OAWQ parameters have been built on a widely accepted assumption in remote sensing, namely that geographically proximal water bodies tend to exhibit similar water quality characteristics, while those that are more geographically distant are more likely to possess divergent ones [19,26]. A single DL model typically captures the relationship between multispectral images and OAWQ parameters across an entire water body. However, this assumption does not always hold. Two sites that are far apart can have the same water quality level [27]; for instance, the northern region (25.95°N, 100.16°E) of Erhai Lake has similar water quality to the southern region (25.60°N, 100.24°E) [27] according to the Environmental Quality Standard for Surface Water (GB3838-2002) [28]. Despite their geographical separation, both regions exhibit poor water quality. The northern area of Erhai Lake consists of farmland and villages, while the southern part includes Dali City. The same phenomenon also appears in [1], where distant sites shared similar OAWQ levels. These findings suggest that OAWQ similarity between water sites is influenced by factors beyond mere geographical proximity.
Water quality is influenced by two main categories of factors: anthropogenic (human-related) and natural [29,30]. Anthropogenic factors encompass activities such as residential development, urbanization, industrial operations, and agricultural practices, including the use of fertilizers, manures, pesticides, irrigation, and aquaculture. Natural factors, on the other hand, include climate change, natural disasters, geological processes, and interactions between the surface water and its environment [13,31]. Natural environmental factors often manifest with considerable dynamism; in parallel, anthropogenic influences are typically distinguished by their sustained pressures or cumulative impacts, often resulting in persistent long-term modifications to environmental systems. Among these, human activities are the primary drivers of changes in water quality. Land cover data provide insights into anthropogenic activities and can serve as an indicator of the surrounding water quality. For instance, the condition of a water body, i.e., whether it is good or poor, can often be inferred from the land cover in its vicinity [30,32]. Although land use data can be used to estimate the water quality of nearby water bodies, this approach may not always be precise. Nonetheless, it offers a useful, albeit approximate method for assessing water quality based on surrounding land characteristics.
This paper presents a new approach for assessing spatial patterns in OAWQ by introducing the concept of meta-connectivity, which captures the similarity of OAWQ characteristics across different regions. To quantify this, we construct meta-feature vectors, with the distances between these vectors reflecting the degree of similarity in water quality profiles. We also propose a contrastive self-supervised learning framework to develop a deep meta-connectivity representation (DMCR) from multispectral remote sensing image tiles. DMCR is designed to account for meta-connectivity beyond regional similarities in water quality. Finally, we demonstrate the effectiveness of DMCR by applying it to estimate key OAWQ parameters, including chlorophyll-a (Chl-a) and turbidity, which are essential for monitoring aquatic health. The main contributions of this work are summarized below:
  • We define meta-connectivity as a measure of similarity in water quality between multispectral image tiles based on the distance between their corresponding meta-feature vectors. Each meta-feature vector is constructed by using land cover information aligned with that multispectral tile, capturing key environmental characteristics that influence water quality across regions.
  • We develop a contrastive self-supervised learning framework that generates deep meta-connectivity representations (DMCR) for multispectral tiles. The model is trained on groups of four tiles selected based on their meta-connectivity, encouraging it to learn meaningful spatial patterns. A random forest (RF) model then uses the DMCR vectors along with matched in situ measurements to estimate key water quality indicators such as chlorophyll-a and turbidity.
  • We conduct comprehensive experiments to evaluate the performance of our approach. These include both quantitative assessments of estimation accuracy and qualitative analyses through visualization of chlorophyll-a and turbidity distributions across Lake Erie and Lake Ontario. Results confirm that DMCR offers a reliable and effective solution for monitoring optically-active water quality parameters.

2. Materials and Methods

2.1. Study Area

We chose Lake Erie and Lake Ontario as a case study to test our method.
Lake Erie (42.2°N, 81.2°W) is the fourth-largest lake by surface area of the five Great Lakes in North America. It is situated on the international boundary between Canada and the United States. Its northern shore lies in the Canadian province of Ontario, specifically the Ontario Peninsula, with the U.S. states of Michigan, Ohio, Pennsylvania, and New York on its western, southern, and eastern shores, respectively. Figure 1 shows the land cover along Lake Erie. Industrial outfalls, municipal sanitary and storm sewer outfalls, and diffuse sources such as overland runoff from farm and forest land all add to eutrophication and cyanobacterial blooms in Lake Erie [33]. Large-area water quality monitoring is essential for preventing eutrophication and blue–green algal blooms as well as for wastewater restoration.
Lake Ontario (43.7°N, 77.9°W) is one of the five Great Lakes of North America. The Canada–United States border spans the center of the lake. It borders Ontario, Canada on the north, west, and southwest sides and New York State, USA on the south and east. The Canadian cities of Hamilton, Kingston, Mississauga, and Toronto are located on the lake’s northern shorelines, while the Canadian city of St. Catharines and the American city of Rochester are located on the south shore. Figure 2 shows the land cover around Lake Ontario. Lake Ontario has been ranked as the most environmentally stressed among the five Great Lakes [34], as it is the furthest downstream of the Great Lakes and the pollution from all the other lakes flows into it. Other stresses on the lake include fertilizer runoff from agriculture, toxic chemicals from industries along the rivers, and metropolitan drainage from large cities. Therefore, large-area water quality monitoring can support cleanup projects for Lake Ontario, such as the final stage of cleaning up the contaminated Randle Reef currently getting underway in Hamilton Harbour.

2.2. Data

2.2.1. In Situ Data

In the past decade, Lake Erie and Lake Ontario have experienced extensive blooms of toxic blue–green algae, also known as harmful algal blooms [35]. Although the causes of algae blooms are complex, many industrial and municipal sources and farming practices contribute much to such blooms. Chl-a is a main indicator of the amount of algae in the water [36]. A high level of Chl-a suggests high phytoplankton biomass, indicating eutrophication. Turbidity is the measure of the relative clarity of a liquid. Materials that cause water to be turbid can include algae, dissolved colored organic compounds, plankton, and other microscopic organisms. In this study, we have selected Chl-a and turbidity as observation objects.
In situ Chl-a and turbidity measurements are the basis for remote sensing model establishment, testing, and verification. We downloaded in situ measurement data from the open Great Lakes Water Quality Monitoring and Surveillance Data website (https://data-donnees.az.ec.gc.ca/data/substances/monitor/great-lakes-water-quality-monitoring-and-aquatic-ecosystem-health-data/great-lakes-water-quality-monitoring-and-surveillance-data/?lang=en (accessed on 30 November 2024)). We downloaded Chl-a and turbidity records on Lake Erie from 2000 to 2023. The records were obtained from 73 stations distributed over Lake Erie. We also downloaded Chl-a and turbidity records on Lake Ontario from 2001 to 2023. These records were observed by 96 stations distributed over Lake Ontario. The locations of observation sites on Lake Erie and Lake Ontario are shown in Figure 3. Water quality parameters were observed monthly at each site on Lakes Erie and Ontario. The accumulated data reflect the water quality of the two studied lakes, and are used to support our research.

2.2.2. Satellite Data

We chose Sentinel-2 images to study the remote sensing of OAWQ due to their high spatial and temporal resolution. Sentinel-2 is an Earth observation mission with two identical satellites, Sentinel-2A and Sentinel-2B. Each Sentinel-2 satellite carries a single instrument, the Multi-Spectral Instrument (MSI), which has 13 spectral channels in the visible and near-infrared (VNIR) and short-wave infrared (SWIR) spectral range. The two satellites are phased 180 degrees from each other in the same orbit, giving a combined revisit cycle of 5 days. MSI acquires optical imagery at spatial resolutions of 10 m, 20 m, and 60 m. Sentinel-2 provides coverage over all continental land surfaces, including inland waters, between latitudes 56° south and 82.8° north.
We utilized Sentinel-2 level-2A (L2A) products for water quality estimation [37,38]. The level-2A product provides atmospherically corrected surface reflectance derived from level-1C products. The level-2A bands, which have native resolutions of 10 m, 20 m, or 60 m, are resampled to a common spatial resolution of 20 m. We downloaded the L2A images from the recommended website (https://browser.dataspace.copernicus.eu (accessed on 30 November 2024)). Images were selected for water quality estimation according to three criteria. Table 1 illustrates the matching, using chlorophyll-a (Chl-a) for Lake Ontario as an example.
Three Sentinel-2 images are required to fully cover Lake Ontario. The Sentinel-2 range column in the table specifies the geographic extents of these three regions. A successful match was determined based on the following criteria:
  • Temporal proximity—The in situ measurement collection date must be within ±5 days of the Sentinel-2 acquisition date.
  • Spatial coverage—The monitoring station’s latitude and longitude must fall within the corresponding Sentinel-2 image’s geographic range.
  • Cloud-free conditions at monitoring sites—Only images without cloud coverage at the field measurement locations are considered valid.
Following these image selection criteria, we compiled the datasets of matched L2A images and in situ measurements described below. We assume that water quality is uniform within an 11 × 11 pixel tile. For each in situ sample, the tile whose center pixel is closest to the sample location was selected for training and testing. For Chl-a estimation, 57 L2A images of Lake Ontario spanning the period from April 2016 to April 2023 qualified, with 82 matched in situ Chl-a records. For Lake Erie, 62 L2A images spanning the period from May 2017 to May 2023 qualified, with 47 matched in situ Chl-a records. For turbidity estimation, 56 L2A images of Lake Ontario spanning the period from August 2018 to April 2023 qualified, with 113 matched in situ turbidity records. For Lake Erie, 75 L2A images spanning the period from May 2019 to May 2023 qualified, with 191 matched in situ turbidity records. These matched L2A images and in situ records form the datasets used in our experiments.
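The three matching criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the record fields, the `is_match` helper, the bounding-box description of a scene footprint, and the station name are all hypothetical, and the cloud flag at the station is assumed to have been computed beforehand.

```python
from dataclasses import dataclass
from datetime import date

MAX_DAY_GAP = 5  # the +/-5 day window from the temporal proximity criterion


@dataclass
class InSituRecord:          # hypothetical record layout
    station: str
    lat: float
    lon: float
    sampled: date


@dataclass
class SceneInfo:             # hypothetical scene footprint as a bounding box
    acquired: date
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float


def is_match(rec, scene, cloudy_at_station):
    """Return True when a record/scene pair satisfies all three criteria."""
    close_in_time = abs((rec.sampled - scene.acquired).days) <= MAX_DAY_GAP
    inside = (scene.lat_min <= rec.lat <= scene.lat_max
              and scene.lon_min <= rec.lon <= scene.lon_max)
    return close_in_time and inside and not cloudy_at_station


scene = SceneInfo(date(2021, 7, 10), 43.0, 44.0, -80.0, -78.5)
rec = InSituRecord("hypothetical-01", 43.5, -79.2, date(2021, 7, 13))
print(is_match(rec, scene, cloudy_at_station=False))
```

A record sampled six days after the acquisition, or one whose station pixel is cloud-covered, would be rejected by the same function.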

2.3. Meta-Connectivity of Water Quality Between Tiles

Land use patterns significantly influence surface water quality parameters by governing pollutant sources and transport pathways [39,40]. Our analysis reveals that water quality similarity between two locations depends not only on their geographic proximity but also, to a significant degree, on adjacent land use patterns. To quantify this relationship, we introduce the concept of water quality meta-connectivity, which measures similarity through meta-feature vectors derived from land cover imagery. These vectors provide a systematic way to assess water quality relationships between different areas.

2.3.1. Generation of Meta-Feature Vectors for Connectivity Analysis

The land cover tile is utilized to construct a meta-feature vector representing the central pixel’s water quality in the corresponding multispectral image tile. Land cover images were downloaded from the website of the National Catalogue Service for Geographic Information (https://www.webmap.cn/commres.do?method=globeIndex (accessed on 9 November 2024)). The spatial resolution of land cover images is usually different from that of corresponding satellite multispectral images. Therefore, meta-feature vector generation involves two steps: spatial alignment between the land-cover image and the satellite multispectral image, and meta-vector generation from the two corresponding tiles.
The GlobeLand30 land cover dataset, with native 30 m resolution, was resampled to 20 m using nearest neighbor interpolation. This was specifically done to match the 20 m resolution of the Sentinel-2 L2A products used for water quality estimation, which are provided at this uniform resolution despite their original bands having 10 m, 20 m, or 60 m resolution.
We assume that the source two-dimensional (2D) GlobeLand30 image has size $W \times H$ with pixel coordinates $(x, y)$, and that the target 2D image has size $W' \times H'$ with pixel coordinates $(x', y')$. Through Formula (1), the land cover label of target pixel $(x', y')$ is the same as that of source pixel $(x, y)$. Thus, the resampled GlobeLand30 images have the same spatial resolution and size as the Sentinel-2 images. The label of each pixel in the resampled GlobeLand30 image represents the land cover of the aligned pixel in the Sentinel-2 image.
$$x = \mathrm{floor}\left(x' W / W' + 0.5\right), \qquad y = \mathrm{floor}\left(y' H / H' + 0.5\right) \tag{1}$$
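The nearest neighbor mapping of Formula (1) can be illustrated with a small pure-Python sketch. The function name `resample_nearest`, the toy label values, and the clamping of indices at the image border are assumptions made for this example; the source only states the floor-based mapping.

```python
import math


def resample_nearest(src, dst_w, dst_h):
    """Resample a 2D label grid (list of rows) to dst_w x dst_h pixels.

    For each target pixel, the source index is floor(target * src / dst + 0.5),
    clamped to the last valid index (the clamp is an assumption of this sketch).
    """
    src_h, src_w = len(src), len(src[0])
    out = []
    for y_t in range(dst_h):
        y_s = min(src_h - 1, math.floor(y_t * src_h / dst_h + 0.5))
        row = []
        for x_t in range(dst_w):
            x_s = min(src_w - 1, math.floor(x_t * src_w / dst_w + 0.5))
            row.append(src[y_s][x_s])
        out.append(row)
    return out


# A 2 x 2 grid at 30 m covers the same ground as a 3 x 3 grid at 20 m.
labels_30m = [[1, 2],
              [3, 4]]
labels_20m = resample_nearest(labels_30m, 3, 3)
for row in labels_20m:
    print(row)
```

Because labels are copied rather than averaged, the resampled grid contains only label values that already exist in the source, which is why nearest neighbor interpolation suits categorical land cover data.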
Selecting one tile from the multispectral image (named the MS tile), another spatially aligned tile is extracted from the resampled Globeland30 image (named the RSGL tile). We use the information in the RSGL tile to construct the meta-feature vector of the MS tile. Figure 4 provides a visual elucidation of the meta-feature vector generation process.
Figure 4a,c depicts RSGL tiles, wherein each pixel value $C_{i,j}$ corresponds to a specific land cover label. Figure 4b,d illustrates the corresponding distance weight maps. In these maps, each pixel (excluding the central one) is assigned a weight $W_{i,j}$ quantifying the influence of the surrounding pixel on the central pixel within a multispectral (MS) tile. Specifically, Figure 4a,b illustrates the procedural steps involved in generating the meta-feature vector, whereas Figure 4c,d shows the $101 \times 101$ tiles extracted from the RSGL and MS images, respectively, for computing the meta-feature vector associated with the central pixel.
The influence of a surrounding pixel at coordinates $(x_i, y_j)$ on the central pixel (denoted by c, with coordinates $(x_0, y_0)$) is determined by its Euclidean distance $D_{i,j}$, calculated as per Formula (2). This distance is then used to derive the corresponding weight $W_{i,j}$. These weights, forming a Gaussian Euclidean distance weight matrix, are computed using Formula (3). This formula involves applying a Gaussian function to the image, primarily to mitigate image noise and smooth fine details, thereby preparing the data for subsequent analytical stages. The standard deviation $\sigma$ of the Gaussian function is set to 20 in this study.
As exemplified in Figure 4b, this weighting scheme ensures that pixels equidistant from the center are assigned the same weight. For instance, the four corner pixels, sharing an identical Euclidean distance to the center, all receive a weight of $W_1$. Similarly, the four pixels situated along the principal axes, which share another common Euclidean distance to the center, are assigned a weight of $W_2$. This systematic assignment of weights $W_k$ is extended to subsequent concentric sets of pixels, each characterized by a unique Euclidean distance from the central pixel.
$$D_{i,j} = \sqrt{(x_i - x_0)^2 + (y_j - y_0)^2} \tag{2}$$
$$W_{i,j} = \exp\left(-\frac{D_{i,j}}{2\sigma^2}\right) \tag{3}$$
Combining the land cover type matrix (Figure 4a) with the corresponding distance weight matrix (Figure 4b), the weights that belong to the same land cover label are summed to form one element of the meta-feature vector. The GlobeLand30 dataset includes ten land cover types, which we use to construct the meta-feature vectors (see Appendix A for the detailed land cover types and corresponding labels). Let $C_k$ denote the label number assigned to a distinct land cover type, where $k$ is an integer index from 1 to 10, corresponding to the ten land cover types. Consequently, $C_{i,j}$ represents the $C_k$ label of the pixel located at coordinates $(x_i, y_j)$. The $k$th element $v_k$ of the ten-dimensional meta-feature vector $V = [v_1, \ldots, v_{10}]$ is computed according to Formulas (4) and (5), where the dimensionality (10) corresponds to the number of land cover types. In this way, both the impact of spatial distance and the impact of land cover on water quality are included in the meta-feature vector.
$$v_k = \sum_{i,j} W_{i,j} \cdot \delta(C_{i,j}, C_k) \tag{4}$$
$$\delta(C_{i,j}, C_k) = \begin{cases} 1, & C_{i,j} = C_k \\ 0, & \text{otherwise} \end{cases} \tag{5}$$
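A compact way to see how Formulas (2) to (5) fit together is the sketch below. It is an illustration under assumptions, not the authors' code: the function name, the toy 3 × 3 tile, and the two-label set are hypothetical, and the weight is taken as $\exp(-D_{i,j}/(2\sigma^2))$ with $D_{i,j}$ the Euclidean distance to the central pixel, which is this sketch's reading of Formula (3).

```python
import math

SIGMA = 20.0  # standard deviation stated in the text


def meta_feature_vector(lc_tile, labels, sigma=SIGMA):
    """Sum distance weights per land cover label for a square, odd-sized tile.

    lc_tile : list of rows of land cover labels (the RSGL tile)
    labels  : ordered list of the label values C_1..C_K
    """
    size = len(lc_tile)
    c = size // 2                          # index of the central pixel
    index = {lab: k for k, lab in enumerate(labels)}
    v = [0.0] * len(labels)
    for i in range(size):
        for j in range(size):
            if i == c and j == c:
                continue                   # the central pixel gets no weight
            d = math.hypot(i - c, j - c)   # Euclidean distance, Formula (2)
            w = math.exp(-d / (2.0 * sigma ** 2))  # assumed reading of Formula (3)
            v[index[lc_tile[i][j]]] += w   # Formulas (4) and (5)
    return v


# Toy tile: label 1 on the four corners, label 2 on the four axis pixels.
toy_tile = [[1, 2, 1],
            [2, 1, 2],
            [1, 2, 1]]
v_toy = meta_feature_vector(toy_tile, labels=[1, 2])
print(v_toy)
```

In the toy tile the first element collects the four corner weights and the second collects the four axis weights, mirroring the $W_1$ and $W_2$ groups described for Figure 4b. With the real data, `labels` would hold the ten GlobeLand30 label values and the tile would be 101 × 101.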
Water quality meta-connectivity definition: If the Euclidean distance between the meta-feature vectors of two MS tiles is less than a threshold, i.e., $d(V_1, V_2) = \|V_1 - V_2\|_2 < \mathrm{TH}$, then the two MS tiles have water quality meta-connectivity. We set the threshold TH to 1. The selection of this threshold is a critical step that governs the tradeoff between the strictness of the meta-connectivity definition and the quality of the learned representations. A lower threshold (e.g., TH < 1) would enforce a highly stringent similarity criterion, ensuring that positive pairs (anchor and neighbor tiles) are extremely similar in terms of their land cover context; however, this would also significantly reduce the number of available positive pairs, potentially leading to slower convergence during training or even model instability due to data sparsity. Conversely, a higher threshold (e.g., TH > 1) would relax this criterion, providing more positive pairs for training, but at the risk of introducing false positives, i.e., treating tiles with substantively different land cover influences as similar, which would degrade the discriminative power of the learned DMCR vectors. Based on empirical evaluation during our preliminary experiments, a value of TH = 1 was found to provide a good balance, ensuring that the defined meta-connectivity is meaningful while also providing sufficient positive samples for robust and efficient model training. This choice is supported by the fast convergence and superior performance shown in Section 3.
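The definition reduces to a single distance test, sketched below. The helper name and the three-element example vectors are hypothetical and chosen only to show one connected and one unconnected pair; real meta-feature vectors have ten elements.

```python
import math

TH = 1.0  # threshold value used in the text


def has_meta_connectivity(v1, v2, th=TH):
    """True when the Euclidean distance between two meta-feature vectors is below th."""
    return math.dist(v1, v2) < th


v_a = [3.0, 0.5, 0.0]   # hypothetical anchor vector
v_b = [3.2, 0.4, 0.1]   # similar land cover context
v_c = [0.5, 3.0, 0.0]   # different land cover context

print(has_meta_connectivity(v_a, v_b))
print(has_meta_connectivity(v_a, v_c))
```

Lowering `th` makes fewer pairs qualify as connected, which is the positive-pair scarcity tradeoff discussed above.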

2.3.2. Effect of Water Quality Meta-Connectivity

We selected four pairs of sites to illustrate the effect of water quality meta-connectivity on the description of water quality similarity. As shown in Figure 5a, each pair was labeled with the same number. The two sites labeled 0 are geographically close, while the other three pairs are geographically far away. Figure 5b shows the four pairs’ meta-feature vectors using the t-SNE tool. The sites for pair 0 are geographically nearby, and are also close in terms of meta-feature vector distance. Thus, they share not only geographic connectivity but also water quality meta-connectivity. On the other hand, the sites for pair 3 are both geographically distant and far away in terms of meta-feature vector distance. Meanwhile, pairs 1 and 2 are geographically distant but close in terms of meta-feature vector distance. Thus, the first and second pairs are considered water quality meta-connectivity pairs.
We extracted tiles of size $101 \times 101$ centered at the sites labeled 1, 2, and 3 from the GlobeLand30 images. The histograms of the three pairs are shown in Figure 6. Each histogram shows the pixel counts of the ten land cover types in one tile. The two histograms of pair 1 are similar. They show that water accounts for a large proportion of pixels, while the pixel number of artificial water surface is very small. Thus, the meta-feature vector distance between the two sites in pair 1 is small. The meta-feature vector distance between the sites in pair 2 is also small, as the histograms of the two tiles are also very similar. The meta-feature vector distance between the pair 3 sites is larger than that of pairs 1 and 2, as the difference between the two histograms of pair 3 is larger than that of pairs 1 and 2. Figure 5 and Figure 6 illustrate that the meta-feature vector and water quality meta-connectivity can effectively express the water quality similarity between two tiles.

2.4. Deep Meta-Connectivity Representation for Optically-Active Water Quality Parameters Estimation

We propose a deep meta-connectivity representation (DMCR) of multispectral tiles to deal with the challenge of insufficient in situ labels and multispectral images for OAWQ parameter estimation over a large water body. The generation of DMCR vectors and the estimation of OAWQ parameters are shown in Figure 7. The workflow involves three parts: meta-feature vector generation, DMCR vector training, and OAWQ parameter estimation. The meta-feature vectors characterize the OAWQ level of multispectral tiles, while the Euclidean distance between the meta-feature vectors of two tiles represents their OAWQ meta-connectivity. A DMCR vector is designed to efficiently represent OAWQ-related information in a compressed vector suitable for self-supervised learning algorithms. This representation is specifically tailored for multispectral tiles, where each vector encodes crucial data from a multispectral tile into a condensed and actionable vector. The DMCR vectors facilitate the effective estimation of OAWQ from limited multispectral images and in situ OAWQ measurements.

2.4.1. Learning Deep Meta-Connectivity Representation Vectors for Tiles

Each multispectral tile is composed of multiple band image patches acquired simultaneously and in the same geographic location. Establishing a computation model between the in situ measurement and the high-dimensional tiles would be intractable. It is also impossible to estimate the water quality parameter from the finite data using statistics [19,41]. The key underlying idea is to find a low-dimensional representation of the tiles that is more suitable for water quality parameter estimation.
We assume that multispectral tiles having meta-connectivity also have similar water quality levels and similar low-dimensional representations. Conversely, multispectral tiles that are far apart in terms of the Euclidean distance of their meta-feature vectors are likely to have dissimilar water quality levels and dissimilar low-dimensional representations. We learn compressed DMCR vectors from unlabeled multispectral tiles based on the corresponding land cover tiles. The process used for DMCR vector generation is shown in Figure 7. The process involves two steps: quadruple-tile sampling, and DMCR vector generation network training.

2.4.2. Quadruple-Tiles Sampling

To train our DMCRV-Net, we designed a quadruple-tile sampling strategy that generates structured training samples, each of which comprises an anchor tile, a neighbor tile, a distant tile, and a temporal neighbor tile, denoted as $\{t_a, t_n, t_d, t_t\}$, respectively. This structure is crucial for teaching the model meaningful contrastive relationships based on both meta-connectivity and temporal proximity.
The sampling process is as follows. First, we align a land cover (LC) image with two temporally proximate multispectral (MS) images captured at times $T_{\mathrm{MS1}}$ and $T_{\mathrm{MS2}}$, ensuring that the time difference is minimal (e.g., within the same season) to maintain land cover consistency. A large pool of $N$ candidate points is then uniformly sampled from the water body regions of the MS image at $T_{\mathrm{MS1}}$. This pool is deliberately oversampled (e.g., $N$ about $3M$, where $M$ is the target number of quadruplets) to ensure a diverse selection.
For each quadruplet, the selection proceeds through these steps:
  • Anchor Tile ($t_a$): A point $P_1$ is randomly selected from the candidate pool. The MS tile centered at $P_1$ from the $T_{\mathrm{MS1}}$ image is designated as the anchor tile. Its corresponding meta-feature vector $V_1$ is computed using the spatially aligned LC tile.
  • Neighbor ($t_n$) or Distant ($t_d$) Tile: Another point $P_2$ is randomly selected from the remaining candidates. Its MS tile and meta-feature vector $V_2$ are generated. The Euclidean distance $\|V_1 - V_2\|_2$ is then compared to our meta-connectivity threshold TH. If the distance is less than TH, the tile is labeled as a neighbor tile $t_n$, forming a positive pair with the anchor based on similar land cover context; otherwise, it is labeled as a distant tile $t_d$, forming a negative pair. This process is repeated until both a neighbor tile ($t_n$) and a distant tile ($t_d$) have been selected.
  • Temporal Neighbor Tile ($t_t$): This tile is extracted from the second MS image (at $T_{\mathrm{MS2}}$) at the exact same geographic coordinates as the anchor point $P_1$. It represents a positive sample in the temporal dimension, as it shares the same location and consequently the same meta-feature vector as the anchor, but captures potential subtle changes in water appearance over a short time.
By repeating this process, we create a large dataset of quadruplets. Each quadruplet effectively encodes the spatial, temporal, and feature-based relationships required for robust contrastive learning. The detailed pseudocode for this procedure is presented in Algorithm A1 in Appendix B.
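The selection logic for one quadruplet can be sketched as follows. This is not Algorithm A1 from Appendix B, which is not reproduced here; it is a simplified illustration in which the function name, the point identifiers, the toy two-element meta-feature vectors, and the `max_tries` cap are all hypothetical. It also assumes that the neighbor and distant tiles are cut from the first image, and it does not remove used points from the pool.

```python
import math
import random


def sample_quadruplet(points, meta_vec, th=1.0, rng=random, max_tries=1000):
    """Pick anchor, neighbor, and distant points; the temporal neighbor reuses the anchor.

    points   : list of candidate point ids sampled over water
    meta_vec : dict mapping point id -> meta-feature vector
    Returns a dict of (point, image) references, or None if no valid set is found.
    """
    anchor = rng.choice(points)
    others = [p for p in points if p != anchor]
    rng.shuffle(others)
    neighbor = distant = None
    for p in others[:max_tries]:
        d = math.dist(meta_vec[anchor], meta_vec[p])
        if d < th and neighbor is None:
            neighbor = p                      # positive pair by meta-connectivity
        elif d >= th and distant is None:
            distant = p                       # negative pair
        if neighbor is not None and distant is not None:
            return {
                "t_a": (anchor, "T_MS1"),
                "t_n": (neighbor, "T_MS1"),
                "t_d": (distant, "T_MS1"),
                "t_t": (anchor, "T_MS2"),     # same coordinates, second acquisition
            }
    return None


# Toy pool with two land cover contexts (vectors are illustrative only).
toy_vecs = {
    "p1": [0.0, 0.0], "p2": [0.1, 0.2], "p4": [0.2, 0.1],
    "p3": [5.0, 5.0], "p5": [5.1, 5.0],
}
quad = sample_quadruplet(list(toy_vecs), toy_vecs, rng=random.Random(0))
print(quad)
```

Calling the function $M$ times over an oversampled pool yields the training set; returning `None` when an anchor has no valid partner is one simple way to handle anchors with unusual land cover context.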

2.4.3. Producing Deep Meta-Connectivity Representation with Neural Network

We then train the deep meta-connectivity representation vector generation network (DMCRVNet) on the dataset of quadruple tiles. The architecture and workflow of DMCRVNet are shown in Figure 8. Each set of quadruple tiles (anchor MS tile, neighbor MS tile, distant MS tile, and temporal neighbor MS tile) is sequentially input to DMCRVNet; the corresponding output vectors ($1 \times 128$) are the anchor MS vector, neighbor MS vector, distant MS vector, and temporal neighbor MS vector, respectively. DMCRVNet follows the residual neural network architecture [42]. At layer 5 of DMCRVNet, the 512 channels are compressed to 128 channels by a 2D convolution, so the output of DMCRVNet is a $1 \times 128$ vector.
For a given anchor MS tile, our assumption defines the neighbor and temporal neighbor MS tiles as positive samples and the distant MS tile as a negative sample. The network extracts a representation vector from each of these tiles. We then minimize the Euclidean distance between the anchor MS vector and the neighbor MS vector, as well as the distance between the anchor MS vector and the temporal neighbor MS vector, while maximizing the Euclidean distance between the anchor MS vector and the distant MS vector. For each set of quadruple tiles $\{t_a^i, t_n^i, t_d^i, t_t^i\}$ and their corresponding vectors $\{v_a^i, v_n^i, v_d^i, v_t^i\}$, we seek to minimize the quadruple loss, named tile_loss, given in Formula (6), where $l_n$ represents the distance between the pairs of close tiles and $l_d$ represents the distance between the pairs of distant tiles. Each training session involves $M_1$ sets of quadruple tiles selected from the dataset, and terminates when the loss function ceases to decrease.
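$$
\begin{aligned}
\mathit{tile\_loss} &= l_n - l_d \\
l_n &= \sum_{i=1}^{M_1} \left\lVert \tfrac{1}{2}\left(v_a^i - v_n^i\right) + \tfrac{1}{2}\left(v_a^i - v_t^i\right) \right\rVert_2 \\
l_d &= \sum_{i=1}^{M_1} \left\lVert v_a^i - v_d^i \right\rVert_2
\end{aligned}
\tag{6}
$$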
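As a concrete illustration of Formula (6), the following minimal sketch evaluates the quadruple loss for a small batch in plain Python. It assumes that the distances in Formula (6) are Euclidean (ℓ2) norms, as the surrounding text describes; the function names and the tuple layout of the batch are hypothetical, and an actual training loop would use an automatic-differentiation framework instead.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def tile_loss(batch):
    """Quadruple loss l_n - l_d over a batch.

    batch -- iterable of (v_a, v_n, v_d, v_t) tuples of equal-length vectors
             (anchor, neighbor, distant, temporal neighbor).
    """
    l_n = 0.0
    l_d = 0.0
    for v_a, v_n, v_d, v_t in batch:
        # Average of the anchor-neighbor and anchor-temporal difference vectors.
        pulled = [0.5 * (a - n) + 0.5 * (a - t) for a, n, t in zip(v_a, v_n, v_t)]
        # Difference vector between the anchor and the distant tile.
        pushed = [a - d for a, d in zip(v_a, v_d)]
        l_n += l2_norm(pulled)
        l_d += l2_norm(pushed)
    return l_n - l_d
```

The loss decreases when positives move toward the anchor (smaller `l_n`) and when the distant tile moves away from it (larger `l_d`).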

2.5. Optically-Active Water Quality Parameter Estimation with Deep Meta-Connectivity Representation

For each in situ OAWQ observation point in the collected measurement data, we extracted a tile from an MS image such that the center pixel of the tile coincides with the geographical location of the observation point. The temporal interval between each in situ observation and the corresponding MS image was within 5 days. Each extracted tile was input to the trained DMCRVNet to obtain its DMCR vector. The resulting pairs of in situ OAWQ values and corresponding DMCR vectors constitute the dataset for estimating water quality parameters.
We used a random forest (RF) model [43] to establish the regression relationship between each in situ water quality value and its corresponding DMCR vector, with 80% of the pairs in the dataset used for training and 20% for testing. Our experiments indicate that this approach can estimate water quality parameters accurately from multispectral images even when few in situ measurements are available.
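The temporal matching between in situ observations and MS images described above can be sketched as follows. This is a hedged illustration only: the tuple layout, the function name, and the choice of the closest acquisition date are assumptions made for the example, since the text states only that the interval was within 5 days.

```python
from datetime import date


def match_observations(observations, image_dates, max_gap_days=5):
    """Pair each in situ observation with the closest image date within the gap.

    observations -- list of (site_id, sample_date, value) tuples (hypothetical layout)
    image_dates  -- list of datetime.date acquisition dates
    """
    matches = []
    for site_id, sample_date, value in observations:
        best = min(image_dates,
                   key=lambda d: abs((d - sample_date).days),
                   default=None)
        # Keep the pair only if the closest image is within the allowed gap.
        if best is not None and abs((best - sample_date).days) <= max_gap_days:
            matches.append((site_id, sample_date, best, value))
    return matches
```

Observations without any image inside the window are dropped rather than matched to a distant acquisition.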

3. Results

To evaluate the performance of the DMCR method, we applied it to estimate Chl-a and turbidity levels in Lake Erie and Lake Ontario. These estimates were derived by combining spatially aligned in situ measurements with DMCR vectors generated by the trained DMCRVNet. To validate the accuracy of DMCR, we compared its results with other advanced methods in terms of the $R^2$ and root mean square error (RMSE) metrics. Additionally, we analyzed the temporal and spatial variations of Chl-a and turbidity across both lakes to further illustrate the method’s effectiveness.

3.1. Performance of DMCR

The performance of DMCR was tested on the available datasets introduced in Section 2.4. We selected the coefficient of determination ($R^2$) and RMSE as metrics to quantitatively evaluate the estimation of Chl-a and turbidity.
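For reference, the two metrics can be computed as in the short sketch below. It uses the standard textbook definitions of $R^2$ and RMSE, which we assume match those used here; the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and estimated values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect estimate gives an RMSE of 0 and an $R^2$ of 1, while always predicting the mean of the measurements gives an $R^2$ of 0.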

3.1.1. Selected Comparison Methods

We conducted ablation experiments to demonstrate the effectiveness of DMCR. The difference between DMCR and the ablation algorithm, named GeoTile2Vec, is that the latter calculates the distance between the anchor tile and the other tile using the geospatial Euclidean distance instead of the meta-feature Euclidean distance. Except for the difference in each set of quadruple tiles, the structures and processes of DMCR and GeoTile2Vec are the same. GeoTile2Vec follows the widely accepted assumption in remote sensing of water quality through DL that geographically distant water bodies have different water quality levels.
In addition, SimCLR [44] was utilized to estimate Chl-a and turbidity in our experiments. Because SimCLR is a state-of-the-art contrastive learning method, we selected it as a baseline method to demonstrate the effectiveness of DMCR in water quality estimation. The detailed algorithmic workflow and implementation details of SimCLR are provided in Appendix C.

3.1.2. Convergence Speed of the Loss Function During Training

The convergence speed of the loss function value during training illustrates how quickly an algorithm reaches a satisfactory solution. Below, we display the convergence speeds of the loss functions in the training process for Chl-a estimation. Figure 9 shows the respective convergence speeds of DMCR, GeoTile2Vec, and SimCLR for Lake Ontario, while Figure 10 exhibits the convergence speeds for Lake Erie. DMCR has the highest convergence speed, while SimCLR has the slowest convergence speed and highest variation. The positive and negative tiles defined by meta-connectivity are more conducive to deep representation training than the other definitions. Similar convergence speeds also appeared in the training process for turbidity estimation.

3.2. Estimation of Optically-Active Water Quality Parameters

Three types of deep representation vectors were generated by DMCR, GeoTile2Vec, and SimCLR, respectively. In the final stage of each method, Chl-a and turbidity were estimated by the RF algorithm. Chl-a concentration is reported in micrograms per liter (μg/L) and turbidity in Nephelometric Turbidity Units (NTU). In each RF iteration, the training and testing data were randomly selected from the dataset of matched deep representation vectors and in situ measurements, with the ratio of training to test data kept at 8:2. Table 2 and Table 3 list the best results over 500 RF iterations in terms of $R^2$ and RMSE. On both metrics, DMCR performs best in both Chl-a and turbidity estimation.
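The repeated random-split protocol described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the experiments: `best_of_random_splits` and `nearest_neighbor_predict` are hypothetical names, and a 1-nearest-neighbor regressor stands in for the random forest so that the example runs with the Python standard library alone. Any callable with the same signature, for example a wrapper around a random forest implementation, could be passed as `fit_predict`.

```python
import math
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination, or None when the test values are constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        return None
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def nearest_neighbor_predict(train_x, train_y, test_x):
    """Stand-in regressor (1-nearest neighbor); the paper uses a random forest."""
    preds = []
    for x in test_x:
        j = min(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
        preds.append(train_y[j])
    return preds


def best_of_random_splits(vectors, values, fit_predict,
                          runs=500, train_frac=0.8, seed=0):
    """Return the best test R^2 over repeated random train/test splits."""
    rng = random.Random(seed)
    n_train = int(round(train_frac * len(vectors)))
    best = -math.inf
    for _ in range(runs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        train, test = order[:n_train], order[n_train:]
        preds = fit_predict([vectors[i] for i in train],
                            [values[i] for i in train],
                            [vectors[i] for i in test])
        score = r_squared([values[i] for i in test], preds)
        if score is not None:
            best = max(best, score)
    return best
```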
We created scatter plots to illustrate the effectiveness of the three algorithms. Figure 11 and Figure 12 show the scatter plots corresponding to Table 2 (Chl-a results), while Figure 13 and Figure 14 correspond to Table 3 (turbidity results). The scatter plots show that the DMCR estimates lie closest to the fit line. The quantitative analysis therefore indicates that DMCR performs best among the compared methods.

3.3. Temporal–Spatial Variation of Optically-Active Water Quality

We used DMCR and Sentinel-2 L2A images to estimate Chl-a and turbidity on Lake Erie and Lake Ontario [38]. The nearly one-year variation of Chl-a and turbidity and their variations in August over several years exhibit the reasonableness of our estimates.

3.3.1. Temporal–Spatial Variation of Optically-Active Water Quality in the West of Lake Erie

Figure 15 shows the variation of Chl-a and turbidity in the western part of Lake Erie in 2022. The multispectral images acquired in January, February, March, and December were obscured by clouds and could not be used to estimate Chl-a and turbidity. According to Lake Erie Beach climate and weather data (https://www.meteoblue.com/en/weather/historyclimate/climatemodelled/lake-erie-beach_united-states_5123788 (accessed on 17 February 2025)), fewer than three sunny days occurred in January, February, March, and December, and the other days were cloudy or overcast. Every month missing from the figure therefore lacks a Sentinel-2 image usable for water quality estimation.
Chl-a concentration is measured in micrograms per liter (μg/L), while turbidity is measured in Nephelometric Turbidity Units (NTU). In Figure 15 and Figure 16, the Chl-a concentration ranges from 2 to 16 μg/L and the turbidity from 1.5 to 14 NTU.
Chl-a concentration is a key indicator of algal biomass and primary productivity in aquatic ecosystems. Sunlight, temperature, and pollution sources are the key factors affecting Chl-a concentration. The large cities of Toledo and Detroit are located near the west shore of Lake Erie, and almost 75% of the land in the Lake Erie Basin is used for agricultural production. Fertilizer runoff from farms and industrial waste from cities are among the sources of pollution in the west of Lake Erie. From the open data on Lake Erie Beach’s climate and weather (https://www.meteoblue.com/en/weather/historyclimate/climatemodelled/lake-erie-beach_united-states_5123788 (accessed on 17 February 2025)), we know that the mean daily maximum temperature is highest in June, July, and August, followed by September and May, and is relatively low in the other months. September has the most sunny days, January and December have about one sunny day each, and the other months have fewer than five. Sunlight and warmer temperatures enhance the growth of algae, leading to higher Chl-a levels. The images in Figure 15a show that Chl-a concentrations from August to November are higher than those in other months; that is, the period of elevated Chl-a lags behind the period of elevated temperature by two months. We attribute this lag to the time that algae need to grow. Lake Erie is shallow in the west and grows deeper towards the east, and the proportion of eutrophic substances is higher in shallow waters than in deep waters. The spatial distributions of Chl-a in Figure 15a and Figure 16a are consistent with this eutrophication pattern: the Chl-a concentration gradually decreases from west to east. Figure 16a also shows that the area of elevated Chl-a concentration in August expanded from west to east between 2018 and 2024, which suggests increased pollution.
Water turbidity is a measure of the cloudiness or haziness of water caused by suspended particles such as sediment, algae, organic matter, and other substances, and is a key parameter indicating light penetration. The factors affecting water turbidity are more complex than those affecting Chl-a and can be grouped into natural, human-induced, and environmental categories. Natural factors include algae blooms, snowmelt, wind, and wave action. Human-induced factors include agricultural runoff and industrial discharges. Environmental factors include rainfall, climate change, and geological characteristics. Figure 15b and Figure 16b show that the pixel value in most areas of each map is greater than 5 NTU, which indicates that the water of Lake Erie is moderately turbid or worse. From the open data on Lake Erie Beach climate and weather, we know that wind speed exceeds 20 km/h on nearly half of the days in April, October, and November. Because the west of Lake Erie is shallow, even a light breeze can raise substantial waves. Wave turbulence can break up cohesive sediment layers, leading to higher turbidity in April, October, and November even though the mean daily maximum temperature in these months is below 15 °C. Sunlight and warm temperatures promote algal growth, which can also increase turbidity. Figure 15b shows that the period of elevated turbidity lags behind the period of elevated temperature by two months, similar to the Chl-a concentration. Figure 16b shows that the area of elevated turbidity in August also expanded from west to east between 2018 and 2024, which likewise suggests increased pollution.
The estimated maps in Figure 15 and Figure 16 are therefore consistent with the known drivers of Chl-a and turbidity discussed above. The water quality is poor in the west of Lake Erie, and eutrophication has expanded from the west to the east in recent years.

3.3.2. Temporal–Spatial Variation of Optically-Active Water Quality in the Middle of Lake Ontario

We selected the middle part of Lake Ontario to present an image series with better water quality than that of the western part of Lake Erie. The northern part of the extracted images includes Port Hope, Cobourg, and Brighton, while the southern part includes Rochester. Apart from these scattered cities, most of the land on both sides of the middle part of Lake Ontario is covered by farmland, grassland, and forest. The lake’s average depth is 86 m, with a maximum depth of 244 m in the Rochester Basin. In deep water, light penetration is limited and pollutants may be diluted and dispersed. In Figure 17 and Figure 18, the Chl-a concentration ranges from 0.4 to 3 μg/L and the turbidity from 1.8 to 7.5 NTU. The maximum Chl-a concentration in the middle of Lake Ontario is thus roughly one-fifth that of the west of Lake Erie (3 versus 16 μg/L), while the maximum turbidity is about half (7.5 versus 14 NTU).
Chl-a concentration is below 3 μg/L across most of the middle of Lake Ontario, indicating that this body of water has very low algal biomass. According to Lake Ontario climate and weather data (https://www.meteoblue.com/en/weather/historyclimate/climatemodelled/ontario-on-the-lake_united-states_5129893 (accessed on 17 February 2025)), the mean daily maximum temperature is above 20 °C and the number of sunny days exceeds three from June to September. Higher temperatures and more sunny days promote algal growth. Chl-a concentrations close to 3 μg/L appear in August and September of 2022 near Brighton and Rochester, respectively. Figure 18 shows that Chl-a concentration had a slight upward trend from 2018 to 2024.
Turbidity in the middle of Lake Ontario is fair and stable. Turbidity is less than 6 NTU across most of the area and most of the time, which means that the middle part of Lake Ontario is slightly cloudy and can be treated as drinking water.
Taken together, Figure 15, Figure 16, Figure 17 and Figure 18 indicate that the water quality of water bodies surrounded by cities is likely to be worse than that of waters surrounded by cultivated land, forest, or grassland. These results suggest that human activity is a key factor affecting water quality.

4. Discussion

4.1. A Novel Deep Representation for Water Quality Connectivity

DMCR is an innovative method for estimating OAWQ parameters using multispectral imagery, particularly in contexts with limited in situ measurements. Its key advancement lies in the concept of meta-connectivity, which determines the similarity of water quality between multispectral tiles based on their surrounding land cover rather than relying solely on geographic proximity. DMCR operates on the principle that tiles with meta-connectivity are likely to exhibit similar water quality parameters, while those separated in meta-feature vector space are less likely to share similarities. The DMCR method employs a neural network to encode multispectral tiles into a deep representation space, where tiles with meta-connectivity have similar representations and distant tiles have dissimilar ones. Meta-connectivity is quantified using the Euclidean distance between meta-feature vectors, which are derived from land cover images. This approach allows DMCR to learn meaningful representations of multispectral tiles in an unsupervised manner, leveraging land cover data without direct dependence on in situ water quality parameters. In this way, it provides a robust framework for OAWQ parameter estimation, particularly in scenarios where labeled data are scarce.
To contextualize our contribution, we compare DMCR with existing research that links land use to water quality. Our approach aligns with methods applied in Europe and Asia, where researchers have established connections between land use patterns and water quality, particularly focusing on eutrophication risks. Several studies across these regions have demonstrated that watershed land use composition significantly influences nutrient loading and water quality in downstream water bodies [45,46,47]. DMCR advances these traditional approaches by incorporating meta-connectivity and deep learning to quantify these relationships at finer spatial resolutions. While models such as those in [45,46,47] recognize runoff as a key transport mechanism for pollutants, our DMCR framework provides a more sophisticated means of capturing the complex spatial relationships between land use and resulting water quality. However, a limitation of our current model is its incomplete integration of dynamic hydrological factors. Various hydrological processes, including runoff patterns and seasonal variations, significantly influence how land use impacts water quality [48,49]. These are particularly important in agricultural watersheds, where fertilizer application and tillage practices create complex temporal nutrient loading patterns. Our current implementation of DMCR does not fully model these transport mechanisms, an aspect that we discuss further in Section 4.3.
DMCR demonstrates superior performance compared to SimCLR and GeoTile2Vec in experiments using Sentinel-2 L2A images and in situ measurements. GeoTile2Vec follows the widely used assumption that geographically nearby tiles exhibit similar water quality, while distant tiles are less likely to share similar water quality characteristics. SimCLR, on the other hand, employs self-supervised learning with randomly cropped tiles. Although DMCR, GeoTile2Vec, and SimCLR all use a similar core ResNet architecture to extract deep representations from multispectral tiles, DMCR converges faster during training. The $R^2$ and RMSE metrics confirm that DMCR provides more accurate OAWQ parameter estimates than GeoTile2Vec and SimCLR. Additionally, we attempted to use a convolutional neural network (CNN) to establish a direct relationship between the tiles and the in situ measurements. However, the CNN failed to converge due to insufficient data, resulting in poor performance. We also conducted experiments using RF as a supervised baseline. While RF is generally less prone to overfitting and can handle nonlinear feature interactions, it still requires a sufficient amount of labeled data to perform effectively. In our few-shot learning scenario, the limited availability of in situ measurements left RF with too few training samples, which constrained its ability to capture the complex relationships inherent in multispectral image data. Consequently, the RF model yielded suboptimal predictions compared to DMCR. Neither CNN nor RF provides competitive results under limited data conditions, which highlights the advantage of DMCR’s contrastive self-supervised learning approach. The CNN- and RF-based result maps are not presented here.
Overall, DMCR emerges as a robust contrastive self-supervised learning framework that is particularly effective for OAWQ parameter estimation in scenarios with limited data availability.
Figure 15 and Figure 16 present the estimated distribution maps of Chl-a and turbidity for Lake Erie generated using the shared core neural network of DMCR; similarly, Figure 17 and Figure 18 display the Chl-a and turbidity distribution maps for Lake Ontario derived from the DMCR model trained on this lake. By analyzing the key factors influencing Chl-a and turbidity, we conclude that the estimated distribution images are reliable and accurately reflect the water quality conditions in both lakes.

4.2. Future Potential and Scalability of DMCR

DMCR shows promise for diverse water bodies and remote sensing platforms by addressing the challenge of limited in situ measurements.
The main challenge in remote sensing of inland water quality is the limited amount of labeled in situ data. Our method’s design for few-shot learning and its focus on land cover relationships rather than geographic proximity enhance its transferability to various water ecosystems. While DMCR has been validated only on data for Lake Erie and Lake Ontario, we believe that it can be applied more broadly with minimal calibration data. We also note the scarcity of public datasets combining remote sensing with in situ measurements, and advocate for more open data initiatives to enable comprehensive validation across different lake conditions.
Moreover, DMCR is well suited for hyperspectral imagery. Compared with multispectral images, hyperspectral images have more bands and superior spectral resolution, meaning that they consist of higher-dimensional data than multispectral images. Conventional end-to-end models often struggle to establish relationships between such high-dimensional data and in situ measurements due to the curse of dimensionality. Because DMCR compresses high-dimensional data and generates similar representations for meta-connectivity tiles, it represents a credible approach for remote sensing of OAWQ parameters using hyperspectral images.
Furthermore, the ability to conduct high-accuracy, reliable, and large-scale remote sensing of Chl-a and turbidity can contribute to early warning of harmful algal blooms. Chl-a is proportional to systemic biogeochemical stress in water bodies [36], and the spatiotemporal distribution changes of Chl-a and turbidity can serve as key indicators for predicting bloom occurrences. In the future, the DMCR method could be integrated into environmental management workflows to support pollution prevention, water cleanup, and other water treatment strategies.

4.3. Limitations of DMCR

The primary limitation of DMCR lies in its reliance on up-to-date land cover images. If these images are not regularly updated, the generated meta-feature vectors may be inaccurate, leading to incorrect meta-connectivity assessments. For instance, if urban expansion transforms grassland into artificial surfaces, the corresponding meta-feature vectors would change, reflecting a probable degradation in nearby water quality. An outdated land cover map would cause the model to incorrectly associate the water body with grassland during both training and estimation, leading to erroneous water quality predictions. Consequently, water quality estimates derived from unreliable deep meta-connectivity representations could also be erroneous. Conversely, the impact of lagged updates is less severe when land cover changes occur between classes with similar environmental impacts (e.g., grassland to shrubland).
Although the results of DMCR are promising, training a deep meta-connectivity representation for each lake is a time-intensive process. In future work, we aim to enhance the efficiency of DMCR to make it more scalable and practical for broader applications.

5. Conclusions

Remote sensing of optically-active water quality (OAWQ) parameters faces a significant bottleneck due to the scarcity of concurrent in situ measurements, which critically limits the application of data-intensive models. To address this challenge, we have proposed and validated a deep meta-connectivity representation (DMCR) framework. Our central finding is that meta-connectivity, defined by the similarity in surrounding land cover characteristics, serves as a more effective proxy for water quality similarity than the widely used geographical proximity.
We operationalized this concept through DMCR, a contrastive self-supervised learning framework. Our experiments on data for Lake Erie and Lake Ontario demonstrate that DMCR outperforms both the geography-based GeoTile2Vec and a standard self-supervised method (SimCLR), achieving superior performance in both $R^2$ and RMSE for chlorophyll-a and turbidity estimation. Furthermore, the water quality distribution maps generated by our model exhibit strong spatial coherence and align with known water quality conditions of the region. In conclusion, DMCR provides a robust and effective solution for OAWQ parameter estimation, particularly in data-limited scenarios. The principle of meta-connectivity holds significant potential for generalization across diverse aquatic ecosystems, paving the way for more accurate and scalable satellite-based water quality monitoring.

Author Contributions

Conceptualization, F.P. and Z.L.; methodology, F.P. and Z.L.; validation, Y.Y. and Y.D.; formal analysis, Y.Y.; investigation, Y.Y. and Y.D.; resources, F.P. and X.X.; data curation, Y.Y. and Y.D.; writing—original draft preparation, F.P.; writing—review and editing, F.P., H.C., and Y.D.; visualization, Y.Y.; supervision, H.C. and X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (General Program, Grant No. 62271356).

Data Availability Statement

Data are contained within the article.

Acknowledgments

We acknowledge the Sentinel-2 team for providing the Sentinel-2 imagery for this research and Environment and Climate Change Canada for the in situ data for the Great Lakes.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Land-Cover Types

Table A1 lists the ten land cover types from the GlobeLand30 dataset along with their corresponding label numbers.
Table A1. Land cover types.
| Land Cover Type | Label Number |
| --- | --- |
| Cultivated Land | 10 |
| Forest | 20 |
| Grassland | 30 |
| Shrubland | 40 |
| Wetland | 50 |
| Water Bodies | 60 |
| Tundra | 70 |
| Artificial Surfaces | 80 |
| Bareland | 90 |
| Permanent Snow and Ice | 100 |
Note: The numbers correspond to different land types.

Appendix B. Algorithm A1

We selected 45 images of Lake Erie (from 23 March 2019 to 9 May 2023) and 46 images of Lake Ontario (from 19 March 2021 to 23 October 2023). We used Algorithm A1 to generate the sets of quadruple tiles for Lake Ontario and Lake Erie separately, with 200,000 sets of quadruple tiles per lake. Each tile in a set has a size of 11 × 11. The DMCR vector generation network is trained on this quadruple-tile dataset.
Algorithm A1 Quadruple tiles sampling
Input: Two multispectral (MS) images and one land cover (LC) image.
  (1) One MS image obtained at time $T_{MS1}$.
  (2) Another MS image obtained at time $T_{MS2}$.
  (3) Require that the time gap $\lVert T_{MS1} - T_{MS2} \rVert_2$ stays within the temporal threshold $TH_T$.
Output: Quadruple-tiles set $T = \{(t_a^{(i)}, t_n^{(i)}, t_d^{(i)}, t_t^{(i)})\}_{i=1}^{M}$
  • Initialize the quadruple-tiles set $T$ as empty.
  • Re-sample the LC image by Formula (1) to align it with the two MS images.
  • Sample a set of $N$ points with a uniform spatial distribution from the water body regions within the MS image acquired at time $T_{MS1}$. The number of points $N$ is set to be at least three times $M$, where $M$ ($M$ = 200,000) is the total number of primary samples extracted from the remote sensing imagery for our analysis.
For $i = 1$; $i \leq M$; $i$++ Do
     Randomly select one point ($P_1$) from the sampling points set.
     Around the center pixel $P_1$, extract a tile from the MS image at $T_{MS1}$ (labeled $T_{S1}$) and the corresponding tile from the LC image (labeled $T_{L1}$).
     $T_{S1}$ is also labeled as the anchor tile $t_a$.
     Compute the meta-feature vector $V_1$ of $T_{S1}$ by Formula (4) using the corresponding $T_{L1}$.
     While the triplet $\{t_a, t_n, t_d\}$ is incomplete Do
          Randomly select another point ($P_2$), different from $P_1$, from the sampling points set.
          Around the center pixel $P_2$, extract a tile from the MS image at $T_{MS1}$ (labeled $T_{S2}$) and the corresponding tile from the LC image (labeled $T_{L2}$).
          Compute the meta-feature vector $V_2$ of $T_{S2}$ by Formula (4) using $T_{L2}$.
          If $\lVert V_1 - V_2 \rVert_2 < TH$ then
               $T_{S2}$ is labeled as a neighbor tile $t_n$.
          Else
               $T_{S2}$ is labeled as a distant tile $t_d$.
          EndIf
     EndWhile
     Remove the three selected points from the sampling points set.
     Extract a tile from the MS image at $T_{MS2}$ as the temporal neighbor tile $t_t$. The spatial position and size of $t_t$ are the same as those of $t_a$.
     One element $(t_a^{(i)}, t_n^{(i)}, t_d^{(i)}, t_t^{(i)})$ of the quadruple-tiles set $T$ is obtained.
EndFor
 

Appendix C. Flow of SimCLR

Appendix C shows the flow of SimCLR. A $1 \times 128$ vector representing each high-dimensional multispectral tile is generated through the SimCLR process shown in Figure A1. Formulas (A1) and (A2) define the loss function, where $z_i$ and $z_j$ are the vectors output during SimCLR training and $N$ denotes the batch size. The temperature parameter $\tau$ is a crucial hyperparameter that scales the cosine similarity scores, effectively controlling the penalty on hard-negative examples. After the self-supervised training is complete, the trained SimCLR encoder generates feature vectors for the multispectral tiles corresponding to the in situ measurements. Finally, an RF model is also applied to estimate Chl-a and turbidity using these generated vectors and the corresponding in situ data.
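$$
l_{i,j} = -\log \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)}
\tag{A1}
$$
$$
L = \frac{1}{2N} \sum_{k=1}^{N} \left[ l(2k-1, 2k) + l(2k, 2k-1) \right]
\tag{A2}
$$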
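To make Formulas (A1) and (A2) concrete, the following minimal sketch evaluates the SimCLR loss on a small list of vectors in plain Python. It assumes the standard SimCLR convention that the two augmented views of the same tile are stored at consecutive positions; the function names and the default temperature of 0.5 are illustrative choices, not values reported in this work.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity sim(u, v) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pair_loss(z, i, j, tau):
    """l_{i,j} from Formula (A1); z holds all 2N projected vectors."""
    numerator = math.exp(cosine_similarity(z[i], z[j]) / tau)
    # The indicator 1[k != i] removes the self-similarity term from the sum.
    denominator = sum(math.exp(cosine_similarity(z[i], z[k]) / tau)
                      for k in range(len(z)) if k != i)
    return -math.log(numerator / denominator)


def nt_xent(z, tau=0.5):
    """Formula (A2); views 2k-1 and 2k (1-based) sit at 0-based 2k-2 and 2k-1."""
    n = len(z) // 2
    total = 0.0
    for k in range(n):
        a, b = 2 * k, 2 * k + 1
        total += pair_loss(z, a, b, tau) + pair_loss(z, b, a, tau)
    return total / (2 * n)
```

The loss is lower when each pair of views points in the same direction and the other vectors in the batch point elsewhere.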
Figure A1. Flow of SimCLR.

References

  1. Guo, H.; Zhu, X.; Huang, J.J.; Zhang, Z.; Tian, S.; Chen, Y. An enhanced deep learning approach to assessing inland lake water quality and its response to climate and anthropogenic factors. J. Hydrol. 2023, 620, 129466.
  2. Zhu, W. Remote Sensing Statistical Inference for Colored Dissolved Organic Matter in Inland Water: Case Study in Qiandao Lake. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 7462–7470.
  3. Dong, L.; Gong, C.; Huai, H.; Wu, E.; Lu, Z.; Hu, Y.; Li, L.; Yang, Z. Retrieval of Water Quality Parameters in Dianshan Lake Based on Sentinel-2 MSI Imagery and Machine Learning: Algorithm Evaluation and Spatiotemporal Change Research. Remote Sens. 2023, 15, 5001.
  4. Liu, B.; Li, T. A Machine-Learning-Based Framework for Retrieving Water Quality Parameters in Urban Rivers Using UAV Hyperspectral Images. Remote Sens. 2024, 16, 905.
  5. Lu, Q.; Si, W.; Wei, L.; Li, Z.; Xia, Z.; Ye, S.; Xia, Y. Retrieval of Water Quality from UAV-Borne Hyperspectral Imagery: A Comparative Study of Machine Learning Algorithms. Remote Sens. 2021, 13, 3928.
  6. Giardino, C.; Brando, V.E.; Dekker, A.G.; Strömbeck, N.; Candiani, G. Assessment of water quality in Lake Garda (Italy) using Hyperion. Remote Sens. Environ. 2007, 109, 183–195.
  7. Zhou, X.; Chen, J.; Rakstad, T.E.; Ploughe, M.; Tang, P. Water Chlorophyll Estimation in an Urban Canal System With High-Resolution Remote Sensing Data. IEEE Geosci. Remote Sens. Lett. 2021, 18, 1876–1880.
  8. Abd-Elrahman, A.; Croxton, M.; Pande-Chettri, R.; Toor, G.S.; Smith, S.; Hill, J. In situ estimation of water quality parameters in freshwater aquaculture ponds using hyperspectral imaging system. ISPRS J. Photogramm. Remote Sens. 2011, 66, 463–472.
  9. Cao, L.; Zhang, D.; Guo, Q.; Zhan, J. Inversion of Water Quality Parameter Bod5 Based on Hyperspectral Remotely Sensed Data in Qinghai Lake. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 5036–5039.
  10. Xiong, J.; Lin, C.; Cao, Z.; Hu, M.; Xue, K.; Chen, X.; Ma, R. Development of remote sensing algorithm for total phosphorus concentration in eutrophic lakes: Conventional or machine learning? Water Res. 2022, 215, 118213.
  11. He, J.; Chen, Y.; Wu, J.; Stow, D.A.; Christakos, G. Space-time chlorophyll-a retrieval in optically complex waters that accounts for remote sensing and modeling uncertainties and improves remote estimation accuracy. Water Res. 2020, 171, 115403.
  12. Yang, C.; Tan, Z.; Li, Y.; Shen, M.; Duan, H. A comparative analysis of machine learning methods for algal Bloom detection using remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 7953–7967.
  13. Wang, J.; Ke, C.Q.; Cai, Y.; Ji, J.; Wang, Z. A novel convolutional neural network for the extraction of algal bloom and aquatic vegetation in typical eutrophic shallow lakes. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 8099–8111.
  14. Tang, Y.; Feng, Y.; Fung, S.; Xomchuk, V.R.; Jiang, M.; Moore, T.; Beckler, J. Spatiotemporal deep-learning-based algal bloom prediction for Lake Okeechobee using multisource data fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 8318–8331.
  15. Aptoula, E.; Ariman, S. Chlorophyll-a Retrieval From Sentinel-2 Images Using Convolutional Neural Network Regression. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
  16. Taheri Dehkordi, A.; Hashemi, H.; Naghibi, A.; Mehran, A. Ensemble of Pruned Bagged Mixture Density Networks for Improved Water Quality Retrieval Using Sentinel-2 and Landsat-8 Remote Sensing Data. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5.
  17. Yang, H.; Kong, J.; Hu, H.; Du, Y.; Gao, M.; Chen, F. A review of remote sensing for water quality retrieval: Progress and challenges. Remote Sens. 2022, 14, 1770.
  18. Ewuzie, U.; Bolade, O.P.; Egbedina, A.O. Application of deep learning and machine learning methods in water quality modeling and prediction: A review. In Current Trends and Advances in Computer-Aided Intelligent Environmental Data Engineering; Academic Press: Cambridge, MA, USA, 2022; pp. 185–218.
  19. Wai, K.P.; Chia, M.Y.; Koo, C.H.; Huang, Y.F.; Chong, W.C. Applications of deep learning in water quality management: A state-of-the-art review. J. Hydrol. 2022, 613, 128332.
  20. Wu, R.; Wang, W.; Li, S. Soft measurement of ammonia nitrogen in sea cucumber aquaculture water via transfer learning. In Proceedings of the 2022 4th International Conference on Industrial Artificial Intelligence (IAI), Shenyang, China, 24–27 August 2022; pp. 1–5. [Google Scholar]
  21. Syariz, M.A.; Lin, C.H.; Heriza, D.; Lasminto, U.; Sukojo, B.M.; Jaelani, L.M. A transfer learning technique for inland chlorophyll-a concentration estimation using Sentinel-3 imagery. Appl. Sci. 2021, 12, 203. [Google Scholar] [CrossRef]
  22. Zhu, N.; Ji, X.; Tan, J.; Jiang, Y.; Guo, Y. Prediction of dissolved oxygen concentration in aquatic systems based on transfer learning. Comput. Electron. Agric. 2021, 180, 105888. [Google Scholar] [CrossRef]
  23. Tian, W.; Liao, Z.; Wang, X. Transfer learning for neural network model in chlorophyll-a dynamics prediction. Environ. Sci. Pollut. Res. 2019, 26, 29857–29871. [Google Scholar] [CrossRef]
  24. Lumini, A.; Nanni, L. Deep learning and transfer learning features for plankton classification. Ecol. Inform. 2019, 51, 33–43. [Google Scholar] [CrossRef]
  25. Li, H.; Li, Y.; Zhang, G.; Liu, R.; Huang, H.; Zhu, Q.; Tao, C. Global and local contrastive self-supervised learning for semantic segmentation of HR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
  26. Zhi, W.; Appling, A.P.; Golden, H.E.; Podgorski, J.; Li, L. Deep learning for water quality. Nat. Water 2024, 2, 228–241. [Google Scholar] [CrossRef]
  27. Pu, F.; Ding, C.; Chao, Z.; Yu, Y.; Xu, X. Water-quality classification of inland lakes using Landsat8 images by convolutional neural networks. Remote Sens. 2019, 11, 1674. [Google Scholar] [CrossRef]
  28. GB3838-2002; Environmental Quality Standard for Surface Water. State Environmental Protection Administration: Beijing, China, 2002.
  29. Akhtar, N.; Syakir Ishak, M.I.; Bhawani, S.A.; Umar, K. Various natural and anthropogenic factors responsible for water quality degradation: A review. Water 2021, 13, 2660. [Google Scholar] [CrossRef]
  30. Anh, N.T.; Nhan, N.T.; Schmalz, B.; Le Luu, T. Influences of key factors on river water quality in urban and rural areas: A review. Case Stud. Chem. Environ. Eng. 2023, 8, 100424. [Google Scholar] [CrossRef]
  31. Zhang, S.; Liu, N.; Luo, M.; Jiang, T.; Chan, T.O.; Yau, C.S.T.; Sun, Y. Downscaling Sentinel-3 Chlorophyll-a Concentration for Inland Lakes Based on Multivariate Analysis and Gradient Boosting Decision Trees Regression. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 7850–7865. [Google Scholar] [CrossRef]
  32. Uddin, M.G.; Nash, S.; Olbert, A.I. A review of water quality index models and their use for assessing surface water quality. Ecol. Indic. 2021, 122, 107218. [Google Scholar] [CrossRef]
  33. Robinson, R.M.; Robinson, R.M. Cross Border Governmental Organizations and Tragedies of the Commons. In Environmental Organizations and Reasoned Discourse; Palgrave Macmillan: Cham, Switzerland, 2021; pp. 273–297. [Google Scholar]
  34. Estepp, L.R.; Reavie, E.D. The ecological history of Lake Ontario according to phytoplankton. J. Great Lakes Res. 2015, 41, 669–687. [Google Scholar] [CrossRef]
  35. Jenny, J.P.; Anneville, O.; Arnaud, F.; Baulaz, Y.; Bouffard, D.; Domaizon, I.; Bocaniov, S.A.; Chèvre, N.; Dittrich, M.; Dorioz, J.M.; et al. Scientists’ warning to humanity: Rapid degradation of the world’s large lakes. J. Great Lakes Res. 2020, 46, 686–702. [Google Scholar] [CrossRef]
  36. Wang, H.; Convertino, M. Algal bloom ties: Systemic biogeochemical stress and Chlorophyll-a shift forecasting. Ecol. Indic. 2023, 154, 110760. [Google Scholar] [CrossRef]
  37. Louis, J.; Pflug, B.; Main-Knorn, M.; Debaecker, V.; Mueller-Wilm, U.; Iannone, R.Q.; Cadau, E.G.; Boccia, V.; Gascon, F. Sentinel-2 global surface reflectance level-2A product generated with Sen2Cor. In Proceedings of the IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 8522–8525. [Google Scholar]
  38. Coulibaly, N.; Sanogo, S.; BA, A. Evaluation of SENTINEL-2 products-based algorithms in estimating water pollutants of the River Niger in Bamako. Environ. Res. Commun. 2024, 6, 085004. [Google Scholar] [CrossRef]
  39. Zheng, H.; Liu, Y.; Wan, W.; Zhao, J.; Xie, G. Large-scale prediction of stream water quality using an interpretable deep learning approach. J. Environ. Manag. 2023, 331, 117309. [Google Scholar] [CrossRef] [PubMed]
  40. Wu, J.; Zeng, S.; Yang, L.; Ren, Y.; Xia, J. Spatiotemporal Characteristics of the Water Quality and Its Multiscale Relationship with Land Use in the Yangtze River Basin. Remote Sens. 2021, 13, 3309. [Google Scholar] [CrossRef]
  41. Jean, N.; Wang, S.; Samar, A.; Azzari, G.; Lobell, D.; Ermon, S. Tile2vec: Unsupervised representation learning for spatially distributed data. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 3967–3974. [Google Scholar]
  42. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  43. Schonlau, M.; Zou, R.Y. The random forest algorithm for statistical learning. Stata J. 2020, 20, 3–29. [Google Scholar] [CrossRef]
  44. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 1597–1607. [Google Scholar]
  45. Idrees, M.; Ahmad, S.; Khan, M.W.; Dahri, Z.H.; Ahmad, K.; Azmat, M.; Rana, I.A. Estimation of water balance for anticipated land use in the potohar plateau of the indus basin using SWAT. Remote Sens. 2022, 14, 5421. [Google Scholar] [CrossRef]
  46. Shao, M.; Xie, X.; Li, J.; Ren, W.; Chao, E. Fine-resolution estimation for urban surface water pollution susceptibility with multi-modal earth observation data. Environ. Res. Lett. 2024, 19, 064026. [Google Scholar] [CrossRef]
  47. Shiferaw, N.; Habte, L.; Waleed, M. Land use dynamics and their impact on hydrology and water quality of a river catchment: A comprehensive analysis and future scenario. Environ. Sci. Pollut. Res. 2025, 32, 4124–4136. [Google Scholar] [CrossRef]
  48. Song, Y.; Li, X.; Feng, L.; Zhang, G. Spatio-temporal dynamics coupling between land use/cover change and water quality in Dongjiang lake watershed using satellite remote sensing. Land 2024, 13, 861. [Google Scholar] [CrossRef]
  49. Ni, X.; Parajuli, P.B.; Ouyang, Y.; Dash, P.; Siegert, C. Assessing land use change impact on stream discharge and stream water quality in an agricultural watershed. Catena 2021, 198, 105055. [Google Scholar] [CrossRef]
Figure 1. Land cover surrounding Lake Erie.
Figure 2. Land cover surrounding Lake Ontario.
Figure 3. Sampling sites on Lakes Erie and Ontario.
Figure 4. The process of generating meta-feature vectors.
Figure 5. Meta-feature vector depicted by t-SNE.
Figure 6. Histograms of the three pairs labeled 1, 2, and 3.
Figure 7. Generation of DMCR and OAWQ parameter estimates.
Figure 8. Architecture of the deep meta-connectivity representation vector generation network.
Figure 9. Convergence speeds of the loss function during training on the Lake Ontario data.
Figure 10. The convergence speeds of the loss function during training on the Lake Erie data.
Figure 11. Scatter plot of Chl-a estimation for Lake Ontario.
Figure 12. Scatter plot of Chl-a estimation for Lake Erie.
Figure 13. Scatter plot of turbidity estimation for Lake Ontario.
Figure 14. Scatter plot of turbidity estimation for Lake Erie.
Figure 15. Chl-a and turbidity variation in 2022 in the west of Lake Erie.
Figure 16. Chl-a and turbidity variation in August from 2018 to 2024 in the west of Lake Erie.
Figure 17. Chl-a (2022) and turbidity (2021) variation in the middle of Lake Ontario.
Figure 18. Chl-a and turbidity variation in August from 2018 to 2024 in the middle of Lake Ontario.
Table 1. Temporal matching between Sentinel acquisitions and in situ measurements.

| Sentinel-2 Date | In Situ Measurement Date | Sentinel-2 Range (Lat) | Sentinel-2 Range (Lon) | In Situ Position (Lat, Lon) |
|---|---|---|---|---|
| 16 April 2023 | 17 April 2023 | 43.257356–44.222825 | −79.770508–−78.380656 | (43.84083, −78.03889) |
| 28 April 2022 | 24 April 2022 | 43.2012110–44.2151810 | −78.4968860–−77.1873690 | (43.43333, −77.71167) |
| 13 April 2023 | 15 April 2022 | 43.2593360–44.2259640 | −77.5040720–−76.1113170 | (43.58028, −77.2) |
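The kind of check behind a matched pair in Table 1 can be sketched as follows. This is a minimal illustration only: the function names and the `MAX_GAP_DAYS` threshold are hypothetical and are not taken from the paper, which does not state its matching criterion in this section. The input values are those of the second row of Table 1.

```python
from datetime import date


def days_apart(first: date, second: date) -> int:
    """Absolute number of days between two dates."""
    return abs((first - second).days)


def inside_range(position, lat_range, lon_range) -> bool:
    """True if a (lat, lon) position lies within the given latitude/longitude ranges."""
    lat, lon = position
    return (lat_range[0] <= lat <= lat_range[1]
            and lon_range[0] <= lon <= lon_range[1])


def is_matched(s2_date, insitu_date, position, lat_range, lon_range, max_gap_days) -> bool:
    """A pair counts as matched when it is close in time and the site lies in the scene."""
    return (days_apart(s2_date, insitu_date) <= max_gap_days
            and inside_range(position, lat_range, lon_range))


# Values from the second row of Table 1.
s2_date = date(2022, 4, 28)
insitu_date = date(2022, 4, 24)
lat_range = (43.2012110, 44.2151810)
lon_range = (-78.4968860, -77.1873690)
position = (43.43333, -77.71167)

MAX_GAP_DAYS = 5  # hypothetical threshold, not a value from the paper

gap = days_apart(s2_date, insitu_date)
matched = is_matched(s2_date, insitu_date, position, lat_range, lon_range, MAX_GAP_DAYS)
print(gap, matched)
```

A tighter or looser threshold trades the number of usable matched pairs against how much the water can change between the satellite overpass and the field sample.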
Table 2. Chl-a estimation results.

| Lake | Algorithm | Training Set R² | Training Set RMSE | Test Set R² | Test Set RMSE |
|---|---|---|---|---|---|
| Lake Ontario | SimCLR | 0.8775 | 0.2815 | 0.8362 | 0.2808 |
| Lake Ontario | GeoTile2Vec | 0.8656 | 0.3005 | 0.6963 | 0.3436 |
| Lake Ontario | RF | 0.8414 | 1.1570 | 0.6386 | 1.1328 |
| Lake Ontario | DMCR | 0.9051 | 0.2507 | 0.9047 | 0.2049 |
| Lake Erie | SimCLR | 0.8271 | 2.2063 | 0.5164 | 2.8664 |
| Lake Erie | GeoTile2Vec | 0.8197 | 2.2406 | 0.6816 | 2.5209 |
| Lake Erie | RF | 0.8491 | 2.1843 | 0.7065 | 2.3511 |
| Lake Erie | DMCR | 0.8630 | 2.0535 | 0.7365 | 1.2698 |

Note: R² is the coefficient of determination, measuring the goodness of fit between observed and predicted values. RMSE is the root mean square error, indicating the average magnitude of prediction errors in the same unit as the data. Here, RMSE is expressed in the unit of chlorophyll-a concentration, micrograms per liter (μg/L).
Table 3. Turbidity estimation results.

| Lake | Algorithm | Training Set R² | Training Set RMSE | Test Set R² | Test Set RMSE |
|---|---|---|---|---|---|
| Lake Ontario | SimCLR | 0.8815 | 0.4169 | 0.8795 | 0.4056 |
| Lake Ontario | GeoTile2Vec | 0.8667 | 0.4756 | 0.8041 | 0.2854 |
| Lake Ontario | RF | 0.8962 | 0.4191 | 0.8424 | 0.2575 |
| Lake Ontario | DMCR | 0.9413 | 0.3176 | 0.9205 | 0.1588 |
| Lake Erie | SimCLR | 0.8671 | 1.4650 | 0.8566 | 1.0430 |
| Lake Erie | GeoTile2Vec | 0.8380 | 1.5836 | 0.7998 | 1.4606 |
| Lake Erie | RF | 0.8831 | 1.2422 | 0.7599 | 1.8772 |
| Lake Erie | DMCR | 0.8878 | 1.3947 | 0.8808 | 0.5123 |

Note: R² is the coefficient of determination, measuring the goodness of fit between observed and predicted values. RMSE is the root mean square error, indicating the average magnitude of prediction errors in the same unit as the data. Here, RMSE is expressed in the unit of turbidity, Nephelometric Turbidity Units (NTU).
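For readers who want to reproduce the two metrics reported in Tables 2 and 3, the standard definitions can be sketched as below. The function names and the sample values are hypothetical and serve only to illustrate the formulas; they are not data or code from the study.

```python
import math


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the same unit as the inputs."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical measurements and estimates, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2_value = r2_score(observed, predicted)
rmse_value = rmse(observed, predicted)
print(f"R2 = {r2_value:.4f}, RMSE = {rmse_value:.4f}")
```

Because R² is normalized by the variance of the observations while RMSE is not, the two can rank models differently across lakes with different concentration ranges, which is why the tables report both.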

Pu, F.; Luo, Z.; Yang, Y.; Chen, H.; Dai, Y.; Xu, X. Deep Meta-Connectivity Representation for Optically-Active Water Quality Parameters Estimation Through Remote Sensing. Remote Sens. 2025, 17, 2782. https://doi.org/10.3390/rs17162782
