A Review of Machine Learning Applications in Ocean Color Remote Sensing

Zhang, Zhenhua; Chen, Peng; Zhang, Siqi; Huang, Haiqing; Pan, Yuliang; Pan, Delu

doi:10.3390/rs17101776


by Zhenhua Zhang ^1,2, Peng Chen ^1,2,*, Siqi Zhang ^1,2, Haiqing Huang ^1,2, Yuliang Pan ^1 and Delu Pan ^1,2

^1 Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), No. 1119, Haibin Road Nansha District, Guangzhou 511458, China

^2 State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, 36 Bochubeilu, Hangzhou 310012, China

^* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(10), 1776; https://doi.org/10.3390/rs17101776

Submission received: 8 April 2025 / Revised: 13 May 2025 / Accepted: 18 May 2025 / Published: 20 May 2025

(This article belongs to the Section Environmental Remote Sensing)


Abstract:

Ocean color remote sensing technology has proven to be an indispensable tool for monitoring ocean conditions, as it has consistently provided critical data on global ocean optical properties, color, and biogeochemical parameters over several decades. With the rapid advancement of artificial intelligence, the integration of machine learning (ML) models into ocean color remote sensing has become a significant focus within the scientific community. This article provides a comprehensive review of the current status and challenges associated with ML models in ocean color remote sensing, assessing their applications in atmospheric correction, color inversion, carbon cycle analysis, and data reconstruction. This review highlights the advancements made in applying ML techniques, such as neural networks and deep learning, to improve data accuracy, enhance resolution, and enable more precise predictions of oceanic phenomena. Despite challenges such as model generalization and computational complexity, ML has significant potential for enhancing our understanding of marine ecosystems, facilitating real-time monitoring, and supporting global climate models.

Keywords:

ocean color remote sensing; machine learning; deep learning; optical oceanography; atmospheric correction; bio-optical properties; oceanographic parameters; remote sensing technology


1. Introduction

The vastness and complexity of the world’s oceans present formidable challenges to scientists seeking to understand and monitor their critical roles in global climate regulation, ecosystem dynamics, and biogeochemical cycles. Ocean color remote sensing (OCRS), a noninvasive observational technique, has emerged as an indispensable tool for monitoring the marine environment. This technology has been instrumental in advancing our understanding of various oceanographic processes, including the detection and analysis of phytoplankton blooms [1,2], assessments of ocean primary production [3,4], evaluations of particulate carbon standing stock [5,6,7], and investigations into the impacts of climate change [8,9].

While traditional remote sensing methods have provided valuable insights, they encounter limitations, particularly with the vast and intricate datasets yielded by contemporary satellite sensors. The integration of machine learning (ML) has led to a significant paradigm shift, introducing sophisticated algorithms capable of discerning patterns, extracting features, and generating predictions with remarkable accuracy and efficiency [10]. The ability of ML to learn from data has significantly facilitated the advancement of ocean color remote sensing. Techniques such as neural networks for classifying oceanic phenomena and algorithms that forecast shifts in marine ecosystems are expanding our understanding of the ocean’s role within the Earth’s systems [11,12].

The integration of ML is not just a technological advancement but also an essential evolution, necessitated by the immense data volumes produced by contemporary satellite sensors and the increasing demand for precise, actionable insights [13]. ML algorithms have exhibited considerable efficacy in the processing and interpretation of complex datasets, thereby facilitating more accurate retrievals of oceanographic parameters, including chlorophyll-a concentrations (Chl-a) [14] and the partial pressure of CO₂ (pCO₂) at the air–sea interface [15,16]. The advancement of novel ML methodologies in ocean color remote sensing, such as neural networks for atmospheric correction [17,18,19] and self-organizing maps for pCO₂ estimation [20], has been pivotal in overcoming the limitations inherent in traditional data analysis techniques. These innovations have not only enhanced the accuracy of ocean color data interpretation but also significantly extended the spatial and temporal coverage of marine observations [21,22].

Although a previous review [23] discussed machine learning techniques for ocean color remote sensing, it focused primarily on the general trends and challenges associated with applying ML in this field and did not provide a systematic conceptual description or an outlook on future development. This study explores recent advances in machine learning applications within the field of ocean color remote sensing. We investigate the development and training of ML models and algorithms designed to analyze ocean color data, the innovative applications that are enhancing our understanding of marine processes, and the challenges and opportunities that lie ahead in this rapidly evolving field. This work introduces several key innovations that were not covered in the earlier work, including the integration of deep learning models for atmospheric correction and the use of multisource data fusion for more accurate and reliable ocean color retrievals. Furthermore, the current work introduces the application of physics-informed neural networks and explainable artificial intelligence, which were not discussed in the earlier study, offering a more advanced and comprehensive framework for interpreting ocean color data.

2. Fundamentals of Ocean Color Remote Sensing

2.1. Principles of Ocean Color Remote Sensing

Ocean color remote sensing is fundamentally based on the interaction between light and the surface and column of the ocean, which is influenced by the biological, chemical, and physical properties of the water. When light reaches the ocean, it is absorbed, scattered, and reflected by marine constituents such as phytoplankton, suspended sediments, and dissolved organic matter. This interaction produces a distinct spectral signature that varies across different wavelengths, enabling inference of the concentration of these constituents through analysis of the reflected light. Satellite sensors, equipped with specialized spectral bands, facilitate the detection of subtle variations in ocean color, which can be systematically translated into quantitative data regarding the optical properties of water.

The process of ocean color remote sensing relies on the precise measurement of water-leaving radiance—the fraction of light that emerges from the water after interacting with its constituents—and its subsequent conversion into meaningful environmental data. This process requires correction for atmospheric effects that can alter the light detected by sensors and involves the application of bio-optical models to link optical measurements to the biological and chemical properties of the water. The remote sensing reflectance, defined as the ratio of water-leaving radiance to incident light, serves as a standardized metric for these measurements, enabling comparisons across different temporal scales and sensor platforms [24]. The accuracy of these measurements is crucial for obtaining reliable information about marine ecosystems and environmental conditions.
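As a compact statement of the definition above, remote sensing reflectance is commonly written as follows. The notation is an assumption added here for illustration and does not appear in the original text: L_w is the water-leaving radiance and E_d(0^+) is the downwelling irradiance just above the sea surface, so that the ratio carries units of inverse steradians.

```latex
R_{rs}(\lambda) = \frac{L_w(\lambda)}{E_d(0^{+}, \lambda)} \qquad [\mathrm{sr}^{-1}]
```

Because the incident irradiance appears in the denominator, the quantity is normalized for illumination, which is what makes comparisons across acquisition times and sensors possible.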

2.2. Traditional Challenges in Data Interpretation and Analysis

Ocean color remote sensing provides a unique capability for monitoring the marine environment; however, the interpretation and analysis of the resulting data have historically presented significant challenges. One of the primary concerns is the atmospheric influence on light measured by remote sensors. The atmosphere can scatter and absorb light, complicating the distinction between the atmospheric signal and that originating from the ocean surface. Accurate interpretation of the data necessitates the use of sophisticated atmospheric correction models designed to estimate and mitigate these atmospheric effects. This process can be intricate and is particularly susceptible to errors in regions characterized by high aerosol concentrations or variable weather conditions [19,24,25].

In addition to atmospheric correction, the optical complexity of coastal waters presents another significant challenge. Coastal and inland waters frequently contain high concentrations of suspended sediments, colored dissolved organic matter (CDOM), and diverse types of phytoplankton, all of which contribute to complex spectral signals that are difficult to disentangle. Traditional approaches have employed bio-optical models to link optical measurements with the biogeochemical properties of water [26,27,28,29]. These models are typically based on empirical relationships derived from in situ measurements and vary in their complexity and accuracy. This variability presents a challenge for the accurate retrieval of key parameters, such as chlorophyll-a concentration [30], which is a crucial indicator of phytoplankton biomass.

The spatial and temporal resolutions of satellite sensors pose significant challenges for traditional data interpretation in ocean color remote sensing. The relatively coarse spatial resolution of certain sensors can limit their ability to capture small-scale features and transient events, such as algal blooms or upwelling phenomena. Moreover, the temporal resolution, often dictated by the orbital characteristics of the satellite, may lead to data gaps caused by cloud cover, particularly in regions prone to high cloudiness or frequent storm activity. These data gaps can hinder the capacity to conduct time-series analyses, which are essential for detecting trends and seasonal variations in marine ecosystems [31,32].

Moreover, the traditional approach to ocean color data interpretation has largely depended on handcrafted algorithms that are based on specific assumptions regarding the relationships between optical properties and biogeochemical parameters. These algorithms often require a priori knowledge of the water type and can be highly sensitive to changes in environmental conditions. For example, the presence of optically active constituents other than phytoplankton, such as CDOM or suspended sediments, can introduce biases in the retrieved chlorophyll concentrations if not properly accounted for [33,34].

Overall, traditional methods face several limitations. Atmospheric interference requires complex correction models. Optical complexity in coastal waters arises from high concentrations of suspended sediments, CDOM, and diverse phytoplankton, which generate complex spectral signals that traditional models struggle to separate. The low spatial and temporal resolutions of satellite sensors limit their ability to capture small-scale features, such as algal blooms or upwelling, and result in data gaps due to cloud cover. Handcrafted algorithms often rely on predefined assumptions about water types, which can introduce biases when conditions change. The advent of artificial intelligence (AI), particularly through ML, has opened new avenues for addressing the traditional challenges associated with ocean color remote sensing. ML algorithms are well equipped to detect complex patterns and relationships in data that conventional empirical methods may not capture effectively [35,36,37]. By harnessing the power of ML, researchers can significantly enhance atmospheric correction, overcoming the challenges posed by atmospheric interference, and more accurately interpret optically complex waters [38,39,40,41,42]. Additionally, ML has been demonstrated as an effective tool for improving spatial and temporal resolution, helping to detect small-scale phenomena such as algal blooms while filling data gaps caused by cloud cover [43]. ML can also help maintain sensor consistency across different platforms and adapt to dynamic environmental conditions [44], offering a more flexible, accurate, and scalable approach to ocean color remote sensing that addresses the limitations of traditional methods.

3. Machine Learning Models and Algorithms

Machine learning is a subset of artificial intelligence that focuses on the development of algorithms capable of learning from and making decisions on the basis of data. It involves the creation and application of statistical models that enable computers to improve their performance on a specific task through experience learned from data [45]. The essence of ML lies in constructing models that can generalize from the provided data to make predictions or decisions without being explicitly programmed for specific outcomes [46]. The ML process typically begins with data preprocessing, a critical phase involving the cleaning, normalization, and transformation of raw data into a format suitable for analysis [47]. This is followed by feature selection, wherein the most informative variables are identified to encapsulate the essence of the data [48]. The subsequent step is model selection, where an appropriate algorithm is chosen based on the nature of the problem, whether it involves classification, regression, or clustering. Selecting the right algorithm is vital to ensure that the model efficiently addresses the problem at hand [49]. The chosen model is then trained on a dataset, enabling it to recognize complex relationships and patterns. After training, it is imperative to evaluate the model’s performance via validation techniques, such as cross-validation, to ensure that it generalizes well to new data [50]. Ultimately, the trained model is deployed to make predictions or decisions on new data.
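The validation step described above can be sketched with k-fold cross-validation. The example below is a minimal illustration under stated assumptions, not a method from the reviewed literature: the data are synthetic, the model is a simple least-squares line, and the helper names `fit_line` and `k_fold_rmse` are hypothetical. Only the Python standard library is used.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def k_fold_rmse(xs, ys, k=5, seed=0):
    """Mean root-mean-square error on held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)          # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]     # k disjoint index subsets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        scores.append(mean(sq_err) ** 0.5)
    return mean(scores)

# Synthetic data (hypothetical): y = 2 + 3x plus small Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2 + 3 * x + rng.gauss(0, 0.1) for x in xs]
cv_rmse = k_fold_rmse(xs, ys)
```

Each sample is held out exactly once, so the averaged error estimates performance on data the model did not see during fitting.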

There are several types of ML, including supervised learning, unsupervised learning and reinforcement learning [51], each of which is suitable for different scenarios. Supervised learning is applied when the training data include labeled examples, enabling the model to learn how to predict outputs based on input features. The primary objective of supervised learning is to develop a model that can accurately map relationships between input data and corresponding output labels, thereby enabling it to make reliable predictions on new, unseen data. Typically, in supervised learning, the dataset is divided into a training set and a test set, where the model learns from the training set and subsequently has its performance evaluated on the test set. In contrast, unsupervised learning is utilized when the data lack labels, prompting the model to identify inherent structures or patterns within the data itself. The goal of unsupervised learning is for the model to uncover hidden structures or clusters in the data without any predefined output labels. This approach is frequently used in exploratory data analysis to reveal underlying patterns or groupings within a dataset. Reinforcement learning, on the other hand, centers on making a series of decisions aimed at maximizing a cumulative reward signal. The goal of reinforcement learning is for an agent to learn optimal decision-making strategies through interactions with its environment, where it continuously adjusts its actions based on the rewards or penalties received as feedback. This trial-and-error learning process enables the agent to develop strategies that maximize long-term rewards.
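To make the unsupervised case concrete, the sketch below groups unlabeled values with a one-dimensional version of Lloyd's k-means algorithm. It is an illustration only: the function name `kmeans_1d`, the sample values, and the starting centers are assumptions chosen for readability, and real applications would cluster multiband spectra rather than scalars.

```python
def kmeans_1d(values, centers, iterations=20):
    """Lloyd's algorithm in one dimension, starting from the given centers."""
    centers = list(centers)

    def nearest(v):
        # Index of the closest center; ties go to the lower index.
        return min(range(len(centers)), key=lambda j: abs(v - centers[j]))

    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            groups[nearest(v)].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    labels = [nearest(v) for v in values]
    return centers, labels

# No labels are supplied; the two groups emerge from the data alone.
values = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
centers, labels = kmeans_1d(values, centers=[0.0, 10.0])
```

A supervised method would instead be given the group membership of each training value and learn to reproduce it for new inputs.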

ML encompasses a diverse array of algorithms, each suited to different types of data and tasks. Linear models, known for their simplicity, are effective when dealing with linearly separable data. Decision trees and support vector machines (SVMs) are particularly powerful for both classification and regression tasks [30,52,53,54], with SVMs being renowned for their ability to handle high-dimensional spaces. Artificial neural networks (ANNs), inspired by the architecture of the human brain, excel in learning from vast amounts of data and form the foundation of deep learning. Deep learning involves multiple layers within neural networks, enabling the capture of intricate and complex patterns [21,22,55]. Ensemble methods, such as random forests [56,57] and gradient boosting machines [18,19,58,59,60], improve prediction accuracy and robustness by combining the strengths of multiple models. In the realm of unsupervised learning, clustering algorithms are employed to group similar data points without the need for prior labels. Dimensionality reduction techniques, such as principal component analysis (PCA), are utilized to simplify and visualize complex datasets [61,62], making them more manageable for analysis. The choice of an ML model is often dictated by the specific requirements of the task, the nature of the data, and the desired outcome. Each model has its unique strengths, making it well suited to particular types of problems. The field of ML is highly dynamic, with ongoing research continually introducing new algorithms and techniques to address an ever-expanding array of challenges.

As ML models have advanced in sophistication, they are increasingly being applied to complex domains such as ocean color remote sensing. Indeed, ML is particularly useful in ocean color remote sensing when the relationships between variables are nonlinear or too complex for traditional analytical models. Historically, radiative transfer models have been used to relate the observed remote sensing reflectance (Rrs) to the underlying oceanic properties. While these models are effective in simple cases, they struggle with more complex scenarios involving factors such as CDOM and suspended particles, which independently influence the water-leaving radiance. ML models, such as multi-layer perceptrons (MLPs), can better handle these complexities by capturing nonlinear relationships and processing multidimensional inputs from the data [16,63,64,65]. The process of machine learning for ocean color remote sensing begins with the collection of a large dataset from observations or simulations, which is subsequently divided into training and test sets. These datasets are used to train ML algorithms, allowing them to develop high-accuracy models through iterative parameter optimization. In the prediction phase, these models are employed to infer results from new data.
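The forward computation of a multi-layer perceptron can be sketched in a few lines. This is a minimal illustration assuming a single hidden layer with tanh activation and one linear output; the function name `mlp_forward` and all weights are hypothetical placeholders, whereas in practice the weights would be learned from the training set by iterative optimization.

```python
import math

def mlp_forward(x, w1, b1, w2, b2):
    """One hidden layer (tanh) followed by a linear output unit.

    x  : list of input features (for example, Rrs at several bands)
    w1 : hidden-layer weights, one row of len(x) values per hidden unit
    b1 : hidden-layer biases, one per hidden unit
    w2 : output weights, one per hidden unit
    b2 : output bias
    """
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

# Hypothetical weights, chosen only to show the call shape.
y = mlp_forward([0.004, 0.006, 0.003],
                w1=[[50.0, -20.0, 10.0], [-30.0, 40.0, 5.0]],
                b1=[0.1, -0.2],
                w2=[1.5, -0.7],
                b2=0.3)
```

The tanh nonlinearity is what lets the network represent relationships that a linear combination of the inputs cannot.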

The integration of ML with OCRS significantly improves prediction accuracy, enhances data processing capabilities, and provides better insights into environmental factors, facilitating a more comprehensive understanding of marine ecosystems (Figure 1a). Furthermore, this combination enables seamless coverage of large ocean areas, while also improving long-term and short-term predictions by analyzing complex patterns and factors such as atmospheric effects and phytoplankton blooms. The key advantage of ML in this field is its ability to improve prediction accuracy by incorporating complex environmental factors, which are difficult to model using traditional methods. Moreover, deep learning models eliminate the need for manual feature extraction by learning directly from the data, offering a more flexible and robust approach to solving inverse problems in ocean color remote sensing. The application of ML in this field highlights its transformative potential in enhancing our ability to analyze and understand complex environmental systems.

4. Machine Learning-Enhanced Ocean Color Remote Sensing

Since Heinemann first applied machine learning in ocean color remote sensing in 1997 [66], the exponential growth of data in this domain has facilitated the increasing integration of ML techniques. To quantify this trend, a bibliometric analysis was conducted by searching the Web of Science Core Collection with the following query:

“(TS = (artificial intelligence) OR TS = (machine learning) OR TS = (machine-learning) OR TS = (deep learning) OR TS = (deep-learning) OR TS = (data mining) OR TS = (data-mining) OR TS = (neural network *) OR TS = (transfer learning)) AND (TS = (ocean color remote sensing))”. The results of this search were then exported and analyzed via Bibliometrix ver4.3.3 [67].

The analysis highlights the dramatic rise in scientific production, as shown in Figure 1b, where the number of publications sharply increased after 2015, indicating a period of significant expansion in ML applications within this field. The co-occurrence network in Figure 1c offers a deeper understanding of the key themes driving research in this area. The network reveals the interconnections between crucial terms such as “machine learning”, “remote sensing”, “ocean color”, and “chlorophyll”, among others, emphasizing how these terms collectively define the focal points of current research. These interconnected themes highlight the central role that machine learning techniques are playing in ocean color retrieval, showing how advancements in computational methods are allowing for more accurate and comprehensive analysis of oceanic data. The evolution of key research topics over time, shown in Figure 1d, highlights the development and growing prominence of various themes at the intersection of machine learning and ocean color remote sensing. Deep learning and machine learning have seen a rapid increase in relevance starting from the early 2010s, signaling the growing integration of these computational techniques into oceanographic studies. This rise coincides with the broader adoption of remote sensing reflectance algorithms and advancements in atmospheric correction, both of which are critical for improving the accuracy of satellite-based ocean color measurements, particularly in the context of complex coastal waters and varying atmospheric conditions. The continuous increase in research on chlorophyll reflects its importance as a key parameter in ocean color remote sensing for monitoring phytoplankton growth and marine ecosystem health. The neural network term, representing a critical aspect of ML applied to remote sensing, has also demonstrated a significant uptick, emphasizing the growing use of neural network-based approaches for processing ocean color data. Harmful algal blooms and phytoplankton are additional prominent topics that have emerged as critical areas of focus, highlighting the role of ML models in detecting and predicting these phenomena through the analysis of ocean color data.

Table 1 lists the high-frequency keywords in the literature on machine learning for ocean color remote sensing, with terms such as “remote sensing”, “ocean color”, and “machine learning” appearing most frequently, reflecting the central themes in the field. Other notable keywords include “chlorophyll-a”, “atmospheric correction”, and “satellite remote sensing”, indicating key areas of focus in ML applications for ocean color remote sensing. The specific improvements of ML over canonical methods are summarized in Table 2.

4.1. Atmospheric and Optical Correction Innovations

Atmospheric correction represents a fundamental and critical step in numerous ocean color remote sensing algorithms. The primary objective of atmospheric correction is to eliminate the radiative contributions of the atmosphere—including those from air molecules, aerosols, and surface reflection—from the satellite-measured top-of-atmosphere (TOA) radiance. This refined radiance, known as water-leaving radiance, is essential for deriving ocean color products such as chlorophyll-a concentration. Notably, atmospheric contributions can constitute up to 90% of the TOA radiance observed by satellite sensors, with this influence being even more pronounced in coastal areas, particularly within the blue spectral band, or in highly turbid waters dominated by sediment. In these scenarios, the atmospheric influence diminishes in the red and near-infrared bands [75]. Consequently, the development of accurate atmospheric correction algorithms is crucial for the precise processing of ocean color data [24]. Early research often relied on various data interpolation methods, including statistical interpolation, linear regression, nonlinear regression, and multiple regression, to reconstruct marine atmospheric parameters. These methods typically utilize commonly available satellite data, encompassing variables such as air temperature, atmospheric pressure, humidity, wind speed, and aerosol optical depth.
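The figure of up to 90% quoted above explains why small errors in the atmospheric term matter so much. The sketch below works through the arithmetic under a deliberately simplified assumption that is not taken from the source: the TOA signal is treated as the plain sum of an atmospheric part and a water part, ignoring transmittance and glint. The function name `water_signal_error` is hypothetical.

```python
def water_signal_error(atm_fraction, atm_rel_error):
    """Relative error induced in the water term of the TOA signal.

    atm_fraction  : share of the TOA signal contributed by the atmosphere
    atm_rel_error : relative error made when estimating that atmospheric part

    Under the additive assumption TOA = atmosphere + water, an absolute error
    of atm_rel_error * atm_fraction is passed entirely to the water term,
    whose size is only (1 - atm_fraction).
    """
    if not 0 <= atm_fraction < 1:
        raise ValueError("atm_fraction must be in [0, 1)")
    return atm_rel_error * atm_fraction / (1 - atm_fraction)

# A 1% misestimate of an atmosphere that makes up 90% of the signal.
err = water_signal_error(0.90, 0.01)
```

In this example the 1% atmospheric error becomes roughly a 9% error in the water signal, a ninefold amplification.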

However, due to the diverse and complex factors influencing atmospheric parameters across different marine environments, traditional models often lack universality when applied to varying sea areas worldwide. Historically, research has predominantly concentrated on developing atmospheric correction algorithms tailored to open ocean environments, with relatively few studies addressing the unique challenges posed by inland waters and coastal regions characterized by high turbidity. In recent years, ML-based algorithms have been increasingly adopted for predicting specific marine weather phenomena [38,39,40,41,42]. The rapid advancement of ML has similarly opened new avenues for modeling the nonlinear relationships among global marine atmospheric parameters, taking inputs such as the solar zenith angle, viewing zenith angle, relative azimuth angle, and TOA radiance to retrieve Rrs, which is essential for accurate ocean color analysis. These ML algorithms, which account for the complex interplay of regional factors and various environmental influences, have been extensively applied in the field of atmospheric correction [17,74,76,77,78,79,80]. A notable development in this domain is the intelligent polarization atmospheric correction (IPAC) algorithm proposed by He et al. (2024). This algorithm leverages a vector radiative transfer simulation model to simulate polarized TOA vector apparent reflectance over open ocean areas. The extreme gradient boosting (XGBoost) algorithm is employed for atmospheric correction. Compared to conventional atmospheric correction algorithms, the IPAC algorithm demonstrates a significant improvement in the accuracy of retrieved ocean color products [18]. In a related advancement, Song et al. (2023) introduced the OC-XGBRT algorithm, which integrates XGBoost with radiative transfer simulation. This approach addresses and mitigates the limitations of traditional atmospheric correction algorithms, particularly in scenarios involving absorbing aerosols [19]. The Bayesian methodology proposed by Frouin et al. (2015) for inverting satellite ocean-color data uses a probability distribution framework to estimate marine reflectance from TOA reflectance, which can effectively model uncertainties and improve the reliability of ocean color retrievals [42].

Additionally, ANNs have emerged as critical tools in the atmospheric correction process. The FastMAPOL algorithm developed by Gao et al. (2021) employed a neural network (NN) model to efficiently retrieve aerosol properties and water-leaving signals from polarimetric data [81]. Wang et al. (2022) introduced an atmospheric correction algorithm based on NASA’s SeaDAS, known as ACANIR-NN. This algorithm effectively performs atmospheric correction across both clear and turbid waters, even for sensors that lack shortwave infrared capabilities [82]. In another study, Sun et al. (2021) proposed a time-dependent neural network designed for automatic atmospheric correction and target detection using multi-scan hyperspectral data collected at various elevation angles. This neural network is capable of estimating atmospheric bulk properties under conditions of seasonal and diurnal variations and can address the challenges posed by missing data [83]. Similarly, Li et al. (2020) developed a novel ANN algorithm by training it on extensive matchups between Rayleigh-corrected radiance from morning and evening GOCI (Geostationary Ocean Color Imager) observations and high-quality noontime Rrs. This approach yielded diurnally stable Rrs values in open ocean waters from morning to evening, effectively resolving data anomalies in GOCI inversions under high solar zenith angles [72].

Concurrent advancements are also being made in ML-based atmospheric correction inversion algorithms. Brockmann et al. (2016) further refined the Case 2 algorithm for Sentinel-2 satellites and developed the Case 2 Regional CoastColour (C2RCC) algorithm. This algorithm is versatile and applicable to all heritage, current, and potential future multispectral and hyperspectral ocean color sensors [84]. Additionally, Rusia et al. (2021) optimized an existing algorithm using ML by framing interpolation as a regression problem, significantly reducing data processing time and enhancing overall effectiveness [85]. Moreover, some researchers have proposed ML algorithms driven by coupled atmosphere-ocean radiative transfer simulations [86]. Compared with NASA’s standard products, these ML models, particularly those based on neural networks, exhibit superior performance under nearshore turbid water conditions. Unlike the traditional physical method-based lookup table and linear interpolation approaches, which are often complex and less adaptable, the nonlinear fitting capabilities of ML are better suited to handling intricate environments, especially in processing data from nearshore turbid areas and regions with absorbing aerosols.

4.2. Applications in Bio-Optical Property Retrieval

Significant progress has been made in monitoring Chl-a via satellite remote sensing technologies. A variety of algorithms have been developed to estimate Chl-a across different water types and seasonal conditions, employing multiple sensors. These include two-band ratio algorithms (BR), three-band algorithms (TBA), four-band algorithms (FBA), baseline subtraction algorithms (BS), nested band ratio algorithms (NBR), quasi-analytical algorithms (QAA), and QAA-based algorithms [26,29,87,88,89,90,91,92,93]. Furthermore, the optical water type (OWT) classification strategy has proven to be an effective tool for assessing optically complex waters. This approach facilitates the retrieval of Chl-a from widely dispersed in situ data [94,95,96].
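The simplest family in this list, the band-ratio algorithm, can be sketched as a polynomial in the logarithm of a blue-to-green reflectance ratio. The code below is an illustrative form only: the function name `band_ratio_chl` is hypothetical, and the coefficients are placeholders that would have to be fitted to in situ data for a given sensor and region. No operational coefficient values are reproduced here.

```python
import math

def band_ratio_chl(rrs_blue, rrs_green, coeffs):
    """Chl-a estimate from a polynomial in log10(Rrs_blue / Rrs_green).

    coeffs = (a0, a1, ...) are placeholder coefficients such that
    log10(Chl-a) = a0 + a1*X + a2*X**2 + ..., with X the log10 band ratio.
    """
    if rrs_blue <= 0 or rrs_green <= 0:
        raise ValueError("reflectances must be positive")
    x = math.log10(rrs_blue / rrs_green)
    log_chl = sum(a * x ** i for i, a in enumerate(coeffs))
    return 10 ** log_chl

# Hypothetical coefficients: a negative slope means greener water
# (lower blue-to-green ratio) yields a higher Chl-a estimate.
clear = band_ratio_chl(0.008, 0.002, coeffs=(0.0, -2.0))
green = band_ratio_chl(0.002, 0.004, coeffs=(0.0, -2.0))
```

Such empirical forms work well where phytoplankton dominate the optical signal, which is why more flexible ML models are attractive in waters where sediments and CDOM also shape the spectrum.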

Accurate prediction of coastal ocean Chl-a is essential for dynamic water quality monitoring, particularly given the critical role of eutrophication. The neural network method was introduced for chlorophyll pigment retrieval in ocean color remote sensing in 1999 [68,97]. Since then, the development of chlorophyll-a concentration retrieval methods based on the ANN method has been rapid [69,98,99,100]. The subsequent mixture density network (MDN) provides not only a single output but also a probability distribution, capturing uncertainty and multimodal predictions [73,101,102,103,104]. Gross-Colzy and Frouin (2003) proposed a methodology that uses PCA to retrieve marine reflectance and chlorophyll-a concentrations from satellite data [105]. The support vector machine algorithm has demonstrated strong performance in the inversion of Chl-a [52,53] and the mapping of phytoplankton function types [54]. The study by Niu et al. (2023) underscores the effectiveness of the Gaussian process regression (GPR) model in estimating Chl-a, highlighting the significance of including particulate organic carbon (POC) when modeling Chl-a [14]. Similarly, Cui et al. (2022) developed a Chl-a prediction model based on the XGBoost algorithm, employing an intelligent parameter optimization strategy. This model effectively uncovers the potential relationships between environmental factors and Chl-a in oceanic contexts [58]. In addition, XGBoost was used to derive Chl-a in turbid lakes from Landsat-8 Operational Land Imager [60]. Blix and Eltoft (2018) introduced an automatic model selection algorithm (AMSA) designed for global estimation of marine Chl-a in optically complex waters. This algorithm utilizes four ML feature ranking methods and three ML regression models, including Gaussian process regression (GPR), support vector regression (SVR), and partial least squares regression (PLSR) [30]. 
Additionally, Kolluru and Tiwari (2022) proposed a novel approach to derive Chl-a using an MLP neural network with resilient backpropagation, leveraging four ocean color bands available in most ocean color sensors [63]. Combined with self-organizing maps (SOMs), the accuracy of chlorophyll concentration retrieval can be significantly improved [106,107,108,109,110,111,112]. Moreover, Zhang et al. (2023) developed a spatial–temporal–ecological ensemble (STEE) model, which integrates gradient-boosted decision trees (GBDTs), ANN, and attentive interpretable tabular learning (TabNet) to construct a robust prediction framework for estimating Chl-a across eight distinct phytoplankton groups [59]. In another advancement, Zhang et al. (2023) employed deep learning techniques for spaceborne LiDAR, enabling the inversion of chlorophyll concentration in polar waters (as depicted in Figure 2a) [55]. This deep learning-based spaceborne LiDAR water parameter inversion model demonstrates a promising capability to extend the detection range of marine remote sensing.
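To make concrete what it means for a mixture density network to return a probability distribution rather than a single value, the sketch below summarizes a one-dimensional Gaussian mixture by its mean and standard deviation. The mixture parameters are hypothetical stand-ins for a network's output for one pixel; the network itself is not shown.

```python
import numpy as np

def mixture_summary(weights, means, sigmas):
    """Mean and standard deviation of a 1-D Gaussian mixture."""
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    sd = np.asarray(sigmas, dtype=float)
    w = w / w.sum()                               # normalise mixing weights
    mean = np.sum(w * mu)
    second_moment = np.sum(w * (sd**2 + mu**2))   # E[y^2] of the mixture
    return mean, np.sqrt(second_moment - mean**2)

# Hypothetical two-component output for one pixel, in log10(Chl-a) units
mix_mean, mix_std = mixture_summary([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
```

A bimodal mixture such as this one has a mean that lies between the two modes, which is why the full distribution, and not only a point estimate, is informative.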

Inherent optical properties (IOPs) are intrinsic characteristics of water bodies that depend solely on the constituents within the water, independent of external lighting conditions [113]. These properties are fundamental to oceanographic research and play crucial roles in understanding marine ecosystems, biogeochemical cycles, and ocean remote sensing. Many algorithms related to IOPs have employed the generalized stacked constraints model (GSCM) [114]. Compared to traditional regression models, ANNs offer advantages such as flexibility, fast computation, high reliability, and consistent output, particularly in handling nonlinear problems. In the field of ocean color, ANNs have been effectively utilized to derive various water quality parameters, including total IOPs [22,115,116], sub-component IOPs [71], and diffuse attenuation coefficients [70]. Zhang et al. (2023) reconstructed a spaceborne LiDAR water particulate backscattering inversion model using deep learning techniques [22]. Compared with traditional methods (as illustrated in Figure 2b), the deep learning approach provided results that were closer to the measured values, significantly enhancing the inversion accuracy. Additionally, random forest (RF) has been employed to retrieve the diffuse attenuation coefficient (Kd) from ICESat-2 ATLAS spaceborne LiDAR data, achieving high accuracy and efficacy [56].

The concentrations of optically significant marine components, such as phytoplankton, nonalgal particles, and CDOM, are critical in controlling light propagation in aquatic environments [117]. Liu et al. (2021) evaluated three ML algorithms—XGBoost, SVM, and ANN—for retrieving POC concentrations. Their study found that XGBoost was the most robust, whereas ANN was more effective in optically complex waters with extremely high POC concentrations [118]. Liu et al. (2022) developed an ML method to reconstruct monthly concentrations of dissolved inorganic nitrogen (DIN), dissolved inorganic phosphorus (DIP), and dissolved silicate (DSi) in the surface layer of the Yellow Sea and Bohai Sea from 2003 to 2019. This method is significant for monitoring spatiotemporal changes in nutrient concentrations in shelf seas, contributing to the understanding of marine primary productivity and ecological dynamics [119].

4.3. Enhanced Analysis of the Ocean Carbon Cycle

The role of the ocean in the carbon cycle can be estimated through the pCO₂ in seawater. Seawater pCO₂ is a critical parameter for quantifying air–sea CO₂ exchange, yet it exhibits high spatiotemporal heterogeneity. Consequently, reconstructing seawater pCO₂ datasets through observational methods, including ship surveys, buoys, and satellite remote sensing, is essential for accurately understanding the dynamics of the ocean carbon reservoirs [120]. Among these methods, satellite remote sensing of seawater pCO₂ holds significant potential for reducing observation time and costs, making it a promising approach. However, practical challenges remain in the remote sensing of seawater pCO₂ due to the complex and diverse environmental factors influencing its variability. In early studies, various data interpolation techniques, such as statistical interpolation, linear regression, nonlinear regression, and multiple regression, were commonly employed to reconstruct seawater pCO₂. These techniques utilize satellite-derived data, including sea surface temperature (SST), sea surface salinity (SSS), Chl-a, Kd, wind speed, and mixed layer depth (MLD) [121,122,123,124,125,126,127].

However, because diverse and complex factors influence seawater pCO₂ across different oceanic regions, traditional models often lack universal applicability at the global scale. The rapid development of ML has provided a new tool to capture the nonlinear relationships between global marine ecological environmental parameters and seawater pCO₂. ML algorithms, which account for the complex interplay of various factors across different regions, have been widely applied in the construction of global seawater pCO₂ datasets [12,20,64,128,129,130,131,132,133]. Landschützer et al. (2013) pioneered the use of a two-step neural network based on self-organizing maps to classify Atlantic seawater pCO₂, successfully reconstructing the monthly average seawater pCO₂ in the Atlantic with a spatial resolution of 1° for each region and season [130]. This remote sensing inversion algorithm has since been widely applied, enabling the inversion of seawater pCO₂ in various regions, including the global open ocean [134], global marginal seas [12], the Arctic [131], and the Northeast Pacific [135]. Another significant algorithm in the reconstruction of seawater pCO₂ is the multilayer perceptron (MLP). Jo et al. (2012) utilized MLPs to create a comprehensive dataset of marine parameters, facilitating the estimation of the global carbon budget [65]. Zeng et al. used MLPs to reconstruct monthly average distribution datasets for the tropical Atlantic [136] and the global ocean [137]. Denvil-Sommer et al. (2019) proposed the use of an MLP model to achieve the inversion of seawater pCO₂ in the open ocean [64] and marginal seas [133]. Zhang et al. (2022, 2023) further expanded the application of MLPs to the Arctic [16] and the global day/night carbon cycle [138] utilizing spaceborne LiDAR data. Their work addressed the gaps in traditional datasets during the polar winter, marking the first time the nighttime distribution of global seawater pCO₂ has been depicted (as shown in Figure 3).

4.4. Development of Data Reconstruction Methods Based on Machine Learning

By integrating multiple remote sensing data sources with in situ observational data and utilizing multisource data fusion techniques alongside the processing power of large models, a comprehensive marine environmental monitoring framework can be developed. These models are capable of accurately reflecting trends and changes in the marine environment. However, the complexity of marine environments poses significant challenges in obtaining comprehensive and precise data, often necessitating substantial costs and advanced technical expertise. In many fields, including vertical temperature and salinity, the accumulated marine data remain insufficient for practical use [139]. Additionally, traditional global ocean models frequently exhibit low resolution, which limits their ability to capture regional details effectively [140]. The sparsity and discontinuity of observational data—constrained by financial and technical limitations—seriously hinder the study of marine processes and mechanisms, resulting in an incomplete understanding of local marine phenomena. To address these challenges, data reconstruction techniques serve as widely utilized and effective solutions, offering a means to provide continuous and comprehensive marine datasets.

In traditional data reconstruction tasks, commonly used methods include data assimilation based on dynamic frameworks, numerical-driven statistical approaches such as empirical orthogonal functions (EOFs) to infer missing data, and spatial autocorrelation-based data interpolation techniques like Kriging/optimal interpolation (OI) [141,142,143]. An advanced extension of EOF, the data interpolating empirical orthogonal functions (DINEOF), has demonstrated significant effectiveness in reconstructing marine remote sensing data. DINEOF utilizes the dominant modes of variability within the dataset to iteratively infer missing values without requiring external datasets or prior assumptions [144,145]. It has been widely applied in the reconstruction of variables such as SST [146], Chl-a [147], and other oceanographic parameters [148] by filling data gaps caused by cloud cover, sensor limitations, or noise.
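The iterative idea behind DINEOF can be sketched as repeated truncated singular value decompositions: gaps are initialized at the mean, the field is reconstructed from a few dominant modes, and only the gap values are updated until they stop changing. This is a simplified illustration under stated assumptions, not the reference implementation: the function name is hypothetical, the number of modes is fixed by hand (the full method selects it by cross-validation), and the test field is synthetic.

```python
import numpy as np

def eof_gap_fill(data, n_modes=2, n_iter=100, tol=1e-8):
    """Fill NaN gaps in a (time x space) matrix by iterated truncated SVD."""
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    mean = np.nanmean(data)
    # First guess: zero anomaly (i.e. the mean) at every missing point
    filled = np.where(missing, 0.0, data - mean)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        recon = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
        change = np.sqrt(np.mean((recon[missing] - filled[missing]) ** 2))
        filled[missing] = recon[missing]      # observed values stay untouched
        if change < tol:
            break
    return filled + mean

# Synthetic field: two space-time modes plus a constant offset
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 40)[:, None]
x = np.linspace(0, np.pi, 30)[None, :]
truth = np.sin(t) * np.cos(x) + 0.5 * np.cos(2 * t) * np.sin(2 * x) + 15.0
gaps = rng.random(truth.shape) < 0.2          # about 20% simulated cloud gaps
gappy = truth.copy()
gappy[gaps] = np.nan

filled = eof_gap_fill(gappy, n_modes=3)
rmse_eof = np.sqrt(np.mean((filled[gaps] - truth[gaps]) ** 2))
rmse_mean = np.sqrt(np.mean((np.nanmean(gappy) - truth[gaps]) ** 2))
```

Comparing `rmse_eof` with `rmse_mean` shows how much the dominant modes improve on a plain mean fill for this synthetic case.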

However, these methods often face challenges, including difficulties in model construction, limited applicability to specific regions, challenges in handling nonlinear relationships, and high sensitivity to outliers. In recent years, ML methods have gained significant traction in the field of marine data reconstruction due to their ability to efficiently manage complex, multi-source, and multi-scale data [43]. ML models exhibit strong adaptability and scalability, making them applicable to diverse datasets and tasks [120]. Additionally, their automated and intelligent nature reduces human intervention and the potential for errors, thereby enhancing the efficiency and quality of data processing. By constructing complex mathematical models, ML can fill data gaps and reduce uncertainty [149]. ML has been employed to optimize interpolation methods for data reconstruction [150,151,152,153,154]. Moreover, ML can complete existing datasets [155,156,157] and generate new high-quality datasets [57,158,159].

In various studies, researchers have employed different ML techniques to optimize interpolation methods for predicting missing continuous data. For instance, Martinez et al. (2020) used a nonlinear statistical approach based on SVR to reconstruct global chlorophyll-a variations, demonstrating its ability to accurately reproduce interannual and decadal variability in Chl-a from satellite observations and physical model predictors [160]. Mohebzadeh et al. (2021) utilized SVR to perform spatiotemporal interpolation of chlorophyll concentration based on MODIS satellite data and other environmental parameters [153]. Similarly, Ouala et al. (2018) implemented a neural network-based Kalman filter, integrating satellite-derived sea surface temperature and sea surface height data, to carry out spatiotemporal interpolation of sea surface temperature data [154]. Roussillon et al. (2023) proposed a multi-mode convolutional neural network to reconstruct global satellite-derived Chl-a time series, accounting for regional biogeochemical variations and improving reconstruction performance while offering insights into physical–biogeochemical processes controlling phytoplankton variability [161]. Zhang et al. (2024) applied the optimal interpolation model to fuse data from multiple sources, including MERSI-II and MWR SST data, reconstructed data, and background field data, resulting in high-quality SST fusion products with a temporal resolution of 12 h and a spatial resolution of 5 km [151]. Cutolo et al. (2024) proposed the CLOINet model, which was trained to optimize the interpolation network by combining remote sensing data, in situ sparse observations, and deep learning techniques. This approach aims to reconstruct a comprehensive marine state image encompassing multiple ocean parameters, such as sea surface temperature, salinity, and chlorophyll concentration [150].
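Several of the studies above build on optimal interpolation, whose standard analysis step corrects a background field with weighted observation departures. The sketch below applies that textbook update on a one-dimensional grid. The Gaussian background covariance, the error standard deviations, and all names are assumptions made for illustration; they do not reproduce the configuration of any cited study.

```python
import numpy as np

def optimal_interpolation(background, obs, obs_idx, grid,
                          length_scale, sigma_b, sigma_o):
    """One OI analysis step: x_a = x_b + B H^T (H B H^T + R)^-1 (y - H x_b)."""
    grid = np.asarray(grid, dtype=float)
    dist = np.abs(grid[:, None] - grid[None, :])
    # Assumed Gaussian-shaped background error covariance
    B = sigma_b**2 * np.exp(-0.5 * (dist / length_scale) ** 2)
    # Observation operator: picks grid points that carry an observation
    H = np.zeros((len(obs_idx), grid.size))
    H[np.arange(len(obs_idx)), obs_idx] = 1.0
    R = sigma_o**2 * np.eye(len(obs_idx))      # uncorrelated observation errors
    innovation = obs - H @ background
    return background + B @ H.T @ np.linalg.solve(H @ B @ H.T + R, innovation)

# A single accurate observation at the centre of an 11-point grid
grid = np.arange(11.0)
background = np.zeros(11)
analysis = optimal_interpolation(background, obs=np.array([1.0]),
                                 obs_idx=np.array([5]), grid=grid,
                                 length_scale=2.0, sigma_b=1.0, sigma_o=0.01)
```

The analysis nearly matches the observation at its own grid point and relaxes toward the background with distance, at a rate set by `length_scale`.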

Researchers have also employed ML techniques to reconstruct low-quality marine datasets impacted by noise and cloud cover, thereby enhancing dataset integrity. For example, Hirahara et al. (2019) utilized a generative adversarial network (GAN) to learn the distribution of sea surface temperature images. By integrating physical model constraints into the GAN’s loss function, they improved the model’s accuracy and physical plausibility, resulting in the denoising and restoration of sea surface temperature images [156]. Similarly, Jouini et al. (2013) used neural networks to identify and reconstruct areas obscured by clouds, generating complete chlorophyll images [157]. Barth et al. (2021) applied a multivariate convolutional neural network, DINCAE 2.0, to reconstruct sea surface temperature and altimeter observation data, effectively filling data gaps caused by cloud cover and orbital gaps [155].

In addition to these methods, other researchers have integrated multiple data sources to create new, more comprehensive, and accurate marine datasets. For instance, Park et al. (2020) employed a random forest method based on ensemble learning to integrate remote sensing chlorophyll concentration data from the Ross Sea with various environmental parameters (such as sea ice, water depth, air temperature, etc.), resulting in more complete and accurate chlorophyll data [57]. Similarly, Ćatipović et al. (2023) used a GAN to reconstruct more complete and accurate chlorophyll concentration data by integrating satellite-derived chlorophyll data with other related datasets [158].

5. Challenges and Opportunities

5.1. Challenges

5.1.1. Limitations in Generalization and Model Adaptability

While ML has been widely used in ocean color remote sensing and excels at regression fitting, it does not generate any “new” information beyond the distribution of its training data [162]. This limitation is particularly evident in the context of the ocean’s inherent complexity and dynamic nature, where ML models can struggle with inadequate generalization, particularly when trained on limited or biased datasets [163]. As a result, models trained on region-specific datasets often fail when applied to other areas, due to variations in water properties, optical characteristics, and biological activity [163]. This limitation is further exacerbated by shifts in data distribution, which is a common issue in marine environments. For example, seasonal changes in phytoplankton dynamics or atmospheric conditions can significantly alter the spectral reflectance captured by remote sensors, leading to performance degradation in pre-trained models. Ensuring robust performance across varying spatial and temporal scales remains a significant challenge, highlighting the need for well-designed data segmentation and evaluation strategies to simulate intended use cases effectively.

In ocean remote sensing, changes in data distribution are common, and ML models are highly sensitive to these variations. Many ML models are also highly sensitive to local biases in training data, limiting their application to global-scale monitoring. Transferring models across sensors or geographic regions often requires significant pretraining or adaptation [70], which can be overlooked. Moreover, spatial and temporal autocorrelations, which occur when nearby data points or those taken at similar times exhibit dependencies, can further impact model performance. These dependencies can introduce information leakage between training and test datasets, leading to overestimated model performance [164]. To address this, it is critical to adopt appropriate sampling strategies. Although widely used, random sampling methods can introduce biases and negatively affect model accuracy [165]. Spatial and temporal fold cross-validation is a more robust approach, offering a more accurate representation of model errors [59]. Additionally, when simulation datasets are used, it is essential to validate model performance with additional measured datasets [18,19]. In addition, domain adaptation techniques, such as transfer learning, can further improve model robustness across regions and sensors, enabling their application to diverse datasets and environments [166].
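One simple way to limit leakage from spatial autocorrelation is to assign whole latitude-longitude blocks, not individual samples, to cross-validation folds. The sketch below is a minimal version of this idea; the function name, the 5° block size, and the random sample positions are illustrative assumptions.

```python
import numpy as np

def spatial_block_folds(lat, lon, block_deg=5.0, n_folds=5, seed=0):
    """Assign samples to folds by lat/lon block so neighbours stay together."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    block_id = np.stack([np.floor(lat / block_deg),
                         np.floor(lon / block_deg)], axis=1)
    unique_blocks, block_index = np.unique(block_id, axis=0,
                                           return_inverse=True)
    block_index = block_index.ravel()
    rng = np.random.default_rng(seed)
    # Shuffle the blocks, then deal them out to folds in turn
    fold_of_block = rng.permutation(len(unique_blocks)) % n_folds
    return fold_of_block[block_index]

# Hypothetical sample positions
rng = np.random.default_rng(42)
lat = rng.uniform(-60, 60, 300)
lon = rng.uniform(-180, 180, 300)
folds = spatial_block_folds(lat, lon)
```

Each fold can then be held out in turn, so that test samples never share a block with training samples. The same pattern applies to time by blocking on, for example, year or cruise.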

5.1.2. Data Availability and Quality

A significant challenge in applying ML to marine environments is the quality and availability of training data. As encapsulated by the principle of “garbage in, garbage out” [167], ML models depend on large, high-quality datasets for effective training. However, collecting such datasets in marine environments is both costly and logistically challenging. The high cost and technical limitations of data collection result in insufficient spatial and temporal coverage, leaving critical gaps in the datasets. Additionally, data noise and biases are often underestimated, significantly impacting model performance. For example, noise in certain satellite datasets can directly increase the uncertainty of inversion results. Furthermore, the dynamic nature of the ocean necessitates continuous updates and retraining of models to maintain accuracy over time. A case in point is the combined use of Argo float and satellite monitoring data, where sparse temporal coverage makes it difficult for models to accurately capture small-scale phenomena. To address these challenges, several solutions have been proposed. Self-supervised learning techniques can reduce dependence on labeled data by leveraging latent information in unlabeled datasets to improve model performance [168]. Additionally, the integration of process-guided deep learning, which combines physical constraints with statistical models [169,170], has shown promise in enhancing adaptability to sparse datasets. Initiatives such as the global Ocean Color Climate Change Initiative (OC-CCI) aim to build open data platforms that promote the sharing and standardization of high-quality data, further enabling robust and scalable ML applications in ocean research [171,172].

5.1.3. Computational Complexity and Resource Limitations

Another challenge is the computational complexity associated with many ML algorithms. The processing power and time required to train sophisticated models can be substantial, which may hinder the feasibility of real-time applications or the analysis of extensive datasets [173]. For example, training deep learning models on global-scale satellite datasets, such as multiyear ocean color data from MODIS or Sentinel-3, often involves processing petabytes of data, requiring extensive GPU resources and days or weeks of computation. This poses significant barriers for resource-limited research institutions. The complexity of deep learning models not only limits their scalability but also makes real-time applications nearly impossible. Traditional optimization algorithms, while computationally more efficient, often fail to provide the level of accuracy needed, further restricting the practical application of these models [174]. To address these issues, potential solutions include leveraging cloud computing and distributed computing technologies [175] to reduce hardware costs and improve computational efficiency. Additionally, the development of low-complexity models, such as lightweight neural networks [176] optimized for sparse data, can significantly reduce training time and energy consumption while maintaining acceptable performance levels. These approaches not only make ML models more accessible to institutions with limited resources but also increase their scalability and adaptability for broader applications in ocean science.

Furthermore, as oceanographic monitoring increasingly involves remote or field-based applications, developing efficient AI models suitable for edge devices [177] (e.g., drones, autonomous vehicles, or mobile platforms) can help reduce latency, improve real-time decision-making, and increase operational feasibility. By deploying AI models directly on edge devices, the need to transmit vast amounts of data to centralized servers is minimized, enabling faster response times and more efficient use of available resources. Key strategies to achieve this include model pruning [178], where unnecessary parts of the model are removed to reduce its size, knowledge distillation, which transfers knowledge from a larger model to a smaller one, and algorithm optimization, which ensures that AI models run effectively in low-power, resource-constrained environments. These methods are essential for ensuring that AI-powered oceanographic monitoring systems can function effectively in remote locations, where traditional computational resources are limited. Such models are crucial for autonomous marine vehicles or drones used for continuous monitoring in challenging environments, providing real-time feedback for more timely and informed decision-making.
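As an illustration of pruning, the sketch below zeroes the smallest-magnitude weights of a layer. Magnitude is only one possible criterion for deciding which parts of a model are unnecessary, and it is chosen here as an assumption for simplicity; the function name and the example weights are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights; returns (pruned, keep_mask)."""
    w = np.asarray(weights, dtype=float)
    k = int(round(sparsity * w.size))          # number of weights to remove
    if k == 0:
        return w.copy(), np.ones(w.shape, dtype=bool)
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    keep = np.abs(w) > threshold               # ties at the threshold are pruned
    return w * keep, keep

layer = np.array([[0.1, -2.0],
                  [0.05, 1.5]])
pruned, keep = magnitude_prune(layer, sparsity=0.5)
```

In practice a pruned network is usually fine-tuned afterwards to recover accuracy, a step that this sketch omits.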

5.1.4. Model Interpretability and Transparency

Model interpretability presents another critical limitation, particularly for deep learning techniques. While these models are based on well-defined algorithms and cost functions, their complexity arises from the high-dimensional data they process, which can make it challenging to visualize the relationships they model. For instance, in ocean color remote sensing tasks such as atmospheric correction, deep learning models may capture statistical correlations within the data rather than true causal relationships. This can lead to failures under changing environmental conditions or unseen scenarios, raising concerns about the reliability of predictions. Unlike polynomial fitting, which can be represented in a two-dimensional input-output space (e.g., Chl-a versus Rrs slope), ML models often deal with multidimensional data that cannot be easily visualized. However, the advantage of ML models is their flexibility, which allows them to require less preprocessing and efficiently minimize cost functions, making them valuable tools for complex ocean color remote sensing problems.

5.1.5. Explainable AI for Ocean Color Remote Sensing

As discussed in Section 5.1.4, model interpretability is a key challenge in ocean color remote sensing, particularly for deep learning models that handle high-dimensional and complex data. While model transparency is essential for scientific and practical applications, explainable AI (XAI) techniques [179] offer specialized solutions to address the lack of interpretability, which can undermine trust in model predictions. XAI focuses on making the “black-box” nature of machine learning models more transparent, allowing users to understand how predictions are made and ensuring that these predictions align with physical principles governing the ocean environment.

XAI methods such as Shapley additive explanations (SHAP) [180] and local interpretable model-agnostic explanations (LIME) [181] are critical tools in this context. They provide insights into the contributions of individual input features to the output, enabling researchers to trace and validate the model’s decision-making process. These tools can help clarify how environmental variables, such as chlorophyll concentration or water turbidity, influence the model’s predictions, providing confidence in its results, especially when applied to unseen data or novel environmental conditions. Furthermore, the integration of physics-informed neural networks (PINNs) offers a powerful way to enhance both model interpretability and scientific validity [182]. By embedding physical constraints, such as radiative transfer models, within AI architectures, PINNs ensure that the model’s predictions remain consistent with established oceanographic principles. This hybrid approach addresses the challenge of physical interpretability, ensuring that AI predictions are not only understandable but also scientifically grounded. Overall, XAI serves a dual purpose: improving the transparency of AI models while ensuring that they remain aligned with the physical processes inherent in ocean color remote sensing. The combination of XAI techniques and hybrid models will be essential for advancing oceanographic research, enabling AI models to be more reliable, scientifically valid, and easier to interpret, fostering trust and facilitating their application in real-world scenarios.
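The attribution idea underlying SHAP can be shown with an exact Shapley value computation for a model with very few inputs. The sketch assumes that an absent feature is represented by a baseline value, and it enumerates all feature subsets, which is only feasible for a handful of features. It does not use or reproduce the API of the SHAP library, and the toy model is hypothetical.

```python
import itertools
import math
import numpy as np

def shapley_values(predict, x, baseline):
    """Exact Shapley values with absent features set to a baseline value."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                idx = np.array(subset, dtype=int)
                z = baseline.copy()
                z[idx] = x[idx]                # features present in the subset
                without_i = predict(z)
                z[i] = x[i]                    # add feature i
                phi[i] += weight * (predict(z) - without_i)
    return phi

# Toy model: one additive input and one pairwise interaction
toy_model = lambda v: 2.0 * v[0] + v[1] * v[2]
phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

The attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term is shared equally between the two inputs that form it.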

5.2. Opportunities and the Way Forward

5.2.1. Harnessing Technological Innovation and Computational Advances

Despite these challenges, ML is rapidly evolving and presents significant opportunities for advancing ocean color remote sensing. The combination of emerging algorithms and powerful computational infrastructures has opened new frontiers for analyzing large, complex datasets. For example, advanced architectures such as transformers and self-supervised learning techniques [183] could help researchers process and extract insights from satellite imagery and in situ observations, uncovering patterns that would otherwise remain hidden. A notable example is the CyanoTracker project, which employs advanced DL and ML methods to monitor and predict cyanobacterial harmful algal blooms in near real-time [184]. Using satellite-based remote sensing data and in situ observations, models such as CNN and long short-term memory networks detect and forecast bloom dynamics, providing timely alerts for water resource management and environmental protection. This ability to combine multiple data sources and generate actionable insights exemplifies the power of ML in addressing pressing environmental challenges. Beyond algorithmic innovation, advances in computational infrastructure have dramatically enhanced the feasibility of ML applications. Cloud-based platforms, such as Google Earth Engine, and distributed computing systems have significantly reduced the computational costs associated with processing large-scale satellite data, enabling a broader range of research institutions to adopt these technologies. Additionally, hardware innovations such as tensor processing units (TPUs) and neural network accelerators have improved the speed and efficiency of training ML models. These developments pave the way for more responsive monitoring systems, allowing researchers to address challenges such as detecting transient oceanographic phenomena or generating high-resolution spatial predictions.

5.2.2. Leveraging Open Data and Interdisciplinary Collaboration

The availability of high-quality, accessible data is fundamental to the success of ML in ocean color remote sensing. Open data initiatives such as the Ocean Color Climate Change Initiative [171,172], Tara Oceans [185,186], and Argo Program [187,188] play pivotal roles in providing high-quality, freely accessible datasets for the research community. In addition to these global efforts, platforms such as the Open Science Framework and Dryad [189] further ensure that these datasets adhere to FAIR (findable, accessible, interoperable, and reusable) principles, facilitating their integration into ML workflows. Moreover, initiatives such as EcoDataScience and Openscapes [190] promote inclusive and transparent data science practices, building capacity for collaborative research in marine environments. By investing further in the collection, curation, and sharing of both satellite imagery and in situ observations, these efforts not only enhance the quality of available data but also foster innovation in ML applications, enabling models to generate more robust and reliable predictions.

Interdisciplinary collaboration is equally critical in ensuring that ML models are scientifically rigorous and operationally effective. Oceanographers provide essential domain knowledge, guiding the selection of meaningful features and ensuring that predictions align with physical and biological processes. Remote sensing specialists ensure that input data are accurately calibrated, whereas data scientists optimize ML algorithms to handle high-dimensional datasets efficiently. By integrating these disciplines, collaborative teams can address key challenges such as data heterogeneity, overfitting, and generalization. Moreover, these collaborations facilitate the development of hybrid models (discussed further in Section 5.2.3) and the adoption of interpretability tools such as SHAP, which enhance model transparency by quantifying the contributions of individual input variables. To maximize the potential of these partnerships, initiatives such as joint training programs, interdisciplinary workshops, and shared research platforms should be promoted. These efforts will help bridge knowledge gaps and foster innovation across disciplines.

5.2.3. The Use of Hybrid Models for Scientific and Practical Applications

Data-driven scientific exploration should not overlook the importance of understanding the underlying systems. While ML excels at making predictions, it is important to complement these predictions with inferences about the underlying mechanisms. Hybrid modeling represents a powerful tool to achieve this balance by combining the strengths of data-driven ML techniques with the mechanistic understanding provided by physical models. By embedding physical constraints into ML architectures, hybrid models ensure that predictions are not only accurate but also consistent with established scientific principles. For example, radiative transfer models can be incorporated into neural networks to constrain outputs, reducing the likelihood of spurious correlations and improving generalizability. Several studies have demonstrated the benefits of hybrid modeling in ocean color remote sensing. For instance, incorporating physical feature information into models can increase their accuracy and generalizability, as illustrated in Figure 4 [21,191]. This integration is particularly valuable in scenarios where observational data are sparse or when models must extrapolate beyond the range of the training dataset. By incorporating physical feature information, such as light attenuation or nutrient transport, hybrid models enhance not only predictive power but also model robustness under varying environmental conditions. By revealing which variables drive predictions, SHAP can identify potential biases or limitations in the data and ensure that models are interpretable and aligned with real-world processes. The combination of physical consistency, interpretability, and improved generalizability makes hybrid modeling a transformative approach for advancing Earth system science and operational oceanographic monitoring. 
By bridging the gap between data-driven exploration and mechanistic understanding, hybrid models can provide actionable insights while retaining the transparency needed for scientific rigor and practical decision-making.

Another way to enhance the embedding of physical processes with AI models is through the use of PINNs [182]. PINNs combine the power of machine learning with the constraints imposed by physical laws, ensuring that the models’ predictions remain consistent with known physical processes such as light absorption, scattering, and the behavior of ocean constituents. This integration not only enhances the interpretability of AI models but also improves their generalization capability across different environments, reducing the risk of overfitting and making the models more robust [192]. Moving forward, effective integration of AI with physical principles will involve developing hybrid models where physical laws are incorporated into the loss function of AI algorithms, guiding them toward physically plausible solutions. This approach facilitates the training of AI models that are interpretable and better aligned with real-world oceanographic phenomena. The advancement of PINNs in this context will drive the future of ocean color remote sensing by enhancing both the transparency of AI models and their ability to generalize when exposed to new, unseen data. This development will be crucial for the reliable interpretation and application of remote sensing data in diverse ocean environments.
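The idea of placing a physical law in the loss function can be sketched with a deliberately simple constraint. The example assumes exponential light attenuation with depth, dE/dz = -Kd·E, as the physical law, and it penalizes both the misfit to observations and the residual of that equation. Finite differences stand in for the automatic differentiation that a real PINN would use, and all names and values are hypothetical.

```python
import numpy as np

def physics_informed_loss(pred, obs, depth, kd, weight=1.0):
    """Data misfit plus the squared residual of dE/dz + Kd * E = 0."""
    pred = np.asarray(pred, dtype=float)
    data_term = np.mean((pred - obs) ** 2)
    dpred_dz = np.gradient(pred, depth)        # finite-difference derivative
    physics_term = np.mean((dpred_dz + kd * pred) ** 2)
    return data_term + weight * physics_term

depth = np.linspace(0.0, 20.0, 201)            # metres
kd = 0.1                                       # assumed attenuation, 1/m
obs = np.exp(-kd * depth)                      # synthetic normalised irradiance

loss_consistent = physics_informed_loss(np.exp(-kd * depth), obs, depth, kd)
loss_linear = physics_informed_loss(1.0 - depth / 20.0, obs, depth, kd)
```

A profile that obeys the assumed law incurs almost no penalty, whereas the linear profile is penalized by both terms; `weight` sets the balance between fitting data and honoring the constraint.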

5.2.4. Building Capacity for Uncertainty-Aware Models

Uncertainty quantification is a critical component of deploying ML in complex and dynamic marine environments. Reliable uncertainty estimates help researchers assess the confidence of model predictions and guide decision-making in applications such as resource management or environmental risk assessment. Several key techniques can be considered: (1) MC-Dropout introduces randomness at prediction time. By keeping dropout layers active during inference and repeating the forward pass, the model effectively samples many different subnetworks, and the spread of their outputs provides an uncertainty estimate [193,194]. Dropout also acts as a regularizer that can improve generalization on small samples, and the method has the potential to estimate uncertainty in ocean remote sensing data. (2) Deep ensembles improve overall model performance and uncertainty estimation by training multiple deep learning models and combining their predictions [195,196]. Each member is trained from a different random initialization, and optionally on a slightly different subset of the data, so the disagreement among members offers a more comprehensive perspective on prediction uncertainty. This approach is especially beneficial given the variability and complexity inherent in ocean remote sensing data. (3) Pixelwise uncertainty estimation is crucial for generating robust and reliable inversion products. Statistical methods such as bootstrapping and Bayesian approaches have been adopted to assess the uncertainty of each predicted pixel and can provide a confidence interval for each prediction [197], helping users understand the credibility of model predictions. Introducing these techniques can improve a model’s ability to assess uncertainty in ocean remote sensing data and enhance the reliability of inversion products, so that predictions remain trustworthy in complex and uncertain marine environments.
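The following Python sketch illustrates two of these ideas under simplifying assumptions: MC-Dropout as it is commonly described (dropout kept active across repeated inference passes) and a percentile interval computed across the members of an ensemble or bootstrap. The tiny two-layer network, its random weights, and the function names are hypothetical stand-ins for a trained retrieval model, not an implementation from the cited studies.

```python
import numpy as np

def mc_dropout_predict(x, w1, w2, p_drop=0.2, n_passes=200, seed=0):
    """Repeat the forward pass with dropout left on; return mean and std."""
    rng = np.random.default_rng(seed)
    keep = 1.0 - p_drop
    outputs = []
    for _ in range(n_passes):
        hidden = np.maximum(x @ w1, 0.0)           # ReLU hidden layer
        mask = rng.random(hidden.shape) < keep     # fresh dropout mask per pass
        hidden = hidden * mask / keep              # inverted-dropout scaling
        outputs.append(hidden @ w2)
    outputs = np.stack(outputs)                    # (n_passes, n_pixels, 1)
    return outputs.mean(axis=0), outputs.std(axis=0)

def ensemble_interval(member_predictions, level=0.9):
    """Per-pixel percentile interval across ensemble or bootstrap members."""
    tail = (1.0 - level) / 2.0
    return np.quantile(member_predictions, [tail, 1.0 - tail], axis=0)

rng = np.random.default_rng(42)
w1 = rng.normal(size=(5, 16))     # assumed weights: 5 bands -> 16 hidden units
w2 = rng.normal(size=(16, 1))     # assumed weights: 16 hidden units -> 1 output
pixels = rng.random((4, 5))       # 4 pixels x 5 reflectance bands (synthetic)

mean, std = mc_dropout_predict(pixels, w1, w2)
mean0, std0 = mc_dropout_predict(pixels, w1, w2, p_drop=0.0)

# Stand-in for the outputs of 10 independently trained models on 4 pixels.
members = rng.normal(loc=1.0, scale=0.1, size=(10, 4))
lower, upper = ensemble_interval(members)
print(std.ravel(), lower, upper)
```

With p_drop set to zero every pass is identical, so the reported standard deviation collapses to zero. This shows that the spread comes entirely from the dropout masks and therefore depends on the chosen dropout rate.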

5.2.5. Data Fusion and Integration from Multiple Platforms

In ocean color remote sensing, integrating data from multiple platforms, such as satellites, in situ sensors, and aerial imagery, enhances the accuracy and robustness of models. Combining these diverse data types provides a more comprehensive understanding of environmental conditions, improving monitoring of ocean health and water quality [44]. However, the integration of data from different sources presents challenges, such as differences in spatial resolution, temporal coverage, and sensor characteristics. For instance, satellite imagery offers global coverage with limited temporal resolution, whereas in situ sensors provide high-frequency, localized measurements but lack broad spatial reach. Aerial imagery offers high spatial resolution but limited coverage. To overcome these challenges, advanced data fusion techniques (such as multisource remote sensing fusion, spatiotemporal interpolation, and sensor calibration) are applied to harmonize datasets. These methods ensure consistent and reliable combined datasets, enhance model robustness, and support more effective monitoring. By combining satellite data with in situ measurements, models can improve estimates of parameters such as chlorophyll-a concentration or turbidity. Aerial imagery can further increase model accuracy, particularly for localized phenomena such as algal blooms or coastal changes.
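One simple building block for combining satellite and in situ data is a matchup followed by a calibration fit, sketched below in Python. The nearest-pixel rule, the 3 h time window, the linear calibration, and all function and variable names are assumptions chosen for illustration; operational fusion schemes are considerably more elaborate.

```python
import numpy as np

def build_matchups(grid_lat, grid_lon, sat_field, sat_hour,
                   obs_lat, obs_lon, obs_hour, obs_val, max_dt=3.0):
    """Pair each in situ sample with its nearest valid satellite pixel."""
    pairs = []
    for lat, lon, hour, val in zip(obs_lat, obs_lon, obs_hour, obs_val):
        if abs(hour - sat_hour) > max_dt:
            continue                                  # outside the time window
        i = int(np.argmin(np.abs(grid_lat - lat)))    # nearest grid row
        j = int(np.argmin(np.abs(grid_lon - lon)))    # nearest grid column
        if np.isfinite(sat_field[i, j]):              # skip masked pixels
            pairs.append((sat_field[i, j], val))
    return np.array(pairs)

def fit_linear_calibration(pairs):
    """Least-squares line mapping satellite values onto in situ values."""
    slope, intercept = np.polyfit(pairs[:, 0], pairs[:, 1], deg=1)
    return slope, intercept

# Synthetic 3 x 3 satellite scene with one cloud-masked pixel.
grid_lat = np.array([10.0, 11.0, 12.0])
grid_lon = np.array([110.0, 111.0, 112.0])
sat_field = np.arange(9, dtype=float).reshape(3, 3)
sat_field[1, 1] = np.nan
sat_hour = 12.0

# Synthetic in situ samples; valid ones follow in_situ = 2 * satellite + 0.5.
obs_lat = np.array([10.1, 11.9, 11.0, 10.0, 12.2])
obs_lon = np.array([110.2, 111.8, 111.0, 112.1, 110.1])
obs_hour = np.array([11.0, 13.5, 12.0, 20.0, 10.0])
obs_val = np.array([0.5, 16.5, 9.0, 4.5, 12.5])

pairs = build_matchups(grid_lat, grid_lon, sat_field, sat_hour,
                       obs_lat, obs_lon, obs_hour, obs_val)
slope, intercept = fit_linear_calibration(pairs)
print(len(pairs), slope, intercept)
```

The third sample is dropped because its pixel is masked and the fourth because it falls outside the time window. The remaining three matchups are enough to recover the synthetic calibration line.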

6. Conclusions

The integration of ML into ocean color remote sensing represents a significant advancement in our ability to monitor, analyze, and understand the marine environment. ML has substantially contributed to this field by enhancing the accuracy of atmospheric correction and bio-optical property retrieval, providing actionable insights into marine ecosystem dynamics, and assessing the impacts of climate change. The transformative potential of ML is evident in its ability to process vast datasets, improve spatial and temporal resolution, and fill data gaps, thereby enabling a more comprehensive and nuanced understanding of the role of the ocean in global climate and biogeochemical cycles. Despite challenges such as data quality issues, model generalizability, and computational complexity, the opportunities presented by ML are considerable. The ongoing development of new algorithms, increased accessibility to computational resources, and the potential for interdisciplinary collaboration promise to further enhance the capabilities of ocean color remote sensing. The fusion of ML with ocean color remote sensing is poised to play a pivotal role in advancing oceanographic research and environmental monitoring. Continued innovation and the application of ML techniques will enable us to fully exploit the potential of satellite observations, leading to more accurate, efficient, and insightful analyses of our planet’s vast and vital marine ecosystems.

Author Contributions

Conceptualization, P.C.; writing—original draft, Z.Z.; writing—review and editing, P.C., S.Z., H.H., Y.P. and D.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation (42322606, 42276180, W2521002), the Zhejiang Provincial Natural Science Foundation (LZ25D060001), the China Postdoctoral Science Foundation (2023M740809), the National Key Research and Development Program of China (2022YFB3901703; 2022YFB3902603), the Key Special Project for Introduced Talents Team of Southern Marine Science and Engineering Guangdong Laboratory (GML2021GD0809), and the Donghai Laboratory Preresearch Project (DH2022ZY0003).

Acknowledgments

We thank the reviewers for their suggestions, which significantly improved the presentation of the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Dai, Y.; Yang, S.; Zhao, D.; Hu, C.; Xu, W.; Anderson, D.; Li, Y.; Song, X.-P.; Boyce, D.; Gibson, L.; et al. Coastal phytoplankton blooms expand and intensify in the 21st century. Nature 2023. [Google Scholar] [CrossRef]
Ferreira, A.; Mendes, C.R.B.; Costa, R.R.; Brotas, V.; Tavano, V.M.; Guerreiro, C.V.; Secchi, E.R.; Brito, A.C. Climate change is associated with higher phytoplankton biomass and longer blooms in the West Antarctic Peninsula. Nat. Commun. 2024, 15, 6536. [Google Scholar] [CrossRef]
Lewis, K.M.; van Dijken, G.L.; Arrigo, K.R. Changes in phytoplankton concentration now drive increased Arctic Ocean primary production. Science 2020, 369, 198–202. [Google Scholar] [CrossRef]
Westberry, T.K.; Silsbe, G.M.; Behrenfeld, M.J. Gross and net primary production in the global ocean: An ocean color remote sensing perspective. Earth-Sci. Rev. 2023, 237, 104322. [Google Scholar] [CrossRef]
Brewin, R.J.W.; Sathyendranath, S.; Kulk, G.; Rio, M.-H.; Concha, J.A.; Bell, T.G.; Bracher, A.; Fichot, C.; Frölicher, T.L.; Galí, M.; et al. Ocean carbon from space: Current status and priorities for the next decade. Earth-Sci. Rev. 2023, 240, 104386. [Google Scholar] [CrossRef]
Balch, W.M.; Mitchell, C. Remote sensing algorithms for particulate inorganic carbon (PIC) and the global cycle of PIC. Earth-Sci. Rev. 2023, 239, 104363. [Google Scholar] [CrossRef]
Hopkins, J.; Henson, S.A.; Poulton, A.J.; Balch, W.M. Regional Characteristics of the Temporal Variability in the Global Particulate Inorganic Carbon Inventory. Glob. Biogeochem. Cycles 2019, 33, 1328–1338. [Google Scholar] [CrossRef]
Cael, B.B.; Bisson, K.; Boss, E.; Dutkiewicz, S.; Henson, S. Global climate-change trends detected in indicators of ocean ecology. Nature 2023, 619, 551–554. [Google Scholar] [CrossRef]
Dutkiewicz, S.; Hickman, A.E.; Jahn, O.; Henson, S.; Beaulieu, C.; Monier, E. Ocean colour signature of climate change. Nat. Commun. 2019, 10, 578. [Google Scholar] [CrossRef]
Li, X.; Yang, Y.; Ishizaka, J.; Li, X. Global estimation of phytoplankton pigment concentrations from satellite data using a deep-learning-based model. Remote Sens. Environ. 2023, 294, 113628. [Google Scholar] [CrossRef]
Kolluru, S.; Gedam, S.S.; Inamdar, A.B. A machine learning approach for deriving spectral absorption coefficients of optically active oceanic constituents. Comput. Geosci. 2021, 155, 104879. [Google Scholar] [CrossRef]
Laruelle, G.G.; Landschützer, P.; Gruber, N.; Tison, J.L.; Delille, B.; Regnier, P. Global high-resolution monthly pCO₂ climatology for the coastal ocean derived from neural network interpolation. Biogeosciences 2017, 14, 4545–4561. [Google Scholar] [CrossRef]
Camps-Valls, G.; Martinez, E.; Fablet, R.; Jamet, C. Editorial: AI and remote sensing in ocean sciences. Front. Mar. Sci. 2024, 10, 1248591. [Google Scholar] [CrossRef]
Niu, J.; Feng, Z.; He, M.; Xie, M.; Lv, Y.; Zhang, J.; Sun, L.; Liu, Q.; Hu, B.X. Incorporating marine particulate carbon into machine learning for accurate estimation of coastal chlorophyll-a. Mar. Pollut. Bull. 2023, 192, 115089. [Google Scholar] [CrossRef] [PubMed]
Wang, Y.; Li, X.; Song, J.; Li, X.; Zhong, G.; Zhang, B. Reconstruction of pCO₂ Data in the Southern Ocean Based on Feedforward Neural Network. Springer: Singapore, 2023; pp. 189–208. [Google Scholar]
Zhang, S.; Chen, P.; Zhang, Z.; Pan, D. Carbon Air–Sea Flux in the Arctic Ocean from CALIPSO from 2007 to 2020. Remote Sens. 2022, 14, 6196. [Google Scholar] [CrossRef]
Schroeder, T.; Schaale, M.; Lovell, J.; Blondeau-Patissier, D. An ensemble neural network atmospheric correction for Sentinel-3 OLCI over coastal waters providing inherent model uncertainty estimation and sensor noise propagation. Remote Sens. Environ. 2022, 270, 112848. [Google Scholar] [CrossRef]
He, X.; Pan, T.; Bai, Y.; Shanmugam, P.; Wang, D.; Li, T.; Gong, F. Intelligent Atmospheric Correction Algorithm for Polarization Ocean Color Satellite Measurements Over the Open Ocean. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–22. [Google Scholar] [CrossRef]
Song, Z.; He, X.; Bai, Y.; Dong, X.; Wang, D.; Li, T.; Zhu, Q.; Gong, F. Atmospheric correction of absorbing aerosols for satellite ocean color remote sensing over coastal waters. Remote Sens. Environ. 2023, 290, 113552. [Google Scholar] [CrossRef]
Friedrich, T.; Oschlies, A. Neural network-based estimates of North Atlantic surface pCO₂ from satellite data: A methodological study. J. Geophys. Res. Ocean. 2009, 114, C03020. [Google Scholar] [CrossRef]
Zhang, Z.; Zhang, S.; Behrenfeld, M.J.; Chen, P.; Jamet, C.; Di Girolamo, P.; Dionisi, D.; Hu, Y.; Lu, X.; Pan, Y.; et al. Combining deep learning with physical parameters in POC and PIC inversion from spaceborne lidar CALIOP. ISPRS J. Photogramm. Remote Sens. 2024, 212, 193–211. [Google Scholar] [CrossRef]
Zhang, Z.; Chen, P.; Jamet, C.; Dionisi, D.; Hu, Y.; Lu, X.; Pan, D. Retrieving bbp and POC from CALIOP: A deep neural network approach. Remote Sens. Environ. 2023, 287, 113482. [Google Scholar] [CrossRef]
Chen, P.; Li, Y.; Zhao, Z.; Zhang, S.; Zhang, Z.; Wang, J.; Pan, D. Current Applications of Ocean Color Remote Sensing Machine Learning Models and Prospects for Calibration, Validation. Oceanol. Limnol. Sin. 2025, 56, 3–24. [Google Scholar]
Fan, Y.; Li, W.; Gatebe, C.K.; Jamet, C.; Zibordi, G.; Schroeder, T.; Stamnes, K. Atmospheric correction over coastal waters using multilayer neural networks. Remote Sens. Environ. 2017, 199, 218–240. [Google Scholar] [CrossRef]
Emberton, S.; Chittka, L.; Cavallaro, A.; Wang, M. Sensor Capability and Atmospheric Correction in Ocean Colour Remote Sensing. Remote Sens. 2016, 8, 1. [Google Scholar] [CrossRef]
Gons, H.J.; Rijkeboer, M.; Ruddick, K.G. A chlorophyll-retrieval algorithm for satellite imagery (Medium Resolution Imaging Spectrometer) of inland and coastal waters. J. Plankton Res. 2002, 24, 947–951. [Google Scholar] [CrossRef]
Gordon, H.R.; Boynton, G.C.; Balch, W.M.; Groom, S.B.; Harbour, D.S.; Smyth, T.J. Retrieval of coccolithophore calcite concentration from SeaWiFS Imagery. Geophys. Res. Lett. 2001, 28, 1587–1590. [Google Scholar] [CrossRef]
Stramski, D.; Reynolds, R.A.; Kahru, M.; Mitchell, B.G. Estimation of Particulate Organic Carbon in the Ocean from Satellite Remote Sensing. Science 1999, 285, 239–242. [Google Scholar] [CrossRef]
Le, C.; Li, Y.; Zha, Y.; Sun, D.; Huang, C.; Lu, H. A four-band semi-analytical model for estimating chlorophyll a in highly turbid lakes: The case of Taihu Lake, China. Remote Sens. Environ. 2009, 113, 1175–1182. [Google Scholar] [CrossRef]
Blix, K.; Eltoft, T. Machine Learning Automatic Model Selection Algorithm for Oceanic Chlorophyll-a Content Retrieval. Remote Sens. 2018, 10, 775. [Google Scholar] [CrossRef]
Zhang, X.; Friedl, M.A.; Schaaf, C.B. Sensitivity of vegetation phenology detection to the temporal resolution of satellite data. Int. J. Remote Sens. 2009, 30, 2061–2074. [Google Scholar] [CrossRef]
Gong, J.; Xiao, Y.; Cai, X.; Mu, B.; Qin, P.; Liu, R.; Cui, T. Impact of the Spatial Resolution of Satellite Image on the Remote Sensing Monitoring of Green Macroalgae Bloom. Acta Laser Biol. Sin. 2014, 6, 579–584. [Google Scholar] [CrossRef]
Siegel, D.A.; Maritorena, S.; Nelson, N.B.; Behrenfeld, M.J. Independence and interdependencies among global ocean color properties: Reassessing the bio-optical assumption. J. Geophys. Res. Ocean. 2005, 110, C07011. [Google Scholar] [CrossRef]
Dierssen, H.M.; Smith, R.C. Bio-optical properties and remote sensing ocean color algorithms for Antarctic Peninsula waters. J. Geophys. Res. Ocean. 2000, 105, 26301–26312. [Google Scholar] [CrossRef]
Park, J.; Kim, J.-H.; Kim, H.-c.; Kim, B.-K.; Bae, D.; Jo, Y.-H.; Jo, N.; Lee, S.H. Reconstruction of Ocean Color Data Using Machine Learning Techniques in Polar Regions: Focusing on Off Cape Hallett, Ross Sea. Remote Sens. 2019, 11, 1366. [Google Scholar] [CrossRef]
El-Habashi, A.; Ahmed, S.; Ondrusek, M.E.; Lovko, V. Analyses of satellite ocean color retrievals show advantage of neural network approaches and algorithms that avoid deep blue bands. J. Appl. Remote Sens. 2019, 13, 024509. [Google Scholar] [CrossRef]
Aguilera, H.; Guardiola-Albert, C.; Serrano-Hidalgo, C. Estimating extremely large amounts of missing precipitation data. J. Hydroinform. 2020, 22, 578–592. [Google Scholar] [CrossRef]
Zhou, L.; Gao, C.; Zhang, R.-H. A spatiotemporal 3D convolutional neural network model for ENSO predictions: A test case for the 2020/21 La Niña conditions. Atmos. Ocean. Sci. Lett. 2023, 16, 100330. [Google Scholar] [CrossRef]
Zhao, N.; Huang, B.; Zhang, X.; Ge, L.; Chen, G. Intelligent identification of oceanic eddies in remote sensing data via Dual-Pyramid UNet. Atmos. Ocean. Sci. Lett. 2023, 16, 100335. [Google Scholar] [CrossRef]
Ouyang, L.; Ling, F.; Li, Y.; Bai, L.; Luo, J.-J. Wave forecast in the Atlantic Ocean using a double-stage ConvLSTM network. Atmos. Ocean. Sci. Lett. 2023, 16, 100347. [Google Scholar] [CrossRef]
Liu, Z.; Zhou, W.; Yuan, Y. 3D DBSCAN detection and parameter sensitivity of the 2022 Yangtze river summertime heatwave and drought. Atmos. Ocean. Sci. Lett. 2023, 16, 100324. [Google Scholar] [CrossRef]
Frouin, R.; Pelletier, B. Bayesian methodology for inverting satellite ocean-color data. Remote Sens. Environ. 2015, 159, 332–360. [Google Scholar] [CrossRef]
Amani, M.; Moghimi, A.; Mirmazloumi, S.M.; Ranjgar, B.; Ghorbanian, A.; Ojaghi, S.; Ebrahimy, H.; Naboureh, A.; Nazari, M.E.; Mahdavi, S.; et al. Ocean Remote Sensing Techniques and Applications: A Review (Part I). Water 2022, 14, 3400. [Google Scholar] [CrossRef]
Hong, Z.; Long, D.; Li, X.; Wang, Y.; Zhang, J.; Hamouda, M.A.; Mohamed, M.M. A global daily gap-filled chlorophyll-a dataset in open oceans during 2001–2021 from multisource information using convolutional neural networks. Earth Syst. Sci. Data 2023, 15, 5281–5300. [Google Scholar] [CrossRef]
Stemina, S.; Raja, B. A Review of Machine Learning and It’s Method. Int. J. Emerg. Technol. Innov. Eng. 2019, 5, 1–7. [Google Scholar]
Khan, Z. An insight on machine learning algorithms and its applications. Eur. Chem. Bull. 2023, 12, 6029–6034. [Google Scholar] [CrossRef]
Lawatre, P.; Muzzammil, M.; Hingal, R. An efficient data pre-processing model for machine learning. Int. J. Adv. Res. Ideas Innov. Technol. 2021, 7, 1612–1614. [Google Scholar]
Kang, M.; Tian, J. Machine Learning: Data Pre-processing. In Prognostics and Health Management of Electronics; John Wiley and Sons Ltd.: Hoboken, NJ, USA, 2018; pp. 111–130. [Google Scholar]
Guruvayur, S.R.; Suchithra, R. A detailed study on machine learning techniques for data mining. In Proceedings of the 2017 International Conference on Trends in Electronics and Informatics (ICEI), Tirunelveli, India, 11–12 May 2017; pp. 1187–1192. [Google Scholar]
Priya, M. Guide on Way to approach a Machine Learning problem. In Proceedings of the Big Data Analytics 2020, Dubai, United Arab Emirates, 14 October 2020. [Google Scholar]
Chatzilygeroudis, K.; Hatzilygeroudis, I.; Perikos, I. Machine Learning Basics. In Intelligent Computing for Interactive System Design: Statistics, Digital Signal Processing, and Machine Learning in Practice; Association for Computing Machinery: New York, NY, USA, 2021; Volume 34, pp. 143–193. [Google Scholar]
Chegoonian, A.M.; Pahlevan, N.; Zolfaghari, K.; Leavitt, P.; Davies, J.-M.; Baulch, H.; Duguay, C. Comparative Analysis of Empirical and Machine Learning Models for Chl a Extraction Using Sentinel-2 and Landsat OLI Data: Opportunities, Limitations, and Challenges. Can. J. Remote Sens. 2023, 49, 215333. [Google Scholar] [CrossRef]
Wattelez, G.; Dupouy, C.; Mangeas, M.; Lefèvre, J.; Touraivane, T.; Frouin, R. A Statistical Algorithm for Estimating Chlorophyll Concentration from MODIS Data; SPIE: Bellingham, WA, USA, 2014; Volume 9261. [Google Scholar]
Boissieu, F.d.; Menkes, C.; Dupouy, C.; Rodier, M.; Bonnet, S.; Mangeas, M.; Frouin, R.J. Phytoplankton global mapping from space with a support vector machine algorithm. In Proceedings of the SPIE Asia-Pacific Remote Sensing, Beijing, China, 13–16 October 2014; p. 92611R. [Google Scholar]
Zhang, Z.; Chen, P.; Zhong, C.; Xie, C.; Sun, M.; Zhang, S.; Chen, S.; Wu, D. Chlorophyll and POC in polar regions derived from spaceborne lidar. Front. Mar. Sci. 2023, 10, 1050087. [Google Scholar] [CrossRef]
Corcoran, F.; Parrish, C.E. Diffuse Attenuation Coefficient (Kd) from ICESat-2 ATLAS Spaceborne Lidar Using Random-Forest Regression. Photogramm. Eng. Remote Sens. 2021, 87, 831–840. [Google Scholar] [CrossRef]
Park, J.; Kim, H.C.; Bae, D.; Jo, Y.H. Data Reconstruction for Remotely Sensed Chlorophyll-a Concentration in the Ross Sea Using Ensemble-Based Machine Learning. Remote Sens. 2020, 12, 1898. [Google Scholar] [CrossRef]
Cui, Z.; Du, D.; Zhang, X.; Yang, Q. Modeling and Prediction of Environmental Factors and Chlorophyll a Abundance by Machine Learning Based on Tara Oceans Data. J. Mar. Sci. Eng. 2022, 10, 1749. [Google Scholar] [CrossRef]
Zhang, Y.; Shen, F.; Sun, X.; Tan, K. Marine big data-driven ensemble learning for estimating global phytoplankton group composition over two decades (1997–2020). Remote Sens. Environ. 2023, 294, 113596. [Google Scholar] [CrossRef]
Cao, Z.; Ma, R.; Duan, H.; Pahlevan, N.; Melack, J.; Shen, M.; Xue, K. A machine learning approach to estimate chlorophyll-a from Landsat-8 measurements in inland lakes. Remote Sens. Environ. 2020, 248, 111974. [Google Scholar] [CrossRef]
Gossn, J.; Frouin, R.; Dogliotti, A. Atmospheric Correction of Satellite Optical Imagery over the Río de la Plata Highly Turbid Waters Using a SWIR-Based Principal Component Decomposition Technique. Remote Sens. 2021, 13, 1050. [Google Scholar] [CrossRef]
Gross-Colzy, L.; Colzy, S.; Frouin, R.; Henry, P. A general ocean color atmospheric correction scheme based on principal components analysis: Part I. Performance on Case 1 and Case 2 waters. In Proceedings of the Optical Engineering + Applications, San Diego, CA, USA, 26–30 August 2007; p. 668002. [Google Scholar]
Kolluru, S.; Tiwari, S.P. Modeling ocean surface chlorophyll-a concentration from ocean color remote sensing reflectance in global waters using machine learning. Sci. Total Environ. 2022, 844, 157191. [Google Scholar] [CrossRef]
Denvil-Sommer, A.; Gehlen, M.; Vrac, M.; Mejia, C. FFNN-LSCE: A two-step neural network model for the reconstruction of surface ocean pCO₂ over the global ocean. Geosci. Model Dev. 2019, 12, 2091–2105. [Google Scholar] [CrossRef]
Jo, Y.H.; Dai, M.; Zhai, W.; Yan, X.H.; Shang, S. On the variations of sea surface pCO₂ in the northern South China Sea: A remote sensing based neural network approach. J. Geophys. Res. Ocean. 2012, 117, C08022. [Google Scholar] [CrossRef]
Heinemann, T.; Fischer, J. Simultaneous Retrieval of Oceanic and Atmospheric Properties Using Satellite Remote Sensing Measurements; SPIE: Bellingham, WA, USA, 1997; Volume 2963. [Google Scholar]
Aria, M.; Cuccurullo, C. bibliometrix: An R-tool for comprehensive science mapping analysis. J. Informetr. 2017, 11, 959–975. [Google Scholar] [CrossRef]
Gross, L.; Thiria, S.; Frouin, R. Applying artificial neural network methodology to ocean color remote sensing. Ecol. Model. 1999, 120, 237–246. [Google Scholar] [CrossRef]
Gross, L.; Frouin, R.; Dupouy, C.; André, J.M.; Thiria, S. Reducing variability that is due to secondary pigments in the retrieval of chlorophyll a concentration from marine reflectance: A case study in the western equatorial Pacific Ocean. Appl. Opt. 2004, 43, 4041–4054. [Google Scholar] [CrossRef]
Jamet, C.; Loisel, H.; Dessailly, D. Retrieval of the spectral diffuse attenuation coefficient K(λ) in open and coastal ocean waters using a neural network inversion. J. Geophys. Res. Ocean. 2012, 117, C10023. [Google Scholar] [CrossRef]
Ioannou, I.; Gilerson, A.; Gross, B.; Moshary, F.; Ahmed, S. Deriving ocean color products using neural networks. Remote Sens. Environ. 2013, 134, 78–91. [Google Scholar] [CrossRef]
Li, H.; He, X.; Bai, Y.; Shanmugam, P.; Park, Y.-J.; Liu, J.; Zhu, Q.; Gong, F.; Wang, D.; Huang, H. Atmospheric correction of geostationary satellite ocean color data under high solar zenith angles in open oceans. Remote Sens. Environ. 2020, 249, 112022. [Google Scholar] [CrossRef]
Pahlevan, N.; Smith, B.; Schalles, J.; Binding, C.; Cao, Z.; Ma, R.; Alikas, K.; Kangro, K.; Gurlin, D.; Hà, N.; et al. Seamless retrievals of chlorophyll-a from Sentinel-2 (MSI) and Sentinel-3 (OLCI) in inland and coastal waters: A machine-learning approach. Remote Sens. Environ. 2020, 240, 111604. [Google Scholar] [CrossRef]
Zhao, X.; Ma, Y.; Xiao, Y.; Liu, J.; Ding, J.; Ye, X.; Liu, R. Atmospheric correction algorithm based on deep learning with spatial-spectral feature constraints for broadband optical satellites: Examples from the HY-1C Coastal Zone Imager. ISPRS J. Photogramm. Remote Sens. 2023, 205, 147–162. [Google Scholar] [CrossRef]
Durkee, P.A.; Jensen, D.R.; Hindman, E.E.; Haar, T.H.V. The relationship between marine aerosol particles and satellite-detected radiance. J. Geophys. Res. Atmos. 1986, 91, 4063–4072. [Google Scholar] [CrossRef]
Allam, M.; Meng, Q.; Elhag, M.; Giardino, C.; Ghirardi, N.; Su, Y.; Al-Hababi, M.A.M.; Menenti, M. Atmospheric Correction Algorithms Assessment for Sentinel-2A Imagery over Inland Waters of China: Case Study, Qiandao Lake. Earth Syst. Environ. 2024, 8, 105–119. [Google Scholar] [CrossRef]
Cuartero, A.; Cáceres-Merino, J.; Torrecilla-Pinero, J.A. An application of C2-Net atmospheric corrections for chlorophyll-a estimation in small reservoirs. Remote Sens. Appl. Soc. Environ. 2023, 32, 101021. [Google Scholar] [CrossRef]
Men, J.; Feng, L.; Chen, X.; Tian, L. Atmospheric correction under cloud edge effects for Geostationary Ocean Color Imager through deep learning. ISPRS J. Photogramm. Remote Sens. 2023, 201, 38–53. [Google Scholar] [CrossRef]
Shen, M.; Luo, J.; Cao, Z.; Xue, K.; Qi, T.; Ma, J.; Liu, D.; Song, K.; Feng, L.; Duan, H. Random forest: An optimal chlorophyll-a algorithm for optically complex inland water suffering atmospheric correction uncertainties. J. Hydrol. 2022, 615, 128685. [Google Scholar] [CrossRef]
Zhou, Q.; Wang, S.; Liu, N.; Townsend, P.A.; Jiang, C.; Peng, B.; Verhoef, W.; Guan, K. Towards operational atmospheric correction of airborne hyperspectral imaging spectroscopy: Algorithm evaluation, key parameter analysis, and machine learning emulators. ISPRS J. Photogramm. Remote Sens. 2023, 196, 386–401. [Google Scholar] [CrossRef]
Gao, M.; Franz, B.A.; Knobelspiesse, K.; Zhai, P.W.; Martins, V.; Burton, S.; Cairns, B.; Ferrare, R.; Gales, J.; Hasekamp, O.; et al. Efficient multi-angle polarimetric inversion of aerosols and ocean color powered by a deep neural network forward model. Atmos. Meas. Tech. 2021, 14, 4083–4110. [Google Scholar] [CrossRef]
Wang, J.; Wang, Y.; Lee, Z.; Wang, D.; Chen, S.; Lai, W. A revision of NASA SeaDAS atmospheric correction algorithm over turbid waters with artificial Neural Networks estimated remote-sensing reflectance in the near-infrared. ISPRS J. Photogramm. Remote Sens. 2022, 194, 235–249. [Google Scholar] [CrossRef]
Sun, J.; Xu, F.; Cervone, G.; Gervais, M.; Wauthier, C.; Salvador, M. Automatic atmospheric correction for shortwave hyperspectral remote sensing data using a time-dependent deep neural network. ISPRS J. Photogramm. Remote Sens. 2021, 174, 117–131. [Google Scholar] [CrossRef]
Brockmann, C.; Doerffer, R.; Peters, M.; Kerstin, S.; Embacher, S.; Ruescas, A. Evolution of the C2RCC Neural Network for Sentinel 2 and 3 for the Retrieval of Ocean Colour Products in Normal and Extreme Optically Complex Waters. In Proceedings of the Living Planet Symposium 2016, Prague, Czech Republic, 9–13 May 2016; Volume 740. [Google Scholar]
Rusia, P.; Bhateja, Y.; Misra, I.; Moorthi, S.M.; Dhar, D. An Efficient Machine Learning Approach for Atmospheric Correction. J. Indian Soc. Remote Sens. 2021, 49, 2539–2548. [Google Scholar] [CrossRef]
Fan, Y.; Li, W.; Chen, N.; Ahn, J.-H.; Park, Y.-J.; Kratzer, S.; Schroeder, T.; Ishizaka, J.; Chang, R.; Stamnes, K. OC-SMART: A machine learning based data analysis platform for satellite ocean color sensors. Remote Sens. Environ. 2021, 253, 112236. [Google Scholar] [CrossRef]
Liu, G.; Li, L.; Song, K.; Li, Y.; Lyu, H.; Wen, Z.; Fang, C.; Bi, S.; Sun, X.; Wang, Z. An OLCI-based algorithm for semi-empirically partitioning absorption coefficient and estimating chlorophyll a concentration in various turbid case-2 waters. Remote Sens. Environ. 2020, 239, 111648. [Google Scholar] [CrossRef]
Xue, K.; Ma, R.; Duan, H.; Shen, M.; Boss, E.; Cao, Z. Inversion of inherent optical properties in optically complex waters using sentinel-3A/OLCI images: A case study using China’s three largest freshwater lakes. Remote Sens. Environ. 2019, 225, 328–346. [Google Scholar] [CrossRef]
Simis, S.G.H.; Peters, S.W.M.; Gons, H.J. Remote sensing of the cyanobacterial pigment phycocyanin in turbid inland water. Limnol. Oceanogr. 2005, 50, 237–245. [Google Scholar] [CrossRef]
Gower, J.; King, S.; Borstad, G.; Brown, L. Detection of intense plankton blooms using the 709 nm band of the MERIS imaging spectrometer. Int. J. Remote Sens. 2005, 26, 2005–2012. [Google Scholar] [CrossRef]
Dall’Olmo, G.; Gitelson, A.A.; Rundquist, D.C.; Leavitt, B.; Barrow, T.; Holz, J.C. Assessing the potential of SeaWiFS and MODIS for estimating chlorophyll concentration in turbid productive waters using red and near-infrared bands. Remote Sens. Environ. 2005, 96, 176–187. [Google Scholar] [CrossRef]
Lee, Z.; Carder, K.L.; Arnone, R.A. Deriving inherent optical properties from water color: A multiband quasi-analytical algorithm for optically deep waters. Appl. Opt. 2002, 41, 5755–5772. [Google Scholar] [CrossRef] [PubMed]
Dekker, A.G. Detection of Optical Water Quality Parameters for Eutrophic Waters by High Resolution Remote Sensing. Ph.D. Thesis, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands, 1993. [Google Scholar]
Liu, X.; Steele, C.; Simis, S.; Warren, M.; Tyler, A.; Spyrakos, E.; Selmes, N.; Hunter, P. Retrieval of Chlorophyll-a concentration and associated product uncertainty in optically diverse lakes and reservoirs. Remote Sens. Environ. 2021, 267, 112710. [Google Scholar] [CrossRef]
Neil, C.; Spyrakos, E.; Hunter, P.D.; Tyler, A.N. A global approach for chlorophyll-a retrieval across optically complex inland waters based on optical water types. Remote Sens. Environ. 2019, 229, 159–178. [Google Scholar] [CrossRef]
Bi, S.; Li, Y.; Xu, J.; Liu, G.; Xu, J. Optical classification of inland waters based on an improved Fuzzy C-Means method. Opt. Express 2019, 27, 34838. [Google Scholar] [CrossRef] [PubMed]
Schiller, H.; Doerffer, R. Neural network for emulation of an inverse model operational derivation of Case II water properties from MERIS data. Int. J. Remote Sens. 1999, 20, 1735–1746. [Google Scholar] [CrossRef]
Tanaka, A.; Kishino, M.; Oishi, T.; Doerffer, R.; Schiller, H. Application of the Neural Network Method to Case II Water. In Remote Sensing of Ocean and Sea Ice 2000, Proceedings of the Europto Remote Sensing, Barcelona, Spain, 25–29 September 2000; SPIE: Bellingham, WA, USA; Volume 4172. [CrossRef]
Cao, Z.; Ma, R.; Pahlevan, N.; Liu, M.; Melack, J.; Duan, H.; Xue, K.; Shen, M. Evaluating and Optimizing VIIRS Retrievals of Chlorophyll-a and Suspended Particulate Matter in Turbid Lakes Using a Machine Learning Approach. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4211417. [Google Scholar] [CrossRef]
Gao, M.; Franz, B.; Zhai, P.; Knobelspiesse, K.; Sayer, A.; Xu, X.; Martins, V.; Cairns, B.; Castellanos, P.; Fu, G.; et al. Simultaneous retrieval of aerosol and ocean properties from PACE HARP2 with uncertainty assessment using cascading neural network radiative transfer models. Atmos. Meas. Tech. 2023, 16, 5863–5881. [Google Scholar] [CrossRef]
Pahlevan, N.; Smith, B.; Binding, C.; Gurlin, D.; Li, L.; Bresciani, M.; Giardino, C. Hyperspectral retrievals of phytoplankton absorption and chlorophyll-a in inland and nearshore coastal waters. Remote Sens. Environ. 2020, 253, 112200. [Google Scholar] [CrossRef]
Smith, B.; Pahlevan, N.; Schalles, J.; Ruberg, S.; Errera, R.; Ma, R.; Giardino, C.; Bresciani, M.; Barbosa, C.; Moore, T.; et al. A Chlorophyll-a Algorithm for Landsat-8 Based on Mixture Density Networks. Front. Remote Sens. 2021, 1, 623678. [Google Scholar] [CrossRef]
Pahlevan, N.; Smith, B.; Alikas, K.; Anstee, J.; Barbosa, C.; Binding, C.; Bresciani, M.; Cremella Palmerini, B.; Giardino, C.; Gurlin, D.; et al. Simultaneous retrieval of selected optical water quality indicators from Landsat-8, Sentinel-2, and Sentinel-3. Remote Sens. Environ. 2022, 270, 112860. [Google Scholar] [CrossRef]
Saranathan, A.; Werther, M.; Balasubramanian, S.V.; Odermatt, D.; Pahlevan, N. Assessment of advanced neural networks for the dual estimation of water quality indicators and their uncertainties. Front. Remote Sens. 2024, 5, 1383147. [Google Scholar] [CrossRef]
Gross-Colzy, L.S.; Frouin, R.J. Remote sensing of chlorophyll concentration from space via principal component analysis of atmospheric effects. In Proceedings of the Optical Science and Technology, SPIE’S 48th Annual Meeting, San Diego, CA, USA, 3–8 August 2003; pp. 112–123. [Google Scholar]
Correa, K.; Machu, E.; Brajard, J.; Diouf, D.; Sall, S.M.; Demarcq, H. Adaptation of a Neuro-Variational Algorithm from SeaWiFS to MODIS-Aqua Sensor for the Determination of Atmospheric and Oceanic Variables. Remote Sens. 2023, 15, 3613. [Google Scholar] [CrossRef]
Puissant, A.; El Hourany, R.; Charantonis, A.A.; Bowler, C.; Thiria, S. Inversion of Phytoplankton Pigment Vertical Profiles from Satellite Data Using Machine Learning. Remote Sens. 2021, 13, 1445. [Google Scholar] [CrossRef]
Yala, K.; Niang, N.D.; Brajard, J.; Mejia, C.; Ouattara, M.; El Hourany, R.; Crépon, M.; Thiria, S. Estimation of phytoplankton pigments from ocean-color satellite observations in the Senegalo–Mauritanian region by using an advanced neural classifier. Ocean Sci. 2020, 16, 513–533. [Google Scholar] [CrossRef]
El Hourany, R.; Abboud-Abi Saab, M.; Faour, G.; Mejia, C.; Crépon, M.; Thiria, S. Phytoplankton Diversity in the Mediterranean Sea From Satellite Data Using Self-Organizing Maps. J. Geophys. Res. Ocean. 2019, 124, 5827–5843. [Google Scholar] [CrossRef]
El Hourany, R.; Abboud-Abi Saab, M.; Faour, G.; Aumont, O.; Crépon, M.; Thiria, S. Estimation of Secondary Phytoplankton Pigments From Satellite Observations Using Self-Organizing Maps (SOMs). J. Geophys. Res. Ocean. 2019, 124, 1357–1378. [Google Scholar] [CrossRef]
Charantonis, A.A.; Badran, F.; Thiria, S. Retrieving the evolution of vertical profiles of Chlorophyll-a from satellite observations using Hidden Markov Models and Self-Organizing Topological Maps. Remote Sens. Environ. 2015, 163, 229–239. [Google Scholar] [CrossRef]
Farikou, O.; Sawadogo, S.; Niang, A.; Diouf, D.; Brajard, J.; Mejia, C.; Dandonneau, Y.; Gasc, G.; Crépon, M.; Thiria, S. Inferring the seasonal evolution of phytoplankton groups in the Senegalo-Mauritanian upwelling region from satellite ocean-color spectral measurements. J. Geophys. Res. Ocean. 2015, 120, 6581–6601. [Google Scholar] [CrossRef]
Mobley, C.D. Light and Water: Radiative Transfer in Natural Waters; Academic Press: Cambridge, MA, USA, 1994. [Google Scholar]
Zheng, G.; Stramski, D.; DiGiacomo, P.M. A model for partitioning the light absorption coefficient of natural waters into phytoplankton, nonalgal particulate, and colored dissolved organic components: A case study for the Chesapeake Bay. J. Geophys. Res. Ocean. 2015, 120, 2601–2621. [Google Scholar] [CrossRef]
Chami, M.; Defoin-Platel, M. Sensitivity of the retrieval of the inherent optical properties of marine particles in coastal waters to the directional variations and the polarization of the reflectance. J. Geophys. Res. Ocean. 2007, 112, C05037. [Google Scholar] [CrossRef]
Ibrahim, A.; Harmel, T.; Tonizzo, A.; Ioannou, I.; Gilerson, A.; Ahmed, S. Exploring the Relation Between Polarized Light Fields and Physical-Optical Characteristics of the Ocean for Remote Sensing Applications. In Proceedings of the SPIE Optical Engineering + Applications, San Diego, CA, USA, 21–25 August 2011; Volume 8160, p. 81600H. [Google Scholar]
Kirk, J.T.O. Light and Photosynthesis in Aquatic Ecosystems; Cambridge University Press: Cambridge, UK, 1983. [Google Scholar]
Liu, H.; Li, Q.; Bai, Y.; Yang, C.; Wang, J.; Zhou, Q.; Hu, S.; Shi, T.; Liao, X.; Wu, G. Improving satellite retrieval of oceanic particulate organic carbon concentrations using machine learning methods. Remote Sens. Environ. 2021, 256, 112316. [Google Scholar] [CrossRef]
Liu, H.; Lin, L.; Wang, Y.; Du, L.; Wang, S.; Zhou, P.; Yu, Y.; Gong, X.; Lu, X. Reconstruction of Monthly Surface Nutrient Concentrations in the Yellow and Bohai Seas from 2003–2019 Using Machine Learning. Remote Sens. 2022, 14, 5021. [Google Scholar] [CrossRef]
Sadaiappan, B.; Balakrishnan, P.; C.R., V.; Vijayan, N.T.; Subramanian, M.; Gauns, M.U. Applications of Machine Learning in Chemical and Biological Oceanography. ACS Omega 2023, 8, 15831–15853. [Google Scholar] [CrossRef]
Stephens, M.P.; Samuels, G.; Olson, D.B.; Fine, R.A.; Takahashi, T. Sea-air flux of CO₂ in the North Pacific using shipboard and satellite data. J. Geophys. Res. Ocean. 1995, 100, 13571–13583. [Google Scholar] [CrossRef]
Sarma, V.V.S.S. Monthly variability in surface pCO₂ and net air-sea CO₂ flux in the Arabian Sea. J. Geophys. Res. Ocean. 2003, 108, 3255. [Google Scholar] [CrossRef]
Jamet, C.; Moulin, C.; Lefèvre, N. Estimation of the oceanic pCO₂ in the North Atlantic from VOS lines in-situ measurements: Parameters needed to generate seasonally mean maps. Ann. Geophys. 2007, 25, 2247–2257. [Google Scholar] [CrossRef]
Ono, T.; Saino, T.; Kurita, N.; Sasaki, K. Basin-scale extrapolation of shipboard pCO₂ data by using satellite SST and Chla. Int. J. Remote Sens. 2004, 25, 3803–3815. [Google Scholar] [CrossRef]
Zhang, S.; Bai, Y.; He, X.; Yu, S.; Song, Z.; Gong, F.; Zhu, Q.; Pan, D. The carbon sink of the Coral Sea, the world’s second largest marginal sea, weakened during 2006–2018. Sci. Total Environ. 2023, 872, 162219. [Google Scholar] [CrossRef]
Rödenbeck, C.; Bakker, D.C.; Gruber, N.; Iida, Y.; Jacobson, A.R.; Jones, S.; Landschützer, P.; Metzl, N.; Nakaoka, S.-I.; Olsen, A. Data-based estimates of the ocean carbon sink variability–first results of the Surface Ocean pCO₂ Mapping intercomparison (SOCOM). Biogeosciences 2015, 12, 7251–7278. [Google Scholar] [CrossRef]
Rödenbeck, C.; Bakker, D.C.E.; Metzl, N.; Olsen, A.; Sabine, C.; Cassar, N.; Reum, F.; Keeling, R.F.; Heimann, M. Interannual sea–air CO₂ flux variability from an observation-driven ocean mixed-layer scheme. Biogeosciences 2014, 11, 4599–4613. [Google Scholar] [CrossRef]
Lefèvre, N.; Watson, A.J.; Watson, A.R. A comparison of multiple regression and neural network techniques for mapping in situ pCO₂ data. Tellus B Chem. Phys. Meteorol. 2005, 57, 375–384. [Google Scholar] [CrossRef]
Telszewski, M.; Chazottes, A.; Schuster, U.; Watson, A.; Moulin, C.; Bakker, D.; González-Dávila, M.; Johannessen, T.; Körtzinger, A.; Lüger, H. Estimating the monthly pCO₂ distribution in the North Atlantic using a self-organizing neural network. Biogeosciences 2009, 6, 1405–1421. [Google Scholar] [CrossRef]
Landschützer, P.; Gruber, N.; Bakker, D.C.; Schuster, U.; Nakaoka, S.-I.; Payne, M.R.; Sasse, T.P.; Zeng, J. A neural network-based estimate of the seasonal to inter-annual variability of the Atlantic Ocean carbon sink. Biogeosciences 2013, 10, 7793–7815. [Google Scholar] [CrossRef]
Nakaoka, S.-I.; Telszewski, M.; Nojiri, Y.; Yasunaka, S.; Miyazaki, C.; Mukai, H.; Usui, N. Estimating temporal and spatial variation of ocean surface pCO₂ in the North Pacific using a self-organizing map neural network technique. Biogeosciences 2013, 10, 6093–6106. [Google Scholar] [CrossRef]
Zeng, J.; Nojiri, Y.; Landschützer, P.; Telszewski, M.; Nakaoka, S.-I. A global surface ocean fCO₂ climatology based on a feed-forward neural network. J. Atmos. Ocean. Technol. 2014, 31, 1838–1849. [Google Scholar] [CrossRef]
Chau, T.T.T.; Gehlen, M.; Chevallier, F. A seamless ensemble-based reconstruction of surface ocean pCO₂ and air–sea CO₂ fluxes over the global coastal and open oceans. Biogeosciences 2022, 19, 1087–1109. [Google Scholar] [CrossRef]
Landschützer, P.; Gruber, N.; Bakker, D.C.; Schuster, U. Recent variability of the global ocean carbon sink. Glob. Biogeochem. Cycles 2014, 28, 927–949. [Google Scholar] [CrossRef]
Hales, B.; Strutton, P.G.; Saraceno, M.; Letelier, R.; Takahashi, T.; Feely, R.; Sabine, C.; Chavez, F. Satellite-based prediction of pCO₂ in coastal waters of the eastern North Pacific. Prog. Oceanogr. 2012, 103, 1–15. [Google Scholar] [CrossRef]
Zeng, J.; Matsunaga, T.; Saigusa, N.; Shirai, T.; Nakaoka, S.-I.; Tan, Z.-H. Evaluation of three machine learning models for surface ocean CO₂ mapping. Ocean Sci. 2017, 13, 303–313. [Google Scholar] [CrossRef]
Zeng, J.; Nojiri, Y.; Nakaoka, S.-I.; Nakajima, H.; Shirai, T. Surface ocean CO₂ in 1990–2011 modelled using a feed-forward neural network. Geosci. Data J. 2015, 2, 47–51. [Google Scholar] [CrossRef]
Zhang, S.; Chen, P.; Hu, Y.; Zhang, Z.; Jamet, C.; Lu, X.; Dionisi, D.; Pan, D. Diurnal global ocean surface pCO₂ and air-sea CO₂ flux reconstructed from spaceborne LiDAR data. PNAS Nexus 2023, 3, pgad432. [Google Scholar] [CrossRef]
Ćatipović, L.; Matić, F.; Kalinić, H. Reconstruction Methods in Oceanographic Satellite Data Observation—A Survey. J. Mar. Sci. Eng. 2023, 11, 340. [Google Scholar] [CrossRef]
Sanikommu, S.; Langodan, S.; Dasari, H.P.; Zhan, P.; Krokos, G.; Abualnaja, Y.O.; Asfahani, K.; Hoteit, I. Making the Case for High-Resolution Regional Ocean Reanalyses: An Example with the Red Sea. Bull. Am. Meteorol. Soc. 2023, 104, E1241–E1264. [Google Scholar] [CrossRef]
Meng, L.; Yan, X.-H. Remote Sensing for Subsurface and Deeper Oceans: An Overview and a Future Outlook. IEEE Geosci. Remote Sens. Mag. 2022, 10, 72–92. [Google Scholar] [CrossRef]
Yang, M.; Khan, F.A.; Tian, H.; Liu, Q. Analysis of the Monthly and Spring-Neap Tidal Variability of Satellite Chlorophyll-a and Total Suspended Matter in a Turbid Coastal Ocean Using the DINEOF Method. Remote Sens. 2021, 13, 632. [Google Scholar] [CrossRef]
Chen, Z.; Wang, X.; Liu, L. Reconstruction of Three-Dimensional Ocean Structure From Sea Surface Data: An Application of isQG Method in the Southwest Indian Ocean. J. Geophys. Res. Ocean. 2020, 125, e2020JC016351. [Google Scholar] [CrossRef]
Alvera-Azcárate, A.; Barth, A.; Sirjacobs, D.; Lenartz, F.; Beckers, J.-M. Data Interpolating Empirical Orthogonal Functions (DINEOF): A tool for geophysical data analyses. Mediter. Mar. Sci. 2011, 12. [Google Scholar] [CrossRef]
Siswanto, E.; Tanaka, K. Phytoplankton Biomass Dynamics in the Strait of Malacca within the Period of the SeaWiFS Full Mission: Seasonal Cycles, Interannual Variations and Decadal-Scale Trends. Remote Sens. 2014, 6, 2718–2742. [Google Scholar] [CrossRef]
Ganzedo, U.; Alvera-Azcárate, A.; Esnaola, G.; Ezcurra, A.; Saenz, J. Reconstruction of sea surface temperature by means of DINEOF: A case study during the fishing season in the Bay of Biscay. Int. J. Remote Sens. 2011, 32, 933–950. [Google Scholar] [CrossRef]
Hong, T.; Qin, R.; Xu, Z. An Improved Data Interpolating Empirical Orthogonal Function Method for Data Reconstruction: A Case Study of the Chlorophyll-a Concentration in the Bohai Sea, China. Appl. Sci. 2024, 14, 2803. [Google Scholar] [CrossRef]
Alvera-Azcárate, A.; Vanhellemont, Q.; Ruddick, K.; Barth, A.; Beckers, J.-M. Analysis of high frequency geostationary ocean colour data using DINEOF. Estuar. Coast. Shelf Sci. 2015, 159, 28–36. [Google Scholar] [CrossRef]
Schneegans, S.; Straza, T.; Lewis, J.; Gluckman, P.; Amaradasa, R. UNESCO Science Report: The Race Against Time for Smarter Development; UNESCO: Paris, France, 2021. [Google Scholar]
Cutolo, E.; Pascual, A.; Ruiz, S.; Zarokanellos, N.D.; Fablet, R. CLOINet: Ocean state reconstructions through remote-sensing, in-situ sparse observations and deep learning. Front. Mar. Sci. 2024, 11, 1151868. [Google Scholar] [CrossRef]
Zhang, M.; Xu, N.; Chen, L. Fusion SST from Infrared and Microwave Measurement of FY-3D Meteorological Satellite. J. Trop. Meteorol. 2024, 30, 89–96. [Google Scholar] [CrossRef]
Liu, J.; Sun, Y.; Ren, K.; Zhao, Y.; Deng, K.; Wang, L. A Spatial Downscaling Approach for WindSat Satellite Sea Surface Wind Based on Generative Adversarial Networks and Dual Learning Scheme. Remote Sens. 2022, 14, 769. [Google Scholar] [CrossRef]
Mohebzadeh, H.; Mokari, E.; Daggupati, P.; Biswas, A. A machine learning approach for spatiotemporal imputation of MODIS chlorophyll-a. Int. J. Remote Sens. 2021, 42, 7381–7404. [Google Scholar] [CrossRef]
Ouala, S.; Fablet, R.; Herzet, C.; Chapron, B.; Pascual, A.; Collard, F.; Gaultier, L. Neural Network Based Kalman Filters for the Spatio-Temporal Interpolation of Satellite-Derived Sea Surface Temperature. Remote Sens. 2018, 10, 1864. [Google Scholar] [CrossRef]
Barth, A.; Alvera-Azcárate, A.; Troupin, C.; Beckers, J.M. DINCAE 2.0: Multivariate convolutional neural network with error estimates to reconstruct sea surface temperature satellite and altimetry observations. Geosci. Model Dev. 2021, 15, 2183–2196. [Google Scholar] [CrossRef]
Hirahara, N.; Sonogashira, M.; Kasahara, H.; Iiyama, M. Denoising and Inpainting of Sea Surface Temperature Image with Adversarial Physical Model Loss. In Proceedings of the Asian Conference on Pattern Recognition, Auckland, New Zealand, 26–29 November 2019. [Google Scholar]
Jouini, M.; Lévy, M.; Crépon, M.; Thiria, S. Reconstruction of satellite chlorophyll images under heavy cloud coverage using a neural classification method. Remote Sens. Environ. 2013, 131, 232–246. [Google Scholar] [CrossRef]
Ćatipović, L.; Matić, F.; Kalinić, H.; Sathyendranath, S.; Županović, T.; Dingle, J.; Jackson, T. CCGAN as a Tool for Satellite-Derived Chlorophyll a Concentration Gap Reconstruction. J. Mar. Sci. Eng. 2023, 11, 1814. [Google Scholar] [CrossRef]
Archambault, T.; Filoche, A.; Charantonis, A.A.; Béréziat, D. Multimodal Unsupervised Spatio-Temporal Interpolation of Satellite Ocean Altimetry Maps. In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023), Lisbon, Portugal, 19–21 February 2023. [Google Scholar]
Martinez, E.; Gorgues, T.; Lengaigne, M.; Sauzède, R.; Menkes, C.; Uitz, J.; Lorenzo, E.; Fablet, R. Reconstructing Global Chlorophyll-a Variations Using a Non-linear Statistical Approach. Front. Mar. Sci. 2020, 7, 464. [Google Scholar] [CrossRef]
Roussillon, J.; Fablet, R.; Gorgues, T.; Drumetz, L.; Littaye, J.; Martinez, E. A Multi-Mode Convolutional Neural Network to reconstruct satellite-derived chlorophyll-a time series in the global ocean from physical drivers. Front. Mar. Sci. 2023, 10, 1077623. [Google Scholar] [CrossRef]
Gray, P.C.; Boss, E.; Prochaska, X.; Kerner, H.; Demeaux, C.B.; Lehahn, Y. The Promise and Pitfalls of Machine Learning in Ocean Remote Sensing. Oceanography 2024, 37, 52–63. [Google Scholar] [CrossRef]
Bracco, A.; Brajard, J.; Dijkstra, H.; Hassanzadeh, P.; Lessig, C.; Monteleoni, C. Machine learning for the physics of climate. Nat. Rev. Phys. 2024, 7, 6–20. [Google Scholar] [CrossRef]
Stock, A. Spatiotemporal distribution of labeled data can bias the validation and selection of supervised learning algorithms: A marine remote sensing example. ISPRS J. Photogramm. Remote Sens. 2022, 187, 46–60. [Google Scholar] [CrossRef]
Stock, A.; Subramaniam, A. Iterative spatial leave-one-out cross-validation and gap-filling based data augmentation for supervised learning applications in marine remote sensing. GISci. Remote Sens. 2022, 59, 1281–1300. [Google Scholar] [CrossRef]
Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
Ndung’u, R.N. Data Preparation for Machine Learning Modelling. Int. J. Comput. Appl. Technol. Res. 2022, 11, 231–235. [Google Scholar]
Vyas, T.K. Deep Learning with Tabular Data: A Self-supervised Approach. arXiv 2024, arXiv:2401.15238. [Google Scholar]
Read, J.S.; Jia, X.; Willard, J.; Appling, A.P.; Zwart, J.A.; Oliver, S.K.; Karpatne, A.; Hansen, G.J.; Hanson, P.C.; Watkins, W. Process-guided deep learning predictions of lake water temperature. Water Resour. Res. 2019, 55, 9173–9190. [Google Scholar] [CrossRef]
Sadler, J.M.; Koenig, L.E.; Gorski, G.; Carter, A.M.; Hall, R.O., Jr. Evaluating a process-guided deep learning approach for predicting dissolved oxygen in streams. Hydrol. Process. 2024, 38, e15270. [Google Scholar] [CrossRef]
Sathyendranath, S.; Brewin, R.J.; Brockmann, C.; Brotas, V.; Calton, B.; Chuprin, A.; Cipollini, P.; Couto, A.B.; Dingle, J.; Doerffer, R. An ocean-colour time series for use in climate studies: The experience of the ocean-colour climate change initiative (OC-CCI). Sensors 2019, 19, 4285. [Google Scholar] [CrossRef]
Sathyendranath, S.; Brewin, B.; Mueller, D.; Doerffer, R.; Krasemann, H.; Mélin, F.; Brockmann, C.; Fomferra, N.; Peters, M.; Grant, M. Ocean colour climate change initiative—Approach and initial results. In Proceedings of the 2012 IEEE International Geoscience and Remote Sensing Symposium, Munich, Germany, 22–27 July 2012; pp. 2024–2027. [Google Scholar]
Osman, A.I.; Nasr, M.; Farghali, M.; Bakr, S.S.; Eltaweil, A.S.; Rashwan, A.K.; Abd El-Monaem, E.M. Machine learning for membrane design in energy production, gas separation, and water treatment: A review. Environ. Chem. Lett. 2024, 22, 505–560. [Google Scholar] [CrossRef]
Durlik, I.; Miller, T.; Dorobczyński, L.; Kozlovska, P.; Kostecki, T. Revolutionizing Marine Traffic Management: A Comprehensive Review of Machine Learning Applications in Complex Maritime Systems. Appl. Sci. 2023, 13, 8099. [Google Scholar] [CrossRef]
Kulkarni, M.; Deshpande, P.; Nalbalwar, S.; Nandgaonkar, A. Cloud computing based workload prediction using cluster machine learning approach. In Proceedings of the International Conference on Computing in Engineering & Technology, Lonere, India, 12–13 February 2022; pp. 591–601. [Google Scholar]
Chen, F.; Li, S.; Han, J.; Ren, F.; Yang, Z. Review of lightweight deep convolutional neural networks. Arch. Comput. Methods Eng. 2024, 31, 1915–1937. [Google Scholar] [CrossRef]
Suganya, B.; Gopi, R.; Kumar, A.R.; Singh, G. Dynamic task offloading edge-aware optimization framework for enhanced UAV operations on edge computing platform. Sci. Rep. 2024, 14, 16383. [Google Scholar] [CrossRef]
Cheng, C.; Hou, X.; Wang, C.; Wen, X.; Liu, W.; Zhang, F. A Pruning and Distillation Based Compression Method for Sonar Image Detection Models. J. Mar. Sci. Eng. 2024, 12, 1033. [Google Scholar] [CrossRef]
Dramsch, J.S.; Kuglitsch, M.M.; Fernández-Torres, M.-Á.; Toreti, A.; Albayrak, R.A.; Nava, L.; Ghaffarian, S.; Cheng, X.; Ma, J.; Samek, W.; et al. Explainability can foster trust in artificial intelligence in geoscience. Nat. Geosci. 2025, 18, 112–114. [Google Scholar] [CrossRef]
Lundberg, S. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874. [Google Scholar]
Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
Farea, A.; Yli-Harja, O.; Emmert-Streib, F. Understanding Physics-Informed Neural Networks: Techniques, Applications, Trends, and Challenges. AI 2024, 5, 1534–1557. [Google Scholar] [CrossRef]
Hojjati, H.; Ho, T.K.K.; Armanfard, N. Self-supervised anomaly detection in computer vision and beyond: A survey and outlook. Neural Netw. 2024, 172, 106106. [Google Scholar] [CrossRef] [PubMed]
Mishra, D.R.; Kumar, A.; Ramaswamy, L.; Boddula, V.K.; Das, M.C.; Page, B.P.; Weber, S.J. CyanoTRACKER: A cloud-based integrated multi-platform architecture for global observation of cyanobacterial harmful algal blooms. Harmful Algae 2020, 96, 101828. [Google Scholar] [CrossRef]
Sunagawa, S.; Acinas, S.G.; Bork, P.; Bowler, C.; Babin, M.; Boss, E.; Cochrane, G.; et al. Tara Oceans: Towards global ocean ecosystems biology. Nat. Rev. Microbiol. 2020, 18, 428–445. [Google Scholar] [CrossRef] [PubMed]
Pesant, S.; Not, F.; Picheral, M.; Kandels-Lewis, S.; Le Bescot, N.; Gorsky, G.; Iudicone, D.; Karsenti, E.; Speich, S.; Troublé, R.; et al. Open science resources for the discovery and analysis of Tara Oceans data. Sci. Data 2015, 2, 150023. [Google Scholar] [CrossRef] [PubMed]
Wong, A.P.S.; Wijffels, S.E.; Riser, S.C.; Pouliquen, S.; Hosoda, S.; Roemmich, D.; Gilson, J.; Johnson, G.C.; Martini, K.; Murphy, D.J.; et al. Argo Data 1999–2019: Two Million Temperature-Salinity Profiles and Subsurface Velocity Observations from a Global Array of Profiling Floats. Front. Mar. Sci. 2020, 7, 700. [Google Scholar] [CrossRef]
Johnson, K.; Claustre, H. Bringing biogeochemistry into the Argo age. Eos Trans. Am. Geophys. Union 2016, 97. [Google Scholar] [CrossRef]
Fredston, A.L.; Lowndes, J.S.S. Welcoming More Participation in Open Data Science for the Oceans. Annu. Rev. Mar. Sci. 2024, 16, 537–549. [Google Scholar] [CrossRef]
Shaw, S.; Sales, A. Using the Open Science Framework to promote Open Science in Education Research. In Proceedings of the Educational Data Mining, Durham, UK, 24–27 July 2022. [Google Scholar]
Kong, Q.; Wang, R.; Walter, W.R.; Pyle, M.; Koper, K.; Schmandt, B. Combining Deep Learning with Physics Based Features in Explosion-Earthquake Discrimination. Geophys. Res. Lett. 2022, 49, e2022GL098645. [Google Scholar] [CrossRef]
Cuomo, S.; Di Cola, V.S.; Giampaolo, F.; Rozza, G.; Raissi, M.; Piccialli, F. Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next. J. Sci. Comput. 2022, 92, 88. [Google Scholar] [CrossRef]
Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015. [Google Scholar]
Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
Lee, H.; Kim, N.-W.; Lee, J.-G.; Lee, B.-T. Uncertainty-aware deep learning forecast using dropout-based ensemble method. In Proceedings of the 2019 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 16–18 October 2019; pp. 1120–1125. [Google Scholar]
Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
Kendall, A.; Gal, Y. What uncertainties do we need in Bayesian deep learning for computer vision? In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]

Figure 1. Conceptual diagram of the progress on ML-based ocean color remote sensing. (a) Integration schematic of the ocean color remote sensing and ML techniques. (b) Growth of ML-related papers. (c) Semantic clustering of keyword co-occurrence networks. (d) Evolution of key research topics over time.

Figure 2. Deep learning results for spaceborne lidar water body parameter inversion: (a) comparison of winter Arctic Ocean chlorophyll inversion results between the CALIOP lidar and MODIS passive remote sensing, where dark blue represents sea ice; (b) comparison between the lidar deep learning inversion model (CALIOP_DNN).

Figure 3. Arctic climatological pCO₂ distribution [16].

Figure 4. Machine learning models with coupled physical parameters [21]: (a) dual-branch two-step deep learning model framework; (b) POC inversion model comparison; (c) PIC inversion model comparison.

Table 1. List of high-frequency keywords in ML-OCRS.

Keywords	Occurrences	Keywords	Occurrences	Keywords	Occurrences
remote sensing	104	meris	8	chlorophyll-a (chl-a)	5
ocean color	69	algorithm development	7	classification	5
machine learning	60	artificial neural network	7	feature extraction	5
neural network	36	cdom	7	inland and coastal waters	5
atmospheric correction	25	chlorophyll-a concentration	7	normalized fluorescence height	5
chlorophyll-a	24	ocean color	7	radiative transfer	5
neural networks	23	random forest	7	arctic ocean	4
water quality	22	satellite	7	artificial intelligence	4
deep learning	21	satellites	7	Barents sea	4
modis	21	seawifs	7	coastal water	4
ocean color remote sensing	20	aerosols	6	geostationary ocean color imager (goci)	4
phytoplankton	17	Baltic sea	6	hyperspectral	4
chlorophyll	14	inherent optical properties	6	image color analysis	4
remote sensing reflectance	11	karenia brevis	6	inland waters	4
coastal waters	10	oceans	6	inversion	4
harmful algal blooms	10	satellite remote sensing	6	machine learning algorithm	4
olci	9	sea measurements	6	new Caledonia	4
sentinel-3	9	viirs	6	ocean color remote sensing reflectance	4
chlorophyll a	8	west Florida shelf	6	ocean color remote sensing	4
goci	8	algorithm	5	ocean optics	4

Table 2. Performance metrics of machine learning models in ocean color remote sensing.

Reference	Research Focus	Model Performance	Data
(Gross et al., 1999) [68]	NN inversion for chlorophyll-a retrieval from satellite reflectances.	ANN achieved ±3% accuracy compared to 15–30% error in polynomial fits.	Used SeaWiFS simulated data.
(Gross et al., 2004) [69]	NN to retrieve chlorophyll-a from marine reflectance in the Western Equatorial Pacific Ocean.	Improved performance by 75% compared to classical algorithms using reflectance ratios.	Model calibrated with synthetic and in situ data.
(Jamet et al., 2012) [70]	NN to estimate Kd from SeaWiFS data.	ANN method has RMSE of 0.27 m⁻¹ for Kd(490), significantly better than traditional methods: Kd(Werdell) RMSE = 1.41 m⁻¹, Kd(Zhang) RMSE = 0.71 m⁻¹, Kd(Morel) RMSE = 1.56 m⁻¹.	Model trained with synthetic data and in situ data and evaluated with field measurement data.
(Ioannou et al., 2013) [71]	NN for retrieving IOP and Chl from MODIS Rrs.	NN algorithms improve chlorophyll retrieval with R² of ~0.90, compared to OC3 (~0.84).	Used both simulated and field data.
(Fan et al., 2017) [24]	Atmospheric correction using NN for coastal waters.	NN algorithm reduced the Average Percentage Difference (APD) in AOD retrievals by up to 25% in the blue bands (412 nm and 443 nm) compared to SeaDAS NIR and NIR/SWIR algorithms.	Model trained with synthetic data and in situ data and evaluated with field measurement data.
(Cao et al., 2020) [60]	XGBoost to estimate Chl-a in turbid inland lakes using Landsat-8 data.	BST performed well with MAPD = 24% compared to RF (MAPD = 30%) and band-ratio algorithms (MAPD = 64%).	Field data from 8 lakes in eastern China (N = 225) and SeaWiFS Bio-optical Archive (N = 97).
(Li et al., 2020) [72]	NN for atmospheric correction of GOCI data at high solar zenith angles.	The NN AC algorithm yielded stable Rrs even at solar zenith angles ≥70° (APD = 30%), outperforming traditional NIR algorithms (APD = 87%).	Model trained with GOCI Rayleigh-corrected radiance and noontime Rrs matchups and evaluated with in situ data from AERONET-.
(Pahlevan et al., 2020) [73]	MDN for estimating Chl-a from Sentinel-2 MSI and Sentinel-3 OLCI data in inland and coastal waters.	MDN outperformed OC algorithms (MAPE was improved by 2–3 times).	Model trained with 1000 co-located in situ Rrs–Chla pairs and evaluated with independent in situ data (n > 1900) from multiple regions, including lakes, rivers, and estuaries.
(Zhao et al., 2023) [74]	ANN for HY-1C CZI atmospheric correction.	APD was reduced to 9.78% on average compared to ACAOD’s 105.48%.	Model trained with HY-1C CZI and Landsat 8 OLI spatio-temporally synchronized datasets and evaluated with in situ data and quasi-synchronous Landsat 8 Rrs data.
(He et al., 2024) [18]	XGBoost for polarization atmospheric correction.	MAPE of Rrs(490) was reduced to 34.43% compared to GlobColour products (MAPE > 60%).	Model trained with simulated data and evaluated with field measurement data.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
