Simulation and Predictive Environmental Modeling for Marine Forecasting: A Systematic Review

Souri, Annamaria; Kokkinaki, Angelika

doi:10.3390/jmse14050493


by Annamaria Souri * and Angelika Kokkinaki

Department of Management, School of Business, University of Nicosia, Nicosia 2417, Cyprus

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2026, 14(5), 493; https://doi.org/10.3390/jmse14050493

Submission received: 30 January 2026 / Revised: 22 February 2026 / Accepted: 3 March 2026 / Published: 4 March 2026

(This article belongs to the Section Marine Environmental Science)

Abstract

Coastal and marine systems are governed by fragile water-quality dynamics, where disturbances can trigger harmful algal blooms with significant ecological and societal consequences. These pressures have intensified interest in forecasting systems that can anticipate bloom development and support environmental management. This study presents a systematic review of simulation-based and predictive environmental modeling approaches used for marine forecasting of water quality and harmful algal bloom phenomena. Following PRISMA guidelines, 11,185 records were identified, 127 articles were screened in full text for eligibility, and 40 peer-reviewed studies published between 2015 and 2025 were included and synthesized using a structured extraction framework capturing modeling paradigms, forecast targets, data inputs, spatial and temporal scope, validation practices, operational context, and reported limitations. The reviewed literature indicates the dominance of predictive and hybrid modeling approaches, with forecasting efforts primarily focused on coastal systems and short-term applications. Harmful algal blooms and chlorophyll-a emerge as dominant forecast targets, commonly supported by satellite observations, in situ measurements, and environmental forcing variables. Despite substantial methodological advances, persistent challenges related to data availability and quality, validation rigor, system integration, and operational deployment remain evident across modeling paradigms. Overall, the findings suggest that while marine forecasting models have become increasingly sophisticated, their translation into reliable and operational systems remains uneven, highlighting the need for closer alignment.

Keywords: marine forecasting; harmful algal blooms; predictive environmental modeling; simulation modeling

1. Introduction

Coastal and marine environments reflect some of the most dynamic interactions between natural processes and human pressures, with water quality degradation and harmful algal blooms (HABs) emerging as persistent and visible challenges. HAB events can produce potent toxins, contaminate seafood, disrupt marine ecosystems, and lead to widespread closures of aquaculture and fisheries, resulting in substantial economic losses and increased risks to human health [1,2,3]. The increasing frequency and intensity of these events, which are often driven by factors such as climate change, eutrophication, and anthropogenic pressures, have intensified the demand for reliable forecasting systems capable of supporting timely decision-making and environmental management [2]. Such challenges underscore the importance of anticipation as a core component of marine environmental management.

Accurate marine forecasting enables authorities to anticipate the onset, transport, and potential impacts of HABs, allowing for the implementation of preventive measures, optimized monitoring strategies, and targeted mitigation actions [4,5]. Forecasting key indicators such as chlorophyll-a concentration has become central to early warning systems, as elevated levels often signal an increased likelihood of red tide formation [6]. Beyond short-term management, forecasting also contributes to improved understanding of the environmental drivers of bloom dynamics and supports long-term ecosystem planning and resilience-building strategies [1,5].

Simulation-based and predictive environmental models constitute the core technological foundation of contemporary marine forecasting efforts. Process-based numerical models, such as hydrodynamic and biogeochemical simulations, are widely used to reproduce ocean circulation, nutrient dynamics, and bloom transport pathways, enabling scenario testing and mechanistic interpretation of bloom dynamics [7,8]. In parallel, predictive and data-driven models, particularly those based on machine learning, have gained increasing prominence due to their ability to capture complex, nonlinear relationships between environmental drivers and bloom occurrence [9,10]. Approaches such as random forests, neural networks, and quantile-based prediction frameworks allow for flexible forecasting across multiple temporal horizons and risk levels, offering valuable support for operational decision-making [3].

Despite these extensive advances in simulation-based and predictive environmental modelling, the translation of these approaches into reliable, real-time marine forecasting systems remains constrained. Across the reviewed literature, recurring challenges are identified concerning model validation, data availability and integration, as well as operational deployment readiness. Several studies note that while predictive accuracy metrics are often reported, validation is frequently conducted under restricted conditions or using limited datasets, raising concerns regarding robustness and transferability [3,11,12]. Similar limitations are highlighted in studies employing advanced predictive schemes, where validation remains constrained to selected cases and lacks broader reliability assessment [13].

Data-related constraints further hinder forecasting performance and real-time applicability. Numerous studies report reliance on historical datasets, satellite-derived observations, or sparsely distributed in situ measurements, which limit both temporal resolution and predictive accuracy [14,15]. Challenges associated with integrating heterogeneous data sources, such as physical, biological, and data-driven inputs, are also frequently acknowledged, underscoring the difficulty of constructing cohesive forecasting pipelines [16,17,18].

In addition to validation and data integration challenges, few studies demonstrate readiness for real-time or operational deployment. Several authors explicitly acknowledge that their proposed models remain experimental or offline in nature, with high computational demands, system complexity, and lack of automated pipelines identified as key barriers to implementation [19,20,21]. Moreover, red tide prediction models are often not fully integrated into operational monitoring systems, further widening the gap between methodological development and practical application [22].

While existing review studies have examined specific aspects of marine environmental modelling [4,5], with emphasis on ecological processes, algorithmic developments, or remote sensing applications, a systematic synthesis explicitly focused on forecasting-oriented simulation and predictive models is still lacking. As a result, the challenges identified across individual studies remain fragmented and inconsistently discussed, making it difficult to assess the overall maturity of current marine forecasting approaches or to identify the barriers hindering real-time and autonomous deployment.

To address this gap, this study conducts a systematic review of simulation-based and predictive environmental modelling approaches used for marine forecasting of water quality and harmful algal bloom phenomena. The review synthesizes existing modeling approaches by examining their forecast targets, data inputs, and spatial–temporal scopes, while systematically identifying integration, validation, and operational limitations reported in the literature. By consolidating existing evidence and categorizing recurring challenges, this study provides a structured assessment of the state of the art and informs the development of more robust, real-time, and autonomous marine forecasting systems.

Accordingly, this study is guided by the following research questions and objectives.

RQ1: How are simulation-based and predictive environmental modelling approaches currently used for marine forecasting of water quality and red tide phenomena?

RO1: To systematically identify, classify, and synthesize simulation-based and predictive environmental modelling approaches used for marine forecasting of water quality and red tide phenomena, including their forecast targets, data inputs, and spatial–temporal scopes.

RQ2: What integration, validation, and operational limitations are reported in existing marine forecasting models that hinder real-time and autonomous deployment?

RO2: To analyse reported integration, validation, and operational limitations of existing marine forecasting models to identify barriers to real-time and autonomous deployment.

The remainder of this paper is structured as follows. Section 2 describes the systematic review methodology, including the search strategy, inclusion and exclusion criteria, and study selection process in accordance with PRISMA guidelines. Section 3 presents the results of the review, providing an overview of the selected studies and synthesizing the characteristics of simulation-based and predictive modelling approaches used in marine forecasting. Section 4 discusses the findings in relation to the identified research gaps, with particular emphasis on validation, data integration, and operational deployment challenges. Finally, Section 5 concludes the paper by summarizing the key findings and highlighting their implications for the development of robust, real-time, and autonomous marine forecasting systems.

2. Methodology

2.1. Review Design and Protocol

This study adopts a systematic literature review (SLR) design to synthesize existing research on simulation-based and predictive environmental modelling approaches for marine forecasting of water quality and harmful algal bloom phenomena. A systematic review methodology was selected to enable a transparent, rigorous, and reproducible synthesis of a fragmented and methodologically heterogeneous body of literature, where modelling approaches, forecast targets, and evaluation practices vary substantially across studies. Systematic reviews are particularly suitable for consolidating dispersed evidence, identifying recurring limitations, and providing a structured overview of the state of the art in complex and interdisciplinary research domains [23,24].

The review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Supplementary Table S1) [25]. PRISMA was selected as it provides a standardized and widely accepted framework for the transparent reporting of systematic reviews, particularly those involving heterogeneous study designs and methodological approaches. Given the diversity of simulation-based and predictive modelling techniques applied in marine forecasting, the PRISMA framework was deemed appropriate for guiding the identification, screening, eligibility assessment, and inclusion of relevant studies in a consistent and replicable manner. By explicitly documenting each stage of the study selection process, PRISMA supports methodological clarity and minimizes the risk of selection bias.

The scope of the review was defined to include peer-reviewed research articles published between 2015 and 2025, reflecting recent advances in environmental modelling and forecasting techniques relevant to contemporary marine challenges. The review focused exclusively on studies published in English and indexed in established academic databases to ensure scholarly quality and accessibility.

2.2. Data Sources and Search Strategy

The literature search was conducted using three major academic databases: Scopus, ProQuest, and EBSCOhost. These databases were selected due to their broad coverage of peer-reviewed journals across environmental science, marine science, engineering, and applied modelling disciplines, ensuring comprehensive retrieval of relevant studies.

Search queries were constructed using combinations of keywords related to marine forecasting, simulation modelling, predictive modeling, water quality, and harmful algal blooms, with logical operators (AND/OR) applied to refine and combine search terms. Searches were performed across titles, abstracts, and author-provided keywords to capture studies that explicitly addressed forecasting-oriented modelling approaches. Filters were applied to restrict results to peer-reviewed journal articles, published in English, within the 2015–2025 period.

To maintain transparency and reproducibility, the search strategy was consistently applied across all databases. Minor adaptations to keyword syntax were made where necessary to accommodate database-specific search functionalities.

2.3. Inclusion and Exclusion Criteria

Clear inclusion and exclusion criteria were established prior to the literature search to guide study selection and ensure alignment with the research objectives. The criteria were applied consistently throughout the screening process to ensure objectivity and methodological rigor. A summary of the inclusion and exclusion criteria is presented below in Table 1.

2.4. Study Selection Process

The initial literature search across the three selected databases yielded a total of 11,185 records, including 5086 from Scopus, 5993 from ProQuest, and 106 from EBSCOhost. Following the application of language (English), publication period (2015–2025), and peer-review filters, 4687 records remained. After removing duplicates (n = 2), the dataset was reduced to 4685 records. Subsequent screening for relevance to the research objectives, based on title and abstract review, resulted in 127 records being retained for further assessment.

These records were subjected to full-text screening, during which studies were evaluated against the predefined inclusion and exclusion criteria and any remaining duplicates were removed. As a result of this process, 40 studies met all eligibility requirements and were included in the final review. The complete study selection process, including identification, screening, eligibility assessment, and final inclusion, is summarized in the PRISMA flow diagram (Figure 1).
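The record counts reported in this subsection can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the stage labels and the `attrition` helper are hypothetical names introduced here, and the numbers are those stated in the text above.

```python
# Record counts as reported in Section 2.4; stage labels are illustrative.
identified = {"Scopus": 5086, "ProQuest": 5993, "EBSCOhost": 106}

stages = [
    ("identified", sum(identified.values())),
    ("after filters", 4687),
    ("after duplicate removal", 4685),
    ("full-text screening", 127),
    ("included", 40),
]


def attrition(stages):
    """Return (stage, remaining, removed_since_previous_stage) tuples."""
    rows = []
    previous = None
    for name, remaining in stages:
        removed = 0 if previous is None else previous - remaining
        rows.append((name, remaining, removed))
        previous = remaining
    return rows


flow = attrition(stages)
for name, remaining, removed in flow:
    print(f"{name:<26}{remaining:>7}  (-{removed})")
```

Under these figures, the three database totals sum to the 11,185 identified records, and the duplicate-removal step accounts for exactly two records.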

2.5. Data Extraction and Synthesis

Following the study selection process, data were systematically extracted from each of the 40 included studies using a structured extraction template developed in alignment with the research questions and objectives of this review. The extraction process focused on capturing both descriptive and methodological characteristics of each study to enable a comprehensive synthesis of current marine forecasting practices.

For each included article, data extraction captured the type of modeling approach employed (simulation-based, predictive/data-driven, or hybrid), the primary forecast targets (such as water quality parameters, harmful algal blooms, or chlorophyll-a), and the nature of the data inputs used, including satellite observations, in situ measurements, historical datasets, or combinations thereof. In addition, information was extracted on the spatial scale of application, the temporal scope and forecasting horizon, as well as the validation and evaluation methods applied. Particular attention was given to author-reported integration, validation, and operational limitations, which formed the basis for identifying recurring challenges across studies.

The extracted data were synthesized using a qualitative and descriptive approach, appropriate for the heterogeneity of modelling techniques, data sources, and application contexts represented in the reviewed literature. Studies were grouped and classified based on shared methodological characteristics and forecast objectives to address RQ1, while reported limitations were thematically analysed to address RQ2. Importantly, identified gaps and challenges were derived exclusively from author-stated limitations and discussions to ensure consistency and to avoid interpretive bias.

This structured extraction and synthesis process enabled a transparent assessment of how simulation-based and predictive environmental modelling approaches are currently applied in marine forecasting, as well as a systematic identification of the recurring barriers that hinder real-time and autonomous deployment.

2.6. Methodological Quality Assessment

The methodological quality of the included studies (n = 40) was assessed using a five-domain rubric developed for heterogeneous marine forecasting research, encompassing process-based simulation studies, data-driven predictive models, and hybrid approaches. A domain-based rubric was used because the included literature varies substantially in modeling paradigm, data sources, validation strategies, and deployment context, and no single standardized risk-of-bias tool is directly applicable across this combination of study types.

Each study was scored across five domains: (i) dataset adequacy, (ii) validation rigor, (iii) operational maturity, (iv) model transparency, and (v) data completeness. Each domain was scored on a 0–2 scale, yielding a total score of 0–10. Total scores were mapped to quality categories as follows: low (0–4), moderate (5–7), and high (8–10). Across domains, a score of 0 indicates absent reporting or clearly insufficient practice for forecasting-oriented inference; a score of 1 indicates partially adequate practice with acknowledged or evident limitations; and a score of 2 indicates adequate practice with clear reporting consistent with robust forecasting evaluation and interpretation. Where reporting was insufficient to evaluate a domain, the domain was conservatively scored as 0.
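The scoring rule described above can be expressed compactly. The sketch below is a minimal Python illustration, not the authors' implementation: the domain keys and the `quality_category` function are hypothetical names, and the example scores are invented for demonstration.

```python
# Domain keys are illustrative labels for the five domains named in the text.
DOMAINS = (
    "dataset_adequacy",
    "validation_rigor",
    "operational_maturity",
    "model_transparency",
    "data_completeness",
)


def quality_category(scores):
    """Sum five 0-2 domain scores and map the total to a quality category.

    A missing or None score is treated as 0, mirroring the conservative
    rule for domains with insufficient reporting.
    """
    total = 0
    for domain in DOMAINS:
        score = scores.get(domain)
        if score is None:
            score = 0
        if score not in (0, 1, 2):
            raise ValueError(f"{domain}: score must be 0, 1, or 2")
        total += score
    if total <= 4:
        category = "low"
    elif total <= 7:
        category = "moderate"
    else:
        category = "high"
    return total, category


# Hypothetical study: operational maturity could not be evaluated.
example = {
    "dataset_adequacy": 2,
    "validation_rigor": 2,
    "operational_maturity": None,
    "model_transparency": 1,
    "data_completeness": 2,
}
print(quality_category(example))
```

In this invented case the unreported domain contributes 0, so the study totals 7 and falls in the moderate band rather than the high band.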

Dataset adequacy was assessed using thresholds derived from extracted metadata, including the reported study duration, sampling frequency (or temporal resolution), spatial coverage, and sample size. Validation rigor distinguished between basic evaluation approaches (for example, a single random train–test split or direct observation comparison without a clearly separated forecasting test period) and more rigorous designs, such as cross-validation (including time-aware variants when reported), out-of-bag evaluation when explicitly implemented, and temporal holdout strategies based on withheld periods or years. Operational maturity was classified according to whether the study reported an experimental, near-operational, or fully operational forecasting system. Model transparency received higher scores when studies reported explainability methods or uncertainty quantification (for example, SHAP-based attribution, quantile-based outputs, or explicitly interpretable model structures). Data completeness was evaluated based on the breadth of predictor categories reported, including physical oceanography, chemical and biogeochemical, meteorological, biological and ecological, anthropogenic and catchment drivers, and remote sensing indices or spectral bands.
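The distinction drawn above between a single random train–test split and a temporal holdout based on withheld years can be illustrated with a short Python sketch. The records are synthetic placeholders, and the function names and the `chl_a` field are assumptions made for this example rather than anything taken from the reviewed studies.

```python
import random


def random_split(samples, test_fraction=0.2, seed=0):
    """Basic design: shuffle all samples, ignoring their time order."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_holdout(samples, test_years):
    """More rigorous design: withhold whole years for testing."""
    train = [s for s in samples if s["year"] not in test_years]
    test = [s for s in samples if s["year"] in test_years]
    return train, test


# Synthetic monthly records for 2015-2019 (placeholder values, not real data).
samples = [
    {"year": year, "month": month, "chl_a": 0.0}
    for year in range(2015, 2020)
    for month in range(1, 13)
]

train, test = temporal_holdout(samples, test_years={2019})
train_years = {s["year"] for s in train}
test_years_seen = {s["year"] for s in test}
print(sorted(train_years), sorted(test_years_seen))
```

With the temporal holdout, no year appears in both partitions, so the test period is genuinely unseen. A random split generally mixes samples from the same period into both partitions, which can inflate apparent skill for autocorrelated environmental series.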

For review and synthesis articles, domains that require an implemented forecasting workflow within the study (dataset adequacy, validation rigor, and operational maturity) were scored as 0 to preserve a consistent 0–10 scoring scale across the full study set. This operationalization is not intended as a methodological critique of review articles; rather, it reflects the limited applicability of implementation-focused domains to non-implementation study types. To support interpretation, a sensitivity analysis was conducted by repeating the quality-summary statistics after excluding review and synthesis articles.

All quality summaries stratified by modeling paradigm use the primary approach classification reported in the following section, where each study is assigned to exactly one group: predictive/data-driven, hybrid, or simulation-only.

3. Results

3.1. Overview of Included Studies

A total of 40 peer-reviewed studies were included in the final synthesis following the PRISMA-based screening process. Overall, the body of literature spans the period 2016–2025, reflecting a decade of research activity in simulation and predictive environmental modeling for marine forecasting. As indicated in Figure 2, the annual number of publications increases gradually over time, with relatively modest output prior to 2020 and a more pronounced rise from 2021 onward. The highest concentration of studies was observed in 2024 (n = 8) and 2025 (n = 7), suggesting an intensification of scholarly activity in the most recent years. This upward trajectory was further examined through a Pearson correlation analysis between publication year and annual publication count, which revealed a statistically significant positive association (r = 0.68, p = 0.031). This result suggests that the observed increase over time is unlikely to be attributable to random fluctuation alone and reflects a growth trend in scholarly output.
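As a plausibility check on the reported correlation, the two-sided p-value can be recomputed from r alone using the standard t-transformation. The Python sketch below assumes n = 10 annual data points (one per year from 2016 to 2025), which is an assumption made here and not stated explicitly in the text. It uses only the standard library and integrates the Student's t density numerically.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def pearson_p_value(r, n, steps=20000):
    """Two-sided p-value for Pearson r under the null hypothesis rho = 0.

    Uses t = |r| * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.
    `steps` must be even because Simpson's rule is used for the integral.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r * r)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * (h / 3) * total  # P(-t < T < t)
    return t, 1 - central_mass


# n = 10 is an assumption: one publication count per year, 2016-2025.
t_stat, p_value = pearson_p_value(0.68, 10)
print(f"t = {t_stat:.3f}, two-sided p = {p_value:.3f}")
```

Under that assumption the recomputed p-value falls between 0.02 and 0.04, which is compatible with the reported value. With so few data points, the estimate remains sensitive to individual years.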

Beyond temporal trends, it is equally important to examine the geographic distribution of the reviewed studies. As shown in Figure 3, research activity is predominantly concentrated in Asia (n = 16), North America (n = 13), and Europe (n = 9). Notably, the United States and China emerge as the most frequently represented countries, together accounting for more than half of the total study corpus. In contrast, contributions from other regions including the Middle East, Africa, and polar environments are present but comparatively limited, highlighting an uneven global research landscape.

In addition to geographic patterns, the reviewed studies also differ in terms of the marine environments they address. An overview of the marine environment types examined in the literature is provided in Table 2. Coastal environments dominate, representing 55.0% of the included studies, followed by coastal–estuarine systems (15.0%) and coastal aquaculture environments (12.5%). By contrast, estuarine-only (7.5%), freshwater-influenced (5.0%), and open-sea environments (5.0%) are considerably less represented. Taken together, this distribution highlights a strong research emphasis on near-shore and transitional marine systems.

To further contextualize these findings, the relationship between study region and marine environment type was examined. As illustrated in Figure 4, coastal environments constitute the dominant focus across Asia, Europe, and North America, accounting for more than half of the studies within each region. Moreover, coastal–estuarine systems are most prevalent in Europe and Asia, while estuarine-only studies appear exclusively in North America. It is also noteworthy that coastal aquaculture modeling is primarily associated with European studies and represents the sole focus of the single African contribution. Across all regions, freshwater and open-sea environments remain sparsely represented.

Collectively, these findings demonstrate that the existing literature on simulation and predictive environmental modeling for marine forecasting is both temporally expanding and geographically concentrated, with a strong emphasis on coastal and near-shore environments and comparatively limited coverage of offshore systems and underrepresented regions. Building on this overview of the temporal, geographic, and environmental characteristics of the reviewed studies, the following section shifts focus to the modeling approaches employed in marine forecasting research.

3.2. Modeling Approaches Used in Marine Forecasting

At a high level, the reviewed studies can be classified into three main modelling approach types: simulation-based, predictive (data-driven), and hybrid approaches.

As summarized in Table 3, hybrid approaches represent the largest share of the literature, accounting for 52.5% (n = 21) of the included studies. Predictive or data-driven models follow closely, comprising 45.0% (n = 18) of studies, while purely simulation-based approaches are comparatively rare, representing only 2.5% (n = 1) of the total.

In addition to this primary classification, the reviewed literature was further examined based on the model categories and algorithms employed, as several studies implement multiple model types within a single methodological framework. As presented in Table 4, predictive and data-driven model categories are the most prevalent, appearing in 57.5% (n = 23) of studies, followed by hybrid model categories (27.5%, n = 11) and simulation-based model categories (15.0%, n = 6). This complementary perspective highlights the diversity of modelling components used across studies, irrespective of their dominant classification.

A wide range of predictive modelling techniques is reported across the reviewed studies. Neural network-based approaches, including ANNs, LSTMs, CNNs, transformer architectures, and attention-based models, are the most frequently used (n = 10). Tree-based methods, such as Random Forest, XGBoost, GBDT, and quantile random forests, are also common (n = 7), followed by regression-based and empirical approaches (n = 6), including SVR, ARIMA, MARS, optical chlorophyll algorithms (e.g., OC3), and Gaussian processes.

Across the included studies, simulation-based components (Table 4) are methodologically varied, including hydrodynamic models (n = 2), coupled hydrodynamic–biogeochemical models (n = 2), ecological or process-based models (n = 1), and Lagrangian particle tracking frameworks (n = 1). These components are primarily used to represent transport processes, ecosystem dynamics, and tracer movement.

Hybrid approaches explicitly combine simulation-based and predictive components. The most frequently reported hybrid configurations include coupled physical–machine learning models (n = 4) and simulation-informed machine learning approaches (n = 4), alongside sensor-integrated machine learning frameworks that incorporate observational data streams (n = 3). Overall, these findings summarize the prevalence of predictive, simulation-based, and hybrid methods and the diversity of coupling strategies employed in contemporary marine forecasting research.

3.3. Forecast Targets, Data Inputs, and Spatio-Temporal Scope

Having established the dominant modeling approaches used in marine forecasting, this section shifts focus to what these models are designed to predict and the data contexts within which they operate. In particular, the reviewed studies are examined in terms of their primary forecast targets, input variables, and the spatial and temporal scope of their applications.

With respect to forecast targets, each study was classified according to its main predictive focus. As summarized in Table 5, harmful algal blooms (HABs) constitute the most common single target, accounting for 35.0% (n = 14) of the reviewed studies. Chlorophyll-a–only forecasts represent 25.0% (n = 10) of the literature, while combined HABs and chlorophyll-a or biomass predictions account for an additional 30.0% (n = 12). Other forecast targets, including broader water quality states, shellfish toxicity, and transport-related variables, are comparatively limited (10.0%, n = 4). Considered as a whole, these results indicate a strong emphasis on biological and bio-optical indicators of marine ecosystem dynamics.

In addition to forecast targets, the reviewed studies differ substantially in the environmental variables used as model inputs. An overview of the forecast variables employed is provided in Table 6. Chlorophyll-a or optical proxies are the most frequently used inputs, appearing in 80.0% of studies, followed by temperature variables (65.0%) and nutrient concentrations (55.0%). Meteorological forcing, including wind-related variables, is incorporated in 52.5% of studies, while salinity (40.0%) and hydrodynamic variables such as currents (35.0%) are also commonly included. Remote sensing spectral bands and indices (32.5%), dissolved oxygen and water chemistry variables (30.0%), and turbidity or light-related measures (27.5%) further contribute to model inputs. Less frequently used variables include river discharge (22.5%), toxins or species-specific cell counts (17.5%), and land-use or anthropogenic drivers (7.5%).

Beyond the choice of variables, the reviewed studies also vary in their temporal forecasting scope. As illustrated in Figure 5, short-term forecasts, ranging from hours to several days, dominate the literature, representing 67.5% (n = 27) of studies. Multi-day forecasting horizons are reported in 20.0% (n = 8) of cases, while seasonal forecasts are present in 17.5% (n = 7) of studies. A subset of studies explicitly reports multiple forecast horizons (10.0%), whereas 15.0% do not clearly specify a forecasting timescale. Overall, these patterns reflect a strong orientation toward near-term operational and early-warning applications.

Finally, the spatial scale of model deployment provides additional context for how marine forecasting approaches are applied across the reviewed literature. Coastal-scale applications are the most prevalent, reported in 18 studies (45%), followed by local-scale implementations in 13 studies (32.5%) and regional-scale models, also present in 13 studies (32.5%). In addition, a subset of studies (10%) explicitly operates across multiple spatial scales, combining local and regional or coastal and regional perspectives. This distribution underscores both the adaptability of existing forecasting approaches and the continued dominance of nearshore and coastal environments as primary application domains for marine forecasting research.

Overall, the analysis of forecast targets, input variables, and spatio-temporal scope reveals a strong emphasis on biologically driven indicators and near-term forecasting applications. The predominance of chlorophyll-a and HAB-focused targets, combined with frequent reliance on temperature, nutrients, and meteorological forcing, reflects the central role of bio-optical and environmental drivers in current marine forecasting research. At the same time, the dominance of short-term and coastal-scale applications highlights a clear orientation toward localized monitoring and early warning use cases, while longer-term and broader-scale forecasting remains comparatively limited. Together, these patterns provide important context for understanding not only what marine forecasting models aim to predict, but also the operational constraints and design choices that shape their development. Building on this analysis, the following section examines the forecasting characteristics of the reviewed models and the extent to which they are integrated with observational systems and operational monitoring infrastructures.

3.4. Forecasting Characteristics and Operational System Integration

Considerable variation is observed in the operational characteristics of the reviewed marine forecasting models, particularly with respect to real-time data usage, system integration, and operational readiness. Slightly more than half of the reviewed studies (55.0%, n = 22) explicitly report incorporating real-time or near–real-time data streams, whereas 37.5% (n = 15) rely exclusively on non-real-time data sources, including historical observations, retrospective satellite records, or reanalysis products. A further 7.5% of studies (n = 3) do not clearly state whether real-time data are used.

In contrast, integration with sensors or environmental monitoring systems is reported more consistently. Most studies (87.5%, n = 35) incorporate data from in situ sensors, monitoring stations, or observational networks, while 12.5% (n = 5) do not report such integration. These inputs include fixed coastal stations, buoys, autonomous sensors, and structured field sampling campaigns, often combined with satellite-derived or historical datasets.

Operational status also varies across the reviewed literature. Experimental or research-oriented implementations account for 47.5% of studies (n = 19), near-operational implementations for 22.5% (n = 9), and fully operational systems for 12.5% (n = 5), while 17.5% (n = 7) do not clearly report operational status. Validation practices reflect this diversity: comparison with observations is reported in 55.0% of studies (n = 22), train–test or train–validation–test splits in 35.0% (n = 14), and cross-validation in 12.5% (n = 5). In addition, 12.5% of studies (n = 5) report validation metrics drawn from prior work without new validation, and 7.5% (n = 3) do not address validation explicitly.

Overall, most forecasting approaches remain offline or semi-integrated, typically implemented as standalone workflows executed in batch mode within research computing environments. Fully integrated systems combining continuous data streams, automated model execution, and persistent monitoring are reported in only a small subset of studies, although several near-operational implementations indicate movement toward greater integration through sensor-driven pipelines and semi-automated processing frameworks.

3.5. Reported Limitations of the Reviewed Studies

Despite substantial methodological progress, the reviewed studies consistently report a range of limitations concerning data availability, validation robustness, system integration, and operational deployment. To support a clearer and more systematic analysis, the reported limitations were categorised into four overarching groups: data-related, validation-related, operational, and integration-related constraints. Each study could contribute to more than one category, reflecting the frequent co-occurrence of limitations across various aspects of model development and deployment.

The relative prevalence of these limitation categories is summarised in Table 7.

Among the identified categories, data-related limitations are the most prevalent, reported in 77.5% of studies (n = 31). These limitations encompass constraints associated with the availability, quality, resolution, continuity, and representativeness of both input and target datasets. Commonly reported issues include missing or irregular observations caused by cloud contamination, sensor downtime, fouling, or adverse weather conditions; limited temporal resolution that fails to capture rapid bloom dynamics; sparse spatial coverage or reliance on single-site case studies; dependence on long, high-quality historical records; and the absence of critical environmental drivers such as nutrients, salinity, hydrodynamics, or biological rate parameters.

Validation-related limitations concern how model performance is assessed and how transferable or reliable those assessments are across space, time, or conditions. These limitations include validation at a limited number of sites, strong class imbalance between bloom and non-bloom conditions, sensitivity of results to the choice of performance metrics, lack of uncertainty quantification, absence of cross-regional or out-of-sample testing, and reliance on qualitative or indirect validation approaches. Validation limitations are reported in 55.0% of studies (n = 22).

Operational limitations relate to the ability of models to function reliably in real-world forecasting or decision-support contexts. Reported issues include short effective forecast horizons, degradation of performance in multi-step predictions, sensitivity to sudden environmental changes, accumulation of errors over time, high computational or maintenance requirements, limited explainability of data-driven models, and difficulty transitioning from offline analysis to real-time or continuous systems. Operational limitations are identified in 52.5% of studies (n = 21).

Integration-related limitations refer to challenges in combining multiple data sources, models, or system components into coherent forecasting frameworks. These include weak coupling between physical and data-driven models, limited data assimilation, poor fusion of satellite, in situ, and numerical datasets, region-specific algorithms with limited transferability, and the absence of unified system architectures. Integration limitations are reported less frequently overall, appearing in 22.5% of studies (n = 9).

To further examine how these limitations vary across modelling paradigms, Table 8 presents the distribution of reported limitation categories by modelling approach. Data-related limitations are again the most prevalent category in every approach, affecting 78.3% of predictive/data-driven studies, 81.8% of hybrid studies, and 66.7% of simulation-based studies. Validation limitations are most frequently reported in predictive models (60.9%), followed by hybrid approaches (54.5%) and simulation-based models (33.3%). Operational limitations are particularly prominent in simulation-based (66.7%) and hybrid approaches (63.6%), compared to 43.5% of predictive models. Integration-related limitations show the strongest contrast, occurring in 45.5% of hybrid studies, but in only 13.0% of predictive and 16.7% of simulation-based studies.

Overall, the reported limitations indicate that constraints related to data availability and quality, validation practices, and operational readiness are pervasive across the reviewed literature, whereas integration challenges are less frequently reported and are most strongly associated with hybrid modeling frameworks. Collectively, the results reveal consistent patterns in modeling approaches, data usage, operational context, and reported limitations. Section 4 discusses these patterns in relation to current research trends and methodological challenges in marine forecasting.

3.6. Study Quality Assessment

Based on the methodological quality rubric, the overall quality of the included studies was predominantly moderate. Of the 40 studies, 3 (7.5%) were classified as high quality (scores 8–10), 22 (55.0%) as moderate quality (scores 5–7), and 15 (37.5%) as low quality (scores 0–4). When stratified by primary modeling paradigm (Table 3), predictive/data-driven studies had the highest mean quality score (mean = 5.33, n = 18), followed by hybrid studies (mean = 4.57, n = 21). Only one simulation-only study was included (n = 1); therefore, its methodological quality is reported descriptively rather than summarized as a mean, and it received a total quality score of 7 (out of 10). The hybrid category includes both primary hybrid implementations and review or synthesis papers captured by the search strategy; to facilitate interpretation of methodological quality among primary modeling studies, a sensitivity analysis excluding review and synthesis articles was conducted.

In this sensitivity subset (n = 33), 3 studies (9.1%) were classified as high quality (8–10), 22 (66.7%) as moderate quality (5–7), and 8 (24.2%) as low quality (0–4). These results indicate that the predominance of moderate-quality evidence persists when focusing on implemented forecasting studies, while the proportion of low-quality scores decreases after excluding review and synthesis papers.

To assess whether the observed paradigm distribution is sensitive to study quality, the modeling-paradigm summary was repeated considering only moderate- to high-quality studies (scores 5–10). In this subset (n = 25), predictive/data-driven studies accounted for 48.0% (12/25), hybrid studies for 48.0% (12/25), and simulation-only studies for 4.0% (1/25), compared with 45.0% (18/40), 52.5% (21/40), and 2.5% (1/40) in the full set. The relative ranking of predictive and hybrid paradigms therefore shifts slightly under quality stratification, whereas their joint predominance over simulation-only studies does not.
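The banding used in this section can be expressed compactly. The following Python sketch is illustrative only: the function names are hypothetical, and the example scores are placeholders chosen solely so that the band counts match those reported above (3, 22, and 15 of 40); they are not the scores of the reviewed studies.

```python
from collections import Counter


def quality_band(score: int) -> str:
    """Map a 0-10 rubric score to the bands used in the text."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score >= 8:
        return "high"       # scores 8-10
    if score >= 5:
        return "moderate"   # scores 5-7
    return "low"            # scores 0-4


def band_shares(scores):
    """Return the percentage of studies in each band, rounded to one decimal."""
    counts = Counter(quality_band(s) for s in scores)
    total = len(scores)
    return {band: round(100 * counts[band] / total, 1)
            for band in ("high", "moderate", "low")}


# Placeholder scores (assumption): only the band counts 3 / 22 / 15 come from the text.
example_scores = [9] * 3 + [6] * 22 + [3] * 15
shares = band_shares(example_scores)
print(shares)
```

Applying the same function to a filtered list (for example, scores of 5 or above) reproduces the kind of sensitivity subset described in the preceding paragraph.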

4. Discussion

Section 4.1, Section 4.2 and Section 4.3 discuss the findings in relation to the stated research questions, with Section 4.1 focusing on RQ1 (current use of modelling approaches and forecasting practices) and Section 4.2 and Section 4.3 focusing on RQ2 (reported integration, validation, and operational limitations and their implications for deployment).

4.1. Modeling Paradigms and Forecasting Practices

In relation to RQ1, the results of this review indicate a clear dominance of predictive and hybrid modelling approaches in contemporary marine forecasting studies, alongside a comparatively limited use of purely simulation-based models. This distribution reflects broader methodological developments in environmental modelling, driven by the increasing availability of large environmental datasets and the growing capacity of machine learning methods to capture complex, nonlinear system behaviour. Machine learning models have demonstrated impressive performance in forecasting harmful algal blooms and related water quality indicators because they are well suited to learning relationships directly from data without requiring explicit representation of all underlying physical or biogeochemical processes [11,12,26]. This capability is particularly valuable in aquatic ecosystems, where algal dynamics are governed by interacting climatic, hydrological, biological, and chemical drivers that are difficult to parameterize comprehensively.

Advances in satellite remote sensing, sensor networks, and Internet of Things technologies have further accelerated the adoption of data-driven approaches by enabling access to high-frequency, high-volume environmental observations [11]. At the same time, the results suggest that single-model approaches often face limitations when data are sparse, irregular, or temporally coarse, particularly in environments characterized by rapid ecological change. In response to these challenges, hybrid modelling frameworks have emerged as a dominant paradigm. By combining complementary techniques such as machine learning with physical or statistical models, hybrid approaches can leverage both data-driven pattern recognition and domain knowledge, improving predictive performance and robustness [3,26]. The prominence of hybrid models in the reviewed literature therefore reflects a pragmatic effort to balance model flexibility with physical consistency in complex marine systems.

The strong focus on coastal environments observed in this review is similarly grounded in both practical and societal considerations. Coastal and estuarine systems are regions where harmful algal blooms exert the most immediate and severe impacts on human health, fisheries, aquaculture, and tourism, making them a priority for monitoring and forecasting efforts [27]. These environments are also subject to more extensive long-term observation programs, which provide the data required to support predictive modelling [26]. In addition, the physical and biogeochemical complexity of coastal zones, driven by strong gradients, freshwater inputs, and human pressures, presents both a challenge and an opportunity for model development, encouraging focused methodological research in these settings [17,28].

The predominance of short-term forecasting horizons further reflects the dynamic and nonlinear nature of marine ecosystems. As demonstrated across the reviewed studies, forecast accuracy typically declines as prediction horizons extend, due to error accumulation, data noise, and sensitivity to rapidly changing environmental conditions [10,29]. Short-term forecasts, ranging from hours to several days, are therefore favoured because they are better aligned with the temporal resolution of available data and are most relevant for early warning and management responses [3]. While longer-term and seasonal forecasts are valuable for strategic planning, their reliability in complex coastal systems remains constrained by uncertainty in both data and model structure [10].
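The growth of error with forecast horizon can be illustrated with a deliberately simple case. The sketch below assumes a first-order autoregressive process, x[t+1] = phi * x[t] + noise, which is an assumption made here for illustration and not a model taken from the reviewed studies; the parameter values are likewise hypothetical. Under this assumption, the h-step-ahead error variance is the noise variance multiplied by 1 + phi^2 + ... + phi^(2(h-1)).

```python
import math


def ar1_forecast_rmse(phi: float, sigma: float, horizon: int) -> float:
    """Theoretical h-step-ahead forecast RMSE for x[t+1] = phi * x[t] + noise.

    sigma is the standard deviation of the one-step noise term.
    """
    if not -1 < phi < 1:
        raise ValueError("phi must satisfy |phi| < 1 for a stationary process")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # Each additional step adds a damped copy of the one-step noise variance.
    variance = sigma ** 2 * sum(phi ** (2 * k) for k in range(horizon))
    return math.sqrt(variance)


# Hypothetical parameters: strong persistence (phi = 0.9), unit noise.
rmse_by_horizon = [ar1_forecast_rmse(phi=0.9, sigma=1.0, horizon=h)
                   for h in range(1, 8)]
for h, value in enumerate(rmse_by_horizon, start=1):
    print(f"lead {h}: RMSE = {value:.3f}")
```

Even in this idealized setting, with a known model and no data gaps, the expected error rises monotonically with lead time and approaches the climatological spread of the process, which is consistent with the preference for short horizons noted above.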

Finally, the emphasis on harmful algal blooms and chlorophyll-a as primary forecast targets reflects their vital importance in marine environmental management. Harmful algal blooms pose significant ecological, economic, and public health risks through toxin production, hypoxia, and ecosystem disruption, resulting in substantial global economic losses [1,11,30]. Chlorophyll-a is widely used as a proxy for phytoplankton biomass and serves as a key observable variable in both in situ and satellite-based monitoring systems [26]. Its relative accessibility and strong association with bloom dynamics make it a practical and informative target for predictive modelling, particularly in data-driven and hybrid frameworks. Consequently, the dominance of harmful algal bloom and chlorophyll-a forecasting in the literature reflects both their environmental significance and their suitability for current modelling capabilities.

The low prevalence of simulation-only studies in this review (2.5%) should be interpreted as a property of the forecasting-oriented corpus captured by our search strategy and inclusion criteria, rather than as evidence that process-based oceanographic modelling is inactive. In the broader modelling community, hydrodynamic and coupled biogeochemical models remain widely used for operational nowcasting and scenario analysis, and many are disseminated through agency reports, operational portals, or literature that does not foreground ‘forecasting’ terminology in titles and keywords. In addition, some studies that rely on numerical model outputs as drivers or constraints were classified here as hybrid, which shifts purely process-based approaches out of the ‘simulation-only’ category. We therefore interpret the 2.5% figure as underrepresentation within this specific review scope, and we avoid framing it as a definitive decline of simulation-based modelling.

4.2. Data, Validation, and Operational Constraints

Addressing RQ2, the results of this review reveal that limitations related to data availability, validation practices, and operational readiness are not evenly distributed, but instead cluster in systematic ways across modelling paradigms. Data-related limitations are the most pervasive, reported in 77.5% of the reviewed studies, followed by validation limitations in 55.0% and operational constraints in 52.5%. These patterns reflect structural challenges inherent to marine and aquatic forecasting rather than shortcomings of individual modelling approaches.

The predominance of data-related limitations is largely driven by persistent constraints in data availability, quality, and resolution. Many forecasting studies rely on observational datasets that lack sufficient temporal frequency to capture rapid ecological dynamics, particularly in systems where algal bloom development occurs over short time scales. In situ monitoring programs often operate at monthly or biweekly sampling intervals, which limits the ability of models to detect anomalies or respond to sudden environmental changes [10,17]. Spatial coverage is similarly constrained, with many studies based on single-site or limited station networks, reducing generalizability across regions and ecosystem types. In addition, critical environmental drivers such as nutrient loading, salinity gradients, hydrodynamic forcing, and atmospheric deposition are frequently unavailable or inconsistently measured, further limiting model completeness [10,17].

Geographic concentration of studies also contributes to data-related constraints. A substantial proportion of the reviewed literature focuses on coastal systems in the Northern Hemisphere, leaving tropical regions and other aquatic environments comparatively underrepresented [10]. Even when satellite remote sensing data are available, technical limitations such as coarse spatial resolution, cloud contamination, and atmospheric correction uncertainties restrict their applicability, particularly in optically complex or small water bodies [10]. Together, these factors explain why data limitations consistently emerge as the dominant constraint across predictive, hybrid, and simulation-based modelling paradigms.

Beyond data availability, several limitations operate through identifiable mechanisms that directly affect forecast skill and deployment. Temporal sparsity and irregular sampling reduce the ability to learn precursors to rapid bloom onset, which typically degrades multi-step forecasts through error accumulation. Spatial sparsity increases site dependence, which can inflate within-site validation while reducing transferability to new locations. Satellite and in situ data also differ in how error enters models: satellite products are vulnerable to cloud-driven missingness and retrieval uncertainty, while in situ records are often sparse and operationally discontinuous. In addition, the mismatch in spatial and temporal resolution between point measurements, gridded satellite pixels, and model fields introduces representativeness error, which can lower achievable accuracy even when modelling choices are appropriate. Finally, the absence of biogeochemical and hydrodynamic drivers increases structural uncertainty by forcing models to rely on proxies, which can appear successful under restricted validation but fail under regime shifts, limiting operational robustness.

Validation limitations, reported in over half of the reviewed studies, reflect the growing gap between increasing model complexity and the rigor of evaluation practices. Advanced machine learning and hybrid architectures are designed to capture complex spatial and temporal patterns, yet their validation is often constrained by limited datasets and simplified evaluation strategies. Many studies rely on comparisons with observational data from a small number of sites or time periods, which may not adequately represent the full range of environmental variability. Class imbalance between bloom and non-bloom conditions further complicates validation, leading to performance metrics that can overstate predictive skill [3]. As a result, models that perform well during training may exhibit substantial degradation when evaluated on unseen or extreme conditions [12].
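The effect of class imbalance on reported skill can be made concrete with a small worked example. The Python sketch below uses an entirely hypothetical record of 100 days containing 5 bloom days; the labels, the two sets of predictions, and the function names are assumptions introduced for illustration and do not come from any reviewed study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = bloom."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Accuracy, precision, and recall; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical record: 5 bloom days followed by 95 non-bloom days.
y_true = [1] * 5 + [0] * 95

# Baseline that never predicts a bloom.
never_bloom = [0] * 100

# Detector with 3 hits, 2 misses, and 4 false alarms.
detector = [1, 1, 1, 0, 0] + [1] * 4 + [0] * 91

baseline_scores = summarize(y_true, never_bloom)
detector_scores = summarize(y_true, detector)
print("never-bloom baseline:", baseline_scores)
print("detector:            ", detector_scores)
```

In this constructed case the trivial baseline attains higher accuracy than the detector while detecting no blooms at all, which is why accuracy alone can overstate predictive skill when bloom events are rare.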

The reviewed literature also indicates that validation approaches frequently emphasize point estimates of accuracy rather than uncertainty-aware evaluation. Limited use of cross-validation, out-of-sample testing, or probabilistic performance metrics constrains the assessment of model robustness and transferability. In highly dynamic coastal environments, where sudden environmental shifts and nonlinear interactions are common, even sophisticated hybrid models such as convolutional and recurrent neural network combinations may fail to generalize when critical drivers are missing or poorly represented [3,12]. This mismatch between model sophistication and validation rigor helps explain why validation limitations persist despite methodological advances.

Across the reviewed studies, validation strategies can be grouped into a small number of recurring types: random train–test splits, temporal holdout testing using withheld periods or years, cross-validation (often not time-aware), and observation-based comparisons without a clearly separated forecasting test period. From an operational perspective, temporal holdouts and, where feasible, spatial holdouts provide stronger evidence of real-world forecasting performance than random splits in autocorrelated environmental time series, because they better test generalization under changing conditions.
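The difference between a random split and a temporal holdout can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (a day index paired with a value), the synthetic series, and the function names are hypothetical, and no reviewed study is claimed to use this exact procedure.

```python
import random


def random_split(records, test_fraction, seed=0):
    """Shuffle records and split them, ignoring time order."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def temporal_holdout(records, test_fraction):
    """Withhold the most recent portion of the record as the test period."""
    ordered = sorted(records, key=lambda r: r[0])  # sort by time index
    n_test = int(round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Synthetic daily series (assumption): (day_index, chlorophyll-a value).
records = [(day, 1.0 + 0.01 * day) for day in range(100)]

train, test = temporal_holdout(records, test_fraction=0.2)
last_train_day = max(day for day, _ in train)
first_test_day = min(day for day, _ in test)
print(f"temporal holdout: train ends at day {last_train_day}, "
      f"test starts at day {first_test_day}")

rand_train, rand_test = random_split(records, test_fraction=0.2)
print(f"random split: {len(rand_train)} training and {len(rand_test)} test days, "
      "interleaved in time")
```

With the temporal holdout, every test day lies strictly after the training period, so the evaluation resembles a forecast. With a random split in an autocorrelated series, test days are typically flanked by training days, which allows a model to interpolate between neighbours and can make skill appear higher than it would be in operation.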

Performance metrics also influence how model skill is interpreted. RMSE is sensitive to large errors and can be dominated by extremes, while MAE is less sensitive to outliers and can make performance appear more stable across events. R-squared can be inflated in stable regimes and can decrease sharply when variance is low or when models fail under regime shifts. For bloom warning tasks where events are rare, event-focused evaluation (for example precision–recall oriented reporting) is often more informative than aggregate error metrics alone, and uncertainty-aware reporting is important when outputs are used for risk-based operational decisions.
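The contrast between RMSE and MAE can be shown with a small numerical example. The observations and the two sets of predictions below are hypothetical values constructed for illustration, and the helper functions implement the standard textbook definitions of the three metrics.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination relative to the observed mean."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


# Hypothetical chlorophyll-a record: four background values and one bloom peak.
obs = [2.0, 2.0, 2.0, 2.0, 12.0]

# Model A: off by 1 unit everywhere, including at the peak.
pred_small_errors = [3.0, 1.0, 3.0, 1.0, 11.0]

# Model B: exact on background days but misses the peak by 5 units.
pred_missed_peak = [2.0, 2.0, 2.0, 2.0, 7.0]

for name, pred in [("A (small errors)", pred_small_errors),
                   ("B (missed peak)", pred_missed_peak)]:
    print(f"{name}: MAE = {mae(obs, pred):.3f}, "
          f"RMSE = {rmse(obs, pred):.3f}, R2 = {r_squared(obs, pred):.4f}")
```

Both models have the same MAE of 1.0, yet the model that misses the bloom peak has an RMSE of about 2.24 and a markedly lower R². Which model is preferable depends on whether the peak is the quantity of operational interest, which is the reason event-focused reporting is recommended for warning tasks.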

Integration-related limitations, although less frequently reported overall, are most pronounced in hybrid modelling frameworks. Hybrid models seek to combine complementary strengths of physical, statistical, and machine learning approaches, yet effective integration requires both architectural coherence and comprehensive data inputs. In practice, many hybrid implementations apply component models sequentially rather than jointly, limiting their ability to capture feedback and interactions across system components [26]. Integration challenges are further exacerbated when key environmental variables such as dissolved oxygen, nutrient fluxes, or water column stratification are unavailable, reducing the effectiveness of coupled modelling strategies [12]. These constraints help explain why integration limitations are more strongly associated with hybrid approaches than with purely predictive or simulation-based models.

Operational limitations reflect the cumulative impact of data and validation constraints on real-world deployment. Although many studies demonstrate promising predictive performance in controlled or retrospective settings, the transition to fully operational forecasting systems remains limited. Harmful algal bloom dynamics are influenced by interacting atmospheric, oceanographic, and biogeochemical processes that are difficult to observe and parameterize consistently, particularly under changing climate conditions [1,27]. Satellite observations are affected by discontinuous temporal coverage and environmental interference, while in situ monitoring systems often lack the spatial density required for regional-scale forecasting [1].

Moreover, both statistical and process-based models face fundamental challenges when applied beyond the conditions represented in historical datasets. Statistical models lose reliability as forcing conditions diverge from past observations, while process-based models require extensive calibration and rely on biological processes that are often poorly defined or uncertain [27]. These challenges help explain why most forecasting systems remain experimental or near-operational, with few studies reporting continuous, end-user-oriented deployment.

Taken together, the clustering of data, validation, integration, and operational limitations reflects the inherent complexity of marine forecasting rather than a lack of methodological innovation. While advances in machine learning and hybrid modelling have expanded predictive capabilities, persistent constraints in data availability, evaluation practices, and system integration continue to shape the operational maturity of forecasting approaches. These findings underscore the need to align model development more closely with data infrastructure, validation rigor, and deployment requirements to support reliable and scalable marine forecasting systems.

4.3. Implications for Future Marine Forecasting Systems

Building on the findings related to RQ1 and RQ2, the strong reliance on historical and satellite data, limited real-time integration, short forecasting horizons, and weak system-level integration observed across the reviewed studies indicate that advances in modelling methodology must be accompanied by parallel progress in data infrastructure and system design.

One key implication concerns the need for next-generation monitoring and forecasting systems that support continuous, high-frequency data acquisition. Advances in sensor technology, including rapid toxin biosensors and emerging hyperspectral satellite missions such as PACE, offer new opportunities to improve phytoplankton discrimination and bloom detection [10,11]. These developments can enhance the volume and quality of observational data available for training and validating machine learning and deep learning models, which increasingly rely on dense and diverse data streams [31]. The integration of such sensors within real-time monitoring frameworks is essential for supporting early warning capabilities and improving short-term forecast reliability.

Beyond individual sensors, the results point to the importance of integrated, multi-scale observation systems. Combining satellite observations with data from smart buoys, autonomous platforms, and in situ monitoring networks can provide a more comprehensive representation of bloom dynamics across spatial and temporal scales [10]. Several studies also emphasize the potential value of incorporating molecular ecology and omics-based measurements to improve biological parameterization within numerical and hybrid models, particularly for representing species-specific behaviors and life-cycle processes [31]. These integrated data architectures are a prerequisite for moving from standalone analytical workflows toward cohesive forecasting systems.

The limited number of fully operational models identified in this review further underscores the need to address barriers to operational readiness. High computational demands remain a major constraint for fine-resolution hydrodynamic and coupled models, often requiring access to high-performance computing resources that limit continuous deployment [32]. In addition, shortcomings in biological model components, including simplified representations of cell mortality and bloom termination processes, reduce the realism of long-term simulations [32]. Improvements in physical process representation, such as wave-wind interactions and vertical mixing in stratified coastal waters, are also necessary to enhance short-term forecast performance under rapidly changing conditions.

From a methodological perspective, the reviewed literature indicates an increasing emphasis on integrated modelling paradigms that combine data-driven and process-based methods. Statistical models are increasingly recognized as limited for long-term projections, particularly when environmental conditions deviate from the historical record [27]. Process-based models, while more demanding in terms of data and calibration, offer greater potential for extrapolation under climate change scenarios by explicitly representing physical and biological mechanisms [27]. At the same time, advances in deep learning architectures, including transformer-based and foundation models, provide new opportunities to capture complex temporal dependencies and nonlinear interactions when sufficient data are available [31].

The findings also point to the importance of embracing ensemble and uncertainty-aware modelling strategies. Ensemble approaches can help quantify uncertainty arising from data limitations, model structure, and external forcing, thereby improving confidence in forecast outputs and supporting decision-making under uncertainty [27]. Feature attribution techniques and interpretable machine learning methods further offer pathways to enhance transparency and support the integration of predictive models into management contexts.
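One simple way in which an ensemble conveys uncertainty is through the spread of its members at each lead time. The Python sketch below is illustrative only: the four member forecasts are hypothetical values, the function name is an assumption, and the sample standard deviation is used as one possible spread measure among several.

```python
import statistics


def ensemble_summary(member_forecasts):
    """Summarize an ensemble per lead time.

    member_forecasts is a list of equal-length forecast sequences, one per
    ensemble member. Returns one dict per lead time with the ensemble mean,
    the sample standard deviation (spread), and the member range.
    """
    summary = []
    for values in zip(*member_forecasts):  # iterate over lead times
        summary.append({
            "mean": statistics.mean(values),
            "spread": statistics.stdev(values),
            "min": min(values),
            "max": max(values),
        })
    return summary


# Hypothetical 3-day chlorophyll-a forecasts from four ensemble members,
# all started from the same observed value on day 1.
members = [
    [2.0, 2.5, 3.0],
    [2.0, 2.7, 3.8],
    [2.0, 2.3, 2.6],
    [2.0, 2.5, 4.6],
]

summary = ensemble_summary(members)
for lead, stats in enumerate(summary, start=1):
    print(f"day {lead}: mean = {stats['mean']:.2f}, "
          f"spread = {stats['spread']:.2f}, "
          f"range = [{stats['min']:.1f}, {stats['max']:.1f}]")
```

Reporting the spread or range alongside the mean lets a decision-maker see that, in this constructed example, the members agree on day 1 and diverge by day 3, information that a single deterministic forecast would not convey.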

Overall, the implications of this review suggest that future progress in marine forecasting will depend less on isolated algorithmic advances and more on the coordinated development of data-rich, integrated, and operationally oriented systems. Aligning model design with observational capacity, validation rigor, and deployment requirements will be critical for translating methodological innovation into reliable forecasting tools capable of supporting effective marine environmental management.

5. Conclusions

This systematic review examined the current state of simulation-based and predictive environmental modelling approaches used for marine forecasting of water quality and harmful algal bloom phenomena. By synthesizing forty peer-reviewed studies published between 2015 and 2025, the review provides a structured assessment of how contemporary forecasting models are designed, validated, and deployed, with particular attention to forecast targets, data inputs, spatial and temporal scope, and reported limitations. In doing so, it addresses a critical gap in the literature by focusing explicitly on forecasting-oriented modelling and the barriers that constrain real-time and autonomous operational use.

The findings indicate a predominance of predictive and hybrid approaches in the reviewed forecasting-oriented literature, driven by advances in machine learning, increased data availability, and the need to capture complex nonlinear dynamics in marine ecosystems. Forecasting efforts are predominantly concentrated on harmful algal blooms and chlorophyll-a, reflecting their ecological significance and relevance to environmental management. These models are most often applied at coastal and local scales and are designed for short-term forecasting horizons, aligning with early warning and monitoring use cases rather than long-term projection. Despite methodological sophistication, purely simulation-based approaches are comparatively rare, and fully operational forecasting systems remain limited.

Across modelling paradigms, persistent constraints related to data availability and quality, validation rigor, system integration, and operational readiness continue to shape the maturity of marine forecasting research. Data limitations remain the most pervasive challenge (77.5% of studies), followed by validation (55.0%) and operational constraints (52.5%), while integration challenges are reported less frequently (22.5%) but are most pronounced in hybrid frameworks. Together, these limitations highlight a recurring gap between model development and deployment, where advances in algorithmic complexity are not always matched by corresponding progress in data infrastructure, evaluation practices, and system-level integration.

The implications of this review underscore the need for a more holistic approach to marine forecasting system development. Progress toward reliable and scalable forecasting will depend not only on methodological innovation, but also on coordinated investments in high-frequency monitoring, multi-source data integration, uncertainty-aware validation, and operational system design. Aligning predictive models with observational capacity and decision-making contexts is essential if forecasting tools are to move beyond experimental applications and support real-world environmental management, particularly under changing climate conditions [27].

This review is subject to certain limitations. The analysis is restricted to peer-reviewed journal articles published in English and indexed in selected academic databases, which may exclude relevant gray literature or operational systems documented outside traditional scholarly outlets. In addition, the synthesis relies on author-reported limitations and evaluation practices, which may vary in depth and transparency across studies. Nevertheless, by consolidating dispersed evidence and systematically characterizing recurring challenges, this review provides a clear and comprehensive foundation for advancing marine forecasting research.

Ultimately, the transition from sophisticated models to dependable forecasting systems will be defined not by any single algorithm, but by the ability to integrate data, models, and operations into coherent, adaptive frameworks that anticipate change rather than react to it, and that transform observation into foresight in service of resilient marine environments.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jmse14050493/s1, Table S1: PRISMA 2020 Checklist.

Author Contributions

Conceptualization, A.S. and A.K.; methodology, A.S. and A.K.; validation, A.S. and A.K.; formal analysis, A.S.; investigation, A.S.; resources, A.K.; data curation, A.S.; writing—original draft preparation, A.S.; writing—review and editing, A.S. and A.K.; visualization, A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
ANN	Artificial Neural Network
ARIMA	Autoregressive Integrated Moving Average
ASL	Advanced Search List
BMJ	British Medical Journal
Chl-a	Chlorophyll-a
CNN	Convolutional Neural Network
DL	Deep Learning
GBDT	Gradient Boosting Decision Trees
HABs	Harmful Algal Blooms
HPC	High Performance Computing
JMSE	Journal of Marine Science and Engineering
KS	Kolmogorov–Smirnov test
LSTM	Long Short-Term Memory
MAE	Mean Absolute Error
MARS	Marine Autonomous Risk System
MAPE	Mean Absolute Percentage Error
MCC	Matthews Correlation Coefficient
MSE	Mean Squared Error
NIR	Near-Infrared
NRMSE	Normalized Root Mean Square Error
OOB	Out of Bag
PACE	Plankton, Aerosol, Cloud, ocean Ecosystem mission
PR	Precision Recall
PR-AUC	Area Under the Precision Recall Curve
PRISMA	Preferred Reporting Items for Systematic Reviews and Meta-Analyses
R²	Coefficient of Determination
RF	Random Forest
RMSE	Root Mean Square Error
ROC-AUC	Area Under the Receiver Operating Characteristic Curve
SARIMA	Seasonal Autoregressive Integrated Moving Average
SLR	Systematic Literature Review
SVM	Support Vector Machine
SVR	Support Vector Regression
VIIRS	Visible Infrared Imaging Radiometer Suite
XGBoost	Extreme Gradient Boosting

References

1. Wen, J.; Yang, J.; Li, Y.; Gao, L. Harmful algal bloom warning based on machine learning in maritime site monitoring. Knowl.-Based Syst. 2022, 245, 108569.
2. Yan, Z.; Kamanmalek, S.; Alamdari, N. Predicting coastal harmful algal blooms using integrated data-driven analysis of environmental factors. Sci. Total Environ. 2024, 912, 169253.
3. Kalhoro, M.A.; Chinta, V.; Tahir, M.; Sanaullah, S.; Baloch, A.; Mehmood, T.; Bashir, S.; Liang, Z.; Song, J. Machine learning-based prediction and forecasting of chlorophyll-a in the northern Indian Ocean using satellite data. Ecol. Inform. 2025, 92, 103482.
4. Guan, W.; Bao, M.; Lou, X.; Zhou, Z.; Yin, K. Monitoring, modeling and projection of harmful algal blooms in China. Harmful Algae 2022, 111, 102164.
5. Zahir, M.; Su, Y.; Shahzad, M.I.; Ayub, G.; Rahman, S.U.; Ijaz, J. A review on monitoring, forecasting, and early warning of harmful algal bloom. Aquaculture 2024, 593, 741351.
6. He, X.; Shi, S.; Geng, X.; Xu, L.; Zhang, X. Spatial-temporal attention network for multistep-ahead forecasting of chlorophyll. Appl. Intell. 2021, 51, 4381–4393.
7. Kruk, M. Prediction of environmental factors responsible for chlorophyll-a-induced hypereutrophy using explainable machine learning. Ecol. Inform. 2023, 75, 102005.
8. Karbassi, A.; Abdollahzadeh, E.M.; Attaran-Fariman, G.; Nazariha, M.; Mazaheri-Assadi, M. Predicting the distribution of harmful algal bloom (HAB) in the coastal area of Oman Sea. Nat. Environ. Pollut. Technol. 2017, 16, 753–764.
9. Yu, P.; Gao, R.; Zhang, D.; Liu, Z.-P. Predicting coastal algal blooms with environmental factors by machine learning methods. Ecol. Indic. 2021, 123, 107334.
10. Caballero, C.B.; Martins, V.S.; Paulino, R.S.; Butler, E.; Sparks, E.; Lima, T.M.; Novo, E.M.L.M. The need for advancing algal bloom forecasting using remote sensing and modeling: Progress and future directions. Ecol. Indic. 2025, 172, 113244.
11. Busari, I.; Sahoo, D.; Harmel, R.D.; Haggard, B.E. A review of machine learning models for harmful algal bloom monitoring in freshwater systems. J. Nat. Resour. Agric. Ecosyst. 2023, 1, 63–76.
12. Ding, W.; Li, C. Algal blooms forecasting with hybrid deep learning models from satellite data in the Zhoushan fishery. Ecol. Inform. 2024, 82, 102664.
13. Ye, W.; Zhang, F.; Du, Z. Machine learning in extreme value analysis: An approach to detecting harmful algal blooms with long-term multisource satellite data. Remote Sens. 2022, 14, 3918.
14. Chang, W.; Li, X.; Chaudhary, V.; Dong, H.; Zhao, Z.; Nguyen, T.G. Prediction of chlorophyll-a data based on triple-stage attention recurrent neural network. IET Commun. 2025, 19, e12542.
15. Zeng, C.; Xu, H.; Fischer, A.M. Chlorophyll-a estimation around the Antarctica Peninsula using satellite algorithms: Hints from field water leaving reflectance. Sensors 2016, 16, 2075.
16. Yu, X.; Shen, J.; Zheng, G.; Du, J. Chlorophyll-a in Chesapeake Bay based on VIIRS satellite data: Spatiotemporal variability and prediction with machine learning. Ocean Model. 2022, 180, 102119.
17. Yu, X.; Shen, J. A data-driven approach to simulate the spatiotemporal variations of chlorophyll-a in Chesapeake Bay. Ocean Model. 2021, 159, 101748.
18. Molares-Ulloa, A.; Rocruz, E.; Rivero, D.; Padín, X.A.; Nolasco, R.; Dubert, J.; Fernandez-Blanco, E. Towards improved harmful algal bloom forecasts: A comparison of symbolic regression with DoME and stream learning performance. Comput. Electron. Agric. 2025, 233, 110112.
19. Rostam, N.A.P.; Ahamed Hassain Malim, N.H.A.H.; Abdullah, R.; Ahmad, A.L.; Ooi, B.S.; Derek, D.J.C. A complete proposed framework for coastal water quality monitoring system with algae predictive model. IEEE Access 2021, 9, 108249–108265.
20. Ajmal, T.; Mohammed, F.; Goodchild, M.S.; Sudarsanan, J.; Halse, S. Mitigating the impact of harmful algal blooms on aquaculture using technological interventions: Case study on a South African farm. Sustainability 2024, 16, 3650.
21. Shahmiri, A.; Seyed-Djawadi, M.H.; Siadatmousavi, S.M. AI-driven forecasting of harmful algal blooms in Persian Gulf and Gulf of Oman using remote sensing. Environ. Model. Softw. 2025, 185, 106311.
22. Xie, M.; Li, Y.; Liu, Z.; Gou, T. Prediction of red tide outbreaks using time-series hyperspectral observations: Implications on the optimal prediction model and spectral index. Acta Oceanol. Sin. 2025, 44, 177–186.
23. Kitchenham, B.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; EBSE Technical Report EBSE-2007-01; Keele University: Staffordshire, UK; University of Durham: Durham, UK, 2007.
24. Xiao, Y.; Watson, M. Guidance on conducting a systematic literature review. J. Plan. Educ. Res. 2019, 39, 93–112.
25. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
26. Yan, Z.; Alamdari, N. Integrating temporal decomposition and data-driven approaches for predicting coastal harmful algal blooms. J. Environ. Manag. 2024, 364, 121463.
27. Ralston, D.K.; Moore, S.K. Modeling harmful algal blooms in a changing climate. Harmful Algae 2020, 91, 101729.
28. Pinto, L.; Mateus, M.; Silva, A. Modeling the transport pathways of harmful algal blooms in the Iberian coast. Harmful Algae 2016, 53, 8–16.
29. Sun, X.; Yan, D.; Wu, S.; Chen, Y.; Qi, J.; Du, Z. Enhanced forecasting of chlorophyll-a concentration in coastal waters through integration of Fourier analysis and Transformer networks. Water Res. 2024, 263, 122160.
30. Szewczyk, T.M.; Aleynik, D.; Davidson, K. Ensemble models improve near-term forecasts of harmful algal bloom and biotoxin risk. Harmful Algae 2025, 142, 102781.
31. Wang, Y.; Xu, C.; Lin, Q.; Xiao, W.; Huang, B.; Lu, W.; Chen, N.; Chen, J. Modeling of algal blooms: Advances, applications and prospects. Ocean Coast. Manag. 2024, 255, 107250.
32. Aleynik, D.; Dale, A.C.; Porter, M.; Davidson, K. A high resolution hydrodynamic model system suitable for novel harmful algal bloom modelling in areas of complex coastline and topography. Harmful Algae 2016, 53, 102–117.

Figure 1. PRISMA Diagram (by the Author). Template from [25]. * This includes records removed for language (129), year (2805), and peer-reviewed criteria (3564). ** This includes records excluded based on relevance to research objectives.

Figure 2. Publication Year Distribution (by the Author).

Figure 3. Geographic Heatmap of Study Locations (by the Author).

Figure 4. Distribution of marine environment types across study regions (by the Author).

Figure 5. Distribution of forecast horizons across the included studies (by the Author).

Table 1. Inclusion and exclusion criteria applied in the systematic review (by the Author). The inclusion and exclusion criteria were applied consistently throughout the screening process to ensure objectivity and methodological rigor.

Inclusion Criteria	Exclusion Criteria
Peer-reviewed journal articles indexed in Scopus, ProQuest, or EBSCOhost	Non-peer-reviewed sources (e.g., conference abstracts, theses, reports)
Direct relevance to simulation-based or predictive environmental modeling for marine forecasting	Publications released prior to 2015
Focus on water quality and/or harmful algal bloom phenomena	Studies written in languages other than English
Published between 2015 and 2025	Articles deemed irrelevant to the defined research questions
Written in English

Table 2. Distribution of marine environment types across the included studies (by the Author).

Marine Environment Type	Percentage (%) of Studies
Coastal	55.0
Coastal–Estuary	15.0
Coastal aquaculture	12.5
Estuary	7.5
Freshwater	5.0
Open Sea	5.0

Table 3. Distribution of modeling approach types across the included studies (by the Author).

Modeling Approach Type	Percentage (%) of Studies
Hybrid approaches	52.5
Predictive/data-driven models	45.0
Simulation-based models	2.5

Table 4. Prevalence of model categories across the included studies (by the Author).

Model Category	Percentage (%) of Studies
Predictive/data-driven models	57.5
Hybrid approaches	27.5
Simulation-based models	15.0

Table 5. Primary forecast targets across the included studies (by the Author).

Primary Forecast Target Category	Percentage (%) of Studies
HABs only	35.00
Chlorophyll-a only	25.00
HABs + Chlorophyll-a/biomass	30.00
Other (water quality, toxicity, transport)	10.00

Table 6. Input Variables Used in the Reviewed Studies (by the Author).

Input Variable Category	Percentage (%) of Studies
Chlorophyll-a/optical proxies	80.00
Temperature (water or air)	65.00
Nutrients (N, P, inorganic salts)	55.00
Wind/meteorological forcing	52.50
Salinity	40.00
Currents/hydrodynamics	35.00
Remote sensing spectral bands/indices	32.50
Dissolved oxygen/water chemistry	30.00
Turbidity/Secchi depth/light	27.50
River discharge/freshwater flux	22.50
Toxins/species-specific cell counts	17.50
Land use/anthropogenic drivers	7.50

Table 7. Frequency of reported limitation categories across the reviewed studies (by the Author).

Limitation Category	Percentage (%) of Studies
Data	77.5
Validation	55.0
Operational	52.5
Integration	22.5

Table 8. Distribution of reported limitation categories by modeling approach (by the Author).

Limitation Category	Predictive (%)	Hybrid (%)	Simulation (%)
Data	78.3	81.8	66.7
Validation	60.9	54.5	33.3
Operational	43.5	63.6	66.7
Integration	13.0	45.5	16.7
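The percentages in Tables 4, 7, and 8 can be cross-checked against one another. The following is a minimal sketch, not part of the original analysis, which assumes that the column denominators in Table 8 are the per-category study counts implied by Table 4 and the 40 included studies reported in the abstract. The variable and function names are illustrative only.

```python
# Total number of included studies, as reported in the abstract.
N_STUDIES = 40

# Table 4: share of studies per model category (%).
category_share = {"Predictive": 57.5, "Hybrid": 27.5, "Simulation": 15.0}

# Table 8: share of studies within each category reporting each limitation (%).
limitation_share = {
    "Data":        {"Predictive": 78.3, "Hybrid": 81.8, "Simulation": 66.7},
    "Validation":  {"Predictive": 60.9, "Hybrid": 54.5, "Simulation": 33.3},
    "Operational": {"Predictive": 43.5, "Hybrid": 63.6, "Simulation": 66.7},
    "Integration": {"Predictive": 13.0, "Hybrid": 45.5, "Simulation": 16.7},
}

# Table 7: share of all studies reporting each limitation (%).
overall_share = {"Data": 77.5, "Validation": 55.0,
                 "Operational": 52.5, "Integration": 22.5}


def implied_count(percent, denominator):
    """Nearest whole number of studies corresponding to a rounded percentage."""
    return round(percent * denominator / 100)


# Assumption: Table 4 percentages are fractions of the 40 included studies.
category_count = {name: implied_count(share, N_STUDIES)
                  for name, share in category_share.items()}


def recomputed_overall(limitation):
    """Aggregate the Table 8 counts for one limitation into an overall percentage."""
    total = sum(implied_count(share, category_count[category])
                for category, share in limitation_share[limitation].items())
    return 100 * total / N_STUDIES


print("Implied studies per category:", category_count)
for limitation, reported in overall_share.items():
    print(f"{limitation}: recomputed {recomputed_overall(limitation):.1f}%, "
          f"Table 7 reports {reported:.1f}%")
```

Under these assumptions, the per-category counts recovered from Table 8 aggregate to the overall percentages in Table 7, which suggests that Tables 4, 7, and 8 share the same category assignment.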

