1. Introduction
Dissolved oxygen (DO) is a fundamental indicator of aquatic ecosystem health, directly governing respiration and metabolic activities of organisms and regulating biogeochemical cycling processes in marine and freshwater environments. Dissolved oxygen plays a critical role in sustainable water resource management, as oxygen depletion can lead to hypoxia, biodiversity loss, and ecosystem degradation, especially in coastal and estuarine regions [1,2]. Accurate DO prediction is therefore of great importance for water quality monitoring, ecological risk assessment, early-warning systems, and adaptive resource management.
Traditional DO simulation approaches are mainly based on mechanistic models and numerical process-based frameworks, which explicitly describe oxygen transfer, biochemical reactions, phytoplankton growth, and hydrodynamic mixing. Despite their theoretical rigor, such models typically require high-resolution temporal observations, detailed parameter calibration, and comprehensive knowledge of local hydrodynamic and biogeochemical conditions. In real-world monitoring scenarios, however, marine sampling is often sparse, irregular, and incomplete due to sensor malfunction, harsh environmental conditions, and operational cost constraints, making the required temporal continuity and model inputs unavailable in many practical cases [3,4]. Existing approaches often rely on continuous temporal observations or purely data-driven learning paradigms, which not only limit their applicability under sparse observations but may also lead to predictions that violate known physical relationships. Consequently, developing reliable DO estimation methods under incomplete observational regimes remains a challenging yet highly practical problem.
Recent studies have increasingly emphasized data-driven and hybrid modeling approaches for dissolved oxygen (DO) prediction, aiming to improve predictive accuracy in complex and heterogeneous aquatic environments. For example, advanced machine learning and deep learning frameworks, including ensemble learning, transfer learning, and hybrid neural networks, have been widely adopted [5]. In particular, the integration of multi-source environmental variables with optimization strategies has been shown to significantly enhance DO prediction performance across rivers, estuaries, and aquaculture systems [6]. However, most of these approaches are inherently developed within a time-series prediction paradigm, relying on continuous temporal observations, historical sequences, or recurrent architectures (e.g., CNN-GRU and related sequence-based models), which fundamentally limits their applicability under sparse or irregular sampling conditions [7]. Moreover, purely data-driven models often lack physical interpretability and may produce predictions that violate known physical constraints, particularly in data-scarce or heterogeneous environmental regimes [8].
To address the limitations of purely data-driven approaches, recent studies have increasingly explored physics-informed neural networks (PINNs) for modeling water systems, demonstrating their capability to incorporate governing physical laws and improve prediction consistency under limited-data conditions [9]. For instance, PINNs have been successfully applied to hydrodynamic and water quality processes, such as salinity transport and shallow water flow modeling, by embedding partial differential equations and boundary conditions into the learning framework [10,11]. Recent reviews further highlight that PINNs can enhance generalization and reduce data requirements while maintaining physical consistency, making them particularly suitable for environmental systems with incomplete observations [11]. However, despite these advantages, most existing PINN-based approaches are inherently formulated within a spatiotemporal framework, requiring temporal derivatives, initial conditions, or continuous observations to solve governing equations. This strong dependence on temporal information limits their applicability in real-world scenarios characterized by sparse, irregular, or discontinuous sampling. Additionally, challenges related to model convergence, scalability, and the requirement for well-defined physical equations further constrain their practical deployment in complex environmental systems [11].
To overcome the limitations of process-based modeling, machine learning (ML) and deep learning (DL) techniques have been increasingly applied to water quality prediction tasks, including DO forecasting and related variables such as biochemical oxygen demand, chlorophyll concentration, and nutrient loads [12,13,14,15]. Classical ML models, including ensemble methods such as Random Forest (RF) [16] and Extreme Gradient Boosting (XGBoost) [17] as well as kernel-based Support Vector Regression (SVR) [18], have demonstrated strong capability in nonlinear regression with heterogeneous environmental inputs [19,20]. Meanwhile, deep neural architectures including convolutional neural networks (CNNs) [21], recurrent neural networks (RNNs) [22], and long short-term memory networks (LSTMs) [23] have been widely adopted for spatiotemporal modeling in hydrological and environmental systems [8,24]. More recently, Transformer-based models have shown promising performance for long-range dependency learning in environmental time-series forecasting [25,26].
However, despite the rapid progress of ML/DL-based water quality modeling, two critical limitations remain unresolved. First, most existing DO prediction frameworks are inherently time-dependent, relying on lagged variables, sequential inputs, or continuous monitoring records. Such requirements significantly restrict their applicability in real marine monitoring systems where observations are typically intermittent and affected by missing values [2,27]. Second, purely data-driven predictors often behave as black-box approximators and may produce physically implausible predictions that violate well-established environmental principles. For example, DO solubility generally decreases with increasing temperature, and DO profiles typically vary smoothly with depth due to stratification and mixing effects. Similarly, chlorophyll-a concentration may influence DO through photosynthetic oxygen production under productive conditions, although this relationship can vary substantially depending on respiration, eutrophication, stratification, and other ecological processes. These mechanisms imply that DO is jointly governed by multiple drivers and their coupled nonlinear interactions [28,29]. Without explicitly modeling factor interactions and enforcing physical consistency, ML/DL models may overfit observational noise, learn spurious correlations, and suffer from degraded generalization under distribution shifts, limiting their reliability in ecological monitoring.
To improve robustness and interpretability, physics-aware learning paradigms have been actively studied in recent years. Physics-informed neural networks (PINNs) and related physics-guided frameworks incorporate physical laws into the training objective or model structure, thereby reducing the effective hypothesis space and improving generalization [30,31,32]. Such approaches have achieved notable success in fluid dynamics, reactive transport, climate modeling, and hydrological prediction tasks [33,34,35]. In the water quality domain, several studies have attempted to integrate process knowledge into learning-based models to enhance reliability and physical plausibility [36,37]. Nevertheless, most existing physics-informed DO prediction studies still focus on temporal forecasting settings or assume access to high-frequency sequential data, leaving the challenging problem of non-temporal DO estimation from instantaneous environmental observations largely underexplored.
Motivated by realistic marine monitoring deployments where only snapshot measurements are available, this work investigates a non-temporal DO prediction setting where each observation is treated as an independent sample and DO concentration is estimated solely from contemporaneous environmental drivers. Under such a formulation, the model must learn physically meaningful nonlinear couplings among depth, salinity, temperature, and chlorophyll-a from limited feature dimensions, while simultaneously ensuring physical plausibility.
To address this gap, we propose a novel non-temporal dissolved oxygen prediction framework that estimates DO concentration directly from four key environmental factors, namely depth, salinity, temperature, and chlorophyll-a concentration, without requiring temporal features or lagged inputs. Specifically, we develop a Factor-Interaction Neural Network (FINN), which explicitly decomposes the regression function into main-effect subnetworks and pairwise interaction subnetworks. This structured design enables FINN to capture nonlinear couplings between environmental drivers in a more interpretable manner than conventional multilayer perceptrons (MLPs) [38] and is conceptually aligned with additive and interaction modeling principles such as generalized additive models (GAM) and neural additive models (NAM) [39,40]. However, unlike conventional additive formulations that assume independent variable contributions, FINN explicitly introduces structured pairwise interaction subnetworks to model nonlinear environmental couplings. The proposed framework therefore emphasizes interaction-aware representation learning rather than merely increasing neural network complexity. Building upon FINN, we further introduce a physics-informed extension termed PI-FINN, which incorporates differentiable physical regularization terms derived from oceanographic principles, including temperature-dependent oxygen solubility monotonicity, depth-wise smoothness priors, and chlorophyll-related oxygen production tendencies. These constraints are implemented through automatic differentiation, enabling end-to-end optimization under physical guidance. Although related ideas such as additive modeling and feature interaction learning have been explored in prior work, the proposed FINN and PI-FINN frameworks provide a structured and interpretable neural formulation specifically designed for non-temporal environmental prediction and have not been previously introduced in this form.
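As a minimal illustration of the decomposition described above, the following sketch builds one small subnetwork per factor and one per unordered factor pair, and sums their outputs with a bias term. It is not the implementation used in this work: the class names `TinyMLP` and `FINNSketch`, the hidden width, the tanh activation, and the random initialization are all assumptions made for illustration, inputs are assumed to be standardized, and no training is shown.

```python
import itertools
import math
import random

FACTORS = ("depth", "salinity", "temperature", "chlorophyll_a")


class TinyMLP:
    """One-hidden-layer tanh network with a scalar output (illustrative only)."""

    def __init__(self, n_inputs, n_hidden, rng):
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]

    def __call__(self, inputs):
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        return sum(w * h for w, h in zip(self.w2, hidden))


class FINNSketch:
    """Bias + 4 main-effect subnetworks + 6 pairwise interaction subnetworks."""

    def __init__(self, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.bias = 0.0
        # One univariate subnetwork per factor (main effects).
        self.main = {name: TinyMLP(1, n_hidden, rng) for name in FACTORS}
        # One bivariate subnetwork per unordered factor pair (interactions).
        self.pair = {
            pair: TinyMLP(2, n_hidden, rng)
            for pair in itertools.combinations(FACTORS, 2)
        }

    def contributions(self, sample):
        """Return the additive contribution of every term for one sample (a dict)."""
        terms = {name: net([sample[name]]) for name, net in self.main.items()}
        for (a, b), net in self.pair.items():
            terms[(a, b)] = net([sample[a], sample[b]])
        return terms

    def predict(self, sample):
        """DO estimate as the bias plus the sum of all term contributions."""
        return self.bias + sum(self.contributions(sample).values())
```

Because the prediction is an explicit sum, the output of `contributions` can be inspected term by term, which is the sense in which such a structure is easier to interpret than a single fully connected network.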
Beyond model construction, we emphasize that evaluating physical plausibility is equally important for reliable environmental prediction. Existing evaluation strategies often rely solely on prediction accuracy metrics (RMSE, MAE, and R²), which cannot reveal whether the learned predictor violates fundamental physical relationships. Although several studies have explored monotonic violation counting as a physical inconsistency indicator, such binary metrics are often overly sensitive to noise and may impose unrealistic global monotonic assumptions across heterogeneous environmental regimes. In this paper, we propose a novel evaluation criterion named Region-Aware Soft Physics Consistency Violation Rate (RS-PCVR). RS-PCVR introduces a continuous relaxation to measure both violation frequency and severity, and activates constraints only within empirically reliable regimes. This design provides a stable and interpretable physical plausibility assessment and further enables a convergence-stability analysis framework for physics-informed learning, offering a rigorous evaluation protocol for physically guided DO prediction. By improving the reliability and physical consistency of DO prediction, this study contributes to sustainable environmental monitoring and informed decision-making.
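To make the contrast between binary violation counting and a soft, region-aware score concrete, the sketch below evaluates only the temperature constraint (DO should not increase with temperature). It is an assumed illustration and not the RS-PCVR definition used in this work: the function names, the central finite-difference slope estimate, the tanh relaxation, the `scale` parameter, and the `region` predicate are all hypothetical choices.

```python
import math


def temperature_slope(predict, sample, eps=1e-3):
    """Central finite-difference estimate of d(DO)/d(temperature) at one sample."""
    hi = dict(sample, temperature=sample["temperature"] + eps)
    lo = dict(sample, temperature=sample["temperature"] - eps)
    return (predict(hi) - predict(lo)) / (2.0 * eps)


def hard_violation_rate(predict, samples):
    """Binary counting: fraction of samples where DO increases with temperature."""
    flags = [temperature_slope(predict, s) > 0.0 for s in samples]
    return sum(flags) / len(flags)


def soft_region_violation_rate(predict, samples, region, scale=0.1):
    """Soft, region-aware score in [0, 1).

    Only samples for which region(sample) is true are scored. Each scored
    sample contributes tanh(max(slope, 0) / scale): zero when the constraint
    holds, and smoothly increasing with the severity of the violation.
    """
    active = [s for s in samples if region(s)]
    if not active:
        return 0.0
    scores = [
        math.tanh(max(temperature_slope(predict, s), 0.0) / scale) for s in active
    ]
    return sum(scores) / len(scores)
```

In this sketch a marginal violation contributes a small fraction rather than a full count, and the `region` predicate restricts scoring to regimes where the monotonic assumption is considered reliable.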
The main contributions of this work can be summarized as follows:
We formulate DO estimation as a non-temporal regression task that predicts DO directly from depth, salinity, temperature, and chlorophyll-a, enabling modeling under sparse and discontinuous marine observations.
We propose FINN, a structured architecture that disentangles main effects and pairwise interaction effects among environmental drivers, improving nonlinear representational capability and interpretability.
We develop PI-FINN by embedding differentiable oceanographic priors into the training objective, thereby enforcing physically consistent DO responses through end-to-end optimization.
We introduce RS-PCVR, a region-aware soft physical violation metric that provides stable physical plausibility assessment and enables convergence-stability analysis for physics-informed learning.
Extensive experiments and ablation studies against multiple ML and DL baselines validate the effectiveness and practical value of the proposed framework.
4. Discussion
This study investigates dissolved oxygen (DO) prediction in marine environments under a non-temporal learning paradigm, where DO is estimated solely from four driving variables, namely depth, temperature, salinity, and chlorophyll concentration. Comprehensive experiments are conducted to compare conventional machine learning baselines (SVR, RF, XGBoost), data-driven neural models (MLP), and physics-inspired architectures (FINN and PI-FINN). In addition, PCVR and its enhanced variant RS-PCVR are introduced as physics-consistency evaluation metrics, and their convergence behaviors under different weight configurations are systematically analyzed, providing a new perspective for assessing model stability and physical reliability. The proposed framework further demonstrates practical potential for environmental decision-making by delivering physically consistent DO estimates under sparse observation conditions. Such capability is particularly valuable for identifying hypoxic regions, supporting adaptive monitoring strategies, and informing early warning systems.
From a sustainability perspective, the proposed framework provides a practical tool for improving the reliability of environmental monitoring systems under data-limited conditions. Physically consistent predictions of dissolved oxygen are essential for detecting hypoxia risks, supporting ecosystem protection, and enabling informed decision-making in water resource management. This contributes to sustainability by enhancing the scientific basis for environmental assessment and long-term ecological management.
4.1. Performance Gap Between Classical ML and Neural Physics-Inspired Models
The experimental results show that ensemble-based machine learning methods, particularly RF and XGBoost, achieve remarkably strong predictive accuracy, with RMSE values in the range of 0.16 to 0.18 and R² exceeding 0.99. In contrast, the MLP baseline exhibits a substantially higher RMSE, while FINN improves upon MLP but still does not surpass RF/XGBoost in terms of pure numerical accuracy. Notably, PI-FINN does not consistently outperform FINN and, in several cases, even shows degraded RMSE and MAE performance.
This phenomenon can be attributed to both the characteristics of the dataset and the underlying modeling assumptions. Since the task explicitly excludes temporal dependencies, the problem reduces to a static regression mapping between environmental variables and DO. In such a setting, tree-based ensemble methods (RF/XGBoost) are well known to be highly competitive, as they can effectively capture complex nonlinear interactions, accommodate heteroscedasticity, and remain robust to outliers as well as feature-scale inconsistencies. That said, the extremely high R² values observed for the tree-based models may also indicate potential overfitting. In contrast, neural models typically require more careful hyperparameter tuning, feature normalization, and training stabilization, particularly when the architecture involves multiple interaction branches or physics-guided components.
Therefore, the superior performance of RF/XGBoost suggests that the dataset contains highly learnable nonlinear structures that can be effectively approximated through purely data-driven statistical fitting. This observation further indicates that, under the current experimental setting, the primary advantage of physics-inspired neural networks may not lie in marginal gains in RMSE, but rather in improved interpretability, training stability, and physical plausibility. In particular, restricting the model to second-order interactions enhances interpretability and structural transparency, but may simultaneously limit its capacity to capture higher-order nonlinear dependencies present in complex marine environments. The performance gap between MLP and FINN further indicates that the explicit interaction decomposition contributes substantially to the observed improvements. This suggests that FINN benefits not only from neural nonlinear approximation capacity but also from its structured interaction-aware design.
4.2. Why PI-FINN Does Not Always Outperform FINN
A critical finding of this study is that PI-FINN does not consistently guarantee superior performance compared to FINN, and its effectiveness is sensitive to training configurations. Several factors can account for this behavior.
First, PI-FINN introduces additional structural constraints that effectively reduce the hypothesis space. While such constraints may improve generalization when the embedded physical assumptions are well aligned with the underlying system, they can also introduce inductive bias if the assumed mechanisms are incomplete or partially inconsistent with the data distribution. In real marine environments, DO dynamics are governed by complex and highly coupled processes (e.g., mixing, photosynthesis, respiration, stratification, and biological consumption), many of which are not explicitly captured in a simplified physics-informed formulation. Consequently, the physics-guided branch may restrict model flexibility and limit its capacity to approximate highly nonlinear relationships, potentially leading to degraded predictive performance.
Second, PI-FINN involves more complex gradient propagation pathways due to the incorporation of physics-based regularization terms. Without appropriate feature standardization and carefully designed training strategies (e.g., staged optimization or weight scheduling), the learning process may become unstable. This is consistent with the empirical observation that PCVR and RS-PCVR curves often exhibit oscillatory or plateauing behavior, particularly under relatively large weight settings. Such instability indicates that PI-FINN is more sensitive to hyperparameters such as learning rate and parameter initialization, increasing the likelihood of convergence to suboptimal local minima.
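The weight scheduling mentioned above can be sketched as follows. This is an assumed illustration rather than the training procedure used in this work: the squared-hinge form of the penalty, the linear warm-up, and the names `monotonicity_penalty`, `physics_weight`, and `total_loss` are hypothetical, and the temperature slopes are taken as given inputs (in PI-FINN they would be obtained by automatic differentiation).

```python
def monotonicity_penalty(temperature_slopes):
    """Mean squared positive part of d(DO)/d(temperature).

    The penalty is zero when DO never increases with temperature.
    """
    return sum(max(s, 0.0) ** 2 for s in temperature_slopes) / len(temperature_slopes)


def physics_weight(epoch, warmup_epochs=50, max_weight=0.1):
    """Linear ramp from 0 to max_weight over warmup_epochs, then constant."""
    if warmup_epochs <= 0:
        return max_weight
    return max_weight * min(epoch / warmup_epochs, 1.0)


def total_loss(data_loss, temperature_slopes, epoch, **schedule):
    """Data-fit loss plus the scheduled physics penalty."""
    weight = physics_weight(epoch, **schedule)
    return data_loss + weight * monotonicity_penalty(temperature_slopes)
```

Starting with a zero physics weight lets the data-fit term dominate early training, so the regularization gradients are introduced only after the predictions have moved into a reasonable range. Whether such a schedule removes the oscillations described above would need to be verified experimentally.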
Therefore, PI-FINN should primarily be viewed as a physically guided reliability-enhancement framework rather than a purely accuracy-oriented predictive model. Its effectiveness depends on both the validity of the assumed physical mechanisms and the stability of the optimization process, and its primary role is to enforce physical consistency in model predictions rather than to directly maximize predictive accuracy, especially in scenarios where physical plausibility is of primary concern.
4.3. Effects of Learning Rate on Metric Convergence Stability
Another important observation is that reducing the learning rate to the smaller of the two tested values significantly improves the stability of the PCVR/RS-PCVR curves across models and weight configurations. Under the smaller learning rate, the reported RMSE and R² values become more consistent across different settings, and the fluctuations of the physical-consistency metrics are reduced.
This result indicates that PCVR and RS-PCVR are not only indicators of physical plausibility but also sensitive diagnostic tools reflecting training dynamics. Since PCVR/RS-PCVR are used purely as evaluation metrics rather than loss constraints, their convergence behavior is implicitly determined by the smoothness of prediction updates. A high learning rate causes larger parameter jumps and stronger oscillations in prediction distributions, which amplifies local physical inconsistency and increases metric variance. In contrast, a smaller learning rate leads to smoother parameter evolution and more stable physical-consistency trajectories, which is particularly beneficial for PI-FINN due to its complex multi-branch structure.
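One simple way to quantify the fluctuation of a metric trajectory is the standard deviation of its last few epochs. The sketch below is an assumed diagnostic, not the convergence-stability analysis used in this work; the function name `tail_fluctuation`, the window length, and both trajectories are synthetic placeholders rather than experimental results.

```python
import statistics


def tail_fluctuation(metric_by_epoch, window=20):
    """Population standard deviation of the last `window` per-epoch metric values."""
    tail = metric_by_epoch[-window:]
    return statistics.pstdev(tail)


# Synthetic trajectories (not measured data): one keeps oscillating,
# the other decays smoothly toward a plateau.
oscillating = [0.3 + 0.05 * (-1) ** epoch for epoch in range(100)]
smooth = [0.3 + 0.2 * 0.9 ** epoch for epoch in range(100)]

print(tail_fluctuation(oscillating) > tail_fluctuation(smooth))
```

A smaller tail fluctuation corresponds to the smoother physical-consistency trajectories described above for the lower learning rate.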
This finding also implies that training stability is a necessary prerequisite before evaluating whether physics-inspired architectures truly improve physical reliability.
4.4. Interpretation of Opposite Trends Between PCVR and RS-PCVR
The convergence analysis reveals a consistent pattern: PCVR tends to decrease rapidly in early epochs and then stabilizes, whereas RS-PCVR exhibits a gradual increasing trend with smoother convergence. This opposite behavior reflects the fundamental difference between the two metrics.
PCVR directly measures absolute physical violation frequency or magnitude. In the early training stage, the model quickly reduces large prediction errors and moves outputs into more feasible physical ranges, leading to an immediate decrease in violations. However, later-stage fine-tuning may focus on fitting small-scale nonlinearities and noise patterns, which can reintroduce local inconsistency and cause PCVR to plateau or oscillate.
RS-PCVR, in contrast, evaluates physical consistency from a normalized or relative perspective. Instead of focusing solely on whether the output violates an absolute constraint, it measures whether the model prediction preserves physically meaningful response patterns under varying temperature, depth, and chlorophyll regimes. Since DO is governed by stable response structures (e.g., temperature-dependent solubility decrease and stratification-induced oxygen depletion), the gradual improvement of RS-PCVR suggests that the model increasingly learns physically interpretable relationships beyond pure numerical fitting.
Therefore, RS-PCVR provides complementary information: PCVR emphasizes feasibility, while RS-PCVR emphasizes structural physical interpretability. This explains why RS-PCVR can be considered more suitable for convergence stability analysis and for evaluating whether a model truly aligns with physical mechanisms.
4.5. Sensitivity to Weight Configurations and the Role of Driving Variables
The experiments further demonstrate that the weight configuration, which controls the relative contributions of temperature T, depth D, and chlorophyll C in the PCVR/RS-PCVR computation, has a measurable impact on both the resulting metric values and their convergence behavior. In general, assigning nonzero weights to multiple driving variables simultaneously tends to increase PCVR values, indicating that enforcing multi-factor physical consistency is more challenging than satisfying single-factor constraints. This observation is consistent with the fact that DO dynamics are governed by multiple interacting processes, and achieving consistency across all drivers requires the model to learn a more globally coherent mapping. In particular, dissolved oxygen profiles may exhibit abrupt vertical variations due to stratification effects, and the relationship between chlorophyll concentration and dissolved oxygen is inherently complex and not strictly monotonic across different environmental regimes.
Additionally, the results suggest that different models exhibit distinct sensitivities under identical configurations. FINN generally produces more stable PCVR/RS-PCVR trends compared to MLP, implying that its structured interaction design introduces an implicit regularization effect. In contrast, PI-FINN, while offering enhanced physical interpretability, demonstrates increased sensitivity under certain multi-driver combinations. This behavior supports the interpretation that physics-informed components introduce additional optimization complexity and require more careful tuning.
From an oceanographic perspective, this sensitivity analysis provides meaningful insights: it suggests that DO prediction models may respond differently under varying dominant environmental regimes, such as temperature-driven surface variability or depth-induced stratification. Therefore, analyzing PCVR/RS-PCVR across different configurations offers an interpretable framework for assessing model robustness and physical consistency under heterogeneous marine conditions.
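A sweep over weight configurations of this kind can be sketched as follows. The aggregation rule is an assumption: the sketch combines per-driver violation rates by a plain weighted sum, which may differ from the rule used in this work, and the names `weighted_violation` and `sweep_configurations` and the binary weight levels are hypothetical.

```python
import itertools

DRIVERS = ("T", "D", "C")  # temperature, depth, chlorophyll


def weighted_violation(per_driver_rates, weights):
    """Weighted sum of per-driver violation rates (an assumed aggregation rule)."""
    return sum(weights[d] * per_driver_rates[d] for d in DRIVERS)


def sweep_configurations(per_driver_rates, levels=(0.0, 1.0)):
    """Score every weight configuration except the all-zero one."""
    results = {}
    for combo in itertools.product(levels, repeat=len(DRIVERS)):
        if any(combo):
            weights = dict(zip(DRIVERS, combo))
            results[combo] = weighted_violation(per_driver_rates, weights)
    return results
```

With binary levels this enumerates the seven single-driver and multi-driver combinations, so the resulting scores can be compared configuration by configuration for each model.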
4.6. Implications for Real-World DO Monitoring and Algorithmic Contribution
Although RF and XGBoost outperform deep models in pure RMSE and R², the proposed evaluation framework demonstrates that accuracy alone is insufficient for reliable marine DO monitoring. In real applications such as hypoxia risk warning, marine ecosystem protection, and intelligent ocean observation systems, prediction reliability depends not only on numerical fit but also on whether the predicted DO values behave consistently with physical mechanisms.
The proposed RS-PCVR metric and the convergence stability analysis provide a practical solution to this issue. By evaluating physical consistency dynamically across training and across different driving-variable weight configurations, the framework enables a more comprehensive assessment of model reliability and interpretability. This is particularly valuable when deploying data-driven models in environmental monitoring systems, where unseen distribution shifts and noisy measurements are common.
Therefore, the key contribution of this work is not limited to proposing a physics-inspired model architecture, but rather establishing a new evaluation paradigm for marine DO prediction: combining conventional accuracy metrics with RS-PCVR-based physical consistency assessment and convergence stability analysis. This paradigm provides a robust and extensible methodology that can be generalized to other oceanographic variables and environmental prediction tasks.
4.7. Limitations and Future Work
The proposed framework is subject to several limitations, including the absence of uncertainty quantification, simplified physical assumptions, sensitivity to hyperparameter configurations, and limited validation across diverse settings. First, the current statistical formulation remains limited, and discrepancies may exist between the empirical modeling strategy and the underlying physical processes. In particular, the model implicitly assumes homoscedastic errors and does not explicitly account for non-Gaussian data distributions. The study also lacks explicit uncertainty quantification: although the proposed framework evaluates physical consistency and convergence stability, predictive uncertainty remains unexplored. Future work may incorporate bootstrap confidence intervals, Bayesian neural estimation, or ensemble-based uncertainty analysis to improve the reliability assessment of dissolved oxygen prediction under sparse observational conditions.
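As an indication of what the bootstrap option could look like, the sketch below computes a percentile bootstrap interval for the test-set RMSE by resampling observation and prediction pairs with replacement. It is a hypothetical illustration of future work, not part of the present study; the function names, the number of resamples, and the percentile method are assumptions.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between paired observations and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def bootstrap_rmse_interval(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for RMSE, resampling pairs with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    return lower, upper
```

The interval width reflects how strongly the reported RMSE depends on the particular set of test samples, which is especially relevant when observations are sparse.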
Second, the model does not explicitly incorporate temporal dynamics, which may restrict its applicability in environments characterized by strong temporal variability. By design, the proposed framework omits temporal dependencies, thereby limiting its ability to capture short-term environmental fluctuations such as storms, mixing events, and upwelling processes. These episodic phenomena can induce abrupt variations in dissolved oxygen that are not fully explained by instantaneous environmental variables alone. Consequently, the current model is better suited for capturing relatively stable environmental relationships rather than transient dynamics. Future work may explore hybrid modeling strategies that incorporate event-level or sparse temporal information without relying on fully continuous time-series data.
It should also be noted that the present study is conducted using a single regional marine dataset collected from Rongcheng Bay. Although the proposed framework demonstrates promising performance under sparse observational conditions, its generalization capability across different oceanographic regimes, climatic conditions, and monitoring systems remains to be further validated. Future work will therefore investigate cross-region transferability and domain adaptation under heterogeneous marine environments.
In addition, the model may exhibit sensitivity to hyperparameter configurations, especially under sparse data conditions, which can affect both convergence stability and predictive performance. These limitations should be carefully considered when deploying the model in real-world applications.
5. Conclusions
This paper proposes a non-temporal dissolved oxygen (DO) prediction framework that estimates DO concentration directly from environmental drivers without explicitly relying on time-series dependencies. To address the nonlinear and coupled relationships among depth, salinity, temperature, and chlorophyll-a concentration, we develop a Factor-Interaction Neural Network (FINN), which explicitly decomposes DO formation into individual factor contributions and structured pairwise interaction effects. Such an interaction-driven architecture enables the model to capture complex cross-factor dependencies while maintaining improved interpretability compared with conventional black-box neural regressors.
Building upon FINN, we further propose a physics-informed extension, namely PI-FINN, by incorporating oceanographic-consistent priors that reflect key physical mechanisms in DO variation, including temperature-related solubility behavior, depth-dependent stratification effects, and chlorophyll-associated biological oxygen production patterns. In addition to standard accuracy metrics, we introduce a physics-consistency evaluation protocol based on physically meaningful violation-rate measures and conduct a systematic convergence and stability analysis under different driver-weight configurations. This provides a practical diagnostic tool for assessing whether a predictive model exhibits physically plausible responses across heterogeneous marine regimes.
Extensive experiments and ablation studies demonstrate that explicitly modeling factor interactions improves DO prediction performance and training robustness relative to the MLP baseline. Moreover, the physics-informed design of PI-FINN contributes to enhanced physical plausibility and stability, particularly under noisy observations and varying environmental conditions. These results indicate that the proposed framework offers a promising approach for integrating data-driven modeling and physical consistency, contributing to sustainable environmental monitoring and management under data-limited conditions. Overall, the results highlight a fundamental trade-off between predictive accuracy and physical consistency: while purely data-driven models may achieve higher numerical accuracy, they are more prone to violating physical constraints, whereas PI-FINN explicitly prioritizes physical consistency and robustness over purely predictive performance. Consequently, the practical applicability of the proposed framework depends on the validity of the underlying physical assumptions and the target application requirements. PI-FINN therefore mainly contributes to enhancing physical plausibility, robustness, and interpretability under sparse monitoring conditions, rather than consistently improving conventional prediction-error metrics. The general applicability of FINN and PI-FINN across different marine environments still requires further validation using multi-region datasets.
Future work will place greater emphasis on environmental interpretability and domain-specific analysis, aiming to further bridge the gap between data-driven modeling and physical understanding. We will also extend the proposed framework toward broader generalization by incorporating spatiotemporal context, multi-depth vertical profile structures, and uncertainty-aware prediction, thereby enabling risk-sensitive DO forecasting and real-time decision support in practical ocean monitoring systems.