Our physics-informed neural network (PINN) model integrates machine learning with physical oceanographic principles by embedding radiative transfer equations into the neural network loss function. The architecture employs a Multi-Layer Perceptron with 16-dimensional input processed through three hidden layers (64 nodes each) to predict salinity correction values. The fundamental innovation lies in incorporating L-band brightness temperature physics as a constraint mechanism—a forward radiative transfer model predicts brightness temperature from corrected salinity, creating a physics-based loss term alongside traditional data-fitting loss. This dual-constraint system enables the network to learn corrections that are both empirically accurate and physically plausible, addressing limitations of purely data-driven approaches that may violate electromagnetic wave propagation theory.
The process begins with the ANN, which takes seven geophysical parameters as input to generate an initial salinity prediction. This prediction is then fed into two parallel loss calculations: (1) a direct comparison with the measured in situ salinity to produce the data loss, and (2) a forward radiative transfer model, which accounts for factors such as the learnable wind speed parameters A and B, to produce a predicted brightness temperature. The latter is then compared against the SMAP observations to generate the physics loss. The combined, weighted loss drives the iterative optimization of the ANN. Moreover, because the internal physical relationships within the model remain consistent, satisfying Maxwell's equations and the laws of radiative transfer, the model maintains its physical plausibility. This allows the forward model to be structured as a loss term, thereby embedding physical information into the neural network.
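To make this workflow concrete, the following minimal PyTorch sketch shows how a data loss and a physics loss can be combined around the MLP backbone described above (16-dimensional input, three hidden layers of 64 nodes). The forward radiative transfer model is passed in as a generic callable, the squared-error form of the two terms and the fixed weight are illustrative assumptions, and all names (`SalinityMLP`, `pinn_step`, `forward_rtm`) are placeholders rather than identifiers from the released code.

```python
import torch
import torch.nn as nn

class SalinityMLP(nn.Module):
    """MLP backbone: 16-dimensional input, three hidden layers of 64 nodes, one output."""
    def __init__(self, n_features: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)   # corrected salinity prediction

def pinn_step(model, forward_rtm, batch, weight_tb=1.0):
    """One hypothetical training step combining the data loss and the physics loss.
    `forward_rtm` is any differentiable forward radiative transfer model mapping
    (corrected salinity, auxiliary geophysical inputs) to a predicted brightness temperature."""
    x, sss_insitu, tb_smap, aux = batch
    sss_pred = model(x)                                     # initial salinity prediction
    loss_data = torch.mean((sss_pred - sss_insitu) ** 2)    # data-fitting loss term
    tb_pred = forward_rtm(sss_pred, aux)                    # physics-based forward model
    loss_phys = torch.mean((tb_pred - tb_smap) ** 2)        # physics consistency loss term
    return loss_data + weight_tb * loss_phys                # combined, weighted loss
```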
2.2.2. Construction of the PINN Model
The PINN model integrates radiative transfer physics with neural network optimization through a forward physics model that calculates theoretical brightness temperature from salinity values, followed by a multi-component loss function that ensures both physical consistency and data accuracy. The derivation begins with the Radiative Transfer Model (RTM) from Meissner & Wentz [23,29], which combines the radiative transfer formulation with the systematic simplifications needed to accurately represent surface emissivity, including corrections for wind-induced roughness. This foundational framework then applies a Stokes vector to describe the combined polarization state of the received signal at the Top of Atmosphere (TOA), enabling a detailed analysis of the signal's polarization characteristics, and transitions the Stokes vector measured at the Earth's surface to the Top of Ionosphere (TOI) brightness temperature vector, accounting for the specific gain characteristics of the satellite system and contributions from galactic radiation. The process culminates in the optimization of a multi-component loss function, ensuring that the model maintains both physical consistency and accuracy in its salinity predictions. In the equations that follow, the superscript P indicates the polarization state of the respective elements.
- (1) Radiative Transfer Model Formulation
Based on the Radiative Transfer Model (RTM) function from Meissner & Wentz [23,29], the general expression for the polarized radiative transfer model at the Top of Atmosphere (TOA) is formulated in Equation (1), which combines the upwelling radiation from the ocean surface, the atmospheric transmittance, the total sea surface emissivity, and the sea surface brightness temperature with the downwelling sky radiation that is scattered off the ocean surface in the direction of the observation. At L-band frequencies, it is a very good approximation to write this scattered term (Equation (2)) in terms of the downwelling atmospheric radiation incident on the ocean surface, the space radiation, which consists of the cosmic background (2.73 K), and the ocean surface reflectivity [30].
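Assuming the standard Meissner and Wentz form of Equations (1) and (2), in which the TOA signal is the upwelling atmospheric radiation plus the attenuated surface emission and the surface-scattered downwelling sky and space radiation, with reflectivity taken as one minus emissivity, a differentiable version of the forward model could be sketched as follows; the symbol names are illustrative and not the paper's notation.

```python
import torch

def tb_toa(emissivity_p, sst_kelvin, tau, tb_up, tb_down, t_cold=2.73):
    """Sketch of the TOA brightness temperature under the standard Meissner & Wentz form.

    emissivity_p : total sea surface emissivity for polarization p
    sst_kelvin   : sea surface temperature [K]
    tau          : atmospheric transmittance along the view path
    tb_up        : upwelling atmospheric brightness temperature [K]
    tb_down      : downwelling atmospheric brightness temperature at the surface [K]
    t_cold       : cosmic background radiation (2.73 K)
    """
    reflectivity = 1.0 - emissivity_p                        # ocean surface reflectivity
    tb_surface = emissivity_p * sst_kelvin                   # sea surface brightness temperature
    tb_scattered = reflectivity * (tb_down + tau * t_cold)   # scattered downwelling sky + space radiation
    return tb_up + tau * (tb_surface + tb_scattered)         # TOA brightness temperature
```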
- (2) Emissivity Components and Corrections
The surface emissivity can be simplified to comprise two components in Equation (3): the flat (specular) surface emissivity and a wind-induced roughness correction. The emissivity of the specular ocean surface is by far the largest part. It depends on the frequency band f (the L-band in this study), the incidence angle θ, the sea surface temperature T, and the sea surface salinity S, and it is related to the complex dielectric constant of sea water ε by means of the Fresnel equations. Following Fresnel reflection theory, the flat surface emissivity for polarization p is calculated at the SMAP satellite's incidence angle of approximately 40°.
The complex dielectric constant ε in Equation (5) links the electromagnetic properties to the oceanographic parameters through the GW model of Zhou et al. [21], in which a set of empirical coefficients is combined with the sea surface temperature T (in degrees Celsius) and the sea surface salinity S (in PSU).
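For the specular term, the Fresnel equations yield the vertically and horizontally polarized reflection coefficients directly from the complex dielectric constant and the incidence angle, and the emissivity follows as one minus the power reflectivity. The sketch below assumes ε(S, T) is supplied by an external routine standing in for the GW model of Zhou et al. [21], whose coefficients are not reproduced here.

```python
import numpy as np

def fresnel_emissivity(eps_complex, theta_deg=40.0):
    """Flat-surface (specular) emissivity from the Fresnel equations.

    eps_complex : complex dielectric constant of sea water, e.g. from a dielectric
                  model such as the GW model of Zhou et al. [21]
    theta_deg   : incidence angle in degrees (SMAP observes near 40 degrees)
    Returns (e_v, e_h): vertically and horizontally polarized emissivities.
    """
    theta = np.deg2rad(theta_deg)
    cos_t = np.cos(theta)
    root = np.sqrt(eps_complex - np.sin(theta) ** 2)
    r_v = (eps_complex * cos_t - root) / (eps_complex * cos_t + root)  # Fresnel reflection, V-pol
    r_h = (cos_t - root) / (cos_t + root)                              # Fresnel reflection, H-pol
    return 1.0 - np.abs(r_v) ** 2, 1.0 - np.abs(r_h) ** 2              # emissivity = 1 - |r|^2
```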
To address the dominant error source in SSS bias correction, wind-induced surface roughness is incorporated through a learnable correction mechanism in which the wind speed at 10 m height enters through two learnable parameters, A and B, optimized for different latitudinal bands. The Pacific Ocean was divided into four latitudinal bands, each with its own pair of wind speed parameters A and B. Using a PyTorch (version 2.5.1+cu121) implementation, the model minimized the RMSE between the forward-model-predicted brightness temperature and the SMAP-observed TOA brightness temperature.
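The learnable roughness correction can be realized as trainable tensors, one (A, B) pair per latitudinal band, optimized jointly with the network weights. The linear dependence on the 10 m wind speed used below is an assumption of this sketch, since the exact functional form of the correction is not reproduced here.

```python
import torch
import torch.nn as nn

class WindRoughnessCorrection(nn.Module):
    """Learnable wind-induced emissivity correction with per-band (A, B) parameters.

    The Pacific Ocean is split into four latitudinal bands; each band carries its own
    pair of learnable parameters. The linear form A * wind + B is an assumption of
    this sketch only.
    """
    def __init__(self, n_bands: int = 4):
        super().__init__()
        self.A = nn.Parameter(torch.zeros(n_bands))
        self.B = nn.Parameter(torch.zeros(n_bands))

    def forward(self, wind_10m: torch.Tensor, band_index: torch.Tensor) -> torch.Tensor:
        # band_index holds an integer in [0, n_bands) for each sample
        a = self.A[band_index]
        b = self.B[band_index]
        return a * wind_10m + b   # roughness contribution added to the flat-surface emissivity
```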
- (3) Stokes Vector and Polarization Transformation
To describe the combined polarization state of the received signal at the TOA, we represent it using the Stokes vector, whose first component, I, is the total intensity of the observed brightness temperature, derived from the sum of the vertically and horizontally polarized brightness temperatures. The Q component is the difference between these two polarizations, providing insight into the polarization state of the radiation, while S3 and S4 correspond to the third and fourth Stokes parameters, which characterize additional aspects of the polarized light.
The rotation angle applied to the Stokes vector is the sum of the antenna polarization angle and the Faraday rotation angle, which accounts for the effects of the ionosphere; the former denotes the geometric polarization rotation, while their sum represents the total polarization rotation angle including the Faraday effect. A rotation matrix built from this total angle models the transition in polarization state, so that the Stokes parameters at the Top of Ionosphere (TOI) can be derived through Faraday rotation. Through these transformations, we account for how the polarization state is affected by both the antenna setup and the ionospheric conditions, particularly the effects introduced by Faraday rotation. Subsequently, the polarized brightness temperatures at the TOI are obtained via an inverse Stokes transformation, which extracts the polarized brightness temperatures from the Stokes parameters and enables a more nuanced understanding of the radiation characteristics at the TOI.
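Under the usual convention, a combined geometric-plus-Faraday rotation of the polarization basis leaves I and S4 unchanged and mixes Q with S3 through twice the total rotation angle, after which the V- and H-polarized brightness temperatures are recovered from the definitions I = Tv + Th and Q = Tv − Th. The following sketch illustrates this bookkeeping; sign conventions and names are assumptions, not the paper's notation.

```python
import numpy as np

def rotate_stokes(I, Q, S3, S4, phi_antenna, phi_faraday):
    """Apply the combined polarization rotation (geometric + Faraday) in Stokes space.

    Under the usual convention, rotating the polarization basis by the total angle phi
    leaves I and S4 unchanged and mixes Q with S3 through 2 * phi. Angles in radians.
    """
    phi = phi_antenna + phi_faraday           # total polarization rotation angle
    c, s = np.cos(2.0 * phi), np.sin(2.0 * phi)
    Q_rot = c * Q + s * S3
    S3_rot = -s * Q + c * S3
    return I, Q_rot, S3_rot, S4

def stokes_to_polarized_tb(I, Q):
    """Inverse Stokes transformation: recover the V- and H-polarized brightness
    temperatures from I = Tv + Th and Q = Tv - Th."""
    tb_v = 0.5 * (I + Q)
    tb_h = 0.5 * (I - Q)
    return tb_v, tb_h
```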
- (4) Galactic Radiation Contribution
Furthermore, the Stokes vector measured at the Earth's surface is transitioned to the TOI brightness temperature vector by means of a transformation matrix A, which encapsulates the specific gain characteristics of the satellite system [31]. The elements of matrix A correspond to the gain coefficients that adjust the Stokes parameters I, Q, S3, and S4.
Moreover, the brightness temperature predictions are influenced by both the Earth's thermal emission and aggregated space radiation terms, which are critical in [32]. The direct galactic contribution and the reflected galactic component, which can reach up to 5 K, further complicate the measurement process, and accurately accounting for these elements is vital for discerning the true characteristics of the Earth's radiation.
- (5) Loss Function
The PINN optimization leverages this physics model through a multi-component loss function that ensures both physical consistency and data accuracy. The physics-based constraint in Equation (11) ensures that the predicted salinity values produce brightness temperatures, calculated using the forward model described in Equations (3)–(10), that are consistent with the satellite observations. The data-fitting component in Equation (12) directly optimizes salinity prediction accuracy against the in situ measurements. The total loss function combines these two objectives through variance-normalized weighting of the normalized brightness temperature loss and salinity loss, with a coefficient that encapsulates the impact of the brightness temperature term on the overall loss. By reducing this brightness temperature loss, the model is driven toward corrections that bring the predicted salinity values as close as possible to the actual measurements. The brightness temperature physical consistency loss also constrains the network's salinity predictions, so that the salinity correction results not only follow statistical patterns in the data but also adhere to the physical processes of radiative transfer. This dual constraint on the salinity correction brings the predicted salinity closer to the in situ measurements while maintaining its physical relationship with brightness temperature, an advantage that purely data-driven methods cannot offer.
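A possible realization of this multi-component objective is sketched below. The RMSE form of the brightness temperature term follows the description in Section 2.2.2, while the normalization of each term by the variance of its target and the fixed weighting coefficient are illustrative assumptions.

```python
import torch

def pinn_loss(sss_pred, sss_insitu, tb_pred, tb_smap, lambda_tb=1.0):
    """Variance-normalized combination of the salinity (data) loss and the
    brightness temperature (physics) loss; lambda_tb weights the physics term."""
    loss_sss = torch.sqrt(torch.mean((sss_pred - sss_insitu) ** 2))   # data-fitting term (cf. Equation (12))
    loss_tb = torch.sqrt(torch.mean((tb_pred - tb_smap) ** 2))        # physics consistency term (cf. Equation (11))
    # Normalize each term by the variance of its target so the two scales are comparable
    # (assumed interpretation of the variance-normalized weighting).
    loss_sss_n = loss_sss / torch.var(sss_insitu)
    loss_tb_n = loss_tb / torch.var(tb_smap)
    return loss_sss_n + lambda_tb * loss_tb_n                         # total weighted loss
```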
2.2.3. Model Training Methodology
To facilitate a fair and dependable comparison, this study employs three well-established machine learning algorithms as baselines: Artificial Neural Network (ANN) [33], Gradient Boosting Regression Trees (GBRT) [34,35,36,37], and Extreme Gradient Boosting (XGBoost) [38,39]. All models examined in this study, including the proposed PINN method and the baseline algorithms, rely on the same set of seven input parameters to enable a fair comparison. Each model adheres to standardized training protocols under uniform experimental conditions, ensuring reproducibility and comparability through consistent data preprocessing, validation strategies, and evaluation metrics.
The training framework uses a time-based data partitioning method to guarantee reliable performance evaluation and avoid data leakage. The full dataset is divided along chronological lines: data from 2020 and earlier is used for the training phase (1,453,838 samples), data from 2021 serves as the validation set (81,606 samples), and data from 2022 is set aside as the independent testing period (79,214 samples), enabling an unbiased performance assessment. This sequential time-based division ensures that model development and hyperparameter tuning rely solely on historical training data, while the subsequent year's validation data guides model optimization and the final year's data acts as a completely independent test set for evaluating how well the model generalizes over time and its readiness for deployment in application settings.
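The chronological partition can be expressed with simple year-based masks, as in the pandas sketch below; the `year` column name is a placeholder for whatever time field the matchup dataset actually carries.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame):
    """Time-based partition: train on 2020 and earlier, validate on 2021, test on 2022.
    Assumes the matchup table has an integer 'year' column (placeholder name)."""
    train = df[df["year"] <= 2020]
    val = df[df["year"] == 2021]
    test = df[df["year"] == 2022]
    return train, val, test
```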
Before the final training, we performed hyperparameter tuning for each model to enhance performance. For the PINN and ANN models, we defined a hyperparameter space suitable for grid search, covering batch size, hidden layer size, and learning rate. Similarly, comprehensive tuning was conducted for the tree-based methods GBRT and XGBoost, optimizing the number of estimators, maximum tree depth, subsampling ratio, and learning rate, with cross-validation employed to ensure reliable evaluations.
During this process, we initially conducted Bayesian optimization, performing a total of nine iterations. The results indicated that, for the neural network models, the most influential hyperparameter was the learning rate, while for the tree-based models the two most significant hyperparameters were the subsampling ratio and the learning rate. Building upon these findings, we conducted a 3 × 3 grid search to refine the hyperparameter settings and identify locally optimal parameters, complemented by an additional small-scale grid search, informed by the Bayesian optimization results, to further fine-tune the configuration. Throughout training, an early stopping mechanism was employed to prevent overfitting and maintain the best generalization performance; specifically, all models were set to stop training at or before 100 epochs, establishing a training epoch limit of 100.
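Early stopping against the validation loss, capped at the 100-epoch limit, can be implemented generically as below; the patience value is an assumption, since only the epoch limit is specified in the text.

```python
def train_with_early_stopping(model, train_step, validate, max_epochs=100, patience=10):
    """Generic early-stopping loop: stop when the validation loss has not improved
    for `patience` epochs, never exceeding the 100-epoch limit.
    `train_step` runs one epoch of training; `validate` returns the validation loss."""
    best_loss, best_state, epochs_without_improvement = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_step(model)
        val_loss = validate(model)
        if val_loss < best_loss:
            best_loss, epochs_without_improvement = val_loss, 0
            # keep a copy of the best weights seen so far
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return model, best_loss
```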
Table A1 presents the hyperparameter optimization process, while Table 3 details the final training configurations for each model, highlighting the consistency of training parameters for the neural network models and the optimized settings for tree-based methods. This systematic approach to hyperparameter adjustment and optimization not only ensures comparability of model performance but also establishes a solid foundation for subsequent analysis and interpretation of results.
The PINN model utilized the same training hyperparameters as the ANN model to isolate the effect of the physics loss, ensuring that the optimization process could reach a comparable local optimum and enhancing the reliability of the comparison. The weighting parameter in the PINN loss is specifically designed to account for the different polarization settings based on the correlation between brightness temperature and sea surface salinity (SSS); it is set to be proportional to the magnitude of the brightness temperature values across the different polarizations. This simplification ensures that the model effectively captures the relationship between these parameters, enhancing its ability to accurately predict outputs under varying polarization conditions.
2.2.4. Evaluation Metrics
The evaluation of satellite salinity data primarily uses several statistical indicators, including bias, Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and coefficient of determination (R²).
Bias represents the systematic error between predicted data and in situ measurements, indicating whether the predicted data demonstrates consistent overestimation or underestimation. RMSE measures the overall error magnitude between predicted and in situ data. Unlike bias, RMSE captures random errors present in the data. MAE offers a robust measure of average prediction error that is less sensitive to outliers compared to RMSE. R² signifies the proportion of variance in the in situ data explained by the predictions, with values closer to 1 indicating improved model performance.
For regional comparative analysis, we developed an improvement metric to quantify changes in the predictive capability of models across different grid points.
The methodology for calculating bias improvement, illustrated in the sketch after this list, includes:
- (1) spatial aggregation: organizing all prediction points within each 0.5° grid cell for the year 2022;
- (2) cumulative bias computation: computing the cumulative bias of the original SMAP L2C product and of the algorithm predictions within each grid cell;
- (3) improvement calculation: comparing the two cumulative bias values to yield a score ranging from −1 to +1, where positive values indicate successful bias reduction and negative values represent performance degradation relative to the original satellite bias correction.
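A sketch of this grid-cell computation is given below. Because the exact formula is not reproduced here, the improvement score is written as the normalized difference of the two cumulative absolute biases, an assumption consistent with the stated −1 to +1 range; the column names are placeholders.

```python
import numpy as np
import pandas as pd

def bias_improvement(df: pd.DataFrame) -> pd.Series:
    """Per-grid-cell improvement score for the 2022 predictions.

    Assumes placeholder columns: 'lat', 'lon', 'bias_smap' (original SMAP L2C minus
    in situ) and 'bias_pred' (algorithm prediction minus in situ). Positive scores
    mean the algorithm reduced the cumulative bias within the cell.
    """
    cells = df.assign(
        lat_bin=np.floor(df["lat"] / 0.5) * 0.5,   # 0.5 degree grid cells
        lon_bin=np.floor(df["lon"] / 0.5) * 0.5,
        abs_bias_smap=df["bias_smap"].abs(),
        abs_bias_pred=df["bias_pred"].abs(),
    )
    agg = cells.groupby(["lat_bin", "lon_bin"])[["abs_bias_smap", "abs_bias_pred"]].sum()
    return (agg["abs_bias_smap"] - agg["abs_bias_pred"]) / (
        agg["abs_bias_smap"] + agg["abs_bias_pred"]
    )
```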
Table 4 presents the formulas for these evaluation metrics, where N represents the total number of samples, the predicted and in situ measured salinity values refer to the i-th sample, and the mean value is taken over all in situ measurements.
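For completeness, the four statistics listed in Table 4 can be computed as in the short NumPy sketch below, where `pred` and `obs` stand for the predicted and in situ salinity arrays.

```python
import numpy as np

def evaluation_metrics(pred: np.ndarray, obs: np.ndarray) -> dict:
    """Bias, RMSE, MAE, and R^2 between predicted and in situ salinity."""
    residual = pred - obs
    bias = residual.mean()                                    # systematic error
    rmse = np.sqrt(np.mean(residual ** 2))                    # overall error magnitude
    mae = np.mean(np.abs(residual))                           # outlier-robust average error
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((obs - obs.mean()) ** 2)  # explained variance
    return {"bias": bias, "rmse": rmse, "mae": mae, "r2": r2}
```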