1. Introduction
Copernicus, the Earth observation component of the European Union’s space program, plays a vital role in monitoring the planet and its environment for the benefit of European citizens [1]. As part of this initiative, the Sentinel-1 (S1) mission, operated by the European Space Agency (ESA), deploys satellites equipped with Synthetic Aperture Radar (SAR). These satellites operate in the C-band (central frequency 5.405 GHz) and support multiple acquisition modes. In particular, the Wave (WV) mode mainly uses vertical–vertical (VV) polarization to observe open ocean conditions [2]. Sentinel-1A was launched on 3 April 2014, followed by Sentinel-1B on 25 April 2016. Both satellites shared the same orbital plane with a 180° phase difference. Sentinel-1B stopped transmitting data on 23 December 2021, was officially decommissioned in 2023, and was replaced by Sentinel-1C, launched on 5 December 2024.
Data acquired in WV mode are routinely processed into Level-2 Ocean (L2 OCN) products, which include directional ocean wave spectra generated using a quasi-linear inversion method [3,4]. This inversion provides valuable information on the distribution of wave energy across frequency and direction. SAR-based inversion is mainly effective for long ocean swells (wavelengths greater than 200 m) [5,6,7,8] and for wave propagation in the marginal ice zone [9,10].
Although integral parameters derived from SAR spectra can effectively characterize single-swell systems, they become less informative when multiple, overlapping wave systems are present (Figure 1). In such cases, partitioning the ocean wave spectrum into distinct wave systems is essential for accurate analysis and quality control. Within the L2 OCN product, the wave spectrum is divided into up to five separate partitions [3] using the “watershed” algorithm, originally introduced by [11,12]. Each partition is described by its corresponding integral parameters, including partition effective significant wave height (Hs), peak wavelength, and peak wave direction (Figure 1).
A direct comparison between wave partitions derived from SAR observations and those produced by the WAVEWATCH III (WW3) model [13] reveals notable discrepancies, particularly between the dominant and secondary swell systems. These differences are not random; instead, they exhibit systematic spatial patterns, with positive clustering of partitioned wavelengths observed along the azimuth direction and negative clustering along the range direction [14,15,16].
Such discrepancies underscore the broader challenge of accurately retrieving ocean wave spectra from SAR data. At the core of this challenge lies the modeling of the Modulation Transfer Function (MTF), which governs the transformation from the observed SAR image to the true ocean wave field. Most existing retrieval methods rely on quasi-linear approximations to represent this mapping. However, these approaches remain affected by significant non-linearities that are not yet fully taken into account, limiting the fidelity of the retrieved wave parameters.
One of the primary sources of distortion is the Doppler shift induced by the radial component of the wave orbital motion, which modulates the SAR backscatter signal. As described in [
6], this results in either constructive or destructive velocity bunching effects: the well-organized orbital motion of the longest waves of the wave spectrum (swell) leads to deterministic misregistration in the azimuth direction and an apparent constructive redistribution of the backscatter intensity along the azimuth direction; on the other hand, the shortest waves lead to random misregistration in the azimuth direction, leading to possible significant degradation in the azimuth resolution and to distortions of the resulting SAR ocean image spectra (non-linear relationship between SAR image spectra and ocean wave spectra).
Destructive effects prevent one from directly recovering the sea surface profile and individual waves, but, indirectly, methods have been developed to retrieve information on the wave spectra, including the shortest part of the wave spectra (e.g., total Hs, as discussed by [17]). The constructive effects distort the apparent wave spectrum by shifting energy away from its true frequency and direction, resulting in biased estimates of wave height and other integral parameters if left uncorrected.
Compared to the WW3 model, ref. [
18] reports Root Mean Square Errors (RMSEs) of 0.70 m for significant wave height (Hs), 0.9 s for mean wave period, and 30° for wave direction, highlighting the limitations of SAR-based wave spectrum retrieval in accurately capturing key oceanographic parameters.
To address these limitations, recent studies have explored machine learning techniques as post-processing tools. Rather than directly resolving the non-linearities inherent in SAR imaging, these approaches aim to correct integral parameters, such as total Hs, as discussed by [
17], and reduce uncertainties associated with MTF-induced distortions.
Building on this concept, our proposed method also operates as a post-processing approach, but with a distinct focus: it seeks to qualify the retrieved SAR wave spectrum partitions by evaluating their reliability and consistency. This is particularly important in complex sea states where multiple wave systems coexist and interact.
To achieve this, we applied data-driven machine learning techniques to define a prior quality flag (QF) for each of the five wave partitions derived from Sentinel-1 WV OCN products. The QF is determined by learning partition error parameters, including the relative error in the effective significant wave height (Equation (1)), the relative error in the peak period (Equation (2)), and the absolute error in the peak wave direction (Equation (4)). This enables a more targeted and interpretable assessment of partition quality, which is essential to ensure the robustness of downstream oceanographic analyses and applications.
Building on previously defined error metrics, we implement the qualification framework using collocated SAR and WW3 partition pairs. Specifically, the effective significant wave height and peak period are expressed as relative errors (Equations (1) and (2)), while the peak wave direction is evaluated using an absolute error (Equation (4)). This distinction ensures that each parameter is assessed in a way that reflects its physical characteristics and variability.
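As a minimal sketch of these error definitions, the Python snippet below computes the three partition errors for one collocated SAR/WW3 pair. The function names, the use of the WW3 value as the normalization reference, and the wrapping of the direction difference to [0°, 180°] are assumptions made for illustration; the exact forms are those given by Equations (1), (2), and (4).

```python
def relative_error(sar_value, ww3_value):
    """Magnitude of the SAR-minus-WW3 difference, relative to WW3 (assumed reference)."""
    if ww3_value == 0:
        raise ValueError("WW3 reference value must be non-zero")
    return abs(sar_value - ww3_value) / abs(ww3_value)


def direction_error(sar_dir_deg, ww3_dir_deg):
    """Absolute angular difference in degrees, wrapped to [0, 180]."""
    diff = abs(sar_dir_deg - ww3_dir_deg) % 360.0
    return min(diff, 360.0 - diff)


# Hypothetical collocated partition pair (values are illustrative only).
sar = {"hs": 1.8, "tp": 13.0, "dir": 350.0}
ww3 = {"hs": 2.0, "tp": 12.5, "dir": 10.0}

errors = {
    "hs_rel": relative_error(sar["hs"], ww3["hs"]),
    "tp_rel": relative_error(sar["tp"], ww3["tp"]),
    "dir_abs": direction_error(sar["dir"], ww3["dir"]),
}
print(errors)
```

Wrapping the direction difference keeps two nearly aligned directions on either side of north (here 350° and 10°) from being scored as a 340° disagreement.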
Moreover, the use of relative errors for certain parameters helps achieve a more balanced distribution of values across the QF classes. This adjustment mitigates the disproportionate influence of higher values on error estimation, thereby reducing bias and enabling a fairer and more representative evaluation of partition quality.
The primary contributions of this research are threefold: (1) the application of machine learning algorithms to improve the qualification of the Sentinel-1 ocean wave spectrum partitioning; (2) an interpretability analysis of the model’s predictions in relation to the geophysical inversion process underlying the SAR data; and (3) the establishment of a framework to support future research aimed at improving the retrieval of the ocean wave spectrum from SAR observations.
This manuscript is structured as follows:
Section 2 presents the dataset utilized in this study.
Section 3 outlines the complete machine learning methodology.
Section 4 details the results and model validation.
Section 5 provides an in-depth discussion of the findings, and
Section 6 concludes this paper with perspectives for future research.
3. Method Details
3.1. Quality Flag Definition
The Normalized Radar Cross-Section (NRCS) is influenced by several error sources, including geometric distortions, sensor-specific acquisition errors, and the presence of nongeophysical factors such as atmospheric influences and surface roughness at the air–sea interface. Additionally, the quasi-linear inversion process employed to derive ocean wave spectra from SAR data introduces further inaccuracies. The partitioning algorithm, which divides the ocean wave spectrum into distinct wave systems, is also subject to its own set of errors. To address these complexities, the first critical step in our algorithmic approach is to independently quantify the errors in the effective significant wave height (Hs), peak period, and peak wave direction at the partition level with respect to the WW3 model data.
These errors are estimated using a supervised ML algorithm, which leverages a set of observables derived from SAR (that is, features) to predict discrepancies in partition wave parameters [
27]. The features used for the error estimation process are detailed in
Table 1. The calculation of some features is described in detail in [
3].
Regardless of the acquisition mode (WV1 or WV2), the same set of features is used to estimate the three primary errors. This requires the training of six separate models (three error targets for each of the two acquisition modes) tailored to the S1A mission, each aimed at improving the accuracy of partition-level error quantification and ultimately refining the ocean wave parameter retrieval process.
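The second step of the algorithm focuses on computing a combined error metric for each individual acquisition mode (i.e., WV1 and WV2). This combined error, denoted here as $\epsilon_{c}$ and defined in Equation (6), is calculated by multiplying the three primary error estimates, $\epsilon_{H_s}$ (effective significant wave height), $\epsilon_{T_p}$ (peak period), and $\epsilon_{\varphi_p}$ (peak wave direction), previously inferred using the trained models applied to the validation dataset, which consists of all S1A acquisitions from February 2025. The metric is expressed as follows:
$$\epsilon_{c} = \epsilon_{H_s} \times \epsilon_{T_p} \times \epsilon_{\varphi_p} \qquad (6)$$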
The combined error is not designed to preserve the individual error magnitudes, but rather to serve as a synthetic indicator of partition quality. In general, the combined error of artificial SAR partitions (spurious partitions introduced by the inversion) is much higher than that of real SAR partitions for which the associated WW3 partition shows different integral parameters of the same order of magnitude. Moreover, discrepancies between a SAR partition and a WW3-modeled partition often affect all three integral parameters simultaneously. For example, if a SAR partition is split into two while WW3 models a single wave partition, all three partition parameters are affected together.
Alternative tests using a weighted sum yielded similar results, but the multiplicative approach preserves the equal influence of each partition characteristic and provides a robust general formulation.
Since the ultimate goal is to assess the quality of ocean swell systems in a manner that is independent of the SAR acquisition mode, a unified combined error metric is produced by merging the mode-specific error estimates from both WV1 and WV2. This approach is motivated by the need for consistent and homogeneous error characterization across different SAR configurations. Since both WV1 and WV2 data are used interchangeably in operational wave monitoring and analysis, having a mode-agnostic quality metric ensures comparability between partitions retrieved under different acquisition conditions. By assembling the combined error from both modes, we enable a seamless interpretation of partition-level quality, facilitating the identification and classification of ocean swell systems without bias introduced by the acquisition geometry.
This reference error is then partitioned into five equally probable intervals using the quantile method with q = 5 (quintiles). These intervals represent different levels of error severity, ranging from low to high. Once this reference error distribution is established, the combined error calculated for each partition is matched to the corresponding quintile range. Based on this comparison, each partition is assigned a quality label: “very good,” “good,” “medium,” “low,” or “poor.” This classification serves as an indication of the reliability and accuracy of the partition, facilitating informed decision making in further analyses.
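The two steps described above (multiplying the three predicted errors, then binning the pooled WV1 and WV2 values into quintiles) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the input values are synthetic, and the "inclusive" quantile interpolation is one of several reasonable conventions.

```python
import statistics

LABELS = ["very good", "good", "medium", "low", "poor"]


def combined_error(hs_err, tp_err, dir_err):
    """Multiplicative combination of the three predicted partition errors."""
    return hs_err * tp_err * dir_err


def quintile_edges(reference_errors):
    """Four cut points splitting the reference errors into five equally probable bins."""
    return statistics.quantiles(reference_errors, n=5, method="inclusive")


def quality_label(error, edges):
    """Map a combined error to a label; the lowest quintile is 'very good'."""
    rank = sum(error > edge for edge in edges)
    return LABELS[rank]


# Synthetic reference distribution pooling WV1 and WV2 combined errors.
wv1 = [combined_error(0.05 * k, 0.02 * k, 2.0 * k) for k in range(1, 11)]
wv2 = [combined_error(0.04 * k, 0.03 * k, 3.0 * k) for k in range(1, 11)]
reference = wv1 + wv2
edges = quintile_edges(reference)

labels = [quality_label(e, edges) for e in reference]
print({name: labels.count(name) for name in LABELS})
```

Because the cut points come from the pooled distribution, the same thresholds apply to both modes, so the classes are balanced overall but not necessarily within each mode.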
3.2. Machine Learning Dataset
In the context of algorithm development, we focused on the S1 L2 OCN WV dataset, which was acquired over two months: December 2024 and January 2025. These data were assigned to the training phase, with an 80–20% split for training and validation, respectively. This approach ensures that the model is rigorously tested on unseen data during training. Data from February 2025 were used exclusively for model evaluation, providing an independent test set to assess the model’s generalizing ability.
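The data partitioning described above can be sketched in a few lines. The record layout, the `month` field, and the use of a seeded random shuffle for the 80–20% split are assumptions for illustration; the source does not state how the split was drawn.

```python
import random


def split_by_period(records, seed=0, train_fraction=0.8):
    """Return (train, validation, test): Dec 2024 to Jan 2025 is split, Feb 2025 is held out."""
    development = [r for r in records if r["month"] in ("2024-12", "2025-01")]
    test = [r for r in records if r["month"] == "2025-02"]

    rng = random.Random(seed)  # fixed seed for a reproducible shuffle
    shuffled = development[:]
    rng.shuffle(shuffled)

    n_train = int(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:], test


# Hypothetical records: one dict per collocated SAR/WW3 partition.
months = ["2024-12"] * 50 + ["2025-01"] * 50 + ["2025-02"] * 30
records = [{"id": i, "month": m} for i, m in enumerate(months)]

train, valid, test = split_by_period(records)
print(len(train), len(valid), len(test))
```

Holding out a full calendar month keeps the test set separated in time from the training data, which a purely random split would not guarantee.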
During the training process, no data normalization was performed, as the chosen machine learning algorithm is inherently robust to raw, unnormalized data, facilitating direct input handling without additional pre-processing steps and data transformation.
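Our ML approach relies on previously defined error metrics. Relative errors are used for the effective significant wave height and the peak period to allow the model to take account of the scale of these parameters and to ensure a balanced representation across the QF classes. In contrast, the peak wave direction is evaluated using an absolute error, which is more appropriate given its angular nature and bounded range. Consequently, ML targets are defined as the magnitudes of the relative error in effective significant wave height, the relative error in peak period, and the absolute error in peak wave direction.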
This design focuses on the magnitude of the discrepancy between SAR and WW3, rather than its sign, and avoids depending on the absolute accuracy of the WW3 model, which may be limited in capturing certain geophysical conditions.
3.3. Machine Learning Modeling
In this study, we initially used a traditional Random Forest algorithm; however, it soon revealed limitations in terms of interpretability. Consequently, we shifted our focus to the eXtreme Gradient Boosting (XGBoost) supervised learning algorithm, which offered improved performance and explainability. XGBoost is a scalable distributed machine learning library based on gradient-boosted decision trees (GBDTs). A decision tree is a model that makes decisions by splitting data into branches based on simple conditions, forming a set of “true or false” statements. Since its debut in 2014, XGBoost has become one of the most widely adopted algorithms among data scientists and machine learning practitioners due to its high performance, particularly for structured data problems. The library is open-source [
28] and is able to efficiently train and test models on large datasets.
One of the primary reasons for selecting XGBoost in our study is its regularization capabilities. Regularization helps to control overfitting, where the model fits too closely to the training data, thereby reducing its ability to generalize to unseen data. XGBoost applies L1 and L2 penalties to the leaf weights of each tree, helping to maintain model simplicity and prevent overfitting. Additionally, XGBoost is highly optimized, with features that make it more memory-efficient, such as cache awareness, which is crucial when working with large datasets.
XGBoost also stands out for its ability to handle missing data, eliminating the need for imputation, and its ability to work with data in their raw, unnormalized state, simplifying the data processing workflow. These advantages make XGBoost an attractive choice for dealing with complex data problems.
To fine-tune the XGBoost model and achieve optimal performance, hyperparameter tuning was performed. Hyperparameters are critical settings that influence how the model learns, and selecting the right ones can significantly improve model accuracy. Given the large number of hyperparameters, finding the best combination manually is impractical. Therefore, we used the GridSearch technique [
29] to systematically explore different combinations of hyperparameters. In this process, multiple sets of hyperparameters are tested within a defined search space, and the performance of the model is evaluated on a validation dataset. Although this method is effective, it can be computationally expensive, particularly as the number of parameters and their possible values increases. A list of the hyperparameters most commonly tuned in XGBoost is provided in
Table 2, with additional parameters available in [
28].
Machine learning models must exhibit robustness, meaning that they should minimize the impact of outliers and prioritize the influence of typical data points. In tasks such as parameter estimation, using a robust loss function (e.g., absolute error) is often preferred over a non-robust one (e.g., squared error) because it is less sensitive to large deviations. Common loss functions in regression tasks are the squared loss $(y - \hat{y})^{2}$ and the absolute loss $|y - \hat{y}|$, where $y$ is the true value and $\hat{y}$ is the prediction. Although squared loss is highly sensitive to outliers, making it less reliable in such cases, absolute loss is more resilient because it focuses on the order of the data rather than their absolute magnitude [30].
To take advantage of both, the pseudo-Huber [31,32,33] loss function is often used. This hybrid function blends the properties of both quadratic and absolute loss while maintaining the smooth differentiability required for optimization. It is mathematically expressed as follows:
$$L_{\delta}(x) = \delta^{2}\left(\sqrt{1 + \left(\frac{x}{\delta}\right)^{2}} - 1\right)$$
where $x$ is the residual between the true and predicted values and $\delta$ is the threshold parameter that controls the transition between the quadratic and linear behavior of the loss.
For small values of $x$, the pseudo-Huber loss behaves like a quadratic function,
$$L_{\delta}(x) \approx \frac{x^{2}}{2},$$
while for large values of $x$, it approximates the absolute loss,
$$L_{\delta}(x) \approx \delta\,|x|.$$
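The two limiting regimes can be checked numerically. The sketch below assumes the common pseudo-Huber form $\delta^{2}(\sqrt{1 + (x/\delta)^{2}} - 1)$ and also gives its first and second derivatives with respect to the residual, which are the quantities a second-order boosting method needs; the function names are illustrative.

```python
import math


def pseudo_huber(x, delta=1.0):
    """Pseudo-Huber loss for residual x and threshold delta."""
    return delta ** 2 * (math.sqrt(1.0 + (x / delta) ** 2) - 1.0)


def pseudo_huber_grad(x, delta=1.0):
    """First derivative with respect to the residual; bounded by +/- delta."""
    return x / math.sqrt(1.0 + (x / delta) ** 2)


def pseudo_huber_hess(x, delta=1.0):
    """Second derivative; strictly positive for any finite residual."""
    return (1.0 + (x / delta) ** 2) ** -1.5


# Compare the loss with its quadratic and linear approximations.
for x in (0.01, 0.1, 10.0, 100.0):
    print(x, pseudo_huber(x), 0.5 * x ** 2, abs(x))
```

The bounded gradient is what limits the influence of outliers: a very large residual contributes a gradient close to $\delta$ rather than one proportional to the residual itself.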
In XGBoost, the model is optimized using gradient-based methods to minimize the chosen loss function while incorporating regularization to prevent overfitting. The objective function of the model combines the loss function and a regularization term, as shown in Equation (8):
$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \qquad (8)$$
Here, $l$ represents the loss function (for example, pseudo-Huber), $y_i$ is the target for sample $i$, $\hat{y}_i^{(t-1)}$ is the prediction accumulated over the first $t-1$ trees, $f_t(x_i)$ denotes the prediction from the $t$-th tree, and $\Omega(f_t)$ is the regularization term that helps avoid overfitting by penalizing overly complex models. At each boosting iteration, a new tree is added to reduce this objective. Additional details on regularization can be found in [27].
4. Results
4.1. Metric Definition
Model training was performed using an NVIDIA RTX A2000 graphics card with 8 GB of memory. Due to XGBoost’s ability to leverage parallel processing, the grid search and model training time for each model was approximately three hours. The performance of XGBoost models in predicting errors in the partitioning parameters and controlling the final partitioning labeling was assessed using several evaluation metrics. These metrics include Root Mean Squared Error (RMSE), Normalized Root Mean Squared Error (NRMSE), standard deviation (STD), Median Absolute Error (MAE), Scatter Index (SI), coefficient of determination ($R^{2}$), and explained variance score (EVS).
The RMSE is a widely used metric that measures the square root of the average squared differences between predicted and observed values. It is sensitive to large errors, thus providing a strong indication of how far the predictions are from the actual values. The RMSE is defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^{2}}$$
where $\hat{y}_i$ is the predicted value obtained with the model, $y_i$ is the true value, and $N$ is the number of samples.
The NRMSE is a normalized version of the RMSE that scales the error by the range or the mean of the observed values. This allows for a comparison of error across different datasets with different ranges or units. It is calculated as follows:
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\nu}$$
where $\nu$ is the normalization factor (the range or the mean of the observed values).
The STD measures the amount of variation or dispersion of a set of values. In the context of model evaluation, it is useful for understanding the variability in the errors and is given by
$$\mathrm{STD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(e_i - \bar{e}\right)^{2}}$$
where $e_i = \hat{y}_i - y_i$ is the error of sample $i$ and $\bar{e}$ is the mean of $e_i$.
The MAE measures the median of the absolute differences between the predicted and true values, making it robust to outliers, since a few extreme errors do not shift the median. The MAE is defined as follows:
$$\mathrm{MAE} = \mathrm{median}\left(\left|\hat{y}_i - y_i\right|\right)$$
SI is a metric used to assess the consistency of the model predictions by calculating the ratio of the RMSE to the mean of the observed values. It provides insight into the relative error of the predictions and is given by:
$$\mathrm{SI} = \frac{\mathrm{RMSE}}{\bar{y}}$$
where $\bar{y}$ is the mean of the observed values.
$R^{2}$ is a statistical measure that indicates how well the predicted values match the actual values. It measures the proportion of variance in the dependent variable that is predictable from the independent variables. An $R^{2}$ value of 1 means perfect predictions, while a value of 0 indicates that the model performs no better than predicting the mean of the actual values. $R^{2}$ is computed as follows:
$$R^{2} = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^{2}}$$
The EVS is similar to $R^{2}$ but with the key difference that it does not penalize for systematic offsets in the predictions. It measures the proportion of variance in the target variable that is explained by the model and is defined as follows:
$$\mathrm{EVS} = 1 - \frac{\mathrm{Var}\left(y - \hat{y}\right)}{\mathrm{Var}\left(y\right)}$$
where $\mathrm{Var}(y)$ is the variance of the true values, and $\mathrm{Var}(y - \hat{y})$ is the variance of the residuals.
For a more detailed description, the reader is referred to [
34].
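As a compact illustration of how these metrics relate to one another, the sketch below computes all seven from paired true and predicted values using only the Python standard library. Two choices are assumptions rather than statements from the source: the NRMSE is normalized by the range of the true values, and the STD is taken as the population standard deviation of the residuals.

```python
import math
import statistics


def evaluation_metrics(y_true, y_pred):
    """Compute the evaluation metrics for paired true and predicted values."""
    n = len(y_true)
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = statistics.fmean(y_true)

    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)

    return {
        "RMSE": rmse,
        # Range normalization is an assumption; the mean could be used instead.
        "NRMSE": rmse / (max(y_true) - min(y_true)),
        "STD": statistics.pstdev(residuals),
        "MAE": statistics.median(abs(r) for r in residuals),
        "SI": rmse / mean_true,
        "R2": 1.0 - ss_res / ss_tot,
        "EVS": 1.0 - statistics.pvariance(residuals) / statistics.pvariance(y_true),
    }


# Synthetic example: predictions carry a constant +0.5 offset.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
metrics = evaluation_metrics(y_true, y_pred)
print(metrics)
```

With a constant offset, the residual variance is zero, so the EVS stays at 1 while $R^{2}$ drops below 1, which is the difference between the two scores noted above.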
4.2. Overall Model Performances
The results for the two acquisition modes, classified by partition QF, are derived from the validation dataset using the partition effective significant wave height, peak wavelength, and peak wave direction with respect to WW3 and are presented in
Table 3,
Table 4 and
Table 5, respectively. This analysis provides a clear view of how the model distinguishes between good and bad partitions.
Performance metrics for each class, evaluated based on partition parameters and compared to the WW3 model, reveal insightful trends. For each parameter, the model demonstrates strong initial agreement with WW3. Specifically, for the “very good” class, the agreement is excellent, with an $R^{2}$ value of 0.77 (overall mean). However, as the class quality decreases towards “poor,” the agreement progressively weakens. In the “poor” class, the distributions of SAR and WW3 diverge significantly, reflecting a clear mismatch between the two. This trend holds true for almost the entire range of partition wave parameters.
For the subsequent analysis, we will focus exclusively on the extreme classes—“very good” and “poor”—along with the “medium” class, as these categories represent the most relevant distinctions in partition quality for our purposes.
To showcase the robustness of the methodology in partition classification, from high-quality (very good) to low-quality (poor) classes for both WV1 and WV2 across partition parameter ranges, scatter plots are presented in Figure 2 for the integral parameter Hs. In addition to the clear class separation, the plots reveal a consistent trend of increasing error as partition quality declines, with a notable mismatch in the poor class. This mismatch underscores the model’s ability to effectively capture variations in error distribution, reinforcing the reliability of the classification approach. These findings highlight the strength of the model in preserving accuracy across diverse data quality levels, even under challenging conditions. Further partition parameters, including peak wavelength and peak wave direction, are detailed in Appendix A and Appendix B.
4.3. Focused Analysis of Partition Classification Performance
The class distribution depicted in
Figure 3 is largely uniform across all partitions, indicating an overall balanced classification with no evident bias toward any particular class.
Minor deviations are observed in the medium and low classes between ascending and descending passes. These differences likely reflect the aggregation of error metrics across both modes, which alters the percentile thresholds relative to evaluating each mode independently, rather than indicating systematic discrepancies.
A detailed analysis of
Figure 4 highlights how the algorithm assigns quality classifications across wave partitions. The first partition, typically representing the dominant swell, is consistently identified as the most significant and is predominantly classified as “very good” or “good.” This outcome aligns with expectations, as the algorithm is designed to prioritize the most geophysically relevant wave information extracted from the SAR data.
The second partition, often associated with a secondary swell, shows a more evenly distributed classification between quality levels. This trend is particularly noticeable in the WV1 data and appears to be largely unaffected by the heading of the satellite. In contrast, WV2 displays a slightly different pattern, which may be due to the uncertainty introduced by merging classification thresholds across acquisition modes.
Lower-ranked partitions tend to be classified as “low” or “poor,” reflecting their reduced geophysical relevance. This clear stratification in partition quality helps end users focus on the most significant components of the sea state when interpreting SAR wave data.
The mapping of the spatial distribution of classes (“very good,” “medium,” and “poor”) across WV1 and WV2 modes (
Figure 5) reveals the performance of the classification in different regions. The distributions are largely consistent between the two modes, suggesting that the observed patterns are primarily dictated by the algorithm rather than by differences in the acquisition mode.
Some regional variations can be attributed to differences in WV acquisition coverage. For example, coverage over Europe is relatively sparse due to the use of alternative Sentinel-1 acquisition modes: Interferometric Wide Swath (IW) is mainly employed for mainland land monitoring, while Extra-Wide Swath (EW) is more frequently used in regions such as the Azores. Consequently, fewer WV observations are available for quality assessment in these areas.
A more detailed examination of the spatial distribution highlights certain regional tendencies.
Very Good: This class tends to be more frequent at mid-latitudes. In contrast, it is less commonly observed in the northern Indian Ocean and in coastal regions, where environmental factors such as monsoon activity, coastal topography, and proximity to land can affect the quality of wave retrievals.
Medium: This class is relatively evenly distributed across the globe, with an increased presence in transition zones near the equator and subpolar regions. These areas are characterized by more variable conditions that often result in intermediate-quality inversions.
Poor: This class is more frequently observed in high-latitude regions, where strong and variable winds, complex atmospheric dynamics, and environmental variability can affect both SAR retrievals and the performance of the WW3 model. Importantly, this classification does not exclusively reflect the limitations of SAR. It captures the general inconsistency between SAR and WW3 outputs, which may stem from either dataset or from geophysical conditions that influence both. The aim is to characterize the reliability of the partition based on observed disagreement, rather than attributing the error to a single source.
5. Discussion
Retrieving ocean wave parameters from SAR data is inherently complex due to the limitations of the radar system and environmental variability. The MTF plays a central role in translating SAR cross-spectrum information into wave spectra, but factors such as wave motion, whether moving toward or away from the radar, can distort this process (
Figure 6). Some inconsistencies can thus be directly attributed to the SAR inversion process. This is illustrated in
Figure 6, where the red partition represents an artificial azimuth-oriented feature caused by underestimation of the azimuthal MTF, and the yellow partition reflects a comparable range-oriented artifact. The errors associated with such artificial partitions are substantially larger than those of genuine SAR-derived partitions, for which the corresponding WW3 partitions differ in integral parameters but remain of the same order of magnitude.
In addition to MTF-related challenges, environmental phenomena such as rainfall and other atmospheric or oceanic characteristics further impact NRCS, affecting wave retrieval. Rainfall (
Figure 7) can influence the NRCS by altering surface roughness, potentially masking the geophysical signal. Oceanic fronts, characterized by sharp gradients in sea surface temperature and salinity, and atmospheric fronts (
Figure 8), marked by strong near-surface wind variations, also introduce discontinuities in the NRCS, further complicating the retrieval of the wave spectrum.
To go deeper into model explanation and interpretability, an illustrative example of model interpretation is provided using SHAP-based explanations [35] of the models that estimate errors in the three partition-integral parameters for the WV1 acquisition (see Figure 9, Figure 10 and Figure 11). The explanation for the WV2 acquisition is essentially the same and is therefore not shown. Each figure contains two plots. The first is a model explanation that shows the importance of each feature (Table 1) and how it affects the model’s predictions. The second, also based on feature importance, introduces feature clustering to help visualize redundancy among features. This clustering groups features that provide similar information to the model, meaning that the model could use either of two related features and achieve similar predictive performance. Traditional methods such as correlation matrices can also identify such relationships. In SHAP’s feature clustering, the “distance” between features is typically scaled between 0 and 1, where a distance of 0 means the features are perfectly redundant and a distance of 1 means they are completely independent. In our case, we set the clustering cut-off at 0.5, so the bar plot only groups features whose clustering distance is less than 0.5. Only highly redundant features (those sharing more than 50% of their explanation power) are therefore visually grouped together; less redundant features, even if they have some relationship, are displayed individually.
The main predictors for the estimation of the Hs error (as depicted in Figure 9a) are the effective partition Hs and the normalized variance of the SLC imagette. These features are physically meaningful, as they directly relate to wave energy and sea surface texture, and their influence is clearly reflected in the behavior of the model. Specifically, the model tends to associate higher effective Hs values with a lower predicted error, leading to a higher quality classification for those partitions. In contrast, lower Hs values are typically associated with higher predicted errors and lower quality assessments. This indicates that the model prioritizes energetic sea states, while treating weaker partitions more conservatively.
A further insight comes from the feature clustering (Figure 9b): statistical moments derived from the SLC imagette (namely skewness, kurtosis, and normalized variance) form a single cluster, indicating that they share more than 50% of their explanation power. Another correlated group includes peak wave parameters such as peak direction and peak period. Most other features appear relatively independent in their contribution to predicting the Hs error, suggesting that the model relies on a focused subset of physically meaningful indicators to make its predictions.
In the context of the estimation of the peak period error (Figure 10a), the predominant determinant of the model’s prediction is the energy ratio between the spectral peak of the partition and the maximum energy boundary. A decreased energy ratio corresponds to an increase in relative error, underscoring its critical influence on the model’s uncertainty quantification. In contrast, elevated energy ratios are associated with reduced error magnitudes, indicating enhanced precision in the peak period retrieval. A smaller energy ratio makes the partitioning process more sensitive and reduces the ability to cross-assign the right partitions between the WW3 model and the SAR: typically, a swell system seen by the model as a single energy peak may appear as a double peak in the corresponding SAR spectrum.
Consistent with the objective of the model, the peak period itself constitutes the second most significant feature that impacts error prediction. Its influence manifests non-linearly and is modulated by the swell propagation direction relative to the SAR azimuth, as well as hydrodynamic effects and tilt modulation intrinsic to the SAR wave retrieval mechanism.
Notably, while the partition index has a limited influence on the estimation of the Hs error, the model leverages the partition rank (i.e., the “p” feature) more substantially when predicting the peak period error. This suggests that the model attempts to refine its error estimation by accounting for the sequential ordering of swell systems, which aligns with the earlier observations regarding the distribution of quality classes across partition ranks. This strategy reflects the varying relevance and complexity of each partition for an accurate retrieval of peak period information.
Furthermore, feature redundancy follows the same pattern as for the Hs error (Figure 10b): the SLC imagette statistics are largely redundant, so the model groups them together. This redundancy extends to some other peak partition parameters computed in different geometries.
Analysis of the error estimate for the partition wave direction (Figure 11a) reveals a pronounced sensitivity to the alignment between the dominant wave direction and the satellite flight direction. This confirms the fundamental influence of the wave propagation direction relative to the SAR flight azimuth, which imposes intrinsic constraints on the inversion accuracy, as previously described. The high importance of the ambiguity factor, ranked third among the predictors, further highlights its critical role in modulating directional uncertainty and distinguishes this error metric from the others.
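The alignment discussed here can be expressed as the angle between the wave propagation axis and the satellite track. The sketch below is one plausible way to compute it; the function name `angle_to_azimuth`, the compass-degree convention, and the heading value of 348° are assumptions for illustration, and the actual feature used by the model may be defined differently.

```python
def angle_to_azimuth(wave_dir_deg, heading_deg):
    """Smallest angle (0 to 90 degrees) between the wave propagation axis
    and the satellite track: 0 = azimuth-travelling, 90 = range-travelling.

    Both inputs are compass directions in degrees. The result deliberately
    ignores the 180-degree propagation ambiguity, so it measures the angle
    between two axes rather than between two directed vectors.
    """
    diff = (wave_dir_deg - heading_deg) % 180.0  # fold the 180-degree ambiguity
    return min(diff, 180.0 - diff)


heading = 348.0  # illustrative satellite heading, not taken from the study
for wave_dir in (350.0, 170.0, 30.0, 78.0):
    print(wave_dir, angle_to_azimuth(wave_dir, heading))
```

Waves at 350° and 170° both give 2°, since they lie on the same near-azimuth axis and differ only by the 180° ambiguity, while 78° gives 90°, the range-travelling case.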
Consistent with patterns observed in the estimation of other wave parameters, the energy ratio once again proves to be a principal driver, ranking as the second most important feature. The peak wave direction also contributes significantly, occupying the sixth position in the feature importance hierarchy, emphasizing its relevance in capturing directional variability.
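Feature rankings of this kind are commonly obtained from SHAP values by averaging their absolute value over all samples. The sketch below shows that aggregation on a made-up matrix; the feature names and numbers are hypothetical and chosen only so that the resulting order mirrors the one described in the text.

```python
def rank_features(shap_values, names):
    """Rank features by mean absolute SHAP value, largest first.

    shap_values: list of rows, one per sample, one column per feature.
    names:       feature names in column order.
    """
    n = len(shap_values)
    importance = [
        sum(abs(row[j]) for row in shap_values) / n for j in range(len(names))
    ]
    order = sorted(range(len(names)), key=lambda j: importance[j], reverse=True)
    return [(names[j], importance[j]) for j in order]


# Hypothetical SHAP values for three samples and three features.
names = ["energy_ratio", "ambiguity_factor", "track_alignment"]
shap_values = [
    [-0.5, 0.25, 1.0],
    [0.75, -0.25, -0.5],
    [-0.25, 0.25, 1.5],
]
ranking = rank_features(shap_values, names)
print(ranking)
```

Because the sign is discarded, this ranking says how strongly a feature moves the predicted error, not in which direction. The direction is read from the per-sample values instead.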
Apart from the correlated statistical features derived from the SLC imagette (Figure 11b), the model does not identify any additional significant group of features influencing the direction error. This suggests that directional error arises from a more discrete set of physical and retrieval factors, underscoring the complex interplay between wave dynamics and SAR observation geometry. These insights pave the way for targeted improvements in SAR wave inversion methodologies with the potential to enhance the fidelity of directional retrieval under challenging environmental conditions.
6. Conclusions
This study proposes a novel methodology to improve the reliability of SAR-based ocean wave retrieval by introducing an advanced partition classification algorithm. Through the integration of machine learning and explainability techniques, the approach effectively mitigates persistent challenges in the interpretation of wave spectra, particularly those associated with system-induced distortions such as azimuth cut-off effects and MTF limitations.
Looking forward, the proposed QF classification framework holds promise in enabling a more systematic identification of error sources, especially those that recur under specific SAR observation conditions. Furthermore, it may serve as a foundation for targeted advancements in wave inversion methodologies, including the refinement of MTF models and the development of sophisticated filtering techniques to suppress non-wave signals.
What sets this work apart is not just the classification accuracy, but the intelligent selection of physically meaningful features that align with the behavior of ocean waves. This synergy between data-driven modeling and geophysical understanding enables a deeper insight into SAR performance under varying sea conditions. The model’s capacity to isolate the most reliable partitions creates opportunities for more targeted and confident use of SAR data in both scientific research and operational settings.
Beyond technical contributions, the integration of interpretability adds a valuable layer of transparency. Rather than treating the algorithm as a black box, the use of SHAP-based analysis illuminates the factors influencing each decision, allowing users to validate outcomes and potentially refine the input data or retrieval strategies.
In a broader sense, this work contributes to a growing effort in Earth observation to make remote sensing tools not only more accurate but also more accountable and accessible. The proposed method offers a foundation upon which future SAR-based systems can be built: systems that are capable of adapting to complex marine environments while providing clear, interpretable results. As ocean monitoring becomes increasingly vital in the context of climate change and maritime activity, such tools will be essential for both research and real-world applications.