Fusion of In-Situ and Modelled Marine Data for Enhanced Coastal Dynamics Prediction Along the Western Black Sea Coast
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The present work deals with the fusion of in-situ and modelled marine data for enhanced coastal dynamics prediction along the Western Black Sea coast. The topic is interesting, relevant to the field, and the work is well-written. In my opinion, the paper can be published subject to the following minor modifications:
Please delete ref [1] from the abstract. The abstract should not include references.
Please indicate what AI and ML means in the abstract.
Line 18, this is section 1. “1. Introduction”. Please check all the section numerations.
Line 8, please change “Makarynskyy O. [8]” to “Makarynskyy [8]”.
Please justify why wind and wave parameters are enough and speculate about the possibility to include more parameters in your model.
Line 163, please change “p < 0.05, As all” to “p < 0.05, as all”.
Figures 3 and 4 are too interesting to analyze the data but a deeper analysis about the data should be provided.
Please provide more data about the methodology. The training, validation, and test processes are not clear.
There is no conclusion section and this is too important. I recommend to split the section about discussion into “Results and discussion” and “Conclusions” since the discussion section includes several conclusions of the work.
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions/corrections highlighted in the re-submitted files.
Comments 1: [Please delete ref [1] from the abstract. The abstract should not include references.]
Response 1: Thank you for pointing this out. We agree with this comment. Therefore, we have deleted the reference from the abstract. The sentence now reads: “We integrate in-situ observations from five meteo-oceanographic stations with modelled geospatial marine data from the Copernicus Marine Service.” Highlighting is not supported in the abstract because we are using the LaTeX template.
Comments 2: [Please indicate what AI and ML means in the abstract.]
Response 2: Agreed. We have, accordingly, expanded the abbreviations in the abstract. The paragraph now reads: “This research provides a deeper understanding of atmosphere-marine interactions and demonstrates the efficacy of Artificial intelligence (AI) / Machine Learning (ML) in bridging observational and modelled data gaps for informed coastal zone management decisions, essential for maritime safety and coastal management along the Western Black Sea coast.”
Comments 3: [Line 18, this is section 1. “1. Introduction”. Please check all the section numerations.]
Response 3: Thank you for pointing this out. We agree with this comment. Therefore, we have modified the numbering of the sections accordingly.
Comments 4: [Line 8, please change “Makarynskyy O. [8]” to “Makarynskyy [8]”.]
Response 4: Thank you. We agree and have removed the initial accordingly.
Comments 5: [Please justify why wind and wave parameters are enough and speculate about the possibility to include more parameters in your model.]
Response 5: Thank you for pointing this out. In this study, the focus was placed on wind and wave parameters due to their direct relevance and significant impact on coastal dynamics. These parameters serve as fundamental inputs for understanding and predicting coastal processes. However, it is acknowledged that additional parameters could be incorporated into the model to potentially enhance its accuracy and provide a more comprehensive understanding of coastal dynamics. These parameters may include: Sea surface temperature as this parameter can influence water density, affecting currents and stratification patterns; Salinity - Similar to temperature, salinity variations can impact water density and circulation patterns; Atmospheric pressure: Changes in atmospheric pressure can contribute to sea level fluctuations and influence weather patterns that affect coastal dynamics; Bathymetry: The depth and shape of the seabed can influence wave propagation and current patterns. Incorporating these parameters into the model could potentially improve its ability to capture the complex interactions that drive coastal dynamics. However, it is important to consider the availability and quality of data for these additional parameters, as well as the potential increase in model complexity and computational demands.
Comments 6: [Line 163, please change “p < 0.05, As all” to “p < 0.05, as all”.]
Response 6: Thank you. We agree and have corrected the text accordingly.
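Comments 7: [Figures 3 and 4 are too interesting to analyze the data but a deeper analysis about the data should be provided.]
Response 7: Thank you for pointing this out. We agree with this comment. Therefore, to address the reviewer's comment regarding Figures 3 and 4, the following analysis is provided and included in the paper: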
“Figure 3, Figure 4a and Figure 4b present a visual representation of wind and wave data, showcasing intriguing patterns and correlations. Both wind and wave data exhibit a clustering of points around the origin, indicating a prevalence of lower magnitudes.
Both Figures 4a and 4b show a high concentration of data points around the origin (0, 0) and exhibit an elliptical shape, suggesting a potential correlation between the U and V components. However, the wind speed distribution in Figure 4a appears more dispersed than the wave component distribution in Figure 4b, indicating a wider range of magnitudes and directions. This observation suggests that wind patterns exhibit greater variability than wave patterns, potentially reflecting the influence of atmospheric conditions on wind dynamics.
Additionally, as is pointed out in \cite{Nedelcu2023}, the wind and wave data exhibit similar trends throughout the year. The winter months show higher wind speeds and wave heights, while the summer months show lower values. This seasonal variation is consistent with the general understanding of the Black Sea's climate and its influence on wind and wave patterns \cite{Nedelcu2023}.”
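As a reading aid for the U and V scatter plots discussed in this response, the following is a minimal sketch of how a pair of components maps to a magnitude and a direction. It assumes the usual meteorological convention (U positive eastward, V positive northward, direction given as the bearing the wind blows from). The function name and the sample values are illustrative and are not taken from the manuscript.

```python
import math


def speed_and_direction(u: float, v: float) -> tuple[float, float]:
    """Return (speed, direction_from_deg) for eastward u and northward v.

    The direction follows the meteorological convention assumed here:
    the compass bearing the flow comes FROM, clockwise from north.
    """
    speed = math.hypot(u, v)
    direction_from = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction_from


# Illustrative values only: a flow heading east (u > 0) comes from the west.
speed, direction = speed_and_direction(3.0, 0.0)
```

Points clustered near the origin of such a plot correspond to small values of `speed`, which is how the prevalence of lower magnitudes shows up in the figures.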
Comments 8: [Please provide more data about the methodology. The training, validation, and test processes are not clear.]
Response 8: You are right, we mainly focused on the hyperparameter tuning procedure and less on the train/test split. The change is highlighted on lines 229 to 231.
Comments 9: [There is no conclusion section and this is too important. I recommend to split the section about discussion into “Results and discussion” and “Conclusions” since the discussion section includes several conclusions of the work.]
Response 9: Thank you for pointing this out. We agree with this comment. Following all reviewers’ comments, we have split the Discussion section into a Results section (lines 381 to 468) and a Conclusions section (lines 470 to 530). All revisions are included in the attached revised paper.
Reviewer 2 Report
Comments and Suggestions for Authors
Dear Authors,
Thank you for the interesting and valuable article. Applications of ML techniques to oceanography are highly appreciated.
There are some minor comments to the text generally.
1) The introduction section shouldn’t be number 0, it is usually number 1.
2) You forgot to write conclusions in the article.
3) In the figures 1 there are no blue dots, they are green. And why the part of the marine waters are in light green?
4) Starting from figure 5, captions in the figures are unreadable in the printed form. Also please change the style of the lines, to be readable in the black and white printed article version.
5) Table 1. Why do you calculate statistical parameters for latitude and longitude?
6) Line 155: Such a dataset, while using comma-separated values (CSV), leads to a 21.69 Mb file unsuitable for analytics. This statement is incorrect, data set might be too big for the existing computation sources, but it is suitable for the future data analysis. You reduced size of the file, but you didn’t change data.
7) Please avoid abbreviations, like TFT in line 170, before the introduction of this method itself. Line 183 CMEMS – unclear what do you mean here.
8) Line 249 …the last column... Do you mean table 5 or figure 5? Please specify to make text easier to read.
9) Table 5. Please use the formatting of the table regarding the journal requirements.
10) Discussion section can be renamed to the discussion and conclusions. Or part of this section can be moved to the conclusions.
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions/corrections highlighted in the re-submitted files.
Comments 1: [The introduction section shouldn’t be number 0, it is usually number 1.]
Response 1: Thank you for pointing this out. We agree with this comment. Therefore, we have modified the numbering of the sections accordingly.
Comments 2: [ You forgot to write conclusions in the article.]
Response 2: Thank you for pointing this out. We agree with this comment. Following all reviewers’ comments, we have split the Discussion section into a Results section (lines 383 to 470) and a Conclusions section (lines 472 to 532). All revisions are included in the attached revised paper.
Comments 3: [ In the figures 1 there are no blue dots, they are green. And why the part of the marine waters are in light green?]
Response 3: Thank you for pointing this out. We agree with this comment. Therefore, we have reformulated the text for Figure 1, including citations for the water bodies that are shown in the figure: “Spatial distribution of data sources along the Western Black Sea coast. Figure 1 illustrates the spatial distribution of data sources along the Western Black Sea coast, including in-situ coastal stations and the coverage of the Copernicus Marine Service data. The red stars indicate the locations of coastal automatic weather stations, while the green dots represent the Copernicus Marine Service data coverage. The blue line demarcates the Romanian coastal waters as the Exclusive Economic Zone (EEZ). The marine waters are classified into four distinct water bodies based on the Marine Strategy Framework Directive (MSFD) \cite{Mihailov2014, Boicenco2018}: Waters with variable salinity in the north influenced by the Danube River (BLK\_RO\_RG\_TT03), Coastal waters from the central to the south (BLK\_RO\_RG\_CT), Marine Waters from the 30m isobath to 200m (BLK\_RO\_RG\_MT01), and Offshore Waters with depths of at least 200m (BLK\_RO\_RG\_MT02). Each water body is characterized by specific average seasonal and annual salinity ranges.”
Comments 4: Starting from figure 5, captions in the figures are unreadable in the printed form. Also please change the style of the lines, to be readable in the black and white printed article version.
Response 4: Thank you for pointing this out. We changed the fonts in all figures with subfigures in order to make them more readable.
Comment 5: [Table 1. Why do you calculate statistical parameters for latitude and longitude?]
Response 5: The statistical parameters for latitude and longitude are calculated to provide a descriptive summary of the spatial distribution of the data. These parameters, including the mean, standard deviation, minimum, quartiles, and maximum, offer insights into the central tendency, variability, and range of the latitude and longitude values within the dataset. This information helps characterize the spatial coverage of the data and understand the geographical extent of the study area.
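The summary described in this response can be sketched as follows. This is an illustrative example only: the function name and the latitude values are hypothetical, and the quartile interpolation method is an assumption, since the record does not state which tool produced Table 1.

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Summarise a coordinate column: mean, std, min, quartiles, max."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical latitudes (degrees north), not the manuscript's data.
latitudes = [43.5, 44.0, 44.5, 45.0, 45.5]
summary = describe(latitudes)
```

For coordinates, the minimum and maximum give the bounding box of the study area, which is the part of the summary most directly tied to spatial coverage.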
Comment 6: [Line 155: Such a dataset, while using comma-separated values (CSV), leads to a 21.69 Mb file unsuitable for analytics. This statement is incorrect, data set might be too big for the existing computation sources, but it is suitable for the future data analysis. You reduced size of the file, but you didn’t change data.]
Response 6: Thank you for pointing this out. We agree with this comment. The reviewer is correct, a dataset of 21.69 MB is not inherently unsuitable for analytics. The phrasing in the original sentence was inaccurate. The issue was not the size of the dataset itself, but rather the limitations of using CSV files for storing and processing large geospatial datasets.
Here's a revised version of the paragraph:
Lines 155-162: “While the dataset is represented using comma-separated values (CSV), this leads to a 21.69 MB file that can be impracticable for efficient analytics due to the limitations of CSV format in handling large geospatial datasets. To optimize processing and analysis, the data was converted to the parquet standard with single file storage, resulting in a 10.16 MB file with non-sequential access capabilities. This conversion improves the efficiency of data handling without altering the dataset itself.”
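The conversion described in the revised paragraph can be sketched roughly as follows. This is not the authors' script: it assumes pandas with a Parquet engine such as pyarrow is installed, and the helper names and any file paths passed to them are hypothetical.

```python
from pathlib import Path


def csv_to_parquet(csv_path: Path, parquet_path: Path) -> None:
    """Rewrite a CSV file as a single Parquet file with the same rows."""
    # Imported here so the helpers can be defined without pandas installed.
    import pandas as pd  # also needs a Parquet engine such as pyarrow

    frame = pd.read_csv(csv_path)
    frame.to_parquet(parquet_path, index=False)


def read_columns(parquet_path: Path, columns: list[str]):
    """Load only the requested columns from a Parquet file."""
    import pandas as pd

    return pd.read_parquet(parquet_path, columns=columns)
```

Reading a subset of columns without scanning the whole file is one concrete sense in which Parquet offers the non-sequential access mentioned above; the rows and values themselves are unchanged by the conversion.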
Comment 7: Please avoid abbreviations, like TFT in line 170, before the introduction of this method itself. Line 183 CMEMS – unclear what do you mean here.
Response 7: You are right. We now precede these abbreviations with their full names (which can also be found in the Abbreviations section) to avoid confusion. Thank you.
Comment 8: Line 249 …the last column... Do you mean table 5 or figure 5? Please specify to make text easier to read.
Response 8: We agree, it was not clear whether we were referring to Figure 5 or Table 5. We fixed this by naming Figure 5 explicitly in the text to avoid confusion.
Comment 9: Table 5. Please use the formatting of the table regarding the journal requirements.
Response 9: We are using the MDPI LaTeX template, and the table was rendered automatically by it. The only change we made was to fix the width to textwidth in order to improve readability. The alignment suggests the static variable Patience.
Comment 10: [Discussion section can be renamed to the discussion and conclusions. Or part of this section can be moved to the conclusions.]
Response 10: Thank you for pointing this out. We agree with this comment. Following all reviewers’ comments, we have split the Discussion section into a Results section (lines 383 to 470) and a Conclusions section (lines 472 to 532). All revisions are included in the attached revised paper.
Reviewer 3 Report
Comments and Suggestions for Authors
In this manuscript, the authors leverage data fusion of in-situ and model data to improve coastal predictions in a specific area. I think this is an interesting and timely work. The results look robust and convincing, and there is clear value in the approach and methodology chosen. I am, therefore, supportive of publication. I have a few comments that I would like the authors to consider and, once these are addressed, I think that the manuscript should be publishable.
- The authors should take an iteration through the figures of the manuscript, fonts are often too small to be readable, and more work and thoughts can be given to make the figures informative and easy to understand. For now the figures are not reaching their full potential at communicating the science and the findings.
- It may be worth to take one more literature check for other works that use ML / AI / ANNs to perform model correction and / or data fusion of different data sources in geophysics, I think there have been a few such works coming out recently. For example for ocean level, see https://doi.org/10.1016/j.ocemod.2024.102334 , https://doi.org/10.1016/j.ocemod.2009.12.007 . Note that this approach has also been used for other applications, e.g. recent works such as http://dx.doi.org/10.13140/RG.2.2.30906.91846 , though this is further from the present application. But I think that, overall, it may be useful to have a short paragraph discussing the fact that "by combining several sources of data, it is possible to improve predictions and estimates". This can be related to textbook / standard data assimilation practices - in a sense, all of this is an extension / improvements / non-linear extension of the classical best linear unbiased estimator / "BLUE" result. The difference being that BLUE is only optimal when considering linear problems, while applying a non linear neural network allows to provide a BLUE-like improvement, but in a non-linear framework, which naturally improves performance.
- The paragraph lines 45-52 is very interesting. Having worked with this kind of data and problems, I can confirm that this is a critical point. I think it would be useful that the authors extend this paragraph with a couple of sentences, to provide a quick insight into the key ingredients used by these works to allow to remediate these challenges.
- Regarding XAI, an important category of methods is the Grad-CAM and related gradient based methods; maybe this could be discussed in a sentence into the paragraph of lines 59-70. This method is known under many names, so the authors can check if this is actually similar to one of the methods discussed there.
- In fig. 1, the insert figure has too small fonts and the lat and lon on it are hard to read. Also, can you insert a box in the insert figure that shows the location of the higher resolution map in the black sea?
- Fig. 2 is nice "in theory", but is hard to read in practice - improve the figure resolution and increase the size.
- fig. 5 is not readable, the fonts are far too small.
- figs 6, 7, 8, 9 need more work - it is not clear what all curves on the figures are from the legend and caption.
- It is not so easy for now to understand intuitively the models performance. In particular, just looking at the loss is not sufficient to understand model performance. Indeed, model performance is more than just error metrics such as RMSE and MAE and the likes implemented in the loss function. In particular, the intrinsic variability of the data also need to be validated - otherwise, it is relatively easy to smooth out data and obtain better error metrics without truly getting a better model, since the improved error comes with an artificial and non-realistic reduction in the produced data variability. This is why tools such as Taylor plots are so useful. I would recommend that the authors look into this point and present a few more information on this aspect.
- There is a reproducibility crisis happening in science, especially with the rise of data driven / ML / AI techniques. In this context, would the authors consider releasing all or part of their data and code as open source materials, on a hosting platform such as github or similar? This would significantly improve the reproducibility of the study, as well as the impact of the work, since other groups are more likely to re-use and build on your results if they can use your code as a starting point.
- I miss a plot showing the general architecture of the NN used, especially as it is relatively complex. Can you add a high-level figure showing the architecture of your network? Similarly, there are many metaparameters in the design of a NN and its training - can you add a table summarizing these, including information about the parameter count, learning rate, initialization, loss function, stochastic gradient algorithm used, etc.
- A point that is not that clear at present and that requires careful read by the authors to be understood, is the temporal use of the model. Consider adding a figure that makes it very clear if this model is used to only correct old data, and / or can or is also used in operational operations, as well as provides a simple and easy to grasp overview of the training vs. validation data split. I.e. some figures similar to e.g. 10.1016/j.ocemod.2024.102334 fig. 4 would be most useful. Similar for the training vs. validation data, something similar to fig. 6 in the same paper would be useful.
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions/corrections highlighted/in track changes in the re-submitted files.
Comment 1: [It may be worth to take one more literature check for other works that use ML / AI / ANNs to perform model correction and / or data fusion of different data sources in geophysics, I think there have been a few such works coming out recently. For example for ocean level, see https://doi.org/10.1016/j.ocemod.2024.102334 , https://doi.org/10.1016/j.ocemod.2009.12.007 . Note that this approach has also been used for other applications, e.g. recent works such as http://dx.doi.org/10.13140/RG.2.2.30906.91846 , though this is further from the present application. But I think that, overall, it may be useful to have a short paragraph discussing the fact that "by combining several sources of data, it is possible to improve predictions and estimates". This can be related to textbook / standard data assimilation practices - in a sense, all of this is an extension / improvements / non-linear extension of the classical best linear unbiased estimator / "BLUE" result. The difference being that BLUE is only optimal when considering linear problems, while applying a non linear neural network allows to provide a BLUE-like improvement, but in a non-linear framework, which naturally improves performance.]
Response 1: Thank you for pointing this out. We agree with this comment. Therefore, we have included the suggested papers in Discussion Section:
“Recent work by Tedesco highlights the application of neural networks for bias correction in operational storm surge forecasts, demonstrating that machine learning can effectively enhance the accuracy of predictions in complex marine environments \cite{Tedesco2024}. This study is a foundational reference for understanding how neural networks can be utilized to correct systematic errors in traditional forecasting models. The findings indicate that by integrating observational data with model outputs, the performance of storm surge forecasts can be significantly improved, which is particularly relevant for the Romanian Black Sea context, where various atmospheric and oceanographic factors influence coastal dynamics.
Similarly, the research conducted by Bajo and Umgiesser explores the synergy between dynamic models and neural networks for storm surge forecasting \cite{Bajo2010}. Their findings suggest that a hybrid approach, combining both methodologies' strengths, can lead to more accurate and reliable predictions. This aligns with the overarching theme of this literature review, which posits that integrating multiple data sources, whether observational or modelled, can enhance the predictive capabilities of coastal dynamics models. The study emphasizes the importance of data assimilation practices, which are critical for improving the accuracy of forecasts in non-linear environments, such as those encountered in coastal regions.”
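The reviewer's reference to the classical best linear unbiased estimator (BLUE) can be illustrated with the simplest textbook case: two independent, unbiased estimates of the same quantity. The sketch below is not part of the manuscript; the function name and the numbers are hypothetical and only show why combining sources reduces uncertainty.

```python
def blue_combine(x1: float, var1: float, x2: float, var2: float) -> tuple[float, float]:
    """Combine two independent, unbiased estimates of the same quantity.

    Returns the inverse-variance weighted mean and its variance, which is
    never larger than the smaller of the two input variances.
    """
    w1 = var2 / (var1 + var2)  # weight on x1 grows as x2 gets noisier
    w2 = var1 / (var1 + var2)
    estimate = w1 * x1 + w2 * x2
    variance = (var1 * var2) / (var1 + var2)
    return estimate, variance


# Hypothetical numbers: a station reading (variance 0.04) and a model value
# (variance 0.16) for the same wave height, in metres.
estimate, variance = blue_combine(1.20, 0.04, 1.50, 0.16)
```

The combined variance falls below that of the better source alone. A neural network that fuses in-situ and modelled data aims at a comparable gain without the linearity assumption.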
Comment 2: [ The paragraph lines 45-52 is very interesting. Having worked with this kind of data and problems, I can confirm that this is a critical point. I think it would be useful that the authors extend this paragraph with a couple of sentences, to provide a quick insight into the key ingredients used by these works to allow to remediate these challenges.]
Response 2: Thank you for pointing this out. We agree with this comment. Therefore, to provide further insight into the key ingredients used to overcome the challenges of integrating in-situ and modelled data, we have added a paragraph at lines 53-59 that extends the original one with the following sentences:
"These studies employ various techniques to address discrepancies and uncertainties, including data normalization, bias correction, and spatio-temporal matching. For instance, deep learning models are used to fuse satellite-derived data with in-situ measurements, while machine learning-based data assimilation techniques integrate modelled data with buoy observations. These approaches aim to create a unified dataset that accurately represents the coastal dynamics while accounting for each data source's inherent biases and limitations.”
Comment 3: [ Regarding XAI, an important category of methods is the Grad-CAM and related gradient based methods; maybe this could be discussed in a sentence into the paragraph of lines 59-70. This method is known under many names, so the authors can check if this is actually similar to one of the methods discussed there.]
Response 3: Thank you for pointing this out. We agree with this comment. Therefore, we have included a paragraph at lines 60-79:
In response to the reviewer's comment regarding the inclusion of Grad-CAM and related gradient-based methods in the discussion of explainable artificial intelligence (XAI), it is essential to highlight the significance of these techniques in enhancing the interpretability of complex models, particularly in the context of meteorological and oceanographic applications.
“Grad-CAM, or Gradient-weighted Class Activation Mapping, is a powerful visualization technique that provides insights into the decision-making processes of deep learning models by generating heatmaps that indicate the regions of input data that most influence the model's predictions. This capability is particularly valuable in fields such as meteorology and oceanography, where understanding the underlying factors driving model outputs is crucial for effective decision-making and management.
The application of Grad-CAM in meteorological contexts has been explored in various studies, demonstrating its utility in interpreting model predictions. For instance, Choi et al. utilized Grad-CAM to enhance the interpretability of time-series predictions in meteorological models, showcasing how gradient-based methods can elucidate the relationships between input variables and model outputs \cite{Choi2022}. This aligns with the reviewer's suggestion to incorporate a discussion of Grad-CAM in the context of our work, as it can provide a deeper understanding of the atmospheric influences on coastal dynamics along the Romanian Black Sea coast.
Moreover, integrating Grad-CAM with other XAI techniques can further enhance model interpretability. For example, Diaz-Gomez et al. employed Grad-CAM in conjunction with occlusion analysis to assess the contributions of various input features to model predictions in medical imaging \cite{Diaz2022}. This dual approach not only highlights the most influential regions in the input data but also allows for a more comprehensive understanding of the model's decision-making process. Such methodologies can be adapted to our research, where understanding the impact of atmospheric conditions on marine dynamics is critical.”
Comment 4: [ In fig. 1, the insert figure has too small fonts and the lat and lon on it are hard to read. Also, can you insert a box in the insert figure that shows the location of the higher resolution map in the black sea?]
Response 4: Thank you for pointing this out, we changed the fonts and the resolution in order to make the figure more readable.
Comment 5:[ Fig. 2 is nice "in theory", but is hard to read in practice - improve the figure resolution and increase the size.]
Response 5: You are right, this image was unreadable, we increased fonts and resolution for this image too.
Comment 6: [ fig. 5 is not readable, the fonts are far too small.]
Response 6: We agree, using subfigures requires larger fonts. We increased the fonts for this figure too and hope it is readable now.
Comment 7: [ figs 6, 7, 8, 9 need more work - it is not clear what all curves on the figures are from the legend and caption.]
Response 7: You are right, we needed to improve the font size for each component of Figures 6-9. We hope they are readable now. We also discuss the readability of the represented components in the captions, including an explanation of the gray line (attention).
Comment 8: It is not so easy for now to understand intuitively the models performance. In particular, just looking at the loss is not sufficient to understand model performance. Indeed, model performance is more than just error metrics such as RMSE and MAE and the likes implemented in the loss function. In particular, the intrinsic variability of the data also need to be validated - otherwise, it is relatively easy to smooth out data and obtain better error metrics without truly getting a better model, since the improved error comes with an artificial and non-realistic reduction in the produced data variability. This is why tools such as Taylor plots are so useful. I would recommend that the authors look into this point and present a few more information on this aspect.
Response 8: Thank you for pointing this out. This type of plot provides a composite view for comparing the performance of several models. In this work we focus on the applicability of the TFT model, as a new architecture, to geospatial data. While we focus on prediction performance, we run several predictions for the same input data for interpretability purposes. We like the idea of this type of plot (it was an interesting learning experience) and will keep it in mind for a review article that also includes traditional (LSTM) architectures.
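For context on the reviewer's point, a Taylor diagram summarises three quantities: the ratio of predicted to reference standard deviation, the correlation, and the centred root-mean-square difference. The sketch below computes them on synthetic data; it is not derived from the manuscript's results, and the function and variable names are illustrative.

```python
import math
import statistics


def taylor_statistics(reference: list[float], predicted: list[float]) -> dict[str, float]:
    """Return the three quantities a Taylor diagram displays."""
    ref_mean = statistics.fmean(reference)
    pred_mean = statistics.fmean(predicted)
    ref_anom = [r - ref_mean for r in reference]
    pred_anom = [p - pred_mean for p in predicted]
    n = len(reference)
    ref_std = math.sqrt(sum(a * a for a in ref_anom) / n)
    pred_std = math.sqrt(sum(a * a for a in pred_anom) / n)
    covariance = sum(a * b for a, b in zip(ref_anom, pred_anom)) / n
    centred_rmsd = math.sqrt(
        sum((b - a) ** 2 for a, b in zip(ref_anom, pred_anom)) / n
    )
    return {
        "std_ratio": pred_std / ref_std,
        "correlation": covariance / (ref_std * pred_std),
        "centred_rmsd": centred_rmsd,
    }


# Synthetic series: the "smoothed" prediction is the reference with every
# anomaly halved, so it correlates perfectly but understates variability.
reference = [0.0, 2.0, 0.0, -2.0, 0.0, 2.0, 0.0, -2.0]
smoothed = [0.5 * value for value in reference]
stats = taylor_statistics(reference, smoothed)
```

In this example the correlation is 1 while the standard deviation ratio is 0.5. A single correlation or error score would not reveal the loss of variability, which is the concern raised in the comment.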
Comment 9: There is a reproducibility crisis happening in science, especially with the rise of data driven / ML / AI techniques. In this context, would the authors consider releasing all or part of their data and code as open source materials, on a hosting platform such as github or similar? This would significantly improve the reproducibility of the study, as well as the impact of the work, since other groups are more likely to re-use and build on your results if they can use your code as a starting point.
Response 9: You are right, many AI/ML articles do not provide code examples for their results. To address this issue, we added supplemental materials including the models (uo.pth, vo.pth, vsdx.pth and vsdy.pth) and a calling example (example.ipynb and its PDF export, example.pdf). So that this can be tested, we also published the dataset, making it available to the research community (DOI: 10.5281/zenodo.14641323).
Comment 10: I miss a plot showing the general architecture of the NN used, especially as it is relatively complex. Can you add a high-level figure showing the architecture of your network? Similarly, there are many metaparameters in the design of a NN and its training - can you add a table summarizing these, including information about the parameter count, learning rate, initialization, loss function, stochastic gradient algorithm used, etc.
Response 10: Thank you for pointing this out. We avoided including a picture because the TFT model is large and the figure would have been unreadable. Since TFT is a common architecture, we found that a table presenting its parameters is better suited to the reader. Readers who need the full architecture can obtain it from the supplemental materials attached with this revision, where the models are included (without being pruned).
Comment 11: A point that is not that clear at present and that requires careful read by the authors to be understood, is the temporal use of the model. Consider adding a figure that makes it very clear if this model is used to only correct old data, and / or can or is also used in operational operations, as well as provides a simple and easy-to-grasp overview of the training vs. validation data split. I.e. some figures similar to e.g. 10.1016/j.ocemod.2024.102334 fig. 4 would be most useful. Similar for the training vs. validation data, something similar to fig. 6 in the same paper would be useful.
Response 11: Thank you. We checked the suggested article, but it focuses on a smaller architecture; our main goal is to apply the newer TFT architecture to ocean modelling, especially to forecasting its components. We also included a paragraph explaining how the original data were split into training and validation sets (lines 226-229).
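This record does not state how the split was made. For time-ordered data, one common choice is a chronological split in which validation samples come strictly after the training samples. The sketch below illustrates that idea only; the function name and the 80/20 ratio are assumptions, not the authors' settings.

```python
def chronological_split(samples: list, train_fraction: float = 0.8) -> tuple[list, list]:
    """Split time-ordered samples so validation data comes strictly later."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Hypothetical time steps 0..9; the 80/20 ratio is an assumption.
train, validation = chronological_split(list(range(10)), train_fraction=0.8)
```

Keeping the validation period after the training period avoids evaluating the model on time steps that precede data it has already seen, which matters if the model is meant for operational forecasting.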
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Dear authors,
Thank you for the improved version of the article. In my opinion, it can be published in its current form.