A Data Storage, Analysis, and Project Administration Engine (TMFdw) for Small- to Medium-Size Interdisciplinary Ecological Research Programs with Full Raster Data Capabilities

Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper is interesting and the work has value to a wide audience. The content is basically there but needs to be restructured and clear goals of the work need to be defined early on in the paper. This type of paper does not lend itself to a Methods/Results format but the example does and should be formatted in that way. Also, the text could be simplified and flow better to make the information available to a wider audience. Also, there are a lot of terms that need to be defined or references provided. Specific comments include:
Having one cite does not make something "undisputed". There are other models that also work.
14 GB tabular data -> 14 GB of tabular data
It sounds like this is a paper about the evolution of a data store. It would help to have a specific goal or at least a scope of the data store (i.e. what types of data and users will it support).
Are you really using "tape storage"?
Define EML (I'm guessing it is Ecological Modeling Language but there are other names that would match)
Extra tab on line 158
Line 177, "gridded data" - is this raster data? Using both is confusing.
Line 207 - indicate what specifically is meant by "blending".
Line 228 - define "chunk-wise"
Line 243 - define ECMWF
Line 244 - "machine-learning process" not defined and only used once
Line 273 - "aim" of what? This project? Should be part of goals that would be stated earlier
Line 283 - need to define "KeyCloak , e.g. via Docker, Podman or Kubernetes"
Line 284 - " dw’s" - did you mean TMFdw's?
Figure 4 - contains unreadable text
Line 304 - this looks like a GUI rather than a "raster engine"
Figure 5 - text is unreadable
Figure 6 - Clarify whether this workflow is implemented in the TMFdw
Line 400 - I have never heard of "Partial Least Squares Regression" being referred to as AI.
Line 420 - need detailed methods for how RMSE and R-squared were used. Change "comprehensive" as there are many other methods.
Line 438 - "very good" based on what goal?
RESPECT - needs to be clearly defined as it seems central to the work.
Comments on the Quality of English Language
English is almost there. Some of the words need to be changed to be accurate.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
The article deals with the issue of data storage related to scientific research and project activities. The topic is timely. Nowadays, there is a lot of data, whether vector or raster, which represent project inputs and are further processed and analyzed. It would be useful to archive these results in the form of digital data in some way for future research.
Comments:
Is the data stored in the repository open source, or is commercial data also provided for a fee?
Can users download the data to their hard disk and then work with and analyze it?
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
Recommendations to authors
My main concerns focus on improving several aspects of the text, particularly regarding the model description, the appropriate use of figures, and the presentation of the conclusions and the approach's usefulness. The conclusions should be tightened to emphasize key outcomes and the significance of each development phase, highlighting the transition from a temporary solution to a DOI-certified data center.
My recommendations to further improve some parts of the manuscript:
• Lines 31 – 35: This section regarding the dataset availability needs your attention.
• Line 157: Check alignment.
• Line 171: In Figure 2, you present two flowcharts with differing design styles. I recommend using a consistent format for both. Additionally, the second (bottom) diagram appears to lack detail and should be replaced with a higher-resolution version. To enhance readability, consider splitting Figure 2 into two separate figures. Furthermore, adding color to illustrate the proposed novel TMFdw raster engine would be beneficial.
• Line 349: Figure 5 should be presented in much higher resolution. Again, please consider splitting Figure 5 into two separate figures.
• Line 370: In Figure 6, please ensure a consistent format for all flow diagrams and consider adding additional key information about the AI model.
• Line 395: It is stated that 'The topography-related area-wide predictors rely on an Airborne Laser Scanning (ALS) campaign which yielded the digital elevation model.' However, I am wondering why you opted not to use DEM data available from multiple (reliable) digital sources and instead relied on ALS, which may not fully represent the Earth's surface. I think you must elaborate on this and add some text in the manuscript as well.
• Line 400: There are numerous AI models and AI approaches available today. Why was PLSR chosen? There is limited explanation or justification provided for selecting this approach. Additionally, the statement 'PLSR is well-suited for small sample sizes' seems inconsistent with the size of the dataset used. Please elaborate and clarify in the text the rationale behind selecting PLSR.
• Line 400: Partial Least Squares Regression (PLSR) is primarily considered a statistical method rather than a traditional AI model. I agree that it can be part of AI frameworks when used in predictive modeling or machine learning contexts, especially for tasks involving multicollinearity in large datasets but this is not the case here. While it incorporates elements of modeling and prediction, categorizing it strictly as "AI" may not sound correct. Please elaborate.
• Line 432: How were these target variables selected?
• Line 491: It is noted that the resulting data volume per year totals 620 GB, significantly exceeding the tabular data volume of approximately 14 GB. However, no recommendations are provided on how to address this issue.
• Line 494: I think that you should include a limitation section. I would suggest that you add a section “Usage Notes” providing guidance for data interpretation or limitations. This is a very important issue that needs to be addressed.
• Line 497: The conclusions section needs revision, as it currently resembles a brainstorming output rather than a concise summary. I recommend structuring the conclusions to better clarify TMFdw’s status, value, and relevance to ecological informatics as follows: (a) focus and conciseness; (b) highlight of innovations; (c) future directions; (d) technical validation; (e) impact on the field.
• Line 500: “organic growth”?
• Line 501: The conclusions could be tightened to emphasize the key outcomes and the relevance of each development phase, such as its growth from a temporary solution to a DOI-certified data center.
• Line 509: You should highlight the innovations in the proposed approach by adding a specific note on the TMFdw's unique contributions or innovations within ecological data management, especially compared to other systems that would strengthen the impact and justify its continued support.
• Line 523: Please consider elaborating the impact of the TMFdw approach on the field. I suggest concluding with a clear statement on how TMFdw will advance interdisciplinary ecological research capabilities, both in the Andes and in broader applications.
• Line 534: Although mentioned here, you may provide some further information for future research directions, especially regarding the raster engine's potential applications beyond the current user base to make the conclusions more forward-looking.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The paper contains a relatively large amount of information that is interesting to those involved in this type of work (including myself). I do recommend a few additional changes and having someone with a strong background in editing and formatting review the document.
My structure query language -> My Structured Query Language
Move "(https://vhrz669.hrz.uni-marburg.de/tmf_respect/UserFiles/File/respect/generalinformations/RESPECT_data_use_agreement_approved.pdf)" to a citation
monthly images or datasets that change over time - Isn't a "monthly image" something that would change over time? How about "monthly images or other datasets that change over time"
For Figures 4, 5 and 6: Even zooming in, the text is unreadable. I would recommend shrinking the window size and then recapturing the screen. This should make the text large enough to be readable.
Comments on the Quality of English Language
The English is good considering the amount of technical information and the length of the paper. Some additional edits could be made for formatting and flow.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
I thank the authors for preparing the responses and supplementing the article.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Recommendations to authors
General comments:
Please review the figure captions, as they are too long and include unnecessary information that should be placed in the main text. Additionally, ensure the captions adhere to the journal's style guidelines. Links in the text must be replaced with references. The resolution and size of some figures could be slightly increased.
My recommendations to further improve some parts of the manuscript:
• Line 36: Please delete “for example,”.
• Line 76: The reference to the “Research Unit RESPECT” must be given as any other reference following the guidelines of the journal. Please revise.
• Line 149: According to the journal, the links must also be presented as references. Check the journal guidelines.
• Line 194: This caption is too long. Consider revising the caption text and moving the detailed description to the main body of the text. Also, font size and style are wrong. Check the journal guidelines.
• Line 207: Same comments as above. Please make corrections.
• Line 257: Same comments as above. You do not have to provide detailed information in the caption; instead move this to the main body of the text.
• Line 414: Figure 8, which depicts a schematic representation of the workflow, is a good attempt, but I believe it needs revision as it does not clearly convey the impression of a workflow. Please address this issue and replace the figure with an improved version.
• Line 430: Please format the link as a reference following the journal's guidelines.
• Line 450: Same comment as above.
• Line 586: The conclusions section, although improved, still needs revision to provide a concise summary.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf