Article
Peer-Review Record

Generalised Linear Models for Prediction of Dissolved Oxygen in a Waste Stabilisation Pond

Water 2020, 12(7), 1930; https://doi.org/10.3390/w12071930
by Duy Tan Pham 1,*, Long Ho 1, Juan Espinoza-Palacios 1, Maria Arevalo-Durazno 1,2, Wout Van Echelpoel 1 and Peter Goethals 1
Submission received: 1 June 2020 / Revised: 29 June 2020 / Accepted: 3 July 2020 / Published: 7 July 2020
(This article belongs to the Special Issue Sustainable Development of Lakes and Reservoirs)

Round 1

Reviewer 1 Report

The authors studied the key driving factors of dissolved oxygen variability in the Ucubamba WSP (Ecuador) by applying and comparing numerous GLMs. The work is well written and of generally good quality as a scientific publication.

I do, however, have some comments, so my recommendation is "reconsider after major revisions".

Some specific comments

Line 55, consider the use of “Y” to refer to variable Y.

Line 58, delete “fancier”

Line 59-60: “In short, GLMs finds the line that fits the experimental data points best”. Describe this differently so that it is more technically correct than “fits best”; perhaps use “minimize error”, as these are optimization methods that minimize the error (distance) between the data points and the line.

Lines 74-77 are poorly written; please improve them.

Line 78: “However, from the previous pond” delete “the”

Line 87: Delete “From this perspective”

Line 87: “this study aims to develop the first GLM application”. This is a rather bold statement. In Ecuador alone there is the work of Professor J. Calderon and Professor S. Sonnenholzner, both of whom worked on linear models in ponds (shrimp ponds, some 20 years ago). Perhaps you could add “to the best of our knowledge”.

Line 96: Study area, please describe the soil conditions (clay?), total volume, average depth, and inclination if any.

Line 126: “analysis using American Public Health Association methods [21]”, APHA describe different methods for each analyte, please cite the code for the corresponding method.

Line 133: “2.3.1 Variables used to develop models”. In this subsection the authors mention six main variables considered in the models, selected out of “all predictors”. Please state what all these “predictors” are, along with their “scores” or “contributions” in the first-round LM. These “less relevant” variables (predictors) are of great importance to your publication (for several reasons that I will not describe here). I suggest a table.

Line 149: “Six variables were always used as predictors in the models. These were chlorophyll a, BOD, water temperature, solar radiation, wind speed and air temperature”. Why? I would expect the LM to weigh the importance of each variable, which in turn would “decide” what to include in a given model. Perhaps this is not clear in the text; please reword it.

How many data series in total are included in the analysis?

Line 220-221: “The data of predictor variables” is perhaps not a good description of “Variability of physicochemical and biological parameters and climatic conditions in the ponds” which is the title of appendix 2.  

Line 219: Generally section 3.1 is written in a very “qualitative” way and even when authors refer to appendix 2, it will be good if authors make an effort to re-write it in a more quantitative way. Following an example:

Line 230-231: “Water temperature did not change that much between the three sampling times” variation cannot be described as “that much”, How much is that much? Authors may use % of error or other units.

General observation

The authors did discuss the predictive accuracy of the models, but only for data within the range in which the variables were measured. I think it is important to clarify for general readers that such linear models are valid only within the specific range of measured predictors under which they were developed, and are “hard” to extrapolate. Thus, the most important feature of this work is not the model itself but the approach and method of model development (such as the data partitioning and cross-validation strategies); it would be good if this were stated a little more clearly in the manuscript.

This makes the stated conclusions less relevant for the general reader, since they are valid only in Cuenca, Ecuador. Perhaps the conclusions should be oriented more towards the usefulness of the model development approach.

Author Response

Cover letter

 

Manuscript ID: water-837952

Generalised linear models for Prediction of Dissolved Oxygen in a Waste Stabilisation Pond by Duy Tan Pham, Long Ho, Juan Espinoza-Palacios, Maria Arevalo-Durazno, Wout Van Echelpoel and Peter Goethals.

 

Dear Editors and Reviewers,

 

We would like to thank the reviewer for their relevant and constructive remarks. We have revised our manuscript accordingly, and we acknowledge that these modifications improve its quality. We hope that the changes and explanations meet the expectations of the editors and the reviewer.

 

You can find below the details of the modifications and explanations.

 

Thank you very much for reviewing our manuscript again!

 

Yours sincerely,

 

Tan Duy Pham

Department of Animal Sciences and Aquatic Ecology

Ghent University, Belgium

E-mail address: [email protected].

 

 

Reviewer 1

The authors studied the key driving factors of dissolved oxygen variability in the Ucubamba WSP (Ecuador) by applying and comparing numerous GLMs. The work is well written and of generally good quality as a scientific publication.

I do, however, have some comments, so my recommendation is "reconsider after major revisions".

Some specific comments

  • Line 55, consider the use of “Y” to refer to variable Y.

Corrected

  • Line 58, delete “fancier”

Deleted

  • Line 59-60: “In short, GLMs finds the line that fits the experimental data points best”. Describe this differently so that it is more technically correct than “fits best”; perhaps use “minimize error”, as these are optimization methods that minimize the error (distance) between the data points and the line.

We changed this to “In short, GLMs find a line that minimises the errors between the line and the experimental data points.”

  • Lines 74-77 are poorly written; please improve them.

We revised the manuscript as follows.

  • In wastewater treatment facilities, dissolved oxygen (DO) plays a crucial role in the biodegradation of organic matter, the control of odours and the removal of pathogens. Hence, aeration costs normally account for 40-60 % of the total energy consumption of a wastewater treatment plant (WWTP). In WSPs, by contrast, DO is supplied naturally by algal photosynthesis, which reduces operational costs and, by avoiding mechanical aeration, limits the potential risks from the emission of volatile organic compounds [16]. Because algal metabolism depends on the day-night cycle, strongly fluctuating DO levels can affect the performance of WSPs. Moreover, when the effluent is discharged into water bodies, its DO level can have ecological and environmental implications, as it can affect the aquatic organisms living there. (lines 74-83)

 

  • Line 78: “However, from the previous pond” delete “the”

Deleted

  • Line 87: Delete “From this perspective”

Deleted

  • Line 87: “this study aims to develop the first GLM application”. This is a rather bold statement. In Ecuador alone there is the work of Professor J. Calderon and Professor S. Sonnenholzner, both of whom worked on linear models in ponds (shrimp ponds, some 20 years ago). Perhaps you could add “to the best of our knowledge”.

We changed this to “This study aims to develop a GLM application to investigate the key driving factors of DO variability in WSPs”. (lines 87-88)

  • Line 96: Study area, please describe the soil conditions (clay?), total volume, average depth, and inclination if any.

We added the depth of the ponds; together with the given area, this allows the total volume of the ponds to be calculated. We also added information about the inclination and the bottom of the ponds, as follows.

  • The total area of the aerated lagoons is 6 ha (two times 3 ha) with a depth of 4.5 m. Subsequently, the aerated wastewater flows from the aerated lagoon into the FPs, where further removal of soluble BOD takes place. The total area of the FPs is 26 ha (two times 13 ha) with a depth of 2 m, and the theoretical HRT is five to six days. The MPs are the last stage in the biological treatment chain and mainly remove pathogens [20]. The total area of the MPs is 13 ha (7.4 ha in line 1 and 5.6 ha in line 2) with a depth of 1.8 m, and the HRT is three to four days. The ponds have no inclination, and their bottom is well sealed with geotextiles to avoid seepage. (lines 106-113)
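
The response above notes that the total volume can be calculated from the stated areas and depths. A minimal sketch of that arithmetic follows, assuming vertical walls and a uniform depth across each pond group; that geometry is an assumption made here for illustration, not something the record states, and sloped embankments would give smaller volumes.

```python
# Rough pond volumes from the areas and depths quoted in the author response.
# ASSUMPTION (not from the source): vertical walls and uniform depth.
HA_TO_M2 = 10_000  # 1 hectare = 10,000 square metres

# name -> (total area in ha, depth in m), as quoted in the response
ponds = {
    "aerated lagoons": (6.0, 4.5),
    "FPs": (26.0, 2.0),
    "MPs": (13.0, 1.8),
}


def volume_m3(area_ha, depth_m):
    """Volume of a prism-shaped pond in cubic metres."""
    return area_ha * HA_TO_M2 * depth_m


volumes = {name: volume_m3(area, depth) for name, (area, depth) in ponds.items()}

for name, vol in volumes.items():
    print(f"{name}: {vol:,.0f} m3")
```

Under these assumptions the three pond groups come out at roughly 270,000, 520,000 and 234,000 m3, respectively.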

 

  • Line 126: “analysis using American Public Health Association methods [21]”, APHA describe different methods for each analyte, please cite the code for the corresponding method.

The code 5210 was added.

  • Line 133: “2.3.1 Variables used to develop models”. In this subsection the authors mention six main variables considered in the models, selected out of “all predictors”. Please state what all these “predictors” are, along with their “scores” or “contributions” in the first-round LM. These “less relevant” variables (predictors) are of great importance to your publication (for several reasons that I will not describe here). I suggest a table.

The other predictors were Depth and Timing, which were taken into account in different model development strategies. The predictors are written out in the general model in the revised manuscript as follows.

  • Besides these six variables, depths ranging from 5 to 175 cm from the water surface and timing (the time points when the samples were taken) ranging from 8:00 to 17:00 were used in some of the models to test whether their inclusion would result in better model performance. The general predictive model of DO is as follows:

$$\mathrm{DO} = \beta_0 + \beta_1 \times \mathrm{Chl} + \beta_2 \times \mathrm{BOD}_5 + \beta_3 \times \mathrm{WT} + \beta_4 \times \mathrm{SR} + \beta_5 \times \mathrm{WS} + \beta_6 \times \mathrm{AT} + \beta_7 \times \mathrm{Depth} + \beta_8 \times \mathrm{Timing}$$

(Please see the attachment for a better version of this equation.)
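
To make the structure of this model concrete, here is a minimal sketch of an ordinary least squares fit with an intercept and eight predictors, written in plain Python. It is not the authors' code. The predictor values are synthetic random numbers in [0, 1], the "true" coefficients are invented for the demonstration, and the helper names (`solve_linear_system`, `fit_ols`, `predict`) are hypothetical. The column order follows the equation (Chl, BOD5, WT, SR, WS, AT, Depth, Timing).

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ...] minimising the sum of squared residuals
    by solving the normal equations (X^T X) b = X^T y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept
    p = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    """Evaluate b0 + b1*x1 + ... for one observation."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# SYNTHETIC demonstration data, not measurements from the study.
random.seed(0)
rows = [[random.random() for _ in range(8)] for _ in range(60)]
true_beta = [2.0, 1.5, -0.8, 0.3, 0.9, 0.4, 0.2, -1.1, 0.6]  # invented values
y = [predict(true_beta, r) for r in rows]  # noise-free response

beta_hat = fit_ols(rows, y)
max_error = max(abs(b - t) for b, t in zip(beta_hat, true_beta))
print("largest coefficient error:", max_error)
```

Because the synthetic response is noise-free, the fit should recover the invented coefficients up to rounding error. With real measurements on very different scales (for example solar radiation in W·m−2 next to depth in cm), an established statistics library with a more numerically stable solver would be the safer choice than hand-rolled normal equations.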

 

  • Line 149: “Six variables were always used as predictors in the models. These were chlorophyll a, BOD, water temperature, solar radiation, wind speed and air temperature”. Why? I would expect the LM to weigh the importance of each variable, which in turn would “decide” what to include in a given model. Perhaps this is not clear in the text; please reword it.

The six variables were included in all models on the basis of the mass balance of oxygen in the ponds. While the main oxygen sources in the WSP system were photosynthesis and the direct exchange of atmospheric oxygen through the air/water interface, oxygen was mostly consumed by aerobic bacteria mineralising organic matter and by nitrification. We revised the manuscript accordingly as follows.

  • Six variables, i.e. chlorophyll a, BOD, water temperature, solar radiation, wind speed and air temperature, were always used as predictors in the models given the mass balance of oxygen in the ponds. While the main oxygen sources in the WSP system were photosynthesis and the direct exchange of atmospheric oxygen through the air/water interface, oxygen was mostly consumed by aerobic bacteria mineralising organic matter and by nitrification [23]. (lines 151-155)

 

  • How many data series in total are included in the analysis?

Three sampling campaigns were carried out, as written in the revised manuscript.

  • Three sampling campaigns were carried out on 25 and 26 July (T1), 14 and 15 August (T2), and 26 and 27 August (T3) in 2013. At each sampling time, one WSP line was sampled over the course of one day, from 8:00 to 17:00. (lines 129-131)

 

  • Line 220-221: “The data of predictor variables” is perhaps not a good description of “Variability of physicochemical and biological parameters and climatic conditions in the ponds” which is the title of appendix 2.

We changed this to “The variability of physicochemical and biological parameters and climatic conditions is shown in Appendix A2.” (lines 227-228)

  • Line 219: Generally section 3.1 is written in a very “qualitative” way and even when authors refer to appendix 2, it will be good if authors make an effort to re-write it in a more quantitative way. Following an example:

A quantitative description was added to Section 3.1 as follows.

  • Specifically, the algal biomass near the water surface was around double that at the bottom. This ratio was lower in the FPs (around 1.5) and higher in the MPs (around 2.5). Algal biomass was also higher in the FPs than in the ponds following them, i.e. 354.8 and 161 µg Chl a·L−1, respectively. The BOD concentration followed more or less the same pattern as chlorophyll a, except that it did not vary much between the three sampling times, which could be attributed to the quite stable BOD removal efficiency of the system. The BOD concentration also decreased from the FPs to the MPs by roughly a factor of two, i.e. from 33.7 to 18.8 mg·L−1. Water temperature did not change that much between the three sampling times, fluctuating around 18-19 °C. Water temperature also seemed to be homogeneous throughout the water column and between the two pond types. Regarding the climatic conditions, only air temperature remained stable, i.e. 16.8±2.1 °C, while wind speed and especially solar radiation changed considerably across the three sampling times, i.e. 2.4±1.0 m·s−1 and 469.2±223.8 W·m−2, respectively. As DO is influenced by the BOD concentration and the diurnal activity of algae, it also varied widely across the three sampling times (Figure 3). Between the two pond types, DO across the three sampling times varied more in the FPs than in the MPs. DO also differed between the surface and the bottom of both FPs and MPs. Within line 1 of the WSP, DO decreased from FP 1 to MP 1 at both the surface and the bottom, while in line 2 DO was more or less the same throughout the ponds in both the surface and bottom layers. From the outlet of FP1 to the inlet of MP1, DO near the water surface dropped by about 70%, i.e. from above 10 mg O2·L−1 to around 3 mg O2·L−1, while the oxygen level remained similar between the two ponds in the upper line. (lines 233-253)

 

  • Line 230-231: “Water temperature did not change that much between the three sampling times” variation cannot be described as “that much”, How much is that much? Authors may use % of error or other units.

We added the range of the minor fluctuation in water temperature to the revised manuscript as follows.

  • Water temperature did not change that much between the three sampling times, fluctuating around 18-19 °C. (lines 240-241)

General observation

  • The authors did discuss the predictive accuracy of the models, but only for data within the range in which the variables were measured. I think it is important to clarify for general readers that such linear models are valid only within the specific range of measured predictors under which they were developed, and are “hard” to extrapolate. Thus, the most important feature of this work is not the model itself but the approach and method of model development (such as the data partitioning and cross-validation strategies); it would be good if this were stated a little more clearly in the manuscript. This makes the stated conclusions less relevant for the general reader, since they are valid only in Cuenca, Ecuador. Perhaps the conclusions should be oriented more towards the usefulness of the model development approach.

We appreciate and agree with this comment. We added this argument to the conclusion as follows.

  • Despite the limitation of the data-driven approach for global extrapolation, it is expected that the data partitioning and cross validation strategies developed in this study can be widely applied to identify the optimal models for prediction purposes. (lines 484-486)

 

The reviewer can also find our response to his/her comments and other reviewers' comments in the attachment.

 

Author Response File: Author Response.docx

Reviewer 2 Report

You have done a very nice job in putting together the material for this manuscript and have done a very good job in statistical analysis.  It is an important research work and I hope you will expand that by generating more data points.  I envision application of this work for flood control reservoirs as well.

Please make sure to show Line 1 and Line 2 clearly on the schematic.

Please add % on line 354.

Also in Line 352, please add "at the" after DO.

In Line 343, please change Firstly to "First".

Line 58, please do not use word fancier, it is not a technical word.

Line 67:  use only "or both".

Best of Luck!

Author Response


Reviewer 2

You have done a very nice job in putting together the material for this manuscript and have done a very good job in statistical analysis.  It is an important research work and I hope you will expand that by generating more data points.  I envision application of this work for flood control reservoirs as well.

 

  • Please make sure to show Line 1 and Line 2 clearly on the schematic.

Lines 1 and 2 are now shown in the schematic, which can be found in the attachment.

  • Please add % on line 354.

Added

  • Also in Line 352, please add "at the" after DO.

Added

  • In Line 343, please change Firstly to "First".

Changed

  • Line 58, please do not use word fancier, it is not a technical word.

Changed

  • Line 67: use only "or both".

Changed

 

The reviewer can also find our responses to his/her comments and other reviewers' comments in the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Basically I am not convinced of "this non scientific" solver (that is my problem). But I do know from different applications of "Big data" management, that you can apply it. (I would have decided for a very different approach). So I limited my view to the correctness of the application of their algorithm. The mathematics behind this paper are correct. The language (the writing) is perfect. I could not identify even one mistake. I assume that the scientific community will be curious of this paper, although they will be disappointed, because the outcome of their solver is not surprising but expected.

Author Response


Reviewer 3

  • Basically I am not convinced of "this non-scientific" solver (that is my problem). But I do know from different applications of "Big data" management, that you can apply it. (I would have decided for a very different approach). So I limited my view to the correctness of the application of their algorithm. The mathematics behind this paper are correct. The language (the writing) is perfect. I could not identify even one mistake. I assume that the scientific community will be curious of this paper, although they will be disappointed, because the outcome of their solver is not surprising but expected.

We appreciate your comments and agree with them to a certain degree. We would like to clarify some of the innovative elements of our study that can contribute significantly to the knowledge of pond researchers and engineers. The novelty of the study lies not only in the model algorithm but also in the data partitioning and cross-validation strategies developed here and in the unique high-altitude characteristics of the wastewater treatment facility. Linear regression using ordinary least squares (OLS) is indeed not a sophisticated or novel model and, as a data-driven approach, its results in our study are restricted by its limited suitability for global extrapolation. However, the data partitioning and cross-validation strategies in this study can be applied in other research to find models with optimal prediction performance. Cross validation is not always straightforward, as it depends heavily on how the data are split into training and test datasets. This makes the performance of hold-out cross validation less stable, while leave-one-out cross validation was not used in this study given its computational cost and the bias-variance tradeoff. We added this argument to our conclusion as follows.

  • Despite the limitation of the data-driven approach for global extrapolation, it is expected that the data partitioning and cross validation strategies developed in this study can be widely applied to identify the optimal models for prediction purposes. (lines 484-486)
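
The trade-off described in this response, between a single hold-out split that depends on how the data happen to be divided and leave-one-out cross validation that needs one fit per sample, is often addressed with k-fold cross validation. The sketch below illustrates that general idea only. The record does not state the authors' exact partitioning scheme, so this should not be read as their procedure. The data are synthetic, the model is a one-predictor least-squares line, and all function names are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validated_rmse(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold, repeat k times."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        b0, b1 = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        squared = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test_idx]
        scores.append((sum(squared) / len(squared)) ** 0.5)
    return scores


# SYNTHETIC data for illustration only (not from the study).
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [3.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

folds = k_fold_indices(len(xs), 5)
scores = cross_validated_rmse(xs, ys, k=5)
print("RMSE per fold:", [round(s, 3) for s in scores])
print("mean RMSE:", round(sum(scores) / len(scores), 3))
```

The spread of the per-fold RMSE values gives a sense of how much a single hold-out estimate could vary with the choice of split, which is the instability the response refers to.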

Moreover, from the viewpoint of pond treatment technology, the Ucubamba waste stabilisation pond has unique high-altitude characteristics, i.e. strong light intensity, low air temperature, large temperature variation and low oxygen pressure. These characteristics increase the variation of DO and other variables, which in turn causes variation in pond behaviour; hence, specific design and operation guidelines are needed to ensure its performance. This argument is written in the revised manuscript as follows.

However, it was also interesting to note that the maximum concentration observed in the WSP was 500 µg·L−1, which is considered low according to Mara [1], as the chlorophyll a concentration in “healthy” WSPs is usually in the range of 500-2000 µg·L−1. Therefore, more sampling and long-term data collection are needed to determine whether this low chlorophyll a concentration was related to the short-term data collection or is a characteristic of a WSP operated at high altitude [34-38].

 

The reviewer can also find our responses to his/her comments and other reviewers' comments in the attachment. 

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Dear authors, the corrections made in response to the suggestions and comments in my earlier report are satisfactory. I therefore gladly recommend this scientific manuscript for publication.

Best Wishes.
