Essay
Peer-Review Record

Nonlinearity and Spatial Autocorrelation in Species Distribution Modeling: An Example Based on Weakfish (Cynoscion regalis) in the Mid-Atlantic Bight

by Yafei Zhang 1,2,*, Yan Jiao 2,* and Robert J. Latour 3
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 14 October 2022 / Revised: 19 December 2022 / Accepted: 29 December 2022 / Published: 31 December 2022

Round 1

Reviewer 1 Report

Comments to authors:

This manuscript presents the results of a data analysis that the authors conducted to compare the effectiveness of different statistical tests (modeling approaches) in dealing with the nonlinear relationships and spatial autocorrelation that are often present in analyses of fish-habitat relationships. The authors used a data set spanning seven years of sampling of a marine fish species (weakfish, Cynoscion regalis) along the Mid-Atlantic coast of the United States. Below I provide comments that I hope will help the authors improve the manuscript.

 Overall Comments:

A. lines 72-78: This paragraph introduces readers to weakfish and explains why it was chosen as the example for this paper. The information is minimal and will not be informative to readers who do not work with marine fishes. I recommend adding more information about weakfish and describing the biological and ecological traits that make this marine fish species a good example. For example, are they migratory with a wide range, or do they typically occur in a limited range? Are they small-bodied or large-bodied? Are they long-lived or short-lived? Are they predators of other fishes, or do they feed on zooplankton? Other traits could also be discussed. I am not suggesting that the authors discuss all of them, just some additional ones that highlight why weakfish data may be nonlinear and exhibit spatial autocorrelation, and that convey to the reader what type of fish weakfish are.

B. line 89-96: More details are needed here on fish collection methods to confirm for the reader that the data collected from different locations are comparable. What sampling method was used to capture weakfish? Was it trawling? If so, what type of trawl was used, how big were the trawls, and what was their mesh size? Also, how much area did the trawls sample at each location during the survey? Finally, please confirm whether any of the sampling locations were sampled repeatedly over the seven-year period. If so, this likely requires mixed-effects model analyses to address pseudoreplication and potentially invalidates the use of generalized linear models and GAMs.

C. The Methods section needs a subsection that clearly describes the analytical pipeline used to evaluate the 10 different models. Additionally, this new subsection needs to state exactly which computer program was used to conduct these analyses. This is a critical detail, because without it someone else could not repeat the study.

D. The order of presentation of the results needs to be revised. The first thing presented in the Results should be the tests showing that the weakfish data set exhibits nonlinear relationships with the selected habitat variables and exhibits spatial autocorrelation, together with a qualitative description of whether the data set shows low, moderate, or high levels of nonlinearity and spatial autocorrelation. This is a key point, because if the data are not nonlinear and do not exhibit spatial autocorrelation, then this comparison of models is moot.

Once the authors have established that the weakfish data are nonlinear and exhibit spatial autocorrelation, they should present the results of their model comparisons (Table 4). However, a major change is needed, because the results presented in Table 4 are flawed: the habitat variables are not the same for all models. Delta GAM positive, Delta GAM prob, Delta spatial GAM positive, and Delta spatial GAM prob all include water temperature, whereas the other six models do not. Including an additional habitat variable would alter the AIC values and thus invalidate the comparisons. I strongly recommend that the authors redo this analysis with a standard set of habitat variables (latitude, year, month, depth, and salinity) and present the revised results. In this part I also recommend presenting revised results from Table 6, showing 3-fold cross-validation results from the models fitted with the standard set of habitat variables.

Following the comparisons of the 10 models with a standard set of habitat variables, the section should end with the comparisons of which habitat variables were selected (Table 1) and the training and testing errors from the habitat variable selection analysis (Table 2).
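The 3-fold cross-validation requested above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: folds are assigned at random (not spatially blocked), mean squared error is the test metric, and `fit`/`predict` are hypothetical placeholders for whichever candidate model is evaluated. The reviewer's point is that every candidate must be fitted with the same habitat variables, so only the model form changes between runs.

```python
import random


def k_fold_indices(n, k=3, seed=0):
    """Randomly partition observation indices 0..n-1 into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse(x, y, fit, predict, k=3, seed=0):
    """Mean squared prediction error over k held-out folds.

    fit(x_train, y_train) returns a fitted model and predict(model, x_i)
    returns a prediction; both are hypothetical placeholders for a
    candidate model fitted with the SAME habitat variables.
    """
    folds = k_fold_indices(len(y), k, seed)
    sq_errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        sq_errors.extend((y[i] - predict(model, x[i])) ** 2 for i in test)
    return sum(sq_errors) / len(sq_errors)


# Hypothetical baseline "model": predict the training-set mean catch.
def fit_mean(x_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x_i):
    return model


# Made-up catches for illustration only.
catches = [0.0, 1.0, 3.0, 0.0, 2.0, 5.0, 0.0, 4.0, 1.0]
print(cv_mse(list(range(len(catches))), catches, fit_mean, predict_mean))
```

Using the same seed for every candidate model keeps the fold assignment identical across models, so differences in test error reflect the models rather than the data split.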

E. It is confusing for me that some tables have 10 models, some have six models.  I could not figure out why.   This inconsistency needs to get resolved and clarified. 

Specific Comments:

line 2: Delete “Title:”

line 12:  I do not consider nonlinearity and spatial autocorrelation “essential characteristics” in fish distributions.  Often scientists analyzing data try to avoid these issues rather than making sure they occur in the data sets/analyses.  Thus, replace “essential characteristics” with something else.   Suggestion - “common features observed in marine fish data sets, but..”

line 24-25:  revise as follows:  ….GAM as a potential candidate model when….

line 27:  “spatial autocorrelation” and “weakfish” are terms included in the title.  Either delete these terms from the keywords or replace them with other terms

line 30 and throughout the manuscript: what is meant by species distributional data? Is this presence/absence of species, abundance, density, catch per unit effort (i.e., biomass captured per unit effort), or all four? Please clarify here. Also, in your revisions please make sure that other places mentioning data types clearly convey what the fish data type is. I did not try to identify all places with this issue in my review.

line 33:  insert “future” between “predicting” and “abundance”

line 39 and throughout manuscript:  I recommend against the use of the acronym “SAC” and simply writing out spatial autocorrelation – this small change will make the manuscript easier for the reader to read. 

line 57: replace “populations” with “species”

line 58-59: this sentence needs rewording for clarity. I understand the intent: marine fishes live and occur in close proximity to each other, and this close proximity results in similar catch rates among species. However, as stated, this sentence does not convey that thought.

line 72: When I first read this I thought “Atlantic weakfish” was a different common name for weakfish. I understand that the authors mean weakfish that occur in the Atlantic Ocean, but I recommend deleting “Atlantic” here to avoid confusion.

line 78:  revise as follows:  …motivated us to conduct…

line 86: replace “NC” with “North Carolina”

line 89: delete “Atlantic weakfish is used as case study.” This is unnecessary repetition, as this was already stated in the Introduction.

line 99: replace “is” with “was”

line 105: need to clarify what you mean by highly correlated variables – what level of r is considered high? 

line 108: replace “is” with “was”

line 108:  Are you justified in using AIC instead of its small sample correction AICc?  Need to report the n/K information and confirm that your n/K is > 40

line 110-111: Delta AIC, as defined by Burnham and Anderson in their 2002 book, is typically calculated as the difference between the AIC of the model being compared and that of the best model (the one with the lowest AIC value). This is a good standard for comparisons. As written, it sounds like the Delta AIC used here does not follow that standard. If it does not, the authors need to justify why and clarify the difference in methodology.

line 118:  here state 5 types of models being compared, but Table 1 of Results shows 10 different model types.  Need to reword and address this inconsistency.  Also, here no mention of the linear mixed effects model so the revision needs to summarize all model types evaluated

line 130: provide a brief description of how delta-lognormal models differ from zero-inflated and hurdle models.  This will give context to readers not familiar with this type of model. 

Table 1 – I recommend presenting Delta AIC instead of AIC values as this will enable the readers to interpret the results quicker

Table 4 last row - R2? How can you report an R2 value when generalized linear models, linear mixed-effects models, and generalized linear mixed-effects models do not provide an R2 value? Pseudo-R2 values can be calculated and reported, but their calculations differ among the different tests. Instead, I recommend that the authors replace the R2 values with Akaike weights (wi) to complement the comparisons of Delta AIC among models.
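For reference, the AIC quantities raised in these comments (the AICc correction and the n/K check, Delta AIC relative to the best model, and Akaike weights) can be computed as in the minimal sketch below, which follows the standard Burnham and Anderson (2002) definitions. The function names are hypothetical, and the AIC values are made up for illustration; they are not results from the manuscript.

```python
import math


def aicc(aic: float, n: int, k: int) -> float:
    """Small-sample corrected AIC: AICc = AIC + 2K(K + 1) / (n - K - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= K + 1")
    return aic + 2 * k * (k + 1) / (n - k - 1)


def prefer_aicc(n: int, k: int, threshold: float = 40.0) -> bool:
    """Rule of thumb: use AICc instead of AIC when n/K < 40."""
    return n / k < threshold


def delta_aic(aics: dict[str, float]) -> dict[str, float]:
    """Delta_i = AIC_i - min(AIC), so the best model has Delta = 0."""
    best = min(aics.values())
    return {name: value - best for name, value in aics.items()}


def akaike_weights(aics: dict[str, float]) -> dict[str, float]:
    """w_i = exp(-Delta_i / 2) / sum_j exp(-Delta_j / 2); weights sum to 1."""
    deltas = delta_aic(aics)
    rel_lik = {name: math.exp(-d / 2) for name, d in deltas.items()}
    total = sum(rel_lik.values())
    return {name: r / total for name, r in rel_lik.items()}


# Hypothetical AIC values for illustration only (not from the manuscript).
aics = {"GLM": 1210.0, "GAM": 1150.0, "Spatial GAM": 1144.0}
print(delta_aic(aics))  # {'GLM': 66.0, 'GAM': 6.0, 'Spatial GAM': 0.0}
print(akaike_weights(aics))
```

Note that Delta AIC and Akaike weights are only meaningful when all models are fitted to the same response data, which is another reason for the standard set of habitat variables requested in Overall Comment D.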

Author Response

Thanks so much for your careful review. Please see our responses in the word file attached. Thanks! 

Author Response File: Author Response.pdf

Reviewer 2 Report

The manuscript provides important information about the selection of optimal models for analyzing the spatial distribution of weakfish off the eastern coast of North America and can be published after some improvements. I think that, in addition to purely modelling information, it requires more contextual information. For instance, the authors write: "Further, broader scale temporal and/or spatial variables (e.g., climate, environmental, anthropogenic) not examined in this study could be structuring weakfish in the mid-Atlantic, and analyses of these represent critical next steps". To justify that this is needed, the authors should state what fraction of the total variation in spatial distribution is not explained by the models, because such data need to be collected and analysed only if this fraction is large enough. Also, it would be interesting to know which species (in terms of their life history patterns) the Delta spatial GAM can be effectively applied to.

Technical remarks:

I would put Table 1 after, rather than before, the first paragraph of the Results section.

It is unclear why Fig. 5 is placed in the Discussion rather than in the Results section.

 

Author Response

Thanks so much for your careful review. Please see our responses in the word file attached. Thanks! 

Author Response File: Author Response.pdf

Reviewer 3 Report

See attached pdf review.

Comments for author File: Comments.pdf

Author Response

Thanks so much for your careful review. Please see our responses in the word file attached. Thanks! 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

This is a revised version of a manuscript that I reviewed previously. It presents the results of a data analysis that the authors conducted to compare the effectiveness of different statistical tests (modeling approaches) in dealing with nonlinear relationships and spatial autocorrelation. The authors used a data set spanning seven years of sampling of a marine fish species (weakfish, Cynoscion regalis) along the Mid-Atlantic coast of the United States. Below I provide comments that I hope will help the authors improve the manuscript.

Overall Comments:

A. line 98-112: More details are needed here on fish collection methods:

-what stratification factors and levels were used in the stratified random sampling design? This needs to be described so that readers understand how sampling locations were selected.

-also with respect to sampling location – need to confirm for the reader if any of the sampling locations were sampled more than once during the seven year period.  If so, then need to acknowledge the presence of pseudoreplication within the data and the potential need to use linear mixed effect model analyses

-what type of trawl was used?

-how big were the trawls? 

-what was the mesh size of the trawls? 

-with respect to trawls, need to confirm for the reader that the same trawl type, size, and mesh size were used at all locations and during all years.

B. Section 2.2.6 Software:  This section needs to add more details so that the readers know exactly what R function and packages were used to conduct the analyses.  This is critical detail, because without it someone else could not repeat the study.   Specifically, the following details are needed:

-need to provide a citation for the R version used at the end of the first sentence

-need to provide package information and a citation for glm and the package used

-also, to my knowledge (and I double-checked this in the R CRAN documentation for the glm function), the glm function does not run a hurdle GLM or delta GLM. So how can glm be used to conduct the delta GLM as part of the analyses? Is a data transformation being used to run a delta GLM with the standard R glm function, or have the authors created their own R function to do so? Either way, these details all need to be described and shared.

-what function was used to run delta GAM with the mgcv package? 

-again, to my knowledge the mgcv package does not run a delta GAM, so how is a delta GAM being run with the mgcv package?

-what function from the spatialreg package was used to run the SAR models?

-need to provide citations for the mgcv and spatialreg packages

-what function and package was used to run the linear mixed effect models? 

-what function and package was used to run the autocovariate models?  

C. With respect to missing statistical analysis information: Table 1 also lists GLM prob, GAM prob, and Spatial GAM prob, but the text does not describe what these models are or which functions and packages were used to run them.
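For context on the delta-model questions above: a delta (two-part) model is commonly fitted as two separate models, a binomial model for the probability of a nonzero catch (presumably what the "prob" entries in Table 1 denote) and a model for the positive catches only (log-normal in the delta-lognormal case), whose predictions are multiplied. In R this is often done with two separate `glm` (or `gam`) calls, one on the zero/nonzero indicator and one on the log of the positive catches. Whether the authors did this is exactly what the reviewer asks them to document. The sketch below is an intercept-only illustration of the combination step, using the simple plug-in estimator E[catch] = p·exp(μ + σ²/2); it is not the authors' code, and the tow data are made up.

```python
import math
import statistics


def delta_lognormal_mean(catches):
    """Intercept-only delta-lognormal estimate of mean catch per tow.

    Part 1 (binomial): p = Pr(catch > 0), estimated as the share of nonzero tows.
    Part 2 (lognormal): mean (mu) and variance (s2) of log catch, positive tows only.
    Combined: E[catch] = p * exp(mu + s2 / 2)  (simple plug-in estimator).
    """
    if not catches:
        raise ValueError("need at least one observation")
    positives = [c for c in catches if c > 0]
    p = len(positives) / len(catches)
    if not positives:
        return 0.0  # no nonzero catches: expected catch is estimated as zero
    logs = [math.log(c) for c in positives]
    mu = statistics.fmean(logs)
    s2 = statistics.variance(logs) if len(logs) > 1 else 0.0
    return p * math.exp(mu + s2 / 2)


# Made-up tow catches for illustration only (many zeros, as in trawl data).
tows = [0, 0, 3, 0, 12, 5, 0, 0, 1, 7]
print(round(delta_lognormal_mean(tows), 3))
```

With covariates, each part becomes a regression (e.g., a binomial GLM or GAM and a Gaussian model on log catch), but the final multiplication of the two predicted components is the same.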

D. Previously, I indicated that the authors needed to demonstrate spatial autocorrelation and nonlinearity in the weakfish data set. The authors’ results confirm the presence of spatial autocorrelation, but they have not explicitly confirmed that the data are nonlinear. Their results showing the advantages of delta GAM and delta spatial GAM, which are nonlinear models, indicate that the data are nonlinear (better performance with nonlinear than with linear models). This aspect of the results needs to be highlighted in the Results section, clearly conveying that the data set is nonlinear.

 

Specific Comments:

line 2: Delete “Title:”

line 20: Define “GAM” with first usage

line 68:  insert “)” at end of “(SAR)”

line 77:  replace “is” with “was”

line 82:  replace “Apr” with “April”

line 83-84: replace “Oct” with “October” and “Nov.” with “November”

line 132: five types?  Abstract says six.  Which one is it? 

line 169:  should this be “smoothing” rather than “smooth”

line 227: should “auto-covariate regression” be included in this list?  It is listed in Table 1.  

line 234: “newly surveyed data” is not a good word choice here, because readers could interpret it to mean data that were newly collected in the field. Please use a word or words that confirm that the data sets were created and/or simulated; I suggest “simulated datasets”. Whatever wording is selected, be sure to update Figure 2 accordingly.

Table 1: I suggest that the smallest AIC value in each column or in each row be bolded.  Highlighting the smallest values in a column or row will provide a reference point for the readers

line 321: replace “showed” with “indicated”

line 363-364:  delete “and large R2 values” because no R2 values were presented. 

line 408: Previously the authors specified that they used ten variables in all models. So in which analysis did they use only three variables? If this is not a typo, the authors need to revise the Methods section to confirm this change.

Author Response

Please see attached. Thanks. 

Author Response File: Author Response.docx

Reviewer 2 Report

"More data is needed if the unexplained fraction of the total variation of spatial autocorrelation is large". My question was about the quantitative proportion of explained and unexplained variation. Please indicate if it is possible to estimate

Author Response

Thanks again for your careful review. R2 or adjusted R2 can be used to assess the variation explained by models. We can compare the R2 values of the models with and without spatial autocorrelation to indirectly evaluate the variation explained by spatial autocorrelation, but we cannot directly estimate and separate the variation explained by spatial autocorrelation. Our study found that using a nonlinear model increased R2 values by about 35% (from 26.5% in GLM to 35.7% in GAM) and using a spatial model increased R2 values by about 10% (from 35.7% in GAM to 39.1% in spatial GAM). Other spatial models also improved R2 values. We included R2 for each model in the initial version but removed it, as suggested by another reviewer, because AIC was already included.
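The "about 35%" and "about 10%" figures in the response above are relative increases, not percentage-point differences. The short calculation below checks this using only the R2 values quoted in the response; the helper name is hypothetical.

```python
def relative_increase(old: float, new: float) -> float:
    """Relative change in percent: (new - old) / old * 100."""
    return (new - old) / old * 100.0


# R2 values (%) as quoted in the author response.
glm_to_gam = relative_increase(26.5, 35.7)          # about 34.7%, i.e. "about 35%"
gam_to_spatial_gam = relative_increase(35.7, 39.1)  # about 9.5%, i.e. "about 10%"

print(f"GLM -> GAM: {glm_to_gam:.1f}% relative ({35.7 - 26.5:.1f} percentage points)")
print(f"GAM -> spatial GAM: {gam_to_spatial_gam:.1f}% relative ({39.1 - 35.7:.1f} percentage points)")
```

In absolute terms the gains are 9.2 and 3.4 percentage points, which is the quantity more directly related to the reviewer's question about explained versus unexplained variation.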

Reviewer 3 Report

I don't feel that my main points have been adequately dealt with. The second version is not a substantial improvement over the first. The ms. is still narrow in its focus and of little interest to anyone not directly involved with the NEAMAP survey. It still has a title that makes the reader think of an important methodological innovation, when in fact it is just the application of a GAM and other linear approximations.

Author Response

Thank you again for the careful review of our manuscript. Again, our goal is to provide a framework and general methodology for most cases of fish species distribution. To reach this goal, we added a simulation study covering cases with and without spatial autocorrelation, and with linear and nonlinear relationships between environmental factors and fish distribution and density. Please see the flowchart (Figure 2) and Section 2.3 in the manuscript for details. Our simulations found that in all of these scenarios, spatial GAM outperformed the other models based on mean squared error. This indicates that the recommended model works well not only for the NEAMAP survey but also for other cases, such as the simulated scenarios.
