Nonlinearity and Spatial Autocorrelation in Species Distribution Modeling: An Example Based on Weakﬁsh ( Cynoscion regalis ) in the Mid-Atlantic Bight

: Nonlinearity and spatial autocorrelation are common features observed in marine ﬁsh datasets but are often ignored or not considered simultaneously in modeling. Both features are often present within ecological data obtained across extensive spatial and temporal domains. A case study and a simulation were conducted to evaluate the necessity of considering both characteristics in marine species distribution modeling. We examined seven years of weakﬁsh ( Cynoscion regalis ) survey catch rates along the Atlantic coast, and ﬁve types of statistical models were formulated using a delta model approach because of the high percentage of zero catches in the dataset. The delta spatial generalized additive model (GAM) conﬁrmed the presence of nonlinear relationships with explanatory variables, and results from 3-fold cross-validation indicated that the delta spatial GAM yielded the smallest training and testing errors. Spatial maps of residuals also showed that the delta spatial GAM decreased the spatial autocorrelation in the data. The simulation study found that the spatial GAM over competes other models based on the mean squared error in all scenarios. That indicates that the recommended model not just works well for the NEAMAP survey but also for other cases as in the simulated scenarios.


Introduction
Species distributional data, such as biological survey data that often include capture location, time, and animal abundance in number or weight, are frequently used for analyzing the relationship between species distribution and environmental factors.Such analyses can help shape the foundation of species management and conservation procedures by predicting future abundance distributions based on future environmental conditions and spatial management policies.Spatial data analysis was introduced into fisheries during the late 1980s to improve understanding of species distribution and stock assessments for spatially aggregated species [1,2], and efforts have been directed at detecting the spatial distribution patterns of fish stocks [3,4].
Spatial data analysis is complex due to spatial autocorrelation and nonlinearity.Spatial autocorrelation is a phenomenon where samples are not independent from each other at nearby locations [5].Spatial nonlinearity, which reflects the nonlinear relationship between species density and a suite of predictor variables, has been widely observed for many marine species distributions [6,7].Multiple factors can cause spatial autocorrelation [8][9][10], such as biological processes (i.e., distance-related dispersal or species interactions), inappropriate modeling of nonlinear relationships between environmental factors and species, or failure to incorporate critical environmental variables that exhibit spatial structure [11].The second and the third factors are sometimes considered spatial dependency rather than spatial autocorrelation [12].
Generalized linear models (GLMs) and generalized additive models (GAMs) are commonly used to model the relationship between catch rates, which are often defined as the number of fish captured per unit of effort, and environmental, spatial, and temporal factors [13,14].Effort in fisheries surveys can be measured as tow distance, soak time, or spatial area fished.GLMs assume a linear relationship between the link function of the expected value of response variable and explanatory variables [15,16], while GAMs are an extension of GLMs to deal with nonlinear relationships between the link function of the expected value of the response variable and explanatory variables.However, both GLMs and GAMs assume that catch rate observations are independent across the survey domain, which may not be appropriate for fish species that exhibit spatially aggregated distributions.Similar catch rates can occur because many marine fishes live and occur in close proximity to each other.Thus, spatial autocorrelation can be problematic for species distribution modeling and catch rate standardization when using standard GLMs and GAMs.Alternative methods have been used to deal with spatial autocorrelation in biological survey data [17,18], such as the autoregressive model (simultaneous autoregressive model (SAR) and conditional autoregressive model (CAR)), which incorporates a spatial weight matrix where the neighborhood of each sampling location and weight of each neighbor is included in the standard linear model.Another method is the generalized linear mixed model (GLMM), in which the linear predictors include random effect and within subject error can be spatially autocorrelated [19].Generalized least square (GLS) adds a weight matrix, which is calculated by some correlation functions, on both sides of the regression and obtain least square estimators [20].The spatial dependence of a location on neighboring locations is modeled within the variance-covariance matrix [21][22][23].
Survey data of weakfish (Cynoscion regalis) along Atlantic coast was used as dataset in this study.Weakfish are found along the western coast of the Atlantic Ocean from New York to North Carolina where they support recreational and commercial fisheries.Since the early 1980s, weakfish landings have declined steadily and have reached a historic low in recent years.Weakfish exhibit northerly, inshore migrations during spring (April-May), entering estuaries and bays along the eastern U.S. coast to feed and spawn.As water temperatures cool during fall (October-November), fish form aggregations and engage in southerly, offshore movements to overwintering grounds.The most recent weakfish stock assessment concluded the stock was depleted with overfishing not occurring.A key research recommendation from the assessment motivated us to conduct temporal and spatial analyses of fishery-independent survey data [24].
The main goal of this study was to diagnose the necessity of considering nonlinearity and spatial autocorrelation in fish distribution based on the case-study weakfish survey data and through a simulation analysis.During this study we also evaluated the performance of spatial, non-spatial, and nonlinear models in analyzing spatially autocorrelated survey data, and explore the influence of environmental factors on catch rate data for weakfish collected from the nearshore mid-Atlantic Bight from southern New England to Cape Hatteras, North Carolina.

Weakfish Case Study
The data for weakfish were obtained from a fishery-independent survey, namely the Northeast Area Monitoring and Assessment Program (NEAMAP).We used the NEAMAP survey database from 2007 to 2013, which included 1820 samples.The survey began in fall 2007, and spring and fall cruises have been conducted annually.Locations of sampling sites, which follow a stratified random sampling design, range from 35.16 to 41.44 • N, and 70.87 to 75.99 • W. The survey area strata were based on longitudinal zones and water depth [25].The survey area was divided into 7 longitudinal zones from Martha's Vineyard through New York and 10 latitudinal zones from New Jersey to North Carolina.The boundaries of these zones The net used by the NEAMAP near shore trawl survey was a three-bridle trawl, the fishing circle of which is 400 meshes of 12 cm, 4 mm braided polyethylene (PE) twine (4800 cm fishing circle).The net codend was made of 12 cm, double 4 mm braided PE with a 2.54 cm knotless nylon liner.The sampling frame consists of 2006 1.5 × 1.5 min cells, with each cell considered to be a sampling unit, and the net is trawled along the bottom for 20 min, at a speed of 2.9-3.3 knots [25].Eighty samples were removed due to missing values for explanatory variables, reducing the total number of samples to 1740 (Figure 1).Nine explanatory variables were available, including seven continuous variables (depth, water temperature, percentage of oxygen saturation, salinity, dissolved oxygen, latitude, and longitude) and two categorical variables (year and month).Catch rate was calculated as the ratio of biomass collected (kg) and tow distance (m) of the trawl net.The net used by the NEAMAP near shore trawl survey was a three-bridle trawl, the fishing circle of which is 400 meshes of 12 cm, 4 mm braided polyethylene (PE) twine (4800 cm fishing circle).The net codend was made of 12 cm, double 4 mm braided PE with a 2.54 cm knotless nylon liner.The sampling frame consists of 2006 1.5 × 1.5 min cells, with each cell considered to be a sampling unit, and the net is trawled along the bottom for 20 min, at a speed of 2.9-3.3 knots [25].Eighty samples were removed due to missing values for explanatory variables, reducing the total number of samples to 1740 (Figure 1).Nine explanatory variables were available, including seven continuous variables (depth, water temperature, percentage of oxygen saturation, salinity, dissolved oxygen, latitude, and longitude) and two categorical variables (year and month).Catch rate was calculated as the ratio of biomass collected (kg) and tow distance (m) of the trawl net.The Spearman correlation among all explanatory variables was examined to detect highly correlated variables.The explanatory variables were selected through a stepwise selection based on Akaike's Information Criterion (AIC).Interactions between environmental factors were not considered to avoid additional multicollinearity problems and model interpretation difficulties [14,26].The selected variables from the GLMs were then used in the spatial autoregressive models and autocovariate models because both types are extensions of GLMs.

Models Considered
Models that considered nonlinearity and spatial autocorrelations were considered beyond the commonly used generalized linear model.We included five types of models: GLM, GAM, spatial GAM, autoregressive model, and auto-covariate regression [27].These models allowed us to assess the performance of models that accommodating nonlinearity, spatial autocorrelation, or both.

Generalized Linear Model (GLM)
The generalized linear model is among the most commonly used, and the default model structure in the study of species distribution and habitat quality analyses [15].A GLM is usually written as: where g() is the link function, y is the response variable, β 0 is the intercept, β i is the fixed-effect coefficient for variable i, and x i is the ith explanatory variable [28].Because the NEAMAP weakfish data contained a large number of zeros (48%) and plots of the nonzero catch rates were positively skewed, delta-lognormal model structures were considered [29].The delta model is also referred to as a hurdle model, which models zero observations and positive observations fully separately.A delta model contains two components; the first is fitted to estimate the probability of obtaining non-zero captures (Equations ( 2) and ( 3)), and the second is fitted to the positive observations (Equation ( 4)).The final index of relative abundance is obtained by multiplying these two model components [14,[29][30][31][32][33][34].To estimate the probability of nonzero observation, values of 0 (no fish captured) and 1 (at least one fish caught) are regarded as realizations of a Bernoulli random variable with a probability q of positive catch.The parameter q can be estimated by a GLM, which is accomplished through a Bernoulli distribution assumption: where q is the probability that at least one weakfish is captured and the α's are the regression coefficients.The model for positive catch, d in this study was assumed to be a normal distribution as follows: where the θ's are the regression coefficients.

Generalized Additive Model (GAM)
A GAM is a nonparametric generalization of a GLM with additive predictors rather than linear predictors [16].A delta-GAM again makes use of the Bernoulli distribution: where q is the probability of positive observation and f i is the smoothing function for the explanatory variable x i .A GAM with normal distribution is written as: where s i is the smoothing function for the explanatory variable x i .

Spatial GAM
A spatial GAM is an extension of an ordinary GAM by adding an interaction term between longitude and latitude as a smoothing surface.Similarly, a delta spatial GAM makes use of the Bernoulli distribution for the presence or absence of catching: Additionally, spatial GAM with normal distribution is written as:

Autoregressive Models
The delta GLM and delta GAM models are not appropriate for spatially autocorrelated data since observations in the data are not independent.In this situation, autoregressive models could be introduced to the analysis.For the Bernoulli data, an auto-covariate model is applied (see next section for details), while for the positive catch rate observations, autoregressive models are used to account for spatial autocorrelation.
For normally distributed data in linear models, spatial autocorrelation can be incorporated by autoregressive models such as the simultaneous autoregressive model (SAR).SAR models assume that the value of the response variable at location u is not only a function of the explanatory variables, but is also related to the neighboring locations v [23,35,36].The neighborhood relationship among each location is expressed in an n × n binary spatial weight matrix (W), with elements (w uv ) representing connections between u and v.The spatial weight matrix is specified by identifying the neighborhood structure of each cell.Here, the neighborhood was defined to be 100 km as determined by the semivariogram.
For the positive catch rate data, three different SAR models were compared to explore various hypotheses about the occurrence of spatial autocorrelation [20,35].The SAR error model assumes that spatial autocorrelation is found only in the error term such that the GLM (Y = Xβ + ε) is augmented by the inclusion of a spatial structure term (λW) with the spatial error term (µ): where λ is the spatial autoregression coefficient, W is the spatial weight matrix, β is a vector representing the slopes associated with the predictors in the original predictor matrix X, and ε is the identically distributed independent error.The SAR lag model assumes the autoregression process only occurs in the response variable ("inherent spatial autocorrelation"), and includes the term (ρW) to account for the spatial autocorrelation in the response variable: where ρ is the autoregression coefficient, and the remaining terms are as above.The SAR mixed model assumes spatial autocorrelation in both the response and predictor variables.In this case, another term (WXγ) is introduced in the model, which represents the regression coefficients (γ) of spatial lagged explanatory variables (WX):

Auto-Covariate Regression
Applications of SAR models to binary data have been limited [26,37].However, autocovariate regression is applicable in this situation, which is an extension of the Bernoulli GLM by adding a distance-weighted function of neighboring responses [38].The additional parameter is referred to as an auto-covariate, which is, in this case, applied to capture the spatial autocorrelation within Equations ( 3) and ( 5) of the delta-GLM and delta-GAM models, respectively.An auto-covariate regression is written as, where q is the probability of positive observation, θ is a vector of fixed-effect coefficients, X is a matrix of explanatory variables, ρ is the covariate of A such that where y j is the response value at site j among i's set of k i neighbors and w ij is the weight given to site j's influence over site i [39].The auto-covariate regression was used for the presence-absence data and combined with SAR models to form delta SAR models.

Software
All the analyses were conducted in R version 4.0.2[40].Parameters of the GLMs and auto covariate model were estimated using the "glm" function.Parameters of the GAMs were estimated using "gam" function in the "mgcv" package [41].Parameters of the SAR error model were estimated using the "errorsarlm" function, and parameters of the SAR lag model and SAR mixed model were estimated using the "lagsarlm" function from the "spatialreg" package [42].

Simulation
Three simulation scenarios were conducted, respectively with delta GLM, delta GAM, spatial delta GAM, and delta autoregressive models to compare their performances across the survey area under different scenarios (Figure 2).In the first scenario, a delta GLM is fitted to the real survey data to estimate the coefficient of each explanatory variable and the regression residuals.The "true" abundance at each survey point was simulated by applying the estimated coefficient from the delta GLM to the explanatory variables and calculating the response variable by introducing the variance from the regression residuals.Then, delta GAM, spatial delta GAM, and delta spatial models were fitted to the simulated data, respectively, and a standardized catch rate was be obtained.Lastly, the mean squared error between the predicted catch rate and the "true" catch rate was calculated.In the second and third scenarios, we used the same procedure except the simulated catch rates were from delta GAM and delta SAR error models, respectively.Five hundred simulations were conducted for each scenario.between the predicted catch rate and the "true" catch rate.

Model Evaluation
Three model selection approaches were considered to select the most appropriate model: (1) AIC [43], (2) spatial distribution maps of the spatial pattern of the distribution

Model Evaluation
Three model selection approaches were considered to select the most appropriate model: (1) AIC [43], (2) spatial distribution maps of the spatial pattern of the distribution of residuals and correlogram plots, and (3) cross-validation.AIC provides insight about model goodness-of-fit and complexity, the correlogram plots are indicative of Moran's coefficient on distance classes, and cross-validation can assess the performance of model prediction.

Akaike's Information Criterion (AIC)
The AIC function is expressed as: where p is the number of parameters in the model and L is the maximized value of the likelihood function for the model.AIC is particularly useful when dealing with the trade-off between model complexity and goodness-of-fit, and the model with minimum AIC value is preferred.

Cross-Validation
Multi-fold cross-validation was used to assess model fit (training error) and prediction accuracy (test error) [16,26].To conduct k-fold cross-validation, the full dataset was randomly divided into k sub-datasets with equal sizes.Each sub-dataset was then used as a test dataset to predict, while the remaining k-1 sub-datasets were considered as training data for model fitting.Training and test error was computed as: where N is the number of observations, y i is the ith observation, and ŷi is the estimated value.Three-fold cross-validations were performed for the delta models for 500 iterations, and the model that produced lower training and testing errors was preferred.

Results
Correlation coefficients among explanatory variables were high between latitude and longitude (0.91), dissolved oxygen and percentage of oxygen saturation (0.81), and water temperature and month (0.89) (Figure 3).Take water temperature and month for example: both GLM and GAM models that included month and water temperature yielded the smallest AICs (Table 1).Thus, water temperature and month are kept in all the models.Similarly, variables longitude and percentage of oxygen saturation were eliminated before a stepwise selection for all models.A stepwise procedure was applied to the remaining variables included in the delta models and spatial models, and variables with significant effects (p-values < 0.05) were retained.Thus, the variables kept in all ten models were identical: year, month, water temperature, depth, salinity, and latitude.
Among the candidate model types, the spatial GAM provided the smallest AIC for the positive catch rate data, followed by the GAM (3883, Table 1).The spatial GAM model also produces the smallest AIC value for estimating the probability of positive catches, followed by the GAM (1554).
Residuals of all six candidate model types yielded significant Moran's I statistics (Table 2).The spatial correlogram plots showed that the presence of spatial autocorrelation in the residuals was evenly distributed for the fitted spatial autocorrelation models (Figure 4).The different spatial autocorrelation models tended to capture spatial autocorrelation at different scales.For example, the delta spatial GAM and delta GAM models managed to consistently decrease spatial autocorrelation in the residuals, while the delta SAR mixed model could not eliminate it.Among the candidate model types, the spatial GAM provided the smallest AIC for the positive catch rate data, followed by the GAM (3883, Table 1).The spatial GAM model      Maps of residuals from the six delta models indicated that the delta spatial GAM was less autocorrelated when compared to the other models (Figure 5, Figures A1-A6 for the larger view).The primary reasons supporting this conclusion are the lack of dark points of residuals and the general similarity in color.The residuals maps of the delta SAR lag Maps of residuals from the six delta models indicated that the delta spatial GAM was less autocorrelated when compared to the other models (Figures 5 and A1-A6 for the larger view).The primary reasons supporting this conclusion are the lack of dark points of residuals and the general similarity in color.The residuals maps of the delta SAR lag model and delta SAR mixed model exhibit darker red and darker blue points, suggesting that these two models cannot adequately accommodate the autocorrelation in the data.
Results of the 3-fold cross-validation indicated that the delta spatial GAM model provided the smallest training error and testing error on average, followed by the delta GAM and delta SAR error models (Table 3).This result, in combination with the generally favorable delta spatial GAM residual maps, supports that the spatial GAM best performed in fitting weakfish spatial distribution in the mid-Atlantic Bight.model and delta SAR mixed model exhibit darker red and darker blue points, suggesting that these two models cannot adequately accommodate the autocorrelation in the data.In the simulation, the delta spatial GAM achieved the smallest mean squared error between predicted and "true" abundance when compared to other candidate models, and the delta GAM ranked second in all three scenarios, regardless of the "true" models (Table 4).The other spatial models did not perform as well as the non-spatial models and the delta SAR mixed model performed the worst in all three scenarios.

Discussion
Our analyses provide evidence that the delta spatial GAM performs well when fitted to data with appreciable spatial autocorrelation and a high percentage of zeroes.Failing to account for spatial autocorrelation among samples may cause imprecision and inaccuracy in parameter estimation, inferences regarding the importance of explanatory variables, and model predictions [37].However, with spatial models, over-fitting may decrease the prediction ability, and over-fitting usually occurs when a model is excessively complex.The inclusion of a spatial weight matrix based on neighboring samples does increase the complexity of spatial autoregressive models.Although the SAR model performed well in this study, its application does present a tradeoff between formally modeling spatial autocorrelation and a tradeoff between formally modeling spatial autocorrelation on one hand, and concerns about bias and precision on the other hand.Compared to the GLM models, all the spatial autoregressive models yielded smaller AIC values for both the positive catch rate data and the probability of non-zero catches.However, from 3-fold cross-validation, the training and testing errors from the delta SAR lag model and delta SAR mixed model are larger than those from delta GLM, indicating that over-fitting might be a problem when a delta SAR lag model or delta SAR mixed model is used.
Accommodating zero observations in spatial and non-spatial models is a well-documented challenge [44], particularly for datasets that exhibit positively skewed distributions (e.g., lognormal).Delta models along with zero-inflated and hurdle models, are commonly used to analyze fish survey data when a large fraction of zero observations is present.In general, delta GLM and delta (or delta spatial) GAM models are relatively easy to construct, but for spatial auto-regressive models, which can only be fitted to normally distributed data not containing excessive zero observations, we were forced to combine SAR and auto covariate regression models.In this application, the auto covariate regression model played the role of the Bernoulli models in delta GLM and delta GAM.
Variable selection is difficult when explanatory variables are collinear.SAR models are extensions of GLMs such that the variable selection strategies for SAR models are the same as those for GLMs.The explanatory variables month and water temperature were highly correlated and including both within the GLMs implied model over-fitting.However, variable selection strategies for GAMs can be relaxed when compared to those for GLMs, because the smoothing functions can often turn explanatory variables into non-parametric forms, thus making two linearly dependent variables independent [45].In this study, the variables month and water temperature could both be included in the GAMs, which likely contributed to those models providing better fits to the weakfish catch rate data than the GLMs.
All types of statistical models have their own strengths and weakness, and each can provide distinctive clarity on the importance of specific explanatory variables.One of the Fishes 2023, 8, 27 favorable properties of the delta spatial GAM and delta GAM is that they can yield stable and accurate estimation and prediction without incurring additional model complexity necessary to accommodate spatial autocorrelation [14].While this result holds for weakfish catch rate data collected in the mid-Atlantic Bight by the NEAMAP survey, generalizing it to the various types of spatially autocorrelated data commonly collected in the natural sciences warrants further investigation.Another advantage of GAMs is their ability to accommodate nonlinear relationships among response and explanatory variables, which were evident in our results (Figures 6 and 7) and have been shown for other fisheries data [25].The dramatic drop of AIC values in GAMs compared to GLMs and other spatial models also indicated the superiority of GAMs in dealing with non-linearity in the data (Table 1).
for GLMs, because the smoothing functions can often turn explanatory variables into nonparametric forms, thus making two linearly dependent variables independent [45].In this study, the variables month and water temperature could both be included in the GAMs, which likely contributed to those models providing better fits to the weakfish catch rate data than the GLMs.
All types of statistical models have their own strengths and weakness, and each can provide distinctive clarity on the importance of specific explanatory variables.One of the favorable properties of the delta spatial GAM and delta GAM is that they can yield stable and accurate estimation and prediction without incurring additional model complexity necessary to accommodate spatial autocorrelation [14].While this result holds for weakfish catch rate data collected in the mid-Atlantic Bight by the NEAMAP survey, generalizing it to the various types of spatially autocorrelated data commonly collected in the natural sciences warrants further investigation.Another advantage of GAMs is their ability to accommodate nonlinear relationships among response and explanatory variables, which were evident in our results (Figures 6 and 7) and have been shown for other fisheries data [25].The dramatic drop of AIC values in GAMs compared to GLMs and other spatial models also indicated the superiority of GAMs in dealing with nonlinearity in the data (Table 1).In this analysis, Moran's I values of residuals from the six candidate models were all significant, indicating spatial autocorrelation was still present (Figure 4).This problem might be due to insufficient explanatory variables, as only four environmental explanatory variables (water temperature, salinity, depth, and latitude) were considered in the models, and two of them were environmental factors (i.e., salinity and depth).All available explanatory variables measured synoptically with sampling were explored.However, it remains possible that other key variables significantly influence the abundance and spatial distribution of weakfish in the mid-Atlantic.Further, broader scale temporal and spatial variables (e.g., climate, environmental, anthropogenic) not examined in this study could be structuring weakfish in the mid-Atlantic, and analyses of these represent critical next steps.In this analysis, Moran's I values of residuals from the six candidate models were all significant, indicating spatial autocorrelation was still present (Figure 4).This problem might be due to insufficient explanatory variables, as only four environmental explanatory variables (water temperature, salinity, depth, and latitude) were considered in the models, and two of them were environmental factors (i.e., salinity and depth).All available explanatory variables measured synoptically with sampling were explored.However, it remains possible that other key variables significantly influence the abundance and spatial distribution of weakfish in the mid-Atlantic.Further, broader scale temporal and spatial variables (e.g., climate, environmental, anthropogenic) not examined in this study could be structuring weakfish in the mid-Atlantic, and analyses of these represent critical next steps.
In general, the delta spatial GAM performs well when fitted to the NEAMAP catch rate data for weakfish.We can also see that delta GAM is preferred for estimation and prediction but still has difficulty fully explaining spatial autocorrelation.While the SAR error model comparatively explains the spatial autocorrelation better, it cannot accommodate the nonlinearities among catch rate and explanatory variables revealed by the delta GAM.Delta spatial GAM combines the advantages of delta GAM and delta SAR error model: explaining the spatial autocorrelation as well as accommodating the nonlinearity.This conclusion may not hold for all fish species, and the model result may not remain the same for other datasets.However, this method could be a good option when dealing with nonlinearity and spatial autocorrelation especially when there are In general, the delta spatial GAM performs well when fitted to the NEAMAP catch rate data for weakfish.We can also see that delta GAM is preferred for estimation and prediction but still has difficulty fully explaining spatial autocorrelation.While the SAR error model comparatively explains the spatial autocorrelation better, it cannot accommodate the nonlinearities among catch rate and explanatory variables revealed by the delta GAM.Delta spatial GAM combines the advantages of delta GAM and delta SAR error model: explaining the spatial autocorrelation as well as accommodating the nonlinearity.This conclusion may not hold for all fish species, and the model result may not remain the same for other datasets.However, this method could be a good option when dealing with nonlinearity and spatial autocorrelation especially when there are large amounts of zero observations.Future fieldwork for the NEAMAP survey could focus on collecting more related variables measured with sampling.

Fishes
those established by the NEFSC (Northeast Fisheries Science Center's) Bottom Trawl Survey.For each longitudinal zone, the survey area is also stratified by 4 depth strata: 20-40 ft, 40-60 ft, 60-90 ft, and 90+ ft.

Fishes 2023, 8 ,
x FOR PEER REVIEW 3 of 22 sites, which follow a stratified random sampling design, range from 35.16 to 41.44° N, and 70.87 to 75.99° W. The survey area strata were based on longitudinal zones and water depth[25].The survey area was divided into 7 longitudinal zones from Martha's Vineyard through New York and 10 latitudinal zones from New Jersey to North Carolina.The boundaries of these zones corresponded roughly with those established by the NEFSC (Northeast Fisheries Science Center's) Bottom Trawl Survey.For each longitudinal zone, the survey area is also stratified by 4 depth strata: 20-40 ft, 40-60 ft, 60-90 ft, and 90+ ft.

Figure 1 .
Figure 1.Catch rate (log transformed) distribution map for NEAMAP weakfish survey data.Point color is proportional to the value of log transformed catch rate.

Figure 1 .
Figure 1.Catch rate (log transformed) distribution map for NEAMAP weakfish survey data.Point color is proportional to the value of log transformed catch rate.

Fishes 2023, 8 , 22 Figure 2 .
Figure 2. The flow chart of the simulation procedure to estimate the mean squared error (MSE) between the predicted catch rate and the "true" catch rate.

Figure 2 .
Figure 2. The flow chart of the simulation procedure to estimate the mean squared error (MSE) between the predicted catch rate and the "true" catch rate.

Figure 3 .
Figure 3. Pairwise scatter plots and Spearman correlation coefficients between explanatory variables collected by the NEAMAP survey.Abbreviations are as follows: DO = dissolved oxygen, PS = percentage of oxygen saturation, DE = depth, WT = water temperature, SA = salinity, LA = latitude, LO = longitude.

Figure 3 .
Figure 3. Pairwise scatter plots and Spearman correlation coefficients between explanatory variables collected by the NEAMAP survey.Abbreviations are as follows: DO = dissolved oxygen, PS = percentage of oxygen saturation, DE = depth, WT = water temperature, SA = salinity, LA = latitude, LO = longitude.

Figure 4 .
Figure 4. Correlation of residuals of six candidate models fitted to NEAMAP weakfish survey data.Highlighted dots represent significant values, and 1 distance class refers to 100 km ((a) residuals from delta GLM; (b) residuals from delta GAM; (c) residuals from delta spatial GAM; (d) residuals from delta SAR error model; (e) residuals from delta SAR lag model; (f) residuals from delta SAR mixed model).

Figure 4 .
Figure 4. Correlation of residuals of six candidate models fitted to NEAMAP weakfish survey data.Highlighted dots represent significant values, and 1 distance class refers to 100 km ((a) residuals from delta GLM; (b) residuals from delta GAM; (c) residuals from delta spatial GAM; (d) residuals from delta SAR error model; (e) residuals from delta SAR lag model; (f) residuals from delta SAR mixed model).

Figure 5 .Figure 5 .
Figure 5. Maps of residuals of six delta models fitted to NEAMAP weakfish survey data.Blue indicates positive, red indicates negative, and color darkness is proportional to the value of residualsFigure 5. Maps of residuals of six delta models fitted to NEAMAP weakfish survey data.Blue indicates positive, red indicates negative, and color darkness is proportional to the value of residuals ((a) residuals from delta GLM; (b) residuals from delta GAM; (c) residuals from delta spatial GAM; (d) residuals from delta SAR error model; (e) residuals from delta SAR lag model; (f) residuals from delta SAR mixed model).

Figure 6 .
Figure 6.Moving average of residuals over latitude for six candidate models ((a) residuals from delta GLM; (b) residuals from delta GAM; (c) residuals from Delta spatial GAM; (d) residuals from delta SAR error model; (e) residuals from delta SAR lag model; (f) residuals from delta SAR mixed model).

FishesFishes 22 Figure 6 .
Figure 6.Moving average of residuals over latitude for six candidate models ((a) residuals from delta GLM; (b) residuals from delta GAM; (c) residuals from Delta spatial GAM; (d) residuals from delta SAR error model; (e) residuals from delta SAR lag model; (f) residuals from delta SAR mixed model).

Figure A1 .
Figure A1.Map of residuals from delta GLM fitted to NEAMAP weakfish survey data.Blue indicates positive, red indicates negative, and color darkness is proportional to the value of residuals.

Figure A1 .
Figure A1.Map of residuals from delta GLM fitted to NEAMAP weakfish survey data.Blue indicates positive, red indicates negative, and color darkness is proportional to the value of residuals.

FishesFigure A2 .
Figure A2.Map of residuals from delta GAM fitted to NEAMAP weakfish survey data.Blue indicates positive, red indicates negative, and color darkness is proportional to the value of residuals.

Figure A2 .
Figure A2.Map of residuals from delta GAM fitted to NEAMAP weakfish survey data.Blue indicates positive, red indicates negative, and color darkness is proportional to the value of residuals.

Fishes 2023, 8 , 27 16 of 21 22 Figure A3 .
Figure A3.Map of residuals from delta spatial GAM fitted to NEAMAP weakfish survey data.Blue indicates positive, red indicates negative, and color darkness is proportional to the value of residuals.

Figure A3 .
Figure A3.Map of residuals from delta spatial GAM fitted to NEAMAP weakfish survey data.Blue indicates positive, red indicates negative, and color darkness is proportional to the value of residuals.

Fishes 2023, 8 , 27 17 of 21 22 Figure A4 .
Figure A4.Map of residuals from delta SAR error model fitted to NEAMAP weakfish survey data.Blue indicates positive, red indicates negative, and color darkness is proportional to the value of residuals.

Figure A4 . 22 Figure A5 .
Figure A4.Map of residuals from delta SAR error model fitted to NEAMAP weakfish survey data.Blue indicates positive, red indicates negative, and color darkness is proportional to the value of residuals.

Figure A5 . 22 Figure A6 .
Figure A5.Map of residuals from delta SAR lag model fitted to NEAMAP weakfish survey data.Blue indicates positive, red indicates negative, and color darkness is proportional to the value of residuals.

Figure A6 .
Figure A6.Map of residuals from delta SAR mixed model fitted to NEAMAP weakfish survey data.Blue indicates positive, red indicates negative, and color darkness is proportional to the value of residuals.

Table 1 .
AIC comparison of the three variable selection scenarios for each six model types fitted to NEAMAP weakfish survey data.Scenario 1: month but no water temperature; Scenario 2: water temperature but no month; Scenario 3: both water temperature and month.The sub-model in the delta model to estimate the catch rates when only positive values of the response variable were analyzed.b The sub-model in the delta model to estimate the probability of obtaining nonzero captures.

Table 2 .
Moran's I value of residuals of the six delta models fitted to NEAMAP weakfish survey data.

Table 2 .
Moran's I value of residuals of the six delta models fitted to NEAMAP weakfish survey data.

Table 3 .
Training and test errors from 3-fold cross-validation for six delta models fitted to NEAMAP weakfish survey data after 500 iterations.

Table 4 .
Simulation error for six delta models fitted to NEAMAP weakfish survey data after 500 iterations.The "true" models are the models used for data generation.