Polynomial and Wavelet-Type Transfer Function Models to Improve Fisheries’ Landing Forecasting with Exogenous Variables

It is well known that environmental fluctuations and fishing effort modify fishing patterns in various parts of the world, and one of the most affected areas is northern Chile. Reducing the gaps in the implementation of national fisheries' management policies, and in the basic knowledge that supports such decisions, is crucial. That is why, in this research, a transfer function method with variable coefficients is proposed to forecast the monthly disembarkation of anchovies and sardines in northern Chile, taking into account the incidence of large-scale climatic variables on landings. The method uses a least squares procedure and wavelets to expand the coefficients of the transfer function. Linear estimators of the time varying coefficients are proposed, followed by a truncation of the wavelet expansion up to an appropriate scale; finally, the estimators of the transfer function coefficients are obtained by using the inverse wavelet transformation. The results suggest that transfer function models with variable coefficients fit the behavior of the anchovy landings with great accuracy, while transfer function models with constant coefficients fit the sardine landings better. Both fisheries' landings could be explained to a large extent by the large-scale climatic variables.


Introduction
Fish are organisms that cannot regulate their body temperature independently of the environment, and changes in environmental temperature influence their geographical distribution, migratory routes, and occupation of habitat [1]. On the other hand, although the species present variability associated with environmental changes, their composition and abundance are also affected by predators, competitors, and prey [2]. The link between the variation of anchovy abundance and environmental changes on different time-space scales opens the possibility of predicting fluctuations in landings in the short, medium, and long term [3], which is one of the main objectives of fisheries' management [4].
As indicated by [5], it is fundamental to consider the impact of the environment and the interactions between fisheries for their management. In effect, fisheries show different trends in response to environmental changes, since these changes affect the larval stages, reproduction, grazing habitat, and migration of different populations; to this, an inevitable increase in fishing effort must be added. Potential climate change and climate variability at different time scales have immediate or lagged effects, both locally and regionally. This research accounts for possible changes in environmental variables by expanding the transfer function coefficients to a time varying form using a least squares procedure. It also highlights the improved performance obtained by combining traditional statistical techniques with the aforementioned extension when implemented to forecast sardine landings. Likewise, seeking to optimize the goodness of fit and the quality of the forecast, it was also observed that, after applying various transformations to stabilize the variability of the observed series, significant improvements in the results could be achieved.
The paper is divided as follows: In Section 2, we briefly describe the time series modeling strategy and the required steps to fit these models. In Section 3, we explain the datasets used in the analysis, the methodology to process them, and all the results at each step, when fitting the transfer function models. We finally provide in Section 4 some conclusions and potential extensions of this work.

Environmental Setting and Data
Industrial fishing in the northern part of the country began in the 1950s with landings of Peruvian anchovy (Engraulis ringens), which increased, fluctuated, and then fell strongly in 1972-1973, remaining low until 1985, when they again began to fluctuate and increase, reaching new historic levels [13]. After the collapse of the anchovy in 1972-1973, the sardine (Sardinops sagax) became a targeted species, with catches increasing until 1985, before falling notably and remaining low until the present. The study zone comprised the area covered by the industrial seine fishing fleet that operates in northern Chile (18°21′-24°00′ S) from the coast to 73° W. The analyzed data included environmental and fishing registers for the 1963-2011 period; Table 1 shows a description of each variable considered for analysis.

Wavelet Transfer Function Model
There are many situations requiring the modeling of the impact of a regressor variable on a response variable through time, when both the regressors and the response variables are assumed to be stochastic processes. Herein, we will use the term predictors for the regressor variables and predictants for the response variable. One or more predictors can be considered as input variables to the model. Moreover, predictors can have a lagged effect on the predictant variables, and one must decide how many past values of the predictor variable make an impact on the predictant variable [16]. Following the transfer function models with the time varying approach used by [17], one might consider the model:

Y_{t,T} = ∑_{i=1}^{m} δ_i(t/T) Y_{t−i,T} + ∑_{j=0}^{n} ω_j(t/T) X_{t−j,T} + ε_t,   (1)

where the time series X_{t,T} and Y_{t,T} correspond to the explanatory and response variable, respectively, T is the number of observations, and ε_t is an independent and identically distributed (0, σ²) random error. It is assumed that the error and the entries in the series are independent. The functions δ_i(u), i = 1, …, m, and ω_j(u), j = 0, 1, …, n, have compact support in the interval [0, 1] and are connected to the underlying series by an appropriate rescaling of time, u = t/T. For the estimation of δ_i(u) and ω_j(u), wavelet expansions are used in the time domain, and the estimators of the wavelet coefficients are obtained through the least squares method [17]. From two basic functions, the scaling function φ(x) and the wavelet function ψ(x), infinite collections of scaled and translated versions are defined that form an orthonormal basis of L²(R), starting from some coarse scale l. To achieve a parsimonious representation of the function classes involved, it is necessary to construct φ and ψ with compact support, generating an orthonormal system with frequency and spatial localization [17].
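To make the structure of this model concrete, the following sketch simulates a minimal version with m = 1 and n = 0; the coefficient curves δ₁(u) and ω₀(u) below are hypothetical choices for illustration, not estimates from this study.

```python
import numpy as np

# Sketch: simulate Y_{t,T} = delta_1(t/T) Y_{t-1,T} + omega_0(t/T) X_{t,T} + eps_t
# (the time-varying transfer function model with m = 1, n = 0).
rng = np.random.default_rng(0)
T = 512
u = np.arange(T) / T                         # rescaled time u = t/T in [0, 1)
delta1 = 0.4 + 0.3 * np.sin(2 * np.pi * u)   # hypothetical delta_1(u), |.| < 1
omega0 = 0.8 - 0.5 * u                       # hypothetical omega_0(u)

X = rng.normal(size=T)                       # exogenous input series
eps = rng.normal(scale=0.1, size=T)          # iid (0, sigma^2) errors
Y = np.zeros(T)
for t in range(1, T):
    Y[t] = delta1[t] * Y[t - 1] + omega0[t] * X[t] + eps[t]
```

Keeping |δ₁(u)| < 1 everywhere ensures the simulated recursion stays stable over the whole sample.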
The functions ω_j(u) and δ_i(u) are defined on the compact interval [0, 1]. Therefore, an orthonormal system that spans L²([0, 1]) must be taken into account. A standard adaptation step [18] employs the periodized wavelets

φ̃_{j,k}(u) = ∑_{l∈Z} φ_{j,k}(u − l) and ψ̃_{j,k}(u) = ∑_{l∈Z} ψ_{j,k}(u − l),   (2)

which generate a multiresolution ladder Ṽ_0 ⊂ Ṽ_1 ⊂ ⋯, in which the spaces Ṽ_j are spanned by the ψ̃_{j,k}; negative values of j are not needed, since for j ≤ 0 these reduce to constants, with φ̃ = φ̃_{0,0} = 1 (see [19]). In this work, periodized wavelets are denoted simply by ψ_{j,k}. Consequently, any function f ∈ L²([0, 1]) admits an orthogonal series expansion of the form:

f(u) = α φ(u) + ∑_{j≥0} ∑_{k∈I_j} β_{j,k} ψ_{j,k}(u),   (3)

where we take l = 0 and I_j = {k : k = 0, ⋯, 2^j − 1}. For each j, the set I_j collects the values of k for which β_{j,k} belongs to the scale 2^j; for example, for j = 3, there are eight wavelet coefficients at scale 2³. The wavelet coefficients are given by:

α = ∫₀¹ f(u) φ(u) du,  β_{j,k} = ∫₀¹ f(u) ψ_{j,k}(u) du.   (4)

Often, the sum in Equation (3) is truncated at a maximum level J, such that we approximate f in the space Ṽ_J (for more details, see [17]).
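A minimal numerical illustration of such a truncated expansion follows, using the Haar basis only because its forward and inverse transforms are a few lines of NumPy (the paper works with Daubechies filters); the sample function f is an illustrative assumption.

```python
import numpy as np

# Sketch: expand a function on [0, 1] in the (periodized) Haar basis and
# truncate the expansion at a maximum scale J, zeroing finer detail levels.
def haar_decompose(f):
    """Full orthonormal Haar transform of a length-2^J sample vector."""
    coeffs = []
    approx = np.asarray(f, dtype=float)
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2))   # detail at current scale
        approx = (even + odd) / np.sqrt(2)
    return approx, coeffs[::-1]                    # alpha, then beta_{j,k} coarse->fine

def haar_reconstruct(approx, coeffs):
    """Inverse Haar transform (exact inverse of haar_decompose)."""
    out = np.asarray(approx, dtype=float)
    for detail in coeffs:
        even = (out + detail) / np.sqrt(2)
        odd = (out - detail) / np.sqrt(2)
        out = np.empty(2 * len(out))
        out[0::2], out[1::2] = even, odd
    return out

n = 256                                    # dyadic sample size, 2^8
u = np.arange(n) / n
f = np.sin(2 * np.pi * u) + 0.3 * np.cos(6 * np.pi * u)

alpha, betas = haar_decompose(f)
# Truncation: keep scales j = 0..3, zero the finer detail coefficients.
betas_trunc = [b if j <= 3 else np.zeros_like(b) for j, b in enumerate(betas)]
f_trunc = haar_reconstruct(alpha, betas_trunc)
```

Reconstructing from the full coefficient set recovers f exactly, while the truncated version keeps only the coarse-scale behavior, mirroring the approximation of f in Ṽ_J.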

Estimators of Time Varying Coefficients
The objective is to estimate the functions δ_i(u), i = 1, 2, ⋯, m, and ω_j(u), j = 0, 1, ⋯, n, with u ∈ [0, 1], that appear in the model of Equation (1), given the T observations of the series. We assume that the orders m and n are fixed and known. The idea is to expand these functions in wavelet series of the form [17]:

δ_i(u) = α^{(δ_i)} φ(u) + ∑_{j≥0} ∑_{k∈I_j} β^{(δ_i)}_{j,k} ψ_{j,k}(u),   (6)

and analogously for ω_j(u). The empirical wavelet coefficients are obtained by minimizing the sum of squared residuals of the model over t = v + 1, ⋯, T, in which δ_i(u) and ω_j(u) are replaced by their expansions in Equation (6), with v = max(m, n) (Equation (7)). In matrix notation, the solution of the least squares problem given by Equation (7), for 0 ≤ m ≤ J − 1, is obtained from the corresponding normal equations [17]. After solving for β̂^{(δ_1)} and β̂^{(ω_0)}, they can be inserted in Equation (6). In this paper, the models were estimated according to the methodology proposed by [17], with the steps to be followed summarized in Figure 1. To obtain the periodized wavelets ψ_{j,k}(u) and φ_{j,k}(u) of Equation (2), the methodology implemented by [20] is used. We estimate the empirical wavelet coefficients by least squares in two stages, as described in Figure 1. In the first stage of the process, we regress Y_{t,T} (the response variable) on X_{t−j,T} and Y_{t−i,T} (explanatory variables) as in Equation (1). Expanding δ_i(t/T) and ω_j(t/T) in wavelet series, we obtain Equation (13). These empirical wavelet coefficients can be estimated using a Daubechies filter as in [17], identifying the best resolution level using skill comparison metrics such as the root mean squared error (RMSE) and the mean absolute error (MAE), as applied in this work; other metrics can also be implemented, as in [21], where a wavelet entropy based approach was proposed to identify the optimal model specification and construct effective wavelet entropy based forecasting models.
where we have restricted the values of j to a maximum scale. After obtaining estimates of the wavelet coefficients of δ_t(B) and ω_t(B), we use the inverse wavelet transformation to obtain the estimates δ̂(t/T) and ω̂(t/T), respectively, with e_{t,T} as the error of the first-stage regression model, Equation (14). For the coefficients ψ_{j,k}(u) and φ_{j,k}(u), we calculated nine matrices of dimension N × 2^J for the maximum value of J (512 × 512). That is to say, every periodized instant of the signal (t/T) is an element to be sampled and convolved with the wavelet for all the possible dyadic translations and resolutions; in our case, several Daubechies filters were implemented.
In the second stage of the process, we fit the second-stage model, where e_2 is the random error vector with components e_{2;t,T}, t = 2, ⋯, T. In [17], it was shown that each component of e_2 follows a locally stationary moving average process of order two, e_{2;t,T} = ε_{t,T} − δ̂_1²(t) ε_{t−2,T}, which is a locally stationary MA(2). In this sense, in the second stage of the process, ε_{t,T} of Equation (1) is replaced with the MA(2) obtained from e_{t,T}, and Y_{t,T} is replaced with Ŷ_{t−i,T} to obtain the final estimates of δ̂_i(t/T) and ω̂_j(t/T). According to the methodology of [17], the periodized mother and father wavelets are convolved with the original signal, and after several algebraic operations resulting from the least squares process, we obtain the wavelet coefficients α and β of Equation (4), which are introduced in the wavelet series expansion of Equation (6) to finally obtain the δ_i(u) and ω_j(u) coefficients. The coefficients δ_i(u) and ω_j(u) are obtained in the wavelet domain. To interpret these coefficients in the time domain and substitute them in Equation (1) as the weight of each explanatory variable of the model, their inverse transform must be calculated, which is done very similarly to the way the inverse of a Fourier transform is calculated. Synthesizing, we estimate the empirical wavelet coefficients by least squares in two stages: in the first stage of the process, we estimate the initial residuals e_{t,T} and the adjusted values Ŷ_{t,T}, which are used to obtain the final estimates of δ̂(t/T) and ω̂(t/T) in the second stage. At each stage of the process, it must be identified at which resolution level the error is minimized and the model best fits the data.
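The first-stage least squares step can be sketched as follows; a piecewise-constant (Haar scaling) basis at a fixed resolution stands in for the Daubechies expansions, and the simulated series and true coefficient curves are illustrative assumptions, not data from the study.

```python
import numpy as np

# Sketch of the first-stage estimation: expand delta_1(u) and omega_0(u) in a
# basis evaluated at u = t/T, regress Y_t on the basis functions multiplied by
# the lagged series, and reconstruct the coefficient curves in the time domain.
rng = np.random.default_rng(1)
T = 512
u = np.arange(T) / T
delta1_true = 0.3 + 0.2 * np.sin(2 * np.pi * u)   # illustrative true curves
omega0_true = 0.7 * np.ones(T)

X = rng.normal(size=T)
Y = np.zeros(T)
for t in range(1, T):
    Y[t] = delta1_true[t] * Y[t - 1] + omega0_true[t] * X[t] + 0.05 * rng.normal()

J = 3                                             # resolution: 2^3 = 8 basis functions
basis = np.stack([(np.floor(u * 2**J) == k).astype(float)
                  for k in range(2**J)], axis=1)

# Design matrix: each basis column times Y_{t-1}, then each times X_t.
Ylag = np.concatenate([[0.0], Y[:-1]])
D = np.hstack([basis * Ylag[:, None], basis * X[:, None]])
beta, *_ = np.linalg.lstsq(D[1:], Y[1:], rcond=None)

delta1_hat = basis @ beta[:2**J]                  # coefficient curves, time domain
omega0_hat = basis @ beta[2**J:]
```

The final multiplication by the basis plays the role of the inverse transform: it maps the estimated coefficients back into time-domain curves δ̂₁(t/T) and ω̂₀(t/T).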
Given that, in practice, an appropriate number of levels is usually selected based on the nature of the signal [22], we computed various goodness of fit indicators to identify at which resolution level or levels δ̂(t/T) and ω̂(t/T) should be reconstructed; this calculation was applied in both phases of the process (Figure 1).

Polynomial Transfer Function Model
When the modeling of the impact of a regressor variable on a response variable over time is required, and the regressors and the response variables are assumed to be stochastic processes, the transfer function approach proposed by [23] can be followed, expressed as the lagged regression model:

Y_t = ∑_{j=0}^{∞} α_j X_{t−j} + η_t = α(B) X_t + η_t,

where X_t and η_t are independent stationary processes and the weights α_j measure the impact of the past values of the input variable X_t on Y_t. The polynomial α(B) = ∑_{j=0}^{∞} α_j B^j is called the transfer function, and it is a polynomial in the delay operator B such that BX_t = X_{t−1}. Its coefficients must satisfy ∑_{j=0}^{∞} |α_j| < ∞ to ensure stability. The random noise η_t is assumed to be stationary and can be written in the ARMA form η_t = [Θ_η(B)/φ_η(B)] Z_t, where Z_t is a white noise process with variance σ²_Z. Box et al. [23] proposed a more parsimonious representation of the transfer function as a rational polynomial relation:

Y_t = [δ(B) B^d / ω(B)] X_t + η_t,   (19)

where δ(B) = δ_0 + δ_1 B + ⋯ + δ_s B^s, ω(B) = 1 − ω_1 B − ⋯ − ω_r B^r, and d is a delay coefficient. The transfer function (s, d, r) is determined completely by estimating the coefficients of the polynomials δ(B) and ω(B) and the delay coefficient d; this involves estimating the vector of parameters (δ_0, δ_1, ⋯, δ_s, ω_1, ⋯, ω_r). It is possible to consider a transfer function model with two or more stochastic input variables; for two input variables x_{1t} and x_{2t}, the model takes the form Y_t = [δ_1(B) B^{d_1} / ω_1(B)] x_{1t} + [δ_2(B) B^{d_2} / ω_2(B)] x_{2t} + η_t. This model has a much larger number of parameters than the model of Equation (19), but its adjustment procedure is similar. A sequential methodology is applied to estimate the parameters of the transfer function presented in Equation (19).
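When the number of impulse-response weights is finite, the lagged regression above can be estimated directly by ordinary least squares, as in this sketch; the true weights and the delay d = 2 are illustrative assumptions.

```python
import numpy as np

# Sketch: estimate the weights alpha_j in Y_t = sum_j alpha_j X_{t-j} + eta_t
# by ordinary least squares on a lag matrix of the input series.
rng = np.random.default_rng(2)
T = 1000
alpha_true = np.array([0.0, 0.0, 1.2, 0.6, 0.3])   # delay d = 2, then decay
X = rng.normal(size=T)
eta = 0.1 * rng.normal(size=T)
Y = np.convolve(X, alpha_true)[:T] + eta           # sum_j alpha_j X_{t-j} + noise

L = len(alpha_true)
# Lag matrix: column j holds X_{t-j} (zero-padded at the start).
lags = np.stack([np.concatenate([np.zeros(j), X[:T - j]]) for j in range(L)],
                axis=1)
alpha_hat, *_ = np.linalg.lstsq(lags[L:], Y[L:], rcond=None)
```

The leading near-zero estimates in alpha_hat reveal the delay d, which is exactly the diagnostic the cross-correlation step of the sequential methodology formalizes.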
The methodology begins by fitting an autoregressive moving average (ARMA) model of order (p, q) to the input time series x_t, of the form φ(B) x_t = Θ(B) W_t, where W_t is a white noise process with variance σ²_W. Filtering the transfer function model with this ARMA operator yields ỹ_t = α(B) W_t + η̃_t. In this equation, we assume that W_t and η̃_t are independent, where W_t is the pre-whitened input series x_t, and ỹ_t and η̃_t are the filtered versions of the output series y_t and the random noise η_t, respectively, obtained by using the operator of the ARMA (p, q) model as a filter. It can be shown that the cross-covariance between the filtered series ỹ_t and the pre-whitened series W_t is γ_{ỹW}(h) = σ²_W α_h; therefore, its sample values allow obtaining an approximate estimate of the coefficients of the transfer function α_0, α_1, ⋯ [16].
Shumway and Stoffer [24] presented a sequential process to fit the transfer function model, and this procedure is applied to the data as follows: (i) Fit an ARMA model to the input series to estimate the parameters φ, Θ, and σ²_W in the specification φ(B) x_t = Θ(B) w_t. Retain the ARMA coefficients for use in Step (ii) and the fitted residuals ŵ_t for use in Step (iii).
(ii) Apply the ARMA operator fitted in Step (i) to the output series y_t to obtain the filtered output series ỹ_t.
(iii) Use the cross-correlation function between ỹ_t and ŵ_t from Steps (i) and (ii) to suggest a form for the components of the polynomial α(B) = δ(B)B^d/ω(B) and the estimated time delay d. (iv) Obtain β̂ = (ω̂_1, ⋯, ω̂_r, δ̂_0, ⋯, δ̂_s) by fitting a linear regression. Retain the residuals û_t for use in Step (v).
(v) Apply the moving average transformation to the residuals û_t to recover the noise series η̂_t, and fit an ARMA model to the noise, obtaining the estimated coefficients of φ̂_η(B) and Θ̂_η(B).
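Steps (i)-(iii) can be sketched as follows, using a simple AR(1) input fitted by least squares in place of a full ARMA model; the AR(1) input and the lag-3 input-output relation are illustrative assumptions.

```python
import numpy as np

# Sketch of pre-whitening and delay identification (Steps i-iii).
rng = np.random.default_rng(3)
T = 2000
phi_true, d_true = 0.6, 3
X = np.zeros(T)
for t in range(1, T):
    X[t] = phi_true * X[t - 1] + rng.normal()          # AR(1) input
Y = np.concatenate([np.zeros(d_true), 1.5 * X[:T - d_true]]) \
    + 0.2 * rng.normal(size=T)                         # output lags input by 3

# Step (i): fit AR(1) to the input by least squares.
phi_hat = np.dot(X[1:], X[:-1]) / np.dot(X[:-1], X[:-1])
# Steps (i)-(ii): filter both series with the fitted AR operator.
w = X[1:] - phi_hat * X[:-1]                           # pre-whitened input
y_f = Y[1:] - phi_hat * Y[:-1]                         # filtered output
# Step (iii): sample cross-correlation at lags 0..6; the peak suggests d.
ccf = np.array([np.corrcoef(y_f[h:], w[:len(w) - h])[0, 1] for h in range(7)])
d_hat = int(np.argmax(np.abs(ccf)))
```

The peak of the cross-correlation function at lag 3 recovers the delay built into the simulation, which is how the form of α(B) is suggested in practice.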

Model Validation Methods
The model was validated using 76 records not considered in the fitting procedure; the validation data corresponded to the period between September 2005 and December 2011. As suggested by [25], model parameterization was achieved by jointly minimizing the root mean squared error (RMSE) and the mean absolute error (MAE) and maximizing the correlation coefficient (R), to ensure optimal results over the prediction period. The commonly used RMSE quantifies the differences between predicted and observed values and thus indicates how far the forecasts are from the actual data. A few major outliers in the series can skew the RMSE statistic substantially, because the effect of each deviation on the RMSE is proportional to the size of the squared error.
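A minimal sketch of these validation metrics (the observed and predicted vectors are illustrative):

```python
import numpy as np

# Validation metrics: RMSE, MAE, and the correlation coefficient R
# between observed and forecast values.
def rmse(obs, pred):
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(pred)) ** 2)))

def mae(obs, pred):
    return float(np.mean(np.abs(np.asarray(obs) - np.asarray(pred))))

def corr(obs, pred):
    return float(np.corrcoef(obs, pred)[0, 1])

obs = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.1, 1.9, 3.2, 3.8])
```

Because RMSE squares each deviation before averaging, a single large error moves it much more than it moves the MAE, which is why the two are tracked together.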

Data Analysis
As detailed in the previous section, the disembarkations of two species of fish (sardines and anchovies) were selected as dependent variables, whose variability could potentially be explained by the nine climatic variables presented in Table 1. In order to work with the variables on the same scale, they were anomalized and standardized; this in turn allowed us to explore model fitting results under diverse temporal patterns (seasonal and nearly stationary).

Variable Anomaly
The first step before estimating the models was to anomalize each of the indices, as well as the disembarkation of anchovy. This implies subtracting from each data point the average value of its calendar month calculated over a reference period and then dividing by the corresponding monthly standard deviation (both calculated over the same base period). Because the World Meteorological Organization (WMO) states that a period of at least 30 years should be used for the anomaly calculation, the reference period from 1963 to 2011 was considered adequate. In other words, the anomaly of month i in year j is

Z_ij = (X_ij − X̄_i) / σ_i,

where the monthly mean is given by X̄_i = (1/49) ∑_{j=1963}^{2011} X_ij and σ_i is the analogous monthly standard deviation. The series of sardine landings was not anomalized because its cyclic variation is not determined by a monthly base period; to stabilize this series, a logarithmic transformation was applied.
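A sketch of this monthly anomaly computation, applied to a synthetic monthly series over the 1963-2011 base period (the series itself is illustrative):

```python
import numpy as np

# Monthly anomaly: subtract the month-of-year mean and divide by the
# month-of-year standard deviation, both over the full base period.
def monthly_anomaly(x):
    """x: 1-D array of monthly values starting in January."""
    x = np.asarray(x, dtype=float)
    months = np.arange(len(x)) % 12
    out = np.empty_like(x)
    for m in range(12):
        idx = months == m
        out[idx] = (x[idx] - x[idx].mean()) / x[idx].std(ddof=1)
    return out

n_years = 49                               # 1963-2011
t = np.arange(12 * n_years)
series = (10 + 3 * np.sin(2 * np.pi * t / 12)
          + 0.5 * np.random.default_rng(4).normal(size=len(t)))
anom = monthly_anomaly(series)
```

After anomalization, each calendar month has mean zero and unit variance, which removes the seasonal cycle from the synthetic series.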

Variable Standardization
The standardization consisted of subtracting from each data point the average value calculated over a reference period and then dividing by the standard deviation (both calculated over the same base period). In other words,

Z_ij = (X_ij − X̄) / σ,

where the mean is given by X̄ = (1/588) ∑_{j=1963}^{2011} ∑_i X_ij and the variance by σ² = (1/587) ∑ (X_ij − X̄)². The first step in any analysis and forecast of time series is to plot the observations against time, to get an idea of the possible trends and/or cycles associated with the temporal evolution of the datasets [25]. In Figure 2, it can be seen that the standardized series had periodic variation, while the exogenous variables did not seem to show periodic variation once anomalized; note also that the fish landings stabilized slightly when their logarithmic transformations were standardized. Figure 3 describes the process used to build the transfer functions. The raw data were divided into training and test data: the training data ran from January 1963 to August 2005 (totaling 512 records), while the test data ran from September 2005 to December 2011 (76 records). The different transformations (logarithm, anomalization, standardization) were applied to the data partitions, and the transfer function models were fitted to the training data. Fish landing forecasts (test data) and goodness of fit metrics were calculated for both the fitted and the forecast values. Finally, the metrics were compared, and the best model was identified. Table 2 shows the variables that had a significant cross-correlation with fish landings; from this crossing, the transfer function models began to be built. Based on the cross-correlation function between the fisheries' landings and macro-climatic phenomena (Step iii, Section 2.3), correlation patterns among the series were detected, identifying the variables that had the greatest linear association with the monthly anchovy and sardine disembarkation in northern Chile.
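The standardization over the 588-month base period and the 512/76 training/test partition described above can be sketched as follows, on a synthetic series:

```python
import numpy as np

# Standardize over the full 1963-2011 base period (588 months, divisor
# n - 1 = 587 as in the variance formula) and split into the 512-record
# training set and 76-record test set used in the study design.
rng = np.random.default_rng(5)
series = rng.normal(loc=5.0, scale=2.0, size=588)   # illustrative series

z = (series - series.mean()) / series.std(ddof=1)
train, test = z[:512], z[512:]
```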
These were SST, the turbulence index (TI), and Niño Zone 1 + 2 (N12). Likewise, the variables with the lowest linear association were the Pacific Decadal Oscillation index (PDO), the El Niño multivariate Southern Oscillation Index (MEI), the Southern Oscillation index (SOI), and N34 (highly associated with N12). It should be noted that there is a wide range of methods to identify significant variables and associated lags, such as those shown in [26]; however, in this work, the methodology suggested by [24] and specified in the previous section was used (Step iii, Section 2.3). Table 3 presents a summary of the main models obtained, with their goodness of fit and forecast indicators. Synthesizing, a total of 31 combinations of resolution levels was considered for ten Daubechies filters in the first stage. In the second stage, we worked with the residuals, the fitted values, and the filter selected from the first stage, again over 31 resolution combinations. A total of 310 models was thus fitted for each desired transfer function, in order to select the best model under diverse goodness of fit criteria.

Wavelet Transfer Function Models
The results shown in Table 3 allowed us to identify that, through the transfer functions with variable coefficients, we could fit both anomalized and standardized data with a good level of accuracy; however, their performance was much better when the signals were standardized. Likewise, it can be observed that the models continued showing good behavior in their residuals during the forecast phase, preserving a moderate percentage of strength in their coefficient of determination (calculated exclusively on the forecast data).
We worked with each level j independently, as well as jointly, until reaching an optimum level in the fitting process. In principle, it was checked, for each Daubechies filter and under each combination of j, whether the mean absolute error (MAE) was minimized or the coefficient of determination (R²) was optimized. One might think that if the model fit were good enough, the behavior should simultaneously show a high R² at the minimum MAE. However, this is not always the case, because the wavelet at a specific level j can capture the pattern of the original signal without being at its scale; therefore, the error can be large even when R² is high. Accordingly, the best possible model for each Daubechies filter was identified, and the second stage of the process was executed, tracking the goodness of fit metrics again in order to obtain the best fit. The goodness of fit statistics were calculated similarly, but more indicators were added to make sure that the most appropriate decision was made. In this last adjustment, the square root of the average value of the squared residuals (RMSE), the MAE, the coefficient of determination (R²), and the Pearson, Spearman, and Kendall correlation coefficients were calculated, all for the Daubechies filter selected in Phase 1, and likewise for various resolution levels (j).
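The selection loop over resolution levels can be sketched as follows; a piecewise-constant basis stands in for the Daubechies filters, the data-generating model is an illustrative assumption, and the in-sample MAE and R² are used as a simplified stand-in for the full set of indicators.

```python
import numpy as np

# Sketch: for each candidate resolution level J, fit the coefficient curve,
# record MAE and R^2, and keep the level that minimizes the MAE.
rng = np.random.default_rng(6)
T = 512
u = np.arange(T) / T
omega_true = 0.5 + 0.4 * np.sin(2 * np.pi * u)     # illustrative true curve
X = rng.normal(size=T)
Y = omega_true * X + 0.1 * rng.normal(size=T)

results = {}
for J in range(6):                                  # candidate resolution levels
    basis = np.stack([(np.floor(u * 2**J) == k).astype(float)
                      for k in range(2**J)], axis=1)
    D = basis * X[:, None]
    beta, *_ = np.linalg.lstsq(D, Y, rcond=None)
    fit = D @ beta
    mae = float(np.mean(np.abs(Y - fit)))
    r2 = float(1 - np.sum((Y - fit) ** 2) / np.sum((Y - Y.mean()) ** 2))
    results[J] = (mae, r2)

best_J = min(results, key=lambda J: results[J][0])  # level with smallest MAE
```

Note that finer levels always reduce the in-sample error for nested bases; in practice, as in the text, forecast-phase metrics must also be tracked to guard against over-fitting.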
For example, in Figure 4, we show the goodness of fit metrics for the wavelet transfer function models estimated to explain anchovy landings (standardized data). Traditional metrics were used to determine the optimal wavelet families and the decomposition scale that would produce an improved forecasting performance, so the filter and resolution were selected to achieve the best metrics (high R² and minimum residuals). The same procedure was performed for all fitted models. A wavelet entropy algorithm such as the one presented in [21] could be used in the future to determine the optimal wavelet families and the decomposition scale that would produce an improved forecasting performance. Table 3 presents a summary of the models obtained, with their goodness of fit and forecast indicators. The results shown allowed us to identify that the transfer functions with constant coefficients can model both anomalized and standardized data with a good level of fit; however, their performance was much better when the signals were standardized. Likewise, it can be observed that the models continued showing good behavior in their residuals during the forecast phase, conserving a moderate percentage of strength in their coefficient of determination (calculated exclusively on the forecast data).

Comparison between Transfer Function Modeling Approaches
Comparing the implemented methodologies (wavelet and polynomial coefficients, with the different transformations applied to the data) in order to select the most accurate model in terms of residuals, it was observed that when the series were standardized to explain the anchovies' landings, the wavelet transfer function model showed a better fit, while when the exogenous series were anomalized to explain the standardized log(sardine landings), the polynomial transfer function model showed a better fit. In this sense, we must keep in mind that it is convenient for the data to have certain properties so that each approach is optimal. When using constant coefficient transfer functions, it is recommended that the series have a more stable variance, which is achieved with the anomalization; whereas the variable coefficient transfer functions capture the high and low frequency components of seasonal series in such a way that the models perform better. Figure 5 shows the residuals of the fitted models, for the different treatments of the variables and transfer function models. It was verified that the best models (with smaller residuals) were obtained when using a transfer function with variable coefficients to explain the anchovies' landings (standardized data), and when using a transfer function with polynomial coefficients to explain the sardines' landings (explanatory variables anomalized and the sardine series in standardized logarithmic scale). Figure 6 shows the fit behavior of both models, as well as the forecasts obtained from the test data.
Likewise, we can see that the variance explained during the validation phase for anchovy landings (96.9%) showed important improvements in relation to previous works, such as that presented by [7], where the variance explained in external validation fluctuated between 84% and 87%, or the sardine and anchovy landing forecasts presented by [9], in which the variance explained by both models was slightly higher than 82%. We can observe that the sardine landings did not reach as good a fit as those of the anchovies. This could be improved in future work by considering, for example, a truncated model as suggested by [27], where the data series were modeled under the assumption that the data followed a truncated and transformed multivariate normal distribution; in that work, the predictive inferences showed very realistic results, capturing the typical variability of the series in time and space.

Conclusions
Based on the analysis of the models built to forecast the monthly disembarkation of anchovy (Engraulis ringens) and sardine (Sardinops sagax) in northern Chile, the following conclusions emerged. When various transformations were applied to the data to achieve better model precision, large differences in the performance of the selected fitted models could be identified. Records of anchovy landings were better fitted and forecast with standardized data under a transfer function model with wavelet coefficients, using Daubechies 10 filters with low resolution levels (associated with the slightly compressed wavelets, j = 1 : 2). Sardine landings were better fitted when the variance of the landings was stabilized using the logarithmic function, the variance of the explanatory variables was stabilized by anomalizing them, and these log-scale sardine landings were modeled with a traditional transfer function.
The variables that allowed explaining the disembarkation of anchovy in the most robust way were the turbulence index from the Antofagasta Coastal Oceanographic Station (TI) and the Pacific sea surface temperature index (Niño Zone 1 + 2: N12), while the disembarkation of sardines was explained by local climatic variables: TI, the sea surface temperature from the Antofagasta Coastal Oceanographic Station (SST), and the log disembarkation of anchovy.
Given that the process of selecting the appropriate number of scales to optimize the model fit was made according to the researcher's choice, it is advisable to implement in the future some entropy based techniques that allow for the best possible scale selection. It is also recommended to evaluate if the results can be optimized considering other wavelet filters in addition to the Daubechies filters. Likewise, a non-linear structure model could be considered (for example, thresholding wavelet coefficients) in order to determine the best model structure for fishery prediction. These results could also be optimized by implementing bootstrapping techniques for the fitted parameters in order to quantify their uncertainty.