Entropy 2019, 21(11), 1082; https://doi.org/10.3390/e21111082

Article
Polynomial and Wavelet-Type Transfer Function Models to Improve Fisheries’ Landing Forecasting with Exogenous Variables
1 Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Valparaíso 2340025, Chile
2 Escuela de Ingeniería C. Biomédica, Universidad de Valparaíso, Valparaíso 2391415, Chile
3 Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
* Author to whom correspondence should be addressed.
Received: 26 September 2019 / Accepted: 1 November 2019 / Published: 5 November 2019

Abstract
It is well known that environmental fluctuations and fishing efforts modify fishing patterns in various parts of the world. One of the most affected areas is northern Chile. The reduction of the gaps in the implementation of national fisheries’ management policies and the basic knowledge that supports the making of such decisions are crucial. That is why in this research, a transfer function method with variable coefficients is proposed to forecast monthly disembarkation of anchovies and sardines in northern Chile, taking into account the incidence of large-scale climatic variables on landings. The method uses a least squares procedure and wavelets to expand the coefficients of the transfer function. Linear estimators of the time varying coefficients are proposed, followed by a truncation of the wavelet expansion up to an appropriate scale. Finally, the estimators for the transfer function coefficients are obtained by using the inverse wavelet transformation. Research results suggest that the transfer function models with variable coefficients fit the behavior of the anchovies’ landing with great accuracy, while the use of transfer function models with constant coefficients fits sardines’ landings better. Both fisheries’ landings could be explained to a large extent from the large scale climatic variables.
Keywords:
fisheries’ landings; time series forecasting; wavelets

1. Introduction

Fish cannot regulate their body temperature independently of the environment, so changes in environmental temperature influence their geographical distribution, migratory routes, and habitat occupation [1]. Although species show variability associated with environmental changes, their composition and abundance are also affected by predators, competitors, and prey [2]. The link between variations in anchovy abundance and environmental changes at different time-space scales opens the possibility of predicting fluctuations in landings in the short, medium, and long term [3], which is one of the main objectives of fisheries’ management [4].
As indicated by [5], it is fundamental to consider the impact of the environment and the interactions between fisheries for their management. In effect, the fisheries show different trends in response to environmental changes, since these changes affect various stages of larvae, reproduction, grazing habitat, and migration of different populations. In addition, an inevitable increase in fishing effort must be added. Potential climate change and climate variability at different time scales have immediate or phase effects, both locally and regionally. Possible changes in environmental variables such as sea surface temperature (SST), depth of the mixing layer, depth of the thermocline, intensities of up-welling currents, the mechanism of nutrient concentration, and changes in the ice marine layers [2], although mild, may affect the food chain, thus drastically altering the abundance, distribution, and availability of fish populations. In addition, climatic change could have consequences on the composition of the community and the performance of ecosystems [6].
Regarding the environment and resource analysis in the anchovy and sardine fisheries of northern Chile, the work in [7] developed an artificial neural network (ANN) model for the anchovy fishery. In [8], the authors developed a multivariate ANN model considering monthly environmental variables such as the sea surface temperature, up-welling index, and sea level, while [9] developed ANN models for anchovy and sardine, respectively, taking into account the interaction between species in addition to the environmental variables. These studies made a brief analysis of the correlation between variables, autocorrelation, and cross-correlation, using non-linear functions to find functional relationships and introduce different models [2]. On the other hand, the work in [10] predicted the environmental variability in the anchovy fishery in the northern zone of Chile through the development of spatio-temporal indicators of the ecosystem, statistical relationships between indicators, Geographical Information Systems (GIS) functions, and ANN models, offering an integrated approach to the prediction of anchovy abundance.
With respect to other statistical techniques implemented to forecast fishing landings, a hybrid model was studied by [11], in which autoregressive integrated moving average (ARIMA) models were combined with wavelet theory to enhance the precision of fishing landings’ forecasts in Malaysia. Their study found that the combined model provided more accurate forecasts of fishing landing series than the individual ARIMA model. Other studies have presented a forecast strategy based on stationary wavelet decomposition combined with linear regression to improve the accuracy of one-month-ahead pelagic fish catch predictions for the fishing industry in the southern zone of Chile [12]. The authors demonstrated the usefulness of the strategy on the monthly anchovy catch dataset, explaining 98% of the variance with a parsimonious model.
Considering the above, note that in Chile the average annual landings over the last 30 years were 4.8 million tons and the agricultural resources in the northern zone represent 40% [13]. In this area, the fishery has been based successively on anchoveta (Engraulis ringens) and sardine (Sardinops sagax), with notable changes associated with fishing effort and environmental fluctuations (see [14,15]). It is therefore considered pertinent to implement scientific techniques aimed at studying functional relationships that can be analyzed in depth, in order to reduce gaps in the implementation of national fisheries’ management policies and provide the basic knowledge that supports the making of such decisions [2].
Currently, the accurate prediction of fishing landings is a point of special interest for fisheries’ management, and researchers who focus on modeling time series of landings are looking for prediction models that take into account various patterns. In the literature, most researchers implement powerful methods such as ANN and hybrid models, for example autoregressive integrated moving average (ARIMA) models combined with ANN, adapted to model time series and predict fishing landings; however, there is still a wide range of hybrid techniques that can be implemented to achieve improvements in predictions.
In this sense, this research proposes the implementation of highly predictive techniques to model and study climatic phenomena, specifically the quantitative characterization of the elements that determine the monthly disembarkation of anchovies and sardines in northern Chile. The work revolves around the following question: Which time series model would allow forecasting more accurately the monthly disembarkation of anchovies and sardines registered in northern Chile, under the influence of macro-climatic variables such as the sea surface temperature and the associated ENSO phenomenon?
The benefits of our research reside in the improvements obtained in the adjustment and forecasting of anchovy landings when the series are broken down into their high and low frequency components, by expanding the transfer function coefficients to a time varying approach by using a least squares procedure. It also highlights the improved performance of using the combination of traditional statistical techniques with the aforementioned extension when implemented to forecast sardine landings. Likewise, seeking to optimize the goodness of fit and quality of the forecast, it was also observed that after the application of various transformations to stabilize the variability of the observed series, significant improvements in the results could be achieved.
The paper is divided as follows: In Section 2, we briefly describe the time series modeling strategy and the required steps to fit these models. In Section 3, we explain the datasets used in the analysis, the methodology to process them, and all the results at each step, when fitting the transfer function models. We finally provide in Section 4 some conclusions and potential extensions of this work.

2. Materials and Methods

2.1. Environmental Setting and Data

Industrial fishing in the northern part of the country began in the 1950s with landings of Peruvian anchovy (Engraulis ringens), which increased, fluctuated, and then fell sharply in 1972–1973, remaining low until 1985, when they again began to fluctuate and increase, reaching new historic levels [13]. After the collapse of anchovy in 1972–1973, the sardine (Sardinops sagax) became a targeted species, with catches increasing until 1985, before falling notably and remaining low until the present. The study zone comprised the area covered by the industrial seine fishing fleet that operates in northern Chile (18°21′–24°00′ S), from the coast to 73° W. The analyzed data included environmental and fishing records for the 1963–2011 period. Table 1 shows a description of each variable considered for analysis.

2.2. Wavelet Transfer Function Model

There are many situations that require modeling the impact of a regressor variable on a response variable through time, when both the regressors and the response are assumed to be stochastic processes. Herein, we use the term predictors for the regressor variables and predictand for the response variable. One or more predictors can be considered as input variables to the model. Moreover, predictors can have a lagged effect on the predictand, and one must decide how many past values of the predictor variable have an impact on the predictand [16]. Following the transfer function models with the time varying approach used by [17], one might consider the following model:
$$Y_{t,T} = \sum_{i=1}^{m} \delta_i\!\left(\frac{t}{T}\right) Y_{t-i,T} + \sum_{j=0}^{n} \omega_j\!\left(\frac{t}{T}\right) X_{t-j,T} + \varepsilon_t, \tag{1}$$
where the time series $X_{t,T}$ and $Y_{t,T}$ correspond to the explanatory and response variables, respectively, $T$ is the number of observations, and $\varepsilon_t$ is an independent and identically distributed $(0, \sigma^2)$ random error. It is assumed that the error and the input series are independent. The functions $\delta_i(u)$, $i = 1, \ldots, m$, and $\omega_j(u)$, $j = 0, 1, \ldots, n$, have compact support in the interval $[0,1]$ and are connected to the underlying series by an appropriate rescaling of time, $u = t/T$. For the estimation of $\delta_i(u)$ and $\omega_j(u)$, $i = 1, \ldots, m$, $j = 0, 1, \ldots, n$, wavelet expansions are used in the time domain. The estimators of the wavelet coefficients are obtained through the least squares method [17].
From two basic functions, the scaling function $\phi(x)$ and the wavelet function $\psi(x)$, infinite collections of scaled and translated versions are defined: $\phi_{j,k}(x) = 2^{j/2}\phi(2^{j}x - k)$ and $\psi_{j,k}(x) = 2^{j/2}\psi(2^{j}x - k)$, with $j, k \in \mathbb{Z} = \{0, \pm 1, \ldots\}$. It is assumed that $\{\phi_{l,k}(\cdot)\}_{k \in \mathbb{Z}} \cup \{\psi_{j,k}(\cdot)\}_{j \ge l;\, k \in \mathbb{Z}}$ forms an orthonormal basis of $L^{2}(\mathbb{R})$, for some coarse scale $l$. To achieve a parsimonious wavelet series representation of wide classes of functions, it is necessary to construct $\phi$ and $\psi$ with compact support, generating an orthonormal system with frequency and spatial localization [17]. The functions $\omega_j(u)$ and $\delta_i(u)$ are defined on the compact interval $[0,1]$. Therefore, an orthonormal system that spans $L^{2}([0,1])$ must be used. Some authors use an adaptation step [18] with the periodized wavelets defined by:
$$\tilde{\phi}_{j,k}(x) = \sum_{n \in \mathbb{Z}} \phi_{j,k}(x - n), \qquad \tilde{\psi}_{j,k}(x) = \sum_{n \in \mathbb{Z}} \psi_{j,k}(x - n), \tag{2}$$
and these generate a multiresolution ladder $\tilde{V}_0 \subset \tilde{V}_1 \subset \cdots$, in which the spaces $\tilde{V}_j$ are generated by $\tilde{\psi}_{j,k}$. Negative values of $j$ are not necessary, since $\tilde{\phi} = \tilde{\phi}_{0,0} = 1$ and, if $j \le 0$, $\tilde{\psi}_{j,k}(x) = 2^{-j/2}$ (see [19]). In this work, periodized wavelets are denoted simply by $\psi_{j,k}$. Consequently, for any function $f \in L^{2}([0,1])$, an orthogonal series expansion can be considered of the form:
$$f(x) = \alpha_{0,0}\,\phi(x) + \sum_{j \ge 0} \sum_{k \in I_j} \beta_{j,k}\,\psi_{j,k}(x), \tag{3}$$
where we take $l = 0$ and $I_j = \{k : k = 0, \ldots, 2^{j} - 1\}$. For each $j$, the set $I_j$ gives the values of $k$ such that $\beta_{j,k}$ belongs to the scale $2^{j}$. For example, for $j = 3$, there are eight wavelet coefficients on the scale $2^{3}$. The wavelet coefficients are given by:
$$\alpha_{0,0} = \int f(x)\,\phi(x)\,dx, \qquad \beta_{j,k} = \int f(x)\,\psi_{j,k}(x)\,dx. \tag{4}$$
Often, the sum in Equation (3) is truncated at a maximum level $J$, so that $f$ is approximated in the space $\tilde{V}_J$ (for more details, see [17]):
$$f(x) \approx \alpha_{0,0}\,\phi(x) + \sum_{j=0}^{J-1} \sum_{k \in I_j} \beta_{j,k}\,\psi_{j,k}(x). \tag{5}$$
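As an illustration of the truncated expansion in Equation (5), the following minimal sketch uses the Haar wavelet on $[0,1)$ instead of the Daubechies filters used in this paper, and replaces the integrals of Equation (4) with averages over a regular grid. The function names (`haar_psi`, `truncated_expansion`) and the test function are hypothetical choices made only for this example.

```python
import numpy as np

def haar_psi(u, j, k):
    """Haar wavelet psi_{j,k}(u) = 2^{j/2} psi(2^j u - k) evaluated on [0, 1)."""
    v = (2.0 ** j) * u - k
    up = (v >= 0.0) & (v < 0.5)      # first half of the support: +1
    down = (v >= 0.5) & (v < 1.0)    # second half of the support: -1
    return (2.0 ** (j / 2.0)) * (up.astype(float) - down.astype(float))

def truncated_expansion(f_vals, u, J):
    """Approximate f as in Equation (5), keeping resolution levels j = 0, ..., J-1."""
    alpha = np.mean(f_vals)                  # alpha_{0,0}, since phi = 1 on [0, 1)
    approx = np.full_like(f_vals, alpha)
    n_coef = 1
    for j in range(J):
        for k in range(2 ** j):              # k runs over I_j = {0, ..., 2^j - 1}
            psi = haar_psi(u, j, k)
            beta = np.mean(f_vals * psi)     # grid approximation of beta_{j,k}
            approx = approx + beta * psi
            n_coef += 1
    return approx, n_coef

# Illustrative function sampled at 512 midpoints of [0, 1)
u = (np.arange(512) + 0.5) / 512
f_vals = np.sin(2 * np.pi * u) + 0.5 * u

approx3, n3 = truncated_expansion(f_vals, u, J=3)
approx6, n6 = truncated_expansion(f_vals, u, J=6)
err3 = np.sqrt(np.mean((f_vals - approx3) ** 2))
err6 = np.sqrt(np.mean((f_vals - approx6) ** 2))
print(n3, n6, err3 > err6)
```

Truncating at level $J$ keeps $1 + \sum_{j=0}^{J-1} 2^{j} = 2^{J}$ coefficients, so raising $J$ trades parsimony for a closer approximation.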

Estimators of Time Varying Coefficients

The objective is to estimate the functions $\delta_i(u)$, $i = 1, 2, \ldots, m$, and $\omega_j(u)$, $j = 0, 1, \ldots, n$, with $u \in [0,1]$, that appear in the model of Equation (1), given the $T$ observations of the series. We assume that the orders $m$ and $n$ are fixed and known. The idea is to expand these functions in wavelet series of the form [17]:
$$\begin{aligned}
\delta_i(u) &= \alpha_{0,0}^{(\delta_i)}\,\phi(u) + \sum_{j=0}^{J-1} \sum_{k \in I_j} \beta_{jk}^{(\delta_i)}\,\psi_{jk}(u), \\
\omega_i(u) &= \alpha_{0,0}^{(\omega_i)}\,\phi(u) + \sum_{j=0}^{J-1} \sum_{k \in I_j} \beta_{jk}^{(\omega_i)}\,\psi_{jk}(u).
\end{aligned} \tag{6}$$
The empirical wavelet coefficients are obtained by minimizing the expression:
$$\sum_{t=v+1}^{T} \left[ Y_{t,T} - \sum_{i=1}^{m} \delta_i(u)\,Y_{t-i,T} - \sum_{j=0}^{n} \omega_j(u)\,X_{t-j,T} \right]^{2}, \tag{7}$$
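where $\delta_i(u)$ and $\omega_j(u)$ are replaced by their expansions in Equation (6) and $v = \max(m, n)$. In matrix notation, the solution of the least squares problem given by Equation (7), for $0 \le m \le J - 1$, is obtained from the equations:
$$\begin{bmatrix} \hat{\beta}^{(\delta_1)} \\ \hat{\beta}^{(\omega_0)} \end{bmatrix} = \begin{bmatrix} \Psi_Y^{\top}\Psi_Y & \Psi_Y^{\top}\Psi_X \\ \Psi_X^{\top}\Psi_Y & \Psi_X^{\top}\Psi_X \end{bmatrix}^{-1} \begin{bmatrix} \Psi_Y^{\top} Y \\ \Psi_X^{\top} Y \end{bmatrix}, \tag{8}$$
where: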
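$$\Psi_Y = \begin{bmatrix} \Phi_Y & \Psi_Y^{0} & \Psi_Y^{1} & \cdots & \Psi_Y^{J-1} \end{bmatrix}; \qquad \Psi_X = \begin{bmatrix} \Phi_X & \Psi_X^{0} & \Psi_X^{1} & \cdots & \Psi_X^{J-1} \end{bmatrix} \tag{9}$$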
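$$\Phi_Y = \begin{bmatrix} \phi_{0,0}(2/T)\,Y_{1,T} \\ \phi_{0,0}(3/T)\,Y_{2,T} \\ \vdots \\ \phi_{0,0}(T/T)\,Y_{T-1,T} \end{bmatrix}; \qquad \Phi_X = \begin{bmatrix} \phi_{0,0}(2/T)\,X_{2,T} \\ \phi_{0,0}(3/T)\,X_{3,T} \\ \vdots \\ \phi_{0,0}(T/T)\,X_{T,T} \end{bmatrix} \tag{10}$$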
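$$\Psi_Y^{m} = \begin{bmatrix}
\psi_{m,0}(2/T)\,Y_{1,T} & \psi_{m,1}(2/T)\,Y_{1,T} & \cdots & \psi_{m,2^{m}-1}(2/T)\,Y_{1,T} \\
\psi_{m,0}(3/T)\,Y_{2,T} & \psi_{m,1}(3/T)\,Y_{2,T} & \cdots & \psi_{m,2^{m}-1}(3/T)\,Y_{2,T} \\
\vdots & \vdots & & \vdots \\
\psi_{m,0}(T/T)\,Y_{T-1,T} & \psi_{m,1}(T/T)\,Y_{T-1,T} & \cdots & \psi_{m,2^{m}-1}(T/T)\,Y_{T-1,T}
\end{bmatrix}; \tag{11}$$
$$\Psi_X^{m} = \begin{bmatrix}
\psi_{m,0}(2/T)\,X_{2,T} & \psi_{m,1}(2/T)\,X_{2,T} & \cdots & \psi_{m,2^{m}-1}(2/T)\,X_{2,T} \\
\psi_{m,0}(3/T)\,X_{3,T} & \psi_{m,1}(3/T)\,X_{3,T} & \cdots & \psi_{m,2^{m}-1}(3/T)\,X_{3,T} \\
\vdots & \vdots & & \vdots \\
\psi_{m,0}(T/T)\,X_{T,T} & \psi_{m,1}(T/T)\,X_{T,T} & \cdots & \psi_{m,2^{m}-1}(T/T)\,X_{T,T}
\end{bmatrix} \tag{12}$$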
After solving for $\hat{\beta}^{(\delta_1)}$ and $\hat{\beta}^{(\omega_0)}$, they can be inserted in Equation (6).
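The least squares step of Equations (7)–(12) can be sketched for the case $m = 1$, $n = 0$, that is, $Y_{t,T} = \delta_1(t/T)\,Y_{t-1,T} + \omega_0(t/T)\,X_{t,T} + \varepsilon_t$. The sketch below is a single-stage illustration on simulated data, under the assumption of a Haar basis (taken right-closed so that $u = t/T \in (0,1]$ is covered) rather than the Daubechies filters used in this paper. The names `haar_basis` and `fit_tv_transfer` and the simulated coefficient functions are hypothetical.

```python
import numpy as np

def haar_basis(u, J):
    """Columns: phi, then psi_{j,k} for j < J and k < 2^j, evaluated at u in (0, 1]."""
    cols = [np.ones_like(u)]
    for j in range(J):
        for k in range(2 ** j):
            v = (2.0 ** j) * u - k
            up = (v > 0.0) & (v <= 0.5)
            down = (v > 0.5) & (v <= 1.0)
            cols.append((2.0 ** (j / 2.0)) * (up.astype(float) - down.astype(float)))
    return np.column_stack(cols)

def fit_tv_transfer(y, x, J):
    """Least squares fit of Y_t = delta_1(t/T) Y_{t-1} + omega_0(t/T) X_t + error."""
    T = len(y)
    u = np.arange(2, T + 1) / T            # rescaled time for t = 2, ..., T
    W = haar_basis(u, J)                   # basis functions evaluated at t/T
    Psi_Y = W * y[:-1, None]               # basis times Y_{t-1,T}
    Psi_X = W * x[1:, None]                # basis times X_{t,T}
    design = np.hstack([Psi_Y, Psi_X])
    beta, *_ = np.linalg.lstsq(design, y[1:], rcond=None)
    p = W.shape[1]
    delta_hat = W @ beta[:p]               # back to the time domain
    omega_hat = W @ beta[p:]
    return delta_hat, omega_hat

# Simulated data with piecewise-constant coefficients (illustrative only)
rng = np.random.default_rng(0)
T = 1024
t = np.arange(1, T + 1)
delta_true = np.where(t / T <= 0.5, 0.2, 0.6)
omega_true = np.where(t / T <= 0.25, 1.0, 2.0)
x = rng.normal(size=T)
eps = 0.1 * rng.normal(size=T)
y = np.zeros(T)
y[0] = omega_true[0] * x[0] + eps[0]
for i in range(1, T):
    y[i] = delta_true[i] * y[i - 1] + omega_true[i] * x[i] + eps[i]

delta_hat, omega_hat = fit_tv_transfer(y, x, J=2)
max_err_delta = np.max(np.abs(delta_hat - delta_true[1:]))
max_err_omega = np.max(np.abs(omega_hat - omega_true[1:]))
print(max_err_delta, max_err_omega)
```

Solving with `np.linalg.lstsq` is numerically equivalent to the normal equations in Equation (8) when the design matrix has full column rank, and avoids forming the inverse explicitly.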
In this paper, the models were estimated according to the methodology proposed by [17], simplifying the steps to be followed as shown in Figure 1. To obtain the periodized wavelets $\psi_{j,k}(u)$ and $\phi_{j,k}(u)$ of Equation (2), the methodology implemented by [20] is used. We estimate the empirical wavelet coefficients by least squares in two stages, as described in Figure 1. In the first stage of the process, we regress $Y_{t,T}$ (the response variable) on $X_{t-j,T}$ and $Y_{t-i,T}$ (the explanatory variables) of Equation (1). Expanding $\delta_i(t/T)$ and $\omega_j(t/T)$ in wavelet series, we obtain Equation (13). These empirical wavelet coefficients can be estimated using a Daubechies filter as in [17], identifying the best resolution level with skill comparison metrics such as the root mean squared error (RMSE) and the mean absolute error (MAE), as applied in this paper. Other metrics can also be used, as in [21], where a new wavelet entropy based approach was proposed to identify the optimal model specification and construct effective wavelet entropy based forecasting models.
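$$Y_{t,T} = \sum_{i=1}^{m} \left[ \alpha_{0,0}^{(\delta_i)}\,\phi(u) + \sum_{j=0}^{J-1} \sum_{k \in I_j} \beta_{jk}^{(\delta_i)}\,\psi_{jk}(u) \right] Y_{t-i,T} + \sum_{j=0}^{n} \left[ \alpha_{0,0}^{(\omega_j)}\,\phi(u) + \sum_{j'=0}^{J-1} \sum_{k \in I_{j'}} \beta_{j'k}^{(\omega_j)}\,\psi_{j'k}(u) \right] X_{t-j,T} + e_{t,T}, \tag{13}$$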
where we have restricted the values of $j$ to a maximum scale. After obtaining estimates of the wavelet coefficients $\delta_t(B)$ and $\omega_t(B)$, we use the inverse wavelet transformation to obtain the estimates $\hat{\delta}(t/T)$ and $\hat{\omega}(t/T)$, respectively, with $e_{t,T}$ as the error of the first-stage regression model, Equation (14). For the functions $\psi_{jk}(u)$ and $\phi_{jk}(u)$, we calculated $j$ matrices (nine matrices) of dimension $N \cdot 2^{J}$, which for the maximum value of $J$ is $512 \times 512$. That is to say, every periodized moment of the signal ($t/T$) is an element to be sampled and convolved with the wavelet for all the possible dyadic translations and resolutions; in our case, several Daubechies filters were implemented.
$$e_{t,T} = Y_{t,T} - \hat{Y}_{t,T} \tag{14}$$
In the second stage of the process, we fit the model:
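$$Y = \begin{bmatrix} \Phi_{\hat{Y}_1} & \Psi_{\hat{Y}_1}^{0} & \cdots & \Psi_{\hat{Y}_1}^{j^{*}-1} & \Phi_X & \Psi_X^{0} & \cdots & \Psi_X^{j^{*}-1} \end{bmatrix} \begin{bmatrix} \beta^{(\delta_1)} \\ \beta^{(\omega_0)} \end{bmatrix} + e_2, \tag{15}$$
where $e_2$ is the random error vector with components $e_{2;t,T}$, $t = 2, \ldots, T$. In [17], it was shown that each component of $e_2$ follows a locally stationary moving average process of order two: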
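$$\begin{aligned}
e_{2;t,T} &= Y_{t,T} - \delta_1(t)\,\hat{Y}_{t-1,T} - \omega_0(t)\,X_{t,T} \\
&= Y_{t,T} - \delta_1(t)\left( Y_{t-1,T} - e_{t-1} \right) - \omega_0(t)\,X_{t,T} \\
&= Y_{t,T} - \delta_1(t)\,Y_{t-1,T} - \omega_0(t)\,X_{t,T} + \delta_1(t)\,\epsilon_{t-1,T} - \delta_1^{2}(t)\,\epsilon_{t-2,T}
\end{aligned} \tag{16}$$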
and it is obtained that $e_{2;t,T} = \epsilon_{t,T} + \delta_1(t)\,\epsilon_{t-1,T} - \delta_1^{2}(t)\,\epsilon_{t-2,T}$, which is a locally stationary MA(2). In this sense, in the second stage of the process, $\epsilon_{t,T}$ of Equation (1) is replaced with the MA(2) process derived from $e_{t,T}$, and $Y_{t-i,T}$ is replaced with $\hat{Y}_{t-i,T}$ to obtain the final estimates $\hat{\delta}_i(t/T)$ and $\hat{\omega}_j(t/T)$.
According to the methodology of [17], the periodized mother and father wavelets are convolved with the original signal, and after several algebraic operations resulting from the least squares process, we obtain the wavelet coefficients $\alpha$ and $\beta$ of Equation (4), which are inserted into the wavelet series expansion to finally obtain the coefficients $\delta_i(u)$ and $\omega_j(u)$ of Equation (6). The coefficients $\delta_i(u)$ and $\omega_j(u)$ are obtained in the wavelet domain. To interpret these coefficients in the time domain and substitute them in Equation (1) as the weight of each explanatory variable of the model, their inverse transform must be calculated, in much the same way as an inverse Fourier transform, that is to say:
$$\hat{\delta}\!\left(\frac{t}{T}\right) = \Psi_Y^{m}\,\delta_i(u) \tag{17}$$
$$\hat{\omega}\!\left(\frac{t}{T}\right) = \Psi_X^{n}\,\omega_j(u) \tag{18}$$
In summary, we estimate the empirical wavelet coefficients by least squares in two stages. In the first stage, we estimate the initial residuals $e_{t,T}$ and the fitted values $\hat{Y}_{t,T}$, which are used to obtain the final estimates $\hat{\delta}(t/T)$ and $\hat{\omega}(t/T)$ in the second stage. At each stage, we must identify the resolution level at which the error is minimized and the model best fits the data. Given that, in practice, an appropriate number of levels is usually selected based on the nature of the signal [22], we computed various goodness of fit indicators to identify the resolution level or levels from which $\hat{\delta}(t/T)$ and $\hat{\omega}(t/T)$ could be reconstructed; this calculation was applied in both stages of the process (Figure 1).

2.3. Polynomial Transfer Function Model

When modeling the impact of a regressor variable on a response variable over time, with both the regressors and the response assumed to be stochastic processes, the transfer function approach proposed by [23] can be followed, expressed as the following lagged regression model:
$$Y_t = \sum_{j=0}^{\infty} \alpha_j X_{t-j} + \eta_t = \alpha(B)\,X_t + \eta_t, \tag{19}$$
where $X_t$ and $\eta_t$ are independent stationary processes and the weights $\alpha_j$ measure the impact of the past values of the input variable $X_t$ on $Y_t$. The polynomial $\alpha(B) = \sum_{j=0}^{\infty} \alpha_j B^{j}$ is called the transfer function, and it is a polynomial in the delay operator $B$, such that $B X_t = X_{t-1}$. Its coefficients must satisfy $\sum_{j=0}^{\infty} |\alpha_j| < \infty$ to ensure stability. The random noise $\eta_t$ is assumed to be stationary and can be written in the form $\eta_t = \frac{\theta_\eta(B)}{\phi_\eta(B)} Z_t$, where $Z_t$ is a white noise process with variance $\sigma_Z^{2}$.
Box et al. [23] proposed a more parsimonious representation of the transfer function as a polynomial relation:
$$\alpha(B) = \frac{\delta(B)\,B^{d}}{\omega(B)}, \tag{20}$$
where $\delta(B) = \delta_0 + \delta_1 B + \cdots + \delta_s B^{s}$, $\omega(B) = 1 - \omega_1 B - \cdots - \omega_r B^{r}$, and $d$ is a delay coefficient. The transfer function $(s, d, r)$ is completely determined by estimating the coefficients of the polynomials $\delta(B)$ and $\omega(B)$ and the delay coefficient $d$. This involves estimating the vector of parameters $(\delta_0, \delta_1, \ldots, \delta_s, \omega_1, \ldots, \omega_r)$. It is possible to consider a transfer function model with two or more stochastic input variables. For two input variables $x_{1t}$ and $x_{2t}$, the model has the form:
$$y_t = \frac{\delta_1(B)\,B^{d_1}}{\omega_1(B)}\,x_{1t} + \frac{\delta_2(B)\,B^{d_2}}{\omega_2(B)}\,x_{2t} + \eta_t \tag{21}$$
This model has a much larger number of parameters than the model in Equation (19), but its fitting procedure is similar. A sequential methodology is applied to estimate the parameters of the transfer function presented in Equation (19). The methodology begins by fitting an autoregressive moving average (ARMA) model of order $(p, q)$ to the input time series $x_t$, of the form $\phi(B)\,x_t = \Theta(B)\,W_t$, where $W_t$ is a white noise process with variance $\sigma_W^{2}$; $\phi(B) = 1 - \phi_1 B - \phi_2 B^{2} - \cdots - \phi_p B^{p}$ is a polynomial of order $p$ in the operator $B$ that defines the autoregressive component of the model, and $\Theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^{q}$ is a polynomial of order $q$ that defines the moving average component. Applying the operator of the ARMA model, $\phi(B)/\Theta(B)$, to both sides of Equation (19), we obtain:
$$\tilde{y}_t = \alpha(B)\,W_t + \frac{\phi(B)}{\Theta(B)}\,\eta_t = \alpha(B)\,W_t + \tilde{\eta}_t, \tag{22}$$
where $\tilde{y}_t = \frac{\phi(B)}{\Theta(B)}\,y_t$ and $\tilde{\eta}_t = \frac{\phi(B)}{\Theta(B)}\,\eta_t$. In this equation, we assume that $W_t$ and $\tilde{\eta}_t$ are independent, where $W_t$ is the pre-whitened input series $x_t$, and $\tilde{y}_t$ and $\tilde{\eta}_t$ are the filtered versions of the output series $y_t$ and the random noise $\eta_t$, respectively, using the operator of the ARMA($p$, $q$) model as a filter. It can be shown that the cross-correlation between the filtered series $\tilde{y}_t$ and the pre-whitened series $W_t$ is $\gamma_{\tilde{y}W}(h) = \sigma_W^{2}\,\alpha_h$; therefore, its sample values yield approximate estimates of the transfer function coefficients $\alpha_0, \alpha_1, \ldots$ [16].
Shumway and Stoffer [24] presented a sequential process to fit the transfer function model, and this procedure is applied to the data as follows:
(i) Fit an ARMA model to the input series to estimate the parameters $\phi$, $\Theta$, and $\sigma_w^{2}$ in the specification $\phi(B)\,x_t = \Theta(B)\,w_t$. Retain the ARMA coefficients for use in Step (ii) and the fitted residuals $\hat{w}_t$ for use in Step (iii).
(ii) Apply the operator determined in Step (i), that is, $\hat{\phi}(B)\,y_t = \hat{\Theta}(B)\,\tilde{y}_t$, to determine the transformed output series $\tilde{y}_t$.
(iii) Use the cross-correlation function between $\tilde{y}_t$ and $\hat{w}_t$ from Steps (i) and (ii) to suggest a form for the components of the polynomial $\alpha(B) = \frac{\delta(B)\,B^{d}}{\omega(B)}$ and the estimated time delay $d$.
(iv) Obtain $\hat{\beta} = (\hat{\omega}_1, \ldots, \hat{\omega}_r, \hat{\delta}_0, \ldots, \hat{\delta}_s)$ by fitting a linear regression. Retain the residuals $\hat{u}_t$ for use in Step (v).
(v) Apply the moving average transformation to the residuals $\hat{u}_t$ to find the noise series $\tilde{\eta}_t$ and fit an ARMA model to the noise, obtaining the estimated coefficients in $\hat{\phi}_\eta(B)$ and $\hat{\Theta}_\eta(B)$.
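Steps (i)–(iii) can be sketched on simulated data as follows. This is a simplified illustration, not the fitting code used in this paper: it assumes that an AR(1) model is adequate for the input series (instead of a general ARMA($p$, $q$) model), estimates it by least squares, and uses the relation $\gamma_{\tilde{y}W}(h) = \sigma_W^{2}\alpha_h$ to read off the transfer function weights. The function names and the simulated series are hypothetical.

```python
import numpy as np

def fit_ar1(x):
    """Least squares estimate of phi in x_t = phi * x_{t-1} + w_t (zero-mean series)."""
    return float(np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1]))

def prewhiten(series, phi):
    """Apply the AR(1) filter (1 - phi B) to a series."""
    return series[1:] - phi * series[:-1]

def impulse_response_estimates(x, y, max_lag):
    """Estimate alpha_h from the cross-covariance of the filtered output and whitened input."""
    x = x - x.mean()
    y = y - y.mean()
    phi = fit_ar1(x)                 # Step (i): model for the input series
    w = prewhiten(x, phi)            # fitted residuals w_t
    y_tilde = prewhiten(y, phi)      # Step (ii): same filter applied to the output
    sigma2_w = np.mean(w ** 2)
    n = len(w)
    # Step (iii): sample cross-covariance between y_tilde at time t+h and w at time t
    gamma = np.array([np.mean(y_tilde[h:] * w[: n - h]) for h in range(max_lag + 1)])
    return phi, gamma / sigma2_w

# Simulated example: the output responds to the input with a delay of d = 2
rng = np.random.default_rng(1)
T = 2000
w_true = rng.normal(size=T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.7 * x[t - 1] + w_true[t]
y = 0.5 * rng.normal(size=T)
y[2:] += 1.5 * x[:-2]

phi_hat, alphas = impulse_response_estimates(x, y, max_lag=6)
d_hat = int(np.argmax(np.abs(alphas)))
print(phi_hat, d_hat, alphas[d_hat])
```

A single dominant weight at lag $h$ suggests a delay $d = h$ with low polynomial orders, which is the kind of pattern Step (iii) is meant to reveal before the regression in Step (iv).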

Model Validation Methods

The model was validated using 76 records not considered in the fitting procedure. The validation data corresponded to the period between September 2005 and December 2011. As suggested by [25], model parameterization was achieved by minimizing together the root mean squared error (RMSE) and the mean absolute error (MAE) to ensure optimal results over the prediction and maximizing the correlation coefficient (R). The commonly used RMSE quantifies the differences between predicted and observed values and thus indicates how far the forecasts are from actual data. A few major outliers in the series can skew the RMSE statistic substantially because the effect of each deviation on the RMSE is proportional to the size of the squared error.
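The validation metrics named above can be computed as in the following minimal sketch. The function name `forecast_metrics` and the small numerical example are hypothetical; they are included only to show how a single large deviation affects the RMSE more strongly than the MAE.

```python
import numpy as np

def forecast_metrics(observed, predicted):
    """Return RMSE, MAE, and the Pearson correlation coefficient R."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - observed
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "r": float(np.corrcoef(observed, predicted)[0, 1]),
    }

observed = [1.0, 2.0, 3.0, 4.0, 5.0]
close_fit = [1.1, 1.9, 3.2, 3.8, 5.0]
with_outlier = [1.1, 2.1, 3.2, 4.2, 8.0]   # one large deviation in the last record

m_close = forecast_metrics(observed, close_fit)
m_out = forecast_metrics(observed, with_outlier)
print(m_close)
print(m_out)
```

In the second case the ratio RMSE/MAE grows, which reflects the squared weighting of the largest error.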

3. Results and Discussion

3.1. Data Analysis

As detailed in the previous section, the landings of two fish species (sardines and anchovies) were selected as dependent variables, whose variability could potentially be explained by the nine climatic variables presented in Table 1. In order to work with the variables on the same scale, they were anomalized and standardized; this in turn allowed us to explore model fitting results under diverse temporal patterns (seasonal and slightly stationary).

3.1.1. Variable Anomaly

The first step before estimating the models was to anomalize each of the indices, as well as the disembarkation of anchovy. This implies subtracting from each of the data the average monthly value calculated during a reference period and then dividing it by the monthly standard deviation (both calculated over a base period). To do so and because the World Meteorological Organization (WMO) states that a period of at least 30 years should be used for the anomaly calculation, the reference period from 1963 to 2011 was considered adequate. In other words:
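$$X'_{ij} = \frac{X_{ij} - \bar{X}_i}{\sigma_i}, \qquad i = \text{month}, \; j = \text{year}, \tag{23}$$
where the mean is given by $\bar{X}_i = \frac{1}{49} \sum_{j=1963}^{2011} X_{ij}$ and the variance by $\sigma_i^{2} = \frac{1}{48} \sum_{j=1963}^{2011} \left( X_{ij} - \bar{X}_i \right)^{2}$.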
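The anomaly calculation of Equation (23) can be sketched as follows, assuming the monthly series starts in January and covers complete years. The function name `monthly_anomalies` and the synthetic temperature-like series are hypothetical and stand in for the indices of Table 1.

```python
import numpy as np

def monthly_anomalies(series, n_years):
    """Subtract the monthly mean and divide by the monthly standard deviation."""
    table = np.asarray(series, dtype=float).reshape(n_years, 12)  # rows: years, columns: months
    mean_i = table.mean(axis=0)             # mean of each calendar month over the base period
    sd_i = table.std(axis=0, ddof=1)        # denominator n_years - 1 (48 for 49 years)
    return ((table - mean_i) / sd_i).reshape(-1)

# Synthetic seasonal series for 1963-2011 (49 years, 588 months), illustrative only
rng = np.random.default_rng(0)
months = np.arange(49 * 12)
series = 18.0 + 3.0 * np.sin(2 * np.pi * months / 12) + rng.normal(scale=0.5, size=months.size)

anom = monthly_anomalies(series, n_years=49)
print(anom.shape)
```

Because each calendar month is centered and scaled separately, the seasonal cycle of the mean and of the variance is removed from the anomalized series.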
The series of sardine landings was not anomalized because their cyclic variation is not determined by a monthly base period. To stabilize the series, a logarithmic transformation was applied.

3.1.2. Variable Standardization

The standardization consisted of subtracting from each of the data the average value calculated in a reference period and then dividing it by the standard deviation (both calculated over a base period). In other words:
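$$X'_{ij} = \frac{X_{ij} - \bar{X}}{\sigma}, \qquad i = \text{month}, \; j = \text{year}, \tag{24}$$
where the mean over all 588 monthly observations is given by $\bar{X} = \frac{1}{588} \sum_{j=1963}^{2011} \sum_{i=1}^{12} X_{ij}$ and the variance by $\sigma^{2} = \frac{1}{587} \sum_{j=1963}^{2011} \sum_{i=1}^{12} \left( X_{ij} - \bar{X} \right)^{2}$.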
The first step in any analysis and forecast of time series is to plot the observations against time, to get an idea of the possible trends and/or cycles associated with the temporal evolution of the datasets [25]. In Figure 2, it can be seen that the standardized series had periodic variation, while the exogenous variables did not seem to show periodic variation when they were anomalized. It can also be seen that the fish landings stabilized slightly when their logarithmic transformations were standardized.
Figure 3 describes the process to build the transfer functions. The raw data were divided into training and test data. The training data started from January 1963 to August 2005 (totaling 512 records), while the test data started from September 2005 to December 2011 (76 records). The different transformations (logarithmic, anomaly, standardize) were applied to the data partitions, and the transfer function models were fitted (training data). Fish landing forecasts (test data) and goodness of fit metrics were calculated for both the fitted and the forecast values. Finally, the metrics were compared, and the best model was identified.
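The record counts of the data partition can be checked, together with the standardization of a log-transformed series, in a short sketch. The landings series below is a synthetic placeholder, and applying the standardization over the full 1963–2011 record before splitting is an assumption made here for simplicity.

```python
import numpy as np

def standardize(series):
    """Subtract the overall mean and divide by the overall standard deviation (ddof = 1)."""
    s = np.asarray(series, dtype=float)
    return (s - s.mean()) / s.std(ddof=1)

n_total = 49 * 12        # January 1963 to December 2011
n_train = 42 * 12 + 8    # January 1963 to August 2005

# Synthetic placeholder for a monthly landings series (not the data of this paper)
rng = np.random.default_rng(0)
landings = rng.lognormal(mean=10.0, sigma=0.5, size=n_total)

z = standardize(np.log(landings))        # standardized logarithmic scale
train, test = z[:n_train], z[n_train:]   # chronological split, no shuffling
print(len(train), len(test))
```

The split is chronological, so the 76 test records (September 2005 to December 2011) are never used when the transfer function models are fitted.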

3.2. Validation Results and Time Series Predictability

Table 2 shows the variables that had a significant cross-correlation with fish landings; these cross-correlations were the starting point for building the transfer function models. Based on the cross-correlation function between the fisheries’ landings and macro-climatic phenomena (Step iii, Section 2.3), correlation patterns among the series were detected, along with the variables that had the greatest linear association with the monthly anchovy and sardine landings in northern Chile. These were SST, the turbulence index (TI), and Niño Zone 1 + 2 (N12). Likewise, the variables with the lowest linear association were the Pacific Decadal Oscillation index (PDO), the multivariate El Niño/Southern Oscillation index (MEI), the Southern Oscillation index (SOI), and N34 (highly associated with N12).
It should be noted that there is a wide range of methods to identify significant variables and associated lags, such as those shown in [26]; however, in this paper, the methodology suggested by [24] and specified in the previous section was used (Step iii, Section 2.3).

Wavelet Transfer Function Models

Table 3 presents a summary of the main models obtained, with their indicators of the goodness of fit and forecast. In summary, a total of 31 combinations of resolution levels were considered for ten Daubechies filters in the first stage. In the second stage, we worked with the residuals, the fitted values, and the filter selected in the first stage, again with 31 resolution combinations. A total of 310 models were fitted for each desired transfer function, in order to select the best model under diverse goodness of fit criteria.
The results shown in Table 3 allowed us to identify that through the transfer functions of variable coefficients, we could fit both anomalized and standardized data with a good level of accuracy; however, its performance was much better when the signals were standardized. Likewise, it can be observed that the models continued showing a good behavior in their residuals for the forecast phase, preserving a moderate percentage of strength in their coefficient of determination (calculated exclusively for the data based on the forecast).
We worked with each level $j$ independently, as well as jointly, until reaching an optimum level in the fitting process. In principle, for each Daubechies filter and each combination of levels $j$, we identified where the mean absolute error (MAE) was minimized or the coefficient of determination ($R^{2}$) was maximized. One might think that if the model fit were good enough, a high $R^{2}$ should coincide with the minimum MAE. However, this is not always the case, because the wavelet at a specific level $j$ can follow the pattern of the original signal without being on the same scale; therefore, the error can be large even when $R^{2}$ is high. The best possible model for each Daubechies filter was therefore identified, and the second stage of the process was executed, tracking the goodness of fit metrics again in order to obtain the best fit.
Similarly, the goodness of fit statistics were calculated in the second stage, but more indicators were added to make sure that the most appropriate decision was made. In this last fit, the root mean squared error (RMSE), the MAE, the coefficient of determination ($R^{2}$), and the Pearson, Spearman, and Kendall correlation coefficients were calculated, all for the Daubechies filter selected in the first stage and for various resolution levels ($j$).
For example, in Figure 4, we show the goodness of fit metrics for the wavelet transfer function models estimated to explain anchovy landings (standardized data). Traditional metrics were used to determine the optimal wavelet families and the decomposition scale that would produce an improved forecasting performance, so the filter and resolution were selected to achieve the best metrics (high $R^{2}$ and minimum residuals). The same procedure was performed for all fitted models. A wavelet entropy algorithm such as the one presented in [21] could be used in the future to determine the optimal wavelet families and the decomposition scale that would produce an improved forecasting performance.

3.3. Constant Coefficient Transfer Function Models

Table 3 presents a summary of the models obtained, with their indicators of the goodness of fit and forecast. The results shown allowed us to identify that the transfer functions of constant coefficients can model both anomalized and standardized data with a good level of fit; however, its performance was much better when the signals were standardized. Likewise, it can be observed that the models continued showing a good behavior in their residuals for the forecast phase, conserving a moderate percentage of strength in their coefficient of determination (calculated exclusively for the data based on the forecast).

3.4. Comparison between Transfer Function Modeling Approaches

The implemented methodologies (wavelet and polynomial coefficients) were compared under the different transformations applied to the data in order to select the most accurate model in terms of residuals. When the series were standardized to explain the anchovies’ landings, the wavelet transfer function model showed a better fit, whereas when the exogenous series were anomalized to explain the standardized log(sardine landing), the polynomial transfer function model showed a better fit. In this sense, the data should have certain properties for each approach to perform at its best. When using constant coefficient transfer functions, it is recommended that the series have a more stable variance, which is achieved with the anomalization; when using variable coefficient transfer functions, the high and low frequency components of the seasonal series are captured in such a way that the models perform better.
Figure 5 shows the residuals of the fitted models for the different treatments of the variables and the different transfer function models. The best models (with the smallest residuals) were obtained when using a transfer function with variable coefficients to explain the anchovies’ landings (standardized data), and when using a transfer function with polynomial coefficients to explain the sardine landings (explanatory variables anomalized and sardine series in standardized logarithmic scale). Figure 6 shows the fit behavior of both models, as well as the forecasts obtained from the test data.
Likewise, the variance explained during the validation phase for anchovy landings (96.9%) showed important improvements in relation to previous works, such as that presented by [7], where the variance explained in external validation fluctuated between 84% and 87%, or the sardine and anchovy landing forecasts presented by [9], in which the variance explained by both models was slightly higher than 82%.
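The two treatments compared above can be sketched as follows. This is an illustration under an assumption, since this section does not restate the exact definitions: standardization is taken to mean subtracting the overall mean and dividing by the overall standard deviation, and anomalization is taken to mean the conventional climatological standardized anomaly, in which each calendar month is centered and scaled by its own mean and standard deviation. The function names are hypothetical.

```python
import math


def _mean_and_sd(values):
    """Sample mean and sample standard deviation (n - 1 in the denominator)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, math.sqrt(variance)


def standardize(series):
    """Center and scale the whole series with its overall mean and deviation."""
    mean, sd = _mean_and_sd(series)
    return [(v - mean) / sd for v in series]


def standardized_anomalies(series, period=12):
    """Center and scale each calendar month separately (assumed definition).

    Position i of the series belongs to month i % period. Each month needs at
    least two observations with non-zero spread.
    """
    result = [0.0] * len(series)
    for month in range(period):
        positions = range(month, len(series), period)
        mean, sd = _mean_and_sd([series[i] for i in positions])
        for i in positions:
            result[i] = (series[i] - mean) / sd
    return result


# Toy series (not from the study) with a "year" of two months, so that the
# arithmetic is easy to follow: the second month is ten times more variable.
toy = [1.0, 10.0, 3.0, 20.0, 5.0, 30.0]
print(standardize(toy))
print(standardized_anomalies(toy, period=2))
```

In the toy series, the anomalized version gives both months the same unit spread, which is the variance-stabilizing effect that the constant coefficient models are said to benefit from, while plain standardization leaves the seasonal difference in variability in place.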
We can observe that the sardine landings did not reach as good a fit as those of anchovies. This could be optimized in a future work by considering, for example, a truncated model as suggested by [27], where the data series were modeled based on the assumption that the data followed a truncated and transformed multivariate normal distribution. In their work, the data predictive inferences showed very realistic results, capturing the typical variability of the series in time and space.

4. Conclusions

Based on the analysis of the models built to forecast the monthly disembarkation of anchovy (Engraulis ringens) and sardine (Sardinops sagax) in northern Chile, the following conclusions emerged:
When various transformations were applied to the data to achieve better model precision, large differences in the performance of the selected fitted models could be identified. Records of anchovy landings were better fitted and forecast with standardized data under a transfer model with wavelet coefficients, using Daubechies 10 filters with low resolution levels (associated with the slightly compressed wavelets, j = 1:2). Sardine landings were better fitted when the variance of the landings was stabilized using the logarithmic function, the variance of the explanatory variables was stabilized by anomalizing them, and the landings in logarithmic scale were modeled with a traditional transfer function.
The variables that explained the disembarkation of anchovy most robustly were the turbulence index from Antofagasta Coastal Oceanographic Station (TI) and the Pacific sea surface temperature index (Niño Zone 1 + 2: N12), while the disembarkation of sardines was explained by local climatic variables, namely TI and the sea surface temperature from Antofagasta Coastal Oceanographic Station (SST), together with the log disembarkation of anchovy.
Given that the process of selecting the appropriate number of scales to optimize the model fit was made according to the researcher’s choice, it is advisable to implement in the future some entropy based techniques that allow for the best possible scale selection. It is also recommended to evaluate if the results can be optimized considering other wavelet filters in addition to the Daubechies filters. Likewise, a non-linear structure model could be considered (for example, thresholding wavelet coefficients) in order to determine the best model structure for fishery prediction. These results could also be optimized by implementing bootstrapping techniques for the fitted parameters in order to quantify their uncertainty.

Author Contributions

Conceptualization, E.V., H.A.-C., and R.S.; methodology, E.V. and L.B.; software, E.V. and H.A.-C.; validation, all authors; writing, original draft preparation, E.V.; writing, review and editing, all authors; supervision, H.A.-C.

Funding

This research received internal funding from the Pontificia Universidad Católica de Valparaíso (PUCV).

Acknowledgments

We are grateful to the School of Informatic Engineer of Pontificia Universidad Católica de Valparaíso (PUCV) of Chile for the Scholarship Programme under which the first author was funded for her doctoral study at the School.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

References

  1. Lum Kong, A. Impacts of Global Climate Changes on Caribbean Fisheries Resources: Research needs. In Caribbean Food Systems: Developing a Research Agenda; Global Environmental Change and Food Systems (GECAFS): St. Augustine, Trinidad, Spain, 2002.
  2. Plaza, F.; Salas, R.; Yáñez, E. Identifying ecosystem patterns from time series of anchovy (Engraulis ringens) and sardine (Sardinops sagax) landings in northern Chile. J. Stat. Comput. Simul. 2018, 88, 1863–1881.
  3. Zhou, S.; Smith, A.D.; Punt, A.E.; Richardson, A.J.; Gibbs, M.; Fulton, E.A.; Pascoe, S.; Bulman, C.; Bayliss, P.; Sainsbury, K. Ecosystem-based fisheries management requires a change to the selective fishing philosophy. Proc. Natl. Acad. Sci. USA 2010, 107, 9485–9489.
  4. Silva, C.; Yáñez, E.; Barbieri, M.A.; Bernal, C.; Aranis, A. Forecasts of swordfish (Xiphias gladius) and common sardine (Strangomera bentincki) off Chile under the A2 IPCC climate change scenario. Prog. Oceanogr. 2015, 134, 343–355.
  5. Garcia, S.M. The Ecosystem Approach to Fisheries: Issues, Terminology, Principles, Institutional Foundations, Implementation and Outlook (No. 443); Food & Agriculture Org.: Rome, Italy, 2003.
  6. Hiddink, J.; Ter Hofstede, R. Climate induced increases in species richness of marine fishes. Glob. Chang. Biol. 2008, 14, 453–460.
  7. Gutiérrez-Estrada, J.C.; Silva, C.; Yáñez, E.; Rodríguez, N.; Pulido-Calvo, I. Monthly catch forecasting of anchovy Engraulis ringens in the north area of Chile: Non-linear univariate approach. Fish. Res. 2007, 86, 188–200.
  8. Gutiérrez-Estrada, J.C.; Yáñez, E.; Pulido-Calvo, I.; Silva, C.; Plaza, F.; Bórquez, C. Pacific sardine (Sardinops sagax, Jenyns 1842) landings prediction. A neural network ecosystemic approach. Fish. Res. 2009, 100, 116–125.
  9. Yáñez, E.; Plaza, F.; Gutiérrez-Estrada, J.C.; Rodríguez, N.; Barbieri, M.; Pulido-Calvo, I.; Bórquez, C. Anchovy (Engraulis ringens) and sardine (Sardinops sagax) abundance forecast off northern Chile: A multivariate ecosystemic neural network approach. Prog. Oceanogr. 2010, 87, 242–250.
  10. Silva, C.; Barbieri, M.A.; Yáñez, E.; Gutiérrez-Estrada, J.C.; DelValls, T.Á. Using indicators and models for an ecosystem approach to fisheries and aquaculture management: The anchovy fishery and Pacific oyster culture in Chile: Case studies. Lat. Am. J. Aquat. Res. 2012, 40, 955–969.
  11. Shabri, A.; Samsudin, R. Fishery landing forecasting using wavelet-based autoregressive integrated moving average models. Math. Prob. Eng. 2015, 2015.
  12. Rodriguez, N.; Palma, W.; Yañez, E.; Rubio, J.M. Wavelet additive forecasting model to support the fisheries industry. Adv. Sci. Lett. 2013, 19, 3679–3682.
  13. SERNAPESCA. Anuarios Estadísticos de Pesca. Servicio Nacional de Pesca, Ministerio de Economía, Fomento y Reconstrucción; Chile 1978–2012. Available online: http://ww2.sernapesca.cl/index.php?option=com_remository&Itemid=54&func=select&id=2 (accessed on 5 November 2019).
  14. Yáñez, E.; Barbieri, M.; Silva, C.; Nieto, K.; Espíndola, F. Climate variability and pelagic fisheries in northern Chile. Prog. Oceanogr. 2001, 49, 581–596.
  15. Yáñez, E.; Hormazábal, S.; Silva, C.; Montecinos, A.; Barbieri, M.A.; Valdenegro, A.; Órdenes, A.; Gómez, F. Coupling between the environment and the pelagic resources exploited off northern Chile: Ecosystem indicators and a conceptual model. Lat. Am. J. Aquat. Res. 2008, 36.
  16. de Guenni, L.B.; García, M.; Muñoz, Á.G.; Santos, J.L.; Cedeño, A.; Perugachi, C.; Castillo, J. Predicting monthly precipitation along coastal Ecuador: ENSO and transfer function models. Theor. Appl. Climatol. 2017, 129, 1059–1073.
  17. Moura, M.S.d.A.; Morettin, P.A.; Toloi, C.; Chiann, C. Transfer function models with time-varying coefficients. J. Probab. Stat. 2012.
  18. Cohen, A.; Daubechies, I.; Vial, P. Wavelets on the interval and fast wavelet transforms. Appl. Comput. Harmon. Anal. 1993, 1, 54–81.
  19. Vidakovic, B. Statistical Modeling by Wavelets; John Wiley & Sons: New York, NY, USA, 2009; Volume 503.
  20. Webster, R.; Lark, R. GP Nason: Wavelet Methods in Statistics with R. Math. Geosci. 2011, 43, 261–263.
  21. Zou, Y.; Yu, L.; He, K. Wavelet entropy based analysis and forecasting of crude oil price dynamics. Entropy 2015, 17, 7167–7184.
  22. Misiti, M.; Misiti, Y.; Oppenheim, G.; Poggi, J.M. Wavelet Toolbox; The MathWorks Inc.: Natick, MA, USA, 1996; Volume 15, p. 21.
  23. Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: New York, NY, USA, 2015.
  24. Shumway, R.H.; Stoffer, D.S. Time Series Analysis and Its Applications: With R Examples; Springer: New York, NY, USA, 2017.
  25. Diodato, N.; De Guenni, L.; Garcia, M.; Bellocchi, G. Decadal Oscillation in the Predictability of Palmer Drought Severity Index in California. Climate 2019, 7, 6.
  26. Veloz, A.; Salas, R.; Allende-Cid, H.; Allende, H.; Moraga, C. Identification of lags in nonlinear autoregressive time series using a flexible fuzzy model. Neural Process. Lett. 2016, 43, 641–666.
  27. Sansó, B.; Guenni, L. Venezuelan rainfall data analysed by using a Bayesian space–time model. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1999, 48, 345–362.
Figure 1. Process diagram for transfer function model with time varying coefficients.
Figure 2. Standardized and standardized anomalies monthly series: (a) exogenous and response series standardized; (b) standardized anomalies of exogenous series and response series standardized.
Figure 3. Process diagram for transfer function modeling.
Figure 4. Goodness of fit metrics for the best wavelet transfer function models fitted to DANC: (a) Coefficient of determination (R²). Maximum values are identified (Max in D10, resolution 1:2). (b) Root mean squared error (RMSE). Minimum values are identified (Min in D10, resolution 1:2). (c) Mean absolute error (MAE). Minimum values are identified (Min in D10, resolution 1:2).
Figure 5. Estimated residuals between observed values and fitted values for transfer function model methods.
Figure 6. Standardized and scaled anomalized monthly series for anchovies landings (top) and sardine landings in logarithmic scale (bottom).
Table 1. Variables that are the object of study. The climatic variables are explanatory, while local fisheries are the response variables. All records are monthly from 1963 to 2011.

| Type | Variable | Description |
|---|---|---|
| Local Climatic | SST | Sea surface temperature from Antofagasta Coastal Oceanographic Station |
| Local Climatic | TI | Turbulence index from Antofagasta Coastal Oceanographic Station |
| Local Climatic | MSL | Mean sea level from Antofagasta Coastal Oceanographic Station |
| Global Climatic | MEI | El Niño multivariate Southern Oscillation index |
| Global Climatic | PDO | Pacific Decadal Oscillation index |
| Global Climatic | N12 | Pacific sea surface temperature index (Niño Zone 1 + 2) |
| Global Climatic | N34 | Pacific sea surface temperature index (Niño Zone 3 + 4) |
| Global Climatic | SOI | Southern Oscillation index |
| Global Climatic | CTI | Cold tongue index |
| Local Fisheries | DANC | Disembarkation anchovy (Engraulis ringens) in northern Chile |
| Local Fisheries | DSAR | Disembarkation sardine (Sardinops sagax) in northern Chile |
Table 2. Cross correlation coefficients (CCF) for different lags between disembarkation and significant explanatory variables. Each cell gives the lag followed by the CCF in parentheses; a dash marks a cell with no entry.

| X | Standardized: Anchovy | Standardized: Log Anchovy | Anomalized: Anchovy | Anomalized: Log Anchovy | Standardized: Sardine | Standardized: Log Sardine | Anomalized: Sardine | Anomalized: Log Sardine |
|---|---|---|---|---|---|---|---|---|
| SST | - | - | 15 (0.10) | 7 (0.13) | - | 20 (0.22) | 15 (0.13) | 15 (0.12) |
| TI | 2 (0.12) | - | 2 (0.14) | 10 (0.20) | 18 (0.15) | 18 (0.23) | 2 (0.14) | 18 (0.22) |
| MEI | - | - | - | 5 (0.11) | 10 (0.15) | - | - | - |
| MSL | - | 7 (0.15) | 3 (0.11) | 7 (0.15) | - | - | 3 (0.08) | - |
| PDO | - | - | - | - | - | 2 (0.10) | - | - |
| N12 | 22 (0.11) | 5 (0.12) | - | 5 (0.11) | - | 24 (0.14) | - | - |
| N34 | - | - | - | - | - | - | - | - |
| SOI | - | - | - | - | 23 (0.11) | - | - | - |
| CTI | - | - | - | - | 12 (0.11) | 22 (0.11) | - | - |
| DSAR | - | 5 (0.19) | - | 5 (0.19) | 1 (0.80) | 1 (0.95) | 1 (0.80) | 1 (0.95) |
| DANC | 1 (0.54) | 1 (0.76) | 1 (0.57) | 1 (0.76) | 10 (0.15) | 7 (0.14) | - | 7 (0.14) |
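Table 3. Transfer function models. Goodness of fit indicators for the resulting models. In bold are the results of the best models for each time series.

| Series | T | Y | X (lag L) | Coefficient type | Fitted RMSE | Fitted MAE | Fitted R² | Fitted Pearson | Fitted Spearman | Fitted Kendall | Forecast RMSE | Forecast MAE | Forecast R² |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anchovy | N | DANC | DANC (1), TI (2), N12 (22) | Constant | 0.851 | 0.559 | 0.821 | 0.964 | 0.882 | 0.727 | 0.603 | 0.464 | 0.796 |
| Anchovy | N | DANC | DANC (1), TI (2), N12 (22) | Wavelet | **0.165** | **0.128** | **0.978** | **0.990** | **0.971** | **0.865** | **0.138** | **0.106** | **0.969** |
| Anchovy | A | DANC | DANC (1), SST (15), TI (2), MSL (3) | Constant | 0.831 | 0.568 | 0.826 | 0.966 | 0.919 | 0.771 | 0.660 | 0.571 | 0.750 |
| Anchovy | A | DANC | DANC (1), SST (15), TI (2), MSL (3) | Wavelet | 0.416 | 0.309 | 0.904 | 0.956 | 0.943 | 0.796 | 0.265 | 0.207 | 0.824 |
| Anchovy | Log N | DANC | DANC (1), MSL (7), N12 (5), LDSAR (5) | Constant | 0.603 | 0.451 | 0.862 | 0.953 | 0.950 | 0.813 | 0.770 | 0.575 | 0.748 |
| Anchovy | Log N | DANC | DANC (1), MSL (7), N12 (5), LDSAR (5) | Wavelet | 0.391 | 0.286 | 0.883 | 0.932 | 0.929 | 0.775 | 0.717 | 0.806 | 0.564 |
| Anchovy | Log A | DANC | DANC (1), SST (7), TI (10), MEI (5), MSL (7), N12 (5), LDSAR (5) | Constant | 0.604 | 0.447 | 0.860 | 0.950 | 0.948 | 0.808 | 0.792 | 0.548 | 0.751 |
| Anchovy | Log A | DANC | DANC (1), SST (7), TI (10), MEI (5), MSL (7), N12 (5), LDSAR (5) | Wavelet | 0.818 | 0.620 | 0.652 | 0.714 | 0.637 | 0.458 | 0.599 | 0.493 | 0.681 |
| Sardine | N | DSAR | DSAR (1), TI (18), MEI (10), SOI (23), CTI (12), DANC (10) | Constant | 0.610 | 0.380 | 0.858 | 0.943 | 0.640 | 0.472 | 0.167 | 0.135 | 0.500 |
| Sardine | N | DSAR | DSAR (1), TI (18), MEI (10), SOI (23), CTI (12), DANC (10) | Wavelet | 1.033 | 0.734 | 0.672 | 0.715 | 0.538 | 0.374 | 1.308 | 1.072 | 0.500 |
| Sardine | A | DSAR | DSAR (1), SST (13), TI (12), SOI (23), CTI (18) | Constant | 0.614 | 0.381 | 0.859 | 0.947 | 0.744 | 0.584 | 0.139 | 0.109 | 0.619 |
| Sardine | A | DSAR | DSAR (1), SST (13), TI (12), SOI (23), CTI (18) | Wavelet | 1.377 | 0.998 | 0.590 | 0.557 | 0.290 | 0.192 | 0.887 | 0.749 | 0.499 |
| Sardine | Log N | DSAR | DSAR (1), SST (20), TI (18), PDO (2), N12 (24), CTI (22), LDANC (7) | Constant | 0.274 | 0.199 | 0.918 | 0.962 | 0.966 | 0.840 | 0.303 | 0.258 | 0.688 |
| Sardine | Log N | DSAR | DSAR (1), SST (20), TI (18), PDO (2), N12 (24), CTI (22), LDANC (7) | Wavelet | 1.013 | 1.261 | 0.613 | 0.614 | 0.607 | 0.432 | 0.357 | 0.285 | 0.547 |
| Sardine | Log A | DSAR | DSAR (1), SST (15), TI (18), LDANC (7) | Constant | **0.276** | **0.200** | **0.916** | **0.961** | **0.966** | **0.840** | **0.277** | **0.239** | **0.706** |
| Sardine | Log A | DSAR | DSAR (1), SST (15), TI (18), LDANC (7) | Wavelet | 0.417 | 0.542 | 0.562 | 0.755 | 0.762 | 0.565 | 1.291 | 1.249 | 0.502 |

T: type of transformation made to the variables: standardized (N), anomalized (A). Y: response variable. X: explanatory variable. L: lag of the explanatory variable in the transfer model.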

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).