Power-type Functions of Prediction Error of Sea Level Time Series

This paper gives the quantitative relationship between the prediction error and the given past sample size for sea level time series. The results show that the prediction error, as a function of the past sample size, follows decaying power functions, which provides a quantitative guideline for the quality control of sea level prediction.

This paper presents our study of sea level time series prediction using real sea-level data recorded hourly by the National Data Buoy Center (NDBC). The data are publicly accessible via http://www.ndbc.noaa.gov/historical_data.shtml. The predictor we used in this research is an ANN predictor of the back propagation (BP) type, which is available in the toolbox of the software Matlab. Two highlights of this paper are as follows.


We established the quantitative relationship between the past sample size, used for predicting 40 points ahead of a sea level time series as a case study, and the prediction error, characterized as the mean square error, at the measurement stations we investigated.


We obtained the analytic expression of that relationship in the form of a power function, providing a guideline for quality control in sea level prediction.

The rest of the paper is organized as follows. The research background, including the problem statement, is explained in Section 2. The research method is discussed in Section 3. Results are given in Section 4, which is followed by conclusions.
Six real traces of sea level time series described in Table 1 are selected in our research, where x_lkwf1_1999(t) denotes the sea level time series in 1999 at the station LKWF1, x_venf1_2003(t) the sea level time series in 2003 at the station VENF1, and so forth. Our selection of the above data is quite arbitrary in this research, simply for the purpose of studying the prediction error of sea level, without any conflict of interests regarding other manuscripts.

Problem Statement
Let n > 0 be the past sample size. Denote by m ≥ 0 the step number of prediction. Then, the random variables x(i) for i < n are the past records. Rewrite the n past records as x(i − 1), x(i − 2), …, x(i − n). Following Kolmogorov [72], with the selection of proper real coefficients $a_s$, one may obtain the linear combination of the n past records expressed by
$$L = \sum_{s=1}^{n} a_s\, x(i - s). \tag{1}$$
The above L may be used to predict a future value x(i + m). Denote by $e^2(n, m)$ the mean square error (MSE) given by
$$e^2(n, m) = E\left[\left(x(i + m) - L\right)^2\right], \tag{2}$$
where E is the mean operator. Kolmogorov stated that $e^2(n, m)$ does not increase as n increases (Kolmogorov [72]), which yields a consequence we express as Corollary 1.
Corollary 1. For a given m, the following holds:
$$e^2(n + 1, m) \le e^2(n, m). \tag{3}$$
Note that the above is a general, qualitative expression with respect to $e^2(n, m)$. For a given time series x(i), however, it does not tell us, quantitatively, how large $e^2(n, m)$ is for a given past sample size n and prediction step m. For that reason, we consider the solution to the problem described below, aiming at establishing a quantitative relationship between the past sample size n and the given error, provided that the step number of prediction m is predetermined.

Problem 1. Let x(i) be a sea level time series. For a fixed m and a given error e1, what is the past sample size, denoted by N1, required such that
$$e^2(N_1, m) \le e_1? \tag{4}$$
Note 1. The solution to the above problem is to find the function denoted by e1(N1, m). An alternative solution is N1(e1, m), which is in fact the inverse of e1(N1, m) with respect to N1. □

Research Thoughts
We divide the task mentioned in Note 1 into two subtasks as follows.
First, we find the e1(n, m) expressed only by an empirical curve based on processing real data of sea level.
Then, by fitting the data, we may find the analytic expression of e1(n, m) as well as e1(N1, m).

BP-ANN Predictor
Artificial neural networks (ANNs) are computational non-linear models that mimic the structures and functions of animals' nervous systems. ANNs, composed of interconnected neurons, are used to "learn" the complex relationship between input data and output data. Back propagation (BP) neural networks, one of the most important models in ANNs [73], are widely used in the fields of economy, ecology, meteorology, hydrology, medicine, etc.; see, for example, [74-77].
A BP neural network (BPNN), composed of one input layer, one or more middle layers called hidden layers, and one output layer, can be seen as a non-linear input-output map, in which any two adjacent layers are directly interconnected while the neurons within each layer are independent.
As previously mentioned, the current value of a time series is based on the previously observed values. In this paper, every 21st sample is obtained from the 20 samples that immediately precede it. Thus, the three-layer BP neural network comprises 20 inputs and 1 output (see Section 4.5). The n samples are divided into two sets to train and test the neural network predictor. Specifically, the first (n − 20) groups of samples are training data and the last 20 data are used to test the predictor. Every time the sample size changes, the number of samples for training changes, while the numbers of neurons in the input and output layers remain constant.
For example, if the sample size is 200, the inputs of the BPNN predictor are 20-dimensional vectors containing the samples from the 1st to the 20th, from the 2nd to the 21st, and so forth, up to the last vector, which contains the samples from the 180th to the 199th. The output data are the 21st to the 200th values of the one-dimensional original data, corresponding to the 180 vectors in the input layer. For testing, the 201st value is obtained from the samples from the 181st to the 200th. The 202nd value is the output of the predictor whose inputs are the data from the 182nd to the 200th together with the 201st value, which is itself a predicted value. The remaining predicted values are generated in the same way. In this paper, the 5001st to the 5040th values of the annual record at each station are predicted. The predicting process is shown in Figure 1.
Once the sizes of the input and output layers are fixed, choosing the optimal number of hidden layers and the number of nodes in each hidden layer becomes a major challenge; see, for example, [78,79]. Any continuous and bounded real-valued function, based on the Kolmogorov theorem [72,80], can be expressed as a three-layer back propagation neural network in which there are (2k + 1) nodes in the middle layer, where k is the number of input nodes. Thus, in this paper, with k = 20, one single hidden layer with 41 nodes is used for the BPNN model.
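The windowing and the step-by-step prediction described above can be sketched as follows. This is only an illustrative sketch in Python, not the MATLAB code used in this research; the function names `make_windows` and `recursive_forecast`, the toy series, and the placeholder predictor (a window mean standing in for the trained BPNN) are all assumptions introduced here.

```python
def make_windows(series, k=20):
    """Split a series into (input window, target) pairs.

    Each input holds k consecutive samples; the target is the sample
    that immediately follows them.
    """
    count = len(series) - k
    inputs = [series[i:i + k] for i in range(count)]
    targets = [series[i + k] for i in range(count)]
    return inputs, targets


def recursive_forecast(predict_one, history, steps=40, k=20):
    """Predict `steps` values ahead, feeding each prediction back as input.

    `predict_one` is any callable mapping a list of k values to one value
    (a hypothetical stand-in for the trained network).
    """
    window = list(history[-k:])
    forecasts = []
    for _ in range(steps):
        nxt = predict_one(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # drop the oldest value, append the prediction
    return forecasts


# Toy usage: a sample size of 200 gives 180 training pairs.
series = [float(i % 12) for i in range(200)]
X, y = make_windows(series)
preds = recursive_forecast(lambda w: sum(w) / len(w), series)
print(len(X), len(y), len(preds))
```

With a sample size of 200, the sketch yields 180 input vectors, the last of which covers the 180th to the 199th samples with the 200th sample as its target, in line with the example in the text.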
The transfer function, also called the activation function, represents the relationship between the input and output of each node. Activation functions are required to be monotonically increasing and differentiable [81]. The sigmoid function, being continuous and bounded, is commonly used in input and hidden layers [82,83] and is expressed by
$$f(u) = \frac{1}{1 + e^{-u}}, \tag{5}$$
where u stands for the input of a single neuron. For the input layer, u is the input vector, which is the normalized sea level time series x(t), while for the hidden layer, u is the weighted sum of the outputs of the previous layer minus the threshold value of the layer. A linear transfer function is used for the neurons in the output layer, so that the outputs of the entire network can take any value. BP neural networks are trained by adjusting the weights between two layers and the threshold values based on the steepest (gradient) descent method; that is, if the errors between the actual outputs and the expected ones are not satisfactory, the errors are propagated backward to adjust the weights and threshold values repeatedly until a performance measure of the system, such as the mean square error (MSE), is minimized.
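As a rough illustration of the layer structure described above, the following Python sketch computes one forward pass through a network with 20 inputs, 41 sigmoid hidden nodes, and one linear output node. It is not the toolbox implementation: the names `logistic` and `forward`, the random weights, and the choice to apply the sigmoid only in the hidden layer are assumptions made for this sketch, and training by back propagation is omitted.

```python
import math
import random


def logistic(u):
    """Sigmoid activation 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: sigmoid hidden layer, then a linear output neuron.

    Thresholds are subtracted from the weighted sums, as in the text.
    """
    hidden = [
        logistic(sum(w * xi for w, xi in zip(row, x)) - b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) - b_out


k, n_hidden = 20, 41  # 2k + 1 hidden nodes for k inputs
rng = random.Random(0)  # fixed seed so the sketch is repeatable
w_hidden = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)

x = [0.5] * k  # one normalized input window
print(forward(x, w_hidden, b_hidden, w_out, b_out))
```

Because the output neuron is linear, the returned value is not confined to the interval from 0 to 1, which is the reason given in the text for using a linear transfer function in the output layer.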
The available data are always divided into two sets, namely training and testing datasets. In order to obtain precise results and to reduce running time, it is necessary to normalize the available datasets to a range between zero and one [84] before the training process, with the formula below:
$$x'(t) = \frac{x(t) - x_{\min}}{x_{\max} - x_{\min}}, \tag{6}$$
where x'(t) is the normalized value and xmax and xmin are, respectively, the maximum and minimum values of the sample data x(t). The final forecasting results are obtained by the following inverse normalization formula:
$$x(t) = x'(t)\,(x_{\max} - x_{\min}) + x_{\min}. \tag{7}$$
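The normalization to the range between zero and one and its inverse can be sketched in Python as below. The function names `normalize` and `denormalize` and the sample values are illustrative assumptions; in this research the corresponding step is carried out with the toolbox function mapminmax.

```python
def normalize(values):
    """Min-max scale a list of numbers to the interval [0, 1].

    Returns the scaled values together with the minimum and maximum,
    which are needed later to undo the scaling.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize(scaled, lo, hi):
    """Map values from [0, 1] back to the original range."""
    return [s * (hi - lo) + lo for s in scaled]


levels = [2.0, 4.0, 6.0]  # made-up values, not NDBC data
scaled, lo, hi = normalize(levels)
print(scaled)
print(denormalize(scaled, lo, hi))
```

The minimum and maximum are returned alongside the scaled data because the predictions of the network must be mapped back with the same two values.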

Experimental Methods
In this research, the Neural Network (NN) Toolbox of the software MATLAB R2012b is used to realize the neural network predicting model. MATLAB is a multi-paradigm environment for numerical computing developed by MathWorks, Natick, USA. The NN Toolbox, based on neural network theory, provides functions to design, train, visualize and simulate all kinds of networks, such as linear networks, BPNN, radial basis function NN, regression NN and so on [85,86]. Three functions, newff, train and sim, are used in the learning procedure in the MATLAB commands [87].
As previously mentioned, the original dataset is divided into training and testing sets. In order to reduce the prediction error, the data values are normalized between 0 and 1 via the function mapminmax. The function newff, which is used to generate and initialize a BP neural network, defines the sizes and activation function of every layer and the type of training algorithm. In this research, we use the function logsig between the input and hidden layers. The outputs of logsig are limited to the interval from 0 to 1, while its inputs can take any value from negative infinity to positive infinity [88,89]. The transfer function purelin is used between the hidden and output layers so that the outputs of the network can be arbitrary values. After creating a BP neural network, the functions train and sim are used to train and simulate the net.
Without loss of generality, the data used as samples are recorded at the six aforementioned stations, a total of approximately 8700 values annually in every experiment. For each station, the first set of samples is the sea level from the 4901st to the 5000th value of the original data, 100 values in total. In this paper, the step number of prediction is 40, that is, the values from the 5001st to the 5040th are predicted. By comparing the predictions with the actual data, we obtain the mean squared prediction error from Equation (2). The second set of sample data is from the 4801st to the 5000th value, 200 values in total. Then the original data from the 4701st to the 5000th are chosen as the third set, and so on, until the 15th set of sea level data, from the 3501st to the 5000th value. Fifteen mean square errors, which are 15 points generated by the 15 sets of samples, constitute a curve envelope, which can be fitted by specific functions. In this way, we obtain the quantitative relationship between sample size and prediction error.
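The index bookkeeping of the 15 nested sample sets and the mean squared error over the predicted points can be sketched as follows. This is a hedged Python illustration: the helper names `sample_ranges` and `mse` are assumptions, indices are 1-based and inclusive to match the text, and no station data are used.

```python
def sample_ranges(last=5000, step=100, count=15):
    """Return 1-based inclusive (start, end) index pairs of the nested sample sets.

    The j-th set holds the step * j values that end at index `last`.
    """
    return [(last - step * j + 1, last) for j in range(1, count + 1)]


def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


ranges = sample_ranges()
print(ranges[0], ranges[1], ranges[-1])  # (4901, 5000) (4801, 5000) (3501, 5000)
```

Each range would be used to train one predictor, whose 40 predictions for the 5001st to the 5040th values are then compared with the recorded values through `mse`, giving one point of the error curve per sample size.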

Prediction Results
We selected the data recorded at the station LKWF1 in 1999 for demonstration. The original sea level series contains 8641 data points, some of which are shown in Figure 2. When we used the past sample records of 100 data points, from the 4901st point to the 5000th, to predict the next 40 data points, the prediction error (MSE) was 0.38664. The original 40 data points and their predictions are shown in Figure 3. Figure 4 shows the prediction results when the past sample size is increased to 200, that is, when the past sample records contain the data from the 4801st point to the 5000th. By eye, one may see that the prediction error in Figure 4 is smaller than that in Figure 3. When the past sample size is increased to 300, covering the data from the 4701st point to the 5000th, we obtained the results shown in Figure 5, which clearly shows that the prediction error in this case is smaller than that in Figure 4. Figures 6-17 show the prediction results for past sample sizes from 400 to 1500, that is, using past sample records that start between the 4601st point (size 400) and the 3501st point (size 1500) and end at the 5000th point, to predict the next 40 data points. Those figures again show that the prediction error decreases as the past sample size increases, which is consistent with the description of Kolmogorov in [72]. The values of the prediction errors are summarized in Table 2. The relationship between the past sample size and the prediction error is shown in Figure 18, whose generation process is shown in Figure 19.
Note 2. From Figure 18, as well as Figure 20, we see that the MSE decreases rapidly when the past sample size increases from 100 to 300. Then, the prediction errors appear to decay slowly as the past sample size increases further. Table 3 summarizes the results. Prediction results at the stations LONF1, SAUF1, SPGF1 and VENF1 are summarized in Tables 4-7, and the corresponding curves are given in Figures 21-24. Setting aside the differing decay rates of the six stations, all the figures indicate that the prediction errors taper off on the whole as the sample sizes increase, and that the curves decay fast at first but slowly in the end. Once the sample size has risen to a certain extent, the prediction accuracy cannot be noticeably improved. That is to say, if the acceptable range of prediction errors has been determined, there is no need to take a very large sample size. In the next section, we fit the curves to explore the precise relationship between prediction errors and sample sizes.

Prediction Error Curve Fitting
Data fitting is used to select the proper type of curve to fit the observed data and to analyze the relationship between the two variables based on the fitted equation. In this paper, a power function satisfactorily fits the prediction error curve, in accordance with the shape of the curve. Figures 25-30 show the fitting results of the prediction errors at the six aforementioned stations. The corresponding power functions are also listed in Table 8.
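One simple way to fit a decaying power function to an error curve is least squares on the logarithms of both variables. The Python sketch below assumes a two-parameter form a·n^(−b); this form, the function name `fit_power_law`, and the synthetic error values are assumptions made for illustration only, and they do not reproduce the fitted functions or the fitting procedure behind Table 8.

```python
import math


def fit_power_law(ns, errors):
    """Fit errors ~ a * n**(-b) by least squares on log(error) versus log(n).

    Returns the pair (a, b). All inputs must be positive.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope


# Synthetic error curve at the 15 sample sizes 100, 200, ..., 1500.
ns = [100 * j for j in range(1, 16)]
synthetic = [50.0 * n ** -0.8 for n in ns]
a, b = fit_power_law(ns, synthetic)
print(round(a, 6), round(b, 6))
```

On a log-log plot a power function is a straight line, so the slope of the fitted line gives the decay exponent and the intercept gives the prefactor. Real error curves are noisy, and a nonlinear fit on the original scale may weight the points differently.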
According to the physical meaning of the past sample size, we have n > 0. Thus, denoting by H(n) the Heaviside unit step function, we rewrite Equation (8) as f(n)H(n). The above expression may serve as a solution to the problem we raised in Section 2.

Note 2. The prediction error f(n) provides us with a guideline to control the prediction error in terms of a given value of past sample size.
Note 3. Our present result about f(n) may serve as a specific quantitative description of the prediction error, with f(n) → 0 as n → ∞, a behavior that was qualitatively described by Kolmogorov; see Corollary 1.
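The guideline role of f(n) can be illustrated by inverting a fitted curve to obtain the past sample size needed for a target error, in the spirit of N1(e1, m) in Note 1. The Python sketch below again assumes the two-parameter form a·n^(−b); the function name `required_sample_size` and the parameter values are hypothetical and are not taken from Table 8.

```python
import math


def required_sample_size(target_error, a, b):
    """Smallest integer n with a * n**(-b) <= target_error, up to rounding.

    Assumes a > 0, b > 0 and target_error > 0.
    """
    if target_error <= 0 or a <= 0 or b <= 0:
        raise ValueError("target_error, a and b must all be positive")
    return max(1, math.ceil((a / target_error) ** (1.0 / b)))


# Hypothetical fitted parameters, for illustration only.
print(required_sample_size(0.5, a=100.0, b=1.0))  # 200
```

Because the curve flattens for large n, a small tightening of the target error can demand a much larger sample, which matches the observation that very large sample sizes bring little additional accuracy.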

Discussions
Sea level time series has the property of long-range dependence (LRD), as can be seen from Ercan et al. That is, its autocorrelation function r(k) decays as a power law,
$$r(k) \sim c\,k^{-b} \quad (k \to \infty),$$
where c > 0 is a constant, b = 2 − 2H, and H ∈ (0.5, 1) is the Hurst parameter. Time series with LRD, in addition to sea level time series, have wide applications in various fields, ranging from physics to computer science; see, e.g., Mandelbrot [125] and references therein, to mention just a few, where H plays a key role. At the moment, however, we do not have enough knowledge to establish the relationship between H and the past sample size with respect to the prediction errors of sea level time series.
Our future work will address this issue so as to understand more deeply the prediction errors of sea level time series. Another item of our future work is to investigate whether power-type functions of prediction error might appear as a law in the prediction of time series with LRD in general, besides sea level time series. As previously mentioned in the Introduction, there are two categories of predictors, namely, linear predictors and nonlinear ones [73-89], which may be considered for a specific type of time series, such as sea level. Since sea level is of LRD [90-93], which is nonlinear, and since ANN is a nonlinear predictor, it was consequently used in this research.
In the future, we shall use other types of predictors, for instance, autoregressive (AR) type, to study the relationship between prediction error and past sample size of sea level time series, and compare it with the one presented in this research.
Note that the present relationship between prediction error and past sample size may be a law specifically for sea level time series. In general, we never imply that it may fit other types of time series, such as ocean surface waves or network traffic. Nevertheless, it might yet be a reference for them in this regard.
Finally, we note that this research is in the domain of statistics. Therefore, though the data used in this research (see Table 1) are from 1999 to 2003, we statistically infer that the present law holds for hourly recorded sea level in the past and in the future.

Conclusions
We have established the relationship between prediction error and past sample size for the prediction of sea level time series based on real data measured at six stations on the east coast of the Gulf of Mexico in Florida. The closed form of that relationship, in decayed power functions, has been obtained. It may be useful, on the one hand, for controlling the prediction error according to a given past sample size or, on the other hand, for determining the required past sample size according to a predetermined prediction error.
Sea water levels at six stations on the east coast of the Gulf of Mexico, Florida, are predicted. A type of artificial neural network, back propagation (BP), is used with the Neural Network Toolbox in MATLAB. Prediction errors with different values are generated according to changes in sample sizes.

Figure 1. Process of predicting sea level time series from the 5001st value to the 5040th value.

Figure 4. Prediction results with the sample size 200 at the Station LKWF1 in 1999. Solid line: predicted values, dashed line: original values.

Figure 5. Prediction results with the sample size 300 at the Station LKWF1 in 1999. Solid line: predicted values, dashed line: original values.

Figure 7. Prediction results with the sample size 500 at the Station LKWF1 in 1999. Solid line: predicted values, dashed line: original values.

Figure 8. Prediction results with the sample size 600 at the Station LKWF1 in 1999. Solid line: predicted values, dashed line: original values.

Figure 10. Prediction results with the sample size 800 at the Station LKWF1 in 1999. Solid line: predicted values, dashed line: original values.

Figure 11. Prediction results with the sample size 900 at the Station LKWF1 in 1999. Solid line: predicted values, dashed line: original values.

Figure 12. Prediction results with the sample size 1000 at the Station LKWF1 in 1999. Solid line: predicted values, dashed line: original values.

Figure 13. Prediction results with the sample size 1100 at the Station LKWF1 in 1999. Solid line: predicted values, dashed line: original values.

Figure 14. Prediction results with the sample size 1200 at the Station LKWF1 in 1999. Solid line: predicted values, dashed line: original values.

Figure 16. Prediction results with the sample size 1400 at the Station LKWF1 in 1999. Solid line: predicted values, dashed line: original values.

Figure 17. Prediction results with the sample size 1500 at the Station LKWF1 in 1999. Solid line: predicted values, dashed line: original values.

Figure 19. Generation process of relationship between prediction errors and past sample sizes.

Figure 20. Relationship between the prediction error and the past sample size at the Station SMKF1 in 2003.

Figure 21. Relationship between the prediction error and the past sample size at the Station LONF1 in 2003.

Figure 22. Relationship between the prediction error and the past sample size at the Station SAUF1 in 2001.

Figure 24. Relationship between the prediction error and the past sample size at the Station VENF1 in 2003.

Figure 25. Curve fitting of the Station LKWF1 in 1999. Solid line: original curve, dashed line: fitting curve.

Figure 26. Curve fitting of the Station SMKF1 in 2003. Solid line: original curve, dashed line: fitting curve.

Figure 27. Curve fitting of the Station LONF1 in 2003. Solid line: original curve, dashed line: fitting curve.

Figure 29. Curve fitting of the Station SPGF1 in 1996. Solid line: original curve, dashed line: fitting curve.

Figure 30. Curve fitting of the Station VENF1 in 2003. Solid line: original curve, dashed line: fitting curve.

Table 1. Measured data of six traces of sea level time series.

Table 2. Prediction errors investigated using x_lkwf1_1999(t) at the Station LKWF1.

Figure 18. Relationship between the prediction error and the past sample size at the Station LKWF1 in 1999.

Table 3. Prediction errors investigated using x_smkf1_2003(t) at the Station SMKF1.

Table 4. Prediction errors investigated using x_lonf1_2003(t) at the Station LONF1.

Table 5. Prediction errors investigated using x_sauf1_2001(t) at the Station SAUF1.

Table 6. Prediction errors investigated using x_spgf1_1996(t) at the Station SPGF1.

Table 7. Prediction errors investigated using x_venf1_2003(t) at the Station VENF1.

Table 8. Curve fitting results of the six stations. f(n) = $e^2(n, m)|_{m=40}$, where n is the past sample size.

Both the figures and the table indicate that the prediction errors at different stations share the same power-law functional form. Power functions fit the prediction error curves satisfactorily. The quantitative functions may help us determine the range of sample size that ensures a given prediction accuracy, which improves the utilization of data. The above experimental results with respect to prediction errors are expressed as the power functions listed in Table 8.