The Optimal Confidence Intervals for Agricultural Products’ Price Forecasts Based on Hierarchical Historical Errors

Abstract: With the levels of confidence and system complexity, interval forecasts and entropy analysis can deliver more information than point forecasts. In this paper, we take receivers’ demands as our starting point, use the trade-off model between accuracy and informativeness as the criterion to construct the optimal confidence interval, derive the theoretical formula of the optimal confidence interval and propose a practical and efficient algorithm based on entropy theory and complexity theory. In order to improve the estimation precision of the error distribution, the point prediction errors are stratified according to prices and the complexity of the system; the corresponding prediction error samples are obtained by the price stratification; and the error distributions are estimated by the kernel function method and the stability of the system. In a stable and orderly environment for price forecasting, we obtain point prediction error samples by the weighted local region and RBF (radial basis function) neural network methods, forecast the intervals of the soybean meal and non-GMO (genetically modified organism) soybean continuous futures closing prices and implement unconditional coverage, independence and conditional coverage tests for the simulation results. The empirical results are compared across various interval evaluation indicators, different levels of noise, several target confidence levels and different point prediction methods. The analysis shows that the optimal interval construction method is better than the equal probability method and the shortest interval method and has good anti-noise ability with the reduction of system entropy; the hierarchical estimation error method can obtain higher accuracy and better interval estimation than the non-hierarchical method in a stable system.


Introduction
According to price forecasts, producers and managers adjust current production and operations, and governments make appropriate macro-economic policy to stabilize prices. A great number of studies show that price forecasts of agricultural products are meaningful [1,2]. Nowadays, the price forecasts of agricultural products are mainly point forecasts. However, interval forecasts can deliver more information than point forecasts. Point forecasting is widely used; it provides a single future value of the variable but no information about the uncertainty of that value. Uncertainty information is particularly important for decision makers with different risk preferences. The result of an interval forecast is an interval with a confidence level, which is more convenient for decision makers to formulate risk management strategies. Because of the importance of interval forecasts, the price forecasts in the monthly World Agricultural Supply and Demand Estimates (WASDE) of the United States Department of Agriculture (USDA) are published in the form of intervals.
Entropy 2016, 18, 439
The more stable and simple the price system of agricultural products, the more favorable the conditions for price forecasting. Entropy is a measure of the complexity of a system. Greater entropy means that the system is more complex, so the price forecast is more prone to distortion. Su et al. [3] verified that there is chaos in the price system by calculating the Kolmogorov entropy. Their price system has positive entropy, but the entropy is not large, and the system is weakly chaotic. How do we predict prices in such a system? When entropy is not large, can we obtain better forecasting results? In this paper, we illustrate that this price system can be forecasted, and we verify that the price interval forecast is feasible in a system with positive Kolmogorov entropy. In recent years, many scholars have carried out research on entropy in the economic field. Bekiros et al. [4] studied the dynamic causality between stock and commodity futures markets in the United States by using complex network theory. They utilized the extended matrix and the time-varying network topology to reveal the correlation and the temporal dimension of the entropy relationship. Selvakumar [5] proposed an enhanced cross-entropy (ECE) method to solve the dynamic economic dispatch (DED) problem with valve-point effects. Fan et al. [6] used multi-scale entropy analysis to investigate the complexity of the carbon market and the average return trend of daily price returns. Billio et al. [7] analyzed the temporal evolution of systemic risk in Europe by using different entropy measures and constructed a new banking crisis early warning indicator. Ma and Si [8] studied a continuous duopoly game model with a two-stage delay and investigated the influence of the delay parameters on the stability of the system.
In agricultural economics, Teigen and Bell [9] established the confidence interval of the corn price by the approximate variance of the forecast. Prescott and Stengos [10] applied the bootstrap method to construct the confidence interval of the dynamic metering model and forecasted the pork supply. Bessler and Kling [11] affirmed the role of probability prediction and defined what a "good" prediction is. Sanders and Manfredo [12] and Isengildina-Massa et al. [13] compared four methods: the histogram method, the kernel density method, the parameter distribution estimation method and the quantile regression method. They evaluated the confidence intervals generated by these methods. The results showed that the kernel function method and the quantile regression method give the best interval forecasts.
There are two main methods for interval forecasting. One is the prediction of interval-type data [14]. The interval-type data are composed of the minimum and maximum sequences. This method can be used in a case with comprehensive information. The disadvantage is that it cannot provide the confidence level of the interval. The other is constructing the confidence interval by estimating the errors of point forecasts. The advantage is that one can obtain confidence levels. In this paper, we will construct a forecast interval with some target confidence level based on entropy theory and system complexity theory.
In practice, the prediction interval for a given target confidence level is not unique, so which is the best interval? Decision makers often choose the results that meet their own needs, so we can directly build the "optimal" forecast intervals under their standard. In this paper, we will construct the model of the optimal forecast interval and transform it into an optimization problem. Since it is difficult to find an analytic solution of a nonlinear optimization problem, we establish an algorithm to obtain a numerical solution.
Nowadays, the optimality criterion of the interval mainly lies in the accumulation of the accuracies of point forecasts. The M index defined by Batu [15] is an average of point forecast errors in the prediction interval. Demetrescu [16] used the cumulative accuracy of the point forecasts, which yields longer intervals and high reliability. However, for economic data, this kind of forecast loses significance. The forecast interval delivers not only accuracy, but also information. How does one evaluate the interval from both accuracy and informativeness, which seem to be contradictory aspects? Yaniv and Foster [17] provided a formal model of the binary loss function, i.e., the trade-off model between accuracy and informativeness. They compared their model with many common models. The results showed that their model is more suitable for reflecting individual preferences. In this paper, the optimal forecast interval model is established by using the trade-off model.
To obtain the confidence level, it is critical to correctly estimate the error distribution of the point forecasts. In general, the error distribution is supposed to be a normal distribution, a $\chi^2$ distribution, etc. However, this method is subjective, and it is possible that the errors do not obey the assumed distribution. Gardner [18] found that prediction intervals generated by the Chebyshev inequality are more accurate than those generated under the hypothesis of the normal distribution, which was opposed by Bowerman and Koehler [19], Makridakis and Winkler [20] and Allen and Morzuch [21]. They thought the intervals generated by the Chebyshev inequality are too wide. Stoto [22] and Cohen [23] found that the forecast errors of population growth asymptotically obey the normal distribution. Shlyakhter et al. [24] recommended the exponential distribution for population and energy data. Williams and Goodman [25] first used the empirical method to estimate the distribution of historical errors of point forecasts without restrictions on the method of the point forecast. Chatfield [26] pointed out that the empirical method is a good choice when the error distribution is uncertain. Taylor and Bunn [27] first applied quantile regression to interval estimation. Hansen [28] used semi-parametric estimation and the quantile method to construct asymptotic forecast intervals. This method has strict requirements on the time series. Demetrescu [16] pointed out that quantile regressions are not so useful, since one does not know in advance which quantile is needed, and an iterative procedure would have obvious complexity. Jorgensen and Sjoberg [29] used the nonparametric histogram method to find the points of the software development workload distribution. Yan et al. [30] thought that the errors of point forecasts have great influence on the accuracy of uncertainty analysis. Ma et al. [31] investigated the existence and the local stable region of the Nash equilibrium point. Ma and Xie [32] studied a financial and economic system under changes in three parameters. Zhang and Ma [33] and ou and Ma [34] investigated a class of nonlinear system modeling problems, with good research results. Martínez-Ballesteros et al. [35] forecasted by means of association rules. Ren and Ma deepened and completed a macroeconomic IS-LM model with fractional-order calculus theory, which reflects well the memory characteristics of economic variables.
The novelty of this paper is in providing two methods. One is the stratified estimation of historical errors, and the other is the optimal confidence interval model.
To improve the estimation accuracy, we stratify the historical error data according to the price and estimate the error distribution of each layer. In the estimation of the historical error distribution, all errors are often treated as obeying the same distribution. Considering the heteroscedasticity of the prediction errors at different prices, this is too crude. The frequencies of different prices in history are different. Some extreme prices appeared only a few times, together with sharp fluctuations. The forecast errors of these prices are generally large, and the sample size of such errors is small. On the contrary, some prices appear very frequently with small fluctuations. The forecast errors of these prices are generally small, and the sample size of such errors is large. Therefore, we stratify the historical error data according to different prices and estimate the error distribution of each layer.
In this paper, we derive the model of the optimal confidence interval from the accuracy-informativeness trade-off model, provide a practical and efficient algorithm for the optimal confidence interval model based on the complexity of the forecasting system and estimate the error distributions according to the stratified prices. The kernel function method is used to estimate the error distribution. For different target confidence levels, simulation prediction is carried out for the continuous futures daily closing prices of soybean meal and non-GMO soybean. Unconditional coverage, independence and conditional coverage tests are used to evaluate the interval forecasts. The empirical analysis is divided into two subsections. In Section 5.1, we apply the equal probability method, the shortest interval method and the optimal interval method to construct the prediction intervals, compare their loss functions and test whether the intervals generated by the optimal interval method are optimal. We add noise with various SNRs (signal-to-noise ratios) to the historical error data and the prediction prices and test the robustness of the algorithm. In Section 5.2, the prediction errors are divided into one to 20 layers according to the prices. The error distributions are estimated in the different layers. The confidence intervals are constructed and evaluated to find out whether the error stratification method can improve the prediction accuracy. The evaluation indices comprise the loss function, interval endpoints, interval midpoint, interval length, coverage, unconditional coverage test statistic, independence test statistic and conditional coverage test statistic. The error data, including point forecast errors generated by the weighted local method and the RBF neural network method, are used to investigate whether the hierarchical estimation error method can improve the prediction accuracy for different point forecasts.

The Model and Algorithm of the Optimal Confidence Intervals
Denote by $Y_t$ the process to be forecast, and assume it has a continuous and strictly increasing cumulative distribution function. Suppose $F_t(y) = P(Y_t \le y \mid \Psi_{t-1})$ is this distribution function conditional on the information set $\Psi_{t-1}$. Clearly, the confidence interval $[L_t, U_t]$ $(L_t < U_t)$ of $Y_t$ with confidence level $\alpha_0$ $(0 < \alpha_0 < 1)$ satisfies:

$$P(L_t \le Y_t \le U_t \mid \Psi_{t-1}) = F_t(U_t) - F_t(L_t) = \alpha_0.$$

Yaniv and Foster [17] established the accuracy-informativeness trade-off model:

$$L = f\left(\frac{|y - m|}{g}, \ln g\right),$$

where the first variable evaluates accuracy, the second variable evaluates informativeness, $y$ is the true value, $m$ is the midpoint of the prediction interval and $g$ denotes the width of the interval. Actually, the accuracy-informativeness trade-off model is a kind of loss function. Yaniv and Foster [17] thought that, for a good interval, the lower the L score, the better. They gave a concrete expression of L:

$$L = \frac{|y - m|}{g} + \gamma \ln g,$$

where the coefficient $\gamma \ge 0$ is a trade-off parameter that reflects the weights placed on the accuracy and informativeness of the estimates. Yaniv and Foster [17] supposed that the value of $\gamma$ is taken from 0.6 to 1.2, close to one.
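As a small illustration of the trade-off score, the following sketch evaluates the loss for two intervals around the same true value. It assumes the concrete Yaniv and Foster form $|y - m|/g + \gamma \ln g$; the function name `tradeoff_loss` and all numbers are hypothetical and only serve the example.

```python
import math


def tradeoff_loss(y, lower, upper, gamma=1.0):
    """Accuracy-informativeness loss of [lower, upper] for the true value y."""
    if not lower < upper:
        raise ValueError("lower must be strictly smaller than upper")
    m = (lower + upper) / 2.0   # interval midpoint
    g = upper - lower           # interval width
    return abs(y - m) / g + gamma * math.log(g)


# Hypothetical numbers: a narrow, slightly off-centre interval and a wide,
# perfectly centred one, both containing the true value 100.
narrow = tradeoff_loss(100.0, 98.0, 104.0)
wide = tradeoff_loss(100.0, 80.0, 120.0)
print(narrow, wide)
```

With $\gamma = 1$, the narrow interval obtains the lower score even though its midpoint misses the true value by one unit, because the width penalty $\ln g$ of the wide interval dominates.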
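For a given confidence level $\alpha_0$, we take the minimum L as the objective to solve the optimal confidence interval, which can be transformed to find the solution $(L_t^*, U_t^*)$ of the nonlinear optimization problem under the condition $\Psi_{t-1}$:

$$\min_{L_t, U_t} E\left(\frac{|Y_t - m_t|}{g_t} + \gamma \ln g_t \,\Big|\, \Psi_{t-1}\right),$$

where $m_t = (L_t + U_t)/2$ and $g_t = U_t - L_t$. Denote by $D_t$ the set of all possible values of $Y_t$ and by $F_t$ the conditional distribution function of $Y_t$; the constraint conditions are:

$$L_t, U_t \in D_t, \quad L_t < U_t, \quad F_t(U_t) - F_t(L_t) = \alpha_0.$$

Then, we can obtain the following simplified objective function.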
Proof. See Appendix A.
Thus, we only need to find the solution $(L_t^*, U_t^*)$ that minimizes the simplified objective function under the constraint conditions. It is difficult to obtain an analytic solution; however, for a strictly increasing $F_t$ that is available numerically, we can establish an algorithm to obtain numerical solutions.
The steps are as follows.
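Step 1. Take all $L_t, U_t \in D_t$ with $L_t < U_t$, and find all pairs $(L_t^i, U_t^i)$, $i = 1, 2, \ldots$, that satisfy $F_t(U_t^i) - F_t(L_t^i) = \alpha_0$. Since $F_t(U_t) - F_t(L_t)$ increases with the increase of $U_t$ for a fixed $L_t$, the value of $U_t$ can be solved uniquely. Therefore, it is not necessary to take all of the values of $D_t$.
Step 2. For each pair obtained from the first step, $(L_t^1, U_t^1), (L_t^2, U_t^2), \ldots$, calculate the midpoint $m_t^i = (L_t^i + U_t^i)/2$ and the width $g_t^i = U_t^i - L_t^i$.
Step 3. For every $(m_t^i, g_t^i)$, calculate the corresponding value of the objective function, denoted $\mathcal{L}_t^i$ to distinguish it from the lower limit $L_t^i$.
Step 4. Sort all of the $\mathcal{L}_t^1, \mathcal{L}_t^2, \ldots$; find the smallest, $\mathcal{L}_t^*$; and record the corresponding $(L_t^*, U_t^*)$.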
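The four steps can be sketched as a grid search. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the empirical quantiles of a sample in place of a kernel-estimated $F_t$, approximates the expected loss $E(|Y_t - m|/g + \gamma \ln g)$ by a sample mean rather than using the simplified objective of the paper, and the names `quantile`, `expected_loss` and `optimal_interval`, as well as the test data, are hypothetical.

```python
import math
import random


def quantile(xs, p):
    """Linearly interpolated quantile of the ascending list xs for 0 <= p <= 1."""
    pos = min(max(p, 0.0), 1.0) * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def expected_loss(xs, lower, upper, gamma):
    """Sample mean of |y - m| / g + gamma * ln(g) over the values in xs."""
    m = (lower + upper) / 2.0
    g = upper - lower
    return sum(abs(y - m) for y in xs) / (len(xs) * g) + gamma * math.log(g)


def optimal_interval(sample, alpha0, gamma=1.0, grid=200):
    """Grid search over lower-tail probabilities; returns (loss, lower, upper)."""
    xs = sorted(sample)
    best = None
    for i in range(grid + 1):
        p = (1.0 - alpha0) * i / grid        # Step 1: fix the lower limit ...
        lower = quantile(xs, p)
        upper = quantile(xs, p + alpha0)     # ... the upper limit then follows
        if upper <= lower:
            continue
        loss = expected_loss(xs, lower, upper, gamma)   # Steps 2 and 3
        if best is None or loss < best[0]:              # Step 4
            best = (loss, lower, upper)
    return best


rng = random.Random(0)                       # hypothetical right-skewed sample
sample = [rng.lognormvariate(0.0, 0.5) for _ in range(1000)]
loss, lower, upper = optimal_interval(sample, alpha0=0.9)
print(loss, lower, upper)
```

Because the upper limit is determined by the lower limit through the coverage constraint, the search is one-dimensional, which is why it is not necessary to enumerate all pairs in $D_t$.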

Estimate the Conditional Probability Distribution of Error
Denote by $\hat{Y}_t$ the forecast of $Y_t$ and by $e_t = Y_t - \hat{Y}_t$ the error, i.e.,

$$Y_t = \hat{Y}_t + e_t. \quad (3)$$

If we take $\hat{Y}_t$ as the optimal point forecast [27], we can estimate $e_t$ and obtain the distribution of $Y_t$. In this paper, we apply the empirical method, which means that we can take all obtained point forecast errors as samples of the same probability distribution. We estimate the probability distribution by the kernel function method, for which we give the details below. However, it is crude to treat all obtained errors as obeying one distribution. For one forecasting value, if we could collect all corresponding errors, these errors could be considered to obey one distribution. In general, however, the error sample size of one forecasting value is very small. In order to collect as many samples as possible, we can take the errors of one interval of forecasting values. Therefore, how to choose reasonable forecasting value intervals is very important.
We stratify the prediction error samples evenly according to the forecasting values. First, we divide the N historical forecasting values $\{\hat{Y}_k, k = t-1, t-2, \ldots, t-N\}$ into M layers, i.e., M intervals, and record the upper limit and lower limit of every layer. The size of every layer is about N/M. The sizes of the layers may not be the same, and a 10% difference is admissible. Second, we put the errors of the forecasts in every layer into the error sample set of that layer. For example, when N = 1000 and M = 8, the division of the forecasts is shown in Figure 1.
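The stratification described above can be sketched as follows. This is an illustrative sketch, assuming that layers are formed by sorting the forecasts and cutting them into M groups of near-equal size; the function names `stratify_errors` and `find_layer` and the simulated data are hypothetical.

```python
import bisect
import random


def stratify_errors(forecasts, errors, n_layers):
    """Group errors into n_layers layers of near-equal size, ordered by forecast.

    Requires n_layers <= len(forecasts). Returns the error samples of the
    layers and the upper forecast limit of every layer.
    """
    order = sorted(range(len(forecasts)), key=lambda k: forecasts[k])
    n = len(order)
    layers, upper_limits = [], []
    for i in range(n_layers):
        idx = order[i * n // n_layers:(i + 1) * n // n_layers]
        layers.append([errors[k] for k in idx])
        upper_limits.append(forecasts[idx[-1]])
    return layers, upper_limits


def find_layer(upper_limits, forecast):
    """Index of the layer whose forecast range contains a new forecast."""
    return min(bisect.bisect_left(upper_limits, forecast), len(upper_limits) - 1)


rng = random.Random(1)                       # hypothetical data with N = 1000
forecasts = [rng.uniform(2000.0, 4000.0) for _ in range(1000)]
errors = [rng.gauss(0.0, 0.01 * f) for f in forecasts]
layers, upper_limits = stratify_errors(forecasts, errors, n_layers=8)
print([len(layer) for layer in layers])
```

A new forecast above the highest recorded limit is assigned to the last layer; this clamping is a design choice of the sketch, not something stated in the text.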
When N is fixed, the bigger M is, the smaller N/M is; the smaller M is, the greater N/M is. When N/M is small, the size of the sample is small, and the estimating accuracy declines. When N/M is big, the size of the sample is big, which means the width between two adjacent red lines in Figure 1 will be bigger. In this case, the forecasting values within the same layer differ considerably, and treating the errors in this layer as obeying the same probability distribution is not reasonable. In short, M can be taken neither too big nor too small. When N is fixed, there is an optimal M that gives the optimal estimated error distribution, which will be verified in Section 5.
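For fixed N and M (N > M), we apply the kernel function method to estimate the error distribution. Assume that the size of each layer is N/M and that the error sequence of the i-th layer is $\{e_{ik}, k = 1, 2, \ldots, N/M\}$. Then, the density estimation of the sequence at point $x$ is:

$$\hat{f}_i(x) = \frac{M}{N h_i} \sum_{k=1}^{N/M} \varphi\left(\frac{x - e_{ik}}{h_i}\right),$$

where $\varphi$ is the normal kernel function:

$$\varphi(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2},$$

and $h_i$ is the bandwidth or smoothing parameter. In this paper, we apply the optimal bandwidth [30], which is a multiple of $\sigma_i$, where $\sigma_i = \mathrm{median}\{|e_{ik} - \mu_i|\}/0.6745$ and $\mu_i$ represents the sample median.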
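The kernel estimate for one layer can be sketched as below. The robust scale $\sigma_i$ follows the definition in the text. The bandwidth factor $(4/(3n))^{1/5}$ is an assumption of this sketch (the common normal-reference rule); the factor used by the authors is not recoverable from the text. All function names and the tiny error sample are hypothetical.

```python
import math
import statistics


def robust_sigma(errors):
    """sigma = median(|e - median(e)|) / 0.6745."""
    mu = statistics.median(errors)
    return statistics.median([abs(e - mu) for e in errors]) / 0.6745


def normal_reference_bandwidth(errors):
    """ASSUMPTION: h = sigma * (4 / (3 n)) ** (1 / 5); the factor is not from the source."""
    return robust_sigma(errors) * (4.0 / (3.0 * len(errors))) ** 0.2


def kernel_density(errors, h):
    """Return the function x -> (1 / (n h)) * sum_k phi((x - e_k) / h)."""
    norm = len(errors) * h * math.sqrt(2.0 * math.pi)
    return lambda x: sum(math.exp(-0.5 * ((x - e) / h) ** 2) for e in errors) / norm


def kernel_cdf(errors, h):
    """Distribution function implied by the normal-kernel density above."""
    std_normal = statistics.NormalDist()
    return lambda x: sum(std_normal.cdf((x - e) / h) for e in errors) / len(errors)


errors = [-2.0, -1.0, 0.0, 1.0, 2.0]         # hypothetical error sample of one layer
h = normal_reference_bandwidth(errors)
density = kernel_density(errors, h)
cdf = kernel_cdf(errors, h)
print(h, density(0.0), cdf(0.0))
```

The distribution function is included because the interval construction needs a strictly increasing $F_t$; with a normal kernel, the estimated distribution function is strictly increasing by construction.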

Evaluation of the Prediction Interval
The accuracy of forecast intervals is traditionally examined in terms of coverage. However, the coverage reflects the true confidence level only if the test sample is large enough. Bowman [36] describes the use of smoothing techniques in statistics, including both density estimation and nonparametric regression. Christoffersen [37] developed approaches to test the coverage and independence in terms of hypothesis tests. Since his methods do not make any assumption about the true distribution, they can be applied to all empirical confidence intervals. His methods include unconditional coverage, independence and conditional coverage tests.
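Suppose that $\alpha_0$ is the confidence level and the test sample sequence is $\{e_t, t = 1, \ldots, N_2\}$. First, denote by $I_t$ the indicator:

$$I_t = \begin{cases} 1, & e_t \in [L_{t|t-1}(\alpha_0), U_{t|t-1}(\alpha_0)], \\ 0, & \text{otherwise}, \end{cases}$$

where $[L_{t|t-1}(\alpha_0), U_{t|t-1}(\alpha_0)]$ is an out-of-sample prediction interval, which denotes the prediction interval of $e_t$ constructed by the $(t-1)$-th error; $L_{t|t-1}(\alpha_0)$ and $U_{t|t-1}(\alpha_0)$ are the lower limit and upper limit, respectively. Christoffersen [37] proved that $\sum_{t=1}^{N_2} I_t$ obeys the binomial distribution $B(N_2, \alpha_0)$. When the size of the test sample is finite, Christoffersen [37] constructed a standard likelihood ratio test with the null hypothesis $H_0: E(I_t \mid \Omega_{t-1}) = \alpha_0$ and the alternative hypothesis $H_1: E(I_t \mid \Omega_{t-1}) \neq \alpha_0$. The purpose is to examine whether, under the condition $\Omega_{t-1}$, the coverage differs from $\alpha_0$ significantly. If $H_0$ is accepted, then the coverage of the test sample equals the target confidence level. The test statistic is:

$$LR_{uc} = -2 \ln \frac{L(\alpha_0)}{L(\hat{\alpha})} \sim \chi^2(1),$$

where $L(\alpha_0) = (1 - \alpha_0)^{n_0} \alpha_0^{n_1}$, $L(\hat{\alpha}) = (1 - \hat{\alpha})^{n_0} \hat{\alpha}^{n_1}$, $\hat{\alpha} = n_1/(n_0 + n_1)$ is the maximum likelihood estimation of $\alpha_0$, and $n_0$ and $n_1$ denote the number of times that $\{I_t\}$ "hits" zero and one, respectively.

Christoffersen [37] thought that the unconditional test is insufficient when dynamics are present in the higher order moments. In order to test the independence, he introduced a binary first-order Markov chain with transition probability matrix:

$$\Pi_1 = \begin{pmatrix} 1 - \pi_{01} & \pi_{01} \\ 1 - \pi_{11} & \pi_{11} \end{pmatrix}, \quad (4)$$

where $\pi_{ij} = P(I_t = j \mid I_{t-1} = i)$. If independence holds true, then $\pi_{ij} = \pi_j$, $i, j = 0, 1$, where $\pi_j = P(I_t = j)$. Therefore, under the null hypothesis of independence, (4) turns to:

$$\Pi_2 = \begin{pmatrix} 1 - \pi_1 & \pi_1 \\ 1 - \pi_1 & \pi_1 \end{pmatrix}.$$

We estimate the transition probabilities by maximum likelihood: $\hat{\pi}_{01} = n_{01}/(n_{00} + n_{01})$, $\hat{\pi}_{11} = n_{11}/(n_{10} + n_{11})$ and $\hat{\pi}_1 = (n_{01} + n_{11})/(n_{00} + n_{01} + n_{10} + n_{11})$, where $n_{ij}$ is the number of observations with value $i$ followed by value $j$. The test statistic under the null hypothesis is:

$$LR_{ind} = -2 \ln \frac{L(\hat{\Pi}_2)}{L(\hat{\Pi}_1)} \sim \chi^2(1),$$

where $L(\hat{\Pi}_1) = (1 - \hat{\pi}_{01})^{n_{00}} \hat{\pi}_{01}^{n_{01}} (1 - \hat{\pi}_{11})^{n_{10}} \hat{\pi}_{11}^{n_{11}}$ and $L(\hat{\Pi}_2) = (1 - \hat{\pi}_1)^{n_{00} + n_{10}} \hat{\pi}_1^{n_{01} + n_{11}}$. The above tests for unconditional coverage and independence are now combined to form a complete test of conditional coverage:

$$LR_{cc} = LR_{uc} + LR_{ind} \sim \chi^2(2).$$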
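The three likelihood ratio tests can be sketched with the standard library only. This is a minimal sketch of the standard Christoffersen statistics, assuming $LR_{uc}$ and $LR_{ind}$ are asymptotically $\chi^2(1)$ and $LR_{cc} = LR_{uc} + LR_{ind}$ is asymptotically $\chi^2(2)$; the function names and the hit sequence are hypothetical.

```python
import math


def _loglik(p, n_zero, n_one):
    """Log of (1 - p)^n_zero * p^n_one; empty cells contribute nothing."""
    value = 0.0
    if n_zero:
        value += n_zero * math.log(1.0 - p)
    if n_one:
        value += n_one * math.log(p)
    return value


def _ratio(successes, total):
    return successes / total if total else 0.0


def christoffersen_tests(hits, alpha0):
    """hits: list of 0/1 integers I_t. Returns statistics and asymptotic p-values."""
    n1 = sum(hits)
    n0 = len(hits) - n1
    lr_uc = -2.0 * (_loglik(alpha0, n0, n1) - _loglik(_ratio(n1, n0 + n1), n0, n1))
    lr_uc = max(lr_uc, 0.0)                  # guard against rounding below zero

    n = [[0, 0], [0, 0]]                     # n[i][j]: value i followed by value j
    for prev, cur in zip(hits, hits[1:]):
        n[prev][cur] += 1
    pi01 = _ratio(n[0][1], n[0][0] + n[0][1])
    pi11 = _ratio(n[1][1], n[1][0] + n[1][1])
    pi1 = _ratio(n[0][1] + n[1][1], len(hits) - 1)
    ll_markov = _loglik(pi01, n[0][0], n[0][1]) + _loglik(pi11, n[1][0], n[1][1])
    ll_indep = _loglik(pi1, n[0][0] + n[1][0], n[0][1] + n[1][1])
    lr_ind = max(-2.0 * (ll_indep - ll_markov), 0.0)
    lr_cc = lr_uc + lr_ind
    return {
        "LR_uc": lr_uc, "p_uc": math.erfc(math.sqrt(lr_uc / 2.0)),      # chi2(1)
        "LR_ind": lr_ind, "p_ind": math.erfc(math.sqrt(lr_ind / 2.0)),  # chi2(1)
        "LR_cc": lr_cc, "p_cc": math.exp(-lr_cc / 2.0),                 # chi2(2)
    }


hits = ([1] * 9 + [0]) * 5                   # hypothetical: 45 hits out of 50
result = christoffersen_tests(hits, alpha0=0.9)
print(result)
```

The p-values use closed forms that hold exactly for one and two degrees of freedom: the $\chi^2(1)$ tail probability equals $\mathrm{erfc}(\sqrt{x/2})$ and the $\chi^2(2)$ tail probability equals $e^{-x/2}$.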

Empirical Analysis
In this paper, the continuous futures daily closing prices of soybean meal and non-GMO soybean from 4 January 2005 to 25 September 2015 are applied to the interval forecast. All of the data are from the Dalian Commodities Exchange in China. The sample contains 2612 observations. We use a rolling window approach for the point forecasts with a fixed bandwidth (window width) of 1558. Thus, a total of 1053 forecast values and a total of 1053 errors are obtained. The first 1000 of the 1053 are used as the training set to construct the prediction interval, and the last 53 are used as the test set. The reasons for this design are as follows: (1) if the training set is too small, the accuracy of the error distribution estimation will be reduced; (2) if the training set is too big, the amount of data used in the one-step prediction will be reduced, which also reduces the forecast accuracy; (3) in general, different amounts of data used to predict the price will induce different forecasts, and the prediction error is very likely to be negatively related to the amount of data, so we apply the fixed bandwidth method in order to avoid such systematic deviations.
We detected chaos by the method of [32]. The results show that the daily closing price data form a chaotic time series. Although the Kolmogorov entropy is positive, it is not large, which means that the price system can be described and that price forecasts are feasible. Therefore, we use the classical weighted local region and RBF neural network methods to perform the one-step point forecast. The classical weighted local region method looks for the trajectory points closest to the central point as the correlation points and fits the reconstructed function. The RBF neural network method uses radial basis functions to forecast. The prediction mechanisms of these two methods are different. The former is representative of the local region methods, and the latter is a typical three-layer feedforward neural network. Therefore, the two point forecast methods used in this paper are representative.

Comparison of Different Methods Constructing the Confidence Interval
Tables 1 and 2 show the mean values of the closing price interval forecasts in the last 53 days. In the following tables, "lower limit", "upper limit", "interval midpoint", L and "interval width" denote the mean values of the 53 lower limits, 53 upper limits, 53 interval midpoints, 53 loss function values and 53 interval widths, respectively. Table 1 shows the results for soybean meal, and Table 2 shows those for non-GMO soybean. OI (optimal interval) denotes the result with intervals constructed by our method; EI (equal probability interval) denotes the result with intervals generated by the equal probability method, i.e., each of the two tail probabilities equals half of one minus the target confidence; and SI (shortest interval) denotes the result with intervals constructed by the shortest interval method, i.e., the shortest among all of the intervals with the target confidence is chosen. From Tables 1 and 2, regardless of whether the confidence level is 80%, 90% or 95% and whether the point forecast method is the weighted local region method or the RBF neural network method, the forecast intervals constructed by our method have the smallest loss function value. The loss function of OI is 20% lower than that of EI and 19% lower than that of SI.
In Equation (3), the sequences $\hat{Y}_t$ and $e_t$ may contain noise. In this section, Gaussian white noises with different SNRs are added to the historical error data and the forecast price; the prediction interval and loss function are re-calculated; the absolute relative error percent is obtained with the no-noise results as benchmarks; and the robustness of the algorithm is analyzed.
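For reference, the two benchmark constructions (EI and SI) can be sketched from a sample as follows. This is an illustrative sketch based on empirical quantiles and order statistics, not the authors' code; the function names and the simulated right-skewed sample are hypothetical.

```python
import math
import random


def quantile(xs, p):
    """Linearly interpolated quantile of the ascending list xs for 0 <= p <= 1."""
    pos = min(max(p, 0.0), 1.0) * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def equal_probability_interval(xs, alpha0):
    """EI: both tails carry probability (1 - alpha0) / 2."""
    return quantile(xs, (1.0 - alpha0) / 2.0), quantile(xs, (1.0 + alpha0) / 2.0)


def shortest_interval(xs, alpha0):
    """SI: narrowest window of consecutive order statistics holding alpha0 of the sample."""
    k = max(math.ceil(alpha0 * len(xs) - 1e-9), 2)     # points inside the window
    start = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


rng = random.Random(2)                       # hypothetical right-skewed sample
xs = sorted(rng.lognormvariate(0.0, 0.5) for _ in range(1000))
ei = equal_probability_interval(xs, 0.9)
si = shortest_interval(xs, 0.9)
print(ei, si)
```

For a symmetric unimodal distribution the two constructions nearly coincide; they differ when the error distribution is skewed, which is the situation in which a loss-based choice of the interval can matter.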
Tables 3 and 4 list the noise test results for the soybean meal price and the non-GMO soybean price. Their point forecast methods are the RBF neural network method and the weighted local region method. The symbol H means that noise is added to the historical error data; P means that noise is added to the forecast $\hat{Y}_t$; and "H and P" means that noise is added to both the historical error data and the forecast. Theoretically, SNR < 10 is strong noise, and SNR > 1000 is weak noise, where SNR means the signal-to-noise ratio. From the tables, when SNR = 1, the forecast results show a rather big deviation, while for SNR = 100 and 1000, the deviations do not exceed 0.06%, which can be ignored. Taken together, as the noise intensity increases, its effect on the results also increases; the noise added to the historical error data has relatively little effect on the results, and especially when SNR = 10, 100 and 1000, the effects are below 3%. In contrast, the noise added to $\hat{Y}_t$ has a relatively big effect, especially when SNR = 1 and 10. Therefore, the algorithm to calculate the optimal prediction interval in this paper is robust to noise with SNR ≥ 100.
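The noise experiment can be mimicked with a short sketch. It assumes that the SNR is the ratio of signal power to noise power on a linear scale, which the text does not state explicitly; the function names and the price series are hypothetical.

```python
import math
import random


def add_white_noise(signal, snr, seed=0):
    """Add zero-mean Gaussian noise; ASSUMPTION: snr = signal power / noise power."""
    rng = random.Random(seed)
    power = sum(x * x for x in signal) / len(signal)
    sigma = math.sqrt(power / snr)           # noise standard deviation
    return [x + rng.gauss(0.0, sigma) for x in signal]


def mean_abs_relative_error_percent(noisy, clean):
    """Mean absolute relative deviation from the no-noise benchmark, in percent."""
    return 100.0 * sum(abs(a - b) / abs(b) for a, b in zip(noisy, clean)) / len(clean)


prices = [3000.0 + 5.0 * k for k in range(100)]      # hypothetical forecast prices
deviation = {
    snr: mean_abs_relative_error_percent(add_white_noise(prices, snr), prices)
    for snr in (1, 10, 100, 1000)
}
print(deviation)
```

Under this definition, the noise standard deviation scales with $1/\sqrt{\mathrm{SNR}}$, so raising the SNR by a factor of 100 reduces the direct perturbation of the input by a factor of 10.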

Optimal Hierarchical Analysis
In this section, we verify by empirical analysis that: (1) the forecasting values and their corresponding errors are correlated, so the error distribution should be estimated according to different forecast values; (2) stratified error estimations are much better than those without stratification, since the former yield better interval forecasts; (3) for stratified error estimation, there is an optimal number of layers that attains the best interval forecasts.

Comparison of the Error Distribution under Different Hierarchies
First, correlation analysis was performed. We take the first 1000 historical errors obtained by the weighted local region point forecasts and show the scatter plot of the relative error percent in Figure 2. In order to find out whether the prediction prices and errors are correlated, the Pearson correlation coefficient is calculated to be 0.1813, which indicates that the prices and the errors are significantly correlated at the 0.01 significance level (bilateral). Second, using the method in Section 3, we divide the 1000 historical price forecasts into M ($M = 1, 2, \ldots, 20$) layers, where M = 1 means no hierarchy. Denote by $A_i$ ($i = 1, 2, \ldots, M$) the $i$-th layer. We put the errors corresponding to the prices into each layer and obtain the error sample in each layer. Third, with the error sample of layer $A_i$, we estimate the probability density in layer $A_i$ by the kernel function method. Then, we get M probability density functions. Fourth, we take the last 53 price data as the test set. Assume the number of layers is M. For each price in the test set, we choose the layer to which it belongs, namely layer $A^*$. Layer $A^*$ contains at least 50 errors. We pick out the error probability density function of layer $A^*$ and construct the optimal interval by the method of Section 2. For convenience, we take $\gamma = 1$. Thus, for every number of layers M, we collect 53 optimal intervals. Finally, we evaluate all of the intervals according to Section 4.
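The significance statement for the correlation can be checked with a short calculation. The sketch below uses the usual t-statistic of a Pearson coefficient and, as an assumption suited to large samples, a normal approximation for the two-sided p-value; the function names are hypothetical, while r = 0.1813 and n = 1000 are the values reported in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def two_sided_p_normal(t):
    """Normal approximation to the two-sided p-value, adequate for large n."""
    return math.erfc(abs(t) / math.sqrt(2.0))


t_stat = correlation_t_statistic(0.1813, 1000)   # values reported in the text
p_value = two_sided_p_normal(t_stat)
print(t_stat, p_value)
```

The resulting statistic lies far beyond the two-sided 0.01 critical value of about 2.58, which is consistent with the reported significance even though the correlation itself is weak.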
Table 5 and Tables B1 and B2 (see Appendix B) list the results and the evaluation of the optimal prediction intervals with target confidence levels of 90%, 95% and 80%, respectively. In these tables, $n_1$ denotes the number of times that $I_t$ equals one, and the coverage is $n_1/n$. According to all of the values of $LR_{uc}$, $LR_{ind}$ and $LR_{cc}$ (see Section 4), the null hypothesis is accepted at the 0.05 significance level (bilateral), i.e., the coverage of the test samples is equal to the target confidence, and the confidence intervals satisfy independence. For every element in the test set, we compute the value of the loss function, and L denotes the mean value of the 53 loss function values. From the tables, the intervals constructed by our method satisfy the target confidence level and independence. Since a smaller loss function value means a better interval, the results in these tables show that the intervals with stratified error estimations are much better than those without stratified error estimations. Therefore, it is effective to construct prediction intervals with stratified errors. From the tables, as the number of layers increases, the value of L first decreases and then increases. The reason for the eventual increase is that the greater the number of layers, the smaller the number of errors in each layer and the lower the accuracy of the estimated error density function. Table 5 and Tables B1 and B2 show that the optimal number of layers for the 90% and 95% intervals is 13, the optimal number of layers for the 80% interval is 14, and the loss function values with stratified error distribution estimations are 18%, 17% and 52% lower than those without stratified error distribution estimations, respectively.

The Effect of Point Forecast Methods on the Error Hierarchy
We use the error sample of the RBF neural network point forecasts to repeat the numerical experiment of Section 5.2.1. The results are shown in Tables B3-B5 (see Appendix B). Judging by the mean loss function value L, the intervals with stratified error estimations are better than those without stratified error estimations. Hence, for point forecasts obtained by different methods, the error hierarchy helps to obtain better interval forecasts. However, although all of the intervals pass the unconditional coverage test, the intervals with the 90% confidence level and some intervals with the confidence levels of 80% and 95% fail the independence and conditional coverage tests, which shows that the intervals obtained from the RBF neural network errors have poor independence.

Conclusions
In this paper, we derive the theoretical model of the optimal confidence interval, establish an algorithm to solve for the optimal interval, stratify the historical error data according to the predicted prices, estimate the error distribution by a nonparametric method and construct the optimal confidence interval of the future price. Using the point forecast errors obtained by the weighted local region method and the RBF neural network method as samples, we simulate the optimal interval forecasts of the prices of soybean meal and non-GMO soybean futures. Numerical experiments show that: (1) the forecast intervals constructed by our method have the smallest loss function value; the loss function is 20% lower than that of intervals constructed by the equal probability method and 19% lower than that of intervals constructed by the shortest interval method; (2) the algorithm to calculate the optimal prediction interval in this paper is robust to noise with SNR ≥ 100; (3) for error data obtained by different point forecast methods and for different target confidence levels, the intervals with stratified error estimations are much better than those without stratified error estimations, and the loss function value can be reduced by up to 52%. The interval forecast method provided in this paper can yield conclusions that are more in line with individual requirements, improve the accuracy of the forecasting of agricultural product prices and provide a reference for other economic data forecasts.

Figure 2. The historical errors of the weighted local region method.

Figures 3-5 plot the prediction intervals with M = 1, 2, ..., 20. The red lines indicate the upper and lower limits of the intervals with the optimal number of layers, i.e., M = 13 or 14, the black lines indicate the intervals with M = 1 and the green lines indicate the intervals with M taking the remaining values.

Figure 3. The prediction interval of the 95% confidence level: the weighted local region.

Figure 4. The prediction interval of the 90% confidence level: the weighted local region.

Figure 5. The prediction interval of the 80% confidence level: the weighted local region.

Table 1. The interval forecasts of soybean meal. OI, optimal interval; EI, equal probability interval; SI, shortest interval.

Table 2. The interval forecasts of non-GMO soybean.

Table 3. The weighted local region: soybean meal. H, historical error data.

Table 5. The weighted local region: the optimal interval, confidence level of 90%.

Table B3. The RBF network: the optimal interval, confidence level of 90%. * denotes that the null hypothesis is rejected at the 0.05 significance (bilateral) level; ** denotes that the null hypothesis is rejected at the 0.01 significance (bilateral) level.

Table B4. The RBF network: the optimal interval, confidence level of 95%. * denotes that the null hypothesis is rejected at the 0.05 significance (bilateral) level; ** denotes that the null hypothesis is rejected at the 0.01 significance (bilateral) level.