A Fuzzy-Statistical Tolerance Interval from Residuals of Crisp Linear Regression Models

Abstract: Linear regression is a simple but powerful tool for prediction. However, it still suffers from some deficiencies related to the assumptions made when using the model, such as normality of residuals, uncorrelated errors, and a zero mean of the residuals. Sometimes these assumptions are violated or partially violated, which leads to uncertainty or unreliability in the predictions. This paper introduces a new method to account for uncertainty in the residuals of a linear regression model. First, the error in the estimation of the dependent variable is calculated and transformed to a fuzzy number; this fuzzy error is then added to the original crisp prediction, resulting in a fuzzy prediction. The results are compared to a fuzzy linear regression with crisp input and fuzzy output in terms of their ability to represent uncertainty in prediction.


Introduction
In classical linear regression models, assumptions such as linearity, fixed independent variables, normality of residuals, and uncorrelated errors are made to simplify model estimation. Despite these assumptions, the results are often taken at face value, with very little effort made to adequately represent the uncertainty in the model's predictions. Uncertainty in linear regression models is usually represented via confidence and prediction intervals, which may not be adequate, since only one interval is calculated, e.g., the 95% confidence or prediction interval. Fuzzy linear regression with crisp input can be used to better represent uncertainty in prediction. This fuzzy model, first proposed in [1], has been widely used as an alternative to classical crisp linear regression models. Since then, there have been various modifications to overcome certain limitations of the original model [2][3][4][5]. Although such fuzzy linear regression models can represent uncertainty, there has always been doubt about their suitability for predicting future values (see the discussion in [6]). The problem is that the fuzziness of the model is minimized so that the model fits the available sample with a certain h-value; there is no connection to the prediction of future values, as there is in classical regression. Recently, [7] created fuzzy numbers from the predictions of classical linear regression models without the need for optimization or for assuming any fuzzy coefficient, fuzzy input, or fuzzy output. In their work, confidence and prediction intervals from crisp linear regression are converted to fuzzy numbers by superimposing intervals and deriving the equivalent membership functions using fuzzy estimators.
In this paper, we use the technique introduced in [7] and propose a new approach to fuzzify the outputs of crisp linear regression models for cases where tolerance intervals are required instead of prediction intervals. In our proposed approach, we assume that errors are normally distributed and, as such, we can construct a normal-distribution tolerance interval for the errors. This tolerance interval will contain at least a specified proportion of the errors, both those in the sample and those outside of the sample (future predictions). Using the method proposed in [8], we construct a fuzzy number by superimposing the tolerance intervals up to the mean error. We then use this fuzzy tolerance interval as a fuzzy estimate of the error in our model. This is similar to the error proposed in [9], which uses crisp coefficients and estimates a fuzzy error using optimization. This avoids the issue of the spread increasing in magnitude as the independent variable increases. To complete the process, we add the fuzzy tolerance error to our crisp estimate, resulting in a fuzzy estimate. The advantage of the proposed approach is that all possible statistical errors in the model are represented in one interval. Statistical tolerance intervals are, by definition, different from prediction intervals, and they serve a different purpose. A tolerance interval covers a stated percentage of the population with some confidence level, while a prediction interval gives the coverage interval for a single prediction. The interpretation and calculation of the two intervals are also different. This paper thus extends the work in [7,8] to produce fuzzy statistical tolerance intervals. The method proposed here is important for applications where a tolerance interval is needed instead of a prediction interval. The tolerance interval covers both the confidence interval and the prediction interval.
Thus, in-sample, out-of-sample, and future errors are all covered. The method also uses the information that model errors are normally distributed, so the better the original crisp model, the better the fuzzy model. Finally, the proposed method produces a fuzzy output using only crisp input and crisp parameters, without the need for optimization.

Crisp Linear Regression
A linear regression model with n independent variables and one dependent variable can be written as:
$$Y_k = \alpha_0 + \alpha_1 X_{k1} + \alpha_2 X_{k2} + \dots + \alpha_n X_{kn} + \varepsilon_k,$$
where $Y_k$ is the dependent variable, $X_{ki}$, $i = 1, 2, \dots, n$, are the independent variables, $\alpha_0, \alpha_1, \dots, \alpha_n$ are the coefficients to be estimated, and $\varepsilon_k$ is the random error of the model. The linear regression model assumes that errors are normally distributed with zero mean and constant variance, i.e., $\varepsilon_k \sim N(0, \sigma^2)$. The method of least squares is the most common way of estimating the model parameters. The parameters are calculated as:
$$A = (X^T X)^{-1} X^T Y,$$
where $A$ is a vector of the parameters, $X$ is a matrix of explanatory variables, and $Y$ is a vector of the response variable. A thorough examination of the distribution of errors is usually done after model estimation to check the validity of the model. To account for uncertainty in prediction due to random errors, prediction and confidence intervals are easily constructed, both for the estimated parameters and for the predicted response. The $100(1 - \alpha)\%$ prediction intervals are given as follows:
$$\hat{Y}_f \pm t_{1-\frac{\alpha}{2},\,K-p}\,\sqrt{MSE\,(1 + h_f)},$$
where $\hat{Y}_f$ is the estimated response for observation $f$, $K$ is the sample size, $MSE$ is the estimate of the mean-squared error of the model, $t_{1-\frac{\alpha}{2},\,K-p}$ is the critical value of a t-distribution with $K - p$ degrees of freedom ($p$ being the number of estimated parameters), and $h_f = x_f (X^T X)^{-1} x_f^T$, where $X$ is the matrix of explanatory variables and $x_f$ is the row vector of that observation.
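As a rough illustration of the estimation and interval steps described above, the following minimal sketch fits a model with a single explanatory variable by least squares and forms a prediction interval for a new observation. The function names, the restriction to one predictor (so that two parameters are estimated), and the caller-supplied t critical value `t_crit` are assumptions made for this example; they are not taken from the original study.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single explanatory variable."""
    k = len(x)
    x_bar = sum(x) / k
    y_bar = sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def prediction_interval(x, y, x_new, t_crit):
    """Interval y_hat +/- t_crit * sqrt(MSE * (1 + h_f)) at x_new.

    t_crit is the t quantile with K - 2 degrees of freedom, looked up
    by the caller (for example from a table); it is not computed here.
    """
    k = len(x)
    a0, a1 = fit_simple_ols(x, y)
    residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (k - 2)
    x_bar = sum(x) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Leverage x_f (X^T X)^{-1} x_f^T, written out for one predictor plus intercept.
    h_f = 1.0 / k + (x_new - x_bar) ** 2 / sxx
    y_hat = a0 + a1 * x_new
    half_width = t_crit * math.sqrt(mse * (1.0 + h_f))
    return y_hat - half_width, y_hat + half_width
```

The interval widens as `x_new` moves away from the sample mean of `x`, because the leverage term `h_f` grows with that distance.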

Fuzzy Linear Regression
The classical fuzzy linear regression model proposed by [1] is similar to crisp linear regression. The main difference is that the parameters are fuzzy numbers, which results in a fuzzy output for the response variable. The model is given below:
$$\tilde{Y}_j = A_0 + A_1 x_{1j} + A_2 x_{2j} + \dots + A_n x_{nj},$$
where $A_i$ ($i = 0, 1, \dots, n$) are symmetric triangular fuzzy numbers of the form $(c_i, r_i)$, $c_i$ is the center of the triangular fuzzy number, $r_i$ is the spread, and $x_{0j} = 1$. The objective of the fuzzy linear regression model is to minimize the uncertainty by minimizing the spreads of the fuzzy numbers. This results in the linear optimization problem below [1]:
$$\min J = \sum_{j=1}^{K} \sum_{i=0}^{n} r_i\,|x_{ij}|$$
with the following constraints:
$$\sum_{i=0}^{n} c_i x_{ij} + (1 - h) \sum_{i=0}^{n} r_i\,|x_{ij}| \ge y_j,$$
$$\sum_{i=0}^{n} c_i x_{ij} - (1 - h) \sum_{i=0}^{n} r_i\,|x_{ij}| \le y_j,$$
$$r_i \ge 0, \quad i = 0, 1, \dots, n, \quad j = 1, 2, \dots, K.$$
The value $0 \le h \le 1$ represents the confidence level of the model, and the membership value of all responses in the sample should be at least $h$, i.e., $\mu(y_j) \ge h$ for $j = 1, 2, \dots, K$.
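To make the roles of the centers, the spreads, and the h-value concrete, the sketch below evaluates a candidate set of symmetric triangular coefficients: it computes the fuzzy output for each observation, the membership of the observed response in it, and the total spread that the optimization seeks to minimize. It does not solve the optimization problem itself; the function names and the convention that each row of `X` starts with a 1 for the intercept are assumptions made for illustration.

```python
def fuzzy_output(centers, spreads, row):
    """Center and spread of the symmetric triangular output for one observation.

    row holds the explanatory values with a leading 1 for the intercept.
    """
    center = sum(c * xi for c, xi in zip(centers, row))
    spread = sum(r * abs(xi) for r, xi in zip(spreads, row))
    return center, spread

def membership(y, center, spread):
    """Membership of a crisp value y in a symmetric triangular fuzzy number."""
    if spread == 0:
        return 1.0 if y == center else 0.0
    return max(0.0, 1.0 - abs(y - center) / spread)

def total_spread(spreads, X):
    """Objective value: sum of the output spreads over all observations."""
    return sum(sum(r * abs(xi) for r, xi in zip(spreads, row)) for row in X)

def min_membership(centers, spreads, X, y):
    """Smallest membership of an observed response in its fuzzy output."""
    return min(
        membership(yj, *fuzzy_output(centers, spreads, row))
        for row, yj in zip(X, y)
    )

def satisfies_h(centers, spreads, X, y, h):
    """True when every observed response has membership of at least h."""
    return min_membership(centers, spreads, X, y) >= h
```

A candidate with smaller spreads lowers the objective but also lowers the memberships, which is the trade-off that the h-value controls.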

Proposed Method
Suppose that we have estimated a linear regression model and checked its validity with the necessary residual plots. That is, we assume that the model is well calibrated and that the distribution of errors is approximately normal. For every observation, our model produces an error:
$$\varepsilon_k = y_k - \hat{y}_k,$$
where $y_k$ is the real value of the response variable and $\hat{y}_k$ is the estimate from the model. We assume that all errors in the model come from a normal distribution with unknown mean and unknown standard deviation. To accommodate the uncertainty in our errors, we could construct confidence intervals for the mean or standard deviation of the errors. However, this does not give us a bound on all errors that the model can produce, but only a bound for the mean. Additionally, a prediction interval holds only for a particular prediction and is thus not valid for other predictions. To accommodate both sample errors and future prediction errors, we opt to use a tolerance interval to bound the errors, i.e., an interval which contains p% of all errors with confidence γ%. To simplify calculations, we do not focus on tolerance intervals for Y given a particular X (see, for example, [10]). Rather, we focus on calculating a general tolerance interval for a random sample originating from a normal distribution. We treat all errors in our model estimation as a random sample and find a tolerance interval for these errors. There are various ways to calculate the tolerance interval of a sample from a normal distribution [11][12][13]. For simplicity, we choose the approximation offered by [13].
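The tolerance interval is defined below:
$$\bar{x} \pm K_2 s, \qquad K_2 = \sqrt{\frac{\nu \left(1 + \frac{1}{N}\right) z^2_{(1+p)/2}}{\chi^2_{1-\gamma,\nu}}},$$
where $\bar{x} = \bar{\varepsilon}$ is the sample mean of the errors, $s$ is the sample standard deviation of the errors, $K_2$ is the tolerance factor, $N$ is the number of errors, $\nu = N - 1$ is the degrees of freedom, $\chi^2_{1-\gamma,\nu}$ is the critical value of the chi-square distribution with $\nu$ degrees of freedom that is exceeded with probability $\gamma$, and $z_{(1+p)/2}$ is the critical value of the standard normal distribution associated with cumulative probability $(1+p)/2$. To represent more proportions, we convert this interval to a fuzzy interval by superimposing all proportions from p down to 0%, while the confidence level γ% is kept constant. Using the method proposed in [14,15], which was generalized in [8], we convert this interval to a fuzzy number with an explicit membership function. In [8], it was shown that any interval of the form $[\bar{x} - m f(\alpha),\ \bar{x} + m f(\alpha)]$ can be converted to a fuzzy number with an explicit membership function if an appropriate function $f(\alpha)$ is chosen, where $m$ is a constant and $f(\alpha)$ is a function of the α-cut of the corresponding fuzzy number. It has been shown that inverse cumulative distribution functions are proper candidates for $f(\alpha)$. Since $z_{(1+p)/2}$ is a value of an inverse cumulative distribution function, the superimposed intervals can be converted to a fuzzy number. Now, the tolerance interval $[\bar{x} - K_2 s,\ \bar{x} + K_2 s]$ can be written in the form $[\bar{x} - m f,\ \bar{x} + m f]$ with
$$m = s\,\sqrt{\frac{\nu \left(1 + \frac{1}{N}\right)}{\chi^2_{1-\gamma,\nu}}}, \qquad f = z_{(1+p)/2} = \Phi^{-1}\!\left(1 - \frac{\beta}{2}\right), \qquad \beta = 1 - p.$$
The tolerance interval is then written as:
$$\left[\bar{x} - m\,\Phi^{-1}\!\left(1 - \frac{\beta}{2}\right),\ \bar{x} + m\,\Phi^{-1}\!\left(1 - \frac{\beta}{2}\right)\right].$$
The interval above is exactly the $100(1 - \beta)\%$ tolerance interval with confidence γ%. With the above substitutions, an explicit membership function can be derived for the interval, and the α-cut of the resulting fuzzy number has the form [8]:
$$\left[\bar{x} - m\,\Phi^{-1}\big(1 - h(\alpha)\big),\ \bar{x} + m\,\Phi^{-1}\big(1 - h(\alpha)\big)\right],$$
where $\Phi^{-1}$ is the inverse cumulative distribution function (cdf) of the standard normal distribution and $h(\alpha)$ is any monotonic non-decreasing function of $\alpha$.
To construct a fuzzy output from the crisp linear regression, we use the fuzzy number constructed from the tolerance interval of the errors: a crisp prediction from a linear regression model is converted to a fuzzy output by adding the fuzzy error.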
The fuzzy output can then be written as:
$$\tilde{Y}_k = \alpha_0 + \alpha_1 X_{k1} + \alpha_2 X_{k2} + \dots + \alpha_n X_{kn} + \tilde{\varepsilon},$$
where $\tilde{\varepsilon}$ is the fuzzy error constructed from the tolerance interval using the procedure described above. The reason for having only one error, rather than a prediction-specific error, is that it considers all errors in the model and thus gives more conservative estimates than a prediction interval. Note that this approach is similar to the fuzzy linear regression model proposed in [9], but the error proposed there is a fuzzy error estimated by an optimization process and is not based on any statistical interval, as the current one is. Additionally, our method is similar to the one proposed in [7], but there the authors do not use the tolerance interval for errors as we do here. In their approach, well-known confidence and prediction intervals from linear regression are converted to fuzzy numbers using the same procedure used here. Our model can be viewed as a fuzzy linear regression model with a fuzzy error constructed from a statistical tolerance interval. It can be used in cases where a tolerance interval is of interest, rather than a prediction or confidence interval.
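The procedure of this section can be sketched in a few lines, under several stated assumptions. The tolerance factor is taken to have the commonly used form $K_2 = \sqrt{\nu (1 + 1/N)\, z^2_{(1+p)/2} / \chi^2_{1-\gamma,\nu}}$ with $\nu = N - 1$. The chi-square critical value is obtained from the Wilson-Hilferty approximation so that only the Python standard library is needed. The function $h(\alpha)$ is taken to be linear, rising from $\beta/2$ at $\alpha = 0$ to $1/2$ at $\alpha = 1$, with $\beta = 1 - p$. All function names are hypothetical, and the choice of `h_linear` in particular is an illustration rather than the choice made in [8].

```python
import math
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()  # standard normal, used for inverse-cdf lookups

def chi2_quantile(q, nu):
    """Lower q-quantile of chi-square(nu), Wilson-Hilferty approximation.

    Assumption: an approximation is acceptable here; it degrades for very
    small nu, where an exact quantile routine should be preferred.
    """
    z = STD_NORMAL.inv_cdf(q)
    return nu * (1.0 - 2.0 / (9.0 * nu) + z * math.sqrt(2.0 / (9.0 * nu))) ** 3

def scale_m(n, s, gamma):
    """Constant m = s * sqrt(nu * (1 + 1/N) / chi2_{1-gamma, nu})."""
    nu = n - 1
    return s * math.sqrt(nu * (1.0 + 1.0 / n) / chi2_quantile(1.0 - gamma, nu))

def tolerance_factor(n, p, gamma):
    """Two-sided tolerance factor K_2 for proportion p and confidence gamma."""
    return STD_NORMAL.inv_cdf((1.0 + p) / 2.0) * scale_m(n, 1.0, gamma)

def h_linear(alpha, beta):
    """Assumed h(alpha): linear from beta/2 (alpha = 0) to 1/2 (alpha = 1)."""
    return (0.5 - beta / 2.0) * alpha + beta / 2.0

def fuzzy_error_cut(errors, alpha, beta, gamma):
    """alpha-cut (lo, hi) of the fuzzy error built from the residuals."""
    m = scale_m(len(errors), stdev(errors), gamma)
    half_width = m * STD_NORMAL.inv_cdf(1.0 - h_linear(alpha, beta))
    center = mean(errors)
    return center - half_width, center + half_width

def fuzzy_prediction_cut(y_hat, errors, alpha, beta, gamma):
    """alpha-cut of the fuzzy output: crisp prediction plus fuzzy error."""
    lo, hi = fuzzy_error_cut(errors, alpha, beta, gamma)
    return y_hat + lo, y_hat + hi
```

With this choice of `h_linear`, the cut at `alpha = 0` coincides with the tolerance interval for proportion `1 - beta`, and the cut at `alpha = 1` collapses to the mean error, so the cuts are nested as `alpha` increases.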

Case Study
In order to test the applicability of our fuzzy linear regression model, we use two datasets from the fuzzy literature [16] and two real datasets from the classical linear regression literature (car data from the UCI machine learning repository [17] and the Hald cement data [18,19]). We compare the results to those of classical linear regression, the fuzzy linear regression of [1], and the fuzzy prediction intervals proposed in [7]. The datasets from the fuzzy literature are shown in Tables 1-3 below. Note that the comparison with a classical fuzzy model and with the fuzzy prediction interval proposed in [7] is done for clarity, rather than to compare predictive performance. As is well known, a tolerance interval, a prediction interval, and the spreads from a fuzzy linear regression have different interpretations and are used for different purposes. Therefore, the results are not directly comparable. However, all three models share some similarities. For example, they can measure how close the true value is to the predicted value using the membership function of the predicted value. In addition, they can measure how much uncertainty is in the model by using the spreads of the predicted values. In [16], the credibility of a predicted value is measured by how close it is to the original value and also by how precise it is (i.e., how small the spreads of the value are). The credibility of a fuzzy predicted value is defined in [16] in terms of $\mu_{\tilde{y}_i}(y_i)$, the membership value of the true value $y_i$ in the fuzzy predicted value $\tilde{y}_i$, and $\Delta_{\tilde{y}_i}$, the fuzziness of the prediction, which is equivalent to the area of the fuzzy number when it is a symmetric triangular fuzzy number with a height of 1. This area is just the spread of the fuzzy number, i.e., the difference between the central value and the right or left value. For a model with sample size K, the total credibility of the model is the sum of the credibility of the individual sample predictions.
Tables 4-15 below show the total credibility of all three models on all datasets. The comparison with fuzzy linear regression is made for the h-value with the maximum credibility (indicated with * in the tables). However, since the h-value is not comparable to a statistical confidence level, we also show the lowest membership value of the true value in the fuzzy numbers produced from the prediction and tolerance intervals. A visual comparison is also shown in Figures 1 and 2. Since every data point now belongs to the interval with some membership value, it is possible to calculate the minimum membership value in the dataset; this is what is defined as the lowest membership value, and it gives an indication of how good the coverage interval is. A value of zero indicates that at least one data point does not belong to the interval, while a value greater than zero implies that all data points belong to the interval.
Table 6. Linear model based on the fuzzy tolerance intervals (data from Liu and Chen, 2013 [16]).
Table 8. Linear model based on the fuzzy prediction intervals (data from Liu and Chen, 2013 [16]).
Table 11. Linear model based on the fuzzy prediction intervals (car data, UCI repository).

The credibility of the proposed model for this dataset is slightly higher than that of the classical fuzzy model and lower than that of the fuzzy prediction intervals. As can be seen, the credibility of all three models is within the same range of values, which confirms the applicability of the proposed model. As expected, the credibility falls as the coverage probability increases. The 95% coverage seems to be the best choice, since it contains all true values and has a high credibility.
As with the previous dataset, the credibility of the proposed model is within the same range as that of the classical fuzzy model and the fuzzy prediction interval, which further confirms its applicability. However, the values at 90%, 95%, and 99% coverage are slightly lower than those of the other two models. Again, the credibility falls as the coverage increases. All three intervals contain the true values with high membership values. The 90% coverage seems to give the best trade-off between total credibility and the lowest membership value of the true value.
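The lowest membership value used in this comparison can be computed along the following lines. This is a sketch under assumptions: the fuzzy prediction is taken to be centered at `center` with the constant `m` of the tolerance construction, and the α-cuts are assumed to use a linear $h(\alpha)$ running from $\beta/2$ to $1/2$, which gives the membership $\mu(y) = \max\{0,\ [2(1 - \Phi(|y - c| / m)) - \beta] / (1 - \beta)\}$. The function names and this particular $h$ are illustrative and are not taken from the tables above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal cdf

def membership(y, center, m, beta):
    """Membership of a true value y in a fuzzy prediction centered at `center`.

    Assumes alpha-cuts of half-width m * inv_cdf(1 - h(alpha)) with a
    linear h(alpha) going from beta/2 to 1/2.
    """
    d = abs(y - center) / m
    alpha = (2.0 * (1.0 - STD_NORMAL.cdf(d)) - beta) / (1.0 - beta)
    return min(1.0, max(0.0, alpha))

def lowest_membership(y_true, centers, m, beta):
    """Minimum membership over the dataset; zero means a point is not covered."""
    return min(membership(y, c, m, beta) for y, c in zip(y_true, centers))
```

A lowest membership of zero signals that at least one true value falls outside the widest interval, while any positive value means every true value is covered to some degree.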

Conclusions
Errors from a linear regression model are assumed to be normally distributed, with a mean of zero and constant standard deviation. In this study, we have shown how to construct a fuzzy error from the tolerance interval of the errors in a linear model. This approach leads to a fuzzy linear regression model with a constant fuzzy error. The fuzzy error is constructed entirely from the observed errors and does not need to be optimized as in the usual fuzzy regression models. The current model is useful for applications where a tolerance interval is needed to bound acceptable errors in both sample and future predictions of a linear regression model. To clarify, our method does not remove uncertainties in regression models; rather, it represents these uncertainties with a fuzzy number instead of a single number or a single interval, as is normally done. In addition, choosing only one coverage interval, say 99%, does not fully represent all the uncertainty in the model. In contrast, the fuzzy number produced contains all the uncertainty from, for example, 99% coverage down to 0%.
In addition, by using a fuzzy number, another level of uncertainty is captured. Whether a number belongs to an interval or not is no longer crisp (0 or 1); instead, one can say that a number belongs to the interval with some possibility [20], say 0.7. This captures the uncertainty in the assumption that the interval will contain, say, 99% of the data. The fuzzy number shows that some data are more likely to be captured in the interval than others.
Unlike crisp predictions, the true values can belong to the fuzzy output with a membership degree. As an added advantage, the total credibility of the model can be used to select between two linear regression models and to choose the more credible model in terms of tolerance intervals for predictions. Additionally, similar to how h is used in a classical fuzzy model, a decision-maker can choose a linear model whose true values belong to the fuzzy predicted values with at least a specified membership degree.
The limitation of the model is that the fuzzy error is constant and is added to every crisp output, so all outputs are assumed to be affected by the same error. In future research, a non-constant tolerance interval that depends on the input could be used. The same procedure for constructing a fuzzy number from a statistical interval can be used to convert such tolerance intervals to fuzzy numbers. Last, but not least, different approximations exist for tolerance intervals, and they can be used in place of the one used in this paper. It would also be worthwhile to extend this method to non-parametric tolerance intervals, which would remove the need to assume a known distribution.