Ridge Fuzzy Regression Modelling for Solving Multicollinearity

Abstract: This paper proposes an α-level estimation algorithm for ridge fuzzy regression modeling, addressing the multicollinearity phenomenon in the fuzzy linear regression setting. By incorporating α-levels in the estimation procedure, we are able to construct a fuzzy ridge estimator which does not depend on the distance between fuzzy numbers. An optimized α-level estimation algorithm is selected which minimizes the root mean square error for fuzzy data. Simulation experiments and an empirical study comparing the proposed ridge fuzzy regression with fuzzy linear regression are presented. Results show that the proposed model can control the effect of multicollinearity from moderate to extreme levels of correlation between covariates, across a wide spectrum of spreads for the fuzzy response.


Introduction
Oftentimes in practical applications, the available data may not be precise. The researcher may only have access to minimum and maximum values of the data. Sometimes the data may not even be given in numbers. For instance, consider linguistic data such as "young", "tall", or "high", and medical data such as "healthy" and "not healthy". In such cases where the given data are imprecise and vague, classical representation by numbers may be insufficient. The fuzzy set theory introduced by Zadeh [1,2] can handle such uncertainty in data. In the view of fuzzy set theory, uncertain data are called fuzzy. Fuzzy data are prevalent in various fields such as linguistics, surveys, medicine, and so forth [3][4][5][6][7]. The development of fuzzy set theory has led to statistical methods for analyzing fuzzy data. When a measure of indeterminacy is needed, the neutrosophic set introduced by Smarandache [8] extends the fuzzy set with such a measure. Neutrosophic statistics, based on the neutrosophic set, can be applied to analyze data selected from a population with uncertain, fuzzy, and imprecise observations [9].
In classical linear regression models, the multicollinearity phenomenon is frequently observed, in which two or more explanatory variables are highly linearly related. Common examples of collinear covariates are a person's height and weight, or a person's level of education, gender, race, and starting salary. When multicollinearity occurs, the least squares estimator may not be obtainable or may be subject to very high variance. Once the researcher identifies the collinear variables, there are several techniques available to handle multicollinearity. Among these, the two most widely used approaches are lasso regression and ridge regression. Lasso regression, developed by Tibshirani [17], and ridge regression, developed by Hoerl and Kennard [18], improve model performance by adding a penalty term to the classical linear regression model. Both methods shrink the model parameters towards zero, which increases the model bias but decreases the model variance even more, thus improving overall performance. Ridge regression shrinks the parameters of low-contributing variables towards zero, but not exactly to zero, and stabilizes the parameter variance of the least squares estimator in the presence of multicollinearity. Lasso regression sets some model parameters exactly to zero, removing low-contributing variables as well as improving model fit. However, sometimes the researcher may want to include all the available covariates in the model without reducing the dimension of the data. In such cases, ridge regression is preferred to lasso regression.
Similar to classical linear regression models, multicollinearity occurs frequently in fuzzy linear regression models as well, causing problems in the estimation procedure. Oftentimes the number of covariates is not particularly large for fuzzy data; consequently, dropping any explanatory variables may not be an option. As in the classical statistical setting, we prefer ridge regression to lasso regression to handle multicollinearity in such datasets. In this paper, we incorporate fuzzy set theory with the ridge regression of Hoerl and Kennard [18] to handle multicollinearity observed in fuzzy data. Only a few works have suggested ridge estimation methods for fuzzy linear regression, and they are limited to fuzzy ridge estimators which depend on the distance between fuzzy numbers [19][20][21]. We instead propose an α-level estimation algorithm for ridge fuzzy regression modelling. The proposed algorithm is an extension of the ridge regression model introduced in Choi et al. [22]. By applying α-levels in the estimation procedure, we are able to construct a fuzzy ridge estimator which does not depend on the distance between fuzzy numbers. Simulation experiments show the proposed ridge fuzzy regression model can handle moderate to severe degrees of multicollinearity across a wide range of spreads for the fuzzy response. An empirical study using Tanaka's house prices data [10] with multicollinearity, the most widely applied data set in the fuzzy linear regression literature, is conducted to demonstrate practical implementation.
The rest of this paper is organized as follows. Section 2 introduces key definitions and results from fuzzy set theory. Section 3 describes classical ridge regression, followed by a step-by-step procedure for the proposed α-level estimation algorithm of ridge fuzzy regression modeling. Sections 4 and 5 illustrate the performance of the model with simulation studies and a numerical example, respectively. Section 6 concludes the paper.

Fuzzy Set Theory
A fuzzy set A in a universe X is a set of ordered pairs
A = {(x, µ_A(x)) : x ∈ X},
where µ_A : X → [0, 1] is a membership function which represents the degree of membership of x in the set A. Please note that when A is a crisp (classical) set, its membership function can take only the values one or zero, depending on whether or not x belongs to A. In this case, µ_A(x) reduces to the indicator function I_A(x) of the set A. For any α in [0, 1], the α-level set of a fuzzy set A is the crisp set A(α) = {x ∈ X : µ_A(x) ≥ α}, which contains all the elements of X with membership value in A greater than or equal to α. The α-level set of a fuzzy set A can also be represented by the interval
A(α) = [l_A(α), r_A(α)],
where l_A(α) and r_A(α) are the left and right end-points of the α-level set, respectively. Zadeh's [23] resolution identity theorem states that a fuzzy set can be represented equivalently by its membership function or by its α-level sets. Let A be a fuzzy number with membership function µ_A(x) and α-cut A(α). Then we have
µ_A(x) = sup_{α ∈ [0,1]} α · I_{A(α)}(x).
A fuzzy number is a normal and convex fuzzy subset of the real line R with bounded support. The support of a fuzzy set A is defined by supp(A) = {x ∈ R : µ_A(x) > 0}. The following parametric class of fuzzy numbers, the so-called LR-fuzzy numbers denoted by A = (a_m, s_l, s_r)_LR, is often used as a special case:
µ_A(x) = L((a_m − x)/s_l) if x ≤ a_m,  µ_A(x) = R((x − a_m)/s_r) if x > a_m,
where L, R : [0, ∞) → [0, 1] are fixed, left-continuous, and non-increasing functions with L(0) = R(0) = 1 and L(1) = R(1) = 0. L and R are called the left and right shape functions of A, respectively. a_m is the mean value of A, and s_l, s_r > 0 are the left and right spreads of A. The spreads s_l and s_r represent the fuzziness of the fuzzy number and can be symmetric or asymmetric. If s_l = s_r = 0, the LR-fuzzy number becomes a precise real number with no fuzziness. Thus, a precise real number can be considered a special case of a fuzzy number. For a precise observation a ∈ R, the corresponding membership function is the indicator µ_a(x) = I_{{a}}(x), which equals one at x = a and zero elsewhere.
In fuzzy set theory, triangular and trapezoidal fuzzy numbers are special cases of LR-fuzzy numbers and are used extensively [24]. The membership function of a triangular fuzzy number A = (a_l, a_m, a_r)_T is given by
µ_A(x) = (x − a_l)/(a_m − a_l) if a_l ≤ x ≤ a_m,  µ_A(x) = (a_r − x)/(a_r − a_m) if a_m < x ≤ a_r,  µ_A(x) = 0 otherwise,
where a_l, a_m, and a_r are the left end-point, mid-point, and right end-point, respectively.
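As a concrete illustration of the definitions above, the following Python sketch evaluates the membership function and the α-level set of a triangular fuzzy number. The helper names `tri_membership` and `tri_alpha_cut` are ours, not the paper's.

```python
# Sketch (not from the paper): membership and alpha-cut of a triangular
# fuzzy number A = (a_l, a_m, a_r)_T, following the definitions above.

def tri_membership(x, a_l, a_m, a_r):
    """Degree of membership of x in the triangular fuzzy number (a_l, a_m, a_r)_T."""
    if a_l <= x <= a_m:
        return (x - a_l) / (a_m - a_l)
    if a_m < x <= a_r:
        return (a_r - x) / (a_r - a_m)
    return 0.0

def tri_alpha_cut(alpha, a_l, a_m, a_r):
    """Alpha-level set [l_A(alpha), r_A(alpha)] of a triangular fuzzy number."""
    l = a_l + alpha * (a_m - a_l)
    r = a_r - alpha * (a_r - a_m)
    return l, r

# Example: A = (1, 2, 4)_T
print(tri_membership(2, 1, 2, 4))   # -> 1.0 (the mid-point has full membership)
print(tri_alpha_cut(0.5, 1, 2, 4))  # -> (1.5, 3.0)
```

Note that at α = 0 the α-cut recovers the support end-points (a_l, a_r), and at α = 1 it collapses to the mid-point a_m.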

Ridge Fuzzy Regression
In this section, we propose the α-level estimation algorithm for the ridge fuzzy regression model. This algorithm modifies the method of Choi et al. [22] to estimate the fuzzy parameters. The term α-level estimation indicates that our algorithm uses α-levels to describe fuzzy data. By using α-levels, we are able to develop a ridge fuzzy estimator which does not rely on the distance between fuzzy numbers. We first briefly review the original formulation of the ridge regression model for crisp data.

Ridge Regression
Given a data set {y_i, x_i1, x_i2, · · ·, x_ip}_{i=1}^N, a multiple linear regression model assumes that the relationship between a dependent variable y_i and a set of explanatory variables x_i1, x_i2, · · ·, x_ip is linear. The model takes the form
y_i = β_0 + β_1 x_i1 + · · · + β_p x_ip + ε_i,  i = 1, · · ·, N,
or, written alternatively in matrix notation, Y = Xβ + ε. Here Y = (y_1, · · ·, y_N)^t is the vector of observations on the dependent variable, X = (X_1^t, · · ·, X_N^t)^t is the matrix of explanatory variables, β = (β_0, β_1, · · ·, β_p)^t is the vector of regression coefficients to be estimated, and ε = (ε_1, · · ·, ε_N)^t is the vector of error terms. The standard estimator for β is the least squares estimator
β̂ = (X^t X)^{-1} X^t Y.
In the presence of multicollinearity, i.e., extreme correlations among the explanatory variables, β̂ is poorly determined and susceptible to high variance. Thus, we may deliberately bias the regression coefficient estimates so as to control their variance. In this spirit, the ridge regression estimator was introduced by Hoerl and Kennard [18] as a penalized least squares estimator. It is obtained by minimizing the residual sum of squares (RSS) subject to a constraint on the size of the estimated coefficient vector, or equivalently by minimizing [25]
RSS(λ) = Σ_{i=1}^N (y_i − β_0 − Σ_{j=1}^p β_j x_ij)^2 + λ Σ_{j=1}^p β_j^2.
Here λ ≥ 0 is a shrinkage parameter which controls the size of the coefficients: the larger the value of λ, the greater the amount of shrinkage and the closer the coefficients are to zero, while as λ approaches zero the solution approaches the least squares estimate. Please note that, by convention, the input matrix X is assumed to be standardized and Y centered before minimizing RSS(λ). The ridge regression solution is
β̂^ridge = (X^t X + λI)^{-1} X^t Y,
where I is the p × p identity matrix. The shrinkage parameter λ is usually selected via K-fold cross validation, a simple and powerful tool often used to choose the shrinkage parameter and estimate the prediction error in ridge regression.
The entire dataset is divided into K parts; the model is trained on all but the kth part and validated on the kth part, iterating over k = 1, · · ·, K. The choice K = 5 or K = 10 is common in practice.
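The closed-form ridge solution above and its shrinkage behaviour can be sketched in a few lines of Python. The data are synthetic and the `ridge` helper is our own, not a library routine; setting λ = 0 recovers the least squares solution.

```python
import numpy as np

# Synthetic data, standardized X and centered y as assumed in the text.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X = (X - X.mean(0)) / X.std(0)          # standardize inputs
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
y = y - y.mean()                         # center the response

def ridge(X, y, lam):
    """Closed-form ridge estimator (X^t X + lam I)^{-1} X^t y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)              # lam = 0: least squares
beta_ridge = ridge(X, y, 10.0)           # lam > 0: shrunken coefficients
# Shrinkage: the ridge coefficient vector has smaller norm than least squares.
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))  # True
```

In practice λ would be chosen by K-fold cross validation as described above, rather than fixed by hand.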

Ridge Fuzzy Regression Algorithm
Let us consider a set of observations {y_i, x_i1, x_i2, · · ·, x_ip}_{i=1}^N where the dependent variable y_i and the explanatory variables x_i1, x_i2, · · ·, x_ip are triangular fuzzy numbers. We assume a linear relationship between the dependent and explanatory variables:
y_i = A_0 ⊕ A_1 ⊗ x_i1 ⊕ · · · ⊕ A_p ⊗ x_ip ⊕ ε_i,  i = 1, · · ·, N,
where the ε_i are fuzzy error terms, and ⊕ and ⊗ represent addition and multiplication between two fuzzy numbers, respectively. The N equations are often stacked together and written in matrix notation. For more details on arithmetic operations between fuzzy numbers, see [10,26]. Please note that the above fuzzy variables can be symmetric or asymmetric, and can be extended to various forms such as normal, parabolic, or square-root fuzzy data. Since crisp sets are a special case of fuzzy sets, combinations of fuzzy inputs and fuzzy outputs, or fuzzy inputs and crisp outputs, are also possible. For illustration purposes, in this section we present our ridge fuzzy regression model using triangular membership functions.
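For triangular fuzzy numbers, the operations ⊕ and ⊗ reduce to simple componentwise rules in the basic cases. A minimal sketch follows (helper names are ours; it covers only addition and multiplication by a non-negative crisp scalar, the cases needed for crisp inputs):

```python
# Sketch (not the paper's notation): componentwise arithmetic on
# triangular fuzzy numbers represented as (left, mid, right) tuples.

def tri_add(a, b):
    """(a_l, a_m, a_r) ⊕ (b_l, b_m, b_r): componentwise addition."""
    return tuple(x + y for x, y in zip(a, b))

def tri_scale(c, a):
    """c ⊗ (a_l, a_m, a_r) for a crisp scalar c >= 0."""
    return tuple(c * x for x in a)

print(tri_add((1, 2, 3), (0, 1, 2)))   # -> (1, 3, 5)
print(tri_scale(2.0, (1, 2, 3)))       # -> (2.0, 4.0, 6.0)
```

Multiplication of two general fuzzy numbers is more involved (the end-points depend on signs); see [10,26] for the full rules.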
We divide the given data into training and test sets. The model is computed from the training set {y_i, x_i1, · · ·, x_ip}_{i=1}^n, and its performance is later evaluated on the test set {y_i, x_i1, · · ·, x_ip}_{i=n+1}^N. Note again that N is the total number of observations, n is the number of observations in the training set, and m is the number of observations in the test set, such that n + m = N. We fit our ridge fuzzy regression model on the training set by the following estimation algorithm. Step 1: Create α-level sets of the triangular fuzzy input and output as illustrated in Figure 1:
y_i(α) = [y_im − (1 − α) s_yil, y_im + (1 − α) s_yir],  x_ij(α) = [x_ijm − (1 − α) s_xijl, x_ijm + (1 − α) s_xijr],
where s_yil, s_yir, s_xijl, s_xijr ≥ 0 are the left and right spreads of the dependent and explanatory variables, respectively. The α-levels are denoted by the sequence (α_k)_{k=0}^K with 0 = α_0 < α_1 < · · · < α_K = 1. Step 2: Perform ridge regression of Y(α_k) on X(α_k) for each k = 0, · · ·, K. Find the intermediate estimators l̃_A(α_k) and r̃_A(α_k) of l_A(α_k) and r_A(α_k) by minimizing the respective ridge loss functions for the left and right end-point systems (see Figure 2).
We assume the end-points of the α-level sets of Y have been centered and the end-points of the α-level sets of X have been standardized, as is the convention in classical ridge regression [25]. Step 3: Obtain the estimators l̂_A(α_k) and r̂_A(α_k) of l_A(α_k) and r_A(α_k) by modifying the intermediate estimators l̃_A(α_k) and r̃_A(α_k) so that the estimated coefficients form the membership function of a triangular fuzzy number (see Figure 3). Step 4: Estimate the triangular fuzzy coefficient Â = (Â_l, Â_m, Â_r)_T and its membership function µ_Â by fitting a linear regression line through l̂_A(α_k) and r̂_A(α_k) for k = 0, · · ·, K, respectively. A constraint is imposed so that µ_Â satisfies µ_Â(l̂_A(1)) = µ_Â(r̂_A(1)) = 1 (see Figure 4). Step 5: Symmetric fuzzy inputs or outputs do not always guarantee that the estimated membership function µ_Â will also be symmetric. To reduce the difference between the true and fitted values, we consider several candidate membership functions and select among them. We present two performance criteria based on Diamond's fuzzy distance measure [27] to evaluate the proposed fuzzy estimators. Denote the dependent variable by y_i = (y_il, y_im, y_ir)_T, i = 1, · · ·, n, and its predicted value by ŷ_i = (ŷ_il, ŷ_im, ŷ_ir)_T = (X_il^t Â_l, X_im^t Â_m, X_ir^t Â_r)_T, where n is the number of observations in the training set. Based on Diamond's distance d(A, B)^2 = (a_l − b_l)^2 + (a_m − b_m)^2 + (a_r − b_r)^2, we define RMSE_F (root mean square error for fuzzy numbers) as
RMSE_F = ( (1/n) Σ_{i=1}^n d(y_i, ŷ_i)^2 )^{1/2},
and MAPE_F (mean absolute percentage error for fuzzy numbers) as the average absolute percentage deviation over the left, mid, and right points of the fuzzy response.
Compute RMSE_F for each of the candidate membership functions, then select the one which minimizes the criterion.
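A minimal sketch of RMSE_F under Diamond's distance follows. The exact scaling convention may differ from the paper's; this version simply averages the squared distance over observations, and the helper name `rmse_f` is ours.

```python
import numpy as np

# Sketch: RMSE_F based on Diamond's distance between triangular fuzzy
# numbers, d(A, B)^2 = (a_l - b_l)^2 + (a_m - b_m)^2 + (a_r - b_r)^2.

def rmse_f(y_true, y_pred):
    """y_true, y_pred: arrays of shape (n, 3) holding (left, mid, right) points."""
    d2 = np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2, axis=1)
    return np.sqrt(d2.mean())

y_true = [(1.0, 2.0, 3.0), (2.0, 3.0, 4.0)]
y_pred = [(1.0, 2.0, 3.0), (2.0, 3.0, 4.0)]
print(rmse_f(y_true, y_pred))  # -> 0.0 for a perfect fit
```

Shifting every end-point of a single observation by one unit gives d² = 3 and hence RMSE_F = √3, which is a quick sanity check on the implementation.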
Step 6: Repeat Steps 1-5 for selected α-level sequences (α_k)_{k=0}^K with α_k equally spaced between 0 and 1. Choose the optimal set of α-levels which minimizes RMSE_F. Finally, compute the fuzzy ridge coefficient estimate Â based on the selected sequence.
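Steps 1 and 2 of the algorithm can be sketched as follows, assuming symmetric triangular responses and crisp inputs. The helper names are ours, and the modification and aggregation of Steps 3-5 are omitted.

```python
import numpy as np

# Sketch of Steps 1-2: for each alpha level, form the left/right end-points
# of the alpha-cuts of the fuzzy response and run a ridge fit on each
# end-point system.

def alpha_endpoints(mid, spread_l, spread_r, alpha):
    """End-points of the alpha-cut of triangular fuzzy observations."""
    return mid - (1 - alpha) * spread_l, mid + (1 - alpha) * spread_r

def ridge_fit(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
n, p = 30, 2
X = rng.normal(size=(n, p))                       # crisp inputs
y_mid = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=n)
s_l = s_r = np.full(n, 0.5)                       # symmetric spreads

alphas = np.linspace(0, 1, 11)
coefs = []
for a in alphas:
    yl, yr = alpha_endpoints(y_mid, s_l, s_r, a)  # Step 1
    coefs.append((ridge_fit(X, yl, 1.0),          # Step 2, left system
                  ridge_fit(X, yr, 1.0)))         # Step 2, right system
# At alpha = 1 the alpha-cut collapses to the mid-point, so the two
# end-point fits coincide.
left1, right1 = coefs[-1]
print(np.allclose(left1, right1))  # True
```

Steps 3-5 would then reshape these per-level estimates into a triangular fuzzy coefficient and pick among the candidate membership functions by RMSE_F.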

Simulation Study
A simulation study was conducted to illustrate the performance of the proposed ridge fuzzy regression model in the presence of multicollinearity. Simulation results are compared with the fuzzy linear regression model under varying degrees of correlation. The fuzzy least squares estimator is obtained by setting the tuning parameter λ to zero in Step 2 of Section 3.2.
We generated N = 100 observations for each of the p = 4 crisp explanatory variables; this data dimension is in line with commonly encountered fuzzy data. Following Gibbons [28], the explanatory variables x_ij are generated by
x_ij = (1 − ρ^2)^{1/2} z_ij + ρ z_i,p+1,  i = 1, · · ·, N, j = 1, · · ·, p,
where ρ is a given constant and the z_ij are generated from independent normal distributions with mean 50 and variance 1. Here the x_ij are non-negative so as to reflect the non-negative characteristics of real-world fuzzy data. The degree of linear association between explanatory variables is controlled via ρ; the correlation between any two explanatory variables is ρ^2. Three different levels of correlation are considered, corresponding to ρ = 0.8, 0.9, and 0.99, representing moderate, high, and very high correlation between the variables, respectively. Observations on the fuzzy dependent variable are then generated from the fuzzy linear model, with 200 replicates for each scenario. The explanatory variables and the fuzzy coefficients remain fixed, while the error terms, and hence the fuzzy dependent variable, change across replicates. We separated the simulated data into training and test sets. Once the ridge fuzzy regression model and the fuzzy linear regression model are fit to the training data, RMSE_F and MAPE_F are computed on the test set for each replicate t = 1, · · ·, 200. Let RMSE_F^t and MAPE_F^t be the performance measures when a fuzzy model is applied to replicate t; their averages over the 200 replicates are then computed for each fuzzy estimator. In addition, we fit the ridge regression model and the linear regression model on the mid-points of our training data {y_im, x_i1m, x_i2m, · · ·, x_ipm}_{i=1}^n for comparison with the fuzzy methods.
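The covariate-generation scheme can be sketched as follows. This is a reconstruction of the Gibbons design under the stated assumptions (independent z ~ N(50, 1)); the helper name `gibbons_design` is ours. The empirical pairwise correlation should be close to ρ² = 0.81 for ρ = 0.9.

```python
import numpy as np

# Sketch of the Gibbons (1981) design:
#   x_ij = sqrt(1 - rho^2) * z_ij + rho * z_i,p+1
# with independent z ~ N(50, 1), so that any two covariates share the
# common component z_i,p+1 and have correlation rho^2.

def gibbons_design(n, p, rho, rng):
    z = rng.normal(loc=50.0, scale=1.0, size=(n, p + 1))
    return np.sqrt(1 - rho ** 2) * z[:, :p] + rho * z[:, [p]]

rng = np.random.default_rng(42)
X = gibbons_design(5000, 4, 0.9, rng)
corr = np.corrcoef(X, rowvar=False)
print(round(corr[0, 1], 2))  # close to 0.9**2 = 0.81
```

With mean-50 components, the generated covariates are essentially always non-negative, matching the non-negativity remark in the text.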

Empirical Study
In this section, we demonstrate the performance of the proposed ridge fuzzy regression model on an illustrative example taken from Tanaka et al. [10]. The performance of the ridge fuzzy regression estimator is compared with the fuzzy least squares estimator for crisp explanatory variables and a fuzzy dependent variable. The linear regression fuzzy model from Tanaka et al. [10] is also compared to illustrate the performance of the ridge fuzzy regression model. For both the ridge fuzzy regression and the linear fuzzy model, the α-level sequences α_k = r × k, k = 0, · · ·, K, for some r and K, are chosen as candidates for Step 6 of the estimation algorithm in Section 3.2. The list of α-level sequences is presented in Table 7.

Example: House Prices Data
Tanaka et al. [10] present a data set concerning the price mechanism of prefabricated houses. The relationship between five crisp inputs (rank of material, first floor space (m²), second floor space (m²), number of rooms, and number of Japanese-style rooms) and a fuzzy output (house price) is investigated. The complete data are shown in Table 8. The fitted values for the ridge fuzzy model and the linear fuzzy model are shown in Table 9. Results show that the predicted values from ridge fuzzy regression describe the original data more accurately than those from fuzzy linear regression. This is also clear in Figure 5, where the triangular fuzzy plot of the observed and fitted values compares the two models. The black triangles correspond to the observed values, the red triangles in Figure 5a to the ridge fuzzy fitted values, and the blue triangles in Figure 5b to the fuzzy linear fitted values. Both methods estimate the mid-points of the fuzzy dependent variable well; the spreads, however, are shorter for the proposed ridge fuzzy regression. An analysis of the α-level sequences used in Step 6 of the estimation algorithm is presented in Figure 6. The α-level sequence which minimizes RMSE_F was chosen as the optimal α-level sequence for each of the models; the red dots in Figure 6a,b each indicate the chosen sequence. For the ridge fuzzy regression, α_k = r × k, k = 0, · · ·, K, with r = 0.01, K = 100 was chosen; in the case of fuzzy linear regression, r = 0.5, K = 2 was selected. In Table 10, the performance measures RMSE_F and MAPE_F for the ridge fuzzy regression are compared with those of the fuzzy linear regression model and the linear regression fuzzy model from Tanaka et al. [10].
Both measures are clearly much smaller for the ridge fuzzy regression than for the other models, suggesting that the proposed ridge fuzzy regression model provides a better fit to the data than the two alternative methods.

Conclusions
This paper proposes an α-level estimation algorithm for ridge fuzzy regression modeling, extending the ridge regression model introduced in Choi et al. [22]. As shown in the simulation studies and the empirical study, the proposed ridge fuzzy regression model can handle fuzzy data sets with crisp inputs and triangular fuzzy outputs. The same procedure is available with fuzzy inputs and fuzzy outputs, or fuzzy inputs and crisp outputs. In previous works, estimation methods for ridge fuzzy regression depend on the distance between fuzzy numbers. By incorporating α-levels into ridge fuzzy regression, we are able to construct the ridge fuzzy estimator without having to define a distance between fuzzy numbers. Simulation results show the ridge fuzzy regression model reduces the effect of multicollinearity over a wide range of spreads for the fuzzy response, for various levels of correlation between inputs. In the illustrative example taken from Tanaka et al. [10], we have shown the practical implementation of our method. Comparison is made with fuzzy linear regression with respect to RMSE and MAPE for fuzzy numbers. Overall, these results demonstrate the effectiveness of ridge regression for fuzzy data.
An important point to note is that ridge regression is typically preferred over lasso regression when the objective is to handle multicollinearity without removing low-contributing variables. However, when the dimension of the data is large and dropping collinear variables is necessary, one may use lasso regression rather than ridge regression. To handle such cases, in future studies we plan to extend the proposed α-level estimation algorithm for ridge fuzzy models to lasso fuzzy regression models. Lasso fuzzy regression will be especially useful for modeling correlated genetic data sets.