Hesitant Fuzzy Linear Regression Model for Decision Making

Abstract: An expert may experience difficulties in decision making when evaluating alternatives through a single assessment value in a hesitant environment. A fuzzy linear regression model (FLRM) is used for decision-making purposes, but this model is not suitable in the presence of hesitant fuzzy information. To overcome this issue, in this paper, we define a hesitant fuzzy linear regression model (HFLRM) to account for multicriteria decision-making (MCDM) problems in a hesitant environment. The HFLRM provides an alternative approach to statistical regression for modelling situations where input–output variables are observed as hesitant fuzzy elements (HFEs). The parameters of the HFLRM are symmetric triangular fuzzy numbers (STFNs) estimated by solving a linear programming (LP) model. An application example is presented to measure the effectiveness and significance of our proposed methodology by solving an MCDM problem. Moreover, the results obtained with the HFLRM are compared with those of the MCDM tool called technique for order preference by similarity to ideal solution (TOPSIS). Finally, Spearman's rank correlation test is used to measure the significance of the relationship between the two sets of ranking.


Introduction
The fuzzy set theory introduced by Zadeh [1] provides an excellent base to work in uncertain and ambiguous situations with incomplete information. Fuzzy set theory has been applied in different research areas to handle uncertainty, such as medical and life sciences [2,3], management sciences [4,5], social sciences [6], engineering [7], statistics, artificial intelligence [8], robotics, computer networks, and decision making [8–12]. As an important extension of fuzzy set theory, Torra [13] introduced the hesitant fuzzy set (HFS), which allows the degree of membership to take a set of possible values and can therefore express hesitant information more comprehensively. The HFS attracted many researchers in a short period because of its frequent usage in hesitant situations in real-world problems. Recently, many researchers have paid great attention to decision-making problems in the framework of hesitant fuzzy information [14]. For example, Mardani et al. [15] proposed an extended approach under HFSs for assessing the key challenges of digital health intervention adoption during the COVID-19 pandemic, Narayanamoorthy et al. [16] suggested an approach for the site selection of underground hydrogen storage based on normal wiggly dual HFSs, and Dong and Ma [17] developed an enhanced fuzzy time series model based on hesitant differential fuzzy sets and error learning.
Identifying the most suitable alternative among the available choices is a tedious process. A large number of techniques are used to help decision makers (DMs) rank alternatives in decision-making problems. Since 1960, MCDM has been an active research area, and there are several methods that DMs frequently use for decision making, such as TOPSIS [18], the Best Worst Method [19] (and its extension [20]), and Evaluation based on Distance from the Average Solution [21]. In the current era, several professionals have employed MCDM methods and strategies to deal with the problems of the modern age. For example, Wang et al. [22] proposed a three-way decision method for MCDM problems based on hesitant fuzzy information, and Farhadinia and Herrera-Viedma [23] proposed several new forms of distance and similarity measures and an extended TOPSIS method for dealing with MCDM problems in the context of dual HFSs.
For several years, regression analysis has been used to determine the relationship between an output variable (dependent variable) and one or more input variables (independent variables). Traditionally, regression modelling has examined crisp data and relationships; however, assuming a fuzzy relationship between the output variable and the input variables is more practical when the phenomenon under study is imprecise. Initially, a possibilistic approach to fuzzy regression analysis was proposed by Tanaka et al. [24], who introduced a linear system for solving the FLRM. Further, Tanaka [25] improved the possibilistic approach and introduced fuzzy interval analysis, in which possibilistic linear models were proposed with non-fuzzy inputs and fuzzy outputs. These approaches were criticized for having non-interactive possibilistic parameters by Celmins [26], who described the fuzzy least squares method as a fuzzy vector and also illustrated least squares fitting for fuzzy models when data are available as n-component vectors. Diamond [27] introduced fuzzy least squares models as well as normal equations similar to those of classical least squares. Tanaka and Watada [28] developed a linear programming formulation using possibilistic measures and estimated the parameters of the FLRM.
To answer the criticism about non-interactive possibilistic parameters, Tanaka and Ishibuchi [29] presented an identification method for interactive fuzzy parameters in which they used quadratic membership functions in possibilistic linear systems. Sakawa and Yano [30] developed a group of FLRMs that use indices for the equality of two fuzzy numbers. Peters [31] extended Tanaka's approach [24] to fuzzy intervals using fuzzy linear programming. Kim and Chen [32] presented a comprehensive comparison between non-parametric linear regression and FLRMs. Yen et al. [33] improved the fuzzy regression model with symmetric triangular parameters. This approach helped to reduce the inflexibility present in the earlier models. Chen [34] presented a study on handling outliers when data are available in the form of non-fuzzy inputs and fuzzy outputs, adding constraints so that the effect of outliers can be reduced. Kocadagli [35] addressed the problem of the h-cut level with a constrained non-linear programming method and developed effective solutions for fuzzy regression. Choi and Buckley [36] proposed a fuzzy least absolute approach to estimate the fuzzy parameters and to examine the performance of fuzzy regression models with the help of specific error measures. Recently, Cerny and Rada [37] derived a possibilistic generalization technique for the linear regression model using censored/rounded data. Karsak et al. [38] used an FLRM to rank alternatives in robot selection. Several issues concerning fuzzy regression analysis have been discussed in recent years. For example, Icen and Dermirhan [39] applied Monte Carlo simulation to the error measures in the FLRM, Choi et al. [40] devised an algorithm to address the problem of multicollinearity by combining ridge regression and the fuzzy regression model, Chakravarty et al. [41] proposed robust fuzzy regression functions based on fuzzy k-means clusters to guard against outliers, Wang et al. [42] applied approximate Bayesian computation to the FLRM and used a likelihood-free function for generating samples from the posterior distribution, Hesamian and Akbari [43] proposed a fuzzy additive regression model using kernel smoothing for estimating a fuzzy smooth function, and Boukezzoula and Coquin [44] redefined interval-valued type-1 and type-2 fuzzy regression models in terms of philosophy and methodology.
The above literature review shows the work performed in the field of fuzzy regression analysis over the last few decades and how this field has gradually evolved and is still evolving. Most of the work examined has been performed using the FLRM. However, the FLRM does not address situations where input variables and output variables are observed in a hesitant environment. This paper thus extends the work of Peters [31] to a hesitant environment and observes input–output variables as HFEs. We introduce the concept of the HFLRM such that the coefficients of the model are STFNs. Practically, we have used this model to help an organization analyze revenue generation, and several variables (goods and services) are taken into account when calculating revenue generation. A large organization contains many experts and wishes to utilize their expertise in their respective fields to come to a more plausible decision. Therefore, experts may advise different values (between 0 and 1) when analyzing certain variables. For example, one expert may suggest 0.2, the second may suggest 0.4, and the third 0.5, which can be represented by an HFE such as {0.2, 0.4, 0.5}, which is the basic form of the HFS. Thus, motivated by the HFS, the HFLRM incorporates these HFEs into the regression analysis and estimates the HFLRM parameters using the LP model. Furthermore, the alternatives are ranked using the residual values of the proposed HFLRM. To validate the proposed method, the HFLRM findings are compared to those of the most widely used MCDM technique, TOPSIS.
This paper is organized as follows: Section 2 defines some preliminary concepts related to the research work. Section 3 presents the idea of the HFLRM and then proposes a decision-making algorithm with the HFLRM in the framework of a hesitant environment. In Section 4, we present the TOPSIS method, the Spearman rank correlation test, and some popular similarity coefficients. An application example using the proposed HFLRM is presented in Section 5. Results and discussions are provided in Section 6. Finally, concluding remarks are given in Section 7.

Preliminaries
This section introduces the basic knowledge that is necessary to understand the proposed study.
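Torra [13] defined the HFS in terms of a function that returns a set of membership values for each element in the domain, as follows:
Definition 1 ([45]). Let $Z$ be a reference set. An HFS $A$ on $Z$ is defined in terms of a function $h_A(z)$ that, when applied to $Z$, returns a finite subset of $[0, 1]$, and can be represented mathematically as
$$A = \{\langle z, h_A(z) \rangle \mid z \in Z\},$$
where $h_A(z)$ is the set of possible membership degrees of the element $z \in Z$ to $A$. Each $h_A(z)$ is called a hesitant fuzzy element (HFE).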
Definition 4 ([48]). The membership function of a fuzzy number $F_n$ is as follows:
$$\mu_{F_n}(z) = \begin{cases} L_t\!\left(\dfrac{m - z}{c_{L_t}}\right), & z \le m, \\[2mm] R_t\!\left(\dfrac{z - m}{c_{R_t}}\right), & z \ge m, \end{cases}$$
where $z \in \mathbb{R}$, and $L_t$ and $R_t$ are the left and right reference functions of the membership function, respectively. The quantities $c_{L_t}$ and $c_{R_t}$ are the left and right spreads, respectively, and $m$ is the mode of the fuzzy number. The distance from the left end-point to the mode is $c_{L_t}$, and the distance from the mode to the right end-point is $c_{R_t}$. A special form of the $L_t$ and $R_t$ type fuzzy number is the triangular fuzzy number (TFN), whose membership function has the following form.

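Definition 5 ([48]). The TFN denoted by $(a, m, b)$ can be defined as:
$$\mu(z) = \begin{cases} \dfrac{z - a}{m - a}, & a \le z \le m, \\[2mm] \dfrac{b - z}{b - m}, & m \le z \le b, \\[2mm] 0, & \text{otherwise}. \end{cases}$$
Moreover, if $c_{L_t} = c_{R_t}$, then the TFN is known as a symmetric triangular fuzzy number (STFN). The fuzzy arithmetic operations used in this study are described as follows:
Definition 6 ([48]). Suppose $F_{n_1} = (a_1, m_1, b_1)$ and $F_{n_2} = (a_2, m_2, b_2)$ are two TFNs and $\gamma$ is a real scalar; addition, subtraction, and scalar multiplication of TFNs are defined as:
$$F_{n_1} + F_{n_2} = (a_1 + a_2,\; m_1 + m_2,\; b_1 + b_2),$$
$$F_{n_1} - F_{n_2} = (a_1 - b_2,\; m_1 - m_2,\; b_1 - a_2),$$
$$\gamma F_{n_1} = \begin{cases} (\gamma a_1, \gamma m_1, \gamma b_1), & \gamma \ge 0, \\ (\gamma b_1, \gamma m_1, \gamma a_1), & \gamma < 0. \end{cases}$$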
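The TFN operations of Definitions 5 and 6 can be illustrated with a short Python sketch. It assumes the standard end-point arithmetic for triangular fuzzy numbers; the class name `TFN` and its method names are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number (a, m, b) with a <= m <= b (illustrative helper)."""
    a: float  # left end-point
    m: float  # mode
    b: float  # right end-point

    def membership(self, z: float) -> float:
        """Triangular membership degree of z."""
        if z == self.m:
            return 1.0
        if self.a < z < self.m:
            return (z - self.a) / (self.m - self.a)
        if self.m < z < self.b:
            return (self.b - z) / (self.b - self.m)
        return 0.0

    def __add__(self, other: "TFN") -> "TFN":
        return TFN(self.a + other.a, self.m + other.m, self.b + other.b)

    def __sub__(self, other: "TFN") -> "TFN":
        # The lower end-point pairs with the other's upper end-point, and vice versa.
        return TFN(self.a - other.b, self.m - other.m, self.b - other.a)

    def scale(self, gamma: float) -> "TFN":
        # A negative scalar swaps the end-points.
        if gamma >= 0:
            return TFN(gamma * self.a, gamma * self.m, gamma * self.b)
        return TFN(gamma * self.b, gamma * self.m, gamma * self.a)

    def is_symmetric(self, tol: float = 1e-12) -> bool:
        """True when the left and right spreads coincide (an STFN)."""
        return abs((self.m - self.a) - (self.b - self.m)) <= tol
```

For instance, `TFN(1, 2, 3) - TFN(0, 1, 2)` gives `TFN(-1, 1, 3)`, which shows that subtraction widens the support rather than cancelling it.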

Decision Making Based on Hesitant Fuzzy Linear Regression Model
In this section, we define the concept of HFLRM from the statistical point of view on the basis of hesitant fuzzy information. However, before this, it is necessary to provide a brief review of existing regression models which are available in the literature.

Linear Regression Model
A multiple linear regression model with output variable $Y_i$ $(i = 1, 2, 3, \ldots, M)$ and input variables $X_1, \ldots, X_N$ is defined as
$$Y_i = A_0 + A_1 X_{i1} + A_2 X_{i2} + \cdots + A_N X_{iN} + \varepsilon_i,$$
where the parameters $A_0, A_1, \ldots, A_N$ are crisp numbers, and $\varepsilon_i$ is the random error of the model. In the linear regression model, the errors are assumed to be normally distributed with zero mean and constant variance.

Fuzzy Linear Regression Model
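Tanaka et al. [24] proposed the FLRM, which measures the relationship between variables when that relationship is vague and the regression residuals (the difference between observed and predicted values) are assumed to be due to the imprecise nature of the system. The FLRM is defined as:
$$\hat{Y}_i = A_0 X_{i0} + A_1 X_{i1} + \cdots + A_N X_{iN}, \quad i = 1, 2, \ldots, M,$$
where $X_{i0} = 1$, the fuzzy parameters $A_j = (\alpha_j, c_j)$ are STFNs, and $\alpha_j$ and $c_j$ represent the center and spread of the STFNs, respectively. The FLRM addresses the problem of determining fuzzy parameter estimates $A_j$ such that the membership value of $Y_i$ to its fuzzy estimate $\hat{Y}_i$ is at least $H$, where $H \in [0, 1)$, also known as a measure of the goodness-of-fit, is provided by the decision maker [49]. The objective of the FLRM is to minimize the uncertainty by minimizing the spreads of the fuzzy numbers. This problem leads to the following LP model [28]:
$$\min Z = \sum_{j=0}^{N} \left( c_j \sum_{i=1}^{M} |x_{ij}| \right)$$
subject to
$$\sum_{j=0}^{N} \alpha_j x_{ij} + |L^{-1}(H)| \sum_{j=0}^{N} c_j |x_{ij}| \ge y_i, \quad i = 1, 2, \ldots, M,$$
$$\sum_{j=0}^{N} \alpha_j x_{ij} - |L^{-1}(H)| \sum_{j=0}^{N} c_j |x_{ij}| \le y_i, \quad i = 1, 2, \ldots, M,$$
$$c_j \ge 0, \quad j = 0, 1, \ldots, N,$$
where $L$ is the membership function of a standardized fuzzy parameter [38]. For STFNs, $L(x) = \max(0, 1 - |x|)$, so that $|L^{-1}(H)| = 1 - H$.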
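To make the role of $H$ concrete, the following Python sketch evaluates a candidate set of STFN parameters rather than solving the LP itself. It assumes the triangular reference function $L(x) = \max(0, 1 - |x|)$, so that the fuzzy estimate for an observation is a symmetric triangle with center $\sum_j \alpha_j x_{ij}$ and spread $\sum_j c_j |x_{ij}|$. All function names are hypothetical.

```python
def fuzzy_estimate(alpha, c, x):
    """Center and spread of the fuzzy output for one observation x (x[0] = 1)."""
    center = sum(a * xj for a, xj in zip(alpha, x))
    spread = sum(cj * abs(xj) for cj, xj in zip(c, x))
    return center, spread


def membership_of_observation(alpha, c, x, y):
    """Membership degree of the observed y in its symmetric triangular estimate."""
    center, spread = fuzzy_estimate(alpha, c, x)
    if spread == 0:
        return 1.0 if y == center else 0.0
    return max(0.0, 1.0 - abs(y - center) / spread)


def satisfies_h_level(alpha, c, xs, ys, h):
    """True if every observation has membership at least h in its estimate."""
    return all(membership_of_observation(alpha, c, x, y) >= h
               for x, y in zip(xs, ys))


def total_spread(c, xs):
    """Spread objective: sum over j of c_j times the sum over i of |x_ij|."""
    return sum(cj * sum(abs(x[j]) for x in xs) for j, cj in enumerate(c))
```

A solver would search for the `alpha` and `c` that minimize `total_spread` while `satisfies_h_level` holds; raising `h` forces wider spreads.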
Peters [31] modified Tanaka's model, which could not handle bad data, by compensating for good and bad data (outliers) within the estimated intervals. Peters [31] introduced a new variable $\lambda$ (a membership degree that measures conformity to a set of good solutions) and used the arithmetic mean [50] as the aggregation operator. The model is defined as:
$$\max \bar{\lambda} = \frac{1}{M} \sum_{i=1}^{M} \lambda_i$$
subject to
$$(1 - \bar{\lambda})\, p_0 - \sum_{i=1}^{M} \sum_{j=0}^{N} c_j |x_{ij}| \ge -d_0,$$
$$(1 - \lambda_i)\, p_i + \sum_{j=0}^{N} \alpha_j x_{ij} + \sum_{j=0}^{N} c_j |x_{ij}| \ge y_i, \quad i = 1, 2, \ldots, M,$$
$$(1 - \lambda_i)\, p_i - \sum_{j=0}^{N} \alpha_j x_{ij} + \sum_{j=0}^{N} c_j |x_{ij}| \ge -y_i, \quad i = 1, 2, \ldots, M,$$
$$0 \le \lambda_i \le 1, \quad c_j \ge 0.$$
The width of the estimated interval depends on the selection of the parameters $d_0$, $p_0$, and $p_i$, which are chosen according to the nature of the problem. The parameter $p_i$ represents the width of the tolerance interval of the output variable. A permissive condition for spread minimization results in a wide interval, i.e., a large value of $p_0$ and a small value of $p_i$. On the other hand, a strict condition for minimizing the spread results in a small interval, i.e., a small $p_0$ and a large $p_i$. The parameter $d_0$ represents the desired value of the objective function. Since the purpose of the FLRM is to minimize the total spread, it is suggested that $d_0$ be set to 0 (Peters [31]).
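The effect of the tolerance widths can be sketched in Python for fixed parameters. The sketch assumes data constraints of the form $(1 - \lambda_i) p_i + \sum_j \alpha_j x_{ij} + \sum_j c_j |x_{ij}| \ge y_i$ and $(1 - \lambda_i) p_i - \sum_j \alpha_j x_{ij} + \sum_j c_j |x_{ij}| \ge -y_i$, together with a spread constraint $(1 - \bar{\lambda}) p_0 - \sum_i \sum_j c_j |x_{ij}| \ge -d_0$. It does not solve the LP, and the function names are hypothetical.

```python
def peters_lambdas(alpha, c, xs, ys, p):
    """Largest lambda_i in [0, 1] permitted by the two data constraints.

    Values are clipped to [0, 1]; if the excess is larger than p_i, no
    feasible lambda_i exists for these parameters and 0.0 is returned.
    """
    lambdas = []
    for x, y, p_i in zip(xs, ys, p):
        center = sum(a * xj for a, xj in zip(alpha, x))
        spread = sum(cj * abs(xj) for cj, xj in zip(c, x))
        # Both constraints combine into (1 - lambda_i) * p_i >= |y - center| - spread.
        excess = abs(y - center) - spread
        lambdas.append(min(1.0, max(0.0, 1.0 - excess / p_i)))
    return lambdas


def spread_constraint_holds(lambdas, c, xs, p0, d0=0.0):
    """Check (1 - mean lambda) * p0 - total spread >= -d0."""
    lam_bar = sum(lambdas) / len(lambdas)
    total = sum(cj * abs(xj) for x in xs for cj, xj in zip(c, x))
    return (1.0 - lam_bar) * p0 - total >= -d0
```

An observation lying inside its estimated interval receives a value of 1, while one lying outside receives a smaller value, which is how low values of `lambdas` flag potential outliers.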

Hesitant Fuzzy Linear Regression Model
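Motivated by Peters' model [31], we propose the concept of the HFLRM, which can be used further in solving decision-making problems. We take the output variable $Y_i$ $(i = 1, 2, \ldots, M)$ and the input variables $X_j$ $(j = 0, 1, 2, \ldots, N)$ as HFEs. The HFLRM is defined as:
$$\hat{Y}_i^k = A_0^k X_{i0}^k + A_1^k X_{i1}^k + \cdots + A_N^k X_{iN}^k, \quad k = 1, 2, \ldots, P,$$
where the parameters $A_j^k = (\alpha_j^k, c_j^k)$ are STFNs, which are estimated with the help of the following LP model:
$$\max \bar{\lambda}^k = \frac{1}{M} \sum_{i=1}^{M} \lambda_i^k$$
subject to
$$(1 - \bar{\lambda}^k)\, p_0 - \sum_{i=1}^{M} \sum_{j=0}^{N} c_j^k |x_{ij}^k| \ge -d_0,$$
$$(1 - \lambda_i^k)\, p_i + \sum_{j=0}^{N} \alpha_j^k x_{ij}^k + \sum_{j=0}^{N} c_j^k |x_{ij}^k| \ge y_i^k, \quad i = 1, 2, \ldots, M,$$
$$(1 - \lambda_i^k)\, p_i - \sum_{j=0}^{N} \alpha_j^k x_{ij}^k + \sum_{j=0}^{N} c_j^k |x_{ij}^k| \ge -y_i^k, \quad i = 1, 2, \ldots, M,$$
$$0 \le \lambda_i^k \le 1, \quad c_j^k \ge 0,$$
where $k$ indexes the values assigned by the $P$ DMs for the output variable $Y_i$ and the input variables $X_j$.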

Decision-Making Algorithm Based on HFLRM
Assume that $A = \{A_1, A_2, \ldots, A_M\}$ is a set of alternatives and $D = \{d_l, 1 \le l \le P\}$ is a set of DMs who provide their evaluations of the alternatives $A_i$ in the form of HFEs under the input variables $X_j$ $(j = 0, 1, 2, \ldots, N)$ and the output variable $Y_i$ $(i = 1, 2, \ldots, M)$.
Step 1. Construct the decision matrix $H$ from the HFEs provided by the DMs for the input and output variables.
Step 2. For two finite HFEs, $h_1$ and $h_2$, there are two opposite principles for normalization. The first one is α-normalization, in which we remove elements from whichever of $h_1$ and $h_2$ has more elements than the other. The second one is β-normalization, in which we add elements to whichever of $h_1$ and $h_2$ has fewer elements than the other. In this paper, we use the principle of β-normalization [51] to give all HFEs in the matrix $H$ the same number of elements. Let $\bar{H} = [\bar{Z}_{ij}]_{M \times (N+1)}$ be the normalized matrix, where $\bar{Z}_{ij} = \{z_{ij}^k, k = 1, 2, \ldots, P\}$ are HFEs.
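A minimal Python sketch of β-normalization follows. The text does not state which value is added to the shorter HFEs, so the sketch assumes the common optimistic rule (repeat the largest element), with a pessimistic option (repeat the smallest element); the function name `beta_normalize` is hypothetical, and every HFE is assumed to be non-empty.

```python
def beta_normalize(hfes, optimistic=True):
    """Pad every HFE to the length of the longest one.

    Assumption: the added value is the HFE's own maximum (optimistic)
    or minimum (pessimistic). Each result is returned in ascending order.
    """
    target = max(len(h) for h in hfes)
    normalized = []
    for h in hfes:
        values = sorted(h)
        filler = values[-1] if optimistic else values[0]
        padded = values + [filler] * (target - len(values))
        normalized.append(sorted(padded))
    return normalized
```

For example, with three DMs the HFE `[0.3, 0.6]` becomes `[0.3, 0.6, 0.6]` under the optimistic rule and `[0.3, 0.3, 0.6]` under the pessimistic one.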
Step 3. Normalize the matrix $\bar{H}$ again so that all of its elements lie between 0 and 1 on a common scale, which gives the normalized decision matrix $\hat{H} = [\hat{Z}_{ij}]_{M \times (N+1)}$, where $\hat{Z}_{ij} = \{\hat{z}_{ij}^k, k = 1, 2, \ldots, P\}$ are HFEs.
Step 4. Estimate the parameters with the help of the linear programming model and obtain the HFLRM using the normalized decision matrix $\hat{H}$.
Step 5. Calculate the residual values $e_i$ from the score values of $Y_i$ and $\hat{Y}_i$, where $\hat{Y}_i$ are the predicted values, which are calculated by using Definitions 2, 3 and 6.
Step 6. Finally, rank the alternatives according to the values of $e_i$ $(i = 1, 2, \ldots, M)$. The alternative with the smallest residual is identified as the best choice.
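Steps 5 and 6 can be sketched in Python under two loudly stated assumptions: the score of an HFE is taken to be the arithmetic mean of its elements, and the residual is taken to be the absolute difference between the scores of the observed and predicted outputs. Neither formula is given explicitly in the text above, and the function names are hypothetical.

```python
def hfe_score(h):
    """Assumed score function: arithmetic mean of the HFE's elements."""
    return sum(h) / len(h)


def rank_by_residual(observed, predicted):
    """Return (residuals, ranks); rank 1 goes to the smallest residual.

    Assumption: residual = |score(observed) - score(predicted)|.
    """
    residuals = [abs(hfe_score(y) - hfe_score(y_hat))
                 for y, y_hat in zip(observed, predicted)]
    order = sorted(range(len(residuals)), key=lambda i: residuals[i])
    ranks = [0] * len(residuals)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return residuals, ranks
```

Ties are broken by the original order of the alternatives, which is a choice of this sketch rather than something prescribed by the algorithm.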
A multiple linear regression model (Section 3.1) is a very effective and reliable technique for determining the effect of one or more input variables on an output variable. It is the most extensively used statistical technique and has a wide variety of practical applications. It is based on precise data and a precise relationship between the output variables and input variables. However, despite the widespread use of the model (Section 3.1) in everyday activities, there exists uncertainty in variables. In real life, there are several situations in which data are not provided as a precise quantity but rather as incomplete, ambiguous, linguistically imperfect, and imprecise. The FLRM (Section 3.2) was introduced to deal with such uncertainty and ambiguity. Recently, many researchers have presented statistical regression analysis in the framework of fuzzy set theory. The HFS is an extension of fuzzy set theory that has drawn the attention of many researchers in a short period because we can observe hesitation in a variety of real-world scenarios, and this novel technique helps us deal with the ambiguity caused by hesitation. This is why we have extended the idea of FLRM (Section 3.2) to HFLRM (Section 3.3) where input-output variables are observed as HFEs, which is a basic form of HFS.

The TOPSIS Method under Hesitant Environment
Hwang and Yoon [18] developed the MCDM technique TOPSIS, which is based on the idea that the selected alternative should have the shortest distance from the positive ideal solution and the farthest distance from the negative ideal solution among all available alternatives [52]. When the criteria values are HFEs, the TOPSIS method is formulated as follows:
Step 1. Take the decision matrices $H$ and $\bar{H}$ as in Steps 1 and 2 of Section 3.4.
Step 2. Normalize the decision matrix $\bar{H}$ to obtain the normalized elements $\hat{Z}_{ij}$.
Step 3. Calculate the weighted normalized decision matrix by multiplying the normalized decision matrix by its associated weights, i.e., $V_{ij} = \hat{Z}_{ij} \times W_j$.
Step 4. Determine the positive ideal solution $A^+$ and the negative ideal solution $A^-$, whose components for each criterion $j$ are
$$V_j^+ = \begin{cases} \max_i V_{ij}, & j \in J_b, \\ \min_i V_{ij}, & j \in J_c, \end{cases} \qquad V_j^- = \begin{cases} \min_i V_{ij}, & j \in J_b, \\ \max_i V_{ij}, & j \in J_c, \end{cases}$$
where $J_b$ and $J_c$ represent the sets of benefit and cost criteria, respectively.
Step 5. Calculate the Euclidean distances $D_i^+$ and $D_i^-$ of each alternative $A_i$ from the positive ideal solution $A^+$ and the negative ideal solution $A^-$, respectively.
Step 6. Calculate the relative closeness $P_i$ of each alternative to the ideal solution, where
$$P_i = \frac{D_i^-}{D_i^+ + D_i^-}, \quad i = 1, 2, \ldots, M.$$
Step 7. Rank the alternatives $A_i$ $(i = 1, 2, \ldots, M)$ according to the relative closeness values $P_i$ in descending order.
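The TOPSIS steps can be sketched in Python for the simplified case of crisp criteria values, for example after each HFE has been reduced to a single score. The vector normalization used here is the classical TOPSIS choice and is an assumption, since the normalization formula for the hesitant case is not reproduced above; the function name `topsis` is hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Relative closeness of each alternative (rows) over crisp criteria (columns).

    Assumptions: vector normalization per column, Euclidean distances, no
    all-zero column, and at least two alternatives that differ.
    benefit[j] is True for a benefit criterion and False for a cost criterion.
    """
    n_crit = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    pis = [max(r[j] for r in v) if benefit[j] else min(r[j] for r in v)
           for j in range(n_crit)]
    nis = [min(r[j] for r in v) if benefit[j] else max(r[j] for r in v)
           for j in range(n_crit)]
    closeness = []
    for r in v:
        d_plus = math.sqrt(sum((r[j] - pis[j]) ** 2 for j in range(n_crit)))
        d_minus = math.sqrt(sum((r[j] - nis[j]) ** 2 for j in range(n_crit)))
        closeness.append(d_minus / (d_plus + d_minus))
    return closeness
```

Sorting the alternatives by the returned values in descending order gives the TOPSIS ranking of Step 7.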

Spearman's Rank Correlation Coefficient
Spearman's rank correlation is a method for analyzing the relationship between variables measured at the ordinal level. It is high when observations have a similar rank in the two sets of values and low when they have a different rank. Spearman's rank correlation coefficient, $r_s$, is defined as follows:
$$r_s = 1 - \frac{6 \sum_{i=1}^{M} d_i^2}{M(M^2 - 1)},$$
where $d_i = R_{i1} - R_{i2}$ is the ranking difference, while $R_{i1}$ and $R_{i2}$ indicate the two sets of ranking. The rank correlation coefficient ranges from $+1$ to $-1$. The value $r_s = \pm 1$ indicates a perfect positive ($r_s = +1$) or perfect negative ($r_s = -1$) relationship between the two sets of ranking.
We often want to know whether or not a significant relationship exists between two sets of ranking. Therefore, we state the null hypothesis ($H_0$) and alternative hypothesis ($H_1$) as:
$H_0$: There is no significant relationship between the two sets of ranking.
$H_1$: There is a significant relationship between the two sets of ranking.
The null hypothesis is evaluated using the following test statistic, provided that the sample size is not too small, i.e., $M > 10$:
$$Z_c = r_s \sqrt{M - 1}.$$
If the Z c statistic value exceeds the critical value Z α (usually, α = 0.05), then the null hypothesis is rejected, and we conclude that there is a significant relationship between the two sets of ranking.
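The coefficient and the test can be sketched in Python as follows. The sketch assumes rankings without ties and uses the one-sided critical value 1.645 for α = 0.05; the function name `spearman_rank_test` is hypothetical.

```python
import math


def spearman_rank_test(rank_a, rank_b, z_critical=1.645):
    """Return (r_s, z_c, reject_h0) for two rankings of the same M items.

    Assumes no tied ranks; z_c = r_s * sqrt(M - 1) is the large-sample statistic.
    """
    m = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    r_s = 1.0 - 6.0 * d_squared / (m * (m ** 2 - 1))
    z_c = r_s * math.sqrt(m - 1)
    return r_s, z_c, z_c > z_critical
```

With $M = 20$ and $\sum d_i^2 = 30$, the formula gives $r_s = 1 - 180/7980 \approx 0.977$.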

An Application Example
Revenue is essential for nearly every type of organization. Any organization must generate revenue in order to cover its gross and net operating costs. The owner of a well-known business chain wants to determine which outlet made the most revenue throughout the holy month of Ramadan. The revenue generated by a store is determined by the sale of goods ($X_1$), production expenditures ($X_2$), operational costs ($X_3$), and the profit margin ($Y$). In this study, 20 store outlets $A_i$ $(i = 1, 2, \ldots, 20)$ are the alternatives. These alternatives are evaluated by the output variable $Y_i$ $(i = 1, 2, \ldots, 20)$ and the input variables $X_{j+1}$ $(j = 0, 1, 2)$. Three experts/DMs from senior management have made their judgements on the input and output variables. The solution to the given problem comprises the following steps:
Step 1. The input–output decision matrix $H$ provided by the DMs using HFEs is shown in Table 1.
Step 2. To give all HFEs in the decision matrix $H$ the same number of elements, we use the principle of β-normalization and obtain the matrix $\bar{H}$, which can be seen in Table 2.
Step 3. We further normalize the data of matrix $\bar{H}$ so that all of its elements lie between 0 and 1 on a common scale. The normalized decision matrix $\hat{H}$ is shown in Table 3.
Step 4. Now, we estimate the parameters using the LP model with $d_0 = 0$, $p_0 = 1000$ and $p_i = 1$. After solving the LP model for $k = 1$, we get the values of $\lambda_i^1$ $(i = 1, 2, \ldots, 20)$, $\alpha_j^1$ $(j = 1, 2, 3, 4)$, and $c_j^1$ $(j = 1, 2, 3, 4)$, which are shown in Table 4. In the same way, we obtain the results for $k = 2$ and $k = 3$, which are given in the same table.

Results and Discussion
To check the validity and feasibility of the proposed approach, the TOPSIS method is applied to the same problem, and its results are compared with those of the proposed approach. Among the four criteria, we take the sale of goods ($X_1$) and the profit margin ($Y$) as benefit criteria, while production expenditures ($X_2$) and operational costs ($X_3$) are considered cost criteria. After normalizing the matrix $\bar{H}$ according to Step 2 of the TOPSIS algorithm, we determine the PIS ($A^+$) and NIS ($A^-$). We then calculate the Euclidean distances $D_i^+$ and $D_i^-$ of each alternative $A_i$ from $A^+$ and $A^-$, along with its relative closeness $P_i$ to the ideal solution, by using Step 5 and Step 6 of Section 4. The values of $D_i^+$, $D_i^-$, $P_i$ and the ranking of alternatives ($R_{Topsis}$) can be seen in Table 6. In Table 6, alternative 15, with the largest value of $P_i$, is the best alternative, while alternative 12, with the smallest value of $P_i$, generates the lowest revenue among the stores. Additionally, we have compared the two sets of ranking, $R_{HFLR}$ and $R_{Topsis}$, through the bar chart given in Figure 1. Figure 1 gives a visual comparison of the rankings of the alternatives obtained by the HFLRM and TOPSIS methods. Outlet 9 ranks first for revenue generation during the holy month of Ramadan under the HFLRM, and it is the third best earning outlet according to TOPSIS. Similarly, store 15 generates the second highest revenue under the HFLRM and the highest revenue under TOPSIS. Likewise, all other outlets have the same or a very similar ranking under both the HFLRM and TOPSIS.
Although the graphical representation provides a quick summary of the agreement between the two ranking sets $R_{HFLR}$ and $R_{Topsis}$, it is not conclusive. As a result, the Spearman rank correlation coefficient is calculated to determine the statistical significance of the relationship between the two sets of ranking, as shown in Table 7. The Spearman rank correlation coefficient is calculated as $r_s = 1 - \frac{6(30)}{20(20^2 - 1)} = 1 - \frac{180}{7980} \approx 0.977$, i.e., $r_s \approx 0.98$. This value is close to $+1$ (see Table 8), indicating that there is a very strong positive correlation between the two sets of ranking, $R_{HFLR}$ and $R_{Topsis}$. In order to evaluate whether the result is meaningful or merely down to chance, we tested the null hypothesis that there is no significant relationship between the two sets of ranking against the alternative that there is a significant relationship, at a 5% level of significance. The value of the test statistic, $Z_c = r_s \sqrt{M - 1} = 0.977 \sqrt{20 - 1} \approx 4.26$, exceeds the critical value $Z_{0.05} = 1.645$ (derived from the statistical table of the cumulative normal distribution) and therefore falls within the critical region, so the null hypothesis is rejected. We conclude that there is a very strong positive correlation between the two sets of ranking. In addition, we determined the similarity of the two final rankings using the weighted Spearman coefficient $r_w$ and the WS similarity coefficient, which are described more extensively in [53,54]. The value of the weighted Spearman coefficient was 0.9781, and the value of the WS coefficient was 0.9258. Thus, both coefficients indicate a very strong relationship between the two final rankings. In addition, the proposed approach has the following advantages over the TOPSIS method:
1. The HFLRM can identify outliers (through the values of $\lambda_i$) that may be included in the data set; if these are not identified, the solution may be inaccurate. However, the data presented in the application example of this paper contain no outliers.
2. The HFLRM obtains the ranking for the decision-making problem by solving a simple LP model, which requires less computational time than TOPSIS.
3. In comparison with TOPSIS, the complexity of the proposed methodology does not increase when more criteria and alternatives are added to the given MCDM problem.

Conclusions
This paper provides a multi-criteria decision-making approach based on fuzzy linear regression models that incorporates hesitant information. This concept has not been explored previously and is a novel alternative to statistical regression in resolving MCDM challenges. We have applied the proposed methodology to choose the store outlet that generated the most revenue in a certain month. We have evaluated 20 alternative store outlets nationwide in the context of four criteria that have a major impact on the revenue generation of a chain of stores. Similarly, we may include more criteria and alternatives, but computing becomes more complicated as the number of alternatives or criteria examined increases. Finally, the outcomes of the suggested methodology are compared to those of a widely used decision-making technique, TOPSIS. In the future, we will further investigate the applications of the HFLRM in decision making with hesitant fuzzy linguistic term sets and probabilistic hesitant fuzzy linguistic sets.

Acknowledgments:
The authors would like to thank the editor and the anonymous reviewers, whose insightful comments and constructive suggestions helped us to significantly improve the quality of this paper.

Conflicts of Interest:
The authors declare no conflict of interest.
