Making Group Decisions within the Framework of a Probabilistic Hesitant Fuzzy Linear Regression Model

A fuzzy set extension known as the hesitant fuzzy set (HFS) has grown in popularity for decision making in recent years, especially when experts working in a fuzzy environment have trouble evaluating several alternatives with a single assessment value. However, its use suffers from a significant problem, namely considerable data loss. The probabilistic hesitant fuzzy set (PHFS) has been proposed to improve the HFS. It attaches probability values to the HFS and can therefore retain more information than the HFS. Previously, fuzzy regression models such as the fuzzy linear regression model (FLRM) and the hesitant fuzzy linear regression model were used for decision making; however, these models do not provide information about the distribution. To address this issue, we propose a probabilistic hesitant fuzzy linear regression model (PHFLRM) that incorporates distribution information to account for multi-criteria decision-making (MCDM) problems. The PHFLRM observes the input–output (IPOP) variables as probabilistic hesitant fuzzy elements (PHFEs) and uses a linear programming model (LPM) to estimate the parameters. A case study is used to illustrate the proposed methodology. Additionally, the PHFLRM findings are compared with those obtained using an MCDM technique called the technique for order preference by similarity to ideal solution (TOPSIS). Lastly, Spearman's rank correlation test is used to assess the statistical significance of the relationship between the two sets of rankings.


Introduction
Statistical regression analysis is a valuable tool for determining the functional relationship between an output variable (the dependent variable) and the input variables (the independent variables). In statistical regression analysis, the relationship between IPOP variables is determined using precise data and precise relationships. However, when a phenomenon is imprecise, when there is vague variability rather than stochastic variability, or when the distributional assumptions of the underlying regression model are violated or cannot be tested (e.g., due to a small sample size), it is more reasonable to assume a fuzzy relationship rather than a crisp one. Several researchers have modified and extended notions of statistical regression analysis to overcome these limitations using fuzzy set theory (FST). Tanaka et al. [1] first introduced fuzzy regression analysis employing an LPM. Further, Tanaka [2] introduced fuzzy intervals, and Celmin [3] and Diamond [4] introduced fuzzy least-squares models. Because Tanaka's model was very sensitive to outliers, Peters [5] generalized Tanaka's approach [1] so that output values no longer simply fall within or outside the interval but belong to it with a certain degree of membership. Wang and Tsaur [6] presented a variable selection approach for an FLRM with crisp input and fuzzy output, and they applied the suggested method to evaluate the strategic positions of energy transmission and distribution networks.
A single criterion is not enough in real-world decision-making problems, as they are often poorly structured and highly complex. Multi-criteria decision-making (MCDM) methods address such complex problems and help to make the right decision. Finding the best alternative among multiple alternatives is a challenging task. In decision-making problems, several techniques are used to assist decision makers (DMs) in ranking the alternatives, such as the Analytic Hierarchy Process [27], the Best Worst Method [28], EDAS (Evaluation based on Distance from Average Solution) [29], and TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) [30]. The TOPSIS method is a well-known technique that considers the distances to both the Positive Ideal Solution (PIS) and the Negative Ideal Solution (NIS) simultaneously, and assigns a preference order based on the relative closeness obtained by combining these two distance measures. Recently, many papers have been devoted to developing new approaches, e.g., a new logarithm methodology of additive weights [31], FUCOM [32], COMET extensions [33,34], the WASPAS method [35], SPOTIS [36,37], RAFSI [38], and an integrated SWOT-fuzzy PIPRECIA [39]. These methods are valuable and address the main challenges of MCDA techniques, such as resistance to the rank reversal paradox or the handling of uncertainty. Sometimes, authors propose new operators to support decision making [40][41][42].
The literature review shows how the area of regression analysis has gradually developed and how researchers' interest has continued to grow over time. Most of the researchers' attention has focused on the FLRM, a simple linear regression model developed using FST. Still, several extensions of the FST can be employed in the FLRM for complex problems. The PHFS works in a hesitant environment, so that a researcher not only collects information in an HFS but also finds the probability value of each HFE; the resulting elements are referred to as PHFEs. Motivated by the PHFS, the fuzzy regression model developed by Peters [5] is extended here using probabilistic information in a hesitant environment. The resulting model is called the PHFLRM, and its IPOP variables are observed as PHFEs. We introduce the concept of the PHFLRM such that the model's coefficients are symmetrical triangular fuzzy numbers (STFNs). Consequently, the PHFLRM incorporates these PHFEs into the fuzzy regression analysis and uses the LPM to estimate the PHFLRM parameters. Furthermore, alternatives are ranked according to the residual values of the proposed PHFLRM. The proposed approach is evaluated by comparing the results of the PHFLRM to those of TOPSIS, which is the most popular MCDM technique. Previously, fuzzy regression models such as the FLRM [43] and the HFLRM were used for decision making; however, these models do not give distribution information. The novelty of our proposed model PHFLRM [11] is that it incorporates distribution information to account for multi-criteria decision-making (MCDM) problems.
This study is organized as follows: In Section 2, some basic definitions and terminologies are discussed. In Section 3, we establish the idea of the PHFLRM. Section 4 includes an algorithm for the proposed PHFLRM approach. Section 5 presents an application example of the proposed approach, together with a comparative study of the PHFLRM and the TOPSIS method. This study concludes in Section 6.

Preliminaries
This section discusses basic definitions and terminologies to help readers understand the proposed approach. It is generally difficult to reach a final conclusion, because people are usually hesitant when making decisions. In consideration of this problem, Torra [17] developed the following definition of the HFS.
Definition 1 ([17]). For a fixed set $Z$, a HFS on $Z$ is a function that, when applied to $Z$, returns a subset of values that fall within the interval $[0, 1]$. Mathematically, it is defined as
$$E = \{\langle z, h_E(z)\rangle \mid z \in Z\},$$
where $h_E(z)$ denotes the possible hesitant membership degrees of $z \in Z$ to the set $E$, and it is called the hesitant fuzzy element (HFE).
The PHFS proposed by Zhu and Xu [18] is an enhanced form of the HFS that not only addresses the situation in which decision makers are uncertain as to which of several assessment values best represents their perspective, but also assigns varying probabilities to the assessed values.
Definition 2 ([18]). Let $Z$ be a reference set; then a PHFS on $Z$ is defined as
$$E_p = \{\langle z, h_z(\gamma_l \mid p_l)\rangle \mid z \in Z\},$$
where $h_z(\gamma_l \mid p_l)$ denotes the probabilistic degrees of membership of the element $z \in Z$ to the set $E_p$. These are referred to as PHFEs, which can take several membership degrees $\gamma_l$ ($l = 1, 2, \ldots, \#h_z(p)$) with the probabilities $p_l$ ($l = 1, 2, \ldots, \#h_z(p)$) such that $\sum_{l=1}^{\#h_z(p)} p_l \le 1$. For the sake of convenience, we write $h_z(\gamma_l \mid p_l)$ as $h_z(p)$, i.e., $h_z(p) = h_z(\gamma_l \mid p_l)$.
Sometimes, the probabilistic information for a PHFE is incomplete; in this situation, an estimate for the incomplete probabilistic information is used by averaging the available data.
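The completion of incomplete probabilistic information can be illustrated with a minimal sketch. It assumes that a PHFE is stored as a list of (membership, probability) pairs and that the estimate is obtained by rescaling the available probabilities proportionally so that they sum to one; this rescaling rule, the function name `complete_probabilities`, and the sample values are illustrative assumptions rather than the exact procedure of the paper.

```python
# A PHFE is modelled here as a list of (membership degree, probability) pairs.
# Assumption: incomplete probabilities are completed by proportional rescaling.

def complete_probabilities(phfe):
    """Return a copy of the PHFE whose probabilities sum to one."""
    total = sum(p for _, p in phfe)
    if total <= 0:
        raise ValueError("at least one probability must be positive")
    return [(gamma, p / total) for gamma, p in phfe]


# Hypothetical PHFE whose probabilities only sum to 0.8.
h = [(0.3, 0.2), (0.5, 0.4), (0.7, 0.2)]
h_complete = complete_probabilities(h)

for gamma, p in h_complete:
    print(f"membership {gamma:.1f} with probability {p:.3f}")
```

The membership degrees are left untouched; only the probabilities are rescaled, so the relative weight of each assessed value is preserved.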

Probabilistic Hesitant Fuzzy Linear Regression Model
In this section, we discuss our proposed PHFLRM methodology from a statistical perspective using hesitant fuzzy information.
The FLRM was first introduced by Tanaka et al. [1]. It is defined as
$$Y_i = A_0 x_{i0} + A_1 x_{i1} + \cdots + A_N x_{iN} = \sum_{j=0}^{N} A_j x_{ij},$$
where the parameters $A_j = (\alpha_j, c_j)$ are symmetrical TFNs, $\alpha_j$ is the centre, and $c_j$ is the spread of the symmetrical TFNs. The FLRM minimizes the total spread of the symmetrical TFNs [44], that is, an objective of the form $\sum_{j=0}^{N} c_j \sum_{i=1}^{M} x_{ij}$, subject to constraints expressed through the membership function $F$ of a standardized fuzzy parameter [43], together with $x_{i0} = 1$ and $c_j \ge 0$.
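To make the role of the centres and spreads concrete, the following minimal sketch evaluates a fuzzy linear model with symmetrical TFN coefficients at one crisp input vector. It relies on the standard fact that a linear combination of symmetrical TFNs $(\alpha_j, c_j)$ with crisp weights $x_{ij}$ is again a symmetrical TFN with centre $\sum_j \alpha_j x_{ij}$ and spread $\sum_j c_j |x_{ij}|$. The function names and the numerical values are hypothetical.

```python
def flrm_predict(x, centres, spreads):
    """Return (centre, spread) of the symmetrical TFN sum_j A_j * x_j."""
    if not (len(x) == len(centres) == len(spreads)):
        raise ValueError("x, centres and spreads must have the same length")
    centre = sum(a * xj for a, xj in zip(centres, x))
    spread = sum(c * abs(xj) for c, xj in zip(spreads, x))
    return centre, spread


def membership(y, centre, spread):
    """Triangular membership degree of a crisp value y in the TFN (centre, spread)."""
    if spread == 0:
        return 1.0 if y == centre else 0.0
    return max(0.0, 1.0 - abs(y - centre) / spread)


# Hypothetical observation; x[0] = 1 plays the role of the intercept term x_i0.
x_i = [1.0, 2.0, 0.5]
alpha = [0.4, 1.0, -2.0]   # centres of A_0, A_1, A_2
c = [0.1, 0.2, 0.4]        # spreads of A_0, A_1, A_2

centre, spread = flrm_predict(x_i, alpha, c)
print(centre, spread, membership(1.75, centre, spread))
```

A larger total spread makes the estimated interval cover more observations but also makes the model less informative, which is why the linear program minimizes it.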
Peters [5] modified Tanaka's model [1] by introducing a new variable $\lambda$, where $\lambda_i$ ($0 \le \lambda_i \le 1$) represents the degree of membership in the set of good solutions. The parameters $d_0$, $p_0$, and $p_i$ are selected to determine the width of the estimated interval. If a wide interval (a high $p_0$ and a small $p_i$) is allowed when minimizing the spread, the requirement is regarded as lenient, while a narrow interval (a small $p_0$ and a high $p_i$) is taken as a strict condition. The value of $d_0$, a desired value of the objective function, is taken as 0 [5].
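For orientation, one commonly quoted form of Peters' linear program is sketched below. It is a reconstruction under the assumption that the soft constraints take the usual tolerance form with tolerances $p_0$ and $p_i$; the exact notation and indexing should be read against [5].

```latex
\begin{aligned}
\max\quad & \bar{\lambda} = \frac{1}{M}\sum_{i=1}^{M}\lambda_i \\
\text{s.t.}\quad
& (1-\bar{\lambda})\,p_0 - \sum_{i=1}^{M}\sum_{j=0}^{N} c_j x_{ij} \ge -d_0, \\
& (1-\lambda_i)\,p_i + \sum_{j=0}^{N}\alpha_j x_{ij} + \sum_{j=0}^{N} c_j x_{ij} \ge y_i,
  \qquad i = 1,\dots,M, \\
& (1-\lambda_i)\,p_i - \sum_{j=0}^{N}\alpha_j x_{ij} + \sum_{j=0}^{N} c_j x_{ij} \ge -y_i,
  \qquad i = 1,\dots,M, \\
& 0 \le \lambda_i \le 1, \qquad c_j \ge 0 .
\end{aligned}
```

In this form, the first constraint softens the goal of keeping the total spread below $d_0$, while the second and third soften the requirement that each observed $y_i$ lie inside the estimated interval; a larger $\lambda_i$ means that the corresponding tolerance is used less.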
Motivated by Peters [5], we introduce the PHFLRM for solving decision-making problems. In analogy with the FLRM, it is defined as
$$Y_i^{*} = \gamma_0 X_{i0} + \gamma_1 X_{i1} + \cdots + \gamma_N X_{iN},$$
where the parameters $\gamma_j = (\alpha_j^k, c_j^k)$, $0 \le j \le N$, are symmetrical TFNs and $k$ denotes the number of values assigned by the $P$ DMs to the IPOP variables. The PHFLRM parameters are estimated using an LPM.

Decision-Making Algorithms
In this section, we will describe the algorithms that are used to solve the PHFLRM and the TOPSIS method, respectively, in detail.

Algorithm for PHFLRM
Assume $A = \{A_1, A_2, \ldots, A_M\}$ is a set of alternatives and $D = \{d_l \mid 1 \le l \le P\}$ is a set of DMs who provide their evaluations of the alternatives $A_i$, in the form of PHFEs, under some input variables $X_j$ ($j = 0, 1, 2, \ldots, N$) and the output variable $Y_i$ ($i = 1, 2, \ldots, M$). Figure 1 shows the flowchart of the proposed algorithm, and the detailed steps of this algorithm are given below.
Step 1. Collect the evaluations provided by the DMs in a probabilistic hesitant fuzzy decision matrix $H$.
Step 2. For two finite PHFEs $h_1$ and $h_2$, there are two opposite principles for normalization. The first one is α-normalization, in which elements are removed from whichever of $h_1$ and $h_2$ has more elements than the other. The second one is β-normalization, in which elements are added to whichever of $h_1$ and $h_2$ has fewer elements than the other. In this study, we use the principle of β-normalization to give all PHFEs in the matrix $H$ the same number of elements. Let $\bar{H} = [\bar{Z}_{ij}]_{M \times (N+1)}$ be the normalized matrix, where $\bar{Z}_{ij} = \{z_{ij}^{k}(p_{z^{k}}),\ k = 1, 2, \ldots, S\}$ are PHFEs.
Step 4. Normalize the matrix obtained in the previous step once more to obtain the normalized decision matrix $\hat{H}$, whose entries (indexed up to $P$) are PHFEs.
Step 5. By using the normalized decision matrix $\hat{H}$, the PHFLRM is obtained. We further estimate the parameters of the PHFLRM employing the LPM.
Step 6. Compute the residual values from the score values of the observed and predicted outputs,
$$e_i = Sc(Y_i) - Sc(Y_i^{*}), \quad i = 1, 2, \ldots, M,$$
where $Y_i^{*}$ are the predicted values, which are calculated by using Definitions 3 and 4.
Step 7. Finally, the alternatives are ranked according to the values of e i (i = 1, 2, . . . , M).
The alternative with the least residual is identified as the best choice.
The HFS, an extension of FST, has attracted the attention of many researchers in a short period, as hesitant situations are very common in real-world problems. Numerous extensions have been introduced to address the uncertainty caused by hesitation; the PHFS is one of them. The PHFS captures not only the hesitancy of decision makers when they are undecided about something, but also the distribution of the hesitant information. In the PHFLRM (3), the IPOP variables are observed as PHFEs instead of HFEs, which are the basic form underlying the PHFS.
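The β-normalization used in Step 2 can be sketched as follows. This is a minimal illustration, assuming that a PHFE is stored as a list of (membership, probability) pairs and that the shorter PHFE is padded by repeating its largest membership degree with probability zero. The padding rule and the function name `beta_normalize` are assumptions made for the example; other conventions (for instance, padding with the smallest value) are equally possible.

```python
def beta_normalize(h1, h2):
    """Pad the shorter of two PHFEs so that both have the same number of elements.

    Assumption: the added elements repeat the largest membership degree of the
    shorter PHFE and carry probability 0, so the probability mass is unchanged.
    """
    def pad(h, target):
        largest = max(gamma for gamma, _ in h)
        return sorted(h) + [(largest, 0.0)] * (target - len(h))

    target = max(len(h1), len(h2))
    return pad(h1, target), pad(h2, target)


# Hypothetical PHFEs with two and three elements, respectively.
h1 = [(0.2, 0.5), (0.6, 0.5)]
h2 = [(0.1, 0.3), (0.4, 0.3), (0.9, 0.4)]

n1, n2 = beta_normalize(h1, h2)
print(n1)
print(n2)
```

Because the padded elements have zero probability, quantities that weight memberships by their probabilities are not affected by the padding under this assumption.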

The TOPSIS Algorithm
TOPSIS, an MCDM methodology developed by Hwang and Yoon [30], selects among all possible alternatives the one with the shortest distance from the positive ideal solution (PIS) and the longest distance from the negative ideal solution (NIS). The mathematical formulation of the TOPSIS method when the criteria values are PHFEs is as follows:
Step 1. Take the decision matrix $H$ and its two normalized forms as obtained in Steps 1, 2, and 3 of Section 4.1.
Step 2. Normalize the resulting decision matrix.
Step 3. Calculate the weighted normalized decision matrix by multiplying the normalized decision matrix by its associated weights.
Step 4. Determine the positive ideal solution $A^{+}$ and the negative ideal solution $A^{-}$ by taking, for each criterion, the best value over all alternatives for $A^{+}$ and the worst value for $A^{-}$, where $J_b$ and $J_c$ represent the sets of benefit and cost criteria, respectively.
Step 5. Calculate the Euclidean distances $D_i^{+}$ and $D_i^{-}$ of each alternative $A_i$ from the positive ideal solution $A^{+}$ and the negative ideal solution $A^{-}$, respectively, by using Definition 6.
Step 6. Calculate the relative closeness $P_i$ of each alternative to the ideal solution as
$$P_i = \frac{D_i^{-}}{D_i^{+} + D_i^{-}}, \quad i = 1, 2, \ldots, M.$$
Step 7. Rank the alternatives $A_i$ ($i = 1, 2, \ldots, M$) according to their relative closeness values $P_i$ in descending order.
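Steps 6 and 7 can be sketched in a few lines once the distances are known. The example below assumes the standard TOPSIS closeness coefficient $D_i^{-} / (D_i^{+} + D_i^{-})$ and takes the distances as given inputs, since the distance measure of Definition 6 is not reproduced here; the function names and the distance values are hypothetical.

```python
def relative_closeness(d_plus, d_minus):
    """Closeness of each alternative to the ideal solution.

    Assumes d_plus[i] + d_minus[i] > 0 for every alternative i.
    """
    return [dm / (dp + dm) for dp, dm in zip(d_plus, d_minus)]


def rank_descending(values):
    """Return ranks (1 = best) so that the largest value receives rank 1."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Hypothetical distances of three alternatives from the PIS and the NIS.
d_plus = [0.2, 0.5, 0.1]
d_minus = [0.6, 0.5, 0.4]

closeness = relative_closeness(d_plus, d_minus)
ranks = rank_descending(closeness)
print(closeness, ranks)
```

An alternative that is close to the PIS and far from the NIS has a closeness value near one, which is why the ranking is taken in descending order.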

Application Example
Wheat is the most important rabi crop in Pakistan, and it is also the country's staple food. Wheat production is one of the most pressing concerns confronting the agricultural industry today, and it is expected to continue to grow. Various factors such as farm size, seed quality, fertilizer price, irrigation area, and rain amount contribute to the yield of wheat. In this example, a simultaneous analysis including multiple variables is performed for efficient decision making. We consider rain amount ($X_1$), farm size ($X_2$), and irrigation area ($X_3$) in order to determine their effect on wheat yield ($Y$). Twelve districts $A_i$ ($i = 1, 2, \ldots, 12$) of Punjab (Pakistan) are selected as the alternatives. These alternatives are evaluated using $X_{j+1}$ ($j = 0, 1, 2$) and $Y_i$ ($i = 1, 2, \ldots, 12$) as input and output variables, respectively. The IPOP variables have been evaluated by three agriculture department experts. The steps necessary to resolve this problem are listed below.
Step 1. Table 1 shows the decision matrix of the IPOP variables provided by the DMs in the form of PHFEs.
Step 2 & 3. We obtain the matrix shown in Table 2 by first applying β-normalization to the decision matrix $H$, so that all PHFEs have the same number of elements, and then rescaling the probabilities in the resulting matrix so that they sum to one for every PHFE.
Step 6 & 7. By using the PHFLRM, we find the estimated PHFEs ($Y^{*}$) of all possible alternatives. For brevity, only the estimated PHFE $Y_1^{*}$ for the alternative $A_1$ is computed explicitly using Definitions 3 and 4; the results are given in Table 4. Further, the residual values $e_i$ for each alternative $A_i$ are calculated as $e_i = Sc(Y_i) - Sc(Y_i^{*})$, $i = 1, 2, \ldots, 12$, and finally, all alternatives are ranked using these residual values $e_i$ in Table 4. The smallest residual, $e_3 = 30.04827$, belongs to the alternative $A_3$, so it is considered the best choice. The alternative $A_{11}$ has the largest residual, $e_{11} = 47.9838$, and is considered the worst alternative.
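The residual-based ranking of Steps 6 and 7 can be sketched as follows. The sketch assumes that the score of a PHFE is its probability-weighted mean membership, $Sc(h) = \sum_l \gamma_l p_l$, which may differ from the score function of Definitions 3 and 4; the observed and predicted PHFEs are hypothetical and are not the values of Table 4.

```python
def score(phfe):
    """Assumed score function: probability-weighted mean of the memberships."""
    return sum(gamma * p for gamma, p in phfe)


def residuals(observed, predicted):
    """Residual e_i = Sc(Y_i) - Sc(Y_i*) for each alternative."""
    return [score(y) - score(y_hat) for y, y_hat in zip(observed, predicted)]


def rank_ascending(values):
    """Return ranks (1 = best) so that the smallest value receives rank 1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Hypothetical observed and predicted output PHFEs for three alternatives.
observed = [[(0.6, 0.5), (0.8, 0.5)], [(0.4, 1.0)], [(0.9, 0.5), (0.7, 0.5)]]
predicted = [[(0.5, 1.0)], [(0.35, 1.0)], [(0.5, 1.0)]]

e = residuals(observed, predicted)
ranks = rank_ascending(e)
best = ranks.index(1)  # zero-based index of the alternative with the least residual
print(e, ranks, best)
```

Ranking in ascending order follows the rule stated in the algorithm: the alternative with the least residual is identified as the best choice.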

A Comparative Study of the PHFLRM and the TOPSIS
The TOPSIS method, which is an MCDM tool, has been used to verify the results and efficiency of our proposed approach. For the same problem, the results of the proposed method are compared with the results of the TOPSIS method. We have taken rain amount ($X_1$), farm size ($X_2$), irrigated area ($X_3$), and wheat yield ($Y$) as the benefit criteria. The results are obtained in Table 5 by using Steps 5, 6, and 7 of the TOPSIS algorithm (Section 4.2). Table 5 shows that the best choice among the alternatives is $A_3$, as it has the largest value of $P_i$, whereas the alternative $A_{11}$ is considered the worst choice, as it has the smallest value of $P_i$. Further, the two sets of rankings, $R_{PHFLLR}$ and $R_{TOPSIS}$, are compared using the bar chart in Figure 2. Figure 2 illustrates that the two sets of rankings, $R_{PHFLLR}$ and $R_{TOPSIS}$, are nearly similar, and that there is no significant difference between them. Although the graphical presentation provides a quick assessment of the two ranking sets, it is not conclusive. In order to determine the statistical significance of the relationship between the two sets of rankings, Spearman's rank correlation coefficient is calculated, as shown in Table 6. From Table 6, Spearman's correlation coefficient is calculated as $r_s = 1 - \frac{6(38)}{12(12^2 - 1)} = 1 - \frac{228}{1716} \approx 0.87$, which shows that the two sets of rankings, $R_{PHFLLR}$ and $R_{TOPSIS}$, are strongly related to each other [45]. To evaluate whether the correlation coefficient $r_s = 0.87$ is meaningful or not, a statistical test is performed, taking the null hypothesis ($H_0$: there is no relationship between the two sets of rankings) against the alternative hypothesis ($H_1$: there is a relationship between the two sets of rankings) at the 5% level of significance. As the calculated value, $Z_c = r_s\sqrt{M - 1} = 0.87\sqrt{12 - 1} \approx 2.88$, exceeds the table value $Z_{0.05} = 1.645$, we reject $H_0$ and conclude that there is a statistically significant, strong relationship between the two sets of rankings.
Additionally, the values of correlation r w and similarity coefficient WS [46] were examined for the considered example. These values are 0.8607 and 0.9289, respectively, confirming the close correlation between the obtained results.
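The Spearman calculation used in this comparison can be checked with a short sketch. It uses the standard formula $r_s = 1 - 6\sum d_i^2 / (M(M^2 - 1))$ and the large-sample statistic $Z_c = r_s\sqrt{M - 1}$; with $\sum d_i^2 = 38$ and $M = 12$ the denominator is $12 \times 143 = 1716$. The function names are illustrative, and the individual rankings of Table 6 are not reproduced, so only the sum of squared rank differences is used.

```python
import math


def spearman_from_d2(sum_d2, m):
    """Spearman's r_s from the sum of squared rank differences (no ties)."""
    return 1 - 6 * sum_d2 / (m * (m * m - 1))


def spearman_rs(rank_a, rank_b):
    """Spearman's r_s computed directly from two rankings of equal length."""
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return spearman_from_d2(sum_d2, len(rank_a))


def z_statistic(rs, m):
    """Large-sample test statistic Z_c = r_s * sqrt(M - 1)."""
    return rs * math.sqrt(m - 1)


rs = spearman_from_d2(38, 12)   # values quoted in the text
z = z_statistic(rs, 12)
print(round(rs, 4), round(z, 2), z > 1.645)
```

The unrounded coefficient is slightly below 0.87, so the test statistic computed from it is marginally smaller than the one obtained from the rounded value, but it remains well above the critical value 1.645.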

Conclusions
This paper provides an MCDM approach to FLRMs by incorporating probabilistic hesitant information. This concept has not been explored previously, and is a novel alternative to statistical regression in resolving MCDM challenges. The proposed methodology, PHFLRM, is applied in agriculture to evaluate wheat production in different districts of Pakistan by considering significant factors such as rainfall, farm size, and irrigated area. We examined the yields of twelve districts in the context of three factors that significantly affect wheat yield. Similarly, we may include more criteria and alternatives, but computing becomes more complicated as the number of alternatives or criteria examined increases. Finally, the outcomes of the suggested methodology (PHFLRM) are compared to those of the widely used decision-making technique called TOPSIS.
Compared with TOPSIS, the complexity of the proposed methodology does not increase when more criteria and alternatives are added to the given MCDM problem. The proposed methodology obtains the ranking for a decision-making problem by solving a simple LP model, which yields results quickly and with less computational time than TOPSIS. The proposed methodology may be a feasible alternative decision-making approach that accommodates a high level of system fuzziness. In the future, we will further investigate the applications of the FLRM in decision making using different FS extensions, and we will also investigate the accuracy of the obtained results.

Acknowledgments:
The authors would like to thank the editor and the anonymous reviewers, whose insightful comments and constructive suggestions helped us to significantly improve the quality of this paper.