A Clustering Multi-Criteria Decision-Making Method for Large-Scale Discrete and Continuous Uncertain Evaluation

In recent years, Dempster–Shafer (D–S) theory has been widely used in multi-criteria decision-making (MCDM) problems due to its excellent performance in dealing with discrete ambiguous decision alternative (DA) evaluations. In the general framework of D–S-theory-based MCDM problems, the preference of the DAs for each criterion is regarded as a mass function over the set of DAs based on subjective evaluations. Moreover, the multi-criteria preference aggregation is based on Dempster’s combination rule. Unfortunately, this idea faces two difficulties in real-world applications: (i) D–S theory can only deal with discrete uncertain evaluations, but is powerless in the face of continuous uncertain evaluations. (ii) The generation of the mass function for each criterion relies on the empirical judgments of experts, making it time-consuming and laborious in terms of the MCDM problem for large-scale DAs. To the best of our knowledge, these two difficulties cannot be addressed with existing D–S-theory-based MCDM methods. To this end, this paper proposes a clustering MCDM method combining D–S theory with the analytic hierarchy process (AHP) and the Silhouette coefficient. By employing D–S theory and continuous probability distributions to represent discrete and continuous ambiguous evaluations, respectively, determining the focal element set for the mass function of each criterion through the clustering method, assigning the mass values of each criterion through the AHP method, and aggregating preferences according to Dempster’s combination rule, we show that our method can indeed address these two difficulties in MCDM problems. Finally, an example is given and comparative analyses with related methods are conducted to illustrate our method’s rationality, effectiveness, and efficiency.


Introduction
MCDM offers a systematic methodology for assisting decision makers in weighing various DAs and determining the optimal DA while considering multiple criteria, which has led to MCDM attracting widespread attention in both theory and practice [1,2]. However, in the practical application of MCDM, uncertainties, such as missing, ambiguous, or inaccurate information, are inevitable due to the subjective evaluations of experts, missing data, the randomness of data, and so on [3][4][5][6]. In order to determine the optimal DA, the MCDM problem in such an uncertain situation usually consists of two major tasks: (1) modeling and processing uncertain information for all evaluations of each decision alternative and (2) synthetically aggregating the performance of each alternative with respect to each criterion [7].
To model and deal with uncertain information, many mathematical tools and theories, such as fuzzy set theory [8], intuitionistic fuzzy set theory [9], rough set theory [10], D-S theory [11-15], evidential reasoning [16], D-number [7], and Z-number [6], have been extended to the uncertain MCDM problem. Among these approaches, as a generalization of probability theory, D-S theory has been found to excel in dealing with uncertain information, such as ambiguous, imprecise, and missing information, without requiring additional auxiliary information. We demonstrate the rationality, effectiveness, and efficiency of our method through an example illustration and a comparative analysis with the existing D-S-theory-based MCDM methods.
The contributions of this paper are summarized as follows:
• We identify various continuous uncertain evaluations of DAs in MCDM and represent them by continuous probability distributions.
• A clustering method based on the Silhouette coefficient is proposed to handle large-scale discrete and continuous uncertain evaluations of DAs in MCDM. Furthermore, when the total number of DAs is relatively large, clustering significantly improves the efficiency of MCDM.
• A new ratio formula for uncertain evaluation values that can handle both discrete and continuous uncertain evaluations is proposed. Moreover, we demonstrate that the formula can effectively avoid the large number of pairwise comparisons in the AHP method.
• A clustering MCDM method based on D-S theory and the AHP method is proposed to evaluate all DAs with respect to all related criteria in MCDM.
The rest of this paper is structured as follows: The studied literature is outlined in Section 2; Section 3 reviews the basic concepts of D-S theory and other preliminaries used in this paper; a novel MCDM method is proposed in Section 4; Section 5 details the experimental evaluations; Section 6 concludes this paper.

Literature Review
Solving the uncertain MCDM problem consists of two crucial tasks: the first is modeling and processing all of the uncertain information, and the second is integrating the performance of DAs under each criterion to determine the optimal DA [7].
In order to achieve the first task, some theories and methods for dealing with uncertain information have been incorporated into MCDM problems. For example, fuzzy set theory provides a systematic framework for handling uncertainty and fuzziness with the aid of the membership function [8]. The intuitionistic fuzzy set extended fuzzy sets by representing the rejection degree through a non-membership degree [9], and the interval-valued intuitionistic fuzzy set represents both the membership and non-membership degrees by intervals [32]. To represent the reliability of the information, the Z-number was proposed [6]. Similarly, extended fuzzy set theories, such as spherical fuzzy sets and neutrosophic sets, additionally consider the mental state of the decision maker, especially hesitation. However, it is too complicated for the decision maker to define their satisfaction, dissatisfaction, and hesitation levels when expressing thoughts [33,34]. Furthermore, rough set theory was proposed to solve the inconsistency problem of granular information [10], and soft set theory provides a general mechanism for modeling uncertainty from a parametric point of view [35]. The D-number was proposed to express imprecision and uncertainty [7]. The evidential reasoning rule presents a general approach to handling uncertain information, including ignorance and randomness [16]. As an extension of probability theory, the D-S theory was proposed by Dempster and developed by Shafer. The mass function in D-S theory can express imprecision, ambiguity, and ignorance by assigning support to subsets [11,12]. The D-S theory performs very well in modeling and handling uncertain information without additional auxiliary information, such as the membership function in fuzzy set theory, the non-membership function in intuitionistic fuzzy set theory, and the reliability in the Z-number [13-15,18,28].
The second crucial task of the uncertain MCDM problem is to combine the performance of all DAs for each criterion and obtain the optimal DA. A considerable number of traditional MCDM techniques can do this, such as AHP [19,20], TOPSIS [21], VIKOR [18], BWM [23], ELECTRE [22], MARCOS, and ARAS [7,13]. In addition to these traditional MCDM techniques, the D-S theory can also be used to select the optimal DA by treating each criterion as a mass function whose frame of discernment is the set of all DAs and then combining the mass functions of all criteria by Dempster's combination rule [25,26]. Furthermore, Dempster's combination rule satisfies the commutative and associative laws, making it very efficient in fusing information from multiple sources [11,12]. D-S theory performs well in modeling and handling multiple types of uncertain information, and its core, Dempster's combination rule, can efficiently fuse information from multiple criteria [13-15,18,28]. Therefore, incorporating D-S theory into the uncertain MCDM problem is desirable and has attracted much attention from researchers in recent years. For example, Xiao proposed the EFMCDM model [13], and Wang et al. used the D-S theory to represent and handle uncertain information and TOPSIS to select the optimal offshore wind turbine [14].
However, these existing methods based on the D-S theory rely on the empirical knowledge of experts when generating the mass function of each criterion, which is time-consuming, labor-intensive, and subjective. Moreover, these methods can only deal with discrete evaluations and not continuous evaluation values, which are very common in practical applications of MCDM. Therefore, to tackle the above problems, a novel MCDM method is proposed in this paper.

Preliminaries
First, we review the basic definitions in D-S theory [11].

Definition 1. Let Θ = {θ_1, . . . , θ_n} be a set of exhaustive and mutually exclusive elements called a frame of discernment (or, simply, a frame). A function m : 2^Θ → [0, 1] is called a mass function over Θ if it satisfies m(∅) = 0 and Σ_{A⊆Θ} m(A) = 1. Here, the mass value m(A) represents the degree to which the corresponding evidence supports A. Moreover, any subset A ⊆ Θ satisfying m(A) > 0 is called a focal element of m, and the plausibility function of m is given by Pl(A) = Σ_{B∩A≠∅} m(B) for all A ⊆ Θ. In particular, the mass function with m(Θ) = 1 represents the decision maker's total ignorance over the frame Θ.
One advantage of D-S theory is that it provides a method for accumulating and combining evidence from multiple distinct sources by using Dempster's combination rule [12].

Definition 2. (Dempster's combination rule) Let m_1 and m_2 be two mass functions over a frame of discernment Θ. Then, the combined mass function from m_1 and m_2 according to Dempster's combination rule, denoted as m_{1,2}, is defined as:

m_{1,2}(A) = (1 / (1 − K)) Σ_{B∩C=A} m_1(B) m_2(C) for all ∅ ≠ A ⊆ Θ, and m_{1,2}(∅) = 0,

where K = Σ_{B∩C=∅} m_1(B) m_2(C) is the degree of conflict between m_1 and m_2.

Since the probability of identifying each element in the frame is uncertain for the mass function, some methods are employed to transform the uncertain probability into a certain probability, which is called probability transformation [38].

Definition 3. (Pignistic probability transformation) Let m be a mass function over a frame Θ; then, the pignistic probability of each element θ ∈ Θ, denoted as BetP_m(θ), is given by:

BetP_m(θ) = Σ_{A⊆Θ, θ∈A} m(A) / |A|,

where |A| is the cardinality of A.
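As a concrete illustration of Definitions 2 and 3, the following sketch (ours, not from the paper) implements Dempster's rule and the pignistic transformation, representing a mass function as a dict mapping frozensets (focal elements) to mass values:

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: fuse two mass functions given as {frozenset: mass} dicts."""
    raw = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        raw[b & c] = raw.get(b & c, 0.0) + mb * mc
    conflict = raw.pop(frozenset(), 0.0)   # K: mass that fell on the empty set
    return {a: v / (1.0 - conflict) for a, v in raw.items()}

def betp(m):
    """Pignistic transformation: split each focal element's mass evenly
    among its members (Definition 3)."""
    p = {}
    for a, v in m.items():
        for theta in a:
            p[theta] = p.get(theta, 0.0) + v / len(a)
    return p
```

Because intersection is commutative and associative, `combine` can be folded over any number of sources in any order, which is the efficiency property exploited later in Step 3.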
Due to the existence of the criterion weight, it is necessary to discount the criterion's mass function according to the weight before combining the mass functions of all criteria. Shafer's discounting method is defined as follows [12].

Definition 4. (Shafer's discounting) Let m be a mass function over a frame Θ, and let ω be the evidence weight of the mass function m; then, the mass function m̃ discounted by the evidence weight is given by:

m̃(A) = ω · m(A) for all A ⊂ Θ, and m̃(Θ) = ω · m(Θ) + (1 − ω).

Based on the concepts of D-S theory, Ma et al. [4] proposed four types of discrete ambiguous evaluation for DAs with respect to a given criterion and converted them into the form of mass functions as follows.
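Definition 4 admits a direct transcription; the sketch below (ours, using the same dict-of-frozensets representation as an assumption) scales every mass value by ω and moves the remainder onto the whole frame Θ:

```python
def discount(m, w, frame):
    """Shafer's discounting: keep fraction w of each mass value and
    transfer the remaining 1 - w to the whole frame (total ignorance)."""
    out = {a: w * v for a, v in m.items()}
    theta = frozenset(frame)
    out[theta] = out.get(theta, 0.0) + (1.0 - w)
    return out
```

With ω = 0 the result is vacuous (m̃(Θ) = 1), and with ω = 1 the mass function is unchanged, matching the intuition that the weight expresses how much the source is trusted.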
Definition 5. (Discrete ambiguous evaluation) Let m_{A,c} be the mass function representing the decision maker's judgment about the DA A regarding criterion c, and let H = {x_i | x_i ∈ R, i = 1, . . . , n, x_1 < . . . < x_n} be a mutually exclusive and collectively exhaustive set of numeral assessment grades that presents an n-scale unit preference scale on DAs; then, the four forms of evaluation and their corresponding mass functions are defined as follows:
• Determined evaluation. This can be expressed as a special mass function defined on the frame H: m_{A,c}({x}) = 1, where x (x ∈ H) is the numeral assessment grade evaluated by an expert.
• Probabilistic evaluation. The decision maker assigns a probability p_i to each possible grade x_i. It can be expressed as a mass function defined on the frame Θ = H: m_{A,c}({x_i}) = p_i, with Σ_i p_i = 1.
• Unknown evaluation. Since the decision maker is utterly ignorant in this situation, it can be expressed as a special mass function defined on the frame Θ = H: m_{A,c}(Θ) = 1.
• Interval-valued evaluation. This means that the decision maker only knows that the numeral assessment grade could be anywhere between x_i and x_j, where x_i and x_j are the numeral assessment grades evaluated by the expert, but is not sure which one. It can be expressed as a special mass function defined on the frame Θ = H: m_{A,c}({x_i, · · · , x_j}) = 1.

The Silhouette coefficient is an index for judging the clustering effect [39] among a set of elements. The closer together the samples of the same class and the farther apart the samples of different classes, the better the clustering effect and the larger the Silhouette coefficient.

Definition 6. (Silhouette coefficient) Let X = {x_1, . . . , x_n} be a set of elements, and let {Cl_1, . . . , Cl_k} (k < n) be a set of clusters, where Cl_i denotes the cluster containing x_i; then, the Silhouette coefficient of the k clusters is given by:

SC(k) = (1/n) Σ_{i=1}^{n} (b(x_i) − a(x_i)) / max{a(x_i), b(x_i)},

where a(x_i) is the average distance between x_i and the other data in the cluster Cl_i, and b(x_i) is the minimum of the average distances between x_i and each other cluster Cl_t (1 ≤ t ≤ k, t ≠ i), given by:

a(x_i) = (1 / (|Cl_i| − 1)) Σ_{x_j ∈ Cl_i, j≠i} η(x_i, x_j),   b(x_i) = min_{t≠i} (1 / |Cl_t|) Σ_{x_j ∈ Cl_t} η(x_i, x_j),

in which η(x_i, x_j) is the distance between x_i and x_j.
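Definition 6 translates directly into code; here is a minimal sketch (our own, with the distance η supplied by the caller as `dist`):

```python
def silhouette(clusters, dist):
    """Mean Silhouette coefficient; `clusters` is a list of lists of elements
    and `dist(x, y)` is the distance function eta of Definition 6."""
    scores = []
    for i, cl in enumerate(clusters):
        for x in cl:
            # a(x): average distance to the other members of x's own cluster
            a = (sum(dist(x, y) for y in cl if y is not x) / (len(cl) - 1)
                 if len(cl) > 1 else 0.0)
            # b(x): smallest average distance from x to any other cluster
            b = min(sum(dist(x, y) for y in other) / len(other)
                    for j, other in enumerate(clusters) if j != i)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Well-separated clusters push every per-point score toward 1, so the mean approaches 1 for a good partition.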

A Novel MCDM Method
Based on the D-S theory, clustering method, and eigenvector method in the AHP, this section proposes a novel MCDM method for dealing with uncertain information.
Problem definition: Let Λ = {A_1, A_2, . . . , A_n} be the nonempty finite DA set consisting of n DAs, let C = {c_1, c_2, . . . , c_m} be the nonempty finite set of decision criteria, let x_{l,i} (1 ≤ l ≤ n, 1 ≤ i ≤ m) be the (certain or uncertain) value of the DA A_l (A_l ∈ Λ) regarding the criterion c_i (c_i ∈ C), and let ω_i be the importance weight of the decision criterion c_i (i = 1, 2, . . . , m). For this uncertain MCDM problem, the optimal DA should be selected from Λ.
For example, suppose an automotive enterprise needs to select an optimal DA from ten suppliers of automotive parts, denoted as a, b, c, d, e, f, g, h, i, and j. In this MCDM problem, the enterprise evaluates suppliers using four decision criteria: price, delivery time, quality, and service level. The evaluations of the DAs over all criteria are shown in Table 1, where some values are unknown (denoted as "null") due to the absence of data, and P(·) refers to the probability. For the criteria of service level and quality, the grades of the DAs are evaluated using the following linguistic terms: Very Poor (VP), Poor (P), Average (A), Good (G), and Very Good (VG). Note that some evaluations of the DAs are discrete uncertain values while others are continuous uncertain values. Such MCDM problems are common in real-world applications, and existing methods cannot solve them.
In order to solve the MCDM problem shown in Table 1 and select the optimal DA, the proposed method is shown in Figure 1. As shown in the figure, the proposed method has three main steps, including identifying the uncertain information in the MCDM, generating the mass function of each criterion, and combining the mass functions of all criteria according to their weights. The specific steps are shown below.

Step 1: Identifying Uncertain Information in MCDM
In Section 3, we recalled several forms of discrete evaluation values for DAs identified by Ma et al. [4]. Inspired by this idea, we identify the various forms of continuous evaluation values of DAs as follows.
For a continuous evaluation value x_{l,i} of a DA A_l (A_l ∈ Λ) regarding a criterion c_i (c_i ∈ C), if Z (Z ⊆ R) is the region in which the evaluation value is defined, x_{l,i} could take one of the following three forms:

• Missing evaluation value, such as the delivery time of supplier i in Table 1.
• Interval evaluation value, x_{l,i} ∈ [a, b], where a, b ∈ Z and a < b, such as the price of supplier a in Table 1.
• Probability-distribution evaluation value, where x_{l,i} follows a continuous probability distribution whose probability density function is f(·), such as the delivery time of supplier a in Table 1.

In order to facilitate the processing of evaluation values, we convert all evaluation values into continuous probability distributions, as shown below.

• Since we are entirely ignorant of missing evaluations, we assume that a missing value takes a random value in an interval and obeys a uniform distribution. Furthermore, the lower bound of the interval is assumed to be the minimum possible value (the worst case) over all other DAs, while the upper bound is assumed to be the maximum possible value (the best case) over all other DAs. Such assumptions satisfy the intuition that, in many real-world applications, the worst case of a missing value is no worse than the known worst case, and its best case is no better than the known best case.
• For the case of interval evaluation, since its probability distribution is unknown, the interval-valued evaluation is generally treated uniformly along its range [40]. Then, we can express it as a uniform probability distribution, ∫_a^b f_{A,c}(x) dx = 1 with f_{A,c}(x) = 1 / (b − a) for x ∈ [a, b], where a, b ∈ Z and a < b, which is a continuous probability distribution with a constant probability density function.

Therefore, all forms of discrete evaluation values of DAs can be expressed in the form of mass functions, while all forms of continuous evaluation values of DAs can be expressed in the form of probability distributions.
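The conversion rules above can be sketched as follows (our illustration; we represent each continuous evaluation by a `(lower, upper)` support interval, with `None` marking a missing value — both representational choices are assumptions, not the paper's notation):

```python
def fill_missing(intervals):
    """Replace each missing (None) evaluation with the interval spanning the
    known worst case to the known best case among the other DAs; every
    interval is then read as a uniform distribution (density 1 / (b - a))."""
    known = [iv for iv in intervals if iv is not None]
    lo = min(a for a, b in known)   # minimum possible value among known DAs
    hi = max(b for a, b in known)   # maximum possible value among known DAs
    return [iv if iv is not None else (lo, hi) for iv in intervals]

def uniform_pdf(a, b):
    """Constant probability density of the uniform distribution on [a, b]."""
    return lambda x: 1.0 / (b - a) if a <= x <= b else 0.0
```

For instance, if two suppliers' prices are known to lie in [1, 2] and [3, 5], a supplier with a missing price is modeled as uniform on [1, 5].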

Figure 1. The flowchart of the proposed method: Step 1, identify uncertain information in MCDM; Step 2, generate the mass function of each criterion (Step 2-1, determine the set of focal elements based on clustering; Step 2-2, assign mass values to each focal element) to obtain the converted decision matrix.

Step 2: Generating the Mass Function of Each Criterion
In this paper, the generation of the mass function of each criterion consists of two steps: Firstly, DAs are divided into different clusters according to their values using the clustering method, where the set of DAs divided into the same cluster constitutes a focal element of the mass function; secondly, the preference of each cluster, which is the mass value of each focal element, is calculated to generate the mass function of the criterion.

Step 2-1: Determining the Set of Focal Elements Based on Clustering
In this paper, we adapt UK-medoids, one of the classic methods for clustering uncertain data [41], to cluster the n DAs, and we use the Silhouette coefficient in Definition 6 to determine the optimal number of clusters and the resulting set of clusters of the DAs.
Firstly, the distance between values in the clustering algorithm needs to be calculated. We define the distance between uncertainty values in the form of a probability distribution and mass function as follows.

Definition 7. (Distance of uncertain values) Let x_i be a value, let Z_i be the region in which x_i is defined, let f_i(·) be the probability density function of x_i if x_i is in the form of a continuous probability distribution, and let BetP_{m_i}(·) be the pignistic probability and Θ_i the frame of m_i if x_i is in the form of a mass function m_i; then, the distance between two values x_i and x_j, denoted as η(x_i, x_j), is defined by the following:
• If x_i and x_j are both continuous probability distributions:
η(x_i, x_j) = ∫_{Z_i} ∫_{Z_j} |x − y| f_i(x) f_j(y) dy dx.
• If x_i and x_j are both mass functions:
η(x_i, x_j) = Σ_{x∈Θ_i} Σ_{y∈Θ_j} |x − y| BetP_{m_i}(x) BetP_{m_j}(y).
• If x_i is a mass function and x_j is a continuous probability distribution:
η(x_i, x_j) = Σ_{x∈Θ_i} ∫_{Z_j} |x − y| BetP_{m_i}(x) f_j(y) dy.

Then, our proposed clustering algorithm is shown in Algorithm 1.
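All three cases of the distance reduce to the expected distance between independent draws from the two values' distributions. Under that reading (the absolute-difference kernel is our assumption for this sketch), a simple Monte Carlo estimator is:

```python
import random

def expected_distance(sample_i, sample_j, n=20000, seed=0):
    """Monte Carlo estimate of E|X - Y| for independent X ~ x_i, Y ~ x_j;
    each `sample_*` callable draws one realization from its distribution."""
    rng = random.Random(seed)
    return sum(abs(sample_i(rng) - sample_j(rng)) for _ in range(n)) / n
```

Note that the distance between an uncertain value and itself is strictly positive under this definition (E|X − X'| > 0 for independent copies), which is exactly the behavior the clustering algorithm accounts for.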

Algorithm 1: Clustering of values of DAs.

Input: The set of values of the n DAs, X = {x_1, x_2, . . . , x_n}; the minimum number of clusters, k_min; the maximum number of clusters, k_max.
Output: The optimal number of clusters, k̂; the set of clusters, Ĉl.
1: Calculate the distance η(x_i, x_j) between all pairs of values in X, including between each value and itself;
2: if all distances are the same then
3:   k̂ ← 1; Ĉl ← {X}; return;
4: end
5: for k = k_min to k_max do
6:   Select k initial medoids S that are as far apart from each other as possible;
7:   repeat
8:     Find the cluster Cl_{λ_i} into which each value x_i falls, i.e., assign x_i to its nearest medoid;
9:     Recompute the medoid of each cluster Cl_t (t = 1, . . . , k) and update S;
10:   until the set of medoids S does not change;
11:   Calculate the Silhouette coefficient of the resulting set of clusters;
12: end
13: k̂ ← the number of clusters with the maximum Silhouette coefficient; Ĉl ← the corresponding set of clusters; return;

Overview. Given the set of values of each DA and the predetermined minimum and maximum numbers of clusters, k_min and k_max (2 ≤ k_min ≤ k_max ≤ n), Algorithm 1 determines the optimal number of clusters and the corresponding set of clusters. The parameters k_min and k_max indicate the lower and upper limits of the number of clusters, so the optimal number of clusters lies between them; they may also be equal, in which case the decision maker has fixed the number of clusters to a single value. The main procedures are as follows: (1) First, calculate the distance between all values of the DAs, including the distance between each value and itself. Note that if the value of a DA is uncertain, the distance between that value and itself is greater than 0. (2) If all of the distances calculated above are the same, then all values of the DAs regarding this criterion are considered the same and are placed in a single cluster (lines 2-3 of Algorithm 1), in which case the mass function generated by this criterion represents ignorance in D-S theory. Otherwise, all values of the DAs are clustered based on the distances between them. The number of clusters ranges over [k_min, k_max], and the number of clusters at which the Silhouette coefficient reaches its maximum is the optimal number of clusters; the set of clusters obtained at that point is the optimal set of clusters (lines 5-13 of Algorithm 1). In addition, if the set of clusters derived from Algorithm 1 contains empty clusters, they are not added to the set of optimal clusters, so the set of optimal clusters contains only nonempty clusters.
Compared with the UK-medoids algorithm, Algorithm 1 has the following advantages: (1) The formula for calculating the distance between uncertain values is extended to apply to uncertain values in the form of mass functions, as shown in Definition 7. (2) The number of clusters can be determined automatically according to the Silhouette coefficient, without being fixed in advance. Note that if all distances are the same, the number of clusters is 1; that is, all values are divided into the same cluster. In this case, all DAs behave the same with respect to this criterion, so the DAs cannot be distinguished through this criterion alone. Therefore, all DAs should be grouped into the same cluster, in which case the mass function generated by this criterion represents ignorance in D-S theory, and it is unnecessary to divide the values into two or more clusters. (3) The selection of the initial clustering medoids is improved. Unlike random selection of cluster centers, the improved selection maximizes the distance between the initial medoids so that the final clusters are as far from each other as possible. The specific steps are as follows: first, randomly select a value as the first initial medoid; then select the value farthest from the first initial medoid as the second initial medoid; then select the value with the largest minimum distance to the first two medoids as the third initial medoid; and so on, until k initial medoids are selected.
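A compact sketch of Algorithm 1 (ours, under stated assumptions: the distance matrix is precomputed, medoids are seeded farthest-first as described above, and k is selected by the Silhouette coefficient):

```python
def cluster(values, dist, k_min=2, k_max=None):
    """Sketch of Algorithm 1: k-medoids with farthest-first seeding, choosing
    the number of clusters in [k_min, k_max] by the Silhouette coefficient.
    Returns clusters as lists of indices into `values`."""
    n = len(values)
    k_max = min(k_max or n, n)
    d = [[dist(values[i], values[j]) for j in range(n)] for i in range(n)]
    if len({round(x, 12) for row in d for x in row}) == 1:
        return [list(range(n))]          # all values behave the same: one cluster
    best, best_sc = None, float('-inf')
    for k in range(k_min, k_max + 1):
        meds = [0]                       # farthest-first initial medoids
        while len(meds) < k:
            meds.append(max(range(n), key=lambda i: min(d[i][m] for m in meds)))
        for _ in range(100):             # refine until the medoid set settles
            cls = [[] for _ in meds]
            for i in range(n):           # assign each value to its nearest medoid
                cls[min(range(k), key=lambda t: d[i][meds[t]])].append(i)
            new = [min(c, key=lambda i: sum(d[i][j] for j in c)) if c else meds[t]
                   for t, c in enumerate(cls)]
            if new == meds:
                break
            meds = new
        cls = [c for c in cls if c]      # empty clusters are discarded
        if len(cls) < 2:
            continue
        sc = _silhouette(cls, d)
        if sc > best_sc:
            best, best_sc = cls, sc
    return best

def _silhouette(cls, d):
    """Definition 6 evaluated on a precomputed index-distance matrix."""
    s = []
    for t, c in enumerate(cls):
        for i in c:
            a = sum(d[i][j] for j in c if j != i) / (len(c) - 1) if len(c) > 1 else 0.0
            b = min(sum(d[i][j] for j in o) / len(o)
                    for u, o in enumerate(cls) if u != t)
            s.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(s) / len(s)
```

On two well-separated groups of certain values, the Silhouette criterion recovers k = 2 without the user fixing the number of clusters.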
Then, by applying Algorithm 1, we obtain the optimal number of clusters, denoted as k̂, and the set of clusters {Cl_{1,i}, Cl_{2,i}, . . . , Cl_{k̂,i}} regarding the criterion c_i. The set of optimal clusters determines the set of focal elements of the mass function of the criterion, where all DAs contained in each cluster constitute a focal element.

Step 2-2: Assigning Mass Values to Each Focal Element
After obtaining the set of clusters corresponding to the criterion, we need to determine each cluster's preference degree to generate the criterion's mass function. It should be noted that if the number of clusters regarding the criterion c i (c i ∈ C) obtained by Algorithm 1 is 1, that is, all DAs regarding criterion c i have the same performance, this means that the pros and cons of each DA cannot be distinguished regarding criterion c i . The information provided by this criterion is completely ignorant, so the mass function of criterion c i is m c i (Θ) = 1. The following content of this subsection presents the case where the number of clusters is greater than or equal to 2.
In this paper, we determine the preference degree of each cluster by calculating the ratio relationship of the values among different clusters. Therefore, before calculating the preference degree of each cluster, the ratio between all the values contained in the different clusters needs to be calculated. Similarly to the distance of uncertain values, the formula for calculating the ratio between uncertain values is as follows.
Definition 8. (Ratio of uncertain values) Let x_i be a value, let Z_i be the region in which x_i is defined, let f_i(·) be the probability density function of x_i if x_i is in the form of a continuous probability distribution, and let BetP_{m_i}(·) be the pignistic probability and Θ_i the frame of m_i if x_i is in the form of a mass function. Then, the ratio of value x_i to x_j, denoted as r(x_i, x_j), is defined by the following:
• If x_i and x_j are both continuous probability distributions:
r(x_i, x_j) = ∫_{Z_i} ∫_{Z_j} (x / y) f_i(x) f_j(y) dy dx.
• If x_i and x_j are both mass functions:
r(x_i, x_j) = Σ_{x∈Θ_i} Σ_{y∈Θ_j} (x / y) BetP_{m_i}(x) BetP_{m_j}(y).
• If x_i is a mass function and x_j is a continuous probability distribution:
r(x_i, x_j) = Σ_{x∈Θ_i} ∫_{Z_j} (x / y) BetP_{m_i}(x) f_j(y) dy.

The proposed formula for the ratio of uncertain values has the following excellent property.

Theorem 1. Let r(x_i, x_j) be the ratio of x_i to x_j, let r(x_i, x_k) be the ratio of x_i to x_k, let r(x_j, x_j) be the ratio of x_j to itself, and let Z_i be the region in which value x_i is defined; then, the ratio of x_j to x_k, denoted as r(x_j, x_k), can be expressed in terms of r(x_i, x_j), r(x_i, x_k), and r(x_j, x_j) as follows:

r(x_j, x_k) = r(x_j, x_j) · r(x_i, x_k) / r(x_i, x_j).

Proof. Consider the case of continuous probability distributions. According to Fubini's theorem [42], when the integral area of a double integral is a rectangular area and the binary function can be separated into the product of two univariate functions, the double integral can be converted into a repeated integral. Thus, we have

r(x_i, x_j) = ∫_{Z_i} ∫_{Z_j} (x / y) f_i(x) f_j(y) dy dx = (∫_{Z_i} x f_i(x) dx) · (∫_{Z_j} (1/y) f_j(y) dy).

Then, we have

r(x_j, x_j) · r(x_i, x_k) / r(x_i, x_j) = (∫_{Z_j} x f_j(x) dx) (∫_{Z_j} (1/y) f_j(y) dy) · (∫_{Z_i} x f_i(x) dx) (∫_{Z_k} (1/y) f_k(y) dy) / [(∫_{Z_i} x f_i(x) dx) (∫_{Z_j} (1/y) f_j(y) dy)] = (∫_{Z_j} x f_j(x) dx) · (∫_{Z_k} (1/y) f_k(y) dy) = r(x_j, x_k).

In the case of discrete forms, the integrals are replaced by the corresponding sums, and the procedure of the proof is similar. Thus, Theorem 1 holds.

Therefore, by Theorem 1, we only need to calculate the ratio of each uncertain value to itself and the ratio of one uncertain value to all other uncertain values; the ratios between all uncertain values can then be derived, which significantly reduces the number of comparisons and calculations and improves efficiency.
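For uniform distributions, the ratio factorizes into E[X] · E[1/Y] in closed form, so Theorem 1 can be verified numerically; the snippet below is our own check, not part of the paper's method:

```python
import math

def ratio(u, v):
    """r(x_u, x_v) = E[X] * E[1/Y] for uniform values u = (a, b), v = (c, d)
    with 0 < a < b and 0 < c < d; by Fubini's theorem this equals the double
    integral of (x / y) * f_u(x) * f_v(y) over the rectangle [a,b] x [c,d]."""
    (a, b), (c, d) = u, v
    return (a + b) / 2.0 * math.log(d / c) / (d - c)
```

Taking x_i, x_j, x_k uniform on [1, 2], [2, 3], and [4, 5], one can confirm r(x_j, x_k) = r(x_j, x_j) · r(x_i, x_k) / r(x_i, x_j) to machine precision.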
Next, we can obtain the ratio relationship between different clusters, as shown below.

Definition 9. (Ratio of clusters) Let r(x_i, x_j) be the ratio of uncertain value x_i to x_j; then, the ratio of cluster Cl_{h,i} to Cl_{t,i} regarding the criterion c_i (c_i ∈ C), denoted as R_i(h, t), is defined by:

R_i(h, t) = (1 / (|Cl_{h,i}| · |Cl_{t,i}|)) Σ_{x ∈ Cl_{h,i}} Σ_{y ∈ Cl_{t,i}} r(x, y),

where |Cl_{h,i}| and |Cl_{t,i}| are the numbers of evaluations contained in clusters Cl_{h,i} and Cl_{t,i}, respectively.
It is important to note that the ratio between uncertain values of the same cluster does not need to be calculated. Moreover, due to Theorem 1, only the ratio of one uncertain value to other values needs to be calculated; then, the ratios of all uncertain values can be obtained. Therefore, we need to pick the cluster that contains the largest number of uncertain values, randomly pick one of the uncertain values, and then calculate the ratio between it and all the values of other clusters. The ratio of all uncertain values can be obtained; thus, the ratio between the clusters can also be obtained. Therefore, the efficiency of calculating the ratio between all clusters is significantly improved by Theorem 1.
After the ratio relationships of all clusters are obtained, we use the AHP method to determine the preference degree of each cluster regarding the criterion. First, we construct the judgment matrix for pairwise comparisons among clusters regarding the criterion c_i. Let R_i(h, t) be the ratio of cluster Cl_{h,i} to Cl_{t,i} and let k be the number of clusters; then, the judgment matrix J_i = (J_i(h, t))_{k×k} is constructed as follows:

• If c_i is a benefit criterion, J_i(h, t) = R_i(h, t); that is, the elements of the judgment matrix are the ratios of the clusters.
• If c_i is a cost criterion, J_i(h, t) = 1 / R_i(h, t); that is, the elements are the reciprocals of the ratios of the clusters.

For a positive reciprocal matrix, the maximum eigenvalue must be a positive eigenvalue according to the Perron-Frobenius theorem [43], and its corresponding eigenvector is a positive vector. The eigenvector is then normalized, and the normalized eigenvector of the maximum eigenvalue can be used as the weight vector. Therefore, the normalized eigenvector corresponding to the largest eigenvalue of the judgment matrix is selected as the weight vector, in which each element's value is the preference degree of the corresponding cluster Cl, denoted as µ(Cl). Then, we can obtain the mass function of criterion c_i as follows.

Definition 10. (Mass function of a criterion) Let Cl_{1,i}, . . . , Cl_{k̂,i} be the set of clusters of criterion c_i, and let µ(Cl_{h,i}) be the preference degree of cluster Cl_{h,i}; then, the mass function of criterion c_i is given by m_{c_i}(F_h) = µ(Cl_{h,i}), where F_h is the focal element consisting of all DAs contained in Cl_{h,i}.
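Extracting the preference degrees from the judgment matrix is a principal-eigenvector computation; a dependency-free sketch via power iteration (our implementation choice — any eigensolver would do):

```python
def priority_vector(J, iters=200):
    """Normalized principal eigenvector of a positive (reciprocal) judgment
    matrix J, computed by power iteration; entry h is the preference
    degree mu of cluster h."""
    n = len(J)
    w = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(J[h][t] * w[t] for t in range(n)) for h in range(n)]
        s = sum(w)
        w = [x / s for x in w]
    return w
```

For a benefit criterion one would pass J[h][t] = R_i(h, t), and for a cost criterion J[h][t] = 1 / R_i(h, t); for a perfectly consistent matrix the iteration recovers the underlying weights exactly.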

Step 3: Combining the Mass Functions of All Criteria According to Their Weights
After obtaining the mass functions of all criteria in Step 2, by Definition 4, we obtain the mass function of each criterion c_i (c_i ∈ C) discounted by its importance weight ω_i, denoted as m̃_i. Then, through Dempster's combination rule in Definition 2, we combine all of the discounted mass functions to obtain the final mass function defined on the frame Λ, denoted as m_final. According to Definition 1, we then calculate the plausibility function of each DA within the frame of the final mass function; the optimal DAs are those with the maximum plausibility. In this step, if two or more DAs have the same maximum plausibility (that is, they are divided into the same cluster with regard to every criterion), we redefine the frame of the mass function of each criterion as the set of DAs to be further compared and repeat Steps 2-3 until we obtain the unique optimal DA.
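Step 3 can be sketched end to end as follows (our illustration, reusing the dict-of-frozensets representation; the two-criteria mass values are made-up numbers, not from the paper's example):

```python
from functools import reduce
from itertools import product

def combine(m1, m2):
    """Dempster's rule (Definition 2)."""
    raw = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        raw[b & c] = raw.get(b & c, 0.0) + mb * mc
    k = raw.pop(frozenset(), 0.0)
    return {a: v / (1.0 - k) for a, v in raw.items()}

def discount(m, w, frame):
    """Shafer's discounting (Definition 4)."""
    out = {a: w * v for a, v in m.items()}
    out[frame] = out.get(frame, 0.0) + 1.0 - w
    return out

def rank(criteria, weights, frame):
    """Discount each criterion's mass function by its weight, fuse them with
    Dempster's rule, and rank the DAs by plausibility
    Pl(theta) = sum of masses of focal elements containing theta."""
    fused = reduce(combine,
                   (discount(m, w, frame) for m, w in zip(criteria, weights)))
    pl = {t: sum(v for a, v in fused.items() if t in a) for t in frame}
    return sorted(pl, key=pl.get, reverse=True)
```

With a criterion strongly favoring c (weight 0.5) and a weaker one favoring d, the plausibility ranking puts c first, mirroring the tie-breaking comparison described above.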

Experiments
In this section, we will first illustrate the novel MCDM method proposed in this paper using the decision-making problem of selecting the optimal supplier of car parts described in Section 3, as shown in Table 1. Moreover, the decision tree following the three-level AHP framework is shown in Figure 2. Then, a comparative analysis of our method and the existing related MCDM methods is performed.

Step 1: Identifying Uncertain Information in MCDM
For qualitative criteria, in this paper, the set H of numeral assessment grades we use is {1, 2, 3, . . . , 9}, which represents a nine-scale unit preference set for DAs, and the corresponding numeral assessment grades of the linguistic terms Very Poor (VP), Poor (P), Average (A), Good (G), and Very Good (VG) are 1, 3, 5, 7, and 9, respectively, as shown in Table 2. We convert the continuous uncertain evaluations and discrete uncertain evaluations in Table 1 into the form of probability distributions or mass functions, as shown in Table 3.

Step 2: Generating the Mass Function of Each Criterion
In Step 2, we first generate the set of focal elements of the mass function for each criterion by clustering; then, we assign mass values to each focal element.
Step 2-1: Determining the Set of Focal Elements Based on Clustering

By setting the minimum and maximum numbers of clusters to 2 and 10, respectively, and applying Algorithm 1, we obtain the optimal set of clusters of DAs for each criterion, which determines the set of focal elements of the mass function of the criterion, where all DAs contained in each cluster form a focal element. The optimal set of clusters of DAs is shown in Table 4. After determining the set of focal elements based on clustering, a new decision tree for the selection of a car part supplier is shown in Figure 3.

Step 2-2: Assigning Mass Values to Each Focal Element

After obtaining the optimal set of clusters of the criterion, we choose the largest cluster among the set of clusters, select a random value in that cluster, and calculate the ratio between it and all values of the other clusters according to Definition 8. The ratios between all values can then be obtained according to Theorem 1, and the ratios between all clusters according to Definition 9. We construct the judgment matrix and calculate the preference degree of each cluster with the eigenvector method. Then, according to Definition 10, the mass function of this criterion is obtained, as shown in Table 5.

Step 3: Combining the Mass Functions of All Criteria According to Their Weights
In this article, we weight the criteria of price, delivery time, service level, and quality as ω(m_p) = 0.4, ω(m_t) = 0.1, ω(m_sl) = 0.2, and ω(m_q) = 0.3, respectively. Then, we obtain the discounted mass function of each criterion with Definition 4, as shown in Table 6.

Table 6. Mass function of each criterion discounted by the importance weight.

By using Dempster's combination rule, we can obtain the final mass function, denoted as m_final, as shown in Table 7. It can be seen that the DAs c and d are the two optimal alternatives among all DAs. Furthermore, if we want to determine the unique optimal DA, we need to compare DAs c and d further. We redefine the frame of the mass function of each criterion as Θ = {c, d} and repeat Steps 2 and 3 to obtain the final combined mass function. Then, we can obtain the plausibility functions of DAs c and d: Pl(c) = 0.79 and Pl(d) = 0.76. Therefore, c ≻ d, and the optimal DA is c.
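Step 3 can be sketched as follows; the two mass functions below are hypothetical stand-ins (the paper's Table 6 values are not reproduced here), and the code implements the standard Dempster combination with conflict renormalization, plus the plausibility function used for the final ranking.

```python
def combine(m1, m2):
    """Dempster's combination rule: intersect every pair of focal
    elements, accumulate the product masses, drop the mass on the
    empty set (conflict K), and renormalize by 1 - K."""
    joint = {}
    for A, a in m1.items():
        for B, b in m2.items():
            C = A & B
            joint[C] = joint.get(C, 0.0) + a * b
    conflict = joint.pop(frozenset(), 0.0)
    return {C: v / (1.0 - conflict) for C, v in joint.items()}

def plausibility(mass, x):
    """Pl(x): total mass of all focal elements containing x."""
    return sum(v for A, v in mass.items() if x in A)

# Hypothetical discounted mass functions for two criteria over Theta = {c, d}:
m1 = {frozenset("c"): 0.6, frozenset("cd"): 0.4}
m2 = {frozenset("d"): 0.5, frozenset("cd"): 0.5}
m12 = combine(m1, m2)
```

Combining all four criteria is just a fold of `combine` over their discounted mass functions, after which the DA with the largest plausibility is selected.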

Comparison with Related Work and Discussion
We first compare our proposed MCDM method with existing MCDM methods based on D-S theory, including DS/AHP [37], DS-AHP [15], and DS-VIKOR [18]; the experimental results are shown in Table 8. For the uncertain discrete evaluations, we use the method of [4] to generate the corresponding evaluation in the form of the mass function. In addition, since none of these existing methods can handle continuous uncertain evaluations, we replace such evaluations with their deterministic mathematical expectations. As can be seen from Table 8, the proposed method and the other related methods consistently select c as the optimal DA. Moreover, except for DS-AHP and DS-VIKOR, all other related methods consider the six DAs a, b, c, d, e, and h to be relatively better than the other DAs, which is consistent with our method.
To further analyze the performance of the priority order generated by our method, we calculated the ranking similarity between the priority order generated by our method and the priority orders generated by the other methods. Since the priority order is used to determine the optimal DA, the WS coefficient [44] is used as the measure of ranking similarity in this paper; its main advantage is that differences at the top of the ranking have a more significant impact than differences at the bottom. In addition, to generate our method's complete priority order, we further compare the two DAs a and b. We redefine the frame of the mass function for each criterion as Θ = {a, b} and then repeat Steps 2 and 3. We obtain the combined final mass function and calculate the plausibility functions of a and b, yielding Pl(a) = 0.701 and Pl(b) = 0.712, so that b ≻ a. Therefore, the complete priority order generated by our method is c ≻ d ≻ b ≻ a ≻ e ≻ h ≻ f ≻ g ≻ i ≻ j.

Using the priority order generated by our method as the reference ranking, we calculated the WS coefficients between it and the priority orders generated by the other methods, as shown in Table 9. As can be seen from Table 9, the WS coefficients between our priority order and those of the other methods, except DS-AHP, are all greater than 0.808, above which the similarity between orderings of ten elements is defined as high [44]. We then further checked the WS coefficients between the priority order generated by DS-AHP and those of the other methods; the results are all less than 0.793, as shown in Table 10. Thus, the WS coefficient between our priority order and that of DS-AHP is lower than 0.808 because DS-AHP differs from all the other methods, while the result generated by our method is consistent with those generated by the majority of the methods. As a result, the rationality of our method is confirmed.
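The WS coefficient used above can be computed as follows; the rankings in the example are hypothetical, and the formula is the one proposed by Sałabun and Urbaniak [44], in which the weight 2^(−R_x) makes disagreements at the top of the reference ranking count more.

```python
def ws_coefficient(rx, ry):
    """WS rank-similarity coefficient between a reference ranking rx and
    a compared ranking ry, where rx[i] and ry[i] are the rank positions
    (1..N) of the same alternative:
    WS = 1 - sum_i 2^(-rx_i) * |rx_i - ry_i| / max(|rx_i - 1|, |rx_i - N|)."""
    n = len(rx)
    return 1.0 - sum(
        2.0 ** (-x) * abs(x - y) / max(abs(x - 1), abs(x - n))
        for x, y in zip(rx, ry)
    )

identical = ws_coefficient([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])  # 1.0
reversed_ = ws_coefficient([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])  # low similarity
```

Because the weights decay geometrically with rank position, swapping the top two alternatives lowers WS far more than swapping the bottom two, which matches the paper's rationale for choosing this measure.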
Moreover, since the mass function of D-S theory can directly express ignorance, all methods based on D-S theory can easily handle missing discrete evaluations. Furthermore, we compared our method with the other existing methods in three aspects: whether they can handle continuous evaluations, whether they can handle interval-valued discrete evaluations, and whether they can handle ambiguous discrete evaluations, as shown in Table 11. Only our method performs well in all three aspects, while the other existing MCDM methods based on D-S theory fail to do so. More specifically, Ma et al.'s method deals well with various types of discrete uncertain evaluations, while none of the other methods consider uncertain evaluations in the form of interval values, and no method except ours can handle continuous uncertain evaluations. As a result, the effectiveness and practicality of our approach are demonstrated.

Table 11. Analysis of the comparison of existing MCDM methods based on D-S theory.

We further analyze the number of expert knowledge judgments required by each method. When using the classical AHP, the expert must make pairwise comparisons between all m DAs for each criterion, i.e., m(m − 1)/2 comparisons, each of which relies on the knowledge judgment of the expert. Furthermore, when using the DS/AHP, the expert's knowledge judgment is required to determine the set of focal elements of the mass function for each criterion, where the focal elements containing more than one DA represent the expert's ambiguous judgment of the DAs contained within them. Assuming that the expert determines from their own knowledge judgment that the set of focal elements contains n_1 (1 ≤ n_1 ≤ m) focal elements, all n_1 focal elements must be compared with the set of DAs, i.e., the frame of discernment. Thus, the total number of knowledge judgments required by DS/AHP consists of two parts: the first is the number of knowledge judgments needed to determine the n_1 focal elements among the m DAs, which relies on the expert having reasonable knowledge judgments for all m DAs; the second is the n_1 comparative judgments for comparing the focal elements with the frame of discernment. In order to reduce the number of knowledge judgments, the n_1 obtained by experts based on empirical judgments is much smaller than m. Therefore, DS/AHP yields a significant reduction in the number of knowledge judgments compared with the classical AHP. However, it is demanding and laborious for an expert to determine an appropriate set of n_1 focal elements among m DAs with knowledge judgments when m is very large, and such a determination can be very subjective and arbitrary.
When using the DS-AHP method, experts are first required to make knowledge judgments on the performance of the m DAs; the set of focal elements of the criterion's mass function is then determined by combining all DAs with the same evaluation value into one focal element. Finally, experts are required to make evaluation judgments on each focal element to generate the criterion's mass function. Assuming that the number of focal elements generated by the DS-AHP method is n_2 (1 ≤ n_2 ≤ m), the number of expert knowledge judgments required by the method consists of two parts: m knowledge judgments, one for each DA, and n_2 knowledge judgments, one for each focal element. However, identical evaluation values are rare in the practical application of uncertain MCDM, especially for continuous uncertain evaluations; therefore, in large-scale uncertain MCDM problems, the values of m and n_2 are relatively close. The method of Ma et al. is an extension of DS/AHP; it differs from DS/AHP in generating the mass function for each criterion by requiring an additional knowledge judgment of whether or not the evaluations of the DA groups over the given criterion are complete before assigning a mass value to each focal element.
Our method automatically determines the set of focal elements for each criterion by clustering the uncertain evaluations and then assigns mass values to each focal element by computing the ratios between clusters. Therefore, the number of expert knowledge judgments required by our method is the number of DAs, i.e., m, which is significantly less than the number required by the classical AHP, DS/AHP, DS-AHP, and Ma et al.'s method. In addition, our method is efficient in assigning mass values to each focal element, since the ratios between DAs in the same cluster do not need to be calculated and, according to Theorem 1, the ratios between all DAs to be compared can be determined by calculating the ratio of a single DA to all other DAs. Thus, the efficiency of our method is demonstrated.

Conclusions
Handling large-scale discrete and continuous uncertain evaluations of DAs in real-world MCDM applications is an essential and open issue in the research field of MCDM. In order to address this issue, this paper takes the hierarchical structure of the AHP method as the backbone of our MCDM method. The concepts of D-S theory and probability distribution are then incorporated to represent various discrete and continuous uncertain evaluations. Afterwards, a clustering technique based on the Silhouette coefficient is applied to translate large-scale uncertain evaluations into the mass function of each criterion. Finally, Dempster's combination rule is adopted for preference aggregation. Our method advances the state of the art for the following reasons: First, to the best of our knowledge, our method is the first designed for MCDM problems with large-scale continuous and discrete uncertain evaluations. Second, the idea of applying a clustering method within an MCDM method explores a new path, one that balances precision and efficiency in handling large-scale decision alternatives. Finally, by choosing the AHP, one of the most representative and widely used MCDM methods, as the backbone of our method, we extend the application area of the AHP to real-world MCDM problems with large-scale uncertain evaluations. Moreover, the comparative experiments and analyses with other related methods demonstrate that the proposed method is feasible, effective, and efficient.
In future work, we will apply our method to practical applications to further validate its feasibility. We will also integrate methods for determining criteria weights, such as weighting methods that combine subjective and objective information, into our method, and will further explore how to avoid the rank-reversal paradox. In addition, in this paper, the eigenvector method of the AHP was used to determine the rankings of DAs with respect to each criterion; since many MCDM techniques can perform this task, other techniques, such as TOPSIS and BWM, could be integrated into our approach in place of the AHP. Moreover, our method requires the value interval of the optimal number of clusters to be determined before clustering, which requires the decision maker to make a basic judgment about the overall DAs to improve efficiency; we may therefore incorporate other clustering methods for uncertain data into our method in the future. Finally, extending D-S theory to handle continuous values would be an exciting topic for future research.