Numerical Analysis of Consensus Measures within Groups

Measuring the consensus for a group of ordinal-type responses is of practical importance in decision making. Many consensus measures appear in the literature, but they sometimes provide inconsistent results. Therefore, it is crucial to compare these consensus measures and analyze their relationships. In this study, we targeted five consensus measures: Φe (from entropy), Φ1 (from absolute deviation), Φ2 (from variance), Φ3 (from skewness), and Φmv (from conditional probability). We generated 316,251 probability distributions and analyzed the relationships among their consensus values. Our results showed that Φ1, Φe, Φ2, and Φ3 tended to provide consistent results, and that the ordering Φ1 ≤ Φe ≤ Φ2 ≤ Φ3 held with high probability. Although Φmv was positively correlated with Φ1, Φe, Φ2, and Φ3, it had a much lower tolerance than they did for even a small proportion of extreme opposite opinions.


Introduction
A consensus measure quantifies the degree of consensus in a group's ratings of a target. It has fundamental implications for the group's decision. For example, it can reveal whether the opinions of the group's members are converging during a successive voting process [1], or whether averaging the members' ratings to the group level is appropriate [2]. Because of its practicality, the problem of measuring consensus has received much attention in both academic and applied research [3].
Many consensus measures appear in the literature. Most of them are derived from the deviation of individual ratings from the mean [3,4], while others are based on an extension of entropy [1] or on conditional probability [5]. Because all consensus measures are intended to quantify consensus, one tends to assume that similar conclusions can be drawn using different consensus measures. Although this assumption usually holds, a set of ratings that receives the lowest consensus score under one consensus measure may still receive a very high consensus score under another (see Table 9). It is reasonable that different consensus measures might lead to different conclusions, because they are built on different theoretical concepts. For example, let A_1 and A_2 denote two sets of ratings collected at times t_1 and t_2, with t_1 < t_2. One consensus measure might indicate that the consensus of A_1 is smaller than that of A_2 (i.e., the group members' opinions are converging), while another might yield the opposite conclusion. Therefore, it is crucial to compare these consensus measures in more detail so that one can adequately interpret the meanings of the consensus values.
The objective of this study was to analyze the relationships among different consensus measures so that one can adequately utilize these consensus measures going forward. We first reviewed five consensus measures, and their properties. Then, we took a numerical analysis approach to comparing these consensus measures. This approach proceeded by generating a large number of possible rating distributions, and calculating their consensus scores using each consensus measure. Then, these consensus scores were analyzed to reveal the relationships among these consensus measures. Finally, we discussed how to interpret these consensus scores, and how to select a suitable consensus measure.

Basic Properties of a Consensus Measure
In this paper, we assumed that a rating was an integer in X = {1, 2, ..., n}. For Likert-type scale responses, n = 5 or 7 is often used. Then, the ratings of all group members can be described as a probability distribution p(x) over X. Let p_i denote the probability p(x = i) of getting a rating i. Then, the mean m and the variance V of p(x) are
$$ m = \sum_{i=1}^{n} i\,p_i, \tag{3} $$
$$ V(p) = \sum_{i=1}^{n} p_i\,(i - m)^2. \tag{4} $$
Notably, the rating data are ordinal, and thus, strictly speaking, calculating the mean or variance of p(x) is inappropriate. However, the mean, the variance, or a combination of both has been used extensively in the literature to design consensus measures for ordinal attributes.
Let Φ denote a consensus measure, and Φ(p) denote the consensus score of p(x) based on Φ. It is common to restrict Φ(p) to the interval [0, 1]; this restriction also facilitates comparing different consensus measures. Thus, 0 ≤ Φ(p) ≤ 1, where Φ(p) = 1 and Φ(p) = 0 indicate the maximum and minimum consensus scores, respectively [5]. In this paper, we divided the consensus measures into three categories, as described in the three subsections below.

Deviation-Based Consensus Measures
Deviation-based consensus measures use the absolute deviation of individual ratings from their mean to measure consensus. They mainly differ in the power to which the absolute deviation is raised. In the literature, powers of 1 and 2 have been used to measure consensus; in this study, we extended the power to 3.
The average deviation (AD) [6] is the average absolute difference between each rating and the mean, as shown in Equation (5):
$$ AD(p) = \sum_{i=1}^{n} p_i\,|i - m|. \tag{5} $$
It is a measure of variability, and its range is between 0 and (n − 1)/2, as proven in Corollary 1. Based on AD, we can design a consensus measure Φ1(p) such that 0 ≤ Φ1(p) ≤ 1 (see Definition 1).
Corollary 1. For any probability distribution p(x) over X, 0 ≤ AD(p) ≤ (n − 1)/2.

Proof. See Appendix A.
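Definition 1. Consensus measure
$$ \Phi_1(p) = 1 - \frac{AD(p)}{(n-1)/2}. $$
Similar to AD, variance (V) is also a measure of variability, and is defined as the average of the squared difference between each rating and the mean, as shown in Equation (4). Its range is between 0 and ((n − 1)/2)^2, as proven in Corollary 2.
Corollary 2. For any probability distribution p(x) over X, 0 ≤ V(p) ≤ ((n − 1)/2)^2.
Proof. See Appendix B.
Elzinga et al. [4] designed a consensus measure Φ2(p) based on V (see Definition 2).
Definition 2. Consensus measure
$$ \Phi_2(p) = 1 - \frac{V(p)}{\left((n-1)/2\right)^2}. $$
Raising the power of the absolute deviation to 3 gives S(p), the average of the cubed absolute difference between each rating and the mean:
$$ S(p) = \sum_{i=1}^{n} p_i\,|i - m|^3. $$
Its range is between 0 and ((n − 1)/2)^3, as proven in Corollary 3. A consensus measure Φ3(p) based on S is shown in Definition 3.
Definition 3. Consensus measure
$$ \Phi_3(p) = 1 - \frac{S(p)}{\left((n-1)/2\right)^3}. $$
Corollary 3. For any probability distribution p(x) over X, 0 ≤ S(p) ≤ ((n − 1)/2)^3.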
Proof. See Appendix C.
Essentially, in Φ1(p), Φ2(p), and Φ3(p), raising the power of the absolute deviation increases the impact of ratings further from the mean. An example is given below.
Example 1. Let p(x) be the probability distribution over X = {1, 2, 3, 4, 5} with p_i = 0.2 for each i ∈ X, whose mean is m = 3. A probability distribution q(x) with less consensus, i.e., with more probability further from the mean, is generated from p(x) by shifting 0.05 of probability from x = 4 to x = 5, i.e., q_1 = q_2 = q_3 = 0.2, q_4 = 0.15, and q_5 = 0.25. Table 1 shows AD, V, S, Φ1, Φ2, and Φ3 of p(x) and q(x). The last row of Table 1 indicates that, from p to q, the consensus is reduced by 0.03 with Φ1, 0.03688 with Φ2, and 0.04211 with Φ3. That is, increasing the probability further from the mean has the greatest impact on Φ3, less on Φ2, and the least on Φ1.
Table 1. From p(x) to q(x), the consensus score decreases the most in Φ3, less in Φ2, and the least in Φ1.
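Definitions 1 to 3 and Example 1 can be checked numerically. The Python sketch below is an illustration, not the author's program: the function names (`mean`, `abs_moment`, `phi_dev`) are hypothetical, and S is assumed to be the average cubed absolute deviation Σ p_i |i − m|^3, an assumption that is consistent with the decreases reported in Table 1.

```python
# Sketch (not the author's code): Definitions 1-3 for a distribution p over
# X = {1, ..., n}, given as the list [p_1, ..., p_n].


def mean(p):
    """Equation (3): m = sum of i * p_i."""
    return sum(i * pi for i, pi in enumerate(p, start=1))


def abs_moment(p, power):
    """Average of |i - m| ** power: power 1 gives AD, 2 gives V, 3 gives S (assumed)."""
    m = mean(p)
    return sum(pi * abs(i - m) ** power for i, pi in enumerate(p, start=1))


def phi_dev(p, power):
    """Phi_1, Phi_2 or Phi_3: one minus the moment divided by its maximum ((n - 1)/2) ** power."""
    half_range = (len(p) - 1) / 2
    return 1.0 - abs_moment(p, power) / half_range ** power


# Example 1: uniform p, and q obtained by shifting 0.05 from x = 4 to x = 5.
p = [0.2, 0.2, 0.2, 0.2, 0.2]
q = [0.2, 0.2, 0.2, 0.15, 0.25]
drops = {k: phi_dev(p, k) - phi_dev(q, k) for k in (1, 2, 3)}
for k, d in drops.items():
    print(f"Phi_{k}: {phi_dev(p, k):.6f} -> {phi_dev(q, k):.6f}, decrease {d:.5f}")
```

The three decreases evaluate to 0.03, 0.036875, and 0.0421125, which match the last row of Table 1 after rounding; the normalizing constants ((n − 1)/2)^power are the maxima stated in Corollaries 1 to 3.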

Conditional-Probability-Based Consensus Measure
Corollary 2 shows that the range of the variance V is between 0 and ((n − 1)/2)^2, and the consensus measure Φ2 is constructed based on this range. However, the attainable range of V is a function of the mean m. Specifically, for a given value of m, V lies between (m − ⌊m⌋)(⌊m⌋ + 1 − m) and (m − 1)(n − m), where ⌊m⌋ is the greatest integer ≤ m. This range is small when m approaches either end of the interval [1, n], and large when m approaches the center of the interval. Thus, Akiyama et al. [5] proposed a new consensus measure via the conditional probability p(V|m). Because this consensus measure is calculated using both m and V, we denoted it as Φmv(p) in this paper. Figure 1 shows the steps to calculate Φmv(p) for a probability distribution p(x) over X = {1, 2, 3, 4, 5}.

Figure 1. Steps to calculate Φmv(p) for a probability distribution p(x) over X = {1, 2, 3, 4, 5} (revised from Reference [5]). Input: the mean m and the variance V of p(x).
Table 2 shows some examples of probability distributions p(x) with Φmv(p) = 1 or 0. Unlike for Φ1, Φ2, and Φ3, Φmv(p) = 1 not only occurs when p_k = 1 for some k ∈ X and p_i = 0 for each i ∈ X\{k}, but also in many other cases. The first four examples in Table 2 show that the maximum value of Φmv(p) occurs when all probabilities are distributed on one side of x, and none on the other side. Similarly, Φmv(p) = 0 not only occurs when p_1 = p_n = 0.5 and p_i = 0 for each i ∈ X\{1, n}, but also in many other cases. The last three examples in Table 2 show that a small proportion of extreme opposite opinions can drag Φmv(p) down to zero.
Table 2. Some examples of probability distributions p(x) satisfying Φmv(p) = 1 or 0.
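The dependence of the attainable variance on the mean, described at the start of this subsection, can be illustrated with a short Python sketch. It computes only the two bounds quoted in the text; it does not implement Φmv, whose steps are given in Figure 1, and the helper names (`variance_range`, `mean_and_variance`) are hypothetical.

```python
import math

# Sketch: attainable range of the variance V for a fixed mean m over X = {1, ..., n}.


def variance_range(m, n):
    """Return (lower, upper) bounds on V for distributions over {1..n} with mean m.

    Lower bound: all probability on floor(m) and floor(m) + 1.
    Upper bound: all probability on 1 and n.
    """
    f = m - math.floor(m)
    return f * (1 - f), (m - 1) * (n - m)


def mean_and_variance(p):
    m = sum(i * pi for i, pi in enumerate(p, start=1))
    v = sum(pi * (i - m) ** 2 for i, pi in enumerate(p, start=1))
    return m, v


n = 5
# The range is narrow near the ends of [1, n] and widest at the center.
for m in (1.08, 2.0, 3.0, 3.5, 4.92):
    lo, hi = variance_range(m, n)
    print(f"m = {m:4.2f}: {lo:.4f} <= V <= {hi:.4f} (width {hi - lo:.4f})")

# Example 1 of Table 9: 98% of the probability at x = 1 and 2% at x = 5.
m1, v1 = mean_and_variance([0.98, 0, 0, 0, 0.02])
print(m1, v1, variance_range(m1, n))
```

For Example 1 of Table 9, m = 1.08 and V = 0.3136, which equals the upper bound for that mean: no distribution with the same mean is more dispersed. This is consistent with the Φmv = 0 reported for that example, although the sketch does not compute Φmv itself.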

Entropy-Based Consensus Measure
In the literature, the Shannon entropy equation and its extensions have been used to quantify the diversity of a probability distribution [7]. Given a probability distribution p(x), the Shannon entropy of p(x) is −Σ_{i=1}^{n} p_i ln(p_i), where n is the number of possible values of x, p_i denotes the probability of x = i, and 0·ln(0) is taken as 0. Because diversity appears to be the opposite of consensus, and the Shannon entropy ranges between 0 and ln(n), a consensus measure between 0 and 1 based on the Shannon entropy equation can be defined as follows [1,8]:
$$ \Phi(p) = 1 + \frac{1}{\ln n}\sum_{i=1}^{n} p_i \ln p_i. \tag{7} $$
Notably, the Shannon entropy equation treats the variable x as a nominal variable, not as an ordinal variable; thus, the Shannon entropy equation and Equation (7) are inappropriate for quantifying the consensus of ordinal data, such as Likert-type scale responses. To resolve this problem, Tastle and Wierman [1,8] extended the Shannon entropy equation to define a new consensus measure, denoted as Φe in this paper, as follows:
$$ \Phi_e(p) = 1 + \sum_{i=1}^{n} p_i \log_2\left(1 - \frac{|i - m|}{n - 1}\right), $$
where m is the mean of p(x), as defined in Equation (3). Similar to Φ1(p), Φ2(p), and Φ3(p), the maximum value of Φe(p) only occurs when p_k = 1 for some k ∈ X and p_i = 0 for each i ∈ X\{k}; the minimum value of Φe(p) only occurs when p_1 = p_n = 0.5 and p_i = 0 for each i ∈ X\{1, n}.
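A minimal Python sketch contrasts Equation (7) with Φe, assuming the Tastle–Wierman form Φe(p) = 1 + Σ p_i log2(1 − |i − m|/(n − 1)); the function names are hypothetical. The two test distributions share the same category probabilities, so the nominal measure cannot tell them apart, whereas Φe can.

```python
import math

# Sketch of Equation (7) and of Tastle and Wierman's Phi_e for p over X = {1, ..., n}.


def phi_shannon(p):
    """Equation (7): 1 - H(p) / ln(n), treating x as nominal (0 * ln 0 taken as 0)."""
    return 1.0 + sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(len(p))


def phi_e(p):
    """Assumed form: Phi_e(p) = 1 + sum_i p_i * log2(1 - |i - m| / (n - 1))."""
    n = len(p)
    m = sum(i * pi for i, pi in enumerate(p, start=1))
    return 1.0 + sum(pi * math.log2(1 - abs(i - m) / (n - 1))
                     for i, pi in enumerate(p, start=1) if pi > 0)


adjacent = [0, 0, 0.5, 0.5, 0]   # split between two neighbouring ratings
extremes = [0.5, 0, 0, 0, 0.5]   # split between the two extreme ratings
print(phi_shannon(adjacent), phi_shannon(extremes))  # equal: order is ignored
print(phi_e(adjacent), phi_e(extremes))              # about 0.807 and 0.0
```

Skipping zero-probability terms follows the 0·log 0 = 0 convention and also avoids taking log2(0) for a rating at distance n − 1 from the mean, which can only occur when that rating has zero probability.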

Experiment Setup
Given a probability distribution, the five consensus measures reviewed in Section 2 often yielded different consensus scores, and sometimes the differences among these scores were substantial, and led to opposite conclusions. This phenomenon makes it difficult to interpret the meaning of these scores. In this study, we performed a numerical experiment to analyze the relationships among these five consensus measures.
This experiment used probability distributions p(x) over X = {1, 2, 3, 4, 5}, which is common for Likert-type scale data. Specifically, we wrote a small computer program containing a five-level for loop to generate 316,251 probability distributions, where the i-th level of the for loop changed the value of p_i from 0 to 1 with a step size of 0.02, and cases not satisfying Σ_{i=1}^{5} p_i = 1 were skipped. Thus, these 316,251 probability distributions covered all of the possible probability distributions of p(x) satisfying p_i ∈ {0, 0.02, 0.04, ..., 0.98, 1} for i = 1 to 5, and Σ_{i=1}^{5} p_i = 1. Then, the consensus scores of each generated probability distribution were calculated and compared to study the relationships among the five consensus measures. Table 3 shows the distribution of the mean values of the 316,251 probability distributions. Most of the generated probability distributions had mean values between 2 and 4.
Table 3. The distribution of the mean values of the 316,251 generated probability distributions.

Correlation
Table 4 shows the Kendall rank correlation coefficients between any two consensus measures. As expected, every coefficient was higher than 0.887. That is, if a probability distribution A is ranked higher than another probability distribution B based on one consensus measure, it is very likely that A is also ranked higher than B based on another consensus measure. Let τ(Φi, Φj) denote the Kendall rank correlation coefficient between Φi and Φj. According to Table 4, the lowest correlation occurred at τ(Φ1, Φ3), and the highest at τ(Φ1, Φe).
According to Table 3, only 5.18% and 4.82% of the 316,251 generated probability distributions had their mean values in the intervals [1, 2] and (4, 5], respectively. To check whether high correlation still existed for probability distributions with small or large mean values, we calculated the Kendall rank correlation coefficients using each of these two subsets of probability distributions; the results are shown in Tables 5 and 6. Every value in Tables 5 and 6 was smaller than its corresponding value in Table 4. For example, τ(…) dropped from … in Table 4 to 0.774093 in Table 5 and 0.772132 in Table 6, and τ(Φ1, Φmv) dropped from 0.925708 in Table 4 to 0.785614 in Table 5 and 0.776873 in Table 6.
Table 4. Kendall rank correlation coefficients between consensus measures using all 316,251 probability distributions.
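The grid used in the experiment and the Kendall rank correlation can be sketched in Python as follows. This is an illustration, not the author's program: the generator derives p_5 from the other four values instead of looping and skipping, which yields the same 316,251 distributions, and the tau-b variant of Kendall's coefficient is an assumption, since the text does not say which variant was used. Because the direct pair count is O(N²), the correlation here is computed on a coarser 0.1 grid and will not reproduce Table 4; for the full grid, an O(N log N) routine such as `scipy.stats.kendalltau` would be the practical choice.

```python
import math


def grid_distributions(units=50):
    """All (p_1, ..., p_5) whose entries are multiples of 1/units and sum to 1.

    units=50 gives the step size 0.02 of the experiment; p_5 is derived
    from the other four entries instead of looping over it and skipping.
    """
    for k1 in range(units + 1):
        for k2 in range(units + 1 - k1):
            for k3 in range(units + 1 - k1 - k2):
                for k4 in range(units + 1 - k1 - k2 - k3):
                    k5 = units - k1 - k2 - k3 - k4
                    yield tuple(k / units for k in (k1, k2, k3, k4, k5))


def phi_dev(p, power):
    """Phi_1 (power=1) or Phi_2 (power=2) from Definitions 1 and 2."""
    m = sum(i * pi for i, pi in enumerate(p, start=1))
    moment = sum(pi * abs(i - m) ** power for i, pi in enumerate(p, start=1))
    return 1.0 - moment / ((len(p) - 1) / 2) ** power


def kendall_tau_b(x, y):
    """Kendall tau-b by direct pair counting (O(N^2), fine for about 1000 points)."""
    num = not_tied_x = not_tied_y = 0
    for a in range(len(x)):
        for b in range(a + 1, len(x)):
            dx = (x[a] > x[b]) - (x[a] < x[b])
            dy = (y[a] > y[b]) - (y[a] < y[b])
            num += dx * dy            # +1 concordant, -1 discordant, 0 tied
            not_tied_x += dx * dx
            not_tied_y += dy * dy
    return num / math.sqrt(not_tied_x * not_tied_y)


n_full = sum(1 for _ in grid_distributions(50))
print(n_full)  # 316251

# Coarser 0.1 grid (1001 distributions) so the O(N^2) tau stays fast.
coarse = list(grid_distributions(10))
phi1 = [round(phi_dev(p, 1), 9) for p in coarse]   # rounding turns float near-ties into exact ties
phi2 = [round(phi_dev(p, 2), 9) for p in coarse]
print(len(coarse), kendall_tau_b(phi1, phi2))
```

The count 316,251 is the number of ways to split 50 steps of 0.02 among five ratings, C(54, 4), which confirms that the step size must be 0.02.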

Range of Difference
Although Table 4 shows that a positive correlation existed between any two consensus measures over the 316,251 generated probability distributions, some of the generated probability distributions did not follow this general trend. In this section, we calculated the range of differences between two consensus measures to show that this difference was usually small but sometimes very large. Table 7 shows the mean differences between any two consensus measures over the 316,251 generated probability distributions. All of the mean differences were small (< 0.167): the largest mean difference occurred between Φ1 and Φ3, and the smallest between Φ1 and Φe. These results were consistent with Table 4, where the smallest and the largest correlation coefficients were τ(Φ1, Φ3) and τ(Φ1, Φe), respectively. Table 8 shows the maximum difference between any two consensus measures over the 316,251 generated probability distributions; some of these maximum differences were very large.
For example, the maximum difference between Φmv and each of the other consensus measures was larger than 0.84. Notably, all of the correlation coefficients between Φmv and the other consensus measures were greater than 0.92 (see Table 4), and the mean difference between Φmv and each of the other consensus measures was less than 0.16 (see Table 7). Thus, it is reasonable to infer that, although the difference between Φmv and the other consensus measures was not large for most probability distributions, it could be huge for some. Therefore, it is important to understand for which kinds of probability distributions such large differences between consensus measures occur.
Table 7. Mean differences between any two consensus measures.
The first four examples in Table 9 show some of the generated probability distributions where the maximum differences between two consensus measures occurred. Example 1 had a large proportion (98%) of probability at x = 1, thus yielding high consensus scores under Φ1, Φe, Φ2, and Φ3. However, this large proportion of probability at x = 1 also put the mean m of the distribution close to 1. As discussed in Section 2.3, the range of the variance is small when m approaches either end of the interval [1, n]. Thus, for values of m close to 1, the range of the variance was small, making Φmv very sensitive to even a small proportion of probability at the opposite end of the scale (2% at x = 5 in this example). As a result, Example 1 yielded Φmv = 0. This example was also one of the generated probability distributions with the maximum difference (in Table 8) between Φmv and the other consensus measures.
Table 9. Some examples of the probability distribution p(x), and their consensus scores.
Examples 2 and 3 in Table 9 were similar to Example 1: a large proportion of probability occurred at x = 1, and a small proportion at x = 5. The values of Φmv remained 0 for Examples 2 and 3. However, the difference between p_1 and p_5 decreased from Example 1 through to Example 3, making Φ1, Φe, Φ2, and Φ3 smaller for Examples 2 and 3 than for Example 1. Notably, Example 2 was one of the probability distributions with the maximum difference (in Table 8) between Φ1 and Φe, and Example 3 was one of those with the maximum difference between Φ2 and Φ3.
Example 4 had p_3 = p_5 = 0.5, and yielded the maximum difference (in Table 8) between Φ1 and Φ2, between Φ1 and Φ3, between Φe and Φ2, and between Φe and Φ3. Suppose that the first four examples in Table 9 describe the voting results at four successive stages of a voting process. From Example 1 through to Example 4, the value of Φ1 decreased, indicating that the group's consensus was diverging. However, Φmv led to the opposite conclusion. For Φe, Φ2, and Φ3, the consensus first decreased (from Example 1 through to Example 3), and then increased (from Example 3 to Example 4). Moreover, the differences between the consensus values in Examples 1 and 4 were 0.273596 with Φe, 0.1716 with Φ2, and −0.02565 with Φ3; that is, Φe and Φ2 indicated a net loss of consensus, whereas Φ3 indicated a slight net gain. Thus, using different consensus measures could lead to different conclusions.
A small change in the probability distribution could affect different consensus measures differently. Consider Examples 1, 7, and 6, which differ only in where a small proportion (2%) of probability is located: at x = 5, x = 4, and x = 3, respectively. Although these probability distributions were similar, the value of Φmv was 0 in Example 1, rose modestly to 0.166667 in Example 7, but jumped to 0.833333 in Example 6. In contrast, the values of Φ1, Φe, Φ2, and Φ3 did not change much among these three examples. Notably, probability further from the mean had a greater negative impact on Φ3 than on Φ2 and Φ1. Thus, by moving 2% of probability from x = 5 to x = 4 (i.e., closer to the mean), the ordering of Φ1, Φ2, and Φ3 changed from Φ3 < Φ1 = Φ2 in Example 1 to Φ3 < Φ1 < Φ2 in Example 7. Then, by moving 2% of probability from x = 4 to x = 3, the ordering of Φ1, Φ2, and Φ3 changed to Φ1 < Φ2 < Φ3 in Example 6.
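The ordering of the values of these consensus measures depended on the probability distribution. For Examples 4, 5, and 6, the value of Φmv was the same, but its position in the ordering of the consensus scores differed (e.g., Φ2 < Φmv < Φ3 in Example 4, whereas Φmv was the smallest score in Example 6). In Example 7, Φmv was the smallest among all consensus measures; however, in Example 8, Φmv was the greatest.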
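The Table 9 values quoted above for Examples 1, 4, 6, and 7 can be recomputed from their distributions: 98% at x = 1 with 2% at x = 5, 3, or 4 for Examples 1, 6, and 7, and p_3 = p_5 = 0.5 for Example 4. The Python sketch below assumes Φe(p) = 1 + Σ p_i log2(1 − |i − m|/(n − 1)) and Φ3 = 1 − S/((n − 1)/2)^3 with S the average cubed absolute deviation; the helper name `measures` is hypothetical, and Φmv is not computed.

```python
import math


def measures(p):
    """Return (Phi_1, Phi_e, Phi_2, Phi_3) for p over X = {1, ..., n} (assumed definitions)."""
    n = len(p)
    half = (n - 1) / 2
    m = sum(i * pi for i, pi in enumerate(p, start=1))
    dev = [(pi, abs(i - m)) for i, pi in enumerate(p, start=1) if pi > 0]
    phi1 = 1 - sum(pi * d for pi, d in dev) / half
    phie = 1 + sum(pi * math.log2(1 - d / (n - 1)) for pi, d in dev)
    phi2 = 1 - sum(pi * d ** 2 for pi, d in dev) / half ** 2
    phi3 = 1 - sum(pi * d ** 3 for pi, d in dev) / half ** 3
    return phi1, phie, phi2, phi3


examples = {
    1: [0.98, 0, 0, 0, 0.02],
    4: [0, 0, 0.5, 0, 0.5],
    6: [0.98, 0, 0.02, 0, 0],
    7: [0.98, 0, 0, 0.02, 0],
}
names = ("Phi_1", "Phi_e", "Phi_2", "Phi_3")
scores = {k: measures(p) for k, p in examples.items()}
for k, s in scores.items():
    print(f"Example {k}: " + ", ".join(f"{nm} = {v:.6f}" for nm, v in zip(names, s)))

# Example 1 minus Example 4, for Phi_e, Phi_2 and Phi_3.
diffs = [scores[1][i] - scores[4][i] for i in (1, 2, 3)]
print(diffs)
```

The Example 1 minus Example 4 differences come out as about 0.273597, 0.1716, and −0.025653 for Φe, Φ2, and Φ3, in line with the values quoted above, and the orderings are Φ3 < Φ1 = Φ2 (Example 1), Φ3 < Φ1 < Φ2 (Example 7), and Φ1 < Φ2 < Φ3 (Example 6).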

Ordering
From the examples in Table 9, it appeared that no fixed ordering existed among the consensus scores calculated using different consensus measures. Figure 2 shows the distributions of consensus scores of the 316,251 probability distributions generated in this experiment. The distributions of consensus scores based on Φ1, Φe, Φ2, and Φ3 were similar, but were very different from the distribution of consensus scores based on Φmv. For consensus values close to 1, the ordering of the probabilities among Φ1, Φe, Φ2, and Φ3 was Φ1 < Φe < Φ2 < Φ3, but for consensus values close to 0, this ordering changed (see Figure 2).
In Table 10, we compared the consensus scores of the 316,251 generated probability distributions, and calculated the probability that the score based on one consensus measure was less than or equal to the score based on another consensus measure. According to Table 10, Φe ≤ Φ2 and Φ2 ≤ Φ3, as well as Φ1 ≤ Φe, held with very high probabilities. Thus, Φ1 ≤ Φe ≤ Φ2 ≤ Φ3 was the most probable ordering among the scores based on these four consensus measures. The orderings between Φmv and Φ1 or Φe were not apparent: Φ1 ≤ Φmv and Φe ≤ Φmv held with only 58.12% and 52.04% probability, respectively. Finally, Φ2 > Φmv and Φ3 > Φmv were likely to occur, because Φ2 ≤ Φmv and Φ3 ≤ Φmv held with only 36.84% and 28.01% probability, respectively.
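The Table 10 proportions for Φ1, Φe, Φ2, and Φ3 can be estimated by a direct count over the 0.02 grid. The Python sketch below assumes Φe(p) = 1 + Σ p_i log2(1 − |i − m|/(n − 1)) and Φ3 = 1 − S/((n − 1)/2)^3 with S the average cubed absolute deviation; its helper names are hypothetical. It omits Φmv, whose algorithm is only summarized in Figure 1, and it counts a pair as satisfying "≤" up to a 1e-12 tolerance so that exact ties are not lost to floating-point rounding.

```python
import math


def grid_distributions(units=50):
    """All (p_1, ..., p_5) with entries in multiples of 1/units summing to 1 (step 0.02 by default)."""
    for k1 in range(units + 1):
        for k2 in range(units + 1 - k1):
            for k3 in range(units + 1 - k1 - k2):
                for k4 in range(units + 1 - k1 - k2 - k3):
                    k5 = units - k1 - k2 - k3 - k4
                    yield tuple(k / units for k in (k1, k2, k3, k4, k5))


def measures(p):
    """Return (Phi_1, Phi_e, Phi_2, Phi_3) for p over X = {1, ..., n} (assumed definitions)."""
    n = len(p)
    half = (n - 1) / 2
    m = sum(i * pi for i, pi in enumerate(p, start=1))
    dev = [(pi, abs(i - m)) for i, pi in enumerate(p, start=1) if pi > 0]
    phi1 = 1 - sum(pi * d for pi, d in dev) / half
    phie = 1 + sum(pi * math.log2(1 - d / (n - 1)) for pi, d in dev)
    phi2 = 1 - sum(pi * d ** 2 for pi, d in dev) / half ** 2
    phi3 = 1 - sum(pi * d ** 3 for pi, d in dev) / half ** 3
    return phi1, phie, phi2, phi3


NAMES = ("Phi_1", "Phi_e", "Phi_2", "Phi_3")
PAIRS = [(0, 1), (1, 2), (2, 3), (0, 2)]
EPS = 1e-12  # scores equal up to rounding count as ties

total = 0
hits = dict.fromkeys(PAIRS, 0)
for p in grid_distributions():
    s = measures(p)
    total += 1
    for a, b in PAIRS:
        hits[(a, b)] += s[a] <= s[b] + EPS
for a, b in PAIRS:
    print(f"P({NAMES[a]} <= {NAMES[b]}) = {hits[(a, b)] / total:.4f}")
```

In this count, Φ1 ≤ Φ2 holds for every distribution, consistent with Figure 3b. This also follows analytically: because the deviations i − m sum to zero, V = Σ p_i (i − m)(i − c) for c = (n + 1)/2, and |i − c| ≤ (n − 1)/2 gives V ≤ ((n − 1)/2)·AD, i.e., Φ2 ≥ Φ1.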

Relationships
To visually inspect the relationships among different consensus measures, we plotted the consensus values of the 316,251 generated probability distributions in two-dimensional (2D) scatter charts. Figure 3 shows the scatter charts of Φ1 scores versus scores based on the other consensus measures, where the red dashed lines represent equality between two consensus scores. As expected, a positively correlated trend existed. No fixed ordering existed between Φ1 and the other consensus measures, except that Φ1 ≤ Φ2 always held, as shown in Figure 3b. According to Figure 3a-c, as the value of Φ1 approached 0 or 1, the ranges of Φe, Φ2, and Φ3 narrowed, indicating that the maximum differences between Φ1 and Φe, Φ2, and Φ3 decreased. However, when the value of Φ1 approached 0.5, the ranges of Φe, Φ2, and Φ3 widened, indicating that the maximum differences between Φ1 and Φe, Φ2, and Φ3 also increased. Furthermore, the maximum difference between Φ1 and Φe was smaller than both the maximum difference between Φ1 and Φ2 and that between Φ1 and Φ3. Figure 3d shows that, for Φ1 < 1, as the value of Φ1 increased, the range of Φmv widened, and the maximum difference between Φ1 and Φmv became huge. For any probability distribution satisfying Φ1 = 1, its Φmv was also 1. However, for a probability distribution satisfying Φmv = 1, its value of Φ1 was not necessarily 1. In fact, there were only n probability distributions satisfying Φ1 = 1, namely those with p_k = 1 for some k ∈ X and p_i = 0 for each i ∈ X\{k} (this statement also applies to Φe, Φ2, and Φ3). However, there were many probability distributions satisfying Φmv = 1 (see Table 2 for examples).
Figure 4 shows the scatter charts of the consensus scores based on Φe, Φ2, Φ3, and Φmv. No fixed ordering existed among these consensus measures, except that Φe ≤ Φ2 always held, as shown in Figure 4a. According to Figure 4a,b,d, for Φe, Φ2, and Φ3, as the value of one consensus measure approached either end of the interval [0, 1], the range of another consensus measure narrowed. According to Figure 4a,b, the maximum difference between Φe and Φ2 was smaller than that between Φe and Φ3. According to Figures 3b and 4a,d, the maximum difference between Φ2 and Φe was smaller than those between Φ2 and Φ1 and between Φ2 and Φ3. Figure 4c,e,f show a similar pattern to Figure 3d: as the value of Φe (or Φ2, Φ3) increased (before reaching 1), the range of Φmv widened, and the maximum difference between Φe (or Φ2, Φ3) and Φmv became huge.

Discussions
Given a probability distribution, using different consensus measures often yields different consensus scores. If there were a fixed ordering among these scores, then consistent results could be drawn using different consensus measures. Unfortunately, such an ordering depends on the given probability distribution. However, according to Table 10, the ordering Φ1 ≤ Φe ≤ Φ2 ≤ Φ3 held with high probability. Because there exists no fixed ordering among consensus scores based on different consensus measures, it is crucial to know the relationships among the consensus measures. Figures 3 and 4 revealed that, for Φ1, Φe, Φ2, and Φ3, as the value of one consensus measure approached either end of the interval [0, 1], the ranges of the other consensus measures narrowed. Thus, one can expect smaller differences among Φ1, Φe, Φ2, and Φ3 for consensus scores close to 0 or 1 than for consensus scores close to 0.5.
According to Figures 3d and 4c,e,f, the range of Φmv increased rapidly as the value of Φ1, Φe, Φ2, or Φ3 increased. Thus, Φmv often gave results inconsistent with those from Φ1, Φe, Φ2, and Φ3, especially when the value of Φ1, Φe, Φ2, or Φ3 was large. Viewed from the other direction, the ranges of Φ1, Φe, Φ2, and Φ3 narrowed as the value of Φmv increased. Notably, Φmv tended to give low scores to probability distributions in which some probability lay at the end of the scale opposite to where most of the probability was concentrated (as in Examples 1 to 3 of Table 9). Thus, for values of Φmv close to zero, one should also check the values of Φ1, Φe, Φ2, and Φ3 for possibly inconsistent results.
Choosing a consensus measure remains a task for the users. If one has a low tolerance for even a small proportion of extreme opposite opinions, then Φmv is a good choice. Otherwise, the other consensus measures tend to provide consistent results. If one prefers to emphasize the opinions further from the mean, then Φ3 is a good choice. Otherwise, either Φ1 or Φe can be used, both yielding similar results. Finally, Φ2 provides a middle ground between Φ1 and Φ3.

Conclusions
An understanding of the characteristics of consensus measures helps users interpret results. For example, according to Figure 3b, Φ1 tended to yield a smaller consensus score than Φ2 for the same probability distribution; thus, a probability distribution A with Φ1(A) = 0.6 might have more consensus than another probability distribution B with Φ2(B) = 0.7, even though Φ1(A) < Φ2(B).
In essence, two opposing forces shape the design of a consensus measure: the force of obeying the majority, and the force of respecting the minority. Consensus measure Φe stresses the former, so the opinion of the minority has a weak impact on its consensus scores. In contrast, Φmv emphasizes the latter, and the opinion of the minority substantially influences its consensus scores, as shown in the first four examples in Table 9.
Deviation-based consensus measures (i.e., Φ1, Φ2, and Φ3) allow users to adjust the strengths of these two forces. As described in Section 2.2, raising the power of the absolute deviation in the deviation-based consensus measures increases the impact of ratings further from the mean. Intuitively, unless the probabilities of all opinions are distributed evenly on opposite sides of the mean (e.g., p_1 = p_n = 0.5), ratings further from the mean represent the opinions of the minority. Thus, going from Φ1 through to Φ3, the impact of the minority increases. Overall, fine-tuning the balance between the force of obeying the majority and the force of respecting the minority gives a consensus measure more flexibility for various situations, and is a research direction worth exploring.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A
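In this section, we derived the range of AD(p), where p is a probability distribution over X = {1, 2, ..., n} with mean m. Lemma 1 shows that, by moving each p_i with i ≤ m gradually toward p_1, the AD of the resulting distribution keeps increasing. Similarly, Lemma 2 shows that, by moving each p_i with i > m gradually toward p_n, the AD of the resulting distribution also keeps increasing.
Lemma 1. Let p(x) and q(x) be two probability distributions over X = {1, 2, ..., n}, let p_i and q_i denote p(x = i) and q(x = i), respectively, and let p_i < 1 and q_i < 1 for each i ∈ X. Let m = Σ_{i=1}^{n} i·p_i, and let k denote the greatest integer satisfying 1 < k ≤ m and p_k > 0. If q_{k−1} = p_k + p_{k−1}, q_k = 0, and q_i = p_i for each i ∈ X\{k − 1, k}, then AD(q) > AD(p).

Proof. By Equation (3), the mean of q(x) is m' = m − p_k. Because m ≥ k and p_k < 1, m' > k − 1.
Let j denote the smallest integer such that m < j and p_j > 0. Then, p_i = 0 for k + 1 ≤ i ≤ j − 1, and q_i = 0 for k ≤ i ≤ j − 1. Thus, under both p(x) and q(x), the ratings with positive probability above the respective mean are exactly those with i ≥ j, and q_i = p_i for i ≥ j. Because the deviations from the mean sum to zero, AD equals twice the total deviation above the mean, so
$$ AD(q) = 2\sum_{i=j}^{n} p_i\,(i - m') = 2\sum_{i=j}^{n} p_i\,(i - m) + 2p_k\sum_{i=j}^{n} p_i = AD(p) + 2p_k\sum_{i=j}^{n} p_i > AD(p), $$
where the last sum is positive because otherwise all probability would lie at or below k ≤ m, forcing p_k = 1.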

Lemma 2.
Let p(x) and q(x) be two probability distributions over X = {1, 2, ..., n}, let p_i and q_i denote p(x = i) and q(x = i), respectively, and let p_i < 1 and q_i < 1 for each i ∈ X. Let m = Σ_{i=1}^{n} i·p_i, and let j denote the smallest integer satisfying m < j < n and p_j > 0. If q_j = 0, q_{j+1} = p_j + p_{j+1}, and q_i = p_i for each i ∈ X\{j, j + 1}, then AD(q) > AD(p).

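Proof. By Equation (3), the mean of q(x) is m' = m + p_j.
Let k denote the greatest integer such that 1 < k ≤ m and p_k > 0. Then, p_i = 0 for k + 1 ≤ i ≤ j − 1, and q_i = 0 for k + 1 ≤ i ≤ j. Also, 0 < p_j < 1, k ≤ m, and m < j yield k ≤ m < m' < m + 1 < j + 1. Thus, under both p(x) and q(x), the ratings with positive probability at or below the respective mean are exactly those with i ≤ k, and q_i = p_i for i ≤ k. Because AD equals twice the total deviation below the mean,
$$ AD(q) = 2\sum_{i=1}^{k} p_i\,(m' - i) = 2\sum_{i=1}^{k} p_i\,(m - i) + 2p_j\sum_{i=1}^{k} p_i = AD(p) + 2p_j\sum_{i=1}^{k} p_i > AD(p). $$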
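Lemmas 3 and 4 were used to derive the upper bound of AD in Corollary 1.
Lemma 3. For any probability distribution p(x) over X, there exists a probability distribution q(x) over X with q_i = 0 for each i ∈ X\{1, n} such that AD(q) ≥ AD(p).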
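Proof. First, consider the trivial case of p_i = 1 for some i ∈ X. Let q_1 = 1; then AD(q) = AD(p) = 0 holds, obviously. Next, consider the case of p_i < 1 for each i ∈ X. Let m = Σ_{i=1}^{n} i·p_i denote the mean of p(x), k denote the greatest integer satisfying 1 < k ≤ m and p_k > 0, and j denote the smallest integer satisfying m < j < n and p_j > 0. We can generate a new distribution q(x) by repeatedly applying Lemma 1 to move each p_i with i ≤ k gradually toward p_1, and by repeatedly applying Lemma 2 to move each p_i with i ≥ j gradually toward p_n. As a result, q_1 = Σ_{i=1}^{k} p_i, q_n = Σ_{i=j}^{n} p_i, q_i = 0 for each i ∈ X\{1, n}, and AD(q) > AD(p).
Lemma 4. If p_i = 0 for each i ∈ X\{1, n}, then AD(p) ≤ (n − 1)/2, with equality if and only if p_1 = p_n = 0.5.
Proof. In this case, m = 1 + (n − 1)p_n, so AD(p) = p_1(m − 1) + p_n(n − m) = 2p_1p_n(n − 1). Because p_1 + p_n = 1, p_1p_n ≤ 1/4, with equality if and only if p_1 = p_n = 0.5.
Corollary 1. For any probability distribution p(x) over X, 0 ≤ AD(p) ≤ (n − 1)/2.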
Proof. The upper bound $(n-1)/2$ follows directly from Lemmas 3 and 4 and is attained when $p_1 = p_n = 0.5$. The lower bound 0 follows from the definition of AD(p) in Equation (5) and is attained when $p_i = 1$ for some $i \in X$.
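For a distribution supported on $\{1, n\}$ with $p_1 = a$, a direct computation gives $\mathrm{AD} = 2a(1 - a)(n - 1)$. This is maximized at $a = 0.5$, where it equals $(n-1)/2$. The sketch below scans a grid of $a$ values and spot-checks the bound on random distributions. It again assumes the mean-absolute-deviation form of AD(p), and its helper names are hypothetical.

```python
import random
from fractions import Fraction as F


def mean(p):
    """Mean of p over X = {1, ..., n}; p[i - 1] holds p_i."""
    return sum(i * pi for i, pi in enumerate(p, start=1))


def ad(p):
    """Absolute deviation about the mean (assumed form of AD(p))."""
    m = mean(p)
    return sum(pi * abs(i - m) for i, pi in enumerate(p, start=1))


def two_point(n, a):
    """Distribution with p_1 = a, p_n = 1 - a, and zero mass elsewhere (n >= 2)."""
    return [a] + [F(0)] * (n - 2) + [1 - a]


def random_dist(n, rng):
    """Random rational distribution on {1, ..., n} built from integer weights 0..5."""
    w = [rng.randint(0, 5) for _ in range(n)]
    if sum(w) == 0:
        w[0] = 1
    s = sum(w)
    return [F(x, s) for x in w]


n = 5
grid = [F(t, 100) for t in range(101)]
# The largest (AD, a) pair over the grid; the maximum is unique, at a = 1/2.
best = max((ad(two_point(n, a)), a) for a in grid)
print(best)  # prints: (Fraction(2, 1), Fraction(1, 2))
```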

Appendix B
In this section, we derived the range of V(p), where p is a probability distribution over X = {1, 2, ..., n} with mean m. The proof follows similar steps to those in Appendix A.

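Lemma 5. Let $p(x)$ and $q(x)$ be two probability distributions over $X = \{1, 2, \ldots, n\}$, let $p_i$ and $q_i$ denote $p(x = i)$ and $q(x = i)$, respectively, and let $p_i < 1$ and $q_i < 1$ for each $i \in X$. Let $m = \sum_{i=1}^{n} i p_i$, and let $k$ denote the greatest integer satisfying $1 < k \le m$ and $p_k > 0$. If $q_{k-1} = p_k + p_{k-1}$, $q_k = 0$, and $q_i = p_i$ for each $i \in X \setminus \{k-1, k\}$, then $V(q) > V(p)$.

Proof. The mean of $q(x)$ is $m' = m - p_k$. Let $j$ denote the smallest integer such that $m < j$ and $p_j > 0$. Then, $p_i = 0$ for $k + 1 \le i \le j - 1$, and $q_i = 0$ for $k \le i \le j - 1$. Thus, using $\sum_{i=1}^{n} q_i (i - m')^2 = \sum_{i=1}^{n} q_i (i - m)^2 - (m - m')^2$, we obtain

$$V(q) - V(p) = p_k \left[(m - k + 1)^2 - (m - k)^2\right] - p_k^2 = p_k \left(2(m - k) + 1 - p_k\right) > 0,$$

because $m \ge k$ and $0 < p_k < 1$.

Lemma 6. Let $p(x)$ and $q(x)$ be two probability distributions over $X = \{1, 2, \ldots, n\}$, let $p_i$ and $q_i$ denote $p(x = i)$ and $q(x = i)$, respectively, and let $p_i < 1$ and $q_i < 1$ for each $i \in X$. Let $m = \sum_{i=1}^{n} i p_i$, and let $j$ denote the smallest integer satisfying $m < j < n$ and $p_j > 0$. If $q_j = 0$, $q_{j+1} = p_j + p_{j+1}$, and $q_i = p_i$ for each $i \in X \setminus \{j, j+1\}$, then $V(q) > V(p)$.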
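Proof. The mean of $q(x)$ is $m' = m + p_j$. Using $\sum_{i=1}^{n} q_i (i - m')^2 = \sum_{i=1}^{n} q_i (i - m)^2 - (m' - m)^2$, we obtain

$$V(q) - V(p) = p_j \left[(j + 1 - m)^2 - (j - m)^2\right] - p_j^2 = p_j \left(2(j - m) + 1 - p_j\right) > 0,$$

because $j > m$ and $0 < p_j < 1$.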
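Lemmas 5 and 6 can be spot-checked numerically. The sketch assumes that Equation (6) defines V(p) as $\sum_i p_i (i - m)^2$. It compares the observed gains with $p_k(2(m-k) + 1 - p_k)$ and $p_j(2(j-m) + 1 - p_j)$, which follow from the identity $\sum_i q_i (i - m')^2 = \sum_i q_i (i - m)^2 - (m - m')^2$. All helper names are hypothetical.

```python
import random
from fractions import Fraction as F


def mean(p):
    """Mean of p over X = {1, ..., n}; p[i - 1] holds p_i."""
    return sum(i * pi for i, pi in enumerate(p, start=1))


def var(p):
    """Variance about the mean (assumed form of V(p))."""
    m = mean(p)
    return sum(pi * (i - m) ** 2 for i, pi in enumerate(p, start=1))


def move(p, src, dst):
    """Return a copy of p with all mass at x = src shifted to x = dst."""
    q = list(p)
    q[dst - 1] += q[src - 1]
    q[src - 1] = F(0)
    return q


def random_dist(n, rng):
    """Random rational distribution on {1, ..., n} built from integer weights 0..5."""
    w = [rng.randint(0, 5) for _ in range(n)]
    if sum(w) == 0:
        w[0] = 1
    s = sum(w)
    return [F(x, s) for x in w]


def check_lemmas_5_6(p):
    """True if each applicable move raises V by the hand-derived amount."""
    n, m = len(p), mean(p)
    if max(p) == 1:
        return True  # both lemmas assume p_i < 1 for every i
    ok = True
    ks = [i for i in range(2, n + 1) if i <= m and p[i - 1] > 0]
    if ks:  # Lemma 5: all mass at k moves to k - 1
        k = max(ks)
        pk = p[k - 1]
        gain = var(move(p, k, k - 1)) - var(p)
        ok = ok and gain > 0 and gain == pk * (2 * (m - k) + 1 - pk)
    js = [i for i in range(1, n) if i > m and p[i - 1] > 0]
    if js:  # Lemma 6: all mass at j moves to j + 1
        j = min(js)
        pj = p[j - 1]
        gain = var(move(p, j, j + 1)) - var(p)
        ok = ok and gain > 0 and gain == pj * (2 * (j - m) + 1 - pj)
    return ok


p = [F(1, 5), F(1, 2), F(3, 10)]
print(var(p), var(move(p, 2, 1)))  # prints: 49/100 21/25
```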
Proof. First, consider the trivial case in which $p_i = 1$ for some $i \in X$. Letting $q_1 = 1$ gives $V(q) = V(p) = 0$. Next, consider the case in which $p_i < 1$ for each $i \in X$. Let $m = \sum_{i=1}^{n} i p_i$ denote the mean of $p(x)$, let $k$ denote the greatest integer satisfying $1 < k \le m$ and $p_k > 0$, and let $j$ denote the smallest integer satisfying $m < j < n$ and $p_j > 0$. We can generate a new distribution $q(x)$ by repeatedly applying Lemma 5 to move each $p_i$ with $i \le k$ gradually toward $p_1$, and by repeatedly applying Lemma 6 to move each $p_i$ with $i \ge j$ gradually toward $p_n$. As a result, $q_1 = \sum_{i \le m} p_i$, $q_n = \sum_{i > m} p_i$, $q_i = 0$ for each $i \in X \setminus \{1, n\}$, and $V(q) \ge V(p)$. The inequality is strict whenever $k$ or $j$ exists, that is, whenever at least one of Lemmas 5 and 6 is applied.
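If $\delta = 0$, then $p_1 = p_n = 0.5$. Use $V_0$ to denote the value of $V(p)$ at $\delta = 0$. Every distribution supported on $\{1, n\}$ satisfies $V(p) = p_1 p_n (n-1)^2$. Then, $V(p) \le V_0 = (n-1)^2/4$.

Proof. The upper bound $(n-1)^2/4$ follows directly from Lemmas 7 and 8 and is attained when $p_1 = p_n = 0.5$. The lower bound 0 follows from the definition of $V(p)$ in Equation (6) and is attained when $p_i = 1$ for some $i \in X$.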
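For a distribution on $\{1, n\}$ with $p_1 = a$, the variance is $a(1-a)(n-1)^2$, so the bound $(n-1)^2/4$ is reached at $a = 0.5$. The sketch below checks this closed form and spot-checks the bound on random distributions. It assumes that V(p) is the variance $\sum_i p_i (i - m)^2$, and its helper names are hypothetical.

```python
import random
from fractions import Fraction as F


def mean(p):
    """Mean of p over X = {1, ..., n}; p[i - 1] holds p_i."""
    return sum(i * pi for i, pi in enumerate(p, start=1))


def var(p):
    """Variance about the mean (assumed form of V(p))."""
    m = mean(p)
    return sum(pi * (i - m) ** 2 for i, pi in enumerate(p, start=1))


def two_point(n, a):
    """Distribution with p_1 = a, p_n = 1 - a, and zero mass elsewhere (n >= 2)."""
    return [a] + [F(0)] * (n - 2) + [1 - a]


def random_dist(n, rng):
    """Random rational distribution on {1, ..., n} built from integer weights 0..5."""
    w = [rng.randint(0, 5) for _ in range(n)]
    if sum(w) == 0:
        w[0] = 1
    s = sum(w)
    return [F(x, s) for x in w]


print(var(two_point(5, F(1, 2))))  # prints: 4
```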

Appendix C
In this section, we derived the range of S(p), where $p$ is a probability distribution over $X = \{1, 2, \ldots, n\}$ with mean $m$. First, Lemma 9 is used to split the probability at $x = j$ into the probabilities at $x = 1$ and at $x = \lfloor m \rfloor$ for $1 < j < \lfloor m \rfloor$. We can repeatedly apply Lemma 9 until $p_j = 0$ for $1 < j < \lfloor m \rfloor$, which yields a new probability distribution $q$ such that $S(q) > S(p)$.
Lemma 9. Let $p(x)$ be a probability distribution over $X = \{1, 2, \ldots, n\}$. Let $m = \sum_{i=1}^{n} i p_i$ and $k = \lfloor m \rfloor$. If there exists $p_j > 0$ where $1 < j < k$, then $S(q) > S(p)$, where $q(x)$ is a probability distribution over $X$ with $q_1 = p_1 + \frac{k-j}{k-1} p_j$, $q_j = 0$, $q_k = p_k + \frac{j-1}{k-1} p_j$, and $q_i = p_i$ for $i \in X \setminus \{1, j, k\}$.

Proof. By Equation (3), the mean of $q(x)$ is also $m$.
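The definition of S(p) does not appear in this section. The Python sketch below therefore assumes, purely for illustration, that S(p) is the third absolute central moment $\sum_i p_i |i - m|^3$. The function `s3` and all other names are hypothetical. Under that assumption, the sketch applies the Lemma 9 split with $k = \lfloor m \rfloor$ and checks that the mean is unchanged and that S increases.

```python
import math
import random
from fractions import Fraction as F


def mean(p):
    """Mean of p over X = {1, ..., n}; p[i - 1] holds p_i."""
    return sum(i * pi for i, pi in enumerate(p, start=1))


def s3(p):
    """HYPOTHETICAL stand-in for S(p): third absolute central moment."""
    m = mean(p)
    return sum(pi * abs(i - m) ** 3 for i, pi in enumerate(p, start=1))


def lemma9_split(p, j):
    """Split p_j (1 < j < k, k = floor(m)) between x = 1 and x = k, keeping the mean."""
    k = math.floor(mean(p))
    if not (1 < j < k and p[j - 1] > 0):
        raise ValueError("Lemma 9 does not apply at this j")
    q = list(p)
    q[0] += F(k - j, k - 1) * p[j - 1]
    q[k - 1] += F(j - 1, k - 1) * p[j - 1]
    q[j - 1] = F(0)
    return q


def random_dist(n, rng):
    """Random rational distribution on {1, ..., n} built from integer weights 0..5."""
    w = [rng.randint(0, 5) for _ in range(n)]
    if sum(w) == 0:
        w[0] = 1
    s = sum(w)
    return [F(x, s) for x in w]


def all_splits_ok(p):
    """True if every valid Lemma 9 split keeps the mean and raises s3."""
    k = math.floor(mean(p))
    return all(mean(q) == mean(p) and s3(q) > s3(p)
               for j in range(2, k) if p[j - 1] > 0
               for q in [lemma9_split(p, j)])


p = [F(0), F(1, 4), F(0), F(1, 4), F(1, 2)]
q = lemma9_split(p, 2)
print(q == [F(1, 6), 0, 0, F(1, 3), F(1, 2)], s3(p), s3(q))  # prints: True 5/2 5
```

Under this assumed S, the increase follows from convexity: $(m - x)^3$ is strictly convex for $x \le m$, so spreading $p_j$ toward $x = 1$ and $x = k$ without moving the mean raises the sum.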
Similar to Lemma 9, Lemma 10 is used to split the probability at $x = j$ into the probabilities at $x = \lceil m \rceil$ and at $x = n$ for $\lceil m \rceil < j < n$. We can repeatedly apply Lemma 10 until $p_j = 0$ for $\lceil m \rceil < j < n$, which yields a new probability distribution $q$ such that $S(q) > S(p)$.
Lemma 10. Let $p(x)$ be a probability distribution over $X = \{1, 2, \ldots, n\}$. Let $m = \sum_{i=1}^{n} i p_i$ and $k = \lceil m \rceil$. If there exists $p_j > 0$ where $k < j < n$, then $S(q) > S(p)$, where $q(x)$ is a probability distribution over $X$ with $q_k = p_k + \frac{n-j}{n-k} p_j$, $q_j = 0$, $q_n = p_n + \frac{j-k}{n-k} p_j$, and $q_i = p_i$ for $i \in X \setminus \{k, j, n\}$.

Proof. By Equation (3), the mean of $q(x)$ is also $m$.
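The mirrored split of Lemma 10, with $k = \lceil m \rceil$, can be checked the same way. The check uses the same hypothetical assumption that S(p) is $\sum_i p_i |i - m|^3$, and the helper names are hypothetical as well.

```python
import math
import random
from fractions import Fraction as F


def mean(p):
    """Mean of p over X = {1, ..., n}; p[i - 1] holds p_i."""
    return sum(i * pi for i, pi in enumerate(p, start=1))


def s3(p):
    """HYPOTHETICAL stand-in for S(p): third absolute central moment."""
    m = mean(p)
    return sum(pi * abs(i - m) ** 3 for i, pi in enumerate(p, start=1))


def lemma10_split(p, j):
    """Split p_j (k < j < n, k = ceil(m)) between x = k and x = n, keeping the mean."""
    n, k = len(p), math.ceil(mean(p))
    if not (k < j < n and p[j - 1] > 0):
        raise ValueError("Lemma 10 does not apply at this j")
    q = list(p)
    q[k - 1] += F(n - j, n - k) * p[j - 1]
    q[n - 1] += F(j - k, n - k) * p[j - 1]
    q[j - 1] = F(0)
    return q


def random_dist(n, rng):
    """Random rational distribution on {1, ..., n} built from integer weights 0..5."""
    w = [rng.randint(0, 5) for _ in range(n)]
    if sum(w) == 0:
        w[0] = 1
    s = sum(w)
    return [F(x, s) for x in w]


def all_splits_ok(p):
    """True if every valid Lemma 10 split keeps the mean and raises s3."""
    n, k = len(p), math.ceil(mean(p))
    return all(mean(q) == mean(p) and s3(q) > s3(p)
               for j in range(k + 1, n) if p[j - 1] > 0
               for q in [lemma10_split(p, j)])


p = [F(1, 2), F(1, 4), F(0), F(1, 4), F(0)]
q = lemma10_split(p, 4)
print(q == [F(1, 2), F(1, 3), 0, 0, F(1, 6)], s3(p), s3(q))  # prints: True 5/2 5
```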
Lemmas 11, 12, and 13 are used to split the probabilities at $x = m$ (when $m \in X$), $x = \lfloor m \rfloor$, and $x = \lceil m \rceil$, respectively, into $x = 1$ and $x = n$.
Lemma 11. Let $p(x)$ be a probability distribution over $X = \{1, 2, \ldots, n\}$. Let $m = \sum_{i=1}^{n} i p_i$. If $m \in X$ and $p_m > 0$, then $S(q) > S(p)$, where $q(x)$ is a probability distribution over $X$ with $q_1 = p_1 + \frac{n-m}{n-1} p_m$, $q_m = 0$, $q_n = p_n + \frac{m-1}{n-1} p_m$, and $q_i = p_i$ for $i \in X \setminus \{1, m, n\}$.
Proof. By Equation (3), the mean of q(x) is also m.
Lemma 13. Let $p(x)$ be a probability distribution over $X = \{1, 2, \ldots, n\}$, $m = \sum_{i=1}^{n} i p_i$, and $k = \lceil m \rceil$. If $m < k < n$ and $p_k > 0$, then $S(q) > S(p)$, where $q(x)$ is a probability distribution over $X$ with $q_1 = p_1 + \frac{n-k}{n-1} p_k$, $q_k = 0$, $q_n = p_n + \frac{k-1}{n-1} p_k$, and $q_i = p_i$ for $i \in X \setminus \{1, k, n\}$.
Proof. By Equation (3), the mean of q(x) is also m.
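Lemmas 11 and 13 both move the mass at a single interior point $x$ to the endpoints, with weights $\frac{n-x}{n-1}$ and $\frac{x-1}{n-1}$. This keeps the mean fixed because $\frac{n-x}{n-1} \cdot 1 + \frac{x-1}{n-1} \cdot n = x$. The sketch below implements that split once and applies it at $x = m$ (Lemma 11) and at $x = \lceil m \rceil$ (Lemma 13). It again uses the hypothetical assumption that S(p) is $\sum_i p_i |i - m|^3$, and the helper names are hypothetical.

```python
import math
from fractions import Fraction as F


def mean(p):
    """Mean of p over X = {1, ..., n}; p[i - 1] holds p_i."""
    return sum(i * pi for i, pi in enumerate(p, start=1))


def s3(p):
    """HYPOTHETICAL stand-in for S(p): third absolute central moment."""
    m = mean(p)
    return sum(pi * abs(i - m) ** 3 for i, pi in enumerate(p, start=1))


def split_to_endpoints(p, x):
    """Move p_x (1 < x < n) to x = 1 and x = n with weights (n-x)/(n-1) and (x-1)/(n-1)."""
    n = len(p)
    if not (1 < x < n and p[x - 1] > 0):
        raise ValueError("the split needs 1 < x < n and p_x > 0")
    q = list(p)
    q[0] += F(n - x, n - 1) * p[x - 1]
    q[n - 1] += F(x - 1, n - 1) * p[x - 1]
    q[x - 1] = F(0)
    return q


# Lemma 11: the mean m = 3 is an integer in X.
p11 = [F(1, 4), F(0), F(1, 2), F(0), F(1, 4)]
q11 = split_to_endpoints(p11, 3)

# Lemma 13: m = 9/4 is not an integer, so the split point is ceil(m) = 3.
p13 = [F(1, 2), F(0), F(1, 4), F(1, 4), F(0)]
q13 = split_to_endpoints(p13, math.ceil(mean(p13)))

print(s3(p11), s3(q11))  # prints: 4 8
```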