Evaluating the Promethee ii Ranking Quality

Coquelet, Boris; Dejaegere, Gilles; De Smet, Yves

doi:10.3390/a18100597

by Boris Coquelet ¹,*, Gilles Dejaegere ¹,² and Yves De Smet ¹,*

¹ Unité CoDE-SMG, Université libre de Bruxelles, 1050 Brussels, Belgium

² Unité CoDE-DES Lab, Université libre de Bruxelles, 1050 Brussels, Belgium

* Authors to whom correspondence should be addressed.

Algorithms 2025, 18(10), 597; https://doi.org/10.3390/a18100597

Submission received: 6 August 2025 / Revised: 12 September 2025 / Accepted: 18 September 2025 / Published: 24 September 2025

(This article belongs to the Section Analysis of Algorithms and Complexity Theory)

Abstract

Multicriteria decision aid consists in helping decision makers to compare (rank, choose, sort, etc.) different alternatives which are evaluated on conflicting criteria. One well-known family of multicriteria decision aid methods is Promethee. It is based on pairwise comparisons of the alternatives to produce a complete or a partial ranking. Promethee is widely used in practical applications, and different aspects of its theoretical properties have already been studied in the literature. However, no work has been proposed to assess the quality of the rankings it produces with regard to the pairwise preference matrix it exploits. This work addresses this problem by proposing a new indicator of the quality of the rankings produced by Promethee ii. This indicator is derived from a new property of the net flow score procedure. The indicator is illustrated on both artificial and real datasets.

Keywords: Promethee ii; quality index; ranking consistency

1. Introduction

Multicriteria decision aid (MCDA) aims to help decision makers in solving problems where alternatives are simultaneously evaluated on multiple conflicting criteria. Usually, three prominent families of methods can be identified: aggregating, outranking, and interactive procedures. The focus of this contribution will be on Promethee (more precisely, Promethee ii), which belongs to the outranking methods (with one of the first mentions of the technique in [1]).

The Promethee procedure is based on two main steps:

The computation of a valued pairwise preference matrix between all pairs of alternatives;
The exploitation of this preference matrix to compute a partial or a complete ranking (through the computation of positive, negative, and net flow scores).

Even though this method has a wide range of applications and continues to be used in various fields (as shown in Section 2), there is very little work in the way of evaluating the results of the method. This led to the following research question:

Is there a way to assess the quality of the results provided by the application of the Promethee ii method to a given dataset?

This question is central because flow scores can always be computed once the preference matrix is available. As a consequence, complete or partial rankings can always be built from these scores, since they are the output of a constructive procedure. As in statistics, assessing the quality of a given output is essential.

Let us note that the idea of evaluating the coherence/consistency of an MCDA method is not novel. For instance, in the Analytic Hierarchy Process (AHP), this aspect is evaluated through what is called a consistency index [2].

Thus, in this contribution, we propose a complementary view with a focus on the adequacy between the preference matrix and the ranking obtained by the net flow score procedure. In the same spirit as [2], we provide an indicator to quantify the ranking quality in the context of the Promethee ii method (which will be briefly recalled in Section 3). This will be derived from a new property of the net flow score procedure that will be explained in Section 4. A new quality index will then be developed in Section 5. The proposed approach will then be illustrated on artificial and real datasets in Section 6. Finally, Section 7 concludes this work and explores avenues of further research.

2. Literature Review

Promethee has attracted extensive interest since its inception, with many research and application papers published over the years. This is attested by the online Promethee bibliographical database, which contained nearly 2400 references as of September 2020 [3], as well as by a literature review published in 2010 [4], which focuses on the applications of the method. This trend remains prevalent today, with research on extensions and applications to various fields or problems. Hereunder, we provide some recent examples of applications of the method:

Food sector: with a recent application where Promethee was used in conjunction with other MCDA methods to prioritize obstacles and opportunities in the implementation of short food supply chains [5];
Energy sector: where suitable locations for on-shore wind farms are identified with the help of Promethee ii [6];
Computer science: with a novel method for rules extraction from artificial neural networks by integrating Promethee into a multi-objective genetic algorithm [7].

In addition to these practical applications, several software packages for the use of Promethee have been developed. One of the latest is Promethee-Cloud [8]. These packages aim to provide a tool to support the decision-making activity in an enterprise.

Although the method is popular in practice, few studies have been conducted on the inner workings of Promethee ii. One particularly studied aspect of the method is its susceptibility to rank reversal (RR) occurrences. Rank reversals are not specific to Promethee methods but affect most outranking methods. This phenomenon was initially discussed in the context of the AHP method, with the very first mention of rank reversal in [9]. Later, this was pointed out for other decision aid methods such as Promethee [10] or ELECTRE [11]. Since then, several studies have investigated under which conditions rank reversal could occur in Promethee i and Promethee ii, with the latest threshold on RR occurrence in [12]. Another avenue of research consists of the axiomatic characterization of the net flow procedure. As far as we are aware, the first characterization was made in 1992 in [13]. Then, in 2022, Ref. [14] provided an axiomatic interpretation of the Promethee ii method. These two works are focused on the understanding of the net flow score procedure (once the preference matrix has been computed). Recently, the compensatory aspects of the methods were also studied in [15].

In addition, several works also focus on the stability of Promethee methods and the influence of their internal parameters. The first was in 1995, with [16]. Then, in 2018, Ref. [17] proposed an alternative approach using inverse optimization to define sensitivity intervals for the weights of the criteria. Ref. [18] later proposed another sensitivity analysis of the intra-criterion parameters, and finally, Ref. [19] studied the stability of the ranking according to the evaluations of the alternatives.

Promethee has also had several extensions. For instance, in 2014, Ref. [20] proposed a method combining Stochastic Multicriteria Acceptability Analysis (SMAA) with the Promethee methods. More recently, Ref. [21] introduced the best–worst Promethee method, which avoids the rank reversal problem.

3. Promethee

This section gives a short introduction to the Promethee methods, and more particularly to Promethee ii. For more details, the interested reader can refer to the description provided in [22].

Let us consider a decision problem composed of a set $A = \{a_{1}, \dots, a_{n}\}$ defining the $n$ alternatives of the decision problem, which have to be evaluated according to a set $F = \{f_{1}, \dots, f_{k}\}$ of $k$ criteria. Without loss of generality, these criteria are assumed to be maximized.

As already said, Promethee ii works by computing pairwise comparisons and flow scores. The pairwise comparisons are performed as follows. First, the differences between the evaluations of each pair of alternatives on each criterion are computed:

$$d_{c}(a_{i}, a_{j}) = f_{c}(a_{i}) - f_{c}(a_{j}) \tag{1}$$

In a second step, these differences are transformed into mono-criterion preference indices $π_{i j}^{c} = F_{c}(d_{c}(a_{i}, a_{j}))$. $F_{c}$ is a monotonically increasing preference function with values in $[0, 1]$. The exact forms of the functions $F_{c}$ are left to the decision maker to decide. In [23], however, the authors proposed 6 different types of preference functions that should reasonably satisfy most decision contexts. The so-called linear preference with an indifference threshold function, shown in Figure 1, is an example of a widely used function. In this case, the decision maker has to instantiate the values of two parameters, denoted $q_{c}$ and $p_{c}$, which respectively represent an indifference and a strict preference threshold for criterion $f_{c}$.

The method supposes that the decision maker is able to provide positive and normalized weights, denoted $ω_{c}$, for the different criteria. These weights are then used to compute the pairwise preference index between a pair of alternatives $a_{i}$ and $a_{j}$, denoted $π_{i j}$, as follows:

$$π_{i j} = \sum_{c = 1}^{k} ω_{c} \cdot π_{i j}^{c} \tag{2}$$

These pairwise preferences (also referred to as pairwise comparisons) can be organized in what is called a preference matrix. For instance, with three alternatives, the obtained preference matrix will be

$$\begin{pmatrix} 0 & π_{12} & π_{13} \\ π_{21} & 0 & π_{23} \\ π_{31} & π_{32} & 0 \end{pmatrix} \tag{3}$$
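As an illustration of Equations (1) to (3), the following minimal Python sketch builds a preference matrix from an evaluation table. It assumes the usual piecewise-linear shape for the linear preference function with indifference threshold (0 up to $q_{c}$, linear between $q_{c}$ and $p_{c}$, 1 beyond $p_{c}$). The function names `linear_preference` and `preference_matrix`, as well as the small evaluation table, are hypothetical and chosen only for this example.

```python
from typing import List


def linear_preference(d: float, q: float, p: float) -> float:
    """Assumed piecewise-linear preference function with thresholds q <= p."""
    if d <= q:
        return 0.0          # difference too small: indifference
    if d >= p:
        return 1.0          # difference large enough: strict preference
    return (d - q) / (p - q)  # linear interpolation in between


def preference_matrix(evals: List[List[float]], weights: List[float],
                      q: List[float], p: List[float]) -> List[List[float]]:
    """Equations (1)-(2): weighted sum of mono-criterion preference indices."""
    n, k = len(evals), len(weights)
    pi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                pi[i][j] = sum(
                    weights[c] * linear_preference(evals[i][c] - evals[j][c], q[c], p[c])
                    for c in range(k)
                )
    return pi


# Hypothetical data: 3 alternatives, 2 criteria to maximise, equal weights.
evals = [[10.0, 2.0], [8.0, 5.0], [6.0, 3.0]]
pi = preference_matrix(evals, weights=[0.5, 0.5], q=[1.0, 1.0], p=[3.0, 3.0])
for row in pi:
    print(row)
```

The diagonal is left at 0, which matches the form of the matrix in Equation (3).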

However, this representation is only an intermediary step in the computation of the Promethee ii ranking. The next step is to compute the negative and positive outranking flows as follows:

$$\begin{aligned} ϕ_{i}^{+} &= \frac{1}{n - 1} \sum_{a_{j} \in A, j \neq i} π_{i j} \\ ϕ_{i}^{-} &= \frac{1}{n - 1} \sum_{a_{j} \in A, j \neq i} π_{j i} \end{aligned} \tag{4}$$

On the one hand, the positive flow of a given alternative represents its mean advantage over all the other alternatives. This has to be maximised. On the other hand, the negative flow of a given alternative represents its mean weakness to all the other alternatives. This has to be minimised.

The net flow score of $a_{i}$ is then the positive outranking flow minus the negative outranking flow:

$$ϕ_{i} = ϕ_{i}^{+} - ϕ_{i}^{-} \tag{5}$$

The net flow scores lie in the interval $[-1, 1]$ and are interpreted as follows: the higher the better. The complete ranking of the alternatives is deduced from the values of the net flow scores:

$$a_{i} ⪰ a_{j} ⟺ ϕ(a_{i}) \geq ϕ(a_{j})$$

with $a_{i} ⪰ a_{j}$ meaning that $a_{i}$ has an equal or better ranking than $a_{j}$.
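The flow computations of Equations (4) and (5) and the resulting complete ranking can be sketched in a few lines of Python. The helper names `net_flows` and `complete_ranking` and the 3 × 3 matrix `example_pi` are illustrative assumptions, not part of the original method description.

```python
from typing import List


def net_flows(pi: List[List[float]]) -> List[float]:
    """Net flow of each alternative: positive flow minus negative flow."""
    n = len(pi)
    phi_plus = [sum(pi[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    phi_minus = [sum(pi[j][i] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    return [phi_plus[i] - phi_minus[i] for i in range(n)]


def complete_ranking(phi: List[float]) -> List[int]:
    """Indices of the alternatives sorted by decreasing net flow (best first)."""
    return sorted(range(len(phi)), key=lambda i: -phi[i])


# Hypothetical preference matrix for three alternatives.
example_pi = [
    [0.0, 0.25, 0.25],
    [0.0, 0.0, 0.25],
    [0.0, 0.0, 0.0],
]
phi = net_flows(example_pi)
print(phi)                    # net flows, one per alternative
print(complete_ranking(phi))  # alternative indices, best first
```

Since every $π_{i j}$ counts once positively (for $a_{i}$) and once negatively (for $a_{j}$), the net flows always sum to zero.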

While the computation of net flow scores to rank the alternatives seems reasonable (a balance between the “advantages” of a given alternative and its “weaknesses”), additional theoretical justifications are needed to convince practitioners that the method is well grounded. As already pointed out, the works [13,24] also follow this direction. In the next section, we provide a complementary argument.

4. A New Interpretation of Promethee ii’s Net Flow Procedure

In this work, a new possible interpretation of the outranking procedure of Promethee ii is presented. It is stated as follows:

“Promethee ii ranks the alternatives so as to maximise the sum of all paths of length at most 2 starting from any alternative and ending on any alternative ranked worse.”

Without loss of generality, we will consider the following ranking of the alternatives: $a_{1} ⪰ a_{2} ⪰ a_{3} ⪰ a_{4} ⪰ \dots ⪰ a_{n}$. In this case, the sum of all paths of length two starting from any alternative and ending on any alternative ranked worse ($PL2$) can be defined as

$$PL2 = \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \sum_{k = 1}^{n} (π_{i k} + π_{k j}) \tag{6}$$

Here, $k$ is a summation index over the alternatives (not the number of criteria), and $π_{i i} = 0$ for every alternative, as in Equation (3).

Hereunder, we show that the value of this index is maximal only if the order of the alternatives also aligns with the order provided by the values of the net flows. This can be proven by the following decomposition:

$$\sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \sum_{k = 1}^{n} (π_{i k} + π_{k j}) = (n - 1) \left[\sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} (ϕ_{i}^{+} + ϕ_{j}^{-})\right] \tag{7}$$

$$= (n - 1) \left[\sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} (ϕ_{i} + ϕ_{i}^{-} + ϕ_{j}^{-})\right] \tag{8}$$

$$= (n - 1) \left[\sum_{i = 1}^{n - 1} ϕ_{i} \sum_{j = i + 1}^{n} 1 + \sum_{i = 1}^{n - 1} ϕ_{i}^{-} \sum_{j = i + 1}^{n} 1 + \sum_{j = 1}^{n} ϕ_{j}^{-} \sum_{i = 1}^{j - 1} 1\right] \tag{9}$$

$$= (n - 1) \left[\sum_{i = 1}^{n - 1} ϕ_{i} \sum_{j = i + 1}^{n} 1 + \sum_{i = 1}^{n - 1} ϕ_{i}^{-} \sum_{j = i + 1}^{n} 1 + \sum_{i = 1}^{n} ϕ_{i}^{-} \sum_{j = 1}^{i - 1} 1\right] \tag{10}$$

$$= (n - 1) \left[\sum_{i = 1}^{n - 1} ϕ_{i} \sum_{j = i + 1}^{n} 1 + (n - 1) \sum_{i = 1}^{n} ϕ_{i}^{-}\right] \tag{11}$$

$$= (n - 1) \left[\underbrace{\sum_{i = 1}^{n} (n - i) ϕ_{i}}_{\text{dep. seq.}} + \underbrace{\sum_{i = 1}^{n} \sum_{j = 1}^{n} π_{i j}}_{\text{indep. seq.}}\right] \tag{12}$$

$$= (n - 1) \left[\underbrace{\sum_{i = 1}^{n} (n - i) ϕ_{i}}_{\text{dep. seq.}} + C\right] \tag{13}$$

with $C$ being the sum of all pairwise preferences, which is independent of the order of the alternatives. $PL2$ can therefore be reformulated as

$$PL2 = (n - 1) \cdot [C + (n - 1) ϕ_{1} + (n - 2) ϕ_{2} + (n - 3) ϕ_{3} + (n - 4) ϕ_{4} + \dots + 0 \cdot ϕ_{n}] \tag{14}$$

From Equation (14), it is straightforward to notice that $PL2$ will be maximal if and only if $ϕ_{1} \geq ϕ_{2} \geq ϕ_{3} \geq ϕ_{4} \geq \dots \geq ϕ_{n}$. Indeed, the coefficients $(n - i)$ decrease with the rank $i$, so the weighted sum is largest when the largest net flows receive the largest coefficients.
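The property can be checked numerically on a small instance. The sketch below, written under the assumption of a hypothetical 4 × 4 preference matrix, evaluates Equation (6) directly for every possible ordering, compares it with the closed form of Equation (13), and verifies that the maximising ordering is the one given by decreasing net flows. The function names are illustrative only.

```python
from itertools import permutations


def net_flows(pi):
    """Net flow scores, Equations (4)-(5)."""
    n = len(pi)
    return [
        sum(pi[i][j] - pi[j][i] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def pl2(pi, order):
    """Direct evaluation of Equation (6); `order` lists alternatives best first."""
    n = len(pi)
    total = 0.0
    for r in range(n - 1):
        for s in range(r + 1, n):
            i, j = order[r], order[s]
            total += sum(pi[i][k] + pi[k][j] for k in range(n))
    return total


def pl2_closed_form(pi, order):
    """Closed form of Equation (13): (n - 1) * [sum_i (n - i) * phi_i + C]."""
    n = len(pi)
    phi = net_flows(pi)
    c = sum(sum(row) for row in pi)
    # Rank position r is 0-based here, so the coefficient (n - i) becomes (n - 1 - r).
    weighted = sum((n - 1 - r) * phi[a] for r, a in enumerate(order))
    return (n - 1) * (weighted + c)


# Hypothetical preference matrix (zero diagonal, values in [0, 1]).
pi = [
    [0.0, 0.6, 0.2, 0.7],
    [0.1, 0.0, 0.5, 0.3],
    [0.4, 0.2, 0.0, 0.9],
    [0.0, 0.3, 0.1, 0.0],
]
phi = net_flows(pi)
best_order = max(permutations(range(len(pi))), key=lambda order: pl2(pi, order))
flow_order = sorted(range(len(pi)), key=lambda i: -phi[i])
print(list(best_order), flow_order)
```

Brute force over all orderings is only feasible for very small $n$; it serves here as a sanity check of the decomposition, not as a way to compute the ranking.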

In [13], it was shown that there is a set of admissible transformations that can be performed on the pairwise preferences without changing the net flow scores. This means that there exist different preference matrices that:

Lead to the same net flow scores;
As a consequence, lead to the same solution of the maximisation problem (same order of the alternatives maximising $PL2$);
Differ regarding the sum of preferences (and therefore also have different optimal values of $PL2$).

A natural question can now be raised: Among all the preference matrices leading to the same net flow scores, which one is the most appropriate? In particular, are there elements of the preference matrix that are not used to compute net flow scores?

5. Defining the Quality of Net Flow Scores

In Section 4, it has been shown that the ranking produced by the net flow scores is, for a fixed set of pairwise preferences, the ranking maximising the $PL2$ indicator. While this constitutes another argument in favour of the consistency of the Promethee ii method, this result cannot be used to characterise whether the net flow scores obtained with Promethee ii effectively represent the preference relation obtained from a specific decision problem. To tackle this problem, a new indicator $I$ will be defined, which assesses how well a preference matrix is represented by the net flow scores of Promethee ii.

5.1. The Class of Equivalent Problems

Let us start by showing an extreme case (see Figure 2). Let us consider two different preference matrices obtained by the application of Promethee. Since a pairwise preference of 0 can be seen as the absence of preference, they can both be presented in graph form, where only the pairwise preferences that are higher than 0 result in directed edges:

In both cases, $ϕ(a_{1}) = 0.25$, $ϕ(a_{2}) = 0$, $ϕ(a_{3}) = -0.25$, leading to an identical ranking of the alternatives, which is illustrated in Figure 3.

Since each alternative has the same net flow score in both problems, $\sum_{i = 1}^{n} (n - i) ϕ_{i}$ will be equal in both problems. However, since they differ in their sum of pairwise preferences ($\sum_{i = 1}^{n} \sum_{j = 1}^{n} π_{i j}$), their respective values of $PL2$ will be different. This difference is related to the presence of a cycle in the second preference matrix. This was already observed in [13]. The author demonstrated that the addition or removal of cycles of a constant value in the preference matrix does not alter the net flow scores. Thus, the addition of cycles can artificially increase the sum of preferences and, by extension, the value of $PL2$.

It is well known that cycles in the preference relation contradict the transitivity required for a consistent ranking. Furthermore, since cycles of a constant value can always be removed without influencing the net flow scores, the preference values associated with these cycles are not exploited by the Promethee ii procedure. As a consequence, for a given set of net flow scores, the preference matrix with no cycles can be seen as better represented by the net flow scores than the one with cycles. Thus, the aim of our indicator is to compare the initial preference matrix to an ideal one that lacks cycles but leads to identical net flow scores.
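The effect of a constant-value cycle can be reproduced with a short Python sketch. The two matrices below are hypothetical: they are not taken from Figure 2, but were chosen so that their net flows match the values 0.25, 0 and −0.25 quoted above. The helper `add_cycle` is an illustrative name.

```python
def net_flows(pi):
    """Net flow scores, Equations (4)-(5)."""
    n = len(pi)
    return [
        sum(pi[i][j] - pi[j][i] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def add_cycle(pi, cycle, value):
    """Return a copy of `pi` where `value` is added along the closed path `cycle`."""
    out = [row[:] for row in pi]
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        out[a][b] += value
    return out


# Hypothetical cycle-free matrix for three alternatives.
acyclic = [
    [0.0, 0.25, 0.25],
    [0.0, 0.0, 0.25],
    [0.0, 0.0, 0.0],
]
# Add the cycle a1 -> a2 -> a3 -> a1 with constant value 0.25.
cyclic = add_cycle(acyclic, [0, 1, 2], 0.25)

print(net_flows(acyclic), sum(map(sum, acyclic)))
print(net_flows(cyclic), sum(map(sum, cyclic)))
```

Each alternative on the cycle gains the same amount in its positive and negative flows, which is why the net flows are unchanged while the sum of preferences grows.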

5.2. Finding a Preference Matrix Without Cycles

Finding all cycles in a graph (or, equivalently, in a matrix) is typically done with a Depth-First Search (DFS) based algorithm. Enumerating them is computationally demanding, since the number of cycles, and hence the running time, can grow exponentially with the number of alternatives. This was observed for randomly generated datasets. However, a basic transformation of the initial preference matrix can remove many cycles without having to look for all of them. This is what is proposed in this section. To support this choice, empirical evidence is provided that the matrix resulting from this transformation yields $I$ values comparable to those obtained from cycle-free matrices.

We denote by $\breve{π}$ the preference relation obtained from $π$ by removing all the cycles. The preference relation obtained with the basic transformation is denoted by $\tilde{π}$. It is obtained as follows:

$$\tilde{π}_{i j} = \max(π_{i j} - π_{j i}, 0) \quad \forall i, j \tag{15}$$

This transformation results in a preference matrix where, if $\tilde{π}_{i j} > 0$, then $\tilde{π}_{j i} = 0$ (for all pairwise preferences). This is the form showcased in Figure 2 of the example in the previous section.

It is important to note that this transformation consists of the removal of cycles between two alternatives $i$ and $j$. Indeed, if both $π_{i j} > 0$ and $π_{j i} > 0$, it means that there is a preference of $i$ over $j$ but also of $j$ over $i$. By [13], a cycle in pairwise preferences can always be removed by reducing all pairwise preferences of this cycle by the minimum pairwise preference. In our case, it means

$$\tilde{π}_{i j} = π_{i j} - \min(π_{i j}, π_{j i}) \quad \text{and} \quad \tilde{π}_{j i} = π_{j i} - \min(π_{i j}, π_{j i}) \tag{16}$$

This can be reformulated and generalized to Equation (15). However, this does not necessarily remove all the cycles from the initial preference matrix. As shown in the example (Figure 2), even after applying this transformation, a cycle still exists.
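A minimal Python sketch of the basic transformation of Equation (15) is given below. The matrix `pi` is a hypothetical example (not the one from Figure 2) built to show both effects described above: the cycle of length 2 between the first two alternatives disappears, the net flows are preserved, and a cycle of length 3 survives. The function names are illustrative.

```python
def basic_transformation(pi):
    """Equation (15): keep only the positive part of pi[i][j] - pi[j][i]."""
    n = len(pi)
    return [[max(pi[i][j] - pi[j][i], 0.0) for j in range(n)] for i in range(n)]


def net_flows(pi):
    """Net flow scores, Equations (4)-(5)."""
    n = len(pi)
    return [
        sum(pi[i][j] - pi[j][i] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def has_two_cycle(pi):
    """True if some pair (i, j) has strictly positive preference in both directions."""
    n = len(pi)
    return any(pi[i][j] > 0 and pi[j][i] > 0 for i in range(n) for j in range(i + 1, n))


# Hypothetical matrix: a 2-cycle between alternatives 0 and 1,
# plus the 3-cycle 0 -> 1 -> 2 -> 0.
pi = [
    [0.0, 0.75, 0.0],
    [0.25, 0.0, 0.5],
    [0.25, 0.0, 0.0],
]
pi_tilde = basic_transformation(pi)

# The 3-cycle 0 -> 1 -> 2 -> 0 is still present after the transformation.
three_cycle_left = pi_tilde[0][1] > 0 and pi_tilde[1][2] > 0 and pi_tilde[2][0] > 0

print(pi_tilde)
print(has_two_cycle(pi), has_two_cycle(pi_tilde), three_cycle_left)
```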

To better understand the number of cycles left after this basic transformation and the impact on the sum of pairwise preferences ($\sum_{i = 1}^{n} \sum_{j = 1}^{n} π_{i j}$), an empirical comparison was conducted between the sum after the first step (the basic transformation) and the sum after the second step (the removal of all remaining cycles). However, since finding cycles using DFS on artificial data can be exponentially time-consuming, the tests were limited to at most 20 alternatives (as shown in Figure 4). The data was randomly generated. First, an evaluation table for $n$ alternatives with $k$ criteria each was generated with a uniform distribution. Then Promethee ii was applied with randomly generated weights (also uniformly distributed) and using the linear preference function (see Figure 1). For each criterion, the indifference threshold was fixed as the first quartile of all possible differences between alternatives on the given criterion. Similarly, the preference threshold was the third quartile of all differences. Tests with varying numbers of criteria showed only minor changes in the distribution, so only one test was retained for this article and is displayed in Figure 4.
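The data generation described above could be sketched as follows. This is only one possible reading of the protocol: it assumes evaluations drawn uniformly in [0, 1], weights drawn uniformly and then normalised to sum to 1, and quartiles taken over the absolute differences between all pairs of alternatives. The function names, the seed handling and the quartile method (the default of `statistics.quantiles`) are assumptions made for this example.

```python
import random
import statistics
from itertools import combinations


def random_problem(n, k, seed=0):
    """Uniform evaluation table (n alternatives, k criteria) and normalised weights."""
    rng = random.Random(seed)
    evals = [[rng.random() for _ in range(k)] for _ in range(n)]
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    weights = [w / total for w in raw]  # assumption: normalise to sum to 1
    return evals, weights


def quartile_thresholds(evals):
    """Per criterion: q = first quartile, p = third quartile of pairwise differences."""
    n, k = len(evals), len(evals[0])
    q, p = [], []
    for c in range(k):
        # Assumption: "all possible differences" is read as absolute differences
        # over all unordered pairs of alternatives.
        diffs = [abs(evals[i][c] - evals[j][c]) for i, j in combinations(range(n), 2)]
        q1, _, q3 = statistics.quantiles(diffs, n=4)
        q.append(q1)
        p.append(q3)
    return q, p


evals, weights = random_problem(n=10, k=3, seed=1)
q, p = quartile_thresholds(evals)
print(weights)
print(q, p)
```

The resulting `evals`, `weights`, `q` and `p` can then be fed to any implementation of Equations (1) and (2) to obtain the preference matrix.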

The improvement axis in Figure 4 is obtained by the following equation:

$$\frac{\sum_{i = 1}^{n} \sum_{j = 1}^{n} \tilde{π}_{i j} - \sum_{i = 1}^{n} \sum_{j = 1}^{n} \breve{π}_{i j}}{\sum_{i = 1}^{n} \sum_{j = 1}^{n} \tilde{π}_{i j}} \tag{17}$$

This can be interpreted as the share of the sum of pairwise preferences that still has to be removed, after the basic transformation presented in Equation (15), in order to eliminate all remaining cycles.

Looking at Figure 4, it is quite clear that for a small number of alternatives, there is no significant improvement. When adding alternatives, the improvements get a little larger, with most being around 1% of the previous step. Considering the small gains when moving from 5 alternatives to 20, and the fact that finding cycles in a sizable preference matrix becomes highly computationally demanding, the second step was left as a future improvement of the quality index (this improvement could be achieved by taking advantage of Promethee ii properties to avoid doing a DFS for finding cycles).

5.3. Defining the Quality Index

Since only the basic transformation presented in Equation (15) has to be applied, the quality index can be defined in simple terms. Let $\tilde{Π} = \sum_{i = 1}^{n} \sum_{j = i}^{n} |π_{i j} - π_{j i}|$ be the ideal value, i.e., the sum of the pairwise preferences remaining after the basic transformation. When considering a given problem and its associated preference matrix, it is possible to compare $Π = \sum_{i = 1}^{n} \sum_{j = 1}^{n} π_{i j}$ (where the $π_{i j}$ are the pairwise preferences of the problem) to $\tilde{Π}$, which is associated with the same net flow scores. Then an indicator $I$ can be defined as follows:

$$I = \frac{Π - \tilde{Π}}{Π} \tag{18}$$

The indicator is such that $I \in [0, 1]$. It can be interpreted as the share of the initial preference matrix that can be removed without modifying the net flow scores. If $I$ is high, a lot of cycles are present in the initial preference matrix. On the contrary, when it is close to 0, all the elements of the matrix are in agreement with the net flow scores (i.e., very few cycles are present in the initial preference matrix). As a consequence, it has to be minimized.
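Equation (18) translates directly into code. The sketch below is a minimal Python version, with the hypothetical function name `quality_index` and three small hypothetical matrices used only to illustrate the range of the indicator.

```python
def quality_index(pi):
    """Indicator I of Equation (18): (Pi - Pi_tilde) / Pi."""
    n = len(pi)
    total = sum(pi[i][j] for i in range(n) for j in range(n) if i != j)
    # Pi_tilde: sum over unordered pairs of |pi_ij - pi_ji|.
    ideal = sum(abs(pi[i][j] - pi[j][i]) for i in range(n) for j in range(i + 1, n))
    if total == 0:
        raise ValueError("the index is undefined when all pairwise preferences are 0")
    return (total - ideal) / total


# No pair has preferences in both directions: nothing can be removed.
acyclic = [[0.0, 0.25, 0.25], [0.0, 0.0, 0.25], [0.0, 0.0, 0.0]]
# Alternatives 0 and 2 are preferred to each other with the same intensity.
cyclic = [[0.0, 0.5, 0.25], [0.0, 0.0, 0.5], [0.25, 0.0, 0.0]]
# Perfectly symmetric preferences: everything cancels out.
symmetric = [[0.0, 0.5], [0.5, 0.0]]
# A pure cycle of length 3 is not detected by the basic transformation.
pure_three_cycle = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.5], [0.5, 0.0, 0.0]]

for name, m in [("acyclic", acyclic), ("cyclic", cyclic),
                ("symmetric", symmetric), ("pure_three_cycle", pure_three_cycle)]:
    print(name, quality_index(m))
```

The last matrix illustrates the approximation made in Section 5.2: since only cycles between two alternatives are removed, a preference matrix made of a single longer cycle still obtains $I = 0$.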

6. Results

Using the previously defined index, some tests can be conducted both on simulated data and on five real datasets: Human Development Report [25], Times Higher Education World University Ranking [26], World Happiness Report [27], ASEM Sustainable Connectivity Index [28], and Environmental Performance Index [29].

6.1. Results on Simulated Data

This section studies the evolution of the index with two types of simulated data.

Random: This data is generated using a uniform distribution, similarly to that described in Section 5.2;
Artificial: This data is artificially generated so as to maximize the number of cycles in the dataset. It is worth noting that the generation of artificial datasets is deterministic (thus, for a given number of alternatives and criteria, only one artificial dataset exists). The complete mathematical formulation used to generate the artificial data can be found in Appendix A.

Tests on different numbers of criteria were conducted. The results shown in Figure 5 are fairly similar for a fixed number of criteria, regardless of the number of alternatives. The mode of each distribution is more or less the same value. From 5 alternatives to 25 alternatives, there is a large change in the width of each distribution. As the number of alternatives grows, the probability of having a well-ordered dataset (respectively, a dataset with lots of cycles) decreases. This is due to the way the datasets are generated (each data point is uniformly distributed and its generation is independent from the others). This is unrealistic in the field of MCDA, where there are, most of the time, dependence and conflict between criteria, and where alternatives will never be uniformly distributed.

For the artificial data, a random sampling on an initial 1000 alternatives was used to reach the 5, 25, and 50 alternatives of each distribution. As the sample size gets closer to the original number of alternatives, the distribution becomes narrower. The main takeaway from the artificial data is the high score it achieves (as it is an extreme case). However, this score is not 1: a score of 1 would require $\tilde{Π} = 0$, i.e., $π_{i j} = π_{j i}$ for every pair of alternatives, which makes all net flow scores null, and by the generation conducted, this is not achievable. Moreover, having all net flow scores equal to zero is highly unrealistic as well.

In this randomly generated data, as the number of criteria grows, the likelihood of performing well diminishes (in Figure 5, with 3 criteria, some of the random samples can score quite well, while with 10 criteria, the best scores do not go below 0.5). This can be explained by the fact that cycles result from the multicriteria nature of the problem and from how the preferences are constructed. Indeed, in the degenerate case consisting of only 1 criterion, no cycle is possible. Therefore, it is not surprising that problems with fewer criteria are less likely to have cycles in their preference relations.

6.2. Results on Real Datasets

As stated previously, the real datasets considered are Human Development Report [25], Times Higher Education World University Ranking [26], World Happiness Report [27], ASEM Sustainable Connectivity Index [28], and Environmental Performance Index [29]. For more information on the specifics of each dataset, Appendix B provides the criteria and number of alternatives (the rest can be found using the references). In addition to the real datasets, an artificial dataset of 10 criteria and 1000 alternatives was added to provide a point of comparison. As stated previously, the artificial dataset is an extreme case that contains one of the greatest amounts of cycles for its size.

For each dataset, Promethee ii was applied on a randomly selected set of alternatives using the same preference function and thresholds as the random data. As shown in Figure 6, the number of samples and number of alternatives are the same as in the test on random data.

There is still a slight trend in the number of criteria. The lower the number of criteria, the better the ranking tends to perform. This supports the idea that as the number of criteria grows, it becomes harder to avoid conflicts among the alternatives and, thus, harder to reach a good score. In such a case, applying Promethee ii might result in many concessions and can lead to untrustworthy results. However, for certain datasets (e.g., Environmental Performance Index), if most criteria are concordant, the dataset can outperform datasets with a lower number of criteria (Environmental Performance Index with 11 criteria scores better than World Happiness Index with 6 criteria).

Another observation is that even though the World University Rank has the same number of criteria as the randomly generated data with 4 criteria (Figure 5 and Figure 6 both in blue), no matter the sample, they all perform better than the random data. Furthermore, both the World University Ranking and the Human Development Ranking perform very well, with, depending on the sample, a near-perfect score. This shows that both rankings provided by Promethee ii can be trusted. In contrast, the World Happiness Index is much worse. This can be caused by several factors: highly conflicting criteria, contradictions in the pairwise preferences, the presence of a cycle in the pairwise preferences, etc. Nevertheless, this results in less robust rankings. They are prone to changes and contradictions. Thus, in comparison to the other two rankings, applying Promethee ii with the considered weights and preference functions on the World Happiness Index might not be recommended.

The ASEM dataset is the smallest, with only 51 alternatives. Both the Environmental Performance Index dataset and the World Happiness Report dataset are also fairly small with 105 alternatives and 137 alternatives, respectively. The Human Development Report has 190 alternatives. And, for the World University Rank, the number of alternatives is 1394, much larger than the rest. Due to the difference in size, the sampling is limited to at most 50 alternatives. To complete this distribution analysis, for all of those datasets, the final score can be computed and can be found in Table 1.

As shown in Table 1, the ranking of the net flow scores produced on the World Happiness Index dataset is less representative of its respective preference relation. Even though the ASEM index and the Environmental Performance index have more criteria, they score better. These preliminary results already showcase how much more informed one can be when applying the Promethee ii method. In this case, both the World University Rank and the Human Development Index score very well. It follows that both these datasets can lead to well-defined rankings obtained from preference matrices with very few cycles. In contrast, the last three real datasets are less precise in their resulting rankings. The net flow procedure of Promethee ii will simplify multiple cycles to produce these final rankings, which, therefore, particularly for the World Happiness Index, represent the decision problem less accurately.

6.3. Link with Rank Reversal

As highlighted in [14], rank reversal can be seen as a consequence of a third alternative in opposition to the direct preference between two alternatives. In other words, possible rank reversals result from the non-transitive nature of the pairwise preference matrix.

In this section, we confirm this intuition by comparing the value of the index

I

with the latest (and most accurate) threshold on the presence of rank reversal [12]. This is performed by reporting under % RR (in Table 2) the proportion of alternatives that could potentially have their rank reversed (according to the mentioned threshold).

As predicted, there seems to be a trend linking the rankings’ scores and the proportion of potential rank reversals. However, the two values are not entirely proportional. Even though the ASEM Index scores lower than the Environmental Performance Index, its percentage of rank reversal is higher. This is not surprising either, since not all cycles are necessarily considered when computing the score and since the threshold is not an exact metric (there are still false positives; see [12]). Both of these aspects result in imprecision and explain why the two scores are not fully correlated.

In addition, Table 2 also shows that the artificial data does indeed correspond to an extreme (most likely unreachable) case, with all its 1000 alternatives subject to rank reversal.

Finally, in this article, the focus is on evaluating the presence of cycles in the preference matrix. The goal is to provide a metric on the whole ranking so that the decision maker can decide whether or not they can trust the results. In contrast, studies on the RR focus on evaluating the respective positions of pairs of alternatives in the ranking. This provides local insights, which are aggregated in Table 2. Indeed, the % RR column hides the relationships between pairs of alternatives by simply counting the number of alternatives that can have their rank reversed but does not provide any indication about the alternatives with which their rank could be reversed.

7. Conclusions

While net flow scores can always be computed given any preference matrix, one expects that some preference matrices better fit the net flow score procedure. In this contribution, we propose a novel indicator $I$ (varying between 0 and 1) to evaluate this aspect. It is derived from a new property of the net flow score procedure and is illustrated on both artificial and real datasets.

Several remaining questions deserve future attention. First, the removal of cycles was limited to cycles of length 2. This approximation was justified empirically on datasets limited in size and should be expanded to all datasets. Second, there is a dependence between the index values, the number of criteria and the parameters of Promethee ii. A preliminary investigation was conducted and could be taken further. Finally, even though the index is a percentage and can be interpreted by a decision maker, no clear threshold below which one should not trust the resulting ranking was provided.

In conclusion, this contribution aims to provide an additional way for the decision maker to assess the results of Promethee ii. Given a good value for the indicator, it can provide confidence in the ranking obtained and justify the application of Promethee ii to the dataset.

Author Contributions

B.C., G.D. and Y.D.S. collaborated on the conceptualisation/design of the quality index and the writing of the article; the supervision was done by Y.D.S.; the testing of the quality index was done by B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All real datasets used in the results section are publicly available (see sources [25,26,27,28,29]). A description of how the artificial data was generated can be found in Appendix A.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MCDA	Multicriteria Decision Aid
PROMETHEE	Preference Ranking Organization Method for Enrichment of Evaluations
ELECTRE	Élimination et Choix Traduisant la Réalité
AHP	Analytic Hierarchy Process
ASEM	Asia–Europe Meeting
RR	Rank Reversal

Appendix A. Artificial Data Generation

Let us consider the artificial dataset composed of a set $A = \{a_{1}, \dots, a_{n}\}$ defining the $n$ alternatives, which have to be evaluated according to a set $F = \{f_{1}, \dots, f_{k}\}$ of $k$ criteria.

For the first criterion,

$$f_{1}(a_{i}) = n + 1 - i \quad \forall i = 1, \dots, n \tag{A1}$$

For all other criteria,

$$f_{c}(a_{i}) = \left(f_{c-1}(a_{i}) + \frac{n}{k}\right) \bmod n \quad \forall i = 1, \dots, n, \; \forall c = 2, \dots, k \tag{A2}$$

This results in a dataset with many conflicting criteria. For instance, for 5 criteria and 100 alternatives, the evaluation of $a_{1}$ on each criterion will be as follows:

$$f_{1}(a_{1}) = 100, \quad f_{2}(a_{1}) = 20, \quad f_{3}(a_{1}) = 40, \quad f_{4}(a_{1}) = 60, \quad f_{5}(a_{1}) = 80 \tag{A3}$$

This means that, for the first criterion, $a_{1}$ is better than all other alternatives, for the second criterion, $a_{1}$ is better than only 19 alternatives, etc.

This is also the case for all the other alternatives. If all criteria have the same weight, the resulting pairwise preferences will also be highly conflicting, as shown in Table 2, where all alternatives of the generated artificial dataset can have their rank reversed.
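The generation procedure of Equations (A1) and (A2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `generate_artificial_dataset` is hypothetical, and the sketch assumes that $k$ divides $n$ so that the shift $n/k$ is an integer, as in the example of Equation (A3).

```python
def generate_artificial_dataset(n: int, k: int) -> list[list[int]]:
    """Return an n x k evaluation table following Equations (A1) and (A2).

    Row i - 1 holds the evaluations of alternative a_i on criteria f_1..f_k.
    """
    if n % k != 0:
        # Assumption of this sketch: the shift n/k is an integer.
        raise ValueError("this sketch assumes that k divides n")
    shift = n // k
    table = []
    for i in range(1, n + 1):
        row = [n + 1 - i]                      # Equation (A1)
        for _ in range(2, k + 1):
            row.append((row[-1] + shift) % n)  # Equation (A2)
        table.append(row)
    return table


table = generate_artificial_dataset(n=100, k=5)
print(table[0])  # evaluations of a_1: [100, 20, 40, 60, 80]
```

The first row reproduces the values of Equation (A3). Each later criterion is a cyclic shift of the previous one, which is what makes the criteria disagree for many pairs of alternatives.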

Appendix B. Datasets

Human Development Report [25]:
– 4 criteria: life expectancy at birth, expected years of schooling, mean years of schooling, and gross national income per capita;
– 190 alternatives.

Times Higher Education World University Ranking [26]:
– 5 criteria: teaching, research, citations, industry income, and international outlook;
– 1394 alternatives.

World Happiness Report [27]:
– 6 criteria: logged GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity, and perceptions of corruption;
– 137 alternatives.

ASEM Sustainable Connectivity Index [28]:
– 8 criteria: physical, economic and financial connectivity, political, institutional, people-to-people, environmental, social, economic and financial sustainability;
– 51 alternatives.

Environmental Performance Index [29]:
– 11 criteria: biodiversity and habitat, forests, fisheries, air pollution, agriculture, water resources, air quality, sanitation and drinking water, heavy metal, solid waste, and climate change mitigation;
– 105 alternatives.

References

1. Brans, J.P. L’Ingénierie de la Décision: L’élaboration D’Instruments d’aide à la Décision; Université Laval, Faculté des Sciences de L’Administration: Québec, QC, Canada, 1982.
2. Saaty, T.L. The Analytic Hierarchy and Analytic Network Processes for the Measurement of Intangible Criteria and for Decision-Making. In Multiple Criteria Decision Analysis: State of the Art Surveys; Greco, S., Ehrgott, M., Figueira, J.R., Eds.; Springer: New York, NY, USA, 2016; pp. 363–419.
3. Mareschal, B. The Promethee Bibliographical Database. 2020. Available online: https://bertrand.mareschal.web.ulb.be/PG2024/bibliographical-database.html (accessed on 10 September 2025).
4. Behzadian, M.; Kazemzadeh, R.; Albadvi, A.; Aghdasi, M. PROMETHEE: A comprehensive literature review on methodologies and applications. Eur. J. Oper. Res. 2010, 200, 198–215.
5. Pinto, G.D.; Cassânego, V.M.; Zanon, L.G.; Ferraz, D.; do Nascimento Rebelatto, D.A. Strengthening short food supply chains: Prioritizing obstacles and opportunities with PROMETHEE, Fuzzy DEMATEL, and Fuzzy TOPSIS class. Int. J. Prod. Econ. 2025, 289, 109732.
6. Sotiropoulou, K.F.; Vavatsikos, A.P.; Botsaris, P.N. A hybrid AHP-PROMETHEE II onshore wind farms multicriteria suitability analysis using kNN and SVM regression models in northeastern Greece. Renew. Energy 2024, 221, 119795.
7. Yedjour, D.; Yedjour, H.; Amri, M.B.; Senouci, A. Rule extraction based on PROMETHEE-assisted multi-objective genetic algorithm for generating interpretable neural networks. Appl. Soft Comput. 2024, 151, 111160.
8. Pohl, E.; Geldermann, J. PROMETHEE-Cloud: A web app to support multi-criteria decisions. EURO J. Decis. Process. 2024, 12, 100053.
9. Belton, V.; Gear, T. On a short-coming of Saaty’s method of analytic hierarchies. Omega 1983, 11, 228–230.
10. De Keyser, W.; Peeters, P. A note on the use of PROMETHEE multicriteria methods. Eur. J. Oper. Res. 1996, 89, 457–461.
11. Wang, X.; Triantaphyllou, E. Ranking irregularities when evaluating alternatives by using some ELECTRE methods. Omega 2008, 36, 45–63.
12. Coquelet, B.; Dejaegere, G.; De Smet, Y. Analysis of third alternatives’ impact on PROMETHEE II ranking. J. Multi-Criteria Decis. Anal. 2024, 31, e1823.
13. Bouyssou, D. Ranking methods based on valued preference relations: A characterization of the net flow method. Eur. J. Oper. Res. 1992, 60, 61–67.
14. Dejaegere, G.; Boujelben, M.; De Smet, Y. An axiomatic characterization of Promethee II’s net flow scores based on a combination of direct comparisons and comparisons with third alternatives. J. Multi-Criteria Decis. Anal. 2022, 29, 364–380.
15. Schär, S.; Pohl, E.; Geldermann, J. Analysing the Compensatory Properties of the Outranking Approach PROMETHEE. J. Multi-Criteria Decis. Anal. 2025, 32, e70013.
16. Wolters, W.; Mareschal, B. Novel types of sensitivity analysis for additive MCDM methods. Eur. J. Oper. Res. 1995, 81, 281–290.
17. Doan, N.A.V.; De Smet, Y. An alternative weight sensitivity analysis for PROMETHEE II rankings. Omega 2018, 80, 166–174.
18. Liu, X.; Liu, Y. Sensitivity analysis of the parameters for preference functions and rank reversal analysis in the PROMETHEE II method. Omega 2024, 128, 103116.
19. Flachs, A.; De Smet, Y. Inverse optimization on the evaluations of alternatives in the Promethee II ranking method. Omega 2025, 136, 103325.
20. Corrente, S.; Figueira, J.R.; Greco, S. The SMAA-PROMETHEE method. Eur. J. Oper. Res. 2014, 239, 514–522.
21. Ishizaka, A.; Resce, G. Best-Worst PROMETHEE method for evaluating school performance in the OECD’s PISA project. Socio-Econ. Plan. Sci. 2021, 73, 100799.
22. Brans, J.P.; De Smet, Y. Promethee methods. In Multiple Criteria Decision Analysis: State of the Art Surveys; Greco, S., Ehrgott, M., Figueira, J.R., Eds.; Springer: New York, NY, USA, 2016; pp. 187–219.
23. Brans, J.P.; Vincke, P.; Mareschal, B. How to select and how to rank projects: The PROMETHEE method. Eur. J. Oper. Res. 1986, 24, 228–238.
24. Dejaegere, G.; De Smet, Y. Promethee γ: A new Promethee based method for partial ranking based on valued coalitions of monocriterion net flow scores. J. Multi-Criteria Decis. Anal. 2023, 30, 147–160.
25. United Nations. Human Development Report 2021–2022. 2022. Available online: https://hdr.undp.org/content/human-development-report-2021-22 (accessed on 23 May 2023).
26. Times Higher Education. World University Rankings. 2021. Available online: https://www.timeshighereducation.com/world-university-rankings/2021/world-ranking (accessed on 10 December 2022).
27. Helliwell, J.F.; Layard, R.; Sachs, J.D.; De Neve, J.E.; Aknin, L.B.; Wang, S. World Happiness Report. 2019. Available online: https://worldhappiness.report/ed/2019/ (accessed on 10 February 2022).
28. Becker, W.; Dominguez-Torreiro, M.; Neves, A.R.; Tacão Moura, C.J.; Saisana, M. Exploring ASEM Sustainable Connectivity—What Brings Asia and Europe Together? 2018. Available online: https://composite-indicators.jrc.ec.europa.eu/asem-sustainable-connectivity/ (accessed on 10 December 2022).
29. Block, S.; Emerson, J.W.; Esty, D.C.; de Sherbinin, A.; Wendling, Z.A. 2024 Environmental Performance Index. 2024. Available online: https://epi.yale.edu/ (accessed on 1 March 2025).

Figure 1. Representation of $F_{c}(d_{c}(a_{i}, a_{j}))$ as a linear preference function according to $q_{c}$ and $p_{c}$.

Figure 2. An extreme example of two preference matrices with the same net flow score.

Figure 3. Ranking representation of the two examples.

Figure 4. Improvement yielded by applying the second step, expressed as a percentage of the sum of the previous step.

Figure 5. Score distribution of randomly generated data and extreme data.

Figure 6. Score distribution of real data.

Table 1. Value of $I$ on the complete datasets.

Datasets	$I$
Human Development Index	0.03
World University Rank	0.18
World Happiness Index	0.46
ASEM Sustainable Connectivity Index	0.39
Environmental Performance Index	0.40
Artificial Data	0.75

Table 2. Comparison of the quality of the rankings produced with the net flow procedure and the proportion of pairs of alternatives subject to rank reversals in all datasets.

Datasets	$I$	% RR
Human Development Index	0.03	0.23
World University Rank	0.18	0.30
World Happiness Index	0.46	0.65
ASEM Sustainable Connectivity Index	0.39	0.53
Environmental Performance Index	0.40	0.49
Artificial Data	0.86	1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
