3.4. Evaluation of the Consistency of the Opinions of the Expert Team
Initially, it was checked whether the assessments of the significance of the 10 defects (criteria) for road quality given by the team of 71 experts were non-contradictory, i.e., whether they did not differ significantly. Only with sufficient consistency of the expert team’s opinions can the arithmetic means of the numerical significance of the evaluated criteria be taken as a reliable result for solving the problem. The consistency of the opinions of the expert team is determined by calculating the Kendall coefficient of concordance, W, according to the following formula [54]:

$$W = \frac{12S}{n^{2}\left(m^{3} - m\right)} \tag{1}$$

where n is the number of experts (j = 1, 2, …, n), m is the number of defects (criteria) being evaluated (i = 1, 2, …, m), and S is the sum of squared deviations of the sum of criteria ranks, $\sum_{j=1}^{n} R_{ij}$, from the overall mean rank, $\bar{R}$, calculated using Formula (2):

$$S = \sum_{i=1}^{m}\left(\sum_{j=1}^{n} R_{ij} - \bar{R}\right)^{2}, \qquad \bar{R} = \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{n} R_{ij} = \frac{n(m+1)}{2} \tag{2}$$
Only criteria ranks are suitable for calculating the Kendall coefficient of concordance.
Whether the expert team’s opinions are consistent was determined in the classical way by comparing the empirical value of Pearson’s chi-square statistic, χ2, with its critical value, $\chi^{2}_{v,\alpha}$, which depends on the degrees of freedom, v = m − 1, and the chosen significance level, α (taken as α = 0.05 or α = 0.01). The empirical χ2 value is calculated from Formula (3):

$$\chi^{2} = n(m-1)W = \frac{12S}{nm(m+1)} \tag{3}$$

When χ2 is greater than $\chi^{2}_{v,\alpha}$, it is reasonably considered that the expert team’s opinions are non-contradictory, even though the criteria significance assessments in ranks by all experts are not identical, and there may be outliers in the assessment.
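It was proposed [55] to compare the concordance coefficient, W, with its minimum (threshold) value, Wmin, which is calculated from Formula (4):

$$W_{\min} = \frac{\chi^{2}_{v,\alpha}}{n(m-1)} \tag{4}$$

where $\chi^{2}_{v,\alpha}$ is the critical value of the chi-square statistic for v = m − 1 degrees of freedom and significance level α.
It was proposed [56] to calculate the ratio of the concordance coefficient, W, to its minimum value, Wmin, from Formula (5) and to call it the compatibility coefficient:

$$k_c = \frac{W}{W_{\min}} \tag{5}$$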
When kc is greater than 1, the opinions of the expert team in assessing the significance of the criteria not only by rank but also by percentage weights or relative weights of the AHP method are sufficiently consistent. This allows the arithmetic mean of the significance estimates (ranks or relative weights) of each criterion to be taken as the solution to the problem.
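The consistency check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes complete, tie-free rankings, the function names `kendall_w` and `consistency_check` are hypothetical, the toy rank matrix is invented for illustration, and the critical chi-square value is supplied by the caller from a table.

```python
def kendall_w(ranks):
    """Kendall's W for a complete, tie-free rank matrix.

    ranks[j][i] is the rank that expert j assigns to criterion i.
    Returns (W, S), where S is the sum of squared deviations of the
    criteria rank sums from their overall mean n(m + 1) / 2.
    """
    n = len(ranks)        # number of experts
    m = len(ranks[0])     # number of criteria
    rank_sums = [sum(row[i] for row in ranks) for i in range(m)]
    mean_sum = n * (m + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    w = 12 * s / (n ** 2 * (m ** 3 - m))
    return w, s


def consistency_check(w, n, m, chi2_crit):
    """Return (chi2, W_min, k_c) for a given W and tabulated chi2_crit."""
    chi2 = n * (m - 1) * w
    w_min = chi2_crit / (n * (m - 1))
    k_c = w / w_min
    return chi2, w_min, k_c


# Toy data (hypothetical): 3 experts ranking 4 criteria.
toy_ranks = [[1, 2, 3, 4],
             [1, 2, 3, 4],
             [2, 1, 3, 4]]
w, s = kendall_w(toy_ranks)
# 7.815 is the tabulated chi-square critical value for v = 3, alpha = 0.05.
chi2, w_min, k_c = consistency_check(w, n=3, m=4, chi2_crit=7.815)
print(f"S = {s}, W = {w:.3f}, chi2 = {chi2:.2f}, Wmin = {w_min:.3f}, kc = {k_c:.2f}")
```

Because kc equals the ratio of the empirical to the critical chi-square value, the condition kc > 1 and the condition that χ2 exceeds its critical value are the same test expressed on two scales.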
3.5. Evaluation of Rank Variation, Normality, and Outliers
The ranks assigned to the criteria by the experts usually differ and have a certain variation, the magnitude of which is indicated by the standard deviation, $s_i$:

$$s_i = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(R_{ij} - \bar{R}_i\right)^{2}} \tag{6}$$

where $\bar{R}_i$ is the arithmetic mean of the ranks Rij for the i-th criterion.
The average variation (variance, $\bar{s}^{2}$) of the significance estimates for the entire object under study (ten quality indicators of the asphalt layer of the road pavement), evaluated by m criteria from samples of the same size n, can be calculated from Formula (8):

$$\bar{s}^{2} = \frac{1}{m}\sum_{i=1}^{m} s_i^{2} \tag{8}$$

where $s_i$ is the standard deviation of the ranks Rij for the i-th criterion; m is the number of criteria.
Formula (8) can be used to calculate the average standard deviation of ranks, $\bar{s}$, only if the rank variances, $s_i^{2}$, of the criteria are statistically equal [57]. The rank variances calculated from samples of the same size n can be compared using Cochran’s test. The statistic $G$ is calculated, which is the ratio of the maximum variance to the sum of all variances:

$$G = \frac{s_{\max}^{2}}{\sum_{i=1}^{m} s_i^{2}} \tag{9}$$

where $s_{\max}^{2}$ is the largest of the rank variances of the m criteria.
It can be assumed that the rank variances of the criteria are equal if the empirical value of Cochran’s test statistic, $G$, is less than its critical value, GC(α, m, v), which depends on the significance level, α, the number of variances compared (number of criteria), m, and the degrees of freedom, v = n − 1 (where n is the number of experts). The value of GC(α, m, v) is found in Table 152 [57].
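As an illustration of the variance comparison just described, the following Python sketch computes the ratio of the largest variance to the sum of all variances for samples of equal size. The function names and the toy samples are hypothetical, the sample variance with an n − 1 denominator is an assumption, and the critical value is not computed here: it has to be taken from a table such as the one cited.

```python
from statistics import variance


def cochran_statistic(samples):
    """Ratio of the largest sample variance to the sum of all variances.

    samples[i] holds the ranks given to criterion i by all experts;
    every sample is assumed to have the same size.
    """
    variances = [variance(sample) for sample in samples]  # n - 1 denominator
    return max(variances) / sum(variances), variances


def variances_homogeneous(g_empirical, g_critical):
    """Variances are treated as equal when the statistic is below the critical value."""
    return g_empirical < g_critical


# Toy samples (hypothetical) with variances 1, 4 and 3.
g, variances = cochran_statistic([[1, 2, 3], [2, 4, 6], [1, 1, 4]])
print(g, variances)

# Values reported later in the text for the 63-expert team.
print(variances_homogeneous(0.1433, 0.1571))
```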
When using Cochran’s test to determine the homogeneity of rank variances, it is necessary to ensure that the rank distribution follows a normal law. The normality of the criteria ranks can be assessed by the values of the skewness and kurtosis coefficients. The moduli of the empirical skewness, |Sk|, and kurtosis, |Ku|, must be less than 3 standard deviations of skewness, SSk (Formula (10)), and less than 5 standard deviations of kurtosis, SKu (Formula (11)), respectively. The standard deviations of skewness and kurtosis, which depend only on the sample size (number of experts), n, are calculated from the following formulas:

$$S_{Sk} = \sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}} \tag{10}$$

and

$$S_{Ku} = \sqrt{\frac{24n(n-1)^{2}}{(n-3)(n-2)(n+3)(n+5)}} \tag{11}$$
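The normality screen above depends only on the sample size, so it is easy to check numerically. The sketch below assumes the standard small-sample expressions for the standard errors of skewness and kurtosis, which reproduce the values 0.285 and about 0.562 quoted in the results for n = 71; the function names are hypothetical.

```python
from math import sqrt


def skewness_sd(n):
    """Standard deviation of sample skewness for sample size n."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def kurtosis_sd(n):
    """Standard deviation of sample kurtosis for sample size n."""
    return sqrt(24 * n * (n - 1) ** 2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5)))


def looks_normal(sk, ku, n):
    """True when |Sk| < 3 * S_Sk and |Ku| < 5 * S_Ku."""
    return abs(sk) < 3 * skewness_sd(n) and abs(ku) < 5 * kurtosis_sd(n)


n_experts = 71
print(round(3 * skewness_sd(n_experts), 3))   # skewness threshold
print(round(5 * kurtosis_sd(n_experts), 3))   # kurtosis threshold
print(looks_normal(sk=0.5, ku=1.0, n=n_experts))  # hypothetical Sk and Ku
```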
If the empirical concordance coefficient, W, is greater than its critical value, Wmin (when kc > 1), then the opinions of the expert team in assessing the significance of the object’s criteria are consistent (non-contradictory). The reason for the non-conformity of the ranks of the object’s criterion to the normal distribution, determined by the skewness and kurtosis values, is usually the presence of outliers in the criterion’s ranks. Significantly different rank values (outliers) from expert studies can be justifiably removed from the research data. Usually, values of the variation series that do not fall within the
interval are considered outliers [
58]. The upper threshold rank value,
RijU, is calculated from Formula (12):
and
where
is the mean of the ranks of the
i-th criterion (with outliers);
is the standard deviation of the ranks of the
i-th criterion.
It is likely that, after the rank outliers are removed, the standard deviation of the ranks will significantly decrease, and the empirical values of skewness and kurtosis will also decrease. The rank means will change slightly: they will increase when the outliers are small rank values and decrease when the outliers are large rank values. If the j-th expert’s assessment of the significance of the criteria contains at least one rank that is an outlier, then the assessments of all criteria by this expert are removed from the study.
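The expert-removal rule can be sketched as follows. This is only an illustration under stated assumptions: the thresholds are taken as the criterion mean plus or minus k standard deviations, where the multiplier `k` is a hypothetical parameter (the thresholds used in the study are those of Formulas (12) and (13)), and the function names and toy data are invented.

```python
from statistics import mean, stdev


def outlier_experts(ranks, k):
    """Indices of experts with at least one rank outside mean ± k·s.

    ranks[j][i] is the rank given by expert j to criterion i;
    k is an assumed half-width multiplier, not a value from the study.
    """
    m = len(ranks[0])
    flagged = set()
    for i in range(m):
        column = [row[i] for row in ranks]
        centre, spread = mean(column), stdev(column)
        lower, upper = centre - k * spread, centre + k * spread
        for j, r in enumerate(column):
            if r < lower or r > upper:
                flagged.add(j)
    return sorted(flagged)


def drop_experts(ranks, flagged):
    """Remove every assessment of the flagged experts."""
    excluded = set(flagged)
    return [row for j, row in enumerate(ranks) if j not in excluded]


# Toy data (hypothetical): seven experts agree, one reverses the order.
toy_ranks = [[1, 2, 3]] * 7 + [[3, 2, 1]]
flagged = outlier_experts(toy_ranks, k=2)
cleaned = drop_experts(toy_ranks, flagged)
print(flagged, len(cleaned))
```

Removing the whole row, rather than the single outlying rank, keeps every remaining expert's ranking complete, which the concordance calculation requires.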
3.6. Calculation of Relative Weights of the Criteria Using MCDM Methods
In multiple-criteria evaluation, criteria weights are of great importance. In practice, subjective criteria weights determined by specialists/experts are commonly used. The types of elements in a decision matrix also play an important role in the evaluation of alternatives [59]. It is customary to evaluate the significance of the criteria by comparing their normalised relative weights, which range from 0 to 1. The more significant the indicator (criterion) describing the object under study, the greater its relative weight. Therefore, the significance of the criteria, whether determined by ranks, percentage weights, scores on a five-point scale, or scores on a ten-point scale, must be converted into normalised relative weights, the sum of which is equal to 1.0000. This allows for a comparison of the weights of each criterion determined by different MCDM methods.
It is convenient to calculate the relative weights of these criteria
, from the mean ranks of the criteria
, using the Average Rank Transformation into Weight-Linear (ARTIW-L) method, presented in 2011 [
55]:
From the mean ranks of the criteria
, it is also possible to calculate the relative weights of these criteria
, using the Average Rank Transformation into Weight-Non-Linear (ARTIW-N) method, which shows a non-linear relationship between these variables [
60]:
When evaluating the significance of the criteria of the object under study not by ranks but by percentage weights
Pij, the Direct Percentage Weight (DPW) method can be applied, which allows the calculation of the relative weights of these criteria [
56]:
To determine the significance of the criteria, the popular but rather complex Analytic Hierarchy Process (AHP), presented by T.L. Saaty [61], which requires filling a pairwise comparison matrix of the criteria, is suitable (Appendix A, Table A3). The AHP is a reliable, rigorous, and robust method for eliciting and quantifying subjective judgements in multicriteria decision-making (MCDM). Despite its many benefits, the complexity of the pairwise comparison process and the limitations of consistency in AHP are challenges that have been the subject of extensive research [62].
The AHP approach has been widely used in MCDM, but it is very difficult to meet the consistency requirement of a comparison matrix (CM) in AHP. The authors of [63] analyse the reasons for inconsistent CMs in AHP and propose an improved AHP (IAHP) that improves the consistency of the CM using a classification and ranking methodology.
From the matrix filled by the
j-th expert, the eigenvector
is calculated for each
i-th criterion, which is taken as the normalised relative weight of the criterion:
where
is the intensity of pairwise comparison assigned by the
j-th expert to the
i-th criterion (
i,
j = 1, 2, …,
m) on a nine-point scale.
When filling the pairwise comparison matrix of the AHP method, the criteria on the left side of the matrix are compared with the criteria at the top. By convention, the comparison of strength is always of an activity appearing in the column on the left against an activity appearing in the row on top [61].
The relative weight, $\omega_i$, assigned to the i-th criterion by the expert team is calculated from Formula (18):

$$\omega_i = \frac{1}{n}\sum_{j=1}^{n}\omega_{ij} \tag{18}$$

where $\omega_{ij}$ is the weight assigned to the i-th criterion by the j-th expert using the AHP method.
The consistency of each pairwise comparison matrix filled by an expert is checked by calculating the consistency ratio, C.R., which is obtained by dividing the consistency index, C.I., by the random index, R.I. The consistency index, C.I., is calculated from Formula (19):

$$\mathrm{C.I.} = \frac{\lambda_{\max} - m}{m - 1} \tag{19}$$

where m is the number of criteria that are evaluated (i = 1, 2, …, m); λmax is the largest eigenvalue of the pairwise comparison matrix, which is calculated from Formula (20) [64,65,66].
The random index, R.I., is taken from a table [61] or, when the number of criteria m > 10, is calculated from Formula (21) [67]:
A consistency ratio of 0.10 or less is considered acceptable [61]. The resulting vector is accepted if C.R. is about 0.10 or less (0.20 may be tolerated, but not more) [68].
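The consistency check of a single pairwise comparison matrix can be sketched as below. Two points are assumptions rather than the authors' procedure: the largest eigenvalue is approximated by power iteration, a generic numerical technique substituted here for Formula (20), and the random index is passed in by the caller, with 0.58 used only as a placeholder for m = 3 that should be replaced by the tabulated value. All function names are hypothetical.

```python
def principal_eigen(matrix, iterations=100):
    """Approximate the largest eigenvalue and the normalised principal
    eigenvector of a positive square matrix by power iteration."""
    m = len(matrix)
    w = [1.0 / m] * m
    lam = float(m)
    for _ in range(iterations):
        v = [sum(matrix[i][k] * w[k] for k in range(m)) for i in range(m)]
        lam = sum(v)              # w sums to 1, so sum(A·w) estimates lambda_max
        w = [x / lam for x in v]  # renormalise so the weights sum to 1
    return lam, w


def consistency_index(lambda_max, m):
    """C.I. = (lambda_max - m) / (m - 1)."""
    return (lambda_max - m) / (m - 1)


def consistency_ratio(ci, random_index):
    """C.R. = C.I. / R.I."""
    return ci / random_index


# A perfectly consistent matrix built from hypothetical weights: a_ik = w_i / w_k.
true_w = [0.5, 0.3, 0.2]
consistent = [[wi / wk for wk in true_w] for wi in true_w]
lam, weights = principal_eigen(consistent)
ci = consistency_index(lam, len(true_w))
cr = consistency_ratio(ci, random_index=0.58)  # placeholder R.I. for m = 3
print(lam, weights, cr)
```

For a perfectly consistent matrix the largest eigenvalue equals m, so C.I. and C.R. are zero; any inconsistency in the judgements pushes the eigenvalue above m.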
3.7. Results and Their Analysis
The significance of the parameters of the asphalt layers of road pavement structures, for which monetary deductions (MDs) are calculated for exceeding the permissible deviations (PDs) or not complying with the limit values (LVs) according to the requirements of ĮT ASFALTAS 08, as determined in ranks by a team of 71 experts, is presented in Table 2.
With the number of criteria m = 10 and the number of experts n = 71, the overall mean rank is $\bar{R}$ = 390.5. From Table 2, S = 318,338 according to Formula (2); therefore, from Formula (1), the Kendall coefficient of concordance is W = 0.765. The empirical Pearson’s chi-square statistic, χ2 = 489.12, is 28.9 times greater than its critical value, $\chi^{2}_{v,\alpha}$ = 16.92, at the significance level α = 0.05 with v = 9 degrees of freedom. The minimum value of the concordance coefficient, calculated according to Formula (4), is Wmin = 0.0265. The compatibility coefficient of the opinions of the expert team, calculated from Formula (5), is kc = 28.9. Therefore, it can be reasonably stated that the opinions of the 71-expert team regarding the significance of the criteria are consistent.
Skewness and kurtosis, which indicate the nature of the rank variation and its conformity to the normal distribution, are compared with their standard deviations calculated from Formulas (10) and (11). The rank distribution of the criteria is normal when the modulus of the empirical skewness, |Sk|, is less than 3 SSk = 3 × 0.285 = 0.855 and when the modulus of kurtosis, |Ku|, is less than 5 SKu = 5 × 0.5625 ≈ 2.813. According to the empirical skewness, the ranks of criteria E, F, and G follow a normal distribution, while, according to kurtosis, the ranks of criteria D, E, G, H, and J follow a normal distribution (Table 2).
Assuming that the ranks of all criteria follow a normal distribution and are calculated from independent populations, Cochran’s test can be used to compare the rank variances, $s_i^{2}$, of all criteria. The standard deviations of the criteria ranks are presented in Table 2 and the variances in Table 3.
Cochran’s test statistic, calculated from Formula (9), is 0.1636. The critical tabular value of Cochran’s test statistic, GC(0.05; 10; 70) [57], is less than this empirical value; therefore, the hypothesis that the criteria rank variances are statistically equal is rejected at the 5% significance level (α = 0.05).
Because the criteria rank variances are unequal, the overall average variance for the entire object under study (asphalt layer) cannot be calculated using Formula (8). The degree of homogeneity of variances can likely be increased by removing outliers from the study. The threshold values for outliers of each criterion’s ranks, calculated from Formulas (12) and (13), and the serial numbers of the experts and their outlier (rejected) ranks assigned to the criteria are presented in Table 4. The rankings submitted by experts E10, E17, E19, E21, E22, E39, E54, and E55 were classified as outliers, resulting in a total count of n0 = 8.
The rank means calculated with outliers were used to determine the significance of the criteria using the ARTIW-L and ARTIW-N methods (Table 2). The relative weights of the criteria for the two methods were calculated from Formulas (14) and (15), respectively, and priorities were assigned to the criteria. The values of the relative weights of the criteria calculated from Formula (16) using the DPW method are presented in Table 5. The results indicate that the order of the two most important criteria, E and A, and that of criteria G and H, is reversed between the ARTIW and DPW methods. Some experts did not maintain consistency in their evaluation of the criteria, assigning a higher rank to one criterion while attributing a higher percentage weight to another. In these cases, the expert’s assessment remains unchanged unless the evaluation is determined to be an outlier.
The relative weights of all criteria for each of the 71 experts were calculated using the AHP method from Formula (17). The average relative weights of the criteria, calculated from Formula (18), are presented in Table 6. This table also presents the standard deviations and the empirical values of skewness (Sk) and kurtosis (Ku), which show that only the weight distributions of the most important criteria, E, A, and B, conform to the normal law, and that the variances of all criteria differ significantly when their homogeneity is checked with Cochran’s test.
The averages of the relative weights of the criteria, evaluated by a team of 71 experts using four MCDM methods, show that the significance of the 10 indicators of the asphalt layer for the quality of the road pavement differs. Without rejecting outliers, the following sequence of criteria significance was obtained (Figure 3):
.
After the eight outlying expert assessments were rejected (Table 4), the adjusted relative weights of the criteria were calculated from the remaining 63 expert assessments, which allowed for an evaluation of the influence of outliers on the relative significance of the criteria and the consistency of opinions. With n = 63, the overall mean rank is $\bar{R}$ = 346.5, and the sum of squared deviations of the criteria rank sums from $\bar{R}$ is S = 283,884 (Table 7). The Kendall coefficient of concordance, W = 0.867, is about 13% higher than the W = 0.765 calculated for the ranks of the 71-expert team, which shows that, after rejecting the outliers, the consistency of the experts’ opinions increased.
The statistic χ2 = 491.57 is 29.1 times greater than its threshold value, $\chi^{2}_{v,\alpha}$ = 16.92, and W = 0.867 is greater than Wmin = 0.0298 by the same factor.
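The figures reported for both expert teams can be cross-checked from the two S values alone. The short Python sketch below assumes the standard Kendall relations W = 12S / (n²(m³ − m)) and χ2 = n(m − 1)W together with Wmin taken as the critical chi-square value divided by n(m − 1); the function name is hypothetical, and the inputs are the values quoted in the text.

```python
def concordance_from_s(s, n, m, chi2_crit):
    """Return (W, chi2, W_min, k_c) from the sum of squared deviations S."""
    w = 12 * s / (n ** 2 * (m ** 3 - m))
    chi2 = n * (m - 1) * w
    w_min = chi2_crit / (n * (m - 1))
    return w, chi2, w_min, w / w_min


CHI2_CRIT = 16.92  # critical value quoted in the text (v = 9, alpha = 0.05)

for label, s, n in [("71 experts", 318_338, 71), ("63 experts", 283_884, 63)]:
    w, chi2, w_min, k_c = concordance_from_s(s, n, m=10, chi2_crit=CHI2_CRIT)
    print(f"{label}: W = {w:.3f}, chi2 = {chi2:.2f}, "
          f"Wmin = {w_min:.4f}, kc = {k_c:.1f}")
```

Both sets of inputs reproduce the reported W, χ2, Wmin, and ratio values after rounding.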
Using the ARTIW-L and ARTIW-N methods, the calculated relative weights and priorities of the criteria allowed for the creation of a sequence of criteria significance for road pavement quality:
(
Table 7). The sequence of criteria is the same as for the criteria with outliers (
Figure 3).
Using the DPW and AHP methods without outliers, the calculated relative weights and priorities of the criteria are presented in
Table 8 and
Table 9. The sequence of criteria differs slightly:
.
The arithmetic means of the significances of the road pavement layer quality-indicator defects, expressed as relative weights and calculated using four MCDM methods, are presented in the bar chart (
Figure 4). The sixty-three-expert team provided the following sequence of criteria significance:
.
After the number of experts was reduced from 71 to 63, the average weights of the criteria (defects) changed slightly (Table 10). For the most important criteria, A, E, B, C, and D, they increased by up to 5%, while for the least important criteria, F, H, G, J, and I, they decreased by up to 16%.
The assumption that the criteria rank means are calculated from independent populations of the same size and that the ranks of all 10 criteria follow a normal distribution allows the rank variances, $s_i^{2}$, to be compared using Cochran’s test (Table 11). The statistic calculated according to Formula (9) is 0.1433. The critical value of Cochran’s test, GC(α, m, v), found from the table (Sachs, 1972) [57] for α = 0.05, m = 10, and v = 62, is 0.1571. The calculated value, 0.1433, is less than the critical value, 0.1571, so there is no basis to reject the null hypothesis that the criteria rank variances presented in Table 11 are equal.
From the equal (homogeneous) variances,
, of the asphalt-layer quality-indicator ranks, their average,
, is calculated, which shows the average variation of all quality indicators of the road pavement:
The overall standard deviation of the criteria ranks is .
The precision,
, which indicates how much the criteria rank mean determined by the 63-expert team can differ from the population mean with 95% probability (
t = 1.96) when the average standard deviation of ranks is
, is calculated from the sample size formula [
66,
69,
70]:
The precision, , calculated from the expert study reasonably allows for the statement that the average rank of the population for each quality indicator of the asphalt layer is within the interval . Whether the calculated precision is sufficient is determined by the decision-maker. If the required precision is , then it is necessary to survey 69 experts, and if , then the number of experts, n, must be 18.
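The relation between precision and the number of experts can be illustrated with a short Python sketch. It assumes the usual normal-approximation form of the sample size formula, n = (t·s/Δ)², where Δ is the precision and s the average standard deviation of ranks; the function names and the numerical inputs below are hypothetical and are not the values obtained in the study.

```python
from math import ceil, sqrt


def precision(s, n, t=1.96):
    """Half-width of the interval around the mean rank for n experts."""
    return t * s / sqrt(n)


def required_experts(s, delta, t=1.96):
    """Smallest number of experts that achieves precision delta."""
    return ceil((t * s / delta) ** 2)


# Hypothetical inputs: average standard deviation of ranks s = 2.0.
print(precision(s=2.0, n=64))
print(required_experts(s=2.0, delta=0.5))
```

Because the required number of experts grows with the inverse square of the precision, halving Δ roughly quadruples the size of the expert team.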