Comparing Multi-Criteria Decision Making Models for Evaluating Environmental Education Programs

Educators in the field of Environmental Education often have difficulty identifying and selecting programs that make the best use of the available resources and achieve the desired outcomes. This difficulty is due, in part, to their lack of expertise in evaluation knowledge and practice. The use of multi-criteria decision-making models for evaluating environmental education programs is new and, as a result, few models have been used and tested in this specific domain. Comparisons of multi-criteria decision-making models have been carried out in various domains, but not for the evaluation of environmental education programs. Therefore, we investigate the comparative performance of the SAW, WPM, TOPSIS, and PROMETHEE II models in evaluating and selecting the most appropriate environmental education program. The main objective of this paper is to present the different steps of the comparative analysis of multi-criteria decision-making models and to draw conclusions on the suitability and robustness of the SAW, WPM, TOPSIS, and PROMETHEE II models in evaluating environmental education programs.


Introduction
The evaluation of EE programs before their implementation can save time and effort [1]. The importance, as well as the difficulty, of evaluating environmental education (EE) programs has been highlighted by many researchers [2][3][4][5][6][7][8][9][10][11][12][13][14]. Due to this difficulty, educators often omit this stage and, in many cases, end up implementing EE programs that do not fulfill their goals, objectives, and/or requirements.
An automated evaluation of EE programs would be very useful for many researchers and educators. Prior automated evaluations of EE programs, such as the one proposed by Zint [12], mainly assisted the evaluation of one EE program after its implementation and not before. In Kabassi et al. [15], an automated system was designed to evaluate EE programs based on a multi-criteria decision making (MCDM) approach. In that approach, a set of criteria was formed, and the combination of the Analytic Hierarchy Process (AHP) [16] with the Technique for Order of Preference by Similarity to an Ideal Solution (TOPSIS) [17] was applied to evaluate EE programs prior to their implementation, comparing them and selecting the one that seems most appropriate.
Generally, there are many MCDM methods available. However, finding the best MCDM model to apply is not easy. In MCDM, no single method has been considered as the most suitable for all types of decision-making situations [18][19][20]. Different methods may lead to different rankings of the evaluated objects [18,21]. A solution to the problem may be given by comparing the MCDM models [22]. Different comparative analyses of MCDM methods have been implemented [18,[22][23][24][25][26] but none of these analyses concern evaluation of EE programs.
In light of the importance of usage of MCDM in evaluating EE programs and the need for comparison of MCDM approaches for each different domain, we perform a comparative analysis of four MCDM models for evaluating EE programs. For this purpose, we have used AHP for estimating the weights of the criteria as this theory has a well-defined way of forming the set of criteria and estimating the weights of the criteria based on pair-wise comparisons. Then, we apply four different MCDM models for processing the results of the evaluation. For this purpose, we apply SAW (Simple Additive Weighting) [17,27], WPM (Weighted Product Model) [28], TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) [17], and PROMETHEE II model (Preference Ranking Organization METHod for Enrichment Evaluations II) [29,30]. The main reason for the selection of those four models is that they are among the most popular and widely used in the literature in general.
There is no formal way of comparing MCDM models, as has been shown by the different comparison experiments [25,26,[31][32][33][34][35][36][37][38][39][40]. Indeed, Salabun et al. [41] concluded in their paper that almost every combination of a method and its parameters may bring a different result. Therefore, the comparison of MCDM models in the context of EE projects may be of particular interest. Furthermore, TOPSIS, SAW, WPM, and PROMETHEE II have not been compared before for the evaluation of EE programs. For the comparison of the MCDM theories, we use data from the EE programs in Greece. More specifically, we have used the data of 52 EE programs on environmental paths.
After the comparison of the MCDM models, we performed a sensitivity analysis. Sensitivity analysis examines the degree of change in the overall ranking of the alternatives when the input data are slightly modified. More specifically, the scheme of weights of the criteria is changed to examine the robustness of the models and the rankings of the alternatives. Although sensitivity analyses have been implemented before in different domains and for different MCDM models [40,[42][43][44][45], this is the first time one is implemented for estimating the consistency of MCDM models in the domain of EE programs.
The rest of the paper is organized as follows: Section 2 presents comparisons of the MCDM models described in the particular paper in other domains. Section 3 describes in detail the steps for the application of AHP for the criteria and their weights. Then, in Section 4, the four MCDM models that are compared are described. In Sections 5 and 6, those models are applied for the evaluation of EE programs and the results are analyzed and compared. In Section 7, a sensitivity analysis is performed to evaluate the robustness of the four different MCDM models and the conclusions drawn by this study are discussed in Section 8.

Bibliographic Review
Quality environmental education involves many partners and stakeholders who collaborate in a research-implementation space where science, decision making, and local culture and environment intersect [46]; environmental education evaluation and assessment often struggle in these productive, yet complex, spaces [3]. Regular program evaluation is needed to better understand the linkages between the issues programs are designed to address, the metrics or indicators used to describe effectiveness, and the actual measured outcomes [6].
Many researchers have referred to the need for evaluation of EE projects [2][3][4][5][6][7][8][9][10][11][12][13][14], while most of them also emphasize the lack of experiments due to the difficulty of implementing an evaluation experiment. For example, the studies by Norris & Jacobson [9] and O'Neil [46], reviewing 56 and 37 EE projects, respectively, concluded that fewer than one-third of the programs reported evaluations. Since then, more evaluation experiments and reviews have been conducted on EE activities [47], EE projects [6,48], or EE Centers [49].
Although progress has been made in terms of environmental education program evaluation practices and use, more emphasis is needed on the subject [48]. The first and most commonly addressed concern is the capacity of educators to evaluate [50]. For this purpose, automated evaluation systems have been developed to help stakeholders in EE [12]. However, such a system mainly assists the evaluation of one EE program after its implementation.
Indeed, most of the experiments of evaluation of EE projects involve surveys after the implementation of the project and/or observation during the implementation [2]. This means that an EE project has to be implemented before one can say if it is worth being selected and that costs time, effort, and, probably, money. Therefore, an evaluation of the EE projects prior to their implementation is essential.
In Kabassi et al. [15], an automated system has also been designed to evaluate EE programs based on a multi-criteria decision making (MCDM) approach. In this approach, the combination of the MCDM models proved rather effective for evaluating the EE projects prior to their implementation. However, a major concern with decision-making is that different MCDM methods can provide different results for the same problem. For this purpose, comparisons of MCDM models have been made in different domains and concern different MCDM theories [32,36,40,43,46,[51][52][53][54][55][56][57]. The MCDM models compared in the particular paper for the domain of the evaluation of EE programs are SAW, WPM, TOPSIS, and PROMETHEE II. These models have been compared in the past in pairs, and the comparison experiments are presented in Table 1. From Table 1, one can easily observe that these four models have never been part of the same comparison. Furthermore, there is no previous experiment that tries to compare MCDM models in the domain of EE programs' evaluation. As a result, the particular comparison may provide interesting insights and conclusions regarding the evaluation of EE programs.

AHP for the First Steps
Analytic Hierarchy Process [16] is one of the most popular MCDM techniques. AHP aims to analyze a qualitative problem through a quantitative method and seems to be very appropriate for implementing the first steps of any MCDM problem. As a result, it has been combined with many other MCDM models [58][59][60].
The application of AHP produced the set of criteria, which was formed by a group of experts and is described in detail in [15]. Then, AHP was applied to estimate the weights of the criteria. The estimation of the weights is presented in detail in [15], and the values of the weights were estimated as follows: w_uc1 = 0.072, w_uc2 = 0.071, w_uc3 = 0.036, w_uc4 = 0.129, w_uc5 = 0.133, w_uc6 = 0.127, w_uc7 = 0.171, w_uc8 = 0.099, w_uc9 = 0.111, w_uc10 = 0.051. The analysis of the weights revealed that the most important criterion when evaluating EE programs is 'Skills', while 'Effectiveness', 'Clarity', and 'Knowledge' were also considered rather important.
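The weight-estimation step of AHP can be sketched with the row geometric-mean approximation of the principal eigenvector. The 3×3 pairwise-comparison matrix below is purely illustrative; the paper's actual comparisons for the ten criteria are given in [15].

```python
import math

# Hypothetical 3x3 pairwise-comparison matrix (Saaty's 1-9 scale) for
# three criteria; not the paper's actual matrix.
A = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]

# Approximate the principal eigenvector by the row geometric-mean method.
gm = [math.prod(row) ** (1 / len(row)) for row in A]
weights = [g / sum(gm) for g in gm]

print([round(w, 3) for w in weights])
```

The normalized weights sum to 1, matching the property of the AHP weights listed above.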

MCDM Models
The MCDM models selected to be combined with AHP are SAW, WPM, TOPSIS, and PROMETHEE II. With each model, the aim was to determine the value of the EE programs by combining the values of the criteria. The models differ in their basic principles and in the way they combine the criteria values. However, the first three steps are identical for all four models implemented:
Forming the set of alternative EE programs: The set of alternative EE projects was formed after running a study on the EE projects in Greece [15]. More specifically, we collected the 553 programs that had been implemented in the past by environmental education centers in Greece. After collecting all this information, we chose only the EE projects related to the subject of Environmental Paths. These projects had been implemented by environmental education centers in different parts of Greece. This specific set was selected due to its eligible number of programs (52) and its characteristics.
Forming a set of evaluators: The group of evaluators comprised only expert users. More specifically, three users participated in the experiment, all experts in EE programs.
Calculating the values of the criteria: In this step, the evaluators studied the 52 EE programs for paths in Greece and provided values for the 10 criteria for each program. Those values were given on a nine-point scale. As soon as all the values of the three decision-makers were collected, the geometric mean was calculated for the corresponding values of each criterion for each EE program.
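The aggregation of the three evaluators' judgments into one value per criterion can be sketched as follows; the scores are illustrative, not the actual ratings collected in the experiment.

```python
import math

# Scores of the three evaluators for one criterion of one EE program,
# on the nine-point scale used in the paper (values are illustrative).
scores = [7, 8, 6]

# Aggregate the individual judgments into a single value with the
# geometric mean, as done for every criterion/program pair.
aggregated = math.prod(scores) ** (1 / len(scores))
print(round(aggregated, 3))
```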
Application of the MCDM model to estimate the final value of each EE program and ranking of the alternatives: To implement this step and calculate a final value U(EEp j ) for each EE program, four different MCDM models have been applied: SAW, WPM, TOPSIS, and PROMETHEE II.

SAW
The SAW model translates the decision problem into the optimization of a multi-attribute utility function U defined over the set of alternatives A. The decision-maker estimates the value of the function U(EEp_j) for every alternative EEp_j and selects the one with the highest value. In SAW, the multi-attribute utility function U is calculated as a linear combination of the values of the n criteria: U(EEp_j) = Σ_{i=1..n} w_uci · uc_ij, where EEp_j is one alternative and uc_ij is the value of the criterion c_i for the alternative EEp_j. The higher the value, the more desirable the alternative. The values of U(EEp_j) using SAW are presented in Table 2.
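A minimal sketch of the SAW computation, with illustrative weights and criteria values in place of the paper's data:

```python
# Assumed weights (sum to 1) and an illustrative decision matrix:
# each program maps to its criteria values uc_ij.
weights = [0.5, 0.3, 0.2]
programs = {
    "EEp1": [7, 5, 9],
    "EEp2": [6, 8, 4],
}

def saw(values, weights):
    # U(EEp_j) = sum_i w_i * uc_ij
    return sum(w * v for w, v in zip(weights, values))

# Rank the alternatives by descending utility.
ranking = sorted(programs, key=lambda p: saw(programs[p], weights), reverse=True)
print(ranking)
```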

WPM
The classic WPM model compares the alternatives in pairs by calculating a ratio U(EEp_K/EEp_L). In our case, however, the alternatives are 52, and this pairwise comparison would be complicated and time-consuming. Therefore, we used an alternative application of WPM proposed by Triantafyllou [21], in which the decision-maker uses only products, without ratios. For each alternative, the following value was calculated: U(EEp_j) = Π_{i=1..n} (uc_ij)^w_uci. The term U(EEp_j) denotes the total performance value of the alternative EEp_j (Table 2). As in SAW, the alternative with the highest U(EEp_j) is ranked first.
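The ratio-free form of WPM can be sketched in the same way, again with illustrative weights and values rather than the paper's data:

```python
# WPM in the ratio-free form: U(EEp_j) = prod_i (uc_ij)^{w_i}.
# Weights and values are illustrative assumptions.
weights = [0.5, 0.3, 0.2]
programs = {"EEp1": [7, 5, 9], "EEp2": [6, 8, 4]}

def wpm(values, weights):
    # Multiply each criterion value raised to the power of its weight.
    result = 1.0
    for w, v in zip(weights, values):
        result *= v ** w
    return result

ranking = sorted(programs, key=lambda p: wpm(programs[p], weights), reverse=True)
print(ranking)
```

Because WPM multiplies powers of the criteria values, it requires strictly positive values, the same restriction noted for SAW in the conclusions.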

TOPSIS
The central principle of the TOPSIS model is that the best alternative should have the shortest distance from the positive-ideal solution and the farthest distance from the negative-ideal solution.
Calculating weighted ratings: The weighted value is calculated as v_ij = w_uci · uc_ij, where w_uci is the weight and uc_ij is the value of the criterion c_i.
Identifying the positive-ideal and negative-ideal solutions: The positive-ideal solution is the composite of all the best attribute ratings attainable and is denoted A* = {v_1*, v_2*, ..., v_n*}, where v_i* = max_j v_ij; correspondingly, the negative-ideal solution is A− = {v_1−, v_2−, ..., v_n−}, where v_i− = min_j v_ij. Calculating the separation measures from the positive-ideal and negative-ideal solutions: In this step, the system calculates the n-dimensional Euclidean distance of each alternative from the positive-ideal and negative-ideal solutions: S_j* = sqrt(Σ_{i=1..n} (v_ij − v_i*)²) and S_j− = sqrt(Σ_{i=1..n} (v_ij − v_i−)²). Calculating the similarity indices and ranking the EE projects: The similarity index, which represents the similarity of alternative j to the positive-ideal solution, is given by U_j* = S_j− / (S_j* + S_j−). The alternative EE projects are then ranked according to U_j* in descending order, and the one with the highest value is selected as the most desirable (Table 2).
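The TOPSIS steps can be sketched as follows. The weighted ratings follow the step v_ij = w_uci · uc_ij described in the paper (no separate normalization, since the criteria values share a common scale); weights and values are illustrative.

```python
import math

# Assumed weights and illustrative decision matrix (benefit criteria).
weights = [0.5, 0.3, 0.2]
programs = {"EEp1": [7, 5, 9], "EEp2": [6, 8, 4], "EEp3": [5, 6, 7]}

# Step 1: weighted ratings v_ij = w_i * uc_ij.
v = {p: [w * x for w, x in zip(weights, vals)] for p, vals in programs.items()}

# Step 2: positive-ideal and negative-ideal solutions (column-wise max/min).
n = len(weights)
pos_ideal = [max(v[p][i] for p in v) for i in range(n)]
neg_ideal = [min(v[p][i] for p in v) for i in range(n)]

def dist(a, b):
    # n-dimensional Euclidean distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Steps 3-4: separation measures and similarity index U*_j = S-_j / (S+_j + S-_j).
similarity = {p: dist(v[p], neg_ideal) / (dist(v[p], pos_ideal) + dist(v[p], neg_ideal))
              for p in v}
ranking = sorted(similarity, key=similarity.get, reverse=True)
print(ranking)
```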

PROMETHEE II
The PROMETHEE methods belong to the family of outranking methods. After the criteria, their weights of importance, and the values of the criteria for all EE programs have been defined, the steps of PROMETHEE II are:
Making comparisons and calculating the preference degrees: This step computes, for each pair of possible EE programs and for each criterion, the value of the preference degree. Let g_j(a) be the value of a criterion j for an EE program a. We note d_j(a, b) = g_j(a) − g_j(b) the difference in the value of criterion j for two EE programs a and b. P_j(a, b) is the value of the preference degree of criterion j for the two EE programs a and b, obtained by applying a preference function to the difference d_j(a, b). Aggregating the preference degrees of all criteria for pairs of EE programs: This step consists in aggregating the preference degrees of all criteria for each pair of possible EE programs into a global preference index. Let C be the set of considered criteria and w_j the weight associated with criterion j. The global preference index for a pair of possible EE programs a and b is computed as: π(a, b) = Σ_{j∈C} w_j · P_j(a, b). Calculating the positive and negative outranking flows: This step, which is the first that concerns the ranking of the possible EE programs, consists in computing the outranking flows. Let A be the set of possible EE programs and n the number of possible EE programs. For each possible EE program a, the positive outranking flow is computed as φ+(a) = (1/(n−1)) Σ_{x∈A} π(a, x), and the negative outranking flow as φ−(a) = (1/(n−1)) Σ_{x∈A} π(x, a). Calculating the net outranking flow: The last step of the application of PROMETHEE II consists in using the outranking flows to establish a complete ranking of the possible EE programs. The net outranking flow of a possible EE program a is computed from the positive and negative outranking flows as: U(a) = φ+(a) − φ−(a). Ranking the EE programs: The EE programs are ranked according to the value U(a), in descending order.
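The PROMETHEE II steps can be sketched as follows. Since the paper does not spell out its preference functions, a linear preference function with an assumed threshold is used here for illustration; weights and values are also illustrative.

```python
# Assumed weights, illustrative decision matrix, and an assumed linear
# preference function P_j(a,b) = min(max(d,0)/p, 1) with threshold p.
weights = [0.5, 0.3, 0.2]
programs = {"EEp1": [7, 5, 9], "EEp2": [6, 8, 4], "EEp3": [5, 6, 7]}
p_threshold = 4.0  # assumed preference threshold

def pref(d):
    # Linear preference degree on the criterion difference d.
    return min(max(d, 0.0) / p_threshold, 1.0)

def pi(a, b):
    # Global preference index: weighted sum of per-criterion preferences.
    return sum(w * pref(x - y) for w, x, y in zip(weights, programs[a], programs[b]))

names = list(programs)
n = len(names)
# Positive and negative outranking flows, averaged over the other n-1 programs.
phi_plus = {a: sum(pi(a, b) for b in names if b != a) / (n - 1) for a in names}
phi_minus = {a: sum(pi(b, a) for b in names if b != a) / (n - 1) for a in names}
# Net outranking flow U(a) = phi+(a) - phi-(a); rank in descending order.
net_flow = {a: phi_plus[a] - phi_minus[a] for a in names}
ranking = sorted(net_flow, key=net_flow.get, reverse=True)
print(ranking)
```

Note that the net flows always sum to zero over the set of alternatives, a useful sanity check for an implementation.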

Application of MCDM Models
As soon as all the MCDM models have been applied, the final values of all EE programs using each model are calculated (Table 2).
After calculating the values of the different MCDM models, we estimated the ranking of each EE program ( Table 3).

Comparison of the Models
The main steps for comparing the MCDM models are:
1. Implementing pairwise comparisons of the values of the models by calculating the Pearson correlation coefficient.
2. Implementing pairwise comparisons of the rankings by calculating the Spearman's rho correlation.
3. Estimating Cohen's kappa for testing the inter-rater comparability, using the MCDM models as raters.
4. Performing a sensitivity analysis to evaluate the robustness of those models.
In order to estimate the correlation of the four MCDM models being evaluated, we calculated the Pearson correlation coefficient for all pairs of MCDM models using the values of Table 2. The values of the Pearson correlation coefficient are presented in Table 4 and reveal that all four methods perform very similarly (Pearson correlation coefficients of 0.969 to 0.995, which is very high). After confirming the correlation of the MCDM models using the Pearson correlation coefficient, we implemented a pairwise comparison of the rankings produced by the MCDM models. This comparison was performed by estimating the Spearman's rho correlation coefficient. More specifically, the rankings of Table 3 were used for calculating the Spearman's rho correlation for all pairs of MCDM models. The Spearman's rho correlation is estimated by ρ = 1 − (6 Σ_i d_i²) / (n(n² − 1)), where d_i is the rank difference at position i and n is the number of ranks. The results, presented in Table 5, are remarkable, as all four methods perform very similarly (Spearman's rho correlations of 0.983 to 0.995, which is very high). Generally, the values of the correlations are very high for all pairs of MCDM models. According to the Pearson correlation coefficient, the highest correlation was between SAW and WPM, which was expected since their reasoning is very similar, while the lowest correlation was between TOPSIS and PROMETHEE II. The correlation of the rankings of the alternative EE programs confirmed the high correlation of SAW and WPM, but slightly higher was the correlation of SAW with PROMETHEE II.
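Both correlation measures can be computed directly from their definitions. The scores and rankings below are illustrative stand-ins, not the values of Tables 2 and 3.

```python
import math

def pearson(x, y):
    # Pearson correlation: covariance over the product of standard deviations.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(rank_x, rank_y):
    # rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), for rankings without ties.
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Illustrative scores of two MCDM models for five alternatives.
saw_scores    = [0.81, 0.74, 0.69, 0.55, 0.43]
topsis_scores = [0.78, 0.70, 0.72, 0.52, 0.40]
print(round(pearson(saw_scores, topsis_scores), 3))

# Corresponding rankings (1 = best).
print(spearman([1, 2, 3, 4, 5], [1, 3, 2, 4, 5]))
```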
The values of Cohen's kappa for the pairwise comparison of the agreement of the MCDM models are presented in Table 7. A value of Cohen's kappa above 0.6 is considered quite good. Thus, the values of Cohen's kappa in the particular experiment (0.806-0.976) confirm the reliability of the four MCDM models being compared for the particular domain. The raters' agreement according to Cohen's kappa is quite high in general (above 0.806 for all pairs of MCDM models). This fact was also confirmed by the Pearson correlation coefficient and the Spearman's rho correlation presented in Tables 4 and 5. Cohen's kappa revealed that the highest correlation was between PROMETHEE II and SAW, which is in line with the results of the Spearman's rho correlation. Both the Spearman's rho correlation and Cohen's kappa revealed that the correlation of PROMETHEE II with WPM is also very high. Even though some theories correlate more than others, the overall correlation of the four methods is very high. A reasonable disagreement can be observed among the methods, but it does not affect their reliability.
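Cohen's kappa with the MCDM models as raters can be sketched by treating each model's rank for an alternative as a categorical label; the rankings below are illustrative.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    # kappa = (p_o - p_e) / (1 - p_e): observed agreement corrected
    # for the agreement expected by chance.
    n = len(labels_a)
    po = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    pe = sum(ca[k] * cb.get(k, 0) for k in ca) / n ** 2
    return (po - pe) / (1 - pe)

# Illustrative rankings of six alternatives by two MCDM "raters".
ranks_saw    = [1, 2, 3, 4, 5, 6]
ranks_topsis = [1, 3, 2, 4, 5, 6]
print(round(cohens_kappa(ranks_saw, ranks_topsis), 3))
```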

Sensitivity Analysis
A sensitivity analysis was performed to investigate the robustness of each MCDM method compared in this paper. One way of performing a sensitivity analysis is to change the weights of the criteria one by one. Another way is to use a weighting scheme that assigns an equal weight to each criterion [40,61]. The values of the criteria are not modified. Since there are 10 criteria, the weight for each criterion was set to 0.1. Table 8 presents the values assigned to each of the alternative EE programs being evaluated using the four MCDM models. First, the values calculated by the MCDM models using weighting scheme 1 are presented; this scheme uses the weights as estimated by AHP. Then, for each model, weighting scheme 2 is used, in which equal weights are given to all criteria. Using the two weighting schemes with the four MCDM models, the rankings were estimated for each alternative EE program. As a result, Table 9 presents the ranking of the alternative EE programs using the four MCDM models and the two weighting schemes.

The rankings produced by each model under the two weighting schemes were then compared by:
•	Checking how many identical rankings there were among the rankings of each model using the different schemes.
•	Estimating the Spearman's rho correlation for each model using the two weighting schemes.
In Table 10, the percentage of identical rankings is presented for the four MCDM models, and from these values it is derived that SAW is the least affected by the change in the weights of the criteria. The Spearman's rho correlation also confirmed the high correlation of the rankings estimated by SAW using the two different weighting schemes (Table 10). The MCDM model that seems to be least affected by the choice of the set of weights is SAW, while the most affected is TOPSIS, as it has the lowest values of the Pearson correlation coefficient, identical rankings, and Spearman's rho correlation coefficient.
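The sensitivity check can be sketched as follows, using SAW with the two weighting schemes on illustrative data; the assumed weights and scores are not the paper's.

```python
# Illustrative decision matrix and the two weighting schemes:
# scheme 1 plays the role of the AHP weights, scheme 2 is equal weights.
programs = {"EEp1": [7, 5, 9], "EEp2": [6, 8, 4],
            "EEp3": [5, 6, 6], "EEp4": [4, 9, 7]}
ahp_weights   = [0.5, 0.3, 0.2]   # assumed AHP weights (scheme 1)
equal_weights = [1 / 3] * 3       # equal weights (scheme 2)

def saw_ranking(weights):
    # SAW scores, then positions 1..n in descending order of score.
    scores = {p: sum(w * v for w, v in zip(weights, vals))
              for p, vals in programs.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {p: i + 1 for i, p in enumerate(ordered)}

r1, r2 = saw_ranking(ahp_weights), saw_ranking(equal_weights)
# Fraction of alternatives keeping the same position under both schemes.
identical = sum(r1[p] == r2[p] for p in programs) / len(programs)
print(f"identical rankings: {identical:.0%}")
```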

Conclusions
The front-end evaluation of EE programs prior to their application is very important because applying an EE program costs time, money, and effort; therefore, one would like to implement only a program that is worth implementing. Furthermore, the evaluation of EE programs can support the reuse of EE programs that required a lot of effort to design and have proven to have complete instructions and confirmed success in application.
In view of the above, the contribution of this paper is to present the use of MCDM models for the front-end evaluation of EE programs and the comparison of the different models. Conclusions have been drawn regarding the application of the different MCDM models. The implementation of the different MCDM models and their comparison can be very useful to Environmental Education Centers, educators, and/or stakeholders in EE in general who want to evaluate EE projects prior to their implementation and select the one that seems most appropriate.
In this paper, AHP has been used for defining the set of criteria as well as their weights. AHP was then combined with four different MCDM models, which were compared. In order to implement the MCDM models and run the comparison test between the models for an initial comparison of the EE projects prior to their implementation, we ran a simulation using 52 EE projects of the EE Centers in Greece that involve paths. The data of the 52 EE projects that involved paths were processed by SAW, WPM, TOPSIS, and PROMETHEE II, which are MCDM models with different computational mechanisms, such as additive or multiplicative combination, similarity to an ideal solution, etc.
The application of the different theories revealed the characteristics of each method as well as their advantages and disadvantages. The application of SAW is very easy, but the values of the criteria must be positive to provide valuable results. WPM has the same advantages and disadvantages as SAW and provides similar results, as proved by the values of Spearman's rho and Cohen's kappa. Both SAW and WPM have the ability to compensate among different criteria. Simplicity is also the main advantage of TOPSIS. An additional advantage of TOPSIS is that it maintains the same number of steps regardless of problem size, which has allowed it to be utilized quickly to review other methods or to stand on its own as a decision-making tool [55]. However, the use of the Euclidean distance does not account for the correlation of attributes, although it keeps the consistency of judgment. PROMETHEE II is also quite easy to apply and, additionally, does not require the assumption that the criteria are proportionate. However, all four models share the disadvantage of not having a clear methodology for estimating weights. For this purpose, in the current paper, AHP is used for estimating the weights and is combined in turn with SAW, WPM, TOPSIS, and PROMETHEE II.
The application of the MCDM models for the assessment of EE programs has revealed that the MCDM may prove rather effective. However, according to Mulliner et al. [18], different MCDM models can yield different results when applied to the same decision problem. This fact was also confirmed in this particular study. Therefore, we have used the Pearson correlation coefficient, Spearman's rho correlation, and Cohen's kappa for pairwise comparison of the different models and checking the inter-MCDM models' reliability.
The sensitivity analysis that was performed in order to evaluate the robustness of the four different MCDM models in evaluating EE programs was implemented by applying a different scheme of weights and comparing the results of each model using the two different weighting schemes. The comparison involved estimating the Pearson correlation coefficient, identical rankings, and Spearman's rho correlation. The results of the sensitivity test revealed that all models were quite robust. The MCDM model that proved to be more robust and less affected by the choice of the set of weights is SAW, while the MCDM that is most sensitive is TOPSIS.
The high values of correlation between the different MCDM models revealed that the ranking results mainly depend on the nature and the values of the criteria and less on the model selected. The reasonable disagreement that was observed among the methods did not affect their reliability. As a result, MCDM models proved generally very effective for evaluating EE programs before their implementation and selecting the best ones.
However, a possible limitation of this work is that the comparison has been made using only one set of alternative EE projects, and it would provide safer conclusions if more sets were involved. Furthermore, the comparison could also involve more MCDM models in order to confirm that the results would not change. Therefore, it is among our future plans to extend the experiment with more MCDM models, such as ELECTRE, Delphi, etc. Furthermore, we aim to re-implement the experiment with other sets of EE projects with different characteristics in order to confirm that the set of projects does not affect the conclusions drawn by this study.