Application of Multi-Criteria Decision-Making Models for the Evaluation of Cultural Websites: A Framework for Comparative Analysis

Websites in the post-COVID-19 era play a very important role as the Internet gains more visitors. A website may significantly contribute to the electronic presence of a cultural organization, such as a museum, but its success should be confirmed by an evaluation experiment. Given the importance of such an experiment, this paper presents DEWESA, a generalized framework that uses and compares multi-criteria decision-making models for the evaluation of cultural websites. DEWESA presents in detail the steps that have to be followed for applying and comparing multi-criteria decision-making models for the evaluation of cultural websites. The framework is implemented in the current paper for the evaluation of museum websites. In this case study, five different models are implemented (SAW, WPM, TOPSIS, VIKOR, and PROMETHEE II) and compared. The comparative analysis is completed by a sensitivity analysis, in which the five multi-criteria decision-making models are compared with respect to their robustness.


Introduction
The electronic presence of museums in the post-COVID-19 era plays a very important role, as the number of visitors to museums keeps declining due to health constraints. Therefore, many museums have concentrated on improving their online image and services. However, creating the right electronic presence is not easy, and the success of a website can only be confirmed by its evaluation. The evaluation of a website is a complex procedure, which, despite its importance, is often omitted from the website life-cycle. As a result, many researchers have highlighted the need for evaluating museum websites [1][2][3][4][5][6][7][8].
In an effort to help developers and stakeholders implement evaluation experiments, several researchers have proposed frameworks, criteria, dimensions, and theories that could be used for this purpose. In some evaluation experiments, a first phase is proposed where experts independently extract the criteria that are going to be used in the next phases of the experiment [9]. Other proposed specific frameworks define both the criteria and the steps of the evaluation experiment (e.g., [2,4]).
One way of combining the dimensions and criteria that are taken into account while evaluating a website is to use a multi-criteria decision-making (MCDM) model. Different models have been used in the past for this purpose. For example, the Analytic Hierarchy Process (AHP) [10] has been used on its own [11] or in combination with other methods, such as TOPSIS [2], fuzzy TOPSIS [5], WPM, fuzzy WPM, and fuzzy SAW [12,13].
In light of the above, the main contribution of this paper is a generalized framework that uses and compares MCDM models for the evaluation of cultural websites. The framework is called DEWESA (dimensions for evaluating websites and sensitivity analysis) and presents in detail the steps that have to be followed for selecting dimensions and criteria, their weights of importance, the MCDM models that seem to be more appropriate for the specific category of websites, and the comparative analysis that has to be performed to select the most suitable MCDM model.
DEWESA is implemented in the current paper for the evaluation of museum websites. More specifically, we used the dimensions and the criteria defined during the application of AHP for the evaluation of museums' websites in a previous experiment [25]. New studies confirm these criteria [17,18,30], but many have a different focus; e.g., some implement heuristic evaluation [6]. Then, AHP was used for the estimation of the weights of the criteria, and five different MCDM models were implemented in turn. More specifically, we ran an inspection evaluation, in which expert users were asked to evaluate the websites of five world-renowned museums. The results of the evaluation were processed by the different MCDM models, and their results were compared. For this purpose, we applied simple additive weighting (SAW) [25,31], weighted product model (WPM) [15], TOPSIS (technique for order of preference by similarity to ideal solution) [32], VIKOR model [33][34][35], and PROMETHEE II (preference ranking organization method for enrichment evaluations II) [36,37]. As a next step of DEWESA, a comparative analysis of the five MCDM models for the particular domain was implemented.
The main aim of this framework is the comparative analysis of the different models with respect to their consistency and robustness; therefore, a sensitivity analysis was performed. Sensitivity analysis is an important procedure that tests the degree of change in the overall ranking of the alternatives when input data are slightly modified. In most approaches, the sensitivity analysis involves estimating the changes in the scores of the alternatives for a given change in the weight of one criterion or of all criteria. Although sensitivity analysis of the models has been implemented before in different domains [27,38,39,40], this is the first time that it is implemented for estimating the consistency of the MCDM models in museums' website evaluation.

Framework
In this section, the generalized framework DEWESA for using MCDM models in websites' evaluation is presented. The framework presents the steps that have to be followed for selecting dimensions and their weights for website evaluation as well as the MCDM models that seem to be more appropriate for the specific category of websites.
The main steps of the framework DEWESA are: 1. Dimensions and Criteria. In this step, the dimensions and the criteria are defined. The dimensions are used for the evaluation of the website, and the value of each dimension is affected by a subset of criteria. The criteria and the dimensions do not have the same importance; therefore, their weights must be calculated. For this purpose, AHP is used. The application of AHP involves setting up a pair-wise comparison matrix for the dimensions and a pair-wise comparison matrix for the sub-criteria of each dimension. Then, open-source decision-making software that implements AHP, such as the 'Priority Estimation Tool' (PriEst) [41], can be used to estimate the weights. For the case of museum websites' evaluation, this step is analyzed in Section 2. The steps are analyzed in the subsequent sections, together with an example for the evaluation of museum websites.

Dimensions and Criteria
In this step of the framework, the dimensions used for the evaluation of the websites should be defined. To apply DEWESA to museum websites, we first had to define the set of dimensions for their evaluation. We therefore used as a basis the dimensions proposed by Kabassi [2], which are based on the analysis of criteria used in the evaluation experiments of museums' websites [14], and went through later studies to check whether new dimensions have been proposed. New studies confirm these criteria [3,16,17], but many have a different focus; e.g., some experiments implement heuristic evaluation, in which the criteria are fixed in advance [6]. The three dimensions proposed by Kabassi [2] are usability, functionality, and mobile interaction. Each one of these dimensions depends on several criteria. The values of the different criteria and their weights are used for calculating the final values of the dimensions. In this approach, we focus on the values of the dimensions and not the values of the criteria.
The dimensions are not considered equally important while evaluating a museum's website. Furthermore, the criteria are not considered equally important in the estimation of the final value of each dimension. For this reason, AHP is used for calculating the weights of both the dimensions and the criteria.
AHP has a formal method of estimating weights and supports hierarchies of criteria such as the one we have in the current experiment. According to AHP, a set of evaluators consisting of both software engineers and domain experts was formed. Each expert had to complete one matrix for pair-wise comparison of the dimensions and three matrices for pair-wise comparisons of the criteria of each dimension. The values that the experts used for completing the matrices varied from 1/9 to 9, as Saaty [10] proposed. The final matrices of pair-wise comparisons of the criteria were formed by calculating the geometric mean of the corresponding values of the experts' matrices. This procedure resulted in Tables 1-4. As soon as the final matrices have been completed, the principal eigenvalue and the corresponding normalized right eigenvector of each matrix give the relative importance of the various criteria being compared. The elements of the normalized eigenvector are the weights of dimensions or sub-criteria. These estimations are made using the 'Priority Estimation Tool' (PriEst) [41] and are presented in Figure 1.
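The weight estimation described above can be illustrated with a minimal sketch. It assumes two experts and three dimensions with hypothetical judgments (these are not the values of Tables 1-4), aggregates the matrices by element-wise geometric mean, and approximates the normalized principal right eigenvector by power iteration. The function names `aggregate_experts` and `ahp_weights` are illustrative and are not part of PriEst.

```python
import math

def aggregate_experts(matrices):
    """Element-wise geometric mean of the experts' pairwise comparison matrices."""
    k = len(matrices)
    n = len(matrices[0])
    return [[math.prod(m[r][c] for m in matrices) ** (1.0 / k) for c in range(n)]
            for r in range(n)]

def ahp_weights(matrix, iterations=100):
    """Approximate the normalized principal right eigenvector by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[r][c] * w[c] for c in range(n)) for r in range(n)]
        total = sum(nxt)
        w = [v / total for v in nxt]  # renormalize so the weights sum to 1
    return w

# Hypothetical judgments on Saaty's 1/9..9 scale (assumption, for illustration only).
expert_1 = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
expert_2 = [[1, 4, 8], [1 / 4, 1, 2], [1 / 8, 1 / 2, 1]]

aggregated = aggregate_experts([expert_1, expert_2])
weights = ahp_weights(aggregated)
print([round(w, 3) for w in weights])
```

Power iteration is used here only to keep the sketch free of external dependencies; any eigenvalue routine that returns the principal right eigenvector would serve the same purpose.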

Alternative Museums' Websites and Criteria's Values
The alternative museums' websites that have been selected to be evaluated and compared in the current experiment are those of five major museums in European cities. More specifically, the Louvre Museum in Paris, the British Museum in London, the Rijksmuseum in Amsterdam, the Acropolis Museum in Athens, and the Del Prado Museum in Madrid were selected (Figures 2-6). These websites were assigned to: The experts who evaluate the alternative websites interact with their interfaces and have to provide values for the criteria. This procedure is presented in the next section.

Estimating Dimensions' Values
Nine expert users (three web designers, one software engineer, three curators, and two archaeologists) were asked to visit the websites of the museums and interact with them. At the end of their interaction, they were asked to provide values for the sub-criteria and not for the main dimensions. The final value of each sub-criterion was calculated as the geometric mean of the values assigned by the nine evaluators (Table 5). The value of each main dimension is calculated as a weighted sum of the values of the corresponding sub-criteria (Table 6).
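As a rough illustration of this aggregation, the sketch below computes the geometric mean of the evaluators' scores for each sub-criterion and then a weighted sum for one dimension. The scores, the sub-criterion weights, the use of only three evaluators, and the names `geometric_mean` and `dimension_value` are assumptions made for the example; they are not the data of Tables 5 and 6.

```python
import math

def geometric_mean(values):
    """Geometric mean of the scores given by the evaluators to one sub-criterion."""
    return math.prod(values) ** (1.0 / len(values))

def dimension_value(criterion_values, criterion_weights):
    """Weighted sum of the sub-criteria values of one dimension."""
    return sum(w * v for w, v in zip(criterion_weights, criterion_values))

# Hypothetical scores: one list per sub-criterion, one entry per evaluator.
scores_by_criterion = [
    [8, 8, 8],
    [4, 9, 6],
]
criterion_weights = [0.6, 0.4]  # hypothetical AHP weights of the two sub-criteria

criterion_values = [geometric_mean(s) for s in scores_by_criterion]
usability = dimension_value(criterion_values, criterion_weights)
print(round(usability, 3))
```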

Applying MCDM Models
The main dimensions of each website will be combined using different MCDM models.

SAW
The simple additive weighting (SAW) [31,32] method consists of estimating a function $U(A_j)$ for every alternative $A_j$ and selecting the one with the highest value.
The multi-attribute utility function $U$ is calculated as a linear combination of the values of the $n$ attributes: $U(A_j) = \sum_{i=1}^{n} w_i x_{ij}$, where $w_i$ is the weight of the dimension $i$ and $x_{ij}$ is the value of the dimension $i$ for the website $A_j$.
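A minimal sketch of SAW as a weighted linear combination of the dimension values follows. The three websites, their values, and the weights are hypothetical and serve only to show the calculation.

```python
def saw_scores(values, weights):
    """values[j][i] is the value of dimension i for website j; returns U(A_j) per website."""
    return [sum(w * x for w, x in zip(weights, row)) for row in values]

# Hypothetical data: three websites, three dimensions (assumption, not the paper's tables).
names = ["A1", "A2", "A3"]
values = [[8, 7, 6], [6, 8, 7], [9, 8, 7]]
weights = [0.5, 0.3, 0.2]

scores = saw_scores(values, weights)
best = names[scores.index(max(scores))]
print(dict(zip(names, scores)), best)
```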

WPM
In this paper, we use the approach of WPM proposed by Triantafyllou [15]. In this alternative approach of WPM, the following value is calculated for each website: $P(A_j) = \prod_{i=1}^{n} (x_{ij})^{w_i}$. The term $P(A_j)$ denotes the total performance value of the website $A_j$.
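The total performance value of WPM can be sketched as a product of the dimension values raised to their weights. The data below are hypothetical, and the function name `wpm_score` is illustrative.

```python
import math

def wpm_score(row, weights):
    """Total performance value of one website: product of values raised to their weights."""
    return math.prod(x ** w for x, w in zip(row, weights))

# Hypothetical data: three websites, three dimensions (assumption, not the paper's tables).
names = ["A1", "A2", "A3"]
values = [[8, 7, 6], [6, 8, 7], [9, 8, 7]]
weights = [0.5, 0.3, 0.2]

performance = [wpm_score(row, weights) for row in values]
ranking = sorted(names, key=lambda name: performance[names.index(name)], reverse=True)
print(ranking)
```

Because the values are multiplied, all dimension values are assumed to be strictly positive and measured on a comparable scale.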

TOPSIS
The central principle of the TOPSIS model is that the best alternative should have the shortest distance from the ideal solution and the farthest distance from the negative-ideal solution.
Calculate Weighted Ratings. The weighted value is calculated as $v_{ij} = w_i x_{ij}$, where $w_i$ is the weight and $x_{ij}$ is the value of the dimension $i$ for the website $A_j$. Identify Positive-Ideal and Negative-Ideal Solutions. The positive-ideal solution is the composite of all best attribute ratings attainable and is denoted $A^* = \{v_1^*, v_2^*, \ldots, v_n^*\}$, where $v_i^*$ is the best value for the dimension $i$ among all alternatives. The negative-ideal solution is the composite of all worst attribute ratings attainable and is denoted $A^- = \{v_1^-, v_2^-, \ldots, v_n^-\}$, where $v_i^-$ is the worst value for the dimension $i$ among all alternatives.
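The sketch below follows the weighted ratings and the ideal solutions described above. The remaining steps (Euclidean separation from each ideal solution and the relative closeness coefficient) are the textbook TOPSIS steps and are an assumption here, since they are not spelled out in this section. The sketch also assumes that all dimensions are benefit dimensions already measured on a common scale, so no normalization is applied.

```python
import math

def topsis_closeness(values, weights):
    """Relative closeness of each website to the positive-ideal solution (higher is better)."""
    weighted = [[w * x for w, x in zip(weights, row)] for row in values]
    columns = list(zip(*weighted))
    ideal = [max(c) for c in columns]      # best weighted value per dimension
    negative = [min(c) for c in columns]   # worst weighted value per dimension
    closeness = []
    for row in weighted:
        d_pos = math.sqrt(sum((v - b) ** 2 for v, b in zip(row, ideal)))
        d_neg = math.sqrt(sum((v - b) ** 2 for v, b in zip(row, negative)))
        # Assumes the alternatives are not all identical, so d_pos + d_neg > 0.
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness

# Hypothetical data: three websites, three dimensions (assumption, not the paper's tables).
values = [[8, 7, 6], [6, 8, 7], [9, 8, 7]]
weights = [0.5, 0.3, 0.2]
closeness = topsis_closeness(values, weights)
print([round(c, 3) for c in closeness])
```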

VIKOR
The basic concept of the VIKOR model, first put forth by Opricovic and Tzeng [33,34], lies in defining the positive and negative ideal points.
The compromise ranking algorithm [43,44] produces the values S, R, and Q for each website. The values of Q for each website are presented in Table 5. Taking these values into account, the alternatives are sorted by Q in ascending order, and the resulting ranking is compared with the rankings obtained using S and R.
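Since the steps of the compromise ranking algorithm are not listed in this section, the sketch below uses the commonly cited VIKOR formulas as an assumption: S is the weighted sum of normalized distances from the best value of each dimension, R is the largest single weighted distance, and Q combines the two with a strategy weight `v` (set to 0.5 here, also an assumption). The data are hypothetical.

```python
def vikor(values, weights, v=0.5):
    """Return the lists S, R, Q (one entry per website); lower Q means a better compromise."""
    columns = list(zip(*values))
    best = [max(c) for c in columns]
    worst = [min(c) for c in columns]
    S, R = [], []
    for row in values:
        # Assumes every dimension takes at least two distinct values (b != wo).
        terms = [w * (b - x) / (b - wo)
                 for w, x, b, wo in zip(weights, row, best, worst)]
        S.append(sum(terms))
        R.append(max(terms))
    s_min, s_max, r_min, r_max = min(S), max(S), min(R), max(R)
    Q = [v * (s - s_min) / (s_max - s_min) + (1 - v) * (r - r_min) / (r_max - r_min)
         for s, r in zip(S, R)]
    return S, R, Q

# Hypothetical data: three websites, three dimensions (assumption, not the paper's tables).
names = ["A1", "A2", "A3"]
values = [[8, 7, 6], [6, 8, 7], [9, 8, 7]]
weights = [0.5, 0.3, 0.2]

S, R, Q = vikor(values, weights)
ranking = sorted(names, key=lambda name: Q[names.index(name)])  # ascending Q
print(ranking)
```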

PROMETHEE II
PROMETHEE II creates a complete pre-order on the set of possible websites that can be proposed to the decision-maker in order to solve the decision problem. Once the dimensions, their values, and their weights of importance have been defined, the steps of PROMETHEE II are: Making comparisons and calculating the preference degree. This step computes, for each pair of websites and each dimension, the value of the preference degree. Let $g_j(a)$ be the value of a dimension $j$ for a website $a$. We denote by $d_j(a,b) = g_j(a) - g_j(b)$ the difference of the value of a dimension $j$ for two websites $a$ and $b$.
$P_j(a,b)$ is the value of the preference degree of a dimension $j$ for two websites $a$ and $b$. These preference degrees are computed from the differences $d_j(a,b)$ by means of preference functions. Ranking websites. The ranking of museums' websites is performed according to the value of the net outranking flow $\phi(a)$ of each website.
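A minimal sketch of these steps is given below. The preference function used in the paper is not recoverable from this section, so the sketch assumes the simplest one, the usual (strict) preference function, where the preference degree is 1 when the difference is positive and 0 otherwise. The aggregation into positive, negative, and net flows follows the standard PROMETHEE II definitions, and the data are hypothetical.

```python
def promethee_net_flows(values, weights):
    """Net outranking flow of each website, assuming the usual preference function."""
    m = len(values)

    def aggregated_preference(a, b):
        # Sum of the weights of the dimensions on which website a is strictly better than b.
        return sum(w for w, xa, xb in zip(weights, values[a], values[b]) if xa - xb > 0)

    flows = []
    for a in range(m):
        others = [b for b in range(m) if b != a]
        positive = sum(aggregated_preference(a, b) for b in others) / (m - 1)
        negative = sum(aggregated_preference(b, a) for b in others) / (m - 1)
        flows.append(positive - negative)
    return flows

# Hypothetical data: three websites, three dimensions (assumption, not the paper's tables).
names = ["A1", "A2", "A3"]
values = [[8, 7, 6], [6, 8, 7], [9, 8, 7]]
weights = [0.5, 0.3, 0.2]

flows = promethee_net_flows(values, weights)
ranking = sorted(names, key=lambda name: flows[names.index(name)], reverse=True)
print(ranking)
```

Other preference functions (for example, linear with thresholds) would replace only the body of `aggregated_preference`; the flow calculation stays the same.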

Comparison of the MCDM Models
As soon as all the MCDM models have been applied, the final value for each alternative website using each one of the MCDM models is calculated. Those values are further used for ranking the alternative websites. Both the values and the ranking order of the websites using the five MCDM models are presented in Table 7. In order to compare the MCDM models, we calculated the Pearson correlation coefficient for a pair-wise comparison of the values produced by the models and Spearman's correlation coefficient for a pair-wise comparison of the rankings of the alternative websites (Tables 8 and 9). Spearman's rho correlation is estimated by $\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$, where $d_i$ is the rank difference at position $i$ and $n$ is the number of ranks. The values of both the Pearson and Spearman's rho correlation coefficients revealed a high correlation of SAW and WPM, which was to be expected, as the reasoning of these two models is considered rather similar. A high correlation is also found between TOPSIS and SAW or TOPSIS and WPM. Lower correlations were observed between TOPSIS and VIKOR or TOPSIS and PROMETHEE II.
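The two coefficients can be computed as in the sketch below. The scores of the two models are hypothetical, the helper names are illustrative, and the Spearman formula is the tie-free version given above, so the sketch assumes that no two websites receive the same score.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two lists of model scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(scores):
    """Rank 1 goes to the highest score; assumes there are no ties."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    result = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        result[j] = position
    return result

def spearman_rho(rank_a, rank_b):
    """Spearman's rho from two tie-free rankings of the same alternatives."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical scores of five websites under two models (not the values of Table 7).
model_a = [7.3, 6.8, 8.3, 7.9, 6.1]
model_b = [7.2, 6.7, 8.2, 8.0, 6.0]

print(round(pearson(model_a, model_b), 3))
print(spearman_rho(ranks(model_a), ranks(model_b)))
```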
In comparative studies, SAW has been compared with WPM [27,45], TOPSIS [27,28,45,46,47], VIKOR [27,47], and PROMETHEE II [28]. WPM has only been compared with TOPSIS [27,45] and VIKOR [27]. TOPSIS has been compared with SAW and WPM, as mentioned above, as well as with VIKOR [47][48][49] and PROMETHEE II [28,38,49]. Finally, VIKOR has also been compared with PROMETHEE II [49]. However, most of these studies make general remarks and do not report specific statistical values, except for the study of Valikipour et al. [47], which uses Spearman's rho and concludes that TOPSIS has a high correlation with SAW. This is in line with the results of the current study. The study of Widianta et al. [28] revealed a high correlation of TOPSIS with PROMETHEE. This is not completely in line with the current study, but the difference in the domain of application of the MCDM model may explain the disagreement. Regarding the evaluation of websites of museums, a comparison of MCDM models has been implemented for websites of museums' conservation labs between fuzzy SAW and fuzzy WPM [12] and another one for environmental websites between TOPSIS and VIKOR [50].

Sensitivity Analysis of the MCDM Models
In order to check the consistency of the results produced by each MCDM model and evaluate the robustness of each model, we performed a sensitivity analysis. One way of performing sensitivity analysis is to use a different scheme of weights or to change the weights of the dimensions one by one. In this case, we use a different scheme of weights, which assigns the same weight to all dimensions. This means that all dimensions are considered equally important in the reasoning process, and the weight of each dimension is set to 0.333. We apply the second scheme of weights to the data of the dimensions as these were given by the human experts and re-calculate the final value for each alternative website using each one of the five MCDM models examined in this paper. After the new values for each alternative have been calculated, the new ranking of the alternative websites is estimated. The values as well as the ranking of the alternatives using the five different MCDM models are presented in Table 10. The main aim of the sensitivity analysis is to check how sensitive the MCDM models are to a change of the weights of the dimensions. For this purpose, we calculated the Pearson correlation coefficient for each MCDM model. More specifically, the values generated by each MCDM model using the two different schemes of weights were compared pair-wise, and the Pearson correlation coefficient was estimated. However, the most important analysis involves checking the rankings generated by the different models. We compared the rankings of websites using the two different schemes by (a) checking how many identical rankings were among the rankings of each model using the different schemes and (b) estimating Spearman's rho correlation for each model using the two schemes of weights.
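The procedure can be sketched for a single model as follows. SAW is used here only as an example of a model, and the dimension values and the original weights are hypothetical; the equal-weight scheme is the one described above. The sketch reports the share of identical rank positions and Spearman's rho between the two rankings.

```python
def saw_scores(values, weights):
    """SAW value of each website as a weighted sum of its dimension values."""
    return [sum(w * x for w, x in zip(weights, row)) for row in values]

def ranks(scores):
    """Rank 1 goes to the highest score; assumes there are no ties."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    result = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        result[j] = position
    return result

def spearman_rho(rank_a, rank_b):
    """Spearman's rho from two tie-free rankings of the same alternatives."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def identical_share(rank_a, rank_b):
    """Fraction of websites that keep the same rank under both weighting schemes."""
    return sum(a == b for a, b in zip(rank_a, rank_b)) / len(rank_a)

# Hypothetical data: three websites, three dimensions (assumption, not Table 10).
values = [[8, 6, 7], [6, 9, 8], [10, 5, 5]]
original_weights = [0.5, 0.3, 0.2]   # hypothetical AHP weights
equal_weights = [1 / 3] * 3          # second scheme: all dimensions equally important

rank_original = ranks(saw_scores(values, original_weights))
rank_equal = ranks(saw_scores(values, equal_weights))
share = identical_share(rank_original, rank_equal)
rho = spearman_rho(rank_original, rank_equal)
print(rank_original, rank_equal, share, rho)
```

Repeating the same comparison with each of the other models in place of `saw_scores` gives one row of statistics per model.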
Table 11 presents the Pearson correlation coefficient, the percentage of identical rankings, and Spearman's correlation coefficient for each model when the results of the same model are compared using the two different weighting schemes. One can easily observe that, although VIKOR and PROMETHEE II have high values of the Pearson correlation coefficient, and therefore a high correlation of values, they have low or null percentages of identical rankings and lower values of Spearman's correlation coefficient, which means that the correlation of their rankings is very low or non-existent. As a result, VIKOR and PROMETHEE II appear to be very sensitive to changes in weights. Both SAW and WPM have mediocre values of the Pearson correlation coefficient, a mediocre percentage of identical rankings, and a quite high Spearman's correlation coefficient. This means that SAW and WPM are rather robust. Finally, TOPSIS has a mediocre sensitivity, as it has a medium percentage of identical rankings and a quite high Spearman's correlation coefficient but a lower value of the Pearson correlation coefficient. In view of the statistical analysis presented in Table 11, VIKOR appears to be the most sensitive to changes in the weights of the dimensions, while SAW and WPM are the most robust and the least affected by changes in the weights of the criteria.

Conclusions
MCDM models have been used for evaluating and comparing cultural websites in the past [2,12]. However, MCDM models have been criticized for producing different results, and no model has proved to be the best in all domains. The aim of this paper was to present a generalized framework for implementing and comparing MCDM models for the evaluation of cultural websites. The generalized framework gives the steps and the details of their implementation that are needed to apply the MCDM models and compare them. Researchers can benefit from this information, since it makes it easier for them to apply and compare MCDM models for the evaluation of cultural websites.
DEWESA was designed for cultural websites and has been applied for the evaluation of museum websites. However, the steps could be used by other researchers in the evaluation of any website. Furthermore, they could make changes by adjusting the dimensions and/or the MCDM models.
In this paper, DEWESA has been used to evaluate museum websites and, for this purpose, we applied and compared five different MCDM models. The evaluation of the museum websites is based on three main dimensions: usability, functionality, and mobile interaction. The dimensions and the criteria used in this paper, which are based on a study of Kabassi [2] and confirmed by other studies, are applied to the evaluation of museum websites and could also be used for the evaluation of other websites. For the processing of the data of the evaluation and the aggregation of the values of the dimensions, five different models are used in turn: SAW, WPM, TOPSIS, VIKOR, and PROMETHEE II.
The comparative analysis proposed by DEWESA involves the estimation of statistical terms for comparing the values and the rankings of each model using the different schemes of weights. More specifically, the Pearson correlation coefficient is used for comparing the values, and Spearman's rho correlation is used for comparing the rankings of each model using the different schemes of weights. These statistical terms proved very effective for drawing conclusions on the similarity of the results of MCDM models. In the implementation of DEWESA for museum websites, the statistical analysis of the comparison of the MCDM models revealed a high correlation of SAW and WPM, which was to be expected, as the reasoning of these two models is considered rather similar. Lower correlations were observed between TOPSIS and VIKOR or PROMETHEE II.
In order to check the robustness of the MCDM models, DEWESA implements a sensitivity analysis. For this purpose, the generalized framework proposes using a different scheme of weights, in which equal weights are used for all dimensions, and re-calculating all the values of the alternatives using the different MCDM models. In the implementation of DEWESA for the evaluation of museums' websites, conclusions were drawn from the comparison of the five MCDM models applied: SAW, WPM, TOPSIS, VIKOR, and PROMETHEE II. Indeed, the pair-wise comparison of the models using the two different weighting schemes revealed that VIKOR has the highest sensitivity to the change of the weights of criteria, while SAW and WPM are considered to be rather robust and partly maintain the ranking of the alternatives despite the change of weights.
A possible limitation of the paper is that DEWESA has not been tested on the evaluation of other cultural websites to check its effectiveness. Furthermore, its effectiveness for the evaluation of websites in other domains should be confirmed. Therefore, it is among our future plans to implement DEWESA on other cultural websites and websites of different domains to check its usefulness and efficiency. Finally, we intend to use DEWESA for the comparison of more than five MCDM models for the evaluation of museum websites.