AHP, Fuzzy SAW, and Fuzzy WPM for the Evaluation of Cultural Websites

The evaluation of cultural websites is a complicated procedure that depends on several criteria. The main contribution of this paper lies in showing how this process may be implemented. To this end, different multi-criteria decision-making methods are combined and compared for processing the results of the evaluation. More specifically, the paper provides the criteria for evaluating a cultural heritage website and presents the application of the Analytic Hierarchy Process for estimating the weights of the criteria. It then compares two multi-criteria decision-making models, fuzzy simple additive weighting and the fuzzy weighted product model, as candidates for combination with the Analytic Hierarchy Process in processing the results of the evaluation. The evaluation involves 29 websites of the conservation labs of international museums, and useful conclusions are drawn about the application of all methods and about combining the Analytic Hierarchy Process with fuzzy simple additive weighting and the fuzzy weighted product model.


Introduction
Websites play an important role in promoting culture to everyone. For this purpose, cultural websites target users with different background knowledge and characteristics. Because the users who interact with websites of cultural content vary so widely, these websites cannot address the needs of all of them, and as a result most users encounter problems during their interaction with such websites. The need for evaluating such websites is therefore great. Indeed, Jesús et al. [1], after evaluating two museum websites, pointed out that the usability, the content, and other characteristics of a website positively and significantly influence users' intentions to revisit it and to visit the museum physically. The problem is even greater for the more specialized parts of a museum, such as the electronic presence of a conservation lab.
A conservation lab's main functions are to receive art, provide a workspace for the conservation of art, maintain safety standards, and store conservation records. Different specializations have different requirements for particular equipment in their labs, including architecture, book and paper, electronic media, objects, paintings, photographic materials, scientific research, textiles, and wooden artifacts. However, even though the conservation labs in museums play an important role in the preservation of the collections and their artifacts, this role is sometimes overlooked. The museums' conservation labs and the treatments carried out on the artifacts are often not visible to the public. Nevertheless, their content may interest students, researchers, archaeologists, tourists, and artists, both for further education and as preservation guidelines.
Many researchers have highlighted the need for evaluating e-museum websites in order to confirm that they are actually accessible and usable [2,3]. For this purpose, different models for the evaluation of cultural websites have been proposed. For example, Di Blas et al. [4] developed the Milano-Lugano Evaluation Method (MiLE), a model for weighting attributes in a usability evaluation. However, they did not use a specific decision-making model, only weighting between pairs of attributes. Similarly, models such as MuseumQual [5] and the Museum's Sites Evaluation Framework (MUSEF) [6] use multiple criteria but no specific theory. Indeed, the majority of models for website evaluation do not use a theory for combining the evidence collected during the experiment. Therefore, the main scope of this paper is to present a model for the evaluation of websites of cultural content. As a testbed, we use the 29 websites of museums' conservation labs that had an electronic presence on the internet. These websites were selected because they contain both specialized and non-specialized content and, therefore, target professionals, tourists, and all other kinds of users.
Given that the evaluation of a website is a multi-dimensional problem based on several criteria, and that evaluation models for cultural websites that rely on a theory are lacking, we consider multi-criteria decision-making (MCDM) methods in a model for evaluating websites of cultural content. The Analytic Hierarchy Process (AHP) [7] supports decision making through pairwise comparisons of uncertain, qualitative, and quantitative factors. The method provides a well-designed procedure for calculating the weights of the criteria used in the reasoning process of experts [8]. However, the implementation of AHP becomes complicated as the number of alternatives increases. Therefore, in many cases [9,10], AHP is combined with other methods.
Taking the above into account, we made two different combinations of methods. The first combines AHP with fuzzy simple additive weighting (SAW) [11]; the second combines AHP with the fuzzy weighted product model (WPM) [12,13]. Both combinations were used to process the results of the evaluation. Fuzzy SAW (FSAW) and fuzzy WPM (FWPM) are simple multi-criteria decision-making methods, meaning that their implementation does not become more complicated as the number of alternatives increases. AHP was therefore used only for estimating the weights of the criteria. Then the main evaluation experiment took place, with 81 users who visited and evaluated the 29 websites using linguistic terms. The collected results were in turn processed with fuzzy SAW and fuzzy WPM in order to compare them and draw useful conclusions about their application and their combination with AHP.

Research Aim
Kabassi [14] presented a review of the evaluation experiments of museums' websites, gathered the criteria used in different expert-based or empirical evaluation experiments, and created a final set of criteria for the evaluation of websites. Among the works covered by this review, fuzzy theory is used only in the Fuzzy Quality Tree for Web Inspection (FQT4Web) methodology [15], in combination with a hierarchical tree. In contrast, we use fuzzy theory in combination with two multi-criteria decision-making theories in a model for the evaluation of cultural websites.
The research described in the present paper builds on previous work on the evaluation of websites of cultural content presented in [9,14,16]. Kabassi et al. [17] presented the evaluation of museums' conservation labs using eight criteria and the WPM model. However, that set of criteria was considered limited, and a better set might have given better results. Furthermore, those evaluation experiments did not employ fuzzy theory, and the evaluators found it hard to assign crisp values to the criteria. Kabassi [9], on the other hand, proposed a framework for the evaluation of museums' websites that uses AHP and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) [18] with the set of criteria proposed in [14].
The application of multi-criteria decision-making theories to the evaluation of museums' websites revealed that decision making can play a greater role in evaluation experiments. At the same time, important questions were raised: 1) Can multi-criteria decision making be used in evaluation experiments of websites with cultural content in general? 2) Can the set of criteria be improved? 3) Can fuzzy multi-criteria decision-making methods be used successfully? 4) Which method gives better results?
In an effort to address the above questions, in this paper we present the evaluation experiment of twenty-nine websites of conservation labs and explore the suitability of the set of criteria proposed in [14] for the evaluation of different websites of cultural content. Furthermore, we use fuzzy multi-criteria decision making in the evaluation process and check the effectiveness of such methods by combining them with AHP. Finally, we apply two different fuzzy multi-criteria decision-making methods to the same evaluation experiment and compare their application and results. These combinations of methods have never been compared before and have not been used for evaluating websites of cultural content.
The paper is organized as follows: Section 3 presents the materials and methods used in this paper. More specifically, it presents the criteria used for the evaluation of the websites, as well as the multi-criteria decision-making methods used to combine these criteria and evaluate the websites. In Section 4, we present the experiment design, which includes first the estimation of the weights using AHP and then both fuzzy SAW and fuzzy WPM for the processing of the data collected during the evaluation. The extracted results are presented in Section 5. In Section 6, we present a comparison of the two methods. The results of the evaluation and the conclusions drawn from this work are discussed and analyzed in Section 7, and the final conclusions are given in Section 8.

Materials and Methods
In this section, we present the materials and methods used to implement the proposed model for the evaluation of the websites of conservation labs in museums. For the evaluation of these websites, we used the set of criteria proposed in [14] for expert-based evaluation experiments.

Category 1: Content.
In this category, all criteria are related to the content of a website.
i. c11: Currency/clarity/text comprehension. This criterion checks the currency and the clarity of the text. Currency refers to how successful the system is in providing up-to-date information and how successfully it reflects the current state of the world that it represents. Clarity refers to how comprehensible the texts provided to the users are. For this purpose, the quality and the style are checked, as well as the way the content is organized and designed to make the website credible and trustworthy.
ii. c12: Completeness/richness. This criterion checks whether a website has adequate information on the subject.
iii. c13: Quality content. This criterion involves the accuracy and understandability of content.
iv. c14: Support of research. This criterion checks whether the website provides information that supports research.

Category 2: Usability.
In this category, all criteria are related to usability.
i. c21: Consistency. Consistency means that similar pieces of information are dealt with in similar fashions [4].
ii. c22: Accessibility. Accessibility measures how easily and intuitively any user can access the website's information.
iii. c23: Structure/navigation. The structure of the information provided plays an important role in the success of a website. Therefore, the content should be organized so that the user can navigate it easily.
iv. c24: Easy to use/simplicity. The user interface should be simple and easy to use.
v. c25: User interface/overall presentation/design. This criterion checks whether the overall presentation is attractive and engaging.
vi. c26: Efficiency. This criterion shows whether actions within the website can be performed successfully and quickly [4].

Category 3: Functionality.
In this category, all criteria are related to the functionality of the website.
i. c31: Multilingualism. The information should be given in more than one language [4].
ii. c32: Multimedia. Different media should be used to convey the information [4].
iii. c33: Interactivity. This criterion checks whether the content of the website is comprehensive and useful, nicely presented, and easy to explore and use.
iv. c34: Adaptivity. Adaptivity is the ability of the system to adapt to users' characteristics, such as needs and interests, whereas adaptability refers to the ability of users to adapt the user interface to their own preferences.
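The two-level structure above (three categories, fourteen criteria) can be captured in a small data structure. The Python sketch below assumes, as is usual in hierarchical AHP, that a criterion's global weight is its category weight multiplied by its local weight within the category; the function name `global_weights` and all numeric weights are hypothetical and are not the weights estimated in this study.

```python
from typing import Dict, List

# The three categories and fourteen criteria of the evaluation hierarchy.
HIERARCHY: Dict[str, List[str]] = {
    "content": ["c11", "c12", "c13", "c14"],
    "usability": ["c21", "c22", "c23", "c24", "c25", "c26"],
    "functionality": ["c31", "c32", "c33", "c34"],
}


def global_weights(category_w: Dict[str, float],
                   local_w: Dict[str, float]) -> Dict[str, float]:
    """Global weight of a criterion = weight of its category times its local
    weight within that category (hierarchic composition)."""
    return {
        crit: category_w[cat] * local_w[crit]
        for cat, crits in HIERARCHY.items()
        for crit in crits
    }


# Hypothetical weights, for illustration only (not the study's estimates).
cat_w = {"content": 0.3, "usability": 0.5, "functionality": 0.2}
local = {c: 1 / len(cs) for cs in HIERARCHY.values() for c in cs}
print(round(sum(global_weights(cat_w, local).values()), 6))  # 1.0
```

If the category weights and the local weights within each category each sum to one, the fourteen global weights also sum to one, so they can be used directly as a single weight vector.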

AHP
The steps of the implementation of AHP that were used for the estimation of the weights of the criteria were the following:
1. Forming the set of evaluators. For the estimation of the weights of the criteria, it is important to have the views of experts in the field. Therefore, the evaluators are exclusively human experts. Expert-based evaluation has many advantages [19], and a correct choice of experts yields reliable and valid results.
2. Setting up a pairwise comparison matrix of criteria. In this step, a comparison matrix is formed so that the criteria are pairwise compared. More specifically, when two elements P and Q are compared, a value V from the scale presented in Table 1 is assigned to the comparison of P with Q; the comparison of Q with P then takes the reciprocal value 1/V, and the comparison of any element with itself (e.g., P with P) takes the value one (see Table 1).
3. Calculating the weights of the criteria. After the pairwise comparisons have been made, they are processed to obtain the final set of weights of the criteria. Once the weights have been calculated, the websites of conservation labs in museums are evaluated, and the results of the evaluation are analyzed with fuzzy SAW or fuzzy WPM.
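In practice, step 3 is carried out with software (this study used PriEst). As an illustrative sketch only, and not PriEst's implementation, the normalized principal right eigenvector of a positive reciprocal matrix can be approximated by power iteration in plain Python; the function names and the example matrix are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def ahp_weights(a: Matrix, iters: int = 1000, tol: float = 1e-12) -> List[float]:
    """Approximate the normalized principal right eigenvector of a positive
    reciprocal pairwise comparison matrix by power iteration."""
    n = len(a)
    w = [1.0 / n] * n  # start from equal weights
    for _ in range(iters):
        v = [sum(a[i][k] * w[k] for k in range(n)) for i in range(n)]  # v = A w
        s = sum(v)
        v = [x / s for x in v]  # rescale so that the weights sum to 1
        if max(abs(x - y) for x, y in zip(v, w)) < tol:
            return v
        w = v
    return w


def principal_eigenvalue(a: Matrix, w: List[float]) -> float:
    """Estimate lambda_max as the average of (A w)_i / w_i."""
    n = len(a)
    aw = [sum(a[i][k] * w[k] for k in range(n)) for i in range(n)]
    return sum(aw[i] / w[i] for i in range(n)) / n


# Hypothetical, perfectly consistent judgments for three criteria (ratios 4:2:1).
A = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
w = ahp_weights(A)
print([round(x, 4) for x in w])               # [0.5714, 0.2857, 0.1429]
print(round(principal_eigenvalue(A, w), 4))  # 3.0
```

For a perfectly consistent matrix the principal eigenvalue equals the number of criteria; real expert judgments are rarely consistent, so the estimated value is usually somewhat larger.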

Fuzzy SAW and Fuzzy WPM
Fuzzy SAW and fuzzy WPM have many steps in common and differ mainly in the penultimate step of their implementation. We therefore give both the common steps and those that differ between the two methods. Both methods presuppose that the weights of the criteria have already been estimated and use them to evaluate the alternative websites of conservation labs in museums. The steps of fuzzy SAW and fuzzy WPM are the following:
1. Forming a new set of evaluators. In this phase of the evaluation experiment, the set of evaluators is formed following the taxonomy of types of users of cultural websites proposed by Sweetnam et al. [20]. This group may be the same as the one formed in step 1 of the application of AHP, or it may be different.

2. Assigning values to the criteria.
To make this process easier for the users, especially for those without experience in multi-criteria analysis, the users can characterize the fourteen criteria presented at the beginning of this section with linguistic terms. Therefore, evaluators use linguistic terms to give values to the criteria. The linguistic terms used in this method were proposed by Chen [21]. All criteria are assigned fuzzy values. The linguistic terms and the corresponding fuzzy values used for rating all criteria are presented in Table 2.
3. Transforming linguistic terms into fuzzy numbers. Each linguistic term is mapped to a fuzzy number $\tilde{x}_{ij} = (a_{ij}, b_{ij}, c_{ij})$, where $i$ denotes the alternative and $j$ the criterion. Each $\tilde{x}_{ij}$ is a triangular fuzzy number.
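Steps 2 and 3 amount to a lookup from linguistic terms to triangular fuzzy numbers. The Python sketch below assumes a seven-term scale on [0, 10] of the kind commonly used in fuzzy MCDM; the terms and values in `SCALE` are placeholders that should be replaced by those of Table 2, and `to_fuzzy` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

TFN = Tuple[float, float, float]  # triangular fuzzy number (a, b, c), a <= b <= c

# Placeholder seven-term scale on [0, 10]; substitute the values of Table 2.
SCALE: Dict[str, TFN] = {
    "very poor": (0.0, 0.0, 1.0),
    "poor": (0.0, 1.0, 3.0),
    "medium poor": (1.0, 3.0, 5.0),
    "fair": (3.0, 5.0, 7.0),
    "medium good": (5.0, 7.0, 9.0),
    "good": (7.0, 9.0, 10.0),
    "very good": (9.0, 10.0, 10.0),
}


def to_fuzzy(ratings: List[List[str]]) -> List[List[TFN]]:
    """Map a matrix of linguistic ratings (rows: alternatives i,
    columns: criteria j) to a matrix of triangular fuzzy numbers x_ij."""
    return [[SCALE[term.lower()] for term in row] for row in ratings]


print(to_fuzzy([["Good", "Fair"]]))  # [[(7.0, 9.0, 10.0), (3.0, 5.0, 7.0)]]
```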

4. Implementing fuzzy SAW or fuzzy WPM.
i. The normalization of fuzzy numbers (fuzzy SAW). To avoid the complicated normalization formula normally used, Chen [21] proposes a linear scale transformation that brings the various criteria scales onto a comparable scale. This normalization method preserves the property that the normalized triangular fuzzy numbers range within [0,1]. The normalization of a fuzzy number $\tilde{x}_{ij} = (a_{ij}, b_{ij}, c_{ij})$ is

$$\tilde{r}_{ij} = \left(\frac{a_{ij}}{c_j^*}, \frac{b_{ij}}{c_j^*}, \frac{c_{ij}}{c_j^*}\right), \qquad (3)$$

where $c_j^*$ is the maximum value of all $c_{ij}$, i.e., $c_j^* = \max_i c_{ij}$.
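The linear scale normalization divides every component of a fuzzy rating by the largest upper bound found for the same criterion. A minimal Python sketch, assuming the decision matrix is stored as rows of (a, b, c) tuples; `normalize` is a hypothetical helper name and the example values are made up.

```python
from typing import List, Tuple

TFN = Tuple[float, float, float]


def normalize(matrix: List[List[TFN]]) -> List[List[TFN]]:
    """Linear scale normalization: r_ij = (a_ij/c*_j, b_ij/c*_j, c_ij/c*_j),
    where c*_j is the largest upper bound c_ij in column (criterion) j."""
    n_crit = len(matrix[0])
    c_star = [max(row[j][2] for row in matrix) for j in range(n_crit)]
    return [
        [tuple(v / c_star[j] for v in row[j]) for j in range(n_crit)]
        for row in matrix
    ]


# Two alternatives, one criterion (made-up values).
print(normalize([[(2.0, 4.0, 5.0)], [(5.0, 8.0, 10.0)]]))
# [[(0.2, 0.4, 0.5)], [(0.5, 0.8, 1.0)]]
```

With non-negative ratings, every normalized component lies in [0, 1], and the alternative holding the largest upper bound for a criterion gets an upper bound of exactly 1.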

ii. Calculating the weighted normalized fuzzy numbers of the alternatives. The total fuzzy score of each alternative is derived by multiplying the normalized fuzzy rating matrix by the weight vector $W = (w_1, \dots, w_n)$ of the criteria, i.e.,

$$\tilde{f}_i = \sum_{j=1}^{n} w_j \, \tilde{r}_{ij}, \qquad i = 1, \dots, m.$$
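The weighted sum can be sketched in Python with component-wise arithmetic on triangular fuzzy numbers. The sketch assumes crisp weights (such as those produced by AHP) that multiply each component of the normalized ratings; `fuzzy_saw` is a hypothetical name and the numbers are made up.

```python
from typing import List, Tuple

TFN = Tuple[float, float, float]


def fuzzy_saw(norm: List[List[TFN]], weights: List[float]) -> List[TFN]:
    """Total fuzzy score f_i = sum_j w_j * r_ij for each alternative i,
    using crisp criterion weights and component-wise TFN arithmetic."""
    scores = []
    for row in norm:
        a = sum(w * r[0] for w, r in zip(weights, row))
        b = sum(w * r[1] for w, r in zip(weights, row))
        c = sum(w * r[2] for w, r in zip(weights, row))
        scores.append((a, b, c))
    return scores


# One alternative rated on two criteria with weights 0.25 and 0.75 (made up).
scores = fuzzy_saw([[(0.2, 0.4, 0.6), (0.5, 0.7, 0.9)]], [0.25, 0.75])
print([tuple(round(v, 3) for v in s) for s in scores])  # [(0.425, 0.625, 0.825)]
```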
In fuzzy WPM, the starting point is the classic WPM, which compares alternatives in pairs by calculating the ratio

$$R\left(\frac{A_K}{A_L}\right) = \prod_{j=1}^{n} \left(\frac{a_{Kj}}{a_{Lj}}\right)^{w_j}.$$

If this ratio is greater than or equal to one, alternative $A_K$ is preferred to $A_L$. However, when there are many alternatives, these pairwise comparisons become complicated and time-consuming. Therefore, we use an alternative application of WPM, proposed by Triantafyllou [12], in which the decision-maker uses only products, without the ratios. For each alternative $A_i$, the following value is calculated:

$$\tilde{P}(A_i) = \prod_{j=1}^{n} \left(\tilde{x}_{ij}\right)^{w_j}.$$

The total fuzzy scores of the alternatives are then defuzzified by the signed distance to determine the best alternative. Four defuzzification methods are most commonly used: the centroid method, the mean of maximal (MOM), the α-cut method, and the signed distance method [11,22–26]. Each of these methods has advantages and disadvantages [27], but Yao and Chiang [28] propose the signed distance method, which is also used by Chou et al. [11] in fuzzy SAW. For a total fuzzy score $\tilde{f}_i = (a_i, b_i, c_i)$ of alternative $A_i$, the crisp total score is calculated by the following defuzzification equation:

$$d(\tilde{f}_i) = \frac{1}{4}\left(a_i + 2b_i + c_i\right),$$

where $d(\tilde{f}_i)$ gives the defuzzified (crisp) value of the total fuzzy score of alternative $A_i$. The ranking of the websites can then be based on these crisp total scores.
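The product form of fuzzy WPM and the signed-distance defuzzification can be sketched as follows. The sketch assumes crisp AHP weights and raises each component of a positive triangular fuzzy number to the weight separately, which is a common approximation; it is illustrative, not the authors' code, and the function names are hypothetical. Because fuzzy SAW scores are also triangular fuzzy numbers, `signed_distance` and `rank` apply to them unchanged.

```python
import math
from typing import List, Tuple

TFN = Tuple[float, float, float]


def fuzzy_wpm(matrix: List[List[TFN]], weights: List[float]) -> List[TFN]:
    """P(A_i) = prod_j x_ij ** w_j, computed component-wise on (a, b, c)."""
    return [
        tuple(math.prod(x[k] ** w for w, x in zip(weights, row)) for k in range(3))
        for row in matrix
    ]


def signed_distance(t: TFN) -> float:
    """Signed-distance defuzzification of a TFN (a, b, c): (a + 2b + c) / 4."""
    a, b, c = t
    return (a + 2 * b + c) / 4


def rank(scores: List[TFN]) -> List[int]:
    """Indices of the alternatives ordered from best to worst crisp score."""
    crisp = [signed_distance(s) for s in scores]
    return sorted(range(len(crisp)), key=lambda i: crisp[i], reverse=True)


# Three alternatives with made-up total fuzzy scores.
print(rank([(0.1, 0.2, 0.3), (0.5, 0.6, 0.7), (0.3, 0.4, 0.5)]))  # [1, 2, 0]
```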

Experiment Design
For the implementation of the experiment, we searched for websites of conservation labs in museums all over the world. This search revealed that only a few museums provide adequate information about their conservation labs, either in separate sections of their websites or on a completely separate website. The search identified twenty-nine websites of museum conservation labs that belong to the following international and national museums:

AHP
Once the set of websites of conservation labs of museums had been formed, AHP was applied to calculate the weights of the criteria used in the reasoning process of the decision-makers. For the application of AHP, the following steps were implemented:
1. The set of evaluators was formed. The group of evaluators consisted of four professional conservators and four software engineers, three of whom had experience in a University Department of Conservation of Antiquities & Works of Art. Seven of the eight experts (87.5%) were Greek and the remaining one was English. Six (75%) were 35-45 years old and the others belonged to the 45-55 age group. Finally, five of the eight experts (62.5%) were male.

2. A pairwise comparison matrix of criteria was set up. Each of the eight experts was asked to fill in the pairwise comparison matrix so that the criteria would be pairwise compared. As a result, eight matrices were completed. To obtain the final pairwise comparison matrix of criteria, we calculated the geometric mean of the values of all corresponding cells of the eight matrices completed by the experts.
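The cell-by-cell aggregation of the experts' matrices can be sketched in Python as follows; the function name and the two example matrices are hypothetical. A convenient property of the geometric mean, unlike the arithmetic mean, is that the aggregated matrix remains reciprocal, so it is still a valid AHP comparison matrix.

```python
import math
from typing import List

Matrix = List[List[float]]


def aggregate_geometric(matrices: List[Matrix]) -> Matrix:
    """Cell-by-cell geometric mean of several experts' pairwise comparison
    matrices; the result stays reciprocal (g_qp = 1 / g_pq)."""
    k = len(matrices)
    n = len(matrices[0])
    return [
        [math.prod(m[p][q] for m in matrices) ** (1.0 / k) for q in range(n)]
        for p in range(n)
    ]


# Two hypothetical experts comparing two criteria.
G = aggregate_geometric([[[1, 2], [1 / 2, 1]], [[1, 8], [1 / 8, 1]]])
print([[round(x, 6) for x in row] for row in G])  # [[1.0, 4.0], [0.25, 1.0]]
```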
3. The weights of the criteria were calculated. In this step, the principal eigenvalue and the corresponding normalized right eigenvector of the comparison matrix gave the relative importance of the criteria being compared; the elements of the normalized eigenvector were the weights of the criteria. For simplicity, we used the Priority Estimation Tool (PriEst) (Siraj et al. 2015), an open-source decision-making software that implements the AHP method, to perform the AHP calculations. The weights of the criteria were estimated as follows:

The next steps of AHP would require the completion of eight 29 × 29 matrices. This was considered very complicated and hard to implement, even for experts in the field. Another difficulty that cultural experts often encounter when evaluating alternative websites is quantifying the criteria. For this reason, we used fuzzy SAW and fuzzy WPM, in turn, to evaluate the websites of museums' conservation labs. We then compared these two fuzzy MCDM models to see which one is more appropriate for combination with AHP.
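To make the scale of this burden concrete: a reciprocal n × n matrix is fully determined by its n(n - 1)/2 entries above the diagonal. Assuming, as in the standard AHP formulation, that the 29 websites would be compared once for each of the 14 criteria, a single expert would have to supply:

```latex
\frac{n(n-1)}{2} = \frac{29 \cdot 28}{2} = 406 \ \text{judgments per matrix}, \qquad
14 \times 406 = 5684 \ \text{judgments per expert}
```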

Fuzzy SAW vs. Fuzzy WPM
In this part of the evaluation, both experts and non-experts were selected to participate in the experiment. More specifically, the new group of evaluators consisted of 81 users with different knowledge and levels of expertise in conservation and software engineering. They were asked to visit the 29 websites of conservation labs of museums and answer a questionnaire with multiple-choice questions, each question corresponding to one criterion. This process resulted in one decision matrix of linguistic terms per user. The linguistic terms in the 81 matrices created by the evaluators were then replaced with the corresponding fuzzy numbers, which are presented in Table 2. To aggregate the values of the 81 matrices into one single matrix, the geometric mean was used. This process resulted in one single decision matrix

$$D = \left[\tilde{x}_{ij}\right], \qquad i = 1, \dots, 29, \quad j = 1, \dots, 14,$$

where $i$ denotes the museum (alternative) and $j$ one of the 14 criteria. Each element of the matrix is a triangular fuzzy number; in our evaluation, for example, one such element is (4.92, 6.97, 8.64). For the application of fuzzy SAW, the values of the criteria were transformed into a comparable scale using the formula

$$\tilde{r}_{ij} = \left(\frac{a_{ij}}{c_j^*}, \frac{b_{ij}}{c_j^*}, \frac{c_{ij}}{c_j^*}\right), \qquad c_j^* = \max_i c_{ij}.$$

As a result, all triangular fuzzy numbers in the decision matrix belong to [0,1]; in our evaluation, for example, one normalized element is (0.55, 0.78, 0.96). These normalized values were then used for the application of fuzzy SAW.
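For each cell (i, j) of the decision matrix, the geometric-mean aggregation of the evaluators' triangular fuzzy numbers can be sketched component-wise in Python. Taking the geometric mean of each component separately is an assumption about how the aggregation was carried out, and `aggregate_fuzzy` is a hypothetical name.

```python
import math
from typing import List, Tuple

TFN = Tuple[float, float, float]


def aggregate_fuzzy(ratings: List[TFN]) -> TFN:
    """Component-wise geometric mean of the TFNs that K evaluators gave to
    one cell (alternative i, criterion j) of the decision matrix."""
    k = len(ratings)
    return tuple(math.prod(r[m] for r in ratings) ** (1.0 / k) for m in range(3))


# Two made-up ratings of the same website on the same criterion.
print(tuple(round(v, 6) for v in aggregate_fuzzy([(1.0, 2.0, 4.0), (4.0, 8.0, 16.0)])))
# (2.0, 4.0, 8.0)
```

One design caveat: with a geometric mean, a single zero component (for example, a lower bound of 0 on the rating scale) drives that component of the aggregate to 0, regardless of the other evaluators' ratings.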
Taking into account that the criteria have different weights in the reasoning process of the decision-makers, we combined the value of each criterion with the corresponding criterion weight. In these tables, the element $f_i$ was estimated either with SAW (first table) or with WPM (second table).
… Barberini-Corsini Gallery - Roma: 84.10
28. National Museum New Delhi: 71.32
29. Galleria Nazionale d'Arte Moderna: 61.76

Comparison of Methods
In order to compare the two methods presented in this paper, we conducted an experiment using AHP. The validation process aimed at comparing the two combinations, AHP-FSAW and AHP-FWPM. AHP was selected for comparing these two combinations mainly because it is based on pairwise comparisons. The criteria used for comparing the two methods were adopted from [29,30] and adjusted to our case:
i. Completeness: This criterion shows whether the framework is complete.
ii. Accuracy: This criterion shows whether the method's processes are accurate.
For the implementation of the validation process, we used a group of nine expert evaluators (two web designers, one software engineer, three curators, and two archaeologists). The values of Table 5 resulted from the geometric mean of the corresponding values of the nine tables collected from the nine evaluators. The weights of the criteria, calculated with PriEst, were the following: w = 0.08, w = 0.08, w = 0.08, w = 0.08, w = 0.15. After evaluating the criteria, the evaluators took part in the second phase of the validation, the comparison of the two methods AHP-FSAW and AHP-FWPM. First, the implementation of the two methods was presented to the nine experts, followed by the results of each method. The evaluators were then asked to complete a 2 × 2 pairwise comparison table for each criterion. The values of the tables were processed with PriEst, which calculated a value for each method: AHP-FSAW = 0.631 and AHP-FWPM = 0.369. Based on these values, AHP-FSAW was considered better than AHP-FWPM. The results of the two methods may not be very different, but the human experts judged the results produced by AHP-FSAW to be three times better than those of AHP-FWPM.
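With only two alternatives, each per-criterion table is a 2 × 2 reciprocal matrix [[1, a], [1/a, 1]], whose normalized principal eigenvector is (a/(1 + a), 1/(1 + a)); for example, a judgment of a = 3 gives priorities 0.75 and 0.25. The Python sketch below shows this and a weighted synthesis over the criteria. It is an assumption about what PriEst computes in this simple case, and the function names, criterion names, weights, and judgments are hypothetical.

```python
from typing import Dict, Tuple


def priorities_2x2(a: float) -> Tuple[float, float]:
    """Priorities of two methods from the reciprocal matrix [[1, a], [1/a, 1]];
    its principal eigenvector (a, 1), normalized to sum 1, is (a, 1) / (a + 1)."""
    return a / (1 + a), 1 / (1 + a)


def synthesize(criterion_w: Dict[str, float],
               judgments: Dict[str, float]) -> Tuple[float, float]:
    """Overall score of the two methods: sum over criteria of the (renormalized)
    criterion weight times the local priority from that criterion's table."""
    total = sum(criterion_w.values())
    s1 = s2 = 0.0
    for crit, w in criterion_w.items():
        p1, p2 = priorities_2x2(judgments[crit])
        s1 += (w / total) * p1
        s2 += (w / total) * p2
    return s1, s2


# Hypothetical: method 1 judged 3 times better on accuracy, equal on completeness.
print(synthesize({"accuracy": 0.5, "completeness": 0.5},
                 {"accuracy": 3, "completeness": 1}))  # (0.625, 0.375)
```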
The two methods did not differ in completeness and efficiency, but AHP-FSAW was considered slightly better in accuracy and in satisfaction with the technique.

Discussion
AHP can model expert opinion and is therefore considered ideal for calculating the weights of the criteria. However, AHP is a time-consuming technique because of the mathematical calculations and the number of pairwise comparisons involved. Its complexity increases as the number of alternatives and criteria increases or changes [31]. Since complexity rises with the number of websites, the number of alternatives that can be compared is limited. This is one of the main reasons for combining AHP with another method.
For this reason, AHP was combined in turn with fuzzy SAW and fuzzy WPM. The complexity of these methods does not grow at the rate of AHP's when the number of alternative websites increases. Both fuzzy SAW and fuzzy WPM use linguistic terms, which are easier for users to comprehend, so the methods are considered easier to implement. Furthermore, in an evaluation involving several evaluators who lack experience with multi-criteria methods, fuzzy SAW or fuzzy WPM seems more appropriate. However, fuzzy SAW and fuzzy WPM do not provide a specified way of calculating the weights of the criteria, as AHP does. Taking all this into account, AHP can successfully be combined with fuzzy SAW or fuzzy WPM.
The results of the first part of the evaluation revealed that the most important first-level criterion is usability, followed by content. Among the sub-criteria of content, quality of content was considered the most important. Regarding usability, the sub-criteria structure/navigation and easy to use/simplicity are considered almost equally important. As far as functionality is concerned, the presence of multimedia is considered the most important criterion.
Regarding the results of the websites' evaluation, fuzzy SAW and fuzzy WPM gave similar but not identical results. Both methods ranked the website of the conservation lab of the National Gallery of Greece first. This website provided rich content about the activities of the department, the different departments, the equipment, and the staff, and its content is enriched with multimedia. The user interface is well designed, and the website is generally well structured and usable. The website of the Benaki Museum in Athens was ranked second by both methods. One may be concerned that two Greek websites were ranked first and second. Although language may have influenced the evaluators, other Greek sites were ranked among the last five.
The methods also produced the same ranking for the last nine websites. Two of the lowest-ranked websites of museums' conservation labs are those of the National Museum of New Delhi and the Galleria Nazionale d'Arte Moderna. Their content was poor, and they did not provide information about the staff, the facilities, or the equipment of the conservation lab. Furthermore, the websites had only a few photos and no other multimedia that could improve the interaction with the user. Finally, neither website appeared to have been updated by the time of the evaluation.
The other websites received different ratings from fuzzy SAW and fuzzy WPM, but their positions in the final ranking did not vary considerably. For example, the websites of the conservation labs of the Hermitage Museum and the Byzantine & Christian Museum in Athens were ranked fourth and fifth according to fuzzy SAW, and fifth and sixth according to fuzzy WPM. Similarly, the Archaeological Museum of Thessaloniki was ranked eighth using fuzzy SAW and ninth using fuzzy WPM.
However, the comparison of the two methods revealed that the combination of AHP with fuzzy SAW provided the best final ordered list of the evaluated websites, and the human experts preferred this combination with respect to its accuracy and the implementation of the technique.

Conclusions
The evaluation of websites of cultural content that target a variety of users [20,32,33] is a very complicated procedure that requires the examination of several different criteria. The main contribution of this paper lies in showing how this process may be implemented and in comparing different methods for processing the results of the evaluation. More specifically, the paper provides the criteria for evaluating a cultural heritage website and presents the application of AHP for estimating the weights of the criteria. It then compares two multi-criteria decision-making models, fuzzy SAW and fuzzy WPM, as candidates for combination with AHP in processing the results of the evaluation.
AHP proved very effective in estimating the weights of the criteria, as it provides a well-defined procedure. AHP was then combined with fuzzy SAW, giving a view of the electronic presence of the websites of the museums' conservation labs. To compare fuzzy SAW with fuzzy WPM, we also combined AHP with fuzzy WPM. This combination provided a different view of the electronic presence, which was compared with the one produced by fuzzy SAW. Both methods can successfully process the results of the main evaluation experiment and be combined with AHP. Additionally, these methods use linguistic terms, which make the evaluation process easier for both expert and non-expert users.
Researchers in the field of cultural heritage evaluation (e.g., [19]) believe that the reliability of the results of an evaluation experiment depends mainly on the expertise of the evaluators, and they propose the use of a double expert system (with both usability and domain experts) to increase the reliability of the results. This suggestion was taken into account when forming the group of human experts presented in this paper, in order to ensure the reliability of the evaluation's results.
Taking into account the results of the evaluation, both fuzzy SAW and fuzzy WPM provided a similar view of the electronic image of the conservation labs of museums. We also compared the two methods using a group of human experts different from that of the initial evaluation experiment, although some of the experts participated in both groups. The comparison of the two methods favored the AHP-FSAW combination with regard to the results, the accuracy, and the implementation of the method.
This combination could also be used successfully for a different group of websites of cultural content, and even for evaluating websites of a completely different domain, since the set of criteria is general rather than focused on cultural content. The group of evaluators would have to change to involve experts in a field related to the domain of the evaluated websites, but the rest of the steps would be implemented similarly.
To confirm the appropriateness of these methods for evaluation experiments of websites from other domains, we plan to apply them in domains such as e-government and e-health. Furthermore, we aim to compare the results of these two models with those of other multi-criteria decision-making models in order to find out which model provides the best results.