Evaluating Museum Virtual Tours: The Case Study of Italy

Virtual tours in museums are an ideal solution for those who are not able to visit a museum or who want a small taste of what is presented in the museum before their visit. However, users often encounter severe problems when interacting with these tours. To check the status of virtual tours of museums, we present the implementation of an evaluation experiment that uses a combination of two multi-criteria decision-making theories, namely the analytic hierarchy process (AHP) and the fuzzy technique for order of preference by similarity to ideal solution (TOPSIS). AHP has been used to estimate the weights of the heuristics, and fuzzy TOPSIS has been used to evaluate the virtual tours of museums. This paper presents the exact steps that have to be followed to implement such an experiment and runs an example experiment for virtual tours of Italian museums.


Introduction
Virtual reality (VR) technology offers an ideal presentation medium for museums and other cultural heritage institutions [1]. These appealing technological systems have demonstrated their usefulness and value in science centers and traditional museums all over the world, since visitors can view digitized artworks and explore reconstructed historical places by means of virtual museum-hosted installations [2]. However, implementing a VR installation is complicated and expensive, and therefore, in most cases, the final environment may suffer from several usability problems.
Virtual museums' lack of usability is commonly attributed to incomprehensible or overly complicated navigation techniques [3]. When navigating virtual environments, users can face disorientation, loss of overview and difficulty relocating previously visited locations [4]. In this context, navigation assistance is considered an important factor in improving usability, and numerous approaches have been proposed to provide virtual museums in which virtual visitors have realistic navigation abilities and observation experiences of the presented exhibits.
Another factor that acts against the development of VR environments is the public's lack of experience in interacting with such environments. This problem is more evident when the users are elderly or novice computer users. In addition, the virtual museum and its functionality should be accessible to a wide range of users, including people with diverse skills.

Materials and Methods
In this section, we present the materials and methods used for implementing the proposed model for the usability evaluation of virtual tours in museums. For this purpose, we use the heuristics proposed by Sutcliffe and Gault [11], which are specialized for the usability evaluation of VR environments:

1. Natural engagement: how close the interaction is to the real world (Ne).
2. Compatibility with the user's task and the domain: how close the behavior of objects is to the real world and affordance for task action (C).
3. Natural expression of action: does the system allow the user to act naturally? (Na).
4. Close coordination of action and representation: quality of the response between user movement and the virtual environment (Cr).
5. Realistic feedback: visibility of the effect of users' actions and conformity to the laws of physics (Rf).
6. Faithful viewpoints: the naturalness of change between viewpoints (Fv).
7. Navigation and orientation support: naturalness in orientation and navigation; is it clear to users where they are and how to return? (No).
8. Clear entry and exit points: clarity of entry and exit points (Ce).
9. Consistent departures: consistency of departure actions (Cd).
10. Support for learning: promotion of learning (L).
11. Clear turn-taking: clarity about who has the initiative (Tt).
12. Sense of presence: the naturalness of the user's perception of engagement in the system and of being in a 'real' world (P).
These heuristics are used as the criteria for the evaluation. Multi-criteria decision-making theories are ideal for combining these criteria and evaluating different alternatives, such as different virtual tours. For this purpose, AHP is used to calculate the weights of the criteria, and fuzzy TOPSIS is then used to evaluate the alternative virtual tours.

AHP
AHP is used to estimate the weights of the criteria, i.e., the twelve heuristics presented above, in the following steps:

1. Forming the set of evaluators: for the estimation of the weights of the heuristics, it is important to have the views of experts in the field. Therefore, the set of evaluators is composed only of human experts. Expert-based evaluation has many advantages [5], and a correct choice of experts gives more reliable and valid results.

2. Setting up a pairwise comparison matrix of heuristics: in this step, a comparison matrix is formed so that the heuristics are pairwise compared.

3. Calculating the weights of the criteria: the pairwise comparisons are synthesized into the final set of criterion weights.
Once the weights have been calculated, the virtual tours are evaluated and the results of the evaluation are analyzed with fuzzy TOPSIS.
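As an illustration of step 2, the minimal Python sketch below (not the authors' code) fills one evaluator's reciprocal comparison matrix from judgments on the upper triangle only: the diagonal is 1 and each lower-triangle cell is the reciprocal of its mirror cell, which is the usual AHP convention. The function `reciprocal_matrix` and the three example judgments are hypothetical. With the twelve heuristics, an evaluator would supply 12 × 11 / 2 = 66 judgments.

```python
def reciprocal_matrix(n, upper):
    """Build an n x n AHP comparison matrix from upper-triangle judgments.

    `upper` maps (i, j) with i < j to the judged importance of criterion i
    over criterion j; the lower triangle is filled with reciprocals.
    """
    expected = n * (n - 1) // 2
    if len(upper) != expected:
        raise ValueError(f"expected {expected} judgments, got {len(upper)}")
    a = [[1.0] * n for _ in range(n)]  # diagonal: each criterion equals itself
    for (i, j), value in upper.items():
        if not 0 <= i < j < n or value <= 0:
            raise ValueError(f"invalid judgment {(i, j)}: {value}")
        a[i][j] = float(value)
        a[j][i] = 1.0 / value  # reciprocity: a_ji = 1 / a_ij
    return a


# Hypothetical judgments for three criteria (e.g. Cr, No and L).
judgments = {(0, 1): 2, (0, 2): 3, (1, 2): 1.5}
matrix = reciprocal_matrix(3, judgments)
for row in matrix:
    print([round(x, 3) for x in row])
```

Only the upper triangle is elicited because the lower triangle carries no extra information once reciprocity is assumed.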

Fuzzy TOPSIS
Fuzzy TOPSIS presupposes that the weights of the criteria have been estimated and it uses them to evaluate alternative virtual tours in museums. The main steps of fuzzy TOPSIS for the evaluation of alternative virtual tours are the following:

1. Forming a new set of evaluators: in this phase of the evaluation experiment, the set of evaluators is formed following the taxonomy of types of users of cultural websites proposed by Sweetnam et al. [23]. This group may be the same as the one formed in step 1 of the application of AHP or may be different.

2. Assigning values to the criteria: to make this process easier, especially for evaluators without experience in multi-criteria analysis, each of the twelve heuristics presented above is characterized with a linguistic term. The linguistic terms are presented in Table 1.

3. Linguistic terms are transformed into fuzzy numbers: each linguistic term is mapped to a triangular fuzzy number, which is a vector $a = (a_1, a_2, a_3)$. The correspondence is given in Table 1 [16].

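4. Construction of the Multi-Criteria Decision Making (MCDM) matrix: a fuzzy multi-criteria group decision-making problem can be expressed in matrix format, where each element of the matrix is a fuzzy number. To aggregate the values given by all the decision-makers into one single value, the geometric mean is used. The geometric mean of two fuzzy numbers $a = (a_1, a_2, a_3)$ and $b = (b_1, b_2, b_3)$ is calculated as follows:

$$ \sqrt{a \otimes b} = \left( \sqrt{a_1 b_1},\ \sqrt{a_2 b_2},\ \sqrt{a_3 b_3} \right) $$

Applying it component-wise over all decision-makers yields the aggregated decision matrix

$$ D = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} $$

where $i$ denotes the alternative and $j$ the criterion. Each $x_{ij} = (a_{ij}, b_{ij}, c_{ij})$ is a triangular fuzzy number.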

5. Normalization of the fuzzy numbers: to avoid the complicated normalization formula used in classical TOPSIS, the authors in [16] propose a linear scale transformation that brings the various criteria scales onto a comparable scale. This normalization method preserves the property that the ranges of normalized triangular fuzzy numbers belong to [0,1]. For benefit criteria, such as the heuristics used here, the normalization of a fuzzy number $x_{ij} = (a_{ij}, b_{ij}, c_{ij})$ is given by the formula:

$$ r_{ij} = \left( \frac{a_{ij}}{c_j^{*}},\ \frac{b_{ij}}{c_j^{*}},\ \frac{c_{ij}}{c_j^{*}} \right), \qquad c_j^{*} = \max_i c_{ij} $$

6. Calculating the weighted normalized fuzzy numbers of the MCDM matrix: considering the different importance of each criterion, which is expressed by its weight $w_j$, the weighted normalized fuzzy numbers are calculated as

$$ u_{ij} = r_{ij} \cdot w_j $$

and these values are used to construct the weighted normalized fuzzy MCDM matrix $V = [u_{ij}]_{m \times n}$, $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$.

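7. Determination of the fuzzy positive-ideal solution (FPIS) and the fuzzy negative-ideal solution (FNIS): the FPIS and the FNIS are defined as

$$ A^{*} = (u_1^{*}, u_2^{*}, \ldots, u_n^{*}), \qquad A^{-} = (u_1^{-}, u_2^{-}, \ldots, u_n^{-}) $$

where $u_j^{*}$ and $u_j^{-}$ are the ideal and negative-ideal fuzzy values of criterion $j$.

8. Calculation of the distance of each alternative from the FPIS and the FNIS:

$$ d_i^{*} = \sum_{j=1}^{n} d_v(u_{ij}, u_j^{*}), \qquad d_i^{-} = \sum_{j=1}^{n} d_v(u_{ij}, u_j^{-}) $$

where $d_v(a, b)$ is the distance between two fuzzy numbers $a$ and $b$. The distance between two fuzzy numbers $a = (a_1, a_2, a_3)$ and $b = (b_1, b_2, b_3)$ is calculated with the vertex method:

$$ d_v(a, b) = \sqrt{\tfrac{1}{3}\left[ (a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2 \right]} $$
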
9. Calculation of the closeness coefficient of each alternative: the closeness coefficient of each alternative $i$ is given by the formula

$$ CC_i = \frac{d_i^{-}}{d_i^{*} + d_i^{-}}, \qquad i = 1, 2, \ldots, m $$

The ranking order of all the alternatives is determined by the values of the closeness coefficient: an alternative is closer to the FPIS and further from the FNIS as $CC_i$ approaches 1, so the alternative with the highest $CC_i$ is ranked first. The values of the closeness coefficient of each alternative and the final ranking of the evaluated websites are presented in Table 2.

Table 1. Linguistic variables and fuzzy numbers.
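Steps 4 to 6 of the procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all criteria are treated as benefit criteria, the crisp AHP weights are applied component-wise, and the five-term scale `SCALE` is hypothetical (it is not the content of Table 1). The function names are also hypothetical.

```python
import math
from typing import List, Tuple

Fuzzy = Tuple[float, float, float]  # triangular fuzzy number (a1, a2, a3)
Matrix = List[List[Fuzzy]]          # rows: alternatives, columns: criteria

# Hypothetical five-term scale for illustration only; NOT the values of Table 1.
SCALE = {"VL": (0, 0, 1), "L": (0, 1, 3), "M": (3, 5, 7), "H": (7, 9, 10), "VH": (9, 10, 10)}


def fuzzy_geometric_mean(numbers: List[Fuzzy]) -> Fuzzy:
    """Step 4: component-wise geometric mean of K triangular fuzzy numbers."""
    k = len(numbers)
    return tuple(math.prod(x[c] for x in numbers) ** (1.0 / k) for c in range(3))


def aggregate(ratings: List[Matrix]) -> Matrix:
    """Step 4: merge the evaluators' fuzzy matrices into one decision matrix D."""
    m, n = len(ratings[0]), len(ratings[0][0])
    return [[fuzzy_geometric_mean([r[i][j] for r in ratings]) for j in range(n)]
            for i in range(m)]


def normalize(d: Matrix) -> Matrix:
    """Step 5: linear scale transformation for benefit criteria, r_ij = x_ij / c_j*."""
    m, n = len(d), len(d[0])
    c_star = [max(d[i][j][2] for i in range(m)) for j in range(n)]
    if min(c_star) <= 0:
        raise ValueError("each criterion needs at least one positive upper value")
    return [[tuple(v / c_star[j] for v in d[i][j]) for j in range(n)]
            for i in range(m)]


def weight(r: Matrix, w: List[float]) -> Matrix:
    """Step 6: weighted normalized matrix V, u_ij = r_ij * w_j (crisp AHP weight)."""
    return [[tuple(v * w[j] for v in row[j]) for j in range(len(w))] for row in r]


# Usage: two evaluators rate two tours on two criteria with linguistic terms.
evaluator_terms = [
    [["L", "H"], ["M", "VH"]],
    [["L", "M"], ["H", "VH"]],
]
ratings = [[[SCALE[t] for t in row] for row in terms] for terms in evaluator_terms]
V = weight(normalize(aggregate(ratings)), [0.4, 0.6])
for i, row in enumerate(V, start=1):
    print(f"tour {i}:", [tuple(round(v, 4) for v in u) for u in row])
```

Dividing by the largest upper value $c_j^{*}$ of each column keeps every normalized number in [0,1], so after weighting each entry of criterion $j$ lies in $[0, w_j]$.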

Implementation of the Experiment
For the implementation of the experiment, we searched the websites of museums in Italy to locate those that offer a virtual tour. This search revealed that only a few museums in Italy offer virtual tours on their websites; the 16 virtual tours identified (all websites were retrieved in September 2019) formed the set of alternatives. Once the set of alternative museum virtual tours has been formed, AHP is applied to calculate the weights of the criteria used in the reasoning process of the decision-makers. For the application of AHP, we first form the set of evaluators. The group of evaluators comprised one professional researcher in VR, one researcher in pattern recognition, one researcher in software engineering and one researcher in cultural heritage conservation.
The next step involved setting up a pairwise comparison matrix of heuristics. Each of the four experts was asked to fill out the pairwise comparison matrix so that the heuristics were pairwise compared, which resulted in four completed matrices. To obtain the final pairwise comparison matrix of heuristics, we calculated the geometric mean of the values of all corresponding cells of the four matrices completed by the experts.
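The cell-by-cell geometric mean described above can be sketched as follows. This is a minimal Python illustration with a hypothetical function name and hypothetical 2 × 2 matrices, not the authors' code. A useful property of this aggregation, shown in the example, is that it preserves reciprocity: if every expert's matrix satisfies $a_{ji} = 1/a_{ij}$, so does the combined matrix.

```python
import math


def aggregate_geometric_mean(matrices):
    """Combine several experts' pairwise comparison matrices cell by cell.

    Each aggregated cell is the geometric mean of the corresponding cells,
    which keeps the result reciprocal (a_ji = 1 / a_ij).
    """
    k = len(matrices)
    n = len(matrices[0])
    return [
        [math.prod(m[i][j] for m in matrices) ** (1.0 / k) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical matrices from four experts comparing two criteria.
experts = [
    [[1, 2], [1 / 2, 1]],
    [[1, 4], [1 / 4, 1]],
    [[1, 1], [1, 1]],
    [[1, 2], [1 / 2, 1]],
]
combined = aggregate_geometric_mean(experts)
print(combined)
```

An arithmetic mean would break reciprocity here: the upper cell would average to 2.25 while the lower cell would average to about 0.56, not 1/2.25.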
Finally, the weights of the criteria were calculated. In this step, the normalized right eigenvector corresponding to the principal eigenvalue of the comparison matrix gives the relative importance of the various criteria being compared; the elements of the normalized eigenvector are the weights of the heuristics. There are several methods for calculating the eigenvector. Multiplying together the entries in each row of the matrix and then taking the n-th root of that product approximates the correct answer; the n-th roots are then summed, and that sum is used to normalize the eigenvector elements so that they add up to 1.00. For simplicity, we used the 'Priority Estimation Tool' (PriEst) [24], an open-source decision-making software that implements the AHP method, to perform the AHP calculations. The weights of the criteria were estimated as follows: $w_{Ne} = 0.049$, $w_{C} = 0.038$, $w_{Na} = 0.042$, $w_{Cr} = 0.186$, $w_{Rf} = 0.055$, $w_{Fv} = 0.084$, $w_{No} = 0.163$, $w_{Ce} = 0.047$, $w_{Cd} = 0.015$, $w_{L} = 0.139$, $w_{Tt} = 0.054$, $w_{P} = 0.129$.
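The row geometric mean approximation described above can be sketched in a few lines of Python. This is a minimal illustration, not PriEst's implementation; the function name and the example matrix are hypothetical. The example uses a perfectly consistent matrix ($a_{ij} = w_i / w_j$), for which the method recovers the underlying weights exactly; for inconsistent expert matrices it only approximates the principal eigenvector.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric mean method.

    For each row, multiply its entries and take the n-th root; then divide
    each root by the sum of all roots so that the weights add up to 1.
    """
    n = len(matrix)
    roots = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(roots)
    return [r / total for r in roots]


# Hypothetical consistent matrix built from known weights 0.5, 0.3, 0.2.
true_w = [0.5, 0.3, 0.2]
matrix = [[wi / wj for wj in true_w] for wi in true_w]
weights = ahp_weights(matrix)
print([round(w, 3) for w in weights])
```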
Then, fuzzy TOPSIS was used to implement the next phase of the experiment, the evaluation of the museum virtual tours. At the beginning of the application of fuzzy TOPSIS, a set of evaluators had to be formed; we used the same four evaluators who took part in the application of AHP. Each evaluator visited each of the virtual tours and, for each museum, assigned a linguistic term to each criterion. The linguistic terms are presented in Table 1 [16]. This resulted in four matrices of linguistic terms.

Results
The linguistic terms in the four matrices created by the evaluators were then replaced with fuzzy numbers, using the correspondence presented in Table 1. To aggregate the values of the four matrices created by the decision-makers into one single matrix, the geometric mean was used. This process resulted in a single decision matrix $D = [x_{ij}]_{16 \times 12}$, where $i$ denotes the alternative museum and $j$ one of the 12 criteria. Each element of the matrix is a triangular fuzzy number; in our evaluation, for example, $x_{12} = (0, 1, 3)$. To transform the various criteria scales into a comparable scale, all values of the decision matrix were normalized with the linear scale transformation proposed by Chen [16], $r_{ij} = (a_{ij}/c_j^{*},\ b_{ij}/c_j^{*},\ c_{ij}/c_j^{*})$ with $c_j^{*} = \max_i c_{ij}$. This normalization method preserves the property that all triangular fuzzy numbers in the normalized matrix belong to [0,1]. In our evaluation, for example, $r_{12} = (0.00, 0.12, 0.36)$. Taking into account that the criteria have different weights in the reasoning process of the decision-makers, we multiplied the values of each criterion by the corresponding weight of the criterion. As a result, the weighted normalized fuzzy MCDM matrix $V = [u_{ij}]_{16 \times 12}$, $i = 1, 2, \ldots, 16$; $j = 1, 2, \ldots, 12$, was constructed. In this matrix, the element $u_{12}$ was estimated to be (0.0000, 0.0045, 0.0136).

TOPSIS is based on estimating the Euclidean distances of each alternative from an ideal alternative and from a negative-ideal alternative. For simplicity, the FPIS is taken as $A^{*} = ((1, 1, 1), (1, 1, 1), \ldots, (1, 1, 1))$ and the FNIS as $A^{-} = ((0, 0, 0), (0, 0, 0), \ldots, (0, 0, 0))$. Using these values, we calculated the distances ($d_i^{*}$ and $d_i^{-}$) of each weighted alternative museum $i = 1, 2, \ldots, 16$ from the FPIS and the FNIS.
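The worked values above can be checked with a few lines of Python. The sketch assumes the vertex-method distance for triangular fuzzy numbers, which is the distance used in Chen's fuzzy TOPSIS; the variable names are hypothetical. Multiplying the reported, rounded $r_{12}$ by $w_C = 0.038$ gives (0, 0.00456, 0.01368), which agrees with the reported $u_{12}$ up to rounding (presumably $r_{12}$ was computed with more decimals than shown).

```python
import math


def vertex_distance(a, b):
    """Vertex-method distance between triangular fuzzy numbers a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)


w_C = 0.038                # AHP weight of criterion C (j = 2)
r_12 = (0.00, 0.12, 0.36)  # reported normalized value, rounded to 2 decimals
u_12 = tuple(v * w_C for v in r_12)
print("u_12:", [round(v, 5) for v in u_12])

# Contribution of this single cell to d*_1 and d-_1,
# with the FPIS (1, 1, 1) and FNIS (0, 0, 0) used in this evaluation.
print("to FPIS:", round(vertex_distance(u_12, (1, 1, 1)), 4))
print("to FNIS:", round(vertex_distance(u_12, (0, 0, 0)), 4))
```

Because every weighted value is small (each $w_j$ is below 0.19), each cell lies far from (1, 1, 1), so all $d_i^{*}$ are large and the ranking is driven mainly by the differences among the $d_i^{-}$.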
In Table 2, one can see in the second column the distance of each museum from the fictitious ideal museum and in the third column the distance of each alternative from the negative-ideal alternative. These values were further used to calculate the closeness coefficient of each alternative museum, which is presented in the fourth column of Table 2. The museum with the highest value of the closeness coefficient was considered to be the best. The final ranking of the evaluated websites is presented in the fifth column of Table 2.
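The computation behind the columns of Table 2 (distances, closeness coefficient and rank) can be sketched as follows, using the simplified FPIS (1, 1, 1) and FNIS (0, 0, 0) adopted in this evaluation. The weighted matrix `V` and the tour names are hypothetical, so the printed values say nothing about Table 2; the sketch only shows how $d_i^{*}$, $d_i^{-}$ and $CC_i$ combine into a ranking.

```python
import math
from typing import List, Sequence, Tuple

Fuzzy = Tuple[float, float, float]  # triangular fuzzy number (a1, a2, a3)


def vertex_distance(a: Fuzzy, b: Fuzzy) -> float:
    """Vertex-method distance between two triangular fuzzy numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)


def closeness(v_row: Sequence[Fuzzy]) -> Tuple[float, float, float]:
    """Return (d*, d-, CC) for one alternative, with FPIS (1,1,1) and FNIS (0,0,0)."""
    d_pos = sum(vertex_distance(u, (1.0, 1.0, 1.0)) for u in v_row)
    d_neg = sum(vertex_distance(u, (0.0, 0.0, 0.0)) for u in v_row)
    return d_pos, d_neg, d_neg / (d_pos + d_neg)


def rank(V: List[Sequence[Fuzzy]], names: List[str]) -> List[Tuple[str, float]]:
    """Sort alternatives by decreasing closeness coefficient."""
    scored = [(name, closeness(row)[2]) for name, row in zip(names, V)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical weighted normalized matrix: 3 tours x 2 criteria.
V = [
    [(0.10, 0.20, 0.30), (0.20, 0.30, 0.40)],
    [(0.00, 0.05, 0.10), (0.05, 0.10, 0.20)],
    [(0.30, 0.40, 0.50), (0.20, 0.30, 0.40)],
]
names = ["Tour A", "Tour B", "Tour C"]
for name, cc in rank(V, names):
    print(f"{name}: CC = {cc:.4f}")
```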
The virtual tour of the museum Carezzonico was considered the best, as it had the highest closeness coefficient of all. The virtual tour of Cavallogiocattolo was ranked second, while that of the Ebraico museum was ranked third. The worst virtual tours were those of the Maotorino museum and the museum of Zoologia.

Analysis of the Results and Discussion
Searching for virtual tours of museums in Italy revealed that only a few museums that have a website also offer a virtual tour. In addition, finding the link to the virtual tour on a museum's webpage is sometimes difficult because of poor page design.
The reviewed virtual tours are implemented with World Wide Web technologies, so users can access them with standard computing equipment and software. The virtual tours are based on captured 360-degree views rather than computer-generated three-dimensional (3D) environments, which provides a realistic view of the museum. A major drawback of virtual tours based on 360-degree views is the limited degree of freedom in navigation and the resulting poor navigation experience. Additionally, most of the virtual tours do not provide interaction with the represented space or the hosted exhibits, and the information communication methods, when present, are mostly based on text descriptions of the exhibits.
For the evaluation of the museums' virtual tours and their classification, we used a combination of AHP with fuzzy TOPSIS. AHP was used to estimate the criteria weights, and fuzzy TOPSIS was then used to evaluate the different virtual tours of museums in Italy. Because fuzzy TOPSIS was used, the evaluators could use linguistic terms to evaluate the different alternatives. This is one of the advantages of fuzzy TOPSIS, as it makes it easier for all evaluators to characterize the criteria. The linguistic variables are then transformed into triangular fuzzy numbers, on which all subsequent estimations are performed.
Generally, the evaluators used the value 'Low (L)' much more than any other value: 'Low (L)' was used in 32% of all ratings, while 'Very High (VH)' was used very rarely (2% of all ratings). The values 'Medium (M)' and 'High (H)' were used almost equally (28% and 25%, respectively). The 'Natural engagement-Ne' and the 'Compatibility with the user's task and domain-C' of the virtual tours were generally characterized as 'Low' (80% of all ratings assigned to all evaluated virtual tours). The criterion 'Realistic feedback-Rf' was rated similarly poorly, as the evaluators characterized it as 'Very Low' or 'Low' in all virtual tours, and 'Natural expression of action-Na' was characterized as 'Low' in 89% of cases. The criteria that received the most 'Medium' values were 'Faithful viewpoints-Fv' and 'Clear entry and exit points-Ce'. The only criterion that proved good in almost all virtual tours, receiving many 'High' values (85%) and some 'Very High' values, was 'Clear turn-taking-Tt'. However, these values alone cannot reveal how good a virtual tour is, as not all criteria have the same weight in the reasoning process of the decision-makers. For this purpose, a multi-criteria decision-making theory such as TOPSIS is very appropriate.
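The percentages reported above are frequency counts over the ratings (every evaluator, virtual tour and criterion), either over all criteria or over a single criterion. A minimal Python sketch of such a tally follows; the function name and the ratings are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def term_shares(matrices, criterion=None):
    """Percentage of ratings that use each linguistic term.

    `matrices` holds one (tours x criteria) matrix of terms per evaluator;
    if `criterion` is given, only that column (criterion index) is counted.
    """
    counts = Counter(
        term
        for matrix in matrices
        for row in matrix
        for j, term in enumerate(row)
        if criterion is None or j == criterion
    )
    total = sum(counts.values())
    return {term: 100.0 * count / total for term, count in counts.items()}


# Hypothetical ratings: two evaluators, two tours, three criteria.
ratings = [
    [["L", "L", "M"], ["H", "L", "VL"]],
    [["L", "M", "H"], ["M", "L", "VH"]],
]
print(term_shares(ratings))               # shares over all ratings
print(term_shares(ratings, criterion=0))  # shares for the first criterion only
```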
Generally, the interaction with the virtual environments was considered rather satisfactory. The main problem identified in the virtual tours of the museums in Italy was that very few of them provide learning support, which could be very useful for understanding the context. Also, most of the virtual tours showed a medium-to-low naturalness of change between viewpoints. Further problems were related to (i) the navigation functionality, (ii) the interactivity of the represented space and exhibits, and (iii) the information communication methods, among others.
Because of these issues, the reviewed virtual tours received low ratings for most of the evaluation criteria. The fact that the virtual tours are based on 360-degree views with limited degrees of freedom in navigation is considered the reason why the evaluators rated 'Natural engagement-Ne', 'Natural expression of action-Na' and 'Realistic feedback-Rf' as 'Low' or 'Very Low'. Similarly, the lack of interaction functionality is considered the reason why 'Compatibility with the user's task and domain-C' was rated 'Low'.

Conclusions
Internet technologies have tremendous potential for offering virtual visitors ubiquitous access to a virtual museum environment via the World Wide Web [25]. However, these technologies often encounter several problems. Taking into account the usability problems that are often encountered in VR environments [9,11], we address the research question of how to combine heuristics for the evaluation of virtual tours of museums and present an evaluation experiment using a combination of two multi-criteria decision-making theories.
For the evaluation of the museums' virtual tours and their classification, we used multi-criteria decision-making theories. Indeed, the evaluation of a virtual tour is a multi-dimensional problem that requires the evaluation of several criteria. Therefore, we have used AHP in combination with fuzzy TOPSIS for the implementation of the evaluation experiment. These two theories seem rather complementary and have been used in several domains but never before for the evaluation of virtual tours.
AHP has the ability to model expert opinion and, therefore, was considered ideal for this evaluation experiment since the set of the decision-makers consisted only of experts. Therefore, AHP was used for the calculation of the weights of the criteria. The application of the theory assigned the highest weights to the three following criteria: 'Close coordination of action and representation-Cr', 'Navigation and orientation support-No' and 'Support for learning-L', while criteria such as 'Consistent departures-Cd', 'Compatibility with the user's task and domain-C' and 'Natural expression of action-Na' were found by the application of the AHP to be the least important criteria.
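As a quick check of the statement above, the weights reported in the Implementation section can be sorted directly. The short Python sketch below only reuses those published values; they sum to about 1.001 rather than exactly 1, which is consistent with rounding to three decimals.

```python
# AHP weights as reported in the Implementation of the Experiment section.
weights = {
    "Ne": 0.049, "C": 0.038, "Na": 0.042, "Cr": 0.186, "Rf": 0.055, "Fv": 0.084,
    "No": 0.163, "Ce": 0.047, "Cd": 0.015, "L": 0.139, "Tt": 0.054, "P": 0.129,
}
ordered = sorted(weights, key=weights.get, reverse=True)  # most important first
print("highest:", ordered[:3])
print("lowest:", ordered[-3:])
print("sum:", round(sum(weights.values()), 3))
```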
AHP was considered ideal for calculating the weights of the criteria, but it was not used for the evaluation of the virtual tours because it is time-consuming due to the mathematical calculations required. Moreover, the main drawback of AHP is that the number of pairwise comparisons increases as the number of alternatives and criteria increases, and the comparisons must be repeated whenever the set of alternatives changes. Since complexity rises with the number of websites, the number of alternatives that can be compared is limited. This is one of the main reasons for our decision to combine AHP with another theory. The theory selected to implement the empirical method in the second phase of the evaluation experiment was fuzzy TOPSIS, because the complexity of its application does not increase at the same rate as that of AHP when the number of alternative websites increases. Therefore, this theory was considered more appropriate for the evaluation of virtual tours. A main drawback of TOPSIS, however, is that it does not provide a specified way of calculating the weights of the criteria, as AHP does.
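The difference in effort can be made concrete with a rough count of judgments per evaluator. The Python sketch below assumes a full AHP hierarchy in which, besides the criteria matrix, the alternatives are compared pairwise under each criterion; a matrix of n items needs n(n - 1)/2 judgments. The function names are hypothetical. With the 12 heuristics and the 16 virtual tours of this study, AHP alone would need 66 + 12 × 120 = 1,506 judgments per evaluator, whereas AHP weights plus fuzzy TOPSIS ratings need 66 + 12 × 16 = 258.

```python
def pairwise_judgments(n: int) -> int:
    """Number of pairwise comparisons in one n x n AHP matrix."""
    return n * (n - 1) // 2


def ahp_only(criteria: int, alternatives: int) -> int:
    """Judgments per evaluator if AHP also ranked the alternatives."""
    return pairwise_judgments(criteria) + criteria * pairwise_judgments(alternatives)


def ahp_plus_topsis(criteria: int, alternatives: int) -> int:
    """Judgments per evaluator for AHP weights plus one TOPSIS rating per cell."""
    return pairwise_judgments(criteria) + criteria * alternatives


for m in (5, 16, 30):
    print(f"{m} tours: AHP only = {ahp_only(12, m)}, "
          f"AHP + fuzzy TOPSIS = {ahp_plus_topsis(12, m)}")
```

The AHP-only count grows quadratically with the number of tours, while the fuzzy TOPSIS part grows linearly, which is the scaling argument made above.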
Taking into account the advantages and disadvantages of AHP and TOPSIS, the two theories rest on different reasoning but seem rather complementary. The main advantage of fuzzy TOPSIS is that it uses linguistic variables, which makes it easier for all evaluators to characterize the criteria. The linguistic variables are then transformed into triangular fuzzy numbers, on which all subsequent estimations are performed.
The main problems identified in our evaluation of the virtual tours of the museums in Italy concerned the implementation of the VR interaction, namely: (i) the navigation functionality; (ii) the interactivity of the represented space and exhibits; and (iii) the information communication methods. Most of the reviewed virtual tours do not provide additional information related to the exhibits, such as annotations or supplementary content, which could be useful for learning purposes. Despite these minor problems, the general impression was positive: the interaction with the users was very good and friendly, and the experts did not spot serious problems.