MULTIMOORA under Interval-Valued Neutrosophic Sets as the Basis for the Quantitative Heuristic Evaluation Methodology HEBIN

During the last decade, researchers put a lot of effort into the development of the multicriteria decision methods (MCDM) capable of dealing with the uncertainty and vagueness of the initial information. MCDM approaches that work under the environment of the interval-valued neutrosophic sets (IVNS) demonstrate credibility for the analysis of different opinions as well as for the inconsistency of the criteria evaluation data. The novel multicriteria decision-making approach MULTIMOORA-IVNS (multi-objective optimisation by ratio analysis under interval-valued neutrosophic sets) is presented in this paper. A novel heuristic evaluation methodology HEBIN (heuristic evaluation based on interval numbers) that exploits MULTIMOORA-IVNS for the processing of the evaluation results is also presented in this research. HEBIN is able to increase the accuracy of the checklists-based heuristic evaluation and to diminish the impact of the inconsistencies caused by the evaluators. A comparison of six e-commerce websites is introduced to reveal the practicalities of the proposed multicriteria decision-making application.


Introduction
Multi-criteria decision making (MCDM) theory is intensively investigated for both the theoretical and implementation aspects. Since there are many real-life applications where the decision information cannot be rigorously represented due to its incompleteness, indeterminacy, and inconsistency, researchers are constantly looking for the novel mathematical modelling techniques that can be applied to deal with this kind of challenge.
The pioneering idea for dealing with non-rigid boundaries of decision information was proposed by Zadeh [1], who introduced the fuzzy set concept. In this theory, each object of the universe is described by a single relatively graded membership. Atanassov [2] extended the traditional fuzzy set formulation by incorporating the degree of hesitation into decision-making and named this extension intuitionistic fuzzy sets (IFS). Since IFS theory requires the sum of the membership and non-membership degrees to stay in the closed interval [0, 1], it imposes some limitations on IFS applications. Therefore, the q-rung orthopair fuzzy sets were proposed as a generalisation of the intuitionistic fuzzy sets and Pythagorean fuzzy sets [3]. The q-rung orthopair fuzzy sets are governed by the condition that the sum of the qth power of the membership degree and the qth power of the non-membership degree is limited to the interval [0, 1]. These and other extensions of the fuzzy sets were proposed by researchers for application to various MCDM problems [4][5][6][7].
Since fuzzy sets could not take into consideration all types of uncertainties that emerge in the construction of the mathematical models developed for the solutions of
where $T_N(x) : X \to [0, 1]$, $I_N(x) : X \to [0, 1]$ and $F_N(x) : X \to [0, 1]$ with $0 \le T_N(x) + I_N(x) + F_N(x) \le 3$ for all $x \in X$. The variables $T_N(x)$, $I_N(x)$ and $F_N(x)$ define the truth-membership degree function, the indeterminacy-membership degree function and the falsity-membership degree function of $x$ to $N$, respectively. For the case of the interval neutrosophic set, these functions must be described as $T_N(x) = [\inf T_N(x), \sup T_N(x)] \subseteq [0, 1]$, $I_N(x) = [\inf I_N(x), \sup I_N(x)] \subseteq [0, 1]$ and $F_N(x) = [\inf F_N(x), \sup F_N(x)] \subseteq [0, 1]$, and the sum of these functions satisfies the condition $0 \le \sup T_N(x) + \sup I_N(x) + \sup F_N(x) \le 3$.
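Definition 2. If $N_1 = \langle [\inf T_{N_1}(x), \sup T_{N_1}(x)], [\inf I_{N_1}(x), \sup I_{N_1}(x)], [\inf F_{N_1}(x), \sup F_{N_1}(x)] \rangle$ and $N_2 = \langle [\inf T_{N_2}(x), \sup T_{N_2}(x)], [\inf I_{N_2}(x), \sup I_{N_2}(x)], [\inf F_{N_2}(x), \sup F_{N_2}(x)] \rangle$ are two interval-valued neutrosophic numbers (IVNN), then $N_1$ is contained in the other neutrosophic element $N_2$, $N_1 \subseteq N_2$, if and only if:
Definition 3. Two IVNNs $N_1$ and $N_2$ are equal, described as $N_1 = N_2$, if and only if $N_1 \subseteq N_2$ and $N_1 \supseteq N_2$.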
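For reference, the containment relation $N_1 \subseteq N_2$ above is commonly stated in the interval neutrosophic set literature by the inequalities below. This is the standard form, given here as an assumption rather than as a quotation of the authors' equation.

```latex
\begin{aligned}
\inf T_{N_1}(x) &\le \inf T_{N_2}(x), & \sup T_{N_1}(x) &\le \sup T_{N_2}(x),\\
\inf I_{N_1}(x) &\ge \inf I_{N_2}(x), & \sup I_{N_1}(x) &\ge \sup I_{N_2}(x),\\
\inf F_{N_1}(x) &\ge \inf F_{N_2}(x), & \sup F_{N_1}(x) &\ge \sup F_{N_2}(x).
\end{aligned}
```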

Definition 4. Comparison of the interval-valued neutrosophic numbers is completed employing the score, accuracy and certainty functions. For the interval-valued neutrosophic number $N_1$, the mentioned functions are of the form
where $s(N_1)$, $a(N_1)$ and $c(N_1)$ denote the score, accuracy and certainty functions of the IVNN $N_1$, respectively.
Definition 5. If $N_1$ and $N_2$ are two interval-valued neutrosophic numbers, then their comparison should be completed in the following way:
• If $p(s(N_1) \ge s(N_2)) > 0.5$, then $N_1$ is greater than $N_2$ or $N_1$ is superior to $N_2$, and this fact can be represented as $N_1 \succ N_2$.
• If $p(s(N_1) \ge s(N_2)) = 0.5$ and $p(a(N_1) \ge a(N_2)) > 0.5$, then $N_1$ is greater than $N_2$ or $N_1$ is superior to $N_2$, and this fact can be represented as $N_1 \succ N_2$.
• If $p(s(N_1) \ge s(N_2)) = 0.5$, $p(a(N_1) \ge a(N_2)) = 0.5$ and $p(c(N_1) \ge c(N_2)) > 0.5$, then $N_1$ is greater than $N_2$ or $N_1$ is superior to $N_2$, and this fact can be represented as $N_1 \succ N_2$.
• If $p(s(N_1) \ge s(N_2)) = 0.5$, $p(a(N_1) \ge a(N_2)) = 0.5$ and $p(c(N_1) \ge c(N_2)) = 0.5$, then $N_1$ is equal to $N_2$ or $N_1$ is indifferent to $N_2$, and this fact can be represented as $N_1 \sim N_2$.
Definition 6. The degree of the possibility of the score function is determined by the following expression:
where $l_{N_1} = \sup(s(N_1)) - \inf(s(N_1))$ and $l_{N_2} = \sup(s(N_2)) - \inf(s(N_2))$. The degrees of the possibility for the accuracy and certainty functions are calculated in the same way.
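The comparison procedure of Definitions 5 and 6 can be sketched in Python as follows. The sketch assumes the possibility-degree formula that is widely used for interval numbers, $p(a \ge b) = \max\{1 - \max((\sup b - \inf a)/(l_a + l_b), 0), 0\}$; the function names, the handling of zero-width intervals and the treatment of the case $p < 0.5$ (taken as the mirror image of $p > 0.5$) are illustrative assumptions, not taken from the paper.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (inf, sup)


def possibility(a: Interval, b: Interval) -> float:
    """Degree of possibility p(a >= b) for two closed intervals (assumed formula)."""
    l_a = a[1] - a[0]
    l_b = b[1] - b[0]
    if l_a + l_b == 0:
        # Both intervals are crisp numbers: fall back to a plain comparison.
        if a[0] > b[0]:
            return 1.0
        return 0.5 if a[0] == b[0] else 0.0
    return max(1.0 - max((b[1] - a[0]) / (l_a + l_b), 0.0), 0.0)


def compare(s1: Interval, s2: Interval,
            a1: Interval, a2: Interval,
            c1: Interval, c2: Interval,
            tol: float = 1e-12) -> str:
    """Compare N1 and N2 by score, then accuracy, then certainty intervals."""
    for x, y in ((s1, s2), (a1, a2), (c1, c2)):
        p = possibility(x, y)
        if p > 0.5 + tol:
            return "N1 > N2"
        if p < 0.5 - tol:
            # Assumption: p < 0.5 is read as N2 being superior to N1.
            return "N2 > N1"
    return "N1 ~ N2"
```

The score, accuracy and certainty intervals are passed in directly, so the sketch does not depend on a particular form of those three functions.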

Definition 7. If we consider two IVNNs $N_1$ and $N_2$, the operations for IVNNs can be expressed as follows:
The distance measure between two interval-valued neutrosophic numbers is described by the expression:
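A minimal Python sketch of the algebraic operations referred to in Definition 7 is given below. It assumes the operations that are standard in the interval neutrosophic set literature (sum, product, multiplication by a positive scalar and power), applied separately to the lower and upper bounds; the class name `IVNN` and its method names are hypothetical. The distance measure is not included because its exact form is not recoverable from the text.

```python
from dataclasses import dataclass
from typing import Tuple

Interval = Tuple[float, float]  # (inf, sup)


@dataclass(frozen=True)
class IVNN:
    t: Interval  # truth-membership interval
    i: Interval  # indeterminacy-membership interval
    f: Interval  # falsity-membership interval

    def __add__(self, other: "IVNN") -> "IVNN":
        # Probabilistic sum on truth, product on indeterminacy and falsity.
        return IVNN(
            tuple(a + b - a * b for a, b in zip(self.t, other.t)),
            tuple(a * b for a, b in zip(self.i, other.i)),
            tuple(a * b for a, b in zip(self.f, other.f)),
        )

    def __mul__(self, other: "IVNN") -> "IVNN":
        # Product on truth, probabilistic sum on indeterminacy and falsity.
        return IVNN(
            tuple(a * b for a, b in zip(self.t, other.t)),
            tuple(a + b - a * b for a, b in zip(self.i, other.i)),
            tuple(a + b - a * b for a, b in zip(self.f, other.f)),
        )

    def scale(self, lam: float) -> "IVNN":
        # Multiplication by a scalar lam > 0.
        return IVNN(
            tuple(1 - (1 - a) ** lam for a in self.t),
            tuple(a ** lam for a in self.i),
            tuple(a ** lam for a in self.f),
        )

    def power(self, lam: float) -> "IVNN":
        # Raising to a power lam > 0.
        return IVNN(
            tuple(a ** lam for a in self.t),
            tuple(1 - (1 - a) ** lam for a in self.i),
            tuple(1 - (1 - a) ** lam for a in self.f),
        )
```

Under these assumed definitions, adding a number to itself agrees with scaling it by 2, and multiplying it by itself agrees with raising it to the power 2, which is a quick consistency check on the sketch.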

MULTIMOORA-IVNS Approach
The essence of the novel approach MULTIMOORA-IVNS consists of the operational functionality of interval-valued neutrosophic sets and crisp MULTIMOORA extensions described by [20].
Step 1: The initial step in multicriteria decision-making methods is the construction of the initial decision matrix X, whose elements $x_{ij}$ are interval numbers corresponding to the ith criterion of the jth alternative. The decision matrix elements are normalised by applying a function that was specifically developed to account appropriately for certain features of the neutrosophic sets and interval-valued numbers.
The proposed normalisation function ensures better stability and resolution range for the proposed MULTIMOORA-IVNS approach.
Step 2: The neutrosophication for the elements of the decision matrix. The members of the interval values are converted into interval-valued neutrosophic numbers applying the standard modification rates as in [14].
Step 3: Assembly of the neutrosophic decision matrix consisting of the elements $(x^*_n)_{ij}$.
Step 4: The first objective of the interval-valued neutrosophic MULTIMOORA approach can be described as:
where g elements correspond to the beneficial criteria and the remaining n − g elements correspond to the non-beneficial criteria. The second component in Equation (12) is constructed by applying the complement of the interval-valued neutrosophic number, which can be described by the expression:
Step 5: Calculation of the second objective of the interval-valued neutrosophic MULTIMOORA approach. The second objective is established taking into account the deviation from the reference point and the Min-Max metric of the Tchebycheff norm. The reference point is calculated for the case of the beneficial criteria by the expression:
In the case of the non-beneficial criteria, $r_i$ is defined as:
The comparison of the interval-valued neutrosophic numbers is done by applying the degree of the possibility of the score function, as described in Definitions (6) and (7).
Step 6: Calculation of the third objective of the interval-valued neutrosophic MULTIMOORA approach. The full multiplicative form should be used for the third objective, which implements a purely multiplicative utility function for the criteria to be maximised as well as for the criteria to be minimised. Consequently, the overall utility must be assembled for each analysed alternative, which is described as:
Here, the components $A_j$ and $B_j$ are calculated as
The first component $A_j$ represents the product of the maximised criteria of the jth alternative. The second component $B_j$ represents the product of the minimised criteria of the jth alternative.
Step 7: The final summarization of first, second and third goals of MULTIMOORA-IVNS is completed within the dominance theory framework [20].
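For orientation, the three objectives of Steps 4-6 mirror the ratio system, the reference point approach and the full multiplicative form of crisp MULTIMOORA. The Python sketch below is a crisp (real-valued) analogue only. It assumes vector normalisation, weights used as multipliers in the first two objectives and as exponents in the third, and it does not implement the interval-valued neutrosophic version or the dominance-theory aggregation of Step 7. The function name and argument layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def crisp_multimoora(
    x: Sequence[Sequence[float]],   # x[i][j]: criterion i, alternative j
    weights: Sequence[float],
    beneficial: Sequence[bool],     # True if criterion i is to be maximised
) -> Tuple[List[float], List[float], List[float]]:
    n_crit, n_alt = len(x), len(x[0])

    # Vector normalisation of every criterion row (assumed, crisp case).
    norm = []
    for row in x:
        denom = math.sqrt(sum(v * v for v in row))
        norm.append([v / denom for v in row])

    # Objective 1, ratio system: beneficial minus non-beneficial (higher is better).
    ratio = [
        sum((weights[i] if beneficial[i] else -weights[i]) * norm[i][j]
            for i in range(n_crit))
        for j in range(n_alt)
    ]

    # Objective 2, reference point: largest weighted deviation (lower is better).
    ref = [max(norm[i]) if beneficial[i] else min(norm[i]) for i in range(n_crit)]
    deviation = [
        max(weights[i] * abs(ref[i] - norm[i][j]) for i in range(n_crit))
        for j in range(n_alt)
    ]

    # Objective 3, full multiplicative form: U_j = A_j / B_j (higher is better).
    utility = []
    for j in range(n_alt):
        a_j = math.prod(norm[i][j] ** weights[i]
                        for i in range(n_crit) if beneficial[i])
        b_j = math.prod(norm[i][j] ** weights[i]
                        for i in range(n_crit) if not beneficial[i])
        utility.append(a_j / b_j)

    return ratio, deviation, utility
```

Each returned list holds one value per alternative, so the three rankings can be derived by sorting and then combined by whichever aggregation rule is chosen.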

Quantitative Heuristic Evaluation Methodology HEBIN
Heuristic evaluation [25] is a widely used website inspection method devoted to examining interfaces via recommendations grounded in the user-centred design principles known as heuristics [26]. Depending on the selected procedure, the HE technique may provide qualitative or quantitative results. While qualitative heuristic evaluation (QLHE) brings extensive information about the quality of a single interface, quantitative heuristic evaluation (QNHE) provides the numerical data required for the comparison of alternatives. However, quantitative heuristic evaluation is a challenging task, since no unified methodology for performing it has been presented to date.
González et al. [27] stated that the results of the QNHE depend on three main components: (I) the characteristics of the evaluators; (II) the set of domain-orientated heuristics and sub-heuristics; and (III) the mathematical model that is chosen to process the data. Comprehensive checklists (questionnaires), where heuristics are divided into sub-heuristics, are an important part of the QNHE [28]. The authors of this article compared several studies that employ checklist-based QNHE to reveal differences in their applicability (Table 1). It can be seen in Table 1 that different sets of heuristics and sub-heuristics can be used for the QNHE. The number and the experience of the evaluators participating in the experiments also differ. It is well known that inconsistencies related to the diverse expertise, culture, gender, age, attention and information processing capacities of the evaluators strongly affect the results of the HE [32]. Irregular understanding of the predefined heuristics raises additional challenges in heuristics-based decision making. However, the biggest struggles of the QNHE are associated with the selection of the mathematical model.
Usability index, which represents the total number of usability problems found on a website, divided by the total number of pages investigated on the site, was presented in [30]. The number of websites with the violated heuristic divided by the total number of analysed websites was calculated to compare the quality of the museum websites [29]. Shayganmeh et al. [31] stressed that indices (heuristics) described by indicators (sub-heuristics) are able to evaluate wider dimensions of the e-services websites and proposed to employ MCDM theory for the checklist-based comparison of the websites. PROMETHEE [33] was suggested to rank indicators, and Analytical Hierarchy Process [34] was proposed to weight indices. The final readiness values were obtained, adding products of indexes weights to the single average indicator readiness value.
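The two simple ratio measures described above, the usability index of [30] and the share of websites violating a heuristic used in [29], can be written down directly. The following Python sketch is only an illustration of those verbal definitions; the function names and the example numbers in the usage note are hypothetical.

```python
from typing import Sequence


def usability_index(problems_found: int, pages_investigated: int) -> float:
    """Total number of usability problems divided by the number of pages investigated."""
    return problems_found / pages_investigated


def violation_share(violated: Sequence[bool]) -> float:
    """Share of analysed websites on which a given heuristic is violated."""
    return sum(violated) / len(violated)
```

For instance, 12 problems found on 4 pages would give an index of 3.0, and a heuristic violated on two of four websites would give a share of 0.5.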
Authors of this article believe that MCDM methods are an appropriate way to compare different interfaces based on the data collected from the checklist based heuristic evaluation. Therefore, in this paper, we decided to exploit the advantages of the interval numbers for the MCDM based quantitative heuristic evaluation. This novel methodology will be presented later in this section.

Heuristic Evaluation and the Inconsistencies of the Judgements
Traditionally heuristic evaluation is understood as the expert-based website inspection technique. According to Nielsen et al. [25], HE requires 3-5 evaluators to assess interfaces. HE provides the most reliable results when each of the experts works separately, but at the end of the experiment gathers together to reach a consensus on the evaluation results. If there is a possibility to bring all the team members on board, the probability of having a decision that everyone likes, respects, and supports increases. Nevertheless, there is always a possibility that the desire to reach an agreement might cause people to ignore some of the findings and to put aside insights that may derail the consensus decision. This situation has come to be known as the evaluator effect and has been well-documented by [35].
Ideally, heuristic evaluation should be performed by five usability experts having a deep understanding of the chosen heuristic set and experience in the application domain. In practice, small companies often do not have a sufficient budget to hire usability experts; therefore, the need for HE methodologies that can be performed by novice evaluators is receiving increased attention. The term "novice evaluators" can be understood as professionals who do not have enough knowledge of user experience and possibly participate in a heuristic evaluation for the first time [36]. For such cases, checklist-based HE might be a beneficial approach to reduce misunderstanding related to the inconsistent interpretation of the heuristics. However, checklist-based criteria (heuristics) assessments are not able to remove all the inaccuracies raised by the differences between the evaluators. Therefore, MULTIMOORA-IVNS is proposed in this study as the mathematical model for the analysis of QNHE. The novel QNHE methodology HEBIN (heuristic evaluation based on interval numbers) that exploits MULTIMOORA-IVNS for decision-making is also presented in this paper.

HEBIN Methodology
Heuristic evaluation based on interval numbers (HEBIN) methodology consists of 7 stages, each of which is briefly described in Figure 1. When the novice evaluators or the usability experts are hired for the experiment, a short briefing session should be organised, in which the goal of the research, the methodology and the chosen set of heuristics, explained by sub-heuristics, are presented. Each of the evaluators is asked to work individually. The final estimate for each of the heuristics is calculated as the sum of the points assigned to the corresponding sub-heuristics. If five evaluators are hired for the experiment, five separate reports of the HE should be prepared for each of the alternatives. As soon as this is done, the collected data can be used for the construction of the initial decision matrix X consisting of the values $x_{na}$:
Here, $a = 1, 2, \ldots, A$ indexes the analysed alternatives and $n = 1, 2, \ldots, N$ indexes the heuristics. The value $x_{na}$ for each alternative $a$ and heuristic $H_n$ has to be determined as the interval $[\min H_{na}; \max H_{na}]$, where $\min H_{na}$ is the lowest estimate of the heuristic $H_n$ and $\max H_{na}$ is the highest estimate of the heuristic $H_n$ among all five evaluators that presented their estimates for the heuristic $H_n$. In this way, the inconsistencies caused by the differing experience of the evaluators can be recorded for further data processing. We propose to employ MULTIMOORA-IVNS as the appropriate approach to deal with the uncertainty and inconsistencies of the collected data.
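The construction of the interval decision matrix described above can be sketched in Python as follows. The nested-dictionary layout, the function names and the sample numbers are illustrative assumptions; only the two rules (a heuristic estimate is the sum of its sub-heuristic points, and each matrix element is the [min; max] interval over the evaluators) come from the text.

```python
from typing import Dict, List, Sequence, Tuple


def heuristic_estimate(sub_heuristic_points: Sequence[float]) -> float:
    """One evaluator's estimate of a heuristic: the sum of its sub-heuristic points."""
    return sum(sub_heuristic_points)


def build_interval_matrix(
    estimates: Dict[str, Dict[str, List[float]]],
) -> Dict[str, Dict[str, Tuple[float, float]]]:
    """Turn per-evaluator estimates into [min; max] intervals.

    estimates[heuristic][alternative] is the list of estimates given by the
    individual evaluators for that heuristic and alternative.
    """
    return {
        heuristic: {
            alternative: (min(values), max(values))
            for alternative, values in per_alternative.items()
        }
        for heuristic, per_alternative in estimates.items()
    }
```

With made-up estimates such as `{"TR": {"A1": [17, 18, 17, 18, 17]}}`, the element for heuristic TR and alternative A1 becomes the interval `(17, 18)`.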
Heuristic evaluation based on interval numbers (HEBIN) exploits the different opinions of the evaluators and does not seek consensus on the evaluation results. There is only one requirement for the evaluators: all of them must use the same set of heuristics and sub-heuristics for the assessment of the alternatives.

HEBIN Application for the Comparison of E-Commerce Websites
Over the past few years, e-commerce has become an irreplaceable part of the international retail system. Global data platform www.statista.com shows that the total number of people purchasing goods and services online reached 1.92 billion customers in 2019. In the same period, the total annual sales revenue from the e-commerce market topped 3.5 trillion U.S. dollars. Since we are living in a global market and internet users can freely choose the electronic shops (e-shops) where they would like to purchase, no online business can prosper without periodic appraisal of the e-commerce websites it owns. In this context, analysis of the competitive environment is becoming especially important for the success of online businesses. Competitor benchmarking allows business owners to identify the advantages and disadvantages of the solutions they provide and gives an understanding of the features, functions and design decisions that work successfully in rival e-shops.
However, it is still a great challenge to judge and compare the quality of different electronic shops, since both the functional and non-functional requirements should be assessed to make reliable decisions on the quality of e-commerce websites. Even though functionality, security, privacy, accessibility and reliability are still traditionally recognised as the significant criteria affecting the value of the online shops [37]; trustworthiness, personalisation, navigation and customer support are slowly becoming the decisive factors for the customers' willingness to buy [38,39]. While non-functional requirements like user experience have the positive impact for the quality of the electronic shops and the negative effect on the uncertainty of the assessment information [40,41], specific checklists capable of collecting data on the user experience of the websites should be chosen for the competitor benchmarking.
Quinones and Rusu [26] made a review of the studies where various sets of domain-orientated heuristics were offered. Research presented by Bonastre and Granollers [42] was the only one appraising the user experience of e-commerce websites. The checklist presented in [42] consists of 64 questions divided into six stages of online purchasing: (1) need recognition and problem awareness, (2) information search, (3) purchase decision-making, (4) transaction, (5) post-sales behaviour and (6) other factors that affect the user experience. Since these stages of the purchasing process cannot be directly mapped to the heuristics representing service quality, system quality and information quality [43,44], we used this checklist as the basis for a new one dedicated to assessing trust, response time, reliability, responsiveness, empathy, timeliness, accuracy, navigability and accessibility of e-commerce websites. The nine criteria that we analysed as heuristics were proposed by Nilashi in [45]. The novel checklist that consists of 9 heuristics and 82 sub-heuristics is presented in Table A1 in Appendix A.
Three different scales are proposed to assess sub-heuristics. Most of the sub-heuristics can be measured on a three-point scale, where 0 means "No", 1 means "Partly yes" and 2 means "Yes". Since reputation is a critical aspect of any online business, the sub-heuristic TR1 has a 5-point evaluation scale. Checklist items that describe accessibility (AC) issues are the only ones that require an additional tool for the assessment of the sub-heuristics.
In the case study presented in this paper, evaluators were recommended to use https://www.webpagetest.org to measure webpage size and the loading time.

Weighting of the Heuristics
Criteria weighting is an important part of any multicriteria decision-making process. Direct and indirect weighting approaches can be applied for the criteria weighting. When indirect methods are applied, criteria weights are derived from mathematical modelling, whereas in the direct methods, the decision-makers compare criteria directly, via a chosen ratio scale. Direct weighting (DW) techniques like the SWING [46], SMARTS [47], SMARTER [47], point allocation [48], direct rating [48], or the VASMA weighting [49] were recently applied in various MCDM tasks [49][50][51].
SMARTS methodology was chosen for the heuristics weighting in HEBIN methodology. Ten external experts working with online shopping were asked to participate in the experiment. A matrix constructed of nine visual analogue scales (VAS) with the endpoints meaning "Not important" (numeric value 1) and "Very important" (100) was printed and presented to each of the evaluators to simplify the preference elicitation process. The distance between the tick marks of the VAS scales was set to 5.
At the beginning of the meeting, all ten decision-makers (DM) agreed that Trust (TR) is the most important aspect of the e-commerce business. They also decided that all nine heuristics involved in the evaluation procedure have a significant impact on the quality of the electronic shop. Therefore, 50 was determined as the minimum value that can be given to assess the importance of a heuristic. Then, all ten DMs individually ranked the heuristics according to their importance to the quality of the e-commerce websites. The SMARTS weights provided by the DMs are presented in Table 2, and their normalised values are shown in Table 3. It was also determined that online shops of the highest quality are those where the maximum number of points is given for the heuristics trust (TR), reliability (RE), customer support (CS), empathy (EM), ease of site navigation (SN) and the accuracy of information (AI). The minimum number of points should be assigned for the heuristics system response time (RT), the number of accessibility issues (AC) and the amount of outdated content (OC).
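One common way to turn such importance points into weights can be sketched in Python as follows. The sketch assumes that each decision-maker's points are divided by their sum and that the group weight is the arithmetic mean of the individual normalised weights; both choices, the function names and the sample points in the usage note are assumptions, since the text does not state how Table 3 was derived from Table 2.

```python
from typing import Dict, List


def normalise_points(points: Dict[str, float]) -> Dict[str, float]:
    """Divide one decision-maker's importance points by their total."""
    total = sum(points.values())
    return {heuristic: value / total for heuristic, value in points.items()}


def group_weights(all_points: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the normalised weights over all decision-makers (assumed rule)."""
    normalised = [normalise_points(p) for p in all_points]
    heuristics = normalised[0].keys()
    return {
        h: sum(weights[h] for weights in normalised) / len(normalised)
        for h in heuristics
    }
```

For example, hypothetical points of 100, 50 and 50 for TR, RE and RT would give individual weights of 0.5, 0.25 and 0.25, and the group weights always sum to 1.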

Data Collection and Construction of the Decision Matrices
Since HEBIN is designed as the methodology that can be used by both the experts and the novice evaluators, 30 persons with different online shopping experiences were asked to assess the quality of the chosen e-commerce websites. Specifically, 5 UX experts, 5 IT professionals, 5 middle-aged persons (who do not work in IT industries) and 15 multimedia students participated in this study. Six global e-commerce websites (A1. Amazon.com, A2. Walmart.com, A3. Rakuten.com, A4. Ebay.com, A5. Aliexpress.com, A6. BestBuy.com) were analysed in the experiment, which was completed in January of 2019.
Each of the participants assessed all the alternatives individually and then sent the completed questionnaires to the organisers of this study. When all the appraisals were collected, we analysed these responses as six different experiments designed to determine how HEBIN responds to the HE performed by different target groups (15 students were randomly divided into three groups of 5 people).
Although all the participants used the same checklist to judge the websites, individual assessments of the heuristics diverged. While the dispersion of the judgements gathered from the UX experts was noticeably small, the judgements collected from the novice evaluators were much more diverse. For instance, Trust (TR) of the alternative A5 got 17-18 points from UX experts; 14-18 points from IT professionals; 14-21 points from the persons who do not work in IT industries; 13-20 points from the first group of students; 10-19 points from the second group of students and 10-20 points from the third group of students. Such inconsistency in the HE results might have a significant effect on the final rankings of the analysed alternatives. Therefore, six separate decision matrices X were constructed, one for each of the target groups. The decision matrix for expert-based judgements is presented in Table 4. The other five matrices were generated in the same manner. When the decision matrix X was constructed (Table 4), and the importance of the heuristics (weights) was determined (Table 3), the novel multicriteria decision-making approach MULTIMOORA-IVNS was applied to determine the final ranks of the alternatives. Elements of the initial decision matrix X calculated after the normalisation and the neutrosophication are presented in Table 5. The normalisation function that was applied is presented in Equation (10). The first objective (the interval-valued neutrosophic ratio system objective) was calculated by means of Equations (11) and (12). Rankings for the first objective of the MULTIMOORA-IVNS are presented in Table 6. The second objective of the neutrosophic MULTIMOORA approach was calculated as the deviation from the reference point using the min-max metric.
Equations (13)-(15) were applied to get the scores of the second objective. The third objective of the MULTIMOORA-IVNS approach is presented as the matrix U, which is calculated by Equation (16), where $A_j$ is the product of the criteria of the jth alternative to be maximised and $B_j$ corresponds to the product of the criteria of the jth alternative to be minimised (Table 7). Finally, the dominance theory was applied for the summarisation of all three objectives. The final ranks of the six international e-commerce websites are presented in Table 8. Analogous calculations were done for each of the six decision matrices constructed from the data of the experiment. The final ranks of the analysed websites determined separately for each of the target groups are provided in Figure 2. It can be seen that alternative A5 (Aliexpress.com) was recognised as the leader among the IT professionals and UX experts. A1 (Amazon.com) was identified as the best website for the professionals who do not work in IT industries, and A4 (Ebay.com) was detected as the best website for all three groups of multimedia students. However, it must be mentioned that the presented study was performed at the beginning of 2019, and the quality of these websites might have changed since then.
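The dispersion of the Trust (TR) judgements for alternative A5 quoted above can be made explicit by computing the width of each group's interval. The short Python sketch below uses only the point ranges reported in the text; the dictionary keys and variable names are illustrative labels.

```python
# [min; max] points given to Trust (TR) of alternative A5 by each target group.
tr_a5 = {
    "UX experts": (17, 18),
    "IT professionals": (14, 18),
    "Non-IT professionals": (14, 21),
    "Students, group 1": (13, 20),
    "Students, group 2": (10, 19),
    "Students, group 3": (10, 20),
}

# Interval width as a simple measure of disagreement within a group.
widths = {group: high - low for group, (low, high) in tr_a5.items()}

for group, width in sorted(widths.items(), key=lambda item: item[1]):
    print(f"{group}: width {width}")
```

The widths range from 1 point for the UX experts to 10 points for the third group of students, which is the spread that the interval-valued representation is meant to retain.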

Results and Discussion
Comparison of the MULTIMOORA-IVNS and MULTIMOORA-SVNS [14] was completed to analyse the credibility of the interval-valued neutrosophic sets. Since MULTIMOORA-SVNS works with single-valued numbers, new decision-making matrices X' were constructed, where the intervals $[\min H_{na}; \max H_{na}]$ were converted to the single-valued numbers $x_{na}$ by a formula:
where $\min H_{na}$ is the lowest estimate of the heuristic $H_n$ and $\max H_{na}$ is the highest estimate of the heuristic $H_n$ among all five evaluators that assessed the alternative $a$. The example of such a decision matrix constructed from the judgements of UX experts is presented in Table 9.
Table 9. Decision matrix constructed to assess alternatives via MULTIMOORA-SVNS approach (constructed from the judgements of UX experts).
Analogously, decision matrices were constructed for the remaining five target groups. Then, the MULTIMOORA-SVNS approach [14] was applied for the ranking of the alternatives. Rankings calculated by MULTIMOORA-SVNS are presented in Figure 3. Comparison of the MULTIMOORA-IVNS and MULTIMOORA-SVNS also disclosed that MULTIMOORA-IVNS provides high stability among the rankings calculated for all three groups of multimedia students (Figure 2). Such stability cannot be seen when MULTIMOORA-SVNS is applied (Figure 3). This finding suggests that interval-valued neutrosophic sets should be chosen when decision-makers are trying to understand how alternatives are ranked in a target group where assessors have similar experience of the analysed topic. However, more studies should be performed to confirm or refute this finding.
Additionally, a comparison of four different multicriteria decision-making approaches was completed for the sensitivity analysis. MULTIMOORA-IVNS, MULTIMOORA-SVNS [14], WASPAS-SVNS [52], and Crisp PROMETHEE [53] were applied for the comparison of rankings based on the data provided by UX experts (Tables 4 and 9). The results presented in Table 10 display high consistency in the ranking of the alternatives regardless of the chosen MCDM method. This shows the reliability of MULTIMOORA-IVNS and also implies that the checklist presented in Table A1 was appropriately constructed for the assessment of the e-commerce websites.

Conclusions
The novel multicriteria decision-making approach MULTIMOORA-IVNS (multi-objective optimisation by ratio analysis under interval-valued neutrosophic sets) was presented in this paper. The original quantitative heuristic evaluation methodology HEBIN, which exploits IVNS theory, was also presented. HEBIN under MULTIMOORA-IVNS is an easy-to-use approach that exploits the advantages of the interval-valued neutrosophic sets and reduces biases and instabilities caused by novice evaluators. In this study, HEBIN was used to assess the quality of six international e-commerce websites. A comparison of the results provided by MULTIMOORA-IVNS and MULTIMOORA-SVNS revealed that MULTIMOORA-IVNS is a reliable MCDM approach, which shows its credibility when the dispersion of the opinions in the group of evaluators grows.

ID | Heuristics/Sub-Heuristics | Evaluation Scale
RE15 | Is it easy to modify the number of products in the shopping cart? | 0-2
RE16 | Are the additional charges (taxes and shipping costs) shown as soon as possible? | 0-2
RE17 | Is there a possibility to purchase goods without registration? | 0-2
RE18 | If the registration is necessary, is the process quick and does it require only the fundamental information? | 0-2
RE19 | Is the button confirming the purchase clearly visible in the interface? | 0-2
RE20 | Is the checkout process divided into logical and easily understandable steps? | 0-2
RE22 | Is the progress indicator shown in the checkout process? | 0-2
RE23 | Is it possible to track the status of the orders? | 0-2
RE24 | Is there a possibility for the registered users to modify or cancel the order? | 0-2
System response time (RT)
RT1 | How long does it take to launch the homepage of the website? | Seconds
RT2 | What is the homepage download size? | MB
RT3 | How long does it take to launch the product page of the website? | Seconds
RT4 | What is the product page download size? | MB
AC1 | Are there any difficulties in opening the website on the computer screen? | 0-2
AC2 | Are there any issues with viewing the website on mobile phones? | 0-2
AC3 | Are there any issues that make it difficult for persons with disabilities to use the site? | 0-2