Towards Sustainability in E-Banking Website Assessment Methods

Nowadays, banking services have evolved from offline financial services to online platforms available as websites and mobile applications. While multiple methods exist for evaluating general-purpose websites, the appraisal of banking services requires a more sophisticated approach. Multiple factors need to be taken into consideration, covering not only the technical and usability aspects of the sites but also economic and anti-crisis factors. Moreover, because one of the groups interested in assessing banking services is potential clients, who may or may not be technically and theoretically literate, a sustainable approach to banking services evaluation is needed. The main contribution of this paper is a sustainable approach that balances evaluation accuracy against the usage simplicity and computational complexity of evaluation methods. A reference model for banking services evaluation is also provided. In practical terms, all significant commercial banking services in Poland are assessed. Last, but not least, a preliminary study of the practical applicability of various evaluation methods among computer-literate banking clients is performed.


Introduction
One of the most important problems in bank management at present is how to retain existing clients and acquire new ones. This is particularly difficult in the case of individual clients, whose choices are determined not only by measurable economic or technical factors but also by non-measurable aspects, for example, trends, reluctance to change, and cultural or psychological determinants. These choices are also influenced by factors beyond the clients' control, such as the economic policy of a given country. Moreover, from the banks' perspective, such choices frequently determine whether they can continue to operate in an increasingly competitive market. It is important to note that individual clients perceive a bank mainly through their contact with it, that is, through the efficiency with which tasks are carried out or the client's financial problems are solved. This points to the importance of information technology, and of human-computer interaction (HCI) in particular, in providing clients with the highest-quality services.
At present, these services mainly take the form of the two most popular e-banking tools: i-banking and m-banking. The first refers to the use of web browsers, and the second to the use of particular banks' mobile applications. The significance of the problem is indicated by the fact that in the European countries most developed in terms of e-banking, such as Norway and Finland, the penetration rate at the end of 2018 exceeded 90%, whereas across Europe it reached 54% and in Poland it was estimated at 44% [1]. In the case of Poland, this means that at the end of the third quarter of 2019, over 18 million clients actively used i-banking and more than 10 million used m-banking services [2].
In the banking sector, the basic categories of users include individual and institutional clients, as well as representatives of the website/application owners (bank employees). Banking software analysts/designers form a separate category. Each of these groups has different skills, knowledge, education or training, and different requirements resulting from intuition or awareness of how to use computer systems. Individual e-banking clients are the most diversified category, so there is a need for "optimization", both in terms of methods of communication and the scope of functionality of banking systems. Attention should also be paid to the need to find a common language for such diverse categories of users.
Websites and applications should be tailored to the sector of the services they represent. In particular, one should take into account that banking systems involve a number of characteristics/criteria related to strictly financial assessment, which affect both their design process and their assessment by users. The literature does offer universal, unified guidelines for creating websites and internet applications [3,4], but in practice there is a need for specific, individual assessment criteria, tailored both to users' requirements or anticipated needs and to the requirements and conditions of the economic sector.
The considerations so far show that assessing and selecting a banking system requires an individual client to make decisions based on many complex and diverse factors (criteria) specific to the system. This situation therefore belongs to the class of multi-criteria problems: there are many criteria, often contradictory or poorly structured, which individual users perceive subjectively. Unlike single-criterion optimization methods, multi-criteria methods do not yield the optimal value of one indicator, but rather a "compromise", Pareto-optimal solution [5][6][7]. However, the basic research problem is how to select the multi-criteria decision analysis (MCDA) method best suited to the decision-maker and the problem under consideration. Methodologically, this issue is increasingly present in the literature on the subject [8][9][10].
Although there is a considerable number of methods for evaluating general-purpose websites, such as eQual [11], Web Portal Site Quality [12], SiteQual [13], or SERVQUAL [14], banking platforms are more sophisticated systems that require a more comprehensive evaluation approach. Not only does the list of evaluation criteria need to be expanded, but the selection of the MCDA aggregation method is also crucial, as an improper method can produce completely different final rankings [15]. At the same time, the analytical capabilities of individual clients should also be taken into consideration when selecting the evaluation method. This constitutes the research gap that this paper tries to address.
The authors' main contribution in this paper is a step towards a sustainable approach to banking service assessment, based on three pillars: the computational complexity of evaluation methods, their usage simplicity, and their accuracy. Moreover, a reference bank services assessment model comprising 18 criteria divided into three groups is provided. In practical terms, a comprehensive multi-method assessment of all significant banks in Poland is performed. Last, but not least, an additional contribution is a survey study of the practical applicability of the proposed evaluation methods, which examines users' familiarity with, and preferences regarding, the various evaluation methods.
After this introduction, Section 2 presents a literature review justifying the research problem. Section 3 explains the proposed methodological framework. The empirical research follows in Section 4. Conclusions and possible future work are indicated in Section 5.

Literature Review
Since the emergence of information technology systems, one of the key issues has been how to assess and select the best IT systems for use in an organization. In the 1990s, the problem gained recognition due to the widespread use of IT systems and the implementation of expensive integrated IT systems. In this context, the biggest problem from the managers' point of view was how to achieve a return on investment, i.e., to recover the increasingly high costs of IT solutions, and how to obtain the functionality expected from implementing an information system [16]. These reasons partially explain why researchers and business practitioners have sought the best method of achieving economic efficiency in the use of IT systems and focused on their broadly understood utility [17]. This was not a trivial task, however, and after years of searching, most researchers concluded that there is no single universal measure of effectiveness or widely applicable measure of utility/usefulness [18][19][20]. The search for universal indicators based on the amount of time allocated to developing software did not bring the expected results [21], even though there were many attempts to include these factors in the standards of IT systems' evaluation, at least with respect to the software itself rather than its application. It is currently believed that the method of solving this problem should instead be adapted to the decision-making situation [22]. The situation regarding the assessment of online tools is not very different in this respect.
The assumptions of most e-banking evaluation methods are largely based on e-commerce website assessment models [23][24][25]. These methods are traditionally derived from simple approaches applied since the 1990s, based on sets of criteria specified for a given industry and assessed on an adopted scale. Technical and functional factors predominate among the criteria groups [26]. It is difficult to speak of objectivity here, since many of these sets contain highly subjective factors, such as readability of the text, attractiveness of color schemes, photos or videos, quality of presentation, and ease of navigating the site. Frequently, the desired results can be achieved with a few tailored sample assessment criteria (e.g., the Web Assessment Index method, which focuses on four categories: speed, availability, screen navigation, and content analysis [27]). Practitioners often choose methods that are as simple as possible in terms of providing input information for establishing indicators, or in terms of ease of interpretation, which is needed to decide whether to adopt a new IT system or change an existing solution. The literature on such assessment of electronic banking systems is very extensive and shows that banking websites are analyzed from the point of view of:
This multitude of factors considered before selecting a bank creates an interesting research problem of multi-criteria evaluation of banking websites. While the evaluation of generic websites has been widely studied and has resulted in methods such as eQual [11], Web Portal Site Quality [12], SiteQual [13], SERVQUAL [14], or PEQUAL [35,36], the study of multi-criteria evaluation of banking services is still at a preliminary stage, which constitutes the research gap addressed in this paper.

MCDA Foundations of the Bank Websites' Evaluation
There are numerous approaches to multi-criteria decision making, which can be divided into five categories. The first, simple methods such as scoring methods [37,38], are often used in large-scale research: they are easy to apply, do not require explanations to users, and their results are easy and transparent to interpret. These methods apply an unambiguous scoring scale to assess various qualitative features and yield a single aggregate value across all criteria. A variation is the scoring method with preferences assigned to individual criteria or groups of criteria. The usability of these methods depends on the proper selection of criteria, in line with the user's expectations. Their subjectivity is mitigated by averaging the results of many respondents.
The second category, pairwise comparison, is a group of methods based on AHP/ANP (AHP, ANP, Modified AHP, Fuzzy AHP, Fuzzy ANP, e.g., [39]), which make it possible to assess not only the absolute value of a given feature and the total assessment index, but also the strength of the relationship of a given criterion to the other criteria. Such methods require users to be instructed beforehand, and they are troublesome to use in surveys because of the large number of reciprocally scaled answers a user must give to complete the procedure. These methods are also affected by the rank reversal phenomenon. The ambiguity of comparing different types of features, for which the AHP method was criticized, has been partially mitigated by the assumptions of the ANP method [40,41].
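To make the survey burden concrete, the following minimal Python sketch (numpy assumed) shows how classical AHP turns a reciprocal pairwise comparison matrix into criterion weights: the principal eigenvector is approximated by power iteration and checked with Saaty's consistency ratio. The function names and the example matrix are illustrative assumptions, not part of the study. It also counts the judgements a respondent must supply, n(n-1)/2, which for the 18 criteria used in this paper is already 153.

```python
import numpy as np

# Saaty's random consistency index (RI) for matrix orders 1-10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def comparisons_needed(n_criteria: int) -> int:
    """Pairwise judgements a respondent must give for n criteria: n(n-1)/2."""
    return n_criteria * (n_criteria - 1) // 2


def ahp_priorities(pairwise, tol: float = 1e-10, max_iter: int = 1000):
    """Weights from a reciprocal comparison matrix (a[i][j] = 1 / a[j][i]).

    The principal eigenvector is approximated by power iteration;
    returns (weights, consistency_ratio)."""
    a = np.asarray(pairwise, dtype=float)
    n = a.shape[0]
    w = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        nxt = a @ w
        nxt = nxt / nxt.sum()                  # keep the weights summing to 1
        converged = np.abs(nxt - w).max() < tol
        w = nxt
        if converged:
            break
    lambda_max = float(np.mean((a @ w) / w))   # estimate of the principal eigenvalue
    ci = (lambda_max - n) / (n - 1) if n > 2 else 0.0   # consistency index
    ri = RANDOM_INDEX.get(n, 1.49)             # assumption: reuse the n = 10 value beyond 10
    return w, (ci / ri if ri > 0 else 0.0)
```

For a perfectly consistent matrix such as [[1, 2, 4], [1/2, 1, 2], [1/4, 1/2, 1]], the weights come out as (4/7, 2/7, 1/7) with a consistency ratio of 0; real survey answers are rarely this consistent, which is part of why respondents find the procedure demanding.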
The third group, parametric methods, comprises methods in which respondents are asked to assign values to specific additional parameters. Evaluators avoid such methods because determining these parameters is often vague and ambiguous for them. This group includes, for example, PROMETHEE II [42], where the main problem is the need, as with the AHP method, to educate respondents before the assessment takes place [43][44][45].
The fourth category, i.e., two-stage (or multi-stage) methods, comprises methods in which, in the first stage, the criteria and evaluation scale (or preferences) are established and data are collected without scoring analysis, and, in the second stage, researchers apply other multi-criteria methods [9,46]. Researchers have distinguished nearly 300 such methods, based on various theoretical assumptions, taking into account the assumed user preferences, the distance from the assumed optimal levels, etc. While the first stage is relatively easy for the user, the second stage is sometimes difficult to adapt and apply because of problems with interpreting the findings.
The last category comprises combinations of the aforementioned methods. These are hybrids, such as AHP + TOPSIS, Fuzzy AHP + TOPSIS, etc. Combining methods aims at eliminating possible deficiencies of the well-known methods or bringing them closer to the standards related to interpreting findings [47].
In general, users' views of multi-criteria assessment methods depend on their opinions of, or attitudes to: (1) the accuracy of the selection of criteria (attributes) in relation to the issue under consideration, even when these attributes are contradictory; (2) the ease and intuitiveness of the assessment scale, or of the scale of proposed preferences (when some criteria are more important to the user than others), used to collect data; and (3) the ease and versatility of interpreting the evaluation results and the possibility of recommending specific decisions. The final selection of the evaluation method can differ depending on the nature of the decision maker (DM). DMs in large corporations are accustomed to business analytics mechanisms and are less interested in how results are obtained than in their interpretation and recommendations, whereas DMs from SMEs might prefer simpler methods that can easily be used by less-educated individuals who are not accustomed to making strategic decisions to a large extent.
In this paper, the sustainability concept is used to compare several evaluation methods and look for ones that balance the complexity of the method and amount of work required to arrive at a result, while obtaining the result that would be satisfactory enough for the user.

Criteria for Bank Websites' Evaluation
All methods presented in Section 3.1, regardless of their category, require a set of criteria. While numerous works propose criteria for evaluating generic websites, the evaluation of online banks has not yet been studied thoroughly. In this paper, a set of criteria for evaluating online banks is proposed. The criteria are divided into three groups: economic, technological, and anti-crisis.


The economic and technological criteria are listed in Table 1. Last, but not least, the anti-crisis group consists of a single criterion, A18, representing anti-crisis measures.

Sustainability in e-Banking Website Assessment
This paper explores sustainability in e-banking website assessment. To study how the accuracy, usage simplicity, and computational complexity of the methods balance each other, the authors adopted the following procedure:


• Defining a set of attributes (criteria) for assessing the functionality of electronic banking websites,
• Verifying the clarity and correctness of the set of questions for a particular, randomly selected client, using a randomly selected group of users,
• Adopting an unambiguous scale for evaluating attributes during data collection,
• Conducting a survey to obtain data and verifying them initially,
• Analyzing the results for simple scoring methods (without preferences and with imposed sample preferences),
• Adopting selected evaluation methods and performing calculations on the data provided by the research sample,
• Comparing the obtained findings and discussing their consistency,
• Drawing conclusions from the comparison of the applied methods.

Bank Websites' Evaluation Results Aggregation Scheme
In this paper, a comparative study of four groups of approaches is performed:
• a simple scoring method and a simple scoring method with preferences [48],
• the authors' own conversion method [49],
• the parametric methods Promethee II [50] and PROSA [18],
• the TOPSIS method [35].
The methodological foundations of each of the compared approaches are presented below.

Simple Scoring Method
In a simple scoring method, researchers measure either the sum of the average scores obtained in the study or the average distance from the maximum value (according to the adopted scale) for a given attribute. This measure refers to the value of each criterion on its own, and the distance is the same whether it is measured from the first to the second criterion or vice versa; the relationship between individual criteria, however, is not specified. Assigning preference weights that add up to 100% to particular criteria (or groups of criteria) can be seen as such a measure. The linear scale of preferences in its normalized form determines, in turn, the share of individual criteria in the final score. Scoring methods are considered subjective, although their subjectivity seems to decrease as the number of people surveyed increases and when the preference scale is used. These methods are widely used, and their findings are easy to interpret. Other types of methods, based on measuring the relationships between individual attributes, are generally regarded as more objective. On the other hand, the AHP method [51], Promethee II, Electre I and III, TOPSIS, and other such methods are more complicated to apply and are not very transparent in terms of interpreting the findings. They often rely on calculating relative distances to another attribute (assessment criterion) or on adopted significance ranges. Users, both those evaluating banking services and later decision-makers, are reluctant to use them when the effort needed to identify the input data and to interpret the results exceeds that required by a scoring method. The authors' research on the use of website evaluation methods such as Promethee II and AHP [50,52] suggests that respondents perceive the process of completing the questionnaire for these methods as very difficult. As a result, it often leads to ill-considered, accidental assessments that frequently depend on the order in which particular criteria appear.
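Both scoring variants can be sketched in a few lines of Python (numpy assumed). This is an illustrative sketch rather than the authors' code, and the function names are hypothetical: `simple_score` computes the mean of the 0.00-1.00 ratings per website, optionally weighted by preference shares (an unweighted mean ranks websites identically to the sum of average scores), and `mean_distance_from_max` computes the average distance from the scale maximum, where lower is better.

```python
import numpy as np


def simple_score(scores, weights=None):
    """Aggregate score of each website (rows) over criteria (columns).

    scores  : mean respondent ratings on the 0.00-1.00 scale.
    weights : optional preference shares per criterion (e.g. percentages);
              they are normalised to sum to 1. Without weights every
              criterion counts equally.
    """
    scores = np.asarray(scores, dtype=float)
    n_criteria = scores.shape[1]
    if weights is None:
        w = np.full(n_criteria, 1.0 / n_criteria)
    else:
        w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return scores @ w


def mean_distance_from_max(scores, scale_max=1.0):
    """Average distance of each website's ratings from the scale maximum (lower is better)."""
    scores = np.asarray(scores, dtype=float)
    return (scale_max - scores).mean(axis=1)
```

For example, with scores [[1.0, 0.5], [0.75, 0.75]] both websites tie at 0.75 without preferences, whereas imposing 80%/20% preferences gives 0.9 versus 0.75, which illustrates how the preference scale determines the share of each criterion in the final score.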

Authors' Own Conversion Method
To minimize the problems encountered in simple scoring methods, the authors developed their own method of evaluating websites: the conversion method. It does not require respondents to estimate any additional parameters, and its calculations use the same data as the scoring method. It is based on average distances from the maximum possible value. A detailed description of this method, together with its solution algorithm, can be found in [49]. Its main advantages are the ease of collecting data for the evaluation, the minimal amount of data required, the ease of application for non-experts in a given field, the absence of additional indicators that might be difficult to understand (such as the veto threshold in the ELECTRE method, which may not be transparent to the respondent [53]), and results, in the form of a range of evaluations of the examined objects, that are easy to interpret. The method is still being verified; however, it constitutes an additional reference point for the results obtained with other methods.
The method consists of determining the relation of each criterion to the other criteria, based on averaged distances from the maximum potential value established in a previous scoring evaluation. Data received from the scoring evaluation are the starting point for the conversion method. The following assumptions are then adopted: after constructing the experts' table of evaluations of particular criteria for each website, the conversion is performed with the established preference vector of the superior-level criteria. Next, the authors transform the combined scoring table into the preference vector (first converter).
The next steps are:
• Constructing a matrix of distances from the maximum value for each criterion in every website, starting with the maximum value (Equation (1)):
$$P_{i,\max} = \max\{f_i(a_1), \dots, f_i(a_m)\}, \quad i = 1, \dots, n$$
• Establishing the matrix of distances from the maximum value (Equation (2)),
• Calculating the average distance from the maximum value for each criterion (Equation (3)),
• As a result of the above operation, constructing a matrix of the differences between the distance from the maximum value and the average distance, by criterion,
• For each bank website: constructing conversion matrices, i.e., modules of the relative distances of particular criteria to the remaining criteria (the distance of a criterion from itself is 0); the values below the diagonal are the converse of the values above the diagonal,
• Averaging the criteria conversion matrices, i.e., creating one matrix of average modules of values for all criteria (Equation (4)):
$$A_{i,j} = \frac{1}{m}\sum_{k=1}^{m} \left|\alpha_{i,k} - \alpha_{j,k}\right|, \quad i, j = 1, \dots, n,$$
where α_{i,k} is the difference between the distance of website k from the maximum on criterion i and the average distance for criterion i,
• Transforming the criteria conversion matrix into a superior preference matrix: squaring the matrix, adding up the rows, and standardizing the obtained preference vector, then repeating the squaring, row summation, and standardization until the differences between successive preference vectors are minimal.
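Two parts of the first converter are stated unambiguously and can be sketched in Python (numpy assumed): the distance-from-maximum matrix with its per-criterion averages (Equations (1)-(3)), and the iterative "square, add up rows, standardize" procedure of the last step. This sketch does not implement the conversion matrices of Equation (4), whose exact construction is given in [49]; it assumes the matrix passed to `squared_preference_vector` is already non-negative (a matrix of modules). Function names and the tolerance are hypothetical choices.

```python
import numpy as np


def distances_from_max(scores):
    """scores[i, j]: mean rating of criterion i for website j.

    Returns the matrix P_i,max - f_i(a_j) and its per-criterion averages."""
    scores = np.asarray(scores, dtype=float)
    p_max = scores.max(axis=1, keepdims=True)   # Eq. (1): best observed value per criterion
    dist = p_max - scores                        # distance of each website from that value
    return dist, dist.mean(axis=1)               # average distance for each criterion


def squared_preference_vector(matrix, tol=1e-9, max_iter=60):
    """Square a non-negative matrix, add up rows, standardise; repeat until stable."""
    m = np.asarray(matrix, dtype=float)
    n = m.shape[0]
    vec = np.full(n, 1.0 / n)
    prev = None
    for _ in range(max_iter):
        m = m @ m
        total = m.sum()
        if total <= 0:                           # degenerate all-zero matrix: no information
            return np.full(n, 1.0 / n)
        m = m / total                            # rescale to avoid overflow; row-sum ratios unchanged
        vec = m.sum(axis=1)                      # entries sum to 1, so row sums are standardised
        if prev is not None and np.abs(vec - prev).max() < tol:
            break
        prev = vec
    return vec
```

Because rescaling does not change the ratios of the row sums, the loop is a repeated-squaring form of the power method, so for a non-negative matrix the vector generally converges towards its principal eigenvector, much as in AHP.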
As a result of the above operations, the criteria conversion matrix is established. Subsequently, the authors transformed the experts' scores at the level of a matrix specifying the experts' website evaluations for particular criteria (second converter).
The results were obtained analogously:
• Constructing a matrix of distances from the maximum value for each criterion and each website, starting with the maximum value (Equation (5)):
$$P_{i,\max} = \max\{f_i(a_1), \dots, f_i(a_m)\}, \quad i = 1, \dots, n$$
• Establishing the matrix of distances from the maximum value (Equation (6)),
• Calculating the average distance from the maximum value for each website (Equation (7)),
• Constructing a matrix of the differences between the deviations from the maximum value and the average distance of the features from the maximum,
• For each criterion: constructing a matrix of transformations (conversions) of these differences between the websites; as above, the values below the diagonal are the converse of the values above the diagonal,
• Constructing, for each criterion i, a module matrix of these transformations between the websites (Equation (8)):
$$A^{(i)}_{j,k} = \left|\alpha_{i,j} - \alpha_{i,k}\right|, \quad j, k = 1, \dots, m,$$
where α_{i,j} is the difference between the distance of website j from the maximum on criterion i and the average distance of website j,
• For each of these module matrices: squaring it, adding up the rows, standardizing the obtained ranking vector, and repeating this operation until the differences between two successive ranking vectors for each criterion are minimal; the result is a conversion matrix of the websites' evaluations (Equation (9)),
• Using the obtained vectors to construct a combined ranking matrix, with criteria as rows and bank websites as columns, by transferring the ranking vector obtained for each criterion, and multiplying this matrix by the previously calculated preference vector (Equation (10)):
$$T' = T_f \cdot T_a$$
• Analyzing the final results and drawing conclusions (note: in this case, the lowest distances are the most favorable; comparability with other methods can be obtained by subtracting these values from 1 and standardizing them again).
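The final combination step is stated explicitly and can be sketched as follows (Python, numpy assumed). The inputs are assumed to come from the two converters: `criteria_vector` plays the role of T_f and `website_vectors` the role of T_a, with criteria as rows and bank websites as columns. The function name and the example values are hypothetical.

```python
import numpy as np


def combine_rankings(criteria_vector, website_vectors):
    """T' = T_f * T_a, then make the result comparable with 'higher is better' methods.

    criteria_vector : preference vector of the n criteria (first converter), length n.
    website_vectors : n x m matrix; row i is the vector obtained for criterion i
                      (rows = criteria, columns = bank websites).
    Returns (raw, comparable): raw distances (lower = better) and 1 - raw, re-standardised.
    """
    t_f = np.asarray(criteria_vector, dtype=float)
    t_a = np.asarray(website_vectors, dtype=float)
    raw = t_f @ t_a                       # weighted distance of each website
    flipped = 1.0 - raw                   # subtract from 1 so that larger = better
    return raw, flipped / flipped.sum()   # repeated standardisation
```

For T_f = (0.6, 0.4) and T_a = [[0.2, 0.3, 0.5], [0.5, 0.3, 0.2]], the raw distances are (0.32, 0.30, 0.38), so the second website is the most favorable; after subtracting from 1 and re-standardizing, the comparable scores are (0.34, 0.35, 0.31).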

Promethee II
The Promethee II method has a limited effect of linear compensation of criteria and, unlike other methods of the so-called "European school" (e.g., the Electre methods), yields a full final ranking of alternatives together with their quantification [54]. After determining the compliance factors for each pair of variants, dominance flows are determined for each variant:


• Output dominance flow, describing how much variant a_i outranks the other variants (Equation (11)):
$$\Phi^{+}(a_i) = \frac{1}{m-1}\sum_{b \neq a_i} \pi(a_i, b)$$
• Input dominance flow, indicating how much variant a_i is dominated by the other variants (Equation (12)):
$$\Phi^{-}(a_i) = \frac{1}{m-1}\sum_{b \neq a_i} \pi(b, a_i)$$
where π(a, b) is the aggregated preference (compliance) index of variant a over variant b and m is the number of variants. The decision-maker may then create a total ranking of variants: in the Promethee II method, a complete order of variants is obtained from the net dominance flow (Equation (13)):
$$\Phi_{\text{net}}(a_i) = \Phi^{+}(a_i) - \Phi^{-}(a_i)$$
In the Promethee II method, the relations are defined as follows:
• Preference relation (strict preference): variant a_i is preferred to variant b_j (Equation (14)):
$$a_i \,P\, b_j \iff \Phi_{\text{net}}(a_i) > \Phi_{\text{net}}(b_j)$$
• Indifference relation (equivalence): variant a_i is equivalent to variant b_j (Equation (15)):
$$a_i \,I\, b_j \iff \Phi_{\text{net}}(a_i) = \Phi_{\text{net}}(b_j)$$

The PROSA method originated from the aforementioned Promethee II method. Its basic assumptions and the results of the procedure are set out as follows.
After determining the values Φnet(a) and Φj(a) for j = 1, ..., n, where Φj(a) is the net flow of alternative a computed for criterion j alone, the decision-maker can determine the balance (compensation) of criteria for particular decision-making alternatives:
• Φj(a) << Φnet(a) means that, for alternative a, the weak performance on criterion j is compensated by the other criteria (alternative a is not balanced in terms of criterion j),
• Φj(a) >> Φnet(a) means that, for alternative a, the strong performance on criterion j compensates for the other criteria (alternative a is not balanced in terms of criterion j),
• Φj(a) ~ Φnet(a) means that alternative a is balanced in terms of criterion j.
The operators >> and << denote the relations "much greater than" and "much less than". These relations express the decision-maker's subjective view of whether the value on the left of the operator is much greater/much less than the value on the right, and thus whether alternative a is balanced in terms of criterion j.
In the next step, the weighted mean absolute deviation WMAD(a) is determined, taking into account the balance (compensation) factor, according to Equation (16), where s_j is the compensation factor for criterion j. WMAD(a) is a specific type of weighted average distance of the solution Φnet(a) from the solutions Φj(a) obtained for individual criteria.
The final assessment of the alternatives, PSVnet (PROSA Sustainable Value Netto), is calculated with Equation (17). This evaluation allows a complete ranking of objects to be compiled.
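The sketch below illustrates Promethee II flows and the PROSA correction in Python (numpy assumed). Several elements are assumptions, since Equations (16) and (17) are not reproduced in this text: a linear preference function with indifference threshold q and preference threshold p (the kind of thresholds collected in the survey, with q < p), WMAD(a) = Σ_j w_j s_j |Φnet(a) - Φj(a)|, and PSVnet(a) = Φnet(a) - WMAD(a). All criteria are treated as benefit criteria, and the function names are hypothetical.

```python
import numpy as np


def linear_preference(d, q, p):
    """Linear preference with indifference threshold q and preference threshold p (q < p)."""
    return np.clip((d - q) / (p - q), 0.0, 1.0)


def promethee_ii_and_prosa(scores, weights, q, p, s):
    """Return (Phi_net, PSV_net) for m alternatives (rows) and n benefit criteria (columns)."""
    x = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    m = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]          # diff[a, b, j] = g_j(a) - g_j(b)
    pref = linear_preference(diff, np.asarray(q, float), np.asarray(p, float))  # P_j(a, b)
    pi = pref @ w                                 # aggregated preference index pi(a, b)
    phi_plus = pi.sum(axis=1) / (m - 1)           # Eq. (11): output (leaving) flow
    phi_minus = pi.sum(axis=0) / (m - 1)          # Eq. (12): input (entering) flow
    phi_net = phi_plus - phi_minus                # Eq. (13): net flow
    # Single-criterion net flows Phi_j(a); their weighted sum equals phi_net.
    phi_j = (pref.sum(axis=1) - pref.sum(axis=0)) / (m - 1)
    # Assumed form of Eq. (16): weighted mean absolute deviation with compensation factors s_j.
    wmad = np.abs(phi_net[:, None] - phi_j) @ (w * np.asarray(s, float))
    psv_net = phi_net - wmad                      # assumed form of Eq. (17)
    return phi_net, psv_net
```

In a toy example with scores [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]], equal weights, q = 0, p = 0.5 and s_j = 1, the first and third alternatives are strong on one criterion and weak on the other, while the second is average on both. Promethee II gives all three a net flow of 0, whereas the PROSA correction penalizes the unbalanced alternatives, giving PSVnet = (-1, 0, -1).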

TOPSIS
The TOPSIS method consists of six stages. First, the decision maker (DM) describes the decision problem (DP) with n criteria and m alternatives, and a decision matrix D = [x_ij] is constructed, with rows representing the alternatives and columns representing the criteria (Equation (18)). In the second step, the decision matrix is normalized with Equations (19) and (20) for benefit and cost criteria, respectively.
In the third step, weights are applied to the normalized decision matrix, resulting in a weighted normalized decision matrix whose elements are computed with Equation (21). In the fourth step, the positive (PIS) and negative (NIS) ideal solutions, V_j^+ and V_j^-, are obtained (Equations (22) and (23)).
The best alternative should be as close as possible to the PIS and as far as possible from the NIS. Therefore, in the fifth step, the Euclidean distances between each alternative and the PIS and NIS are computed (Equations (24) and (25)). Finally, in the last step of the algorithm, the relative closeness to the ideal solution is obtained (Equation (26)). This closeness coefficient is the score produced by the TOPSIS method and is used to construct the ranking of alternatives.
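A compact TOPSIS sketch in Python (numpy assumed) follows the six steps above. It assumes min-max normalization for Equations (19)-(20) (benefit: (x - min)/(max - min); cost: (max - x)/(max - min)), which is one common choice; with vector normalization the results can differ. The function name and example data are hypothetical.

```python
import numpy as np


def topsis(matrix, weights, benefit):
    """Closeness coefficients for alternatives (rows) evaluated on criteria (columns)."""
    x = np.asarray(matrix, dtype=float)               # Eq. (18): decision matrix
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)            # avoid division by zero for constant columns
    is_benefit = np.asarray(benefit, dtype=bool)
    r = np.where(is_benefit, (x - lo) / span, (hi - x) / span)   # assumed Eqs. (19)-(20)
    w = np.asarray(weights, dtype=float)
    v = r * (w / w.sum())                             # Eq. (21): weighted normalised matrix
    v_pos, v_neg = v.max(axis=0), v.min(axis=0)       # Eqs. (22)-(23): PIS and NIS
    d_pos = np.sqrt(((v - v_pos) ** 2).sum(axis=1))   # Eq. (24): distance to PIS
    d_neg = np.sqrt(((v - v_neg) ** 2).sum(axis=1))   # Eq. (25): distance to NIS
    denom = d_pos + d_neg
    # Eq. (26): relative closeness; 0.5 if an alternative coincides with both ideals.
    return np.divide(d_neg, denom, out=np.full_like(denom, 0.5), where=denom > 0)
```

For three alternatives rated [[0.9, 0.2], [0.5, 0.5], [0.1, 0.8]] on one benefit criterion and one cost criterion with equal weights, the closeness coefficients are (1.0, 0.5, 0.0): the first alternative coincides with the PIS and the third with the NIS.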

Research Sample and Its Initial Preparation
The set of attributes used in the research is partly based on the set adopted in 2006, which was consulted at that time with Polish experts in the field of electronic banking. In 2008, this set was expanded to include factors that may indicate anti-crisis activities, and in 2017, after the set had been extended to approximately 70 attributes, the correctness and comprehensibility of all criteria and their significance for respondents were verified on a group of over 240 people. This allowed the set of attributes to be reduced to the current form, used since 2018. In 2019, the preferences were re-evaluated for the previous set of attributes, and the authors selected the 18 attributes with a significance above 60% on a scale of 1-100% (Table 1). The data for the study were collected in the first two weeks of April 2019. Over 940 people were asked to provide the data needed to evaluate electronic banking services. The authors adopted a simplified, standardized Likert scale [55] for evaluating the various criteria distinguished by bank clients.
The scale was as follows:
• 1.00: complete fulfilment of the evaluation criterion (attribute),
• 0.75: almost complete fulfilment of the criterion,
• 0.50: partial fulfilment of the criterion,
• 0.25: minimal fulfilment of the criterion,
• 0.00: failure to fulfil the criterion.
In the study, the authors imposed an additional condition on the respondents providing data: they should have experience of using at least three online banking websites and should evaluate the three banks best known to them. This condition resulted from the requirement to obtain answers from experienced respondents familiar with various electronic banking services. More than half of the respondents (51%) expressed willingness to assess only one bank, and 19% two banks. Two hundred and seventy-six individuals (less than 30% of the entire sample) assessed the websites of three banks, which yields a total of 828 (276 × 3) ratings of electronic banking websites obtained in the study.
After further verification, and taking into account the respondents' latest comments, the authors considered 18 attributes (criteria) divided into three groups: economic, technological, and anti-crisis criteria. A detailed list of attributes is provided in Table 1.
Apart from assessing significance, the respondents defined their preferences regarding the share of individual attributes in the quality evaluation of the banking service. For individual attributes, the preferences did not deviate significantly from the average of 5.56% (i.e., 100%/18). However, the scores differed fundamentally when the attributes were grouped. The highest average significance for respondents was recorded for technological attributes, with slightly lower scores for economic factors. Notably, the highest preferences indicated in the study, which should be taken into account when using a scoring method, concerned economic attributes: they amounted to nearly 56%, whereas technological aspects accounted for only 39% and anti-crisis measures for just over 5% (Table 1). In the eyes of the respondents, economic attributes, which reflect the bank's current policy, are often decisive in assessing the quality of the service, and service quality in turn translates into the bank's ability to retain existing customers or acquire new ones.
In addition, the respondents specified two further parameters needed for the PROMETHEE II method: the indifference and preference indices (thresholds). In this case, the indifference threshold was the value (for each attribute) up to which the difference between two evaluations on that attribute is not important, which generally applies to the immediate neighborhood of a given evaluation (scale 1-100%). The beginning of the scale denotes a small difference and the end denotes a large difference. The preference threshold was defined as the value from which this difference becomes significant, and it was expressed on the same scale as the indifference threshold (Table 1).
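To illustrate how the indifference and preference thresholds enter the PROMETHEE II calculation, the following is a minimal sketch. It assumes the linear preference function with an indifference area (often called the "type V" criterion), which is one of the standard PROMETHEE preference functions; the paper does not state which function was used. It also assumes that all criteria are to be maximized and that weights sum to 1. The function names and the three-bank example data are hypothetical.

```python
from typing import List, Sequence


def preference(d: float, q: float, p: float) -> float:
    """Linear preference function with an indifference area (assumed type V).

    d is the difference g_j(a) - g_j(b) on one criterion, q the indifference
    threshold and p the preference threshold (q <= p).
    """
    if d <= q:
        return 0.0  # difference too small to matter
    if d >= p:
        return 1.0  # strict preference
    return (d - q) / (p - q)  # linear transition between q and p


def net_flows(
    scores: Sequence[Sequence[float]],
    weights: Sequence[float],
    q: Sequence[float],
    p: Sequence[float],
) -> List[float]:
    """PROMETHEE II net outranking flows, one per alternative.

    scores[i][j] is the score of alternative i on criterion j (higher is
    better); weights are assumed to sum to 1.
    """
    n = len(scores)
    # pi[a][b]: weighted aggregated preference of alternative a over b
    pi = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                pi[a][b] = sum(
                    w * preference(scores[a][j] - scores[b][j], q[j], p[j])
                    for j, w in enumerate(weights)
                )
    flows = []
    for a in range(n):
        others = [b for b in range(n) if b != a]
        plus = sum(pi[a][b] for b in others) / (n - 1)   # leaving flow
        minus = sum(pi[b][a] for b in others) / (n - 1)  # entering flow
        flows.append(plus - minus)
    return flows


# Hypothetical example: three banks, two criteria, equal weights.
flows = net_flows(
    scores=[[80, 60], [70, 90], [50, 50]],
    weights=[0.5, 0.5],
    q=[5, 5],
    p=[20, 20],
)
# Indices of the banks ordered from the highest to the lowest net flow.
ranking = sorted(range(len(flows)), key=lambda i: flows[i], reverse=True)
```

In this toy example the second bank ranks first; the net flows always sum to zero, which is a quick sanity check for any implementation.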
Also, in a pilot study, a group of 32 individuals (12% of the selected group) gave their subjective assessments of the website evaluation methods they were familiar with, taking into consideration the following features: the advanced knowledge or expertise needed to apply a particular method, the convenience and ease of collecting basic data, the need to estimate additional parameters, the ease of performing the calculations, the need for specialized calculation software, methodological correctness, the ease of carrying out extensive analyses for decision making, and the reliability of the results.
The respondents did not report any problems with the scoring method; however, they had some difficulty assessing the indifference and preference indicators for the Promethee II method. Despite this, the respondents still preferred the Promethee II method to the AHP method, which requires filling in more tables of pairwise relationships between attributes, relationships that the respondents perceived as uncertain or doubtful.
The study used purposive sampling. The respondents were students in the final years of specialization studies at the University of Warsaw. The survey was distributed after the completion of a series of lectures on e-business website ratings. The fact that the sample consisted of individuals aged 18-25, selected from randomly chosen groups, could affect the results of the survey (in Poland, 96% of the population are potential online banking customers, 48% are active users of online banking, and 25% are active users of mobile banking; the surveyed age group accounts for over 60% of users [56]). The sample was made up of 75% women and 25% men. Over 95% of the respondents declared secondary education, over 2% indicated a Bachelor's degree or engineering studies, and over 2% reported higher education. The majority (65%) described themselves as working students and 35% as students. The largest group (24%) stated that they came from cities with over 500,000 residents, the second largest group (21%) from towns with fewer than 50,000 residents, and 17% reported that they were born in the countryside.

Comparison of the Results of the Scoring Method without Preferences and the Scoring Method with Preferences
To carry out the analysis based on the scoring method, the authors used the output tables exported from the survey system. In the survey, each client evaluated the banks' offers for selected e-banking services as well as the fees related to the use of bank accounts that can be managed via the Internet. Subsequently, on the basis of the completed questionnaires, the researchers created a summary table of the average ratings of the criteria given by users, which served as the basis for analyzing and discussing the results (Table 2). The rows of the table contain the attributes of the evaluated websites and the columns contain the names of the banks, listed in alphabetical order. In the table, "% maximum score" denotes the share of a particular bank's website in the maximum favorable score, expressed as a percentage, or the share of a given attribute in the maximum score of each criterion, and "total" denotes the sum of points on a standardized five-point Likert scale (0-1).
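The "total" and "% maximum score" columns described above can be sketched as follows. This is a minimal illustration under two assumptions not stated in the text: the five Likert points 1-5 are mapped linearly onto 0-1 (1 → 0, 5 → 1), and all attributes have equal weight, as in the scoring method without preferences. The function names and ratings are hypothetical.

```python
from typing import Sequence, Tuple


def normalize_likert(rating: float, low: int = 1, high: int = 5) -> float:
    """Map a rating on a low..high Likert scale onto [0, 1] (assumed linear)."""
    return (rating - low) / (high - low)


def share_of_maximum(avg_ratings: Sequence[float]) -> Tuple[float, float]:
    """Return (total, percent of maximum score) for one bank's website.

    avg_ratings holds the average 1-5 ratings of the bank's attributes;
    every attribute is treated as equally important.
    """
    normalized = [normalize_likert(r) for r in avg_ratings]
    total = sum(normalized)                          # "total" on the 0-1 scale
    percent = 100.0 * total / len(normalized)        # share of the maximum score
    return total, percent


# Hypothetical bank rated on four attributes.
total, percent = share_of_maximum([5, 4, 3, 5])
```

Here the bank collects 3.25 of a possible 4 points, i.e., 81.25% of the maximum score.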
The analysis of the data processed with the scoring method without preferences (equal attribute weights) indicates that first place in the ranking was taken by the Inteligo PKO BP SA website, which meets over 80% of customer requirements. This coincides with the results calculated with the respondents' preferences. Toyota Bank's website follows closely: it takes second place in the points ranking and, owing to good economic indicators, third place in the ranking according to customer preferences. mBank is also rated highly. In the assessment weighted by customer preferences, Getin Noble Bank also reached a high position, owing to the favorable terms of the loans and deposits it offers. In this case, the worst results were recorded for Bank Pocztowy and Bank BPS Grupa BPS (Bank Polskiej Spółdzielczości SA). In terms of the significance of individual attributes, the most important aspects were the fees for transfers within the user's own bank and to other banks and for account maintenance, whereas interest rates on savings accounts, deposits, and loans occupied the last positions. It follows that bank customers perceive Internet-access accounts as a tool for day-to-day financial management rather than for a strategy of managing their financial resources. Nevertheless, the distribution of significance across individual attributes indicates the high importance of financial criteria in the evaluation of e-banking websites, whereas in the case of e-commerce websites, technological factors are more prevalent [57].
Based on the table presented above, four variants were considered in the study. One way of limiting the subjectivity of the group of experts or users in a scoring method is to apply unit preferences to individual criteria or groups of criteria. The study categorized all attributes (criteria) into three groups: economic, technical, and anti-crisis criteria. The fourth variant was based on the customer preferences indicated in the stages preceding the analyses. For each of the three groups, the authors adopted one variant in which that group of criteria is dominant: economic (70% for economic criteria and 15% for each of the remaining groups), technological (70% for technological criteria and 15% for each of the remaining groups), and anti-crisis (70% for anti-crisis criteria and 15% for each of the remaining groups).
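The weighting variants described above can be sketched as a weighted scoring method. This is a minimal illustration, not the authors' implementation: it assumes that each group's weight is spread equally over the attributes in that group (so a group contributes its weight times the mean of its normalized scores), and the bank names and scores are hypothetical. The respondents' variant uses the averaged group preferences reported in the study (56%, 39%, and 5%).

```python
from typing import Dict, List

GROUPS = ("economic", "technological", "anti-crisis")

# Averaged group preferences reported by the respondents.
RESPONDENT_WEIGHTS = {"economic": 0.56, "technological": 0.39, "anti-crisis": 0.05}


def variant_weights(dominant: str) -> Dict[str, float]:
    """70% for the dominant group of criteria, 15% for each remaining group."""
    return {g: (0.70 if g == dominant else 0.15) for g in GROUPS}


def weighted_score(group_scores: Dict[str, List[float]],
                   weights: Dict[str, float]) -> float:
    """Weighted score of one bank from normalized (0-1) attribute scores.

    Assumption: a group's weight is split equally among its attributes.
    """
    return sum(weights[g] * sum(v) / len(v) for g, v in group_scores.items())


def rank(banks: Dict[str, Dict[str, List[float]]],
         weights: Dict[str, float]) -> List[str]:
    """Bank names ordered from the best to the worst weighted score."""
    return sorted(banks, key=lambda b: weighted_score(banks[b], weights),
                  reverse=True)


# Hypothetical banks: A is strong economically, B technologically.
banks = {
    "Bank A": {"economic": [0.9, 0.8], "technological": [0.5], "anti-crisis": [0.4]},
    "Bank B": {"economic": [0.5], "technological": [0.9, 0.95], "anti-crisis": [0.6]},
}
```

Ranking the same data under each variant (for example `rank(banks, variant_weights("technological"))`) shows how strongly the order depends on the dominant group, which is the kind of sensitivity discussed in the next paragraph.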
The fourth variant used the average preferences reported by the respondents (56% for economic criteria, 39% for technological criteria, and 5% for anti-crisis criteria; see Table 1, column 4). Based on the calculations, the authors established an unambiguous ranking of the banks in order of the numerical values assigned according to the specific criteria. A comparison of these variants is provided in Table 3. Giving an absolute advantage (70%) to the group of economic attributes resulted in significant shifts relative both to the variant based on customer preferences and to the ranking without preferences. As a result, BGŻ BNP Paribas (Bank BGŻ BNP Paribas SA) takes first place in both the economic and the anti-crisis variants. In the ranking, BGŻ BNP Paribas is followed by Bank Pocztowy, which attracts customers with relatively good economic conditions, although clients perceive it as less attractive than Inteligo or Toyota Bank. The best anti-crisis measures are provided by BGŻ BNP Paribas (Bank BGŻ BNP Paribas SA), BGŻ Optima (Bank BGŻ BNP Paribas SA), and Idea Bank (Idea Bank SA), while IPKO, PKO Bank Polski (Bank Polski SA) and Bank Millennium (Bank Millennium SA) received the lowest scores in this area. A significant advantage given to technical attributes changed the top positions: the most modern e-banking websites, those of Idea Bank (Idea Bank SA) and Nest Bank (Nest Bank SA), moved these banks to the top of the ranking, while ING, ING Bank Śląski (ING Bank Śląski SA) obtained the lowest scores in this respect. The large discrepancy between the ranking based on the clients' (respondents') preferences and the economic variant is surprising, since in both cases the economic criteria have a significant advantage over the other types of attributes. Compared with studies carried out a year earlier [58], the experiment showed a marked sensitivity to the assignment of extreme preferences to the distinguished groups of attributes. In the authors' opinion, this indicates that in the spring of 2019 the relations between banks and their clients were uncertain and rapidly changing. At that time, a disturbance could be observed in the temporarily established financial balance between banks and, consequently, changes in the clients' perception of e-banking websites.

Comparison of the Results of the Scoring Methods with the Conversion, Promethee II, PROSA and TOPSIS Methods
Among the scoring methods with preferences, the scoring method with the respondents' preferences showed the largest discrepancies when compared with the other variants. Similar discrepancies were observed in the ranking established with the conversion method. As in previous studies, the conversion method "flattened" the results, reducing the maximum spread of significance obtained with the other methods. Nevertheless, the discrepancies between the ranking obtained with this method and the rankings established with the other methods were the largest. The top places in this ranking were occupied by the banks that had completed the modernization of their websites in April 2019: mBank (mBank SA), Raiffeisen POLBANK (Bank BGŻ BNP Paribas SA), and ING, ING Bank Śląski (ING Bank Śląski SA). Unfortunately, this shifted the ranking: banks that held much higher positions in the rankings obtained with other methods moved down. The most stable positions were recorded by mBank (mBank SA), for which the difference between the assessments amounted to four places, and INTELIGO, PKO Bank Polski (Bank Polski SA), for which the difference was limited to five places.
Given the advantages of the Promethee II method, such as its relative objectivity (compared with the scoring method) and the fact that a large number of criteria or services does not considerably increase the number of questions in the form, it seemed that the results obtained with this method would be more objective than those of the scoring methods. However, they do not differ greatly from the rankings obtained with the other methods. This may be due to certain disadvantages of the method, such as the additional measures that respondents had to enter (weights and significance, and the indifference and strict preference parameters), which may be difficult for respondents to understand, and the need for dedicated software to obtain the results. It is thus possible that the indifference and strict preference indicators were determined without a deep understanding of their meaning, despite the instructions provided earlier to the participants of the study; the averaged values presented in Table 1 are very similar. The second important issue is the different preference weights used in the calculations: the scoring, conversion, Promethee II, and PROSA methods all used the preference weights assigned to the attributes by the respondents (clients). In addition, calculations were made for the scoring method and the Promethee II method without a preference scale (all attributes equivalent). This had further implications for the rankings of bank websites (Table 4). The banks in Table 4 are ordered by the sum of the places obtained in the individual rankings: the smaller the sum, the smaller the discrepancy between the results obtained with the different methods. The most stable position in this ranking belongs to INTELIGO, PKO Bank Polski (Bank Polski SA): in six out of eight rankings this bank held first place, namely in the scoring method (without preferences), Promethee II (without preferences), PROSA, and TOPSIS (without and with preferences). The situation of mBank (mBank SA) is largely similar: the bank takes second place in the conversion method, the Promethee II method without preferences, PROSA, and TOPSIS with preferences, and third place in the scoring method without preferences and TOPSIS without preferences. The next two positions, ING, ING Bank Śląski (ING Bank Śląski SA) and Toyota Bank Polska SA, are very close to each other; however, their sums of places are at least twice as high as those of the first two banks. The last places are occupied by Bank BPS Grupa BPS (Bank Polskiej Spółdzielczości SA) and Bank Pocztowy (Bank Pocztowy SA), whose sums of places are over 8 times higher than those of the top banks.
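The ordering of banks by the sum of their places across the individual rankings, as used for Table 4, can be sketched as follows. This is a minimal illustration; the method names, bank labels, and places are hypothetical, and ties are left in their original order because Python's `sorted` is stable.

```python
from typing import Dict, List, Tuple


def order_by_sum_of_places(
    rankings: Dict[str, Dict[str, int]],
) -> List[Tuple[str, int]]:
    """Order banks by the sum of their places over several rankings.

    rankings maps a method name to {bank: place}, where place 1 is best.
    Returns (bank, sum of places) pairs, smallest (most stable) sum first.
    """
    banks = next(iter(rankings.values())).keys()
    totals = {b: sum(r[b] for r in rankings.values()) for b in banks}
    return sorted(totals.items(), key=lambda item: item[1])


# Hypothetical places of three banks in three methods.
rankings = {
    "scoring": {"X": 1, "Y": 2, "Z": 3},
    "promethee": {"X": 1, "Y": 3, "Z": 2},
    "topsis": {"X": 2, "Y": 1, "Z": 3},
}
summary = order_by_sum_of_places(rankings)
```

A bank that is ranked highly by every method accumulates the smallest sum, which is why the sum of places serves as a simple stability indicator.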
The rankings were additionally used to calculate the differences between all pairs of methods (Table 5) as the so-called city-block distance, that is, the sum of the absolute differences between the results obtained (in this case, places in the ranking). In addition, two hypotheses were formulated: the H0 hypothesis assumed that there is no difference between a given pair of analyzed methods, and the H1 hypothesis assumed that such a difference exists, at a confidence level of 0.95. To verify the hypotheses, the authors used the inverse right-tailed Fisher-Snedecor distribution at the corresponding significance level α. The Fisher-Snedecor test compares the variability of two sets of results for the same population, and the p value determined from the test statistic is compared with α. If p ≤ α, H0 is rejected in favor of H1; if p > α, there are no grounds to reject H0.
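The city-block distance between the rankings produced by two methods, computed for every pair of methods, can be sketched as follows. This is a minimal illustration with hypothetical method names and places; it covers only the distance calculation, not the Fisher-Snedecor test (a p value for an F statistic could be obtained, for example, from the survival function of the F distribution in a statistics library).

```python
from itertools import combinations
from typing import Dict, Tuple


def city_block(r1: Dict[str, int], r2: Dict[str, int]) -> int:
    """Sum of absolute differences between the places two rankings assign."""
    return sum(abs(r1[bank] - r2[bank]) for bank in r1)


def pairwise_distances(
    rankings: Dict[str, Dict[str, int]],
) -> Dict[Tuple[str, str], int]:
    """City-block distance for every unordered pair of methods."""
    return {
        (m1, m2): city_block(rankings[m1], rankings[m2])
        for m1, m2 in combinations(rankings, 2)
    }


# Hypothetical places of three banks in three methods.
rankings = {
    "scoring": {"X": 1, "Y": 2, "Z": 3},
    "promethee": {"X": 1, "Y": 3, "Z": 2},
    "topsis": {"X": 2, "Y": 1, "Z": 3},
}
distances = pairwise_distances(rankings)
```

The smallest distance identifies the pair of methods whose rankings agree most closely; identical rankings give a distance of 0.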
In this case, the lowest city-block distance indicates the best match between methods in terms of the results achieved. Almost identical rankings were obtained with the Promethee II method without preferences and the PROSA method: the absolute differences between the places in the two rankings sum to 6. The same result was obtained for the pair TOPSIS without preferences and TOPSIS with preferences. Small differences also occurred between the scoring method without preferences and TOPSIS (with and without preferences) or PROSA, as well as between the scoring method without preferences and Promethee II without preferences. The largest differences appeared in three cases: the scoring method with average weights versus PROSA, the scoring method without preferences versus Promethee II without preferences, and the scoring method with preferences versus TOPSIS without preferences.
However, the calculated Fisher-Snedecor test indicators did not confirm the above conclusions. The H0 hypothesis for individual criteria was confirmed in only 36% of the cases, including one case, the comparison of the Promethee II method without preferences and the PROSA method, despite the city-block result. On the other hand, the test did confirm that the results of the conversion method and of the scoring method with preferences differ from those obtained with the other methods. The results of these calculations are also presented in Table 5.

Comparison of the Possibilities of Applying the Methods in Clients' Evaluation
A pilot study was carried out to assess the characteristics of the individual evaluation methods from the client's point of view. The results of this study are preliminary, but they confirm the observations made to date. The respondents, who were clients of different banks, most frequently gave positive scores to the simplest methods, which they were familiar with and whose results they could check themselves. Sophisticated scientific methods based on complex methodologies, requiring expert knowledge and specialized (often paid) software for the calculations, were not particularly appreciated in mass testing, even by experienced users. Such methods tend to be perceived as tools for field experts or as not particularly useful. The detailed results for the pilot sample are shown in Table 6.

Conclusions
Banking services have evolved from offline financial services to highly available i-banking and m-banking platforms. This online availability increases the penetration of these services in the population, which in turn justifies the interest of researchers and clients in assessing and competitively comparing them. Whilst there are numerous methods for evaluating general-purpose websites, the evaluation of banking services requires a more sophisticated approach with an extended set of criteria, as the economic, technological, and anti-crisis aspects of each platform need to be considered. However, the accuracy of an evaluation method is not the only aspect that needs to be taken into account: the method's ease of use and computational complexity are also very important factors.
The main contribution of this paper is the proposed sustainable methodological approach to the evaluation of banking services, balancing accuracy, computational complexity, and ease of use. Moreover, a reference model for banking services assessment is provided. In practical terms, all significant banks in Poland were appraised. The results of the analyses suggest that online banking respondents (clients) mainly chose websites that, on the one hand, offered the lowest possible costs of having an account and, on the other, gave the greatest potential technological benefits, such as good functionality, navigation, and modern communication media.
Last, but not least, an additional contribution is provided: an attempt was made to evaluate the practical applicability of the proposed methods by studying the preferences and familiarity of computer-literate users regarding various evaluation methods. Surprisingly, the surveyed group was more inclined to use less accurate but simple methods than more accurate but computationally and theoretically complex ones.
The study has several limitations. First, the group of respondents was intentionally limited to the most experienced clients, those with experience of using at least three banking services. Second, the sample was limited to respondents aged 18-25. This was a deliberate assumption because, according to available research and statistics, this group includes the most active users, who are most aware of the possibilities offered by the Internet. On the other hand, the conclusions drawn from their opinions are difficult to generalize because opinions from other age groups are lacking. Therefore, the group of respondents could be expanded in future research.
For the purposes of managing the e-banking services of individual banks, an in-depth analysis of the differences in the assessment of individual attributes of banking services would also be necessary, and this may be the next stage of the present research. Future studies should also extend the set of analyzed evaluation methods to other popular MCDA methodologies.

Table 1.
The averaged values of significance and preference indicators for particular attributes.

Table 2.
Average scoring evaluations for equivalent attribute ratings.

Table 3.
Ranking of the quality of banks' websites for the respondent, economic, technological, and anti-crisis variants (effects of sensitivity to changes in selected preference scales).
* Each number in the table represents a place in the ranking.

Table 4.
Comparison of internet banking website rankings according to individual methods.

Table 5.
Comparison of distances between pairs of internet banking assessment methods.

Table 6.
Comparison of the possibilities of applying the methods in clients' evaluation (assessment dominant).