1. Introduction
In the current turbulent times, airlines struggle to respond to the challenges posed by various global factors, such as the COVID-19 pandemic and the recent dramatic increase in fuel prices. Airlines have long realized that the provision of high-quality services is one of the few battles that is almost solely in their hands and can be used as their main weapon against the competition [1,2], since it significantly affects passengers’ attitudes, behaviors [3,4], satisfaction [4,5,6] and, in the long run, loyalty (a significant parameter of service effectiveness for the airline industry [3,7,8]) and market share [9]. Service quality is therefore crucial for airlines’ survival, as they need not only to understand the factors that contribute to customer satisfaction but also to realign their strategies in order to increase their passenger numbers and long-term profitability [10]. Quality is a rather confusing term that is usually interpreted in various ways, depending on the conceptual context in which it is addressed. Service quality, in particular, refers to the difference (positive or negative) between customers’ initial expectations and their perception of the actual service they received [11]. In the aviation industry, service quality refers to all the interactions (before, during and after the flight) between the airline and its customers, which shape customers’ perceptions (overall experience) and enhance the image of the carrier [12].
Previous comparative studies have confirmed that passengers flying with an FSC (full-service carrier) are generally more satisfied with the services they receive than passengers traveling with an LCC (low-cost carrier) [13]. Although this is an expected finding, it is important to understand the factors that contribute to this result. In particular, Chen and Chang [14], investigating the differences in the factors affecting passengers’ choice between a Chinese FSC and LCC, found that the service provided was not an important factor for passengers who chose LCCs, in contrast to those choosing FSCs. Jiang and Zhang [15] also focused on the Chinese aviation sector (analyzing four Chinese airlines), studying the links between service quality, customer satisfaction and confidence in the airlines. They concluded that, although service quality does affect customer satisfaction, this is not always accompanied by greater confidence. Indeed, for LCCs, “service quality may not be a good indicator for predicting customer loyalty” [16]. Furthermore, FSCs fell short of expectations with respect to ticket pricing, and there was a significant difference in the perception of value for money between passengers of FSCs and LCCs [17,18]. Thus, although the low travel cost offered by LCCs may be a crucial factor in their selection by a considerable number of customers [19], as expected, most LCC passengers also select their flight on the basis of these airlines’ high safety reputation [19,20,21].
In addition, Rajaguru [11], using means–end chain theory and price sensitivity theory, compared the quality of services with the price paid by passengers of FSCs and LCCs, and confirmed the sensitivity of LCC passengers to the value-for-money factor. This is also suggested by Shen and Yahya [22], who claim that FSC passengers are less price-sensitive. Further, Han et al. [23] found that, although lounge facilities affect the selection of an airline, luxury lounges are considered necessary by passengers flying with an FSC but are not expected by LCC passengers. Similarly, although tangibles and in-flight service quality are important for FSC passengers [24], LCC passengers’ expectations of the same services are rather low [25]. This is the main reason why LCCs pay less attention to these factors [24]. However, among the post-flight services offered by FSCs, the frequent-flyer program, which provides its members with a number of benefits and discounts, appears to make up lost ground against LCCs [26] and to increase passenger loyalty [27]. Finally, empirical studies have found significant differences between FSCs and LCCs in all the pre-, in- and post-flight services offered [28]. However, passengers’ intentions to travel with either FSCs or LCCs are not directly affected by value for money and/or service quality, “but they are mediated through the chronological path of customer satisfaction and airline image” [18,22]. It should be stressed, though, that the impact of satisfaction on image and behavioral intention is higher for LCCs [18].
Regarding multiple-criteria decision analysis (MCDA) methods, Tsaur et al. [29] introduced a hybrid Fuzzy AHP-TOPSIS approach to deal with imprecise evaluator knowledge, using 15 evaluation criteria to evaluate the performance of three airlines. Along the same lines, Liou and Chuang [30] integrated a Fuzzy AHP-Fuzzy Rating option to evaluate the performance of eight airlines, incorporating the responses of 446 passengers. The group extension of the typical AHP was implemented by Singh [31] to rank three candidate airlines on 23 evaluation criteria; the 171 responses to a standard AHP-type questionnaire were combined by geometric mean to generate single pairwise comparison matrices. Similarly, Budimčević et al. [32] integrated the responses of 200 customers to evaluate three airlines. A hybrid Fuzzy AHP-TOPSIS model was also used by Bae et al. [33] to rank ten airlines according to their performance in financial and operational ratios. Finally, without specifying either the passenger sample interviewed or the method used to combine their preferences, Al Awadh [34] evaluated three airlines using twenty-two evaluation criteria.
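Singh’s aggregation step, in which individual questionnaire responses are combined by geometric mean into a single pairwise comparison matrix (the same operator used by the T-AHP option of this research), can be sketched as follows. This is a minimal Python illustration, assuming each respondent’s judgments are already available as a reciprocal matrix on Saaty’s 1 to 9 scale; the function name and the three respondents’ values are hypothetical.

```python
import math


def aggregate_geometric_mean(matrices):
    """Aggregate individual pairwise comparison matrices entry by entry.

    Each group entry is the geometric mean of the respondents' a_ij values.
    Reciprocity is preserved: the geometric mean of the 1/a_ij values equals
    1 over the geometric mean of the a_ij values.
    """
    k = len(matrices)
    n = len(matrices[0])
    return [
        [math.prod(m[i][j] for m in matrices) ** (1.0 / k) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical judgments of three respondents on three criteria.
r1 = [[1, 3, 5], [1 / 3, 1, 3], [1 / 5, 1 / 3, 1]]
r2 = [[1, 1, 3], [1, 1, 1], [1 / 3, 1, 1]]
r3 = [[1, 9, 9], [1 / 9, 1, 1], [1 / 9, 1, 1]]

group = aggregate_geometric_mean([r1, r2, r3])
for row in group:
    print([round(x, 3) for x in row])
```

Aggregating individual judgments (rather than individual priority vectors) is what yields one matrix per comparison set, as in the studies cited above.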
In this research, the problem of customer satisfaction in the airline industry is also treated as a Group Multi-Criteria Decision-Making (GMCDM) problem using the Analytic Hierarchy Process (AHP) method. AHP was selected among other well-established MCDM methods because of its capacity to address (a) problem formulation and (b) criterion weight elicitation and alternative standardization in a unified environment, and (c) its ability to be readily integrated with simulation options. AHP is used to combine passengers’ opinions and perceptions of various evaluation criteria to produce satisfaction indices (SIs) that allow airlines to be ranked. The decision framework developed allows standard questionnaire survey responses to be transformed into relative judgments and examines two AHP options. In the first, typical AHP (T-AHP), pairwise comparison matrices are obtained using the geometric mean of passenger responses, and a single ranking is obtained without further investigation of the stochastic decision environment. In the second option (S-AHP), local weights are established using Monte Carlo simulation, in which multiple rounds of AHP estimation are performed according to the stochastic characteristics of the derived responses. In this way, S-AHP identifies possible decision outcomes and generates various indices that support coherent decision-making in stochastic environments by examining the stability and robustness of the rankings obtained across the simulation trials. Both T-AHP and S-AHP are implemented in a real case study, supported by a questionnaire survey conducted at the two main international airports in Greece, where passengers were asked to indicate their preferences on arrival. The analysis and the results discussed in detail identify the S-AHP option as more efficient in terms of the quality of the rankings generated and its ability to provide decision-makers with significant insights into the performance of candidate airlines.
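The S-AHP idea of repeating AHP estimation over simulated judgments can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors’ implementation: each trial draws every upper-triangle judgment at random from the individual judgments collected for that pair (an assumed resampling scheme, since the exact sampling distribution is not given here), and priorities are computed with the row geometric mean method, a common approximation swapped in for brevity in place of the principal eigenvector. All names and data are hypothetical.

```python
import math
import random


def row_gm_weights(matrix):
    """Approximate AHP priorities: normalized geometric mean of each row."""
    n = len(matrix)
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]


def simulate_weights(judgments, n, trials=5000, seed=1):
    """Run Monte Carlo AHP trials and return one weight vector per trial.

    judgments maps an upper-triangle pair (i, j) to the list of individual
    a_ij values observed for that pair. Each trial samples one value per
    pair (assumed scheme) and fills the lower triangle with reciprocals.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        m = [[1.0] * n for _ in range(n)]
        for (i, j), values in judgments.items():
            a = rng.choice(values)
            m[i][j], m[j][i] = a, 1.0 / a
        results.append(row_gm_weights(m))
    return results


# Hypothetical individual judgments for three criteria.
judgments = {(0, 1): [1, 3, 9], (0, 2): [3, 5, 9], (1, 2): [1, 1, 3]}
trial_weights = simulate_weights(judgments, n=3)

# Expected weights are averages over trials; min/max show the spread of the decision space.
expected = [sum(w[k] for w in trial_weights) / len(trial_weights) for k in range(3)]
spread = [(min(w[k] for w in trial_weights), max(w[k] for w in trial_weights)) for k in range(3)]
print([round(x, 3) for x in expected], spread)
```

Filtering the trials by a consistency threshold before averaging would give an accepted-trials variant of the same estimate.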
4. Results and Discussion
Initially, exploratory factor analysis was performed (using principal component analysis) with standard varimax rotation, retaining factors with an eigenvalue greater than one. To validate the one-dimensional structure of the scales, the Kaiser–Meyer–Olkin (KMO) measure was used to test sampling adequacy, convergent validity was tested using the item loadings and, finally, the internal consistency of the factors was measured using Cronbach’s α. The scores of all tests for all factors and subfactors were within the acceptable range, indicating that the factors and subfactors are valid and reliable (Table 3).
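Cronbach’s α, used above to measure internal consistency, compares the sum of the item variances with the variance of respondents’ total scores. A minimal Python sketch using only the standard library follows; the item scores are hypothetical five-point Likert responses, not data from this survey.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for k items, each a list of scores from the same respondents.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores).
    Population variances are used throughout; the n / (n - 1) factor of sample
    variances would cancel in the ratio anyway.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


# Three hypothetical items answered by five respondents (1-5 Likert scores).
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 3],
]
print(round(cronbach_alpha(items), 3))
```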
The descriptive statistical analysis (Table 3) shows that passengers were most satisfied with the quality of in-flight services (4.07). In contrast, the factor they were least satisfied with was the airline’s pricing policy (3.46). Among the subfactors, they were very satisfied with the level of security (4.39), the crew (4.26) and the website (4.27).
Finally, ANOVA indicated a significant difference in the mean scores of the factors and subfactors according to the reason for which each airline was chosen (price, safety, quality or availability), as well as according to passengers’ membership (or lack thereof) in a frequent-flyer program. The country of permanent residence (Greece or elsewhere), the age group and the frequency of travel also differentiate the mean scores of some (relatively few) factors and subfactors.
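The ANOVA comparison above tests whether mean factor scores differ between passenger groups (for example, by the reason for choosing the airline). A minimal sketch of the one-way ANOVA F statistic in plain Python is shown below; the group scores are hypothetical, and in practice a library routine such as SciPy’s `scipy.stats.f_oneway` also returns the associated p-value.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical pricing-factor scores grouped by the reason the airline was chosen.
by_reason = {
    "price": [4, 4, 5, 3, 4],
    "safety": [3, 3, 4, 3, 2],
    "quality": [3, 4, 3, 3, 3],
    "availability": [4, 3, 3, 4, 3],
}
f_stat = one_way_anova_f(list(by_reason.values()))
print(round(f_stat, 3))
```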
A deterministic AHP option using the geometric mean operator (T-AHP) indicates that the primary (first-level) criteria contribute almost equally to the analysis goal (Table 4). The simulation was run for 5000 trials, at which point the expected performance results were stable up to the fifth decimal place. Estimating the average overall inconsistency in every simulation trial showed that the maximum acceptable threshold of 10% was violated in 1963 (39.26%) of the trials. The expected inconsistency (the average of the per-trial values) was 9.83% over all trials and 7.51% over the accepted trials. However, the per-trial inconsistency spans a wider range over all trials than over the accepted trials. Although the analysis should be performed on the accepted trials, results for both S-AHP variants (all trials and accepted trials) are generated to investigate the quality of the derived rankings with respect to both the number of rounds generated and the impact of inconsistent judgments.
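The 10% threshold above is Saaty’s consistency ratio (CR) limit. The sketch below shows, under assumptions, how each trial could be screened: the principal eigenvalue λmax is estimated by power iteration, CR = CI/RI with CI = (λmax − n)/(n − 1) and Saaty’s random indices RI, and a trial is accepted when the average CR over its matrices does not exceed 0.10. Treating the per-trial overall inconsistency as a plain average of matrix CRs is an assumption; the exact aggregation used in this research is not reproduced here, and the function names are hypothetical.

```python
# Saaty's random consistency indices for matrix orders 1 to 10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def principal_eigen(matrix, iterations=200):
    """Estimate (lambda_max, priority vector) of a positive matrix by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return lambda_max, w


def consistency_ratio(matrix):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1); 0 for n <= 2."""
    n = len(matrix)
    if n <= 2:
        return 0.0
    lambda_max, _ = principal_eigen(matrix)
    return ((lambda_max - n) / (n - 1)) / RANDOM_INDEX[n]


def accept_trials(trials, threshold=0.10):
    """Keep trials whose average CR over their matrices is within the threshold (assumed rule)."""
    return [
        matrices for matrices in trials
        if sum(consistency_ratio(m) for m in matrices) / len(matrices) <= threshold
    ]


consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
cyclic = [[1, 9, 1 / 9], [1 / 9, 1, 9], [9, 1 / 9, 1]]
kept = accept_trials([[consistent], [cyclic], [consistent, cyclic]])
print(len(kept))
```

In the reported analysis, 5000 − 1963 = 3037 trials passed the 10% screening, which is the number of consistent trials used in the remainder of this section.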
Table 4 also reveals that the expected criteria weights, estimated by averaging the weights generated in the trials of both S-AHP variants, are almost equivalent to those derived using the geometric mean option (T-AHP). In detail, the root mean square deviations between T-AHP and the two S-AHP variants are 1.091% and 0.985%, respectively, while the two simulation variants agree even more closely, with a root mean square deviation of 0.143%. However, the minimum and maximum criteria weights derived across the trials indicate that the decision space spans a considerable range in each dimension (criterion).
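The root mean square deviation used above to compare weight vectors, together with the per-criterion minimum and maximum weights across trials, can be computed as in the following sketch. The weight vectors shown are hypothetical and expressed as fractions; multiplying the result by 100 gives the percentage form used in the text.

```python
import math


def rms_deviation(w_a, w_b):
    """Root mean square of the element-wise differences between two weight vectors."""
    if len(w_a) != len(w_b):
        raise ValueError("weight vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_a, w_b)) / len(w_a))


def weight_ranges(trial_weights):
    """Per-criterion (min, max) weight across all simulation trials."""
    return [(min(col), max(col)) for col in zip(*trial_weights)]


# Hypothetical criteria weights: T-AHP vs. the averaged S-AHP weights.
t_ahp = [0.26, 0.24, 0.25, 0.25]
s_ahp = [0.25, 0.25, 0.24, 0.26]
print(f"RMSD = {100 * rms_deviation(t_ahp, s_ahp):.3f}%")

trials = [[0.30, 0.20, 0.25, 0.25], [0.20, 0.30, 0.26, 0.24]]
print(weight_ranges(trials))
```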
According to the T-AHP option, Alt3 is ranked first with an SI of 3.717, followed by Alt4 with an SI of 3.675. Alt1 and Alt2 are in third and fourth place, with SIs of 3.602 and 3.181, respectively (Table 5). On the other hand, the expected SIs of the alternatives, computed as the average of the values estimated in the simulation trials (Equation (25)), yield the same ranking order for both simulation variants, regardless of whether the consistency check is applied. Both variants rank Alt1 first, followed by Alt3, with Alt4 and Alt2 in third and fourth place. In summary, the rankings differ because Alt1 climbs to first place in the S-AHP variants, whereas T-AHP ranks it third.
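The SIs and their expected values can be illustrated as follows. This is a sketch under an assumption: each SI is taken as the sum, over the leaf criteria, of the criterion’s global weight times the alternative’s mean rating on that criterion (ratings mode), which keeps SIs on the 1 to 5 Likert scale like the values in Table 5; the expected SI is then the average of the per-trial SIs, as described for Equation (25). Function names and numbers are hypothetical.

```python
def satisfaction_index(global_weights, ratings):
    """Assumed SI form: weighted sum of an alternative's ratings on the leaf criteria."""
    return sum(w * r for w, r in zip(global_weights, ratings))


def expected_si(trials):
    """Average, minimum and maximum SI over (weights, ratings) pairs, one pair per trial."""
    sis = [satisfaction_index(w, r) for w, r in trials]
    return sum(sis) / len(sis), min(sis), max(sis)


# Hypothetical trials for one alternative: weights and ratings both vary by trial.
trials = [
    ([0.5, 0.3, 0.2], [4.0, 3.5, 3.0]),
    ([0.4, 0.4, 0.2], [3.8, 3.6, 3.2]),
    ([0.6, 0.2, 0.2], [4.1, 3.4, 2.9]),
]
mean_si, min_si, max_si = expected_si(trials)
print(round(mean_si, 3), round(min_si, 3), round(max_si, 3))
```

Reporting the minimum and maximum alongside the mean is what allows a deterministic SI to be located inside the simulated range.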
Table 5 indicates that the T-AHP SIs fall between the minimum and maximum values obtained in the simulation trials, so the T-AHP ranking may also occur in some of the trials. Moreover, the estimated SIs of Alt1, Alt3 and Alt4 are very close under both T-AHP and S-AHP, which raises the possibility that these alternatives are practically equivalent. In contrast to the geometric mean option (T-AHP), which yields a single SI per alternative, the S-AHP variants can investigate the stability of the results under variations in both criteria weights and alternative performances. Simulating passengers’ responses thus enables an investigation of both the stability of the derived solution and the quality of the ranking orders obtained.
One way to quantify the stability of the alternatives’ performance across the simulation trials is to estimate the distribution of the scores generated. Such an analysis, focused on the 3037 consistent trials, is presented in Table 6. The percentages and trial counts indicate the following: (a) Alt2 is not a competitive alternative, as its maximum performance is lower than the minimum performance of every other alternative, and its most frequent performances fall into the range shown in Table 6; (b) Alt4 is ranked third, given the range into which its most frequent performances fall (Table 6); Alt4 also records a significant number of trials in a further range of the table, but fewer than Alt3, which is ranked second even though it records the highest observed performance (3.878); (c) Alt1 is ranked first, since it dominates the corresponding satisfaction range in Table 6. At the same time, its performances remain, with only one exception, above 3.6, and it records the highest minimum and expected performances.
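A distribution such as the one in Table 6 can be obtained by binning the per-trial SIs of each alternative into fixed-width intervals and counting trials per interval. The sketch below assumes equal-width bins of 0.1; the actual intervals of Table 6 are not reproduced here, and the SI values are hypothetical.

```python
import math
from collections import Counter


def si_distribution(sis, width=0.1):
    """Count trials per SI interval [lower, lower + width) and report percentages."""
    counts = Counter(math.floor(s / width) for s in sis)
    total = len(sis)
    return [
        (round(k * width, 4), round((k + 1) * width, 4), c, 100.0 * c / total)
        for k, c in sorted(counts.items())
    ]


def modal_interval(distribution):
    """Interval (lower, upper) with the largest number of trials."""
    lower, upper, _, _ = max(distribution, key=lambda row: row[2])
    return lower, upper


# Hypothetical per-trial SIs for one alternative.
sis = [3.61, 3.64, 3.72, 3.68, 3.75, 3.66, 3.59]
dist = si_distribution(sis)
for lower, upper, count, pct in dist:
    print(f"[{lower:.1f}, {upper:.1f}): {count} trials ({pct:.1f}%)")
print("most counted interval:", modal_interval(dist))
```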
To address these issues, S-AHP also allows further investigation of the extent to which high satisfaction performance ensures a higher ranking, by examining the ranking derived in each trial. In particular, the unique rankings derived from the 5000 simulation trials can be identified in order to investigate the stability of the result and to estimate the robustness of the rankings obtained by both T-AHP and S-AHP.
Table 7 summarizes the results of this analysis and indicates that the 3037 consistent trials generate only six distinct rankings. As expected, the most frequent ranking order is the one already indicated by the S-AHP option, which occurs in 1785 trials (58.78%). The second most common ranking, in which Alt3 prevails over Alt1, has a significantly lower probability of appearance (30.16%). Notably, the T-AHP ranking order appears in only 0.26% of the consistent trials, highlighting that, in the current analysis, T-AHP does not generate a robust solution to the evaluation problem.
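Counting the distinct rankings produced across trials, as summarized in Table 7, amounts to sorting the alternatives by SI in each trial and tallying the resulting orders. A minimal sketch follows; the per-trial SIs are hypothetical, and ties are broken alphabetically as an arbitrary assumption.

```python
from collections import Counter


def ranking(trial_si):
    """Order alternatives from highest to lowest SI; ties broken by name (assumption)."""
    return tuple(sorted(trial_si, key=lambda alt: (-trial_si[alt], alt)))


def ranking_frequencies(trial_sis):
    """(ranking, count, percentage) for each distinct ranking, most frequent first."""
    counts = Counter(ranking(t) for t in trial_sis)
    total = len(trial_sis)
    return [(order, c, 100.0 * c / total) for order, c in counts.most_common()]


# Hypothetical per-trial SIs for the four airlines.
trial_sis = [
    {"Alt1": 3.70, "Alt2": 3.15, "Alt3": 3.66, "Alt4": 3.63},
    {"Alt1": 3.69, "Alt2": 3.20, "Alt3": 3.65, "Alt4": 3.61},
    {"Alt1": 3.68, "Alt2": 3.18, "Alt3": 3.64, "Alt4": 3.62},
    {"Alt1": 3.62, "Alt2": 3.17, "Alt3": 3.71, "Alt4": 3.60},
]
for order, count, pct in ranking_frequencies(trial_sis):
    print(" > ".join(order), count, f"{pct:.2f}%")
```

Applied to the reported counts, 1785 of the 3037 consistent trials gives 1785/3037 ≈ 58.78% for the most frequent ranking.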
The ranking appearance probability analysis also indicates that Alt1 is ranked first in the unique rankings R1 and R3. Alt1 is therefore ranked first in a total of 2076 trials, giving an overall probability of 68.36% of appearing first. Alt1 also records probabilities of 31.31% and 0.33% of appearing second and third, respectively. The analysis also confirms that Alt2 is always ranked last. Alt4 is most commonly ranked third (88.94%) and is ranked first in only 37 trials (1.22%). Alt3 is ranked second in 58.84% of the trials and records the second-highest number of first-place trials (924). An overall index summarizing the rank order analysis of the simulation, the satisfaction expected rank (SER), is obtained using Equation (26). The lower the index, the higher the level of customer satisfaction.
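The rank appearance probabilities and the SER index can be sketched as below. Equation (26) is not reproduced in this section, so the SER is assumed here to be the probability-weighted mean rank, SER = Σ r·P(rank = r). Under that assumption, the male-cluster counts reported later in this section (Alt3 first in 3014 trials and second in 911) give 4836/3925 ≈ 1.232, matching the value reported for Alt3. Function names are hypothetical.

```python
def rank_probabilities(rank_counts):
    """rank_counts[alt][r] = trials in which alt is ranked r (1 = best); returns P(rank = r)."""
    probs = {}
    for alt, counts in rank_counts.items():
        total = sum(counts.values())
        probs[alt] = {r: c / total for r, c in counts.items()}
    return probs


def satisfaction_expected_rank(probs):
    """Assumed SER form: probability-weighted mean rank (lower is better)."""
    return {alt: sum(r * p for r, p in p_by_rank.items()) for alt, p_by_rank in probs.items()}


# Male-cluster counts for Alt3 from the text: first in 3014 trials, second in 911.
probs = rank_probabilities({"Alt3": {1: 3014, 2: 911}})
ser = satisfaction_expected_rank(probs)
print(round(ser["Alt3"], 3))
```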
Table 8 summarizes the results of the airline ranking appearance in the simulation trials.
The proposed framework can support sectoral analysis by partitioning the sample dataset by gender, age, income, etc. Such an analysis can provide significant insights into the aspects of satisfaction that vary with passenger profiles. To estimate different attitudes and differences in SIs between genders, the dataset was queried accordingly, and both T-AHP and S-AHP were used to estimate the SIs and the ability of the examined alternatives to be ranked in a certain position. For the simulation option, 5000 trials were run, and SIs were obtained both for the total number of trials and for those within the accepted level of inconsistency. The comparative analysis between the clusters, in relation to the overall estimates discussed previously, was applied to the accepted trials to ensure consistency; however, the results derived from the total trials are strongly similar. Of the 5000 trials, 3925 were accepted for males and 2457 for females.
The analysis results (Table 9) indicate that the geometric mean and simulation options generate similar SI estimates, which result in the same alternative ranking within each group of respondents. However, these rankings differ between the groups, and they also differ from those of the overall analysis. For male passengers, 3014 (76.79%) of the consistent trials yield a single ranking, in which Alt3 is first, Alt1 second, Alt4 third and Alt2 fourth. In the remaining 23.21% (911 trials), Alt1 performs better than Alt3 and the two switch positions. Consequently, the SER index favors Alt3, which records a value of 1.232. For female passengers, the analysis returns an almost unique ranking (99.02% of trials), with Alt3 and Alt2 in third and fourth place with certainty, Alt4 in first place and Alt1 again in second place. Thus, male passengers are most satisfied with Alt3, while female passengers are most satisfied with Alt4, and both genders agree that Alt1 is the second-best option and Alt2 the fourth. This verifies the overall result, in which Alt1 is ranked first: Alt1 is consistently ranked second in both clusters, while Alt3 and Alt4 swap first and third place. Finally, Alt3 outranks Alt4 for second place in the overall ranking because Alt3 is ranked first in the male cluster, which provides more consistent judgments and is therefore overrepresented in the overall trial analysis.
5. Summary and Conclusions
This research proposes a novel decision framework for evaluating airline service provision based on passenger satisfaction. The proposed framework approaches customer satisfaction analysis as a multi-criteria decision analysis problem and uses AHP to estimate SIs. To support the implementation of AHP, a modification was developed that allows relative judgments to be extracted from a single questionnaire survey, which is the most common way of collecting customer perceptions of the quality of the services they receive. The responses were then coded using a five-point Likert scale based on AHP’s fundamental scale of preferences, with the aim of maintaining the original properties of the methodology.
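The modification that turns single-questionnaire ratings into AHP judgments is not detailed in this section. Purely as a hypothetical illustration, one way to map two five-point Likert scores onto AHP’s fundamental scale is to send their difference d to 2|d| + 1 (so differences 0 to 4 become 1, 3, 5, 7, 9) and to use the reciprocal when the difference is negative, which keeps the resulting matrix reciprocal. This mapping and the function names are assumptions, not the coding used in this research.

```python
def likert_to_judgment(score_i, score_j):
    """Hypothetical mapping of two 1-5 Likert scores to a Saaty-scale judgment a_ij."""
    d = score_i - score_j
    value = 2 * abs(d) + 1  # difference 0 -> 1, 1 -> 3, 2 -> 5, 3 -> 7, 4 -> 9
    return float(value) if d >= 0 else 1.0 / value


def respondent_matrix(scores):
    """Reciprocal pairwise comparison matrix built from one respondent's ratings."""
    return [[likert_to_judgment(s_i, s_j) for s_j in scores] for s_i in scores]


# One respondent's hypothetical ratings of three criteria.
matrix = respondent_matrix([5, 3, 4])
for row in matrix:
    print([round(x, 3) for x in row])
```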
As a GMCDM approach, two options are integrated into the analysis. In the typical T-AHP option, global pairwise comparison judgments are estimated using the geometric mean of the individual judgments received, while in the second, the stochastic characteristics of the collected responses are used to build a Monte Carlo simulation-based AHP (S-AHP). To minimize the effect of the rank reversal phenomenon, the analysis is conducted as an open system using the ratings mode, on the basis of which the alternatives’ expected performance with respect to the analysis criteria has been estimated.
The analysis is supported by the responses to a questionnaire survey conducted at the two main airports in Greece. Four airline service providers were examined: three FSCs (Alt1, Alt3 and Alt4) and one LCC (Alt2). According to the responses collected from 675 passengers, no single ranking holds across all the analysis criteria. The LCC (Alt2) is ranked first (along with Alt4) for the ticket price criterion (C1.1) and records moderate performance for the seasonal offers and discounts criterion (C1.4). Apart from the online check-in (C2.1.3) and destination (C2.2.1) criteria, for which it is ranked third, Alt2 is otherwise consistently ranked fourth. Consequently, FSCs dominate the first three positions of the analysis. Alt4 is ranked first for 6 evaluation criteria, second for 17 and third for 26. Alt3 dominates the first position in terms of expected performance, since it is ranked first for 30 evaluation criteria. Alt1, on the other hand, is ranked first for 15 evaluation criteria but presents a more balanced performance in the second and third positions: it is ranked second for 23 evaluation criteria and third for 13.
Since the lack of an obvious solution has been verified, a comparative evaluation of the alternatives can only be reached by combining the criteria weights with the alternatives’ satisfaction performances. In the current research, both T-AHP and S-AHP were implemented to rank the examined alternatives. The results indicate similar SI and expected SI estimates for T-AHP and for S-AHP with and without the consistency constraint applied to the trials, with the two S-AHP variants producing a common ranking. However, T-AHP may yield small differences between SIs, which can provoke disagreement or doubts about the ranking obtained. In this research, T-AHP generates an efficient ranking order but not a robust solution, indicating that in a stochastic decision environment the simulation-based extension of AHP is more efficient in generating candidate alternative rankings. In particular, the T-AHP ranking order was reached in only 8 of the 3037 consistent rounds of S-AHP. Moreover, Alt1, which T-AHP ranks third, climbs to first place in the simulation-based extension, since it is ranked first in 68.36% of the accepted trials. Finally, the introduced Satisfaction Expected Rank (SER) index, which effectively summarizes the results of the simulation trials, also places Alt1 first, Alt3 second, Alt4 third and Alt2 fourth.
By exploiting the potential of S-AHP to map the decision space, a family of indices can be extracted to support final decisions. Such indices have been included in the present research to ensure a robust decision framework. In particular, SI descriptive statistics and probability distributions, together with the rank order probabilities and SER estimates obtained for each alternative, provide a clearer picture of the final rankings. In this way, S-AHP produces more robust rankings, which is of great importance, especially when valuable companies are being evaluated. The proposed methodological framework can be used to produce reports broken down by respondent gender, age, etc., providing useful insights for managers and administrative officials in the airline sector. As a customer satisfaction methodology, it can be further enriched by a SWOT-like analysis highlighting strengths, weaknesses, opportunities and threats on a criterion-by-criterion basis, to extend its potential as a decision-making tool.