1. Introduction
Business intelligence (BI), in general, aims at providing decision support, based on empirical information, for various business activities in domains such as industry, science, technology, healthcare, commerce, and defense [1, 2]. BI is often used as an umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies [3], summarizing a large set of models and analytical methods such as reporting, data warehousing, and data mining [1]. Data about the considered business are the core elements of BI. Modern data mining (or knowledge discovery in data) tools rooted in the field of computational intelligence have given rise to knowledge-based decision support systems (equipped with knowledge bases and inference engines [4]), which can significantly enhance the formal apparatus of BI.
Knowledge-based BI approaches are well suited for decision support of business activities in the aviation industry. This is due to the availability of huge amounts of data collected over time by airlines and airports on various aspects of their operation. A significant amount of domain knowledge is buried in such data. Effective knowledge-discovery approaches can automatically reveal understandable and useful structures, trends, and patterns in the considered data, improving accuracy and providing decision explanations in aviation industry decision support. Many data mining techniques, including those rooted in the field of computational intelligence, have been applied to airline data analysis; see, e.g., [5, 6, 7, 8, 9, 10]. However, their essential drawback is their non-transparent, black-box, and accuracy-oriented nature, i.e., they provide little or no explanation and justification of the decisions made. This work is an attempt to address this problem by providing a solution characterized by both high interpretability and transparency and high accuracy in the airline passenger satisfaction study.
The main goal and contribution of this paper is the application of our knowledge-discovery technique (fuzzy rule-based classification systems), characterized by a genetically optimized interpretability-accuracy trade-off (see, e.g., [11, 12, 13, 14] for details), to decision support related to airline passenger satisfaction problems. The quality of service and passenger satisfaction in the airline industry are increasingly recognized as critical factors of business performance and strategic tools for gaining competitive advantage [15]. In particular, we discover the hierarchy of influence of particular input attributes upon airline passenger satisfaction. We also analyze the effect of possible “overlapping” of some input attributes over others from the passenger satisfaction point of view. In our approach, measures of the system’s interpretability and accuracy are separate performance indices and optimization objectives in designing fuzzy rule-based classifiers (FRBCs) from data. Due to the complementary/contradictory nature of both optimization objectives, multi-objective evolutionary optimization algorithms (MOEOAs) are employed in the process of the FRBC’s structure and parameter optimization, which is also equivalent to the FRBC’s interpretability-accuracy trade-off optimization (see, e.g., [16] for a review of related work and [17, 18] for the single-objective optimization case).
First, the recently published Kaggle’s airline passenger satisfaction data set, containing as many as 259,760 records and used in our experiments, is briefly characterized. Then, the main components of our FRBCs and their MOEOA-based learning and optimization are outlined. For comparison purposes, two MOEOAs are used: our generalization (referred to as SPEA3 [19, 20, 21]) of the well-known Strength Pareto Evolutionary Algorithm 2 (SPEA2) [22] and SPEA2 itself. Finally, the main goal of the paper outlined above, i.e., the application of our approach to the Kaggle’s airline passenger satisfaction data and a comparative analysis with an alternative approach, is presented and discussed.
3. Methodology: An Outline of Main Components of the Proposed FRBCs and Their MOEOA-Based Learning and Optimization
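For the convenience of the reader, in this section we briefly outline the main components of our approach; see [11, 12, 13, 19] for details and discussion. In general, we consider an FRBC with $n$ input attributes $x_1, x_2, \ldots, x_n$ (including both numerical and qualitative ones) and an output, which is a fuzzy set over the set $Y = \{y_1, y_2, \ldots, y_c\}$ of $c$ class labels.
Learning data set $L$: it is the basis for the FRBC’s design from data and contains $K$ input-output samples:
$$L = \{\mathbf{x}_k, y_k\}_{k=1}^{K}, \qquad (1)$$
where $\mathbf{x}_k = (x_{1k}, x_{2k}, \ldots, x_{nk}) \in X = X_1 \times X_2 \times \ldots \times X_n$ ($\times$ stands for the Cartesian product of ordinary sets) is the set of input attributes and $y_k$ is the corresponding class label ($y_k \in Y$) for the $k$-th data sample, $k = 1, 2, \ldots, K$.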
Attribute representation: each numerical attribute $x_i$ is represented by $a_i$ fuzzy sets $A_{ij} \in \mathcal{F}(X_i)$ ($\mathcal{F}(X_i)$ denotes the family of all fuzzy sets defined in the universe $X_i$), $j = 1, 2, \ldots, a_i$. $A_{i1}$ is an S-type fuzzy set (representing the linguistic term “Small”), $A_{ia_i}$ is an L-type set (representing the linguistic term “Large”), and $A_{i2}, \ldots, A_{i,a_i-1}$ are M-type sets (representing the linguistic terms “Medium 1”, “Medium 2”, ..., “Medium $a_i - 2$”). For simplicity, the symbols $A_{ij}$ also denote the corresponding linguistic terms. Trapezoidal membership functions of S-, M-, and L-type fuzzy sets are shown in Figure 1. Each qualitative attribute $x_i$ is characterized by $a_i$ fuzzy singletons $A_{ij}$, $j = 1, 2, \ldots, a_i$, defined for particular “values” $x_{ij}$ of $x_i$ as follows: $\mu_{A_{ij}}(x_i) = 1$ for $x_i = x_{ij}$ and 0 elsewhere. Similarly, class labels $y_j$ in particular rules (2) ($j \in \{1, 2, \ldots, c\}$) are represented by appropriate fuzzy singletons $B_j$ with the following membership functions: $\mu_{B_j}(y) = 1$ for $y = y_j$ and 0 elsewhere.
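FRBC’s knowledge base: it contains $R$ genetically optimized fuzzy rules discovered in the learning data set (1). We introduced the following form of the $r$-th rule, $r = 1, 2, \ldots, R$ ($R$ changes during the learning process):
$$\text{IF } [x_1 \text{ is } [\text{not}]_{(sw_1^{(r)} < 0)}\, A_{1,|sw_1^{(r)}|}]_{(sw_1^{(r)} \neq 0)} \text{ AND } \ldots \text{ AND } [x_n \text{ is } [\text{not}]_{(sw_n^{(r)} < 0)}\, A_{n,|sw_n^{(r)}|}]_{(sw_n^{(r)} \neq 0)} \text{ THEN } y \text{ is } B_{j_r}, \qquad (2)$$
where: (i) $[\text{expression}]_{(\text{condition})}$ in (2) denotes conditional inclusion of the expression into a given rule if and only if the condition is fulfilled, (ii) $|\cdot|$ returns the absolute value, and (iii) $sw_i^{(r)}$ is a switch which controls the presence/absence of the $i$-th input attribute in the $r$-th rule, $i = 1, 2, \ldots, n$; $j_r \in \{1, 2, \ldots, c\}$ is the index of the class label assigned to the $r$-th rule.
$sw_i^{(r)} \in \{0, \pm 1, \pm 2, \ldots, \pm a_i\}$, where $a_i$ is the number of fuzzy sets (linguistic terms) defined for the $i$-th attribute. For $sw_i^{(r)} = 0$, the $i$-th attribute is excluded from (not active in) that rule, whereas for $sw_i^{(r)} > 0$ the component ($x_i$ is $A_{i,sw_i^{(r)}}$) is included (active), and for $sw_i^{(r)} < 0$ the component ($x_i$ is not $A_{i,|sw_i^{(r)}|}$) is used in that rule (the membership function of a negated fuzzy set is $\mu_{\text{not } A}(x_i) = 1 - \mu_{A}(x_i)$, where $\mu_{A}$ is the membership function of the fuzzy set $A$).
In our approach, two entities, i.e., the rule base structure RB and the data base DB, represent the FRBC’s knowledge base. We introduce a direct and computationally efficient representation of RB in the following form:
$$RB = \{sw_1^{(r)}, sw_2^{(r)}, \ldots, sw_n^{(r)}, j_r\}_{r=1}^{R}.$$
In turn, DB contains: (i) a tunable part, i.e., parameters of fuzzy sets representing numerical attributes, and (ii) a non-tunable part, i.e., domains $X_i$ of qualitative attributes and the set $Y$ of class labels.
We developed original crossover and mutation operators for the transformation of the RB population, and we adopted some specialized crossover and mutation operators for processing the DBs; see [11, 12, 13] for details.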
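The switch-based rule encoding can be illustrated with a minimal sketch. It assumes that a rule is stored as a list of integer switches (0 means that the attribute is not active, a positive value selects a fuzzy set, and a negative value selects its negation) and that AND is implemented by the minimum operator; the operator choice, the function name `rule_activation`, and the toy membership functions are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Sequence

Membership = Callable[[float], float]


def rule_activation(switches: Sequence[int],
                    fuzzy_sets: Sequence[Sequence[Membership]],
                    sample: Sequence[float]) -> float:
    """Activation degree of one rule antecedent for one input sample.

    switches[i] == 0 -> attribute i is not active in the rule
    switches[i] > 0  -> component "x_i is A_{i, sw}"
    switches[i] < 0  -> component "x_i is not A_{i, |sw|}"
    AND is taken as the minimum (an assumption of this sketch).
    """
    degree = 1.0
    for sw, sets_i, x_i in zip(switches, fuzzy_sets, sample):
        if sw == 0:
            continue  # attribute excluded from this rule
        mu = sets_i[abs(sw) - 1](x_i)
        degree = min(degree, mu if sw > 0 else 1.0 - mu)
    return degree


# Toy membership functions on [0, 1] (hypothetical, for illustration only).
low = lambda v: 1.0 - v
high = lambda v: v
fuzzy_sets = [[low, high], [low, high]]

only_first = rule_activation([2, 0], fuzzy_sets, [0.8, 0.3])      # x_1 is high
with_negation = rule_activation([2, -1], fuzzy_sets, [0.8, 0.3])  # ... AND x_2 is not low
```

Storing only the switches and the class index keeps each rule a short integer vector, which is what makes crossover and mutation on the rule base straightforward to implement.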
Evolutionary optimization objectives (FRBC’s accuracy): the accuracy measure (subject to maximization) is computed over the learning data set by comparing, for every learning data sample, two fuzzy sets defined over the class labels: the fuzzy-set response of system (2) for that sample (characterized by its membership function) and the desired fuzzy-singleton response for that sample (whose membership function equals 1 for the class label of the sample and 0 elsewhere).
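One plausible way to turn such a comparison into a single number is sketched below. It assumes that accuracy is taken as one minus the root-mean-square error between the system's fuzzy response and the desired singleton, averaged over all samples and class labels; this concrete formula and the name `accuracy_from_responses` are assumptions of the sketch, since the exact definition is not reproduced in this text.

```python
import math
from typing import Sequence


def accuracy_from_responses(responses: Sequence[Sequence[float]],
                            targets: Sequence[int]) -> float:
    """Return 1 - RMSE between fuzzy responses and desired singletons.

    responses[k][j] is the membership of class j in the system's response
    for sample k; targets[k] is the index of the true class of sample k.
    The 1 - RMSE form is an assumption made for illustration.
    """
    n_classes = len(responses[0])
    total = 0.0
    for mu, target in zip(responses, targets):
        for j in range(n_classes):
            desired = 1.0 if j == target else 0.0  # fuzzy singleton
            total += (mu[j] - desired) ** 2
    rmse = math.sqrt(total / (len(targets) * n_classes))
    return 1.0 - rmse


perfect = accuracy_from_responses([[1.0, 0.0], [0.0, 1.0]], [0, 1])
worst = accuracy_from_responses([[0.0, 1.0]], [0])
```

Under this assumption the measure stays in [0, 1] whenever the responses are valid membership degrees, so it can be maximized alongside an interpretability measure on the same scale.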
Evolutionary optimization objectives (FRBC’s interpretability): we use the notion of interpretability in a broader sense, including not only semantic aspects of the FRBC but also its complexity. We proposed a measure (subject to maximization) for the evaluation of the FRBC’s complexity-related interpretability that is derived from the FRBC’s complexity measure (7). The complexity measure takes values in $[0, 1]$ (0 and 1 represent minimal and maximal complexities, respectively) and is an average of three sub-indices that measure the average complexity of particular rules (8) and the complexity of the whole system in terms of its active inputs (9) and active fuzzy sets (9). The rule-level sub-index (8) uses the number of active input attributes in the $r$-th rule, whereas the system-level sub-indices (9) use the numbers of active inputs and active fuzzy sets (linguistic terms), respectively, in the whole system.
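The averaging of three sub-indices can be sketched as follows. The rule base is assumed to be a list of switch vectors (non-zero entries mark active attributes), every rule is assumed to have at least one active attribute, and each count is scaled to [0, 1] as (count - 1) / (maximum - 1); this scaling and the function name `complexity` are assumptions of the sketch rather than the paper's exact formulas.

```python
from typing import Sequence


def complexity(rule_base: Sequence[Sequence[int]],
               n_attributes: int,
               n_fuzzy_sets_total: int) -> float:
    """Average of three complexity sub-indices, each scaled to [0, 1]."""

    def scaled(count: int, maximum: int) -> float:
        # Assumed normalisation: 1 active item -> 0, all items active -> 1.
        return 0.0 if maximum <= 1 else (count - 1) / (maximum - 1)

    # (a) average complexity of particular rules
    per_rule = [sum(1 for sw in rule if sw != 0) for rule in rule_base]
    rules_part = sum(scaled(m, n_attributes) for m in per_rule) / len(rule_base)
    # (b) active inputs in the whole system
    active_inputs = {i for rule in rule_base
                     for i, sw in enumerate(rule) if sw != 0}
    # (c) active fuzzy sets in the whole system
    active_sets = {(i, abs(sw)) for rule in rule_base
                   for i, sw in enumerate(rule) if sw != 0}
    inputs_part = scaled(len(active_inputs), n_attributes)
    sets_part = scaled(len(active_sets), n_fuzzy_sets_total)
    return (rules_part + inputs_part + sets_part) / 3.0


simplest = complexity([[1, 0, 0]], n_attributes=3, n_fuzzy_sets_total=9)
fullest = complexity([[1, 1], [2, 2]], n_attributes=2, n_fuzzy_sets_total=4)
interpretability = 1.0 - simplest  # assumed complement of complexity
```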
In turn, the FRBC’s semantics-related interpretability is addressed by implementing strong fuzzy partitions (SFPs) [25] of the domains of all numerical attributes. SFPs are fuzzy partitions in which the sum of the values of all membership functions for any domain value is equal to 1. SFPs satisfy the desired semantics-related interpretability demands [25]. For trapezoidal membership functions, the SFP requirements have a simple and computationally efficient implementation (see Figure 2 for a three-set SFP): the interval on which one membership function falls from 1 to 0 must coincide with the interval on which the membership function of the next fuzzy set rises from 0 to 1.
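The strong-fuzzy-partition property for trapezoidal sets can be checked numerically with a short sketch. It assumes that a partition is described by a list of slope intervals, each shared by two neighbouring sets (one falls where the next one rises); the helper names `trapezoid` and `strong_partition` and the breakpoints used are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

INF = float("inf")


def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Trapezoidal membership function with support [a, d] and core [b, c].

    a = b = -inf gives an S-type (left shoulder) set,
    c = d = +inf gives an L-type (right shoulder) set.
    """
    def mu(x: float) -> float:
        if x < a or x > d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)   # rising slope
        if x <= c:
            return 1.0                 # core
        return (d - x) / (d - c)       # falling slope
    return mu


def strong_partition(slopes: Sequence[Tuple[float, float]]) -> List[Callable[[float], float]]:
    """Build len(slopes) + 1 sets; each slope interval is shared by neighbours."""
    bounds = [(-INF, -INF)] + list(slopes) + [(INF, INF)]
    return [trapezoid(bounds[j][0], bounds[j][1], bounds[j + 1][0], bounds[j + 1][1])
            for j in range(len(bounds) - 1)]


# Three-set partition (Small, Medium, Large) with made-up breakpoints.
small, medium, large = strong_partition([(2.0, 4.0), (6.0, 8.0)])
sums = [small(x) + medium(x) + large(x) for x in (0.0, 2.0, 3.0, 4.0, 5.0, 7.5, 10.0)]
```

Because neighbouring sets share their slope intervals, only the slope endpoints need to be tuned, and the partition remains strong by construction.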
MOEOAs used: as already mentioned in the Introduction, for comparison purposes, two MOEOAs are used in our experiments. They include the well-known SPEA2 method and our generalization of SPEA2 referred to as SPEA3. Three indices are usually applied to evaluate and compare different MOEOAs [25]. These indices include: (i) the accuracy of the non-dominated solutions obtained, (ii) the spread of solutions in the solution set, and (iii) the distribution of solutions in the solution set. The accuracy represents the closeness of the generated non-dominated solutions to Pareto-optimal solutions (if they are available) or to reference solutions. The spread of solutions in the solution set, measured by the distance between extreme solutions in the set, represents how well the generated solutions reach the extrema of the Pareto-optimal or reference solution set. In turn, the distribution of solutions in the solution set represents how evenly the solutions are distributed along the approximation of the Pareto-optimal/reference front in the objective space. Therefore, a set of solutions characterized by a higher accuracy, a higher spread, and a better-balanced distribution outperforms the alternative sets of solutions.
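The spread and distribution indices can be illustrated for a two-objective front with a minimal sketch. Spread is taken, as in the text, as the distance between the extreme solutions; the evenness of the distribution is measured here by the standard deviation of the gaps between consecutive solutions, which is an assumed indicator chosen for illustration (the names `spread` and `gap_std` are hypothetical).

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def spread(front: Sequence[Point]) -> float:
    """Euclidean distance between the two extreme solutions of the front."""
    pts = sorted(front)
    return math.dist(pts[0], pts[-1])


def gap_std(front: Sequence[Point]) -> float:
    """Standard deviation of gaps between neighbouring solutions.

    0 means perfectly even spacing; larger values mean a less balanced
    distribution (an assumed indicator, not the paper's own index).
    """
    pts = sorted(front)
    gaps: List[float] = [math.dist(p, q) for p, q in zip(pts, pts[1:])]
    return statistics.pstdev(gaps)


# Made-up fronts in (interpretability, accuracy) space.
even_front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
uneven_front = [(0.0, 1.0), (0.1, 0.9), (1.0, 0.0)]
```

Both toy fronts have the same spread, so only the gap-based indicator separates them, which mirrors why spread and distribution are treated as separate indices.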
An appropriate MOEOA for our experiments should be able to generate a solution set with the highest possible accuracy, spread, and distribution balance. This is directly related to the analysis of possible “overlapping” of some input attributes over others in the airline passenger satisfaction data (see the next section for a detailed presentation). Among the traditional MOEOAs, SPEA2 and NSGA-II (Nondominated Sorting Genetic Algorithm II) [26] are the best known. Since SPEA2 has a higher ability to generate solution sets with a better-balanced distribution [22, 26], we selected SPEA2 for further improvement of the spread and distribution of the generated solutions.
The essence of the proposed SPEA2’s generalization (referred to as SPEA3) consists in replacing its environmental selection procedure by our original algorithm, which improves the spread, distribution balance and diversity of generated solutions. The environmental selection, in general, creates a collection of the best solutions out of all solutions obtained so far and keeps them in an external archive.
Table 2 presents the differences between the environmental selection procedure of SPEA2 and our original environmental selection procedure implemented in SPEA3.
Concluding:
- (a) in our original approach implemented in SPEA3, the complementary operations of increasing and reducing the archive lead to obtaining the best available distribution balance and spread of solutions belonging to the Pareto-front approximation (see [19] for a detailed presentation and [20] for a discussion), whereas
- (b) in SPEA2 only the truncation procedure (if activated) contributes to improving the distribution balance and diversity of the final set of solutions (without, however, addressing the problem of improving the spread of solutions), giving, in general, worse results than its SPEA3 counterpart.
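The common core of any such archive, i.e., keeping only non-dominated solutions, can be sketched as below. The sketch assumes that both objectives are maximized and deliberately omits the archive-size control (truncation in SPEA2, increasing and reducing in SPEA3), so it reproduces neither environmental selection procedure; `dominates` and `update_archive` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every objective, better in at least one."""
    return (all(a >= b for a, b in zip(p, q))
            and any(a > b for a, b in zip(p, q)))


def update_archive(archive: Sequence[Point], candidates: Sequence[Point]) -> List[Point]:
    """Merge new candidates into the archive and drop dominated points."""
    pool = list(dict.fromkeys(list(archive) + list(candidates)))  # de-duplicate
    return [p for p in pool if not any(dominates(q, p) for q in pool)]


# Made-up (accuracy, interpretability) pairs.
archive = update_archive([(0.9, 0.2), (0.5, 0.5)],
                         [(0.6, 0.6), (0.2, 0.9), (0.1, 0.1)])
```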
4. Experiments (Application to Kaggle’s Airline Passenger Satisfaction Data) and Discussion
First, we would like to reveal some details of how our approach designs FRBCs from the considered Kaggle’s airline data. For this purpose, the MOEOA-based genetic learning and optimization experiments for a single learning-test data split are presented and discussed. The split ratio of the original data was 1:9, i.e., only 10% of the whole data set (preserving the class proportions) was used as the learning data to build the system, whereas the remaining 90% of the original data was used for testing purposes. Such a learning-test data split is widely used in processing data sets with a large number of data samples (in our case, as many as 259,760 data samples). It not only reduces the computational complexity of the learning procedure but, more importantly, places much higher demands on the classification technique, since “generally, the larger the training dataset, the better the classification performance regardless of which classification algorithm is used” (quotation from [27]). Therefore, the assumed data split poses a significant challenge for the system’s design technique. For comparison purposes, Figure 3 presents two 10-element collections of non-dominated solutions (i.e., optimized FRBCs) obtained in the final generation of a single run of our FRBC design technique using, independently, our SPEA3 and SPEA2 methods. Both collections represent the best available approximations of Pareto-optimal solutions generated by our SPEA3 and SPEA2, respectively. Particular solutions from a given front are characterized by different levels of the optimized accuracy-interpretability trade-off, allowing the user to select a single solution (a specific FRBC) characterized by a desired level of compromise between accuracy and interpretability.
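A class-preserving 1:9 split of the kind described above can be sketched with the standard library alone. The function name `stratified_split`, the seed, and the toy labels are assumptions of the sketch; the authors' own partitioning code is not given in the text.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(labels: Sequence[Hashable],
                     learn_fraction: float = 0.1,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices into learning and test parts per class.

    Each class contributes the same fraction of its samples to the
    learning part, so the class proportions are preserved.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    learn: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * learn_fraction)
        learn.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(learn), sorted(test)


# Toy labels (made up): 40 satisfied, 60 neutral or dissatisfied passengers.
labels = ["satisfied"] * 40 + ["neutral or dissatisfied"] * 60
learn_idx, test_idx = stratified_split(labels, learn_fraction=0.1)
```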
Figure 3 shows that our SPEA3-based approach outperforms the SPEA2-based one by generating a collection of solutions characterized by a much-better-balanced distribution in the objective space (the solutions are distributed along the front in a much more even way). The accuracy- and interpretability-related numerical details of all SPEA3-based solutions from Figure 3 are collected in Table 3, in which the columns not defined earlier give the number of input attributes per rule and the percentages of correct decisions in the learning and test sets, respectively (the remaining parameters were defined earlier in the paper). Fuzzy rule bases of the first seven solutions from Figure 3 and Table 3 are presented in Table 4 and Table 5 (membership functions of fuzzy sets used in those rule bases are shown in Figure 4). The remaining three solutions, Nos. 8, 9, and 10 from Figure 3 and Table 3, have not been included in Table 5 because, compared with solution No. 7, they provide very small increases in the test-data accuracy (by 0.1%, 0.4%, and 0.5%, respectively; see Table 3) despite a significant increase in their complexity (see their interpretability measures in Table 3). Table 4 and Table 5 also reveal an interesting regularity, i.e., the fuzzy rule base of solution No. $i$ contains some rule(s) or extension(s) of some rule(s) from solution No. $i-1$ ($i = 2, 3, \ldots, 7$). Therefore, if higher accuracy is required, our approach adds some fuzzy rules or extends the already existing rules to provide a more detailed (and thus more accurate) description of the considered classification problem. Such regularity also confirms the internal integrity of our approach. The considered regularity is illustrated, from a slightly different angle, in Table 6, Table 7 and Table 8 (see, first, Part A of Table 6), in which each black square denotes the presence of a given input attribute in the fuzzy rule base of a given solution (FRBC). The first accuracy entry is the test-data accuracy of the system exclusively based on the most significant input attribute (attribute “Inflight entertainment”, system (solution) No. 1, accuracy 75.2%). Each subsequent entry is the accuracy increase following the inclusion of the 2nd, 3rd, etc. most significant attribute into the system. For instance, the inclusion of “Seat comfort” (the 2nd most significant attribute) yields a +2.0% increase in test-data accuracy and is related to system (solution) No. 2. In turn, the inclusion of “Type of travel” and “Inflight WiFi service” (the 3rd most significant attributes) gives a further 3.2% increase in test-data accuracy and is related to solution No. 3, etc.
The above reasoning is correct provided that there is no “overlapping” of some input attributes over others in the airline passenger satisfaction data. In order to verify that aspect of the airline data, we remove the so-far most significant attribute (i.e., “Inflight entertainment”) from the original data set and repeat the learning experiment in an analogous way. Its results are presented in Part B of Table 6, which gives the “Online boarding” attribute the most significant place. Clearly, “Online boarding”, which occupied a low position in the experiment of Part A, was “overlapped” by “Inflight entertainment”. In the next step, we remove “Online boarding” from the present data set and repeat the learning experiment (see Part C of Table 6), obtaining “Ease of online booking” as the most significant attribute at this stage. It occupied the second position in the experiments of Part B. Therefore, we can conclude that it was not “overlapped” by “Online boarding”. We repeat analogous experiments several times, i.e., removing the most significant attribute at a given stage and repeating the learning process on the reduced data; see Parts D, E, F of Table 7, and G, H, I of Table 8. In this way, we arrive at the final hierarchy of input attribute significance from the perspective of airline passenger satisfaction. It is shown in the left part of Table 9. The accuracy reported in Table 9 has the same meaning as in Table 6, Table 7 and Table 8, i.e., it is the test-data accuracy of the system exclusively based on a single attribute (listed to its left in Table 9).
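The remove-and-relearn procedure can be summarized as a simple loop. In the sketch below, `most_significant` is a hypothetical callback standing in for one complete learning experiment on the remaining attributes; it returns the top attribute and the test-data accuracy of the single-attribute system. The toy callback and its scores are made up and do not model the “overlapping” effect itself.

```python
from typing import Callable, List, Sequence, Tuple

Ranker = Callable[[Sequence[str]], Tuple[str, float]]


def significance_hierarchy(attributes: Sequence[str],
                           most_significant: Ranker,
                           n_stages: int) -> List[Tuple[str, float]]:
    """Repeatedly pick the top attribute, record it, and remove it."""
    remaining = list(attributes)
    hierarchy: List[Tuple[str, float]] = []
    for _ in range(min(n_stages, len(remaining))):
        best, accuracy = most_significant(remaining)  # one learning experiment
        hierarchy.append((best, accuracy))
        remaining.remove(best)                         # relearn without it
    return hierarchy


# Toy stand-in for the learning experiment (made-up scores).
toy_scores = {"attr_a": 0.70, "attr_b": 0.60, "attr_c": 0.50}


def toy_ranker(remaining: Sequence[str]) -> Tuple[str, float]:
    best = max(remaining, key=lambda name: toy_scores[name])
    return best, toy_scores[best]


hierarchy = significance_hierarchy(list(toy_scores), toy_ranker, n_stages=3)
```

In a real run, the ranking returned at each stage may differ from the previous one precisely because an attribute that was masked by the removed one can move up.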
The right part of Table 9 presents the results of the alternative approach by Patlolla [24]. The paper [24] is, to our knowledge, the only currently available reference addressing the considered data set (most probably because the data set was published only recently). The approach of [24] uses the SAS system to calculate the mean values of particular input attributes separately for both classes of passengers (satisfied and neutral or dissatisfied) and to build a decision tree that determines the attribute importance hierarchy. Figure 5 summarizes, in graphical form, details of our approach (the left part of the block scheme of Figure 5) and Patlolla’s alternative method [24] (the right part of that block scheme) for the purpose of their comparative analysis regarding the determination of the final hierarchy of input attribute significance. Data preprocessing, data partitioning, the learning process, selection of the most significant input attributes, and determination of the final hierarchy of attribute significance are the main stages of that task. Since the 9 most significant input attributes were selected in the alternative work [24], we select the same number of the most significant attributes. Although both approaches select “Inflight entertainment” as the most significant attribute (as shown in Table 9), our approach provides much deeper insight into the mechanisms encoded in the considered airline data set (see the collections of easily interpretable linguistic fuzzy classification rules presented in Table 4 and Table 5).
An important part of our work is the cross-validation experiment with the 1:9 learning-test data split ratio. Each single learning experiment starts with the generation of a Pareto-front approximation. Then, a single solution characterized, first, by the highest test-data accuracy and, second, by the highest interpretability is selected from that front approximation. The results from all partial experiments are averaged. The experiment is then repeated 10 times for different initializations of our approach. The averaged results are shown in Table 10. The only available alternative approach is the aforementioned decision tree of [24] with a 7:3 learning-test data split ratio. Figure 6 summarizes, in a form analogous to Figure 5, details of our approach (the left part of the block scheme of Figure 6) and the alternative method [24] (the right part of that block scheme) for the purpose of their comparative analysis in terms of the cross-validation-based test-data accuracy and interpretability. The following stages are distinguished in performing that task: data preprocessing, data partitioning, preparation of a single $k$-th experiment for 10-fold cross-validation (exclusively for our approach), the learning process, and calculation of the final results. The method of [24] is outperformed by our approach in terms of both the system’s accuracy and interpretability. Our SPEA3-based approach slightly outperforms its SPEA2-based counterpart in terms of accuracy, whereas both are characterized by comparable interpretability.
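The per-experiment selection rule (highest test-data accuracy first, highest interpretability second) followed by averaging can be sketched as below. The dictionary keys, function names, and toy numbers are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence

Solution = Dict[str, float]


def select_solution(front: Sequence[Solution]) -> Solution:
    """Pick the solution with the highest test accuracy; ties are broken
    by the highest interpretability (lexicographic comparison)."""
    return max(front, key=lambda s: (s["test_accuracy"], s["interpretability"]))


def average_results(selected: Sequence[Solution]) -> Solution:
    """Average the selected solutions over all partial experiments."""
    n = len(selected)
    return {key: sum(s[key] for s in selected) / n
            for key in ("test_accuracy", "interpretability")}


# Made-up front approximation from one partial experiment.
front: List[Solution] = [
    {"test_accuracy": 0.90, "interpretability": 0.5},
    {"test_accuracy": 0.92, "interpretability": 0.4},
    {"test_accuracy": 0.92, "interpretability": 0.6},
]
chosen = select_solution(front)
averaged = average_results([
    {"test_accuracy": 0.9, "interpretability": 0.5},
    {"test_accuracy": 0.7, "interpretability": 0.3},
])
```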
Concluding the experimental section of our work, it is worth emphasizing that the measurement of airline passenger satisfaction is a key factor for improving service quality in airline companies [9]. In turn, passenger-satisfaction-based service quality is a strategic tool for gaining competitive advantage [15]. Various specialized companies and agencies, such as J.D. POWER (see its latest report [28]) or the American Customer Satisfaction Index (see, e.g., its latest ACSI Travel Report [29]), carry out, process, and analyze airline passenger satisfaction surveys to target performance activities that, by attracting more passengers, have a direct impact on profits and reputation.
As far as attributes affecting airline passenger satisfaction are concerned, “In-Flight Wi-Fi Service” and “Simplicity of Online Booking” (an attribute analogous to “Ease of Online Booking” in our research) have been identified in [30] as those which should be optimized by airlines. According to [31], “F&B” (i.e., catering service) and “In-flight entertainment” are the principal attributes that affect passenger satisfaction. In turn, four attributes, i.e., “Ease of online booking”, “e-ticketing”, “Boarding” (attributes analogous to “Online boarding” in our research), and “Clearance time”, have been selected in [32] as significant and important factors considered by airline passengers.
In contrast to other methods, which formulate various sets of attributes affecting airline passenger satisfaction, our fuzzy-genetic business-intelligence approach discovers from a huge set of representative data not only a collection of the most important attributes but also a hierarchy of their significance. Moreover, our approach precisely (in the percentage values shown in Table 6, Table 7, Table 8 and Table 9) determines the level of significance of particular attributes from the airline passenger satisfaction perspective. In order to discover the real significance hierarchy of input attributes, the effect of possible “overlapping” of some input attributes over others is also analyzed. Furthermore, our approach generates collections of linguistic fuzzy rules (shown in Table 4 and Table 5) that provide a precise and easily interpretable insight into the mechanisms “connecting” the selected input attributes with airline passenger satisfaction or dissatisfaction. For instance, our approach is able to discover in the airline data not only rather obvious mechanisms (see, e.g., rule No. 5 in solution No. 5: IF Seat comfort is high AND Flight distance is long THEN Passenger is satisfied) but also less obvious ones (see, e.g., rule No. 10 in solution No. 6: IF Inflight WiFi service is low or medium AND Customer type is loyal customer AND Ease of online booking is high AND Leg room service is high THEN Passenger is satisfied). The latter rule says that for loyal customers, WiFi services are not significant when online booking and leg room services are of high quality; perhaps this rule relates to older passengers who usually do not use WiFi devices. Another such example is rule No. 9 in solution No. 5: IF Inflight entertainment is high AND Inflight WiFi service is low or medium AND Customer type is disloyal customer AND Flight distance is short THEN Passenger is neutral or dissatisfied. This rule says that high-quality inflight entertainment is not enough to satisfy disloyal passengers travelling short distances when the quality of WiFi services is low or medium; perhaps this rule concerns young passengers travelling short distances and interested only in inflight WiFi services.
5. Conclusions
This paper presents the application of our MOEOA-based knowledge-discovery business-intelligence technique (fuzzy rule-based classification systems) characterized by genetically optimized interpretability-accuracy trade-off to decision support related to airline passenger satisfaction problems. These problems include, first, discovering—in an automatic way—in a large and representative set of data describing airline passenger satisfaction, optimized collections of linguistic, fuzzy classification rules uncovering the “connections” of input attributes with airline passenger satisfaction or dissatisfaction. Second, the considered problems include determining, in a precise and quantitative way, the level of significance (and thus, the formulation of the significance hierarchy) of particular input attributes from the airline passenger satisfaction perspective. Moreover, in order to discover the real significance hierarchy of input attributes, the effect of possible “overlapping” of some of them over the other ones is carefully analyzed.
The main theoretical contribution of our work, in general, consists in introducing our modern MOEOA-based fuzzy-genetic business-intelligence approach with optimized interpretability-accuracy trade-off to broadly understood airline passenger satisfaction decision support. The interpretability and transparency (i.e., the ability to provide the user with compact and understandable explanations and justifications of the decisions proposed) and the accuracy (i.e., the ability to generate precise and correct decisions) are the fundamental aspects of the operation of any decision support systems including those in the aviation industry. On the other hand, compact, linguistic, fuzzy classification rules—due to their easy-to-grasp interpretation and readability—belong to the most effective knowledge-representation schemes in the considered and also in many other domains.
The main experimental contribution of our work is twofold. First, it is the application of our approach to the recently published airline passenger satisfaction data set, accessible at Kaggle’s repository and containing 259,760 records. The aspects already listed in the first paragraph of this Conclusions section have been addressed. Second, by means of cross-validation-based experiments, we show that our approach outperforms the alternative method of [24] in terms of both the interpretability and accuracy of the solutions obtained (the paper [24] is, to our knowledge, the only available reference addressing the considered and recently published airline passenger satisfaction data set). We also hope that the findings of this research provide insights that could be used by managers and practitioners from the aviation industry in defining service strategies and policies that improve airline passenger satisfaction and, consequently, airline reputation and profits.
Our further work will concentrate on two aspects. First, we intend to investigate additional attributes characterizing airline passenger satisfaction. They include “Disembarkation efficiency”, pointed out in [9] as the most significant attribute characterizing the flight stage immediately after landing, as well as “Announcement of delay and arrival”, “Degree of courtesy of staff”, and “Adult cost”, indicated in [33] as important attributes with which passengers are most dissatisfied. Second, we intend to concentrate on improving the systems’ interpretability-accuracy trade-off optimization, which is essential for generating highly interpretable and accurate modern intelligent decision systems (cf. explainable artificial intelligence [34, 35] or interpretable machine learning [36, 37]).