Prediction of Willingness to Pay for Airline Seat Selection Based on Improved Ensemble Learning

Abstract: Airlines have launched various ancillary services to meet their passengers' requirements and to increase their revenue. Ancillary revenue from seat selection is an important source of revenue for airlines, and the service is commonly promoted through advertisements. However, advertisements are generally delivered to all customers, including a significant proportion of people who do not wish to pay for seat selection. Random advertisements may thus decrease the amount of profit generated since users will tire of useless advertising, leading to a decrease in user stickiness. To solve this problem, we propose a Bagging in Certain Ratio Light Gradient Boosting Machine (BCR-LightGBM) to predict the willingness of passengers to pay to choose their seats. The experimental results show that the proposed model outperforms all 12 comparison models in terms of the area under the receiver operating characteristic curve (ROC-AUC) and F1-score. Furthermore, we studied two typical samples to demonstrate the decision-making phase of a decision tree in BCR-LightGBM and applied the Shapley additive explanation (SHAP) model to analyse the important influencing factors to further enhance the interpretability. We conclude that the customer's values, the ticket fare, and the length of the trip are three factors that airlines should consider in their seat selection service.


Introduction
With the development of the airline business, ancillary services [1][2][3][4][5][6][7] that satisfy passengers' personal requirements are becoming increasingly important for airlines. Ancillary revenue already plays a vital role in airline profit and greatly increases the amount of extra financial revenue for airlines. By improving the quality of ancillary services, airlines increase user satisfaction [8,9] and customer stickiness [2,7], which enhances their competitiveness and prevents homogeneity. Due to the worldwide spread of COVID-19, the global airline market has shrunk dramatically [10][11][12][13][14][15][16][17]. Airline companies are thus urgently seeking extra profit to reduce fiscal pressure, leading to more intense competition based on ancillary services.
Airline ancillary revenue refers to income beyond the ticket fare and acts as a directly recommended service or implicit travel experience. Ancillary services are rapidly growing due to the fast-growing airline market (2007∼2018) and the impact of COVID-19 (2019∼2021). Ancillary revenue [18] greatly increased from $2.1 billion to $35.2 billion for the top 10 airlines within 12 years (2007∼2018). The significant growth in airline business in these years brings a great potential market for ancillary service.
Owing to the impact of the pandemic, the airline market faced a dramatic regression (2019∼2021), compelling airlines to seek revenue other than from flight tickets [12,14]. Therefore, establishing ancillary services is important for airlines because these services increase the airline's revenue. This also serves as an approach to solving the problem of customer churn and ensuring that resources are adequately utilized.

The main contributions of this paper are as follows:

1. We study the seat selection service from the perspective of passengers' willingness to pay, create new features from the original dataset, and propose an ensemble model, named BCR-LightGBM, to predict the willingness of passengers to pay for seat selection.
2. The experimental results show that BCR-LightGBM outperformed all 12 comparison models in terms of the AUC and F1-score.
3. We demonstrate the rules learned by BCR-LightGBM by visualizing the decision-making phase of two typical samples and analyse the important factors based on the SHAP model.

Airline Ancillary Service
As the representative of high-level transportation, airlines are expected to acquire extra revenue from ancillary services and to satisfy passengers' personal requirements. How to increase their ancillary revenue has become a research hotspot for airlines. Chen et al. [23] studied passenger value on the air market between the Taiwan region and China's mainland. The results demonstrated that business travellers are less likely to perceive a trade-off compared with non-business travellers. Correia et al. [24] studied customers' preferences for ancillary services provided by low-cost airlines.
The research found that low-cost passengers are sensitive to the price of ancillary services. O'Connell et al. [1] employed an online survey to research the preference of travellers to ancillary service. They found that airport car parking and checked baggage charges were the most acceptable services. Wittmer et al. [20] studied the customer value and ancillary services based on a European network carrier's economy class.
The research revealed that the key point of passengers' willingness to pay for ancillary services is their perception of the importance of these services. Warnock-Smith et al. [2] investigated the relation between the willingness of passengers to pay for ancillary services and the pricing of the service. The work found that passengers prefer to choose necessary services that enhance the travel experience, e.g., seat selection, instead of optional services.
These studies examined ancillary services overall instead of focusing on a particular service. Han et al. [25] analysed the role of in-flight food and beverage in re-flying intention. Specifically, the quality of food and beverage, the reasonableness of the price, the airline image, and satisfaction were positive factors that influence re-flying intention. Klislinar et al. [4] analysed the relation between four main factors and the revenue generated from ancillary services based on a survey for Garuda Indonesia customers.
The results showed that passengers valued unbundled products more. Chiambaretto et al. [5] researched the willingness of air passengers to choose ancillary services on long-haul airlines. Their analysis of five ancillary services, i.e., checked baggage, in-flight meal, seat selection, priority boarding, and onboard Wi-Fi, revealed that leisure passengers were more likely to pay for extra services.
Influenced by the COVID-19 pandemic, the strategies for ancillary services by airlines have greatly changed. Various dynamic pricing strategies on airline ancillary services have been proposed to further increase the amount of extra revenue, mitigating the shortage of funds [15]. Vinod et al. [12] proposed an airline revenue planning method, including scheduling, airline pricing, and revenue management, to mitigate the volatility of airline revenue due to COVID-19.
Shukla et al. [26] proposed a dynamic, customer-specific pricing recommendation framework to increase the revenue of airline ancillary services. Compared with human rule-based approaches, the framework dramatically improved the expected revenue in online testing, showing the great potential of machine learning in decision-making.
Kolbeinsson et al. [7] proposed a dynamic and personalized pricing strategy based on flight characteristics and customer needs. The system greatly surpassed human-curated rules over six months of live testing. Zhao et al. [6] analysed the passenger's willingness to pay for ancillary services through pricing strategies. By analysing the relationship between the pricing of services and the willingness to pay, that paper further proposed a dynamic pricing model for ancillary services to increase the extra revenue. Moreover, Shaw et al. [27] studied how to increase revenue from third-party ancillary services, further increasing the number of sources of ancillary revenue.

Seat Selection
Profit from payable seat selection occupies a great proportion of ancillary profit. Rouncivell et al. [3] utilized UK domestic flights to study the willingness to pay for airline seat selection. They found that ticket fare was an important factor for both business and non-business travel and that passengers who chose the service in the past were more likely to choose it again.
Shao et al. [28] analysed five intercontinental routes from major European airlines to propose a statistical model for advanced seat reservation. The results showed that passengers generally avoid middle seats and prefer to sit in the first row, which provided an empirical foundation for seat selection services.
Zhou et al. [29] focused on the Chinese market to analyse the influencing factors for seat selection in economy class. They concluded that the length of the trip, the seat comfort and convenience, and payment and consumption situations greatly influenced the willingness of passengers to pay.
Yoon et al. [30] focused on customers' demands being uncertain and analysed how to maximize airline revenue by providing payable upgrade options, especially for seat assignment problems. This work analysed the willingness of passengers to pay and proposed some suggestions for airlines. However, to the best of our knowledge, no research has focused on the prediction of passengers' willingness to pay for seat selection.
If passengers who are willing to pay cannot be precisely predicted and targeted by advertisements, airline profits may decrease since customers may tire of random advertising. To solve this problem, in this paper, we propose a model to predict passengers who are willing to pay for seat selection and provide corresponding recommended services to them.

LightGBM
LightGBM (Light Gradient Boosting Machine) [31] is an improved Gradient Boosting Decision Tree (GBDT) [32] combining the decision tree and boosting methods. The essence of GBDT is to take the value of the negative gradient of the loss function in the current model as the approximation of the residual and to iteratively train multiple decision trees according to that value. However, the traditional GBDT model has some shortcomings, e.g., difficult parallelization, high computational cost, and poor suitability for high-dimensional sparse data.
LightGBM overcomes the shortcomings of traditional GBDT by supporting parallelized training to achieve fast speeds and low memory consumption when processing huge amounts of data. The biggest difference between LightGBM and other GBDT models is that the other models pre-sort the feature values and find the optimal division point according to the sorting result. This implementation is simple but difficult to optimize. When the dataset is large, the training process occupies a great deal of memory, which leads to a waste of CPU cycles and reduces the training speed. To solve this problem, LightGBM applies a histogram-based algorithm that discretizes the numerical features into K discrete bins and finds the split point from the statistics accumulated in these bins.
LightGBM utilizes a sampling algorithm named Gradient-based One-side Sampling (GOSS) to reduce the number of instances. The algorithm excludes most low-gradient samples and calculates the information gain from the remaining samples, maintaining the performance of the model when the training dataset is reduced. To further improve the computational speed, LightGBM applies a bundle method, Exclusive Feature Bundling (EFB), which combines features that have small conflicts or are totally exclusive to reduce the number of features. Although the algorithms mentioned above greatly reduce the computational cost, the performance of the model also decreases. To handle this issue, LightGBM introduces a leaf-wise splitting mechanism, which effectively reduces the loss and increases the precision.
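The sampling step of GOSS can be illustrated with a minimal sketch. It follows the general idea described above (keep the instances with the largest gradients, randomly sample from the rest, and re-weight the sampled instances), but it is not LightGBM's actual implementation; the function name `goss_sample` and the parameters `top_rate` and `other_rate` are chosen here for illustration only.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) selected by a GOSS-style sampling step.

    Illustrative sketch only: names and default rates are assumptions.
    """
    n = len(gradients)
    rng = random.Random(seed)
    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top = order[:n_top]    # large-gradient instances are always kept
    rest = order[n_top:]   # small-gradient instances are only sampled
    sampled = rng.sample(rest, min(n_other, len(rest)))
    # Amplify the sampled small-gradient instances so that their total
    # contribution to the gain estimate is roughly preserved.
    amplify = (1.0 - top_rate) / other_rate
    indices = top + sampled
    weights = [1.0] * len(top) + [amplify] * len(sampled)
    return indices, weights
```

With 100 instances and the default rates, the sketch keeps the 20 largest-gradient instances and 10 randomly chosen others, the latter weighted by (1 - 0.2) / 0.1 = 8.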

Methodology
Predicting the willingness to pay for airline seat selection is an urgent problem. We utilize real air passenger history provided by Neusoft (described in Section 4.1) to predict this willingness, since such data are held by airlines and are hard to collect from individuals. First, we construct and select new features to overcome the data sparsity and the curse of dimensionality. Then, we propose Bagging in Certain Ratio LightGBM (BCR-LightGBM) to solve the issue of imbalance.

Feature Construction
To overcome the problem of data sparsity, we construct new features on the basis of the original dataset. There are three types of data in the dataset, i.e., date, numeral, and category; we apply different transformations on each. For the date and time, e.g., "16 December 2018 20:00", we construct two features to indicate the season, a piecewise function of the month x of the flight, and the time period, a piecewise function of the hour x of the flight, whose cases include

$$\text{period}(x) = \begin{cases} \dots, & x \in \{\dots, 13, 14, 15, 16, 17\} \\ c, & x \in \{18, 19, 20, 21, 22, \dots\} \end{cases}$$

where x is the hour of the flight.
For numerical and categorical features, the names of the features are divided into two parts, i.e., characteristic (prefix) and time interval (suffix). The characteristic denotes the history of the passenger or the inherent property of the flight. For instance, "dist_all_cnt" indicates the total mileage of a passenger and "pax_fcny" indicates the fare of the flight ticket. The time interval denotes the time scope of the characteristic, e.g., "dist_all_cnt_m3" represents the total mileage of a specific passenger collected from three months ago to the current time. The time interval includes five scopes, i.e., 3 months, 6 months, 1 year, 2 years, and 3 years. For simplicity, we call the characteristics prefixes and time intervals suffixes in the following.
We observe that the issue of sparsity is severe for both numerical and categorical features. To overcome the sparsity for the numerical features, we directly conduct statistics-based transformation on the original numerical features, i.e., maximum, minimum, mean, and variance. On the basis of the transformation, we improve the interpretability of each numerical feature. The newly formed features are named "prefix_max", "prefix_min", "prefix_mean", and "prefix_std" for each prefix.
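The statistics-based transformation can be sketched as follows. This is a minimal illustration, not the authors' code: the suffix labels other than "m3" and "y1" are assumed from the five time scopes listed above, the helper name `build_stat_features` is hypothetical, and the population standard deviation is used for "prefix_std" as one possible reading of the text, which mentions variance.

```python
from statistics import mean, pstdev

# Assumed suffix labels for the five time scopes (3 and 6 months, 1 to 3 years).
SUFFIXES = ("m3", "m6", "y1", "y2", "y3")


def build_stat_features(row, prefixes, suffixes=SUFFIXES):
    """Aggregate each prefix over its time-interval columns.

    `row` maps column names such as "dist_all_cnt_m3" to numbers;
    missing or None entries are skipped, which is how sparsity is handled here.
    """
    features = {}
    for prefix in prefixes:
        keys = [f"{prefix}_{suffix}" for suffix in suffixes]
        values = [row[k] for k in keys if row.get(k) is not None]
        if not values:
            continue  # nothing observed for this prefix
        features[f"{prefix}_max"] = max(values)
        features[f"{prefix}_min"] = min(values)
        features[f"{prefix}_mean"] = mean(values)
        features[f"{prefix}_std"] = pstdev(values)
    return features
```

For example, a passenger with "dist_all_cnt" values 1, 2, 3, 4, 5 across the five scopes yields a maximum of 5, a minimum of 1, and a mean of 3, so five sparse columns collapse into four dense ones.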
For each categorical feature, we define two sub-flags, named secondary indexes, to indicate the relationship between the target and the feature. The secondary indexes are represented as "prefix_T" and "prefix_F", where "T" denotes a passenger paying for seat selection services and "F" denotes a passenger who does not. The construction rule is as follows:

1. According to the target, all values are divided into two sets for each categorical feature. These two sets are denoted as S_0 and S_1, which contain values {x | x ∈ {0, 1}}.
2. If S_0 or S_1 contains 0, delete it from the set.
3. Construct two new features, "prefix_T" and "prefix_F", based on the transformation rule, where x represents the values of a sub-label and prefix is the prefix of the feature.

Feature Selection
In addition to sparsity, the dataset suffers from the curse of dimensionality. In this section, we apply Pearson correlation coefficient-based [33] and chi-square test-based [34] feature selection techniques to reduce the dimension of numerical and categorical features, respectively. To select the numerical features, we perform the process below:

1. Calculate the Pearson correlation coefficient between every pair of features and sort the pairs in descending order of the coefficient.
2. Set a preserve threshold and a delete threshold to indicate the state of features.
3. For each feature pair (a, b), if the correlation coefficient is less than the preserve threshold, we set a to the preserve state; if the correlation coefficient is greater than the delete threshold, we set a to the delete state.
4. If feature a is in the delete state, delete it unless b is in the delete state.
5. If feature a is in the preserve state, delete feature b unless b is in the preserve state.
6. If features a and b are in neither the delete state nor the preserve state, delete the feature with the smaller variance.
For each categorical feature, we validate the mutual independence between it and the target through a chi-square test. If the feature is independent from the target, we directly delete the feature.
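The two selection criteria can be sketched in a few lines. The code below is a simplified illustration rather than the exact six-step procedure: it uses a single, assumed `delete_threshold` on the absolute correlation and always drops the lower-variance feature of a highly correlated pair, and it computes only the chi-square statistic, without a p-value or continuity correction. All function names are hypothetical.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(vx * vy)


def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def prune_correlated(columns, delete_threshold=0.9):
    """Greedy pruning: for each pair with |r| above the threshold,
    drop the feature with the smaller variance (simplified variant)."""
    kept = dict(columns)
    names = list(columns)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in kept and b in kept:
                if abs(pearson(kept[a], kept[b])) > delete_threshold:
                    drop = a if variance(kept[a]) < variance(kept[b]) else b
                    del kept[drop]
    return list(kept)


def chi_square_stat(feature, target):
    """Chi-square statistic of the feature-by-target contingency table."""
    n = len(feature)
    observed = Counter(zip(feature, target))
    f_totals, t_totals = Counter(feature), Counter(target)
    stat = 0.0
    for f, f_count in f_totals.items():
        for t, t_count in t_totals.items():
            expected = f_count * t_count / n
            stat += (observed[(f, t)] - expected) ** 2 / expected
    return stat
```

For a binary feature and a binary target (one degree of freedom), a statistic below the 5% critical value of about 3.841 would be consistent with independence, in which case the feature would be deleted.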

BCR-LightGBM
Predicting whether air passengers are willing to pay for seat selection is a binary classification task. In this section, we illustrate the structure of Bagging in Certain Ratio LightGBM (BCR-LightGBM). To ensure robustness, multiple LightGBMs are assembled through a bagging method [35]. A bagging ensemble consists of multiple models, each trained on a subset extracted from the original dataset through bootstrap sampling. The result is obtained by applying an average or voting strategy. By utilizing bagging, the effectiveness and stability are improved and the variance of the model is lowered. However, bootstrap sampling does not change the data distribution; thus, it cannot overcome the data imbalance in the original dataset.
To mitigate the imbalance of the original dataset, we only sample the negative samples in the sampling phase and then combine them with the positive samples to create the subset. Note that the ratio between positive and negative needs to be pre-assigned since different ratios lead to different results. Then, each LightGBM is trained through the subsets sampled, and the prediction is the average of their results. The training process of BCR-LightGBM is shown in Algorithm 1.
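The training scheme described above can be sketched as follows. This is a hedged reading of the text, not a reproduction of Algorithm 1: it assumes that negatives are drawn without replacement, that labels are 0 and 1, and that each base learner exposes scikit-learn-style `fit` and `predict_proba` methods. The names `bcr_subsets`, `bcr_fit`, `bcr_predict_proba`, and `make_model` are hypothetical.

```python
import random


def bcr_subsets(y, ratio=3, n_models=100, seed=0):
    """Yield index lists containing all positives plus `ratio` times
    as many randomly sampled negatives (assumed: without replacement)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    n_neg = min(len(neg), ratio * len(pos))
    for _ in range(n_models):
        yield pos + rng.sample(neg, n_neg)


def bcr_fit(X, y, make_model, ratio=3, n_models=100, seed=0):
    """Train one base learner per ratio-controlled subset."""
    models = []
    for idx in bcr_subsets(y, ratio, n_models, seed):
        model = make_model()
        model.fit([X[i] for i in idx], [y[i] for i in idx])
        models.append(model)
    return models


def bcr_predict_proba(models, X):
    """Average the positive-class probability over all base learners."""
    per_model = [[row[1] for row in m.predict_proba(X)] for m in models]
    return [sum(col) / len(models) for col in zip(*per_model)]
```

Here `make_model` is any zero-argument factory; to mirror the paper it could return a LightGBM classifier, although that dependency is deliberately left out of the sketch. Every positive sample is seen by every base learner, while each negative sample is seen by only some of them.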

Experimental Results
In this section, we compare the proposed BCR-LightGBM against various machine learning algorithms and sampling-based methods. Then, we illustrate the decision-making procedure of a decision tree in BCR-LightGBM on two real samples to demonstrate the learned mode of the model. Furthermore, we analyse the feature importance through a SHAP model to improve the interpretability. Extensive experiments are conducted on a 64-bit Ubuntu 16.04 operating system. The hardware environment is as follows: CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20 GHz; memory: 64 GB RAM; graphics: GeForce GTX 1080 Ti.

Data Description
This paper uses the dataset of air passenger willingness to pay for seat selection provided by Neusoft http://fwwb.org.cn/attached/file/20201211/20201211132638_47 .zip (accessed on 29 November 2021), consisting of flight information, passenger history, and customer characteristics, which are shown in Table 1. The dataset comprises 23,432 samples, and the feature dimension is 657, which raises the risk of the curse of dimensionality [36]. Note that the dataset contains features with the same prefix.
Positive samples in the dataset indicate the people who paid for seat selection, and negative samples represent the people who did not. Note that the dataset is extremely imbalanced [37,38], with a ratio between positive and negative samples of 1:15. This situation is common for ancillary services since the majority of people do not choose extra services even though ancillary profits dramatically aid airlines.

Metrics
In this section, we introduce the five metrics, i.e., Accuracy, Precision, Recall, F1-score, and ROC-AUC, used to evaluate the performance of the model. Accuracy is the simplest metric, which is defined as the number of correct predictions divided by the total number of predictions, indicating the proportion of correct predictions. Precision and Recall are two mutually influencing indicators, where Precision indicates the correctness of the positive predictions and Recall indicates the prediction performance for users who are willing to pay for seats. However, there are many cases in which these metrics are not good enough to indicate the model performance.
One such scenario is when the class distribution is imbalanced, as is the case in our experiment. Here, even if the model predicts all samples as the most frequent class, it obtains a high Accuracy. However, the model is not learning anything and is simply predicting every sample as the majority class. For the dataset used in the experiment, where the negative class accounts for around 93.7% of the samples, a model that predicts all instances as negative achieves 93.7% Accuracy.
To cover the shortcomings of these metrics and better indicate the model performance, we further introduce the F1-score and ROC-AUC (area under the receiver operating characteristic curve). The F1-score combines Precision and Recall into a single metric for cases in which both Precision and Recall are important. The indicator is the harmonic mean of Precision and Recall and thus reflects a trade-off between them; it is generally applied to indicate the overall performance of a model when the dataset is imbalanced.

ROC-AUC indicates the area under the ROC (receiver operating characteristic) curve, where the ROC is used to show the performance of a binary classifier. Specifically, the ROC-AUC is an aggregated measure of the performance of a binary classifier over all possible threshold values. Thus, the indicator is not sensitive to the choice of threshold. When the ratio between positive samples and negative samples changes, the ROC-AUC value does not change dramatically. The threshold-based metrics are defined as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad \text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}},$$

where TP is the number of instances correctly classified as positive, TN is the number of instances correctly classified as negative, FP is the number of instances incorrectly classified as positive, and FN is the number of instances incorrectly classified as negative.
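The following minimal sketch computes the five metrics from labels and predictions, using the standard definitions in terms of TP, TN, FP, and FN. The function names are illustrative, and the ROC-AUC is computed with the pairwise-ranking formulation (the probability that a random positive is scored above a random negative), which is equivalent to the area under the ROC curve but quadratic in the number of samples.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Undefined ratios (no positive predictions or no positives) are set to 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# A 1:15 class ratio with an all-negative prediction, as discussed above.
y_true = [1] + [0] * 15
all_negative = [0] * 16
print(classification_metrics(y_true, all_negative))
```

On this toy input the Accuracy is 15/16 = 0.9375 while Precision, Recall, and the F1-score are all 0, which is why Accuracy alone is misleading on the imbalanced dataset.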

Comparison Models
In this section, we introduce various comparison models used in the experiments, including machine-learning methods and sampling-based methods.

1. LR (Logistic Regression) is a simple linear model that can be easily interpreted, where the performance greatly relies on feature engineering.
2. KNN (K-Nearest Neighbours) is a learning-free model that classifies a sample based on the k-nearest samples in the feature space.
3. SVM (Support Vector Machine) is not sensitive to outliers due to the inherent properties of support vectors. However, the kernel function should be carefully designed to fit the input space.
4. AdaBoost (Adaptive Boosting) is a boosting method that dynamically adjusts the weight of each base learner to improve the robustness.
5. GBC (Gradient Boosting) is a boosting method in which the objective is to find the optimal solution in the parameter space by fitting the residual error of a previous learner.
6. RF (Random Forest) adopts bagging to improve the robustness, where the decision tree is a base learner that has been widely used in various fields.
7. XGBoost (eXtreme Gradient Boosting) [39] is an extension of GBC that achieves better performance and scalability.
8. LightGBM (Light Gradient Boosting Machine) [31] is an extension of GBC. Compared with XGBoost, LightGBM is faster and lighter. Note that LightGBM is the base learner in BCR-LightGBM.
9. RUS (Random Under Sampling) randomly samples negative instances until the number is the same as that of positive instances.
10. ROS (Random Over Sampling) randomly samples positive instances until the number is the same as that of negative instances.
11. SMOTE (Synthetic Minority Over-Sampling Technique) [40] is an over-sampling method that creates synthetic instances for minorities based on the nearest neighbours.
12. SMOTE-ENN (Synthetic Minority Over-Sampling and Edited Nearest Neighbours) [41] is the combination of SMOTE and ENN, which applies ENN to clean the samples created by SMOTE.
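The two random sampling baselines in the list above (RUS and ROS) can be sketched as index-selection routines. This is an illustrative sketch with hypothetical function names; it assumes binary labels in which the positive class (1) is the minority, as in this dataset.

```python
import random


def random_under_sample(y, seed=0):
    """RUS: keep all positives and an equal number of random negatives."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    return sorted(pos + rng.sample(neg, len(pos)))


def random_over_sample(y, seed=0):
    """ROS: keep everything and duplicate random positives until balanced."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    extra = [rng.choice(pos) for _ in range(max(0, len(neg) - len(pos)))]
    return sorted(pos + extra + neg)
```

RUS discards most negative instances, whereas ROS repeats positive instances, so any outlier among the positives is repeated as well.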

Comparative Analysis
To demonstrate the superiority of BCR-LightGBM, we compare the performances of the model against existing machine-learning methods and sampling methods in this section. For the proposed BCR-LightGBM, we set the ratio between positive samples and negative samples to 1:3. Table 2 shows the performance comparison between the proposed BCR-LightGBM and machine-learning models without sampling. Note that BCR-LightGBM outperforms all methods in terms of the F1-score and AUC, which are two widely used indicators when the dataset is imbalanced. Furthermore, BCR-LightGBM achieves the narrowest gap between Precision and Recall. For other compared methods, the Precision is much higher than the Recall, indicating that these models only find a small set of passengers who are willing to pay for seat selection when reducing the error rate.
In other words, these models can only identify the people who are the most likely to pay for seat selection. However, this limitation is unnecessary for airlines, as the cost of advertising is not so high that messages cannot be delivered to a relatively large set of people. BCR-LightGBM achieves a desired Recall and an acceptable Precision, satisfying the requirements of airlines.
Table 3 shows the comparative results between BCR-LightGBM and sampling-based methods. The proposed model achieves the best score in terms of Accuracy, Precision, F1-score, and AUC. Note that sampling-based methods are generally better than machine-learning models, which is reflected in the narrower gap between Precision and Recall. Furthermore, the performance of under-sampling-based methods (RUS and SMOTE-ENN) is worse than that of over-sampling-based methods (SMOTE and ROS) because under-sampling-based methods drop a large number of instances in the original dataset, so the model cannot effectively learn the characteristics of the discarded samples and the possibility of under-fitting increases.
The over-sampling-based methods create new samples to mitigate the issue of imbalance, improving the performance even though noise is introduced. Although BCR-LightGBM applies an under-sampling method, its performance is better than that of over-sampling-based models since the ensemble strategy is utilized.
To further illustrate the performance of BCR-LightGBM, we plot the ROC (receiver operating characteristic) curves of machine-learning methods and sampling-based methods, as shown in Figure 1. In Figure 1a, we compare the proposed model against various machine-learning methods, and, in Figure 1b, we compare BCR-LightGBM with LightGBM based on different sampling strategies. The ROC curve demonstrates the trade-off between the true positive rate (Recall) and the false positive rate. We note that the curve of BCR-LightGBM wraps around all other curves, indicating that the proposed model surpasses all other methods. Note that the superiority of BCR-LightGBM is derived from the ability to correctly learn the relationship between important factors, a relationship that the other models fail to learn. We attempt to analyse this from the perspective of model capability. A simple linear model, i.e., LR, cannot perform feature crossing, which limits its capability to learn the relation between features. KNN classifies samples through k-nearest neighbours in the original feature space, and the correlation between features cannot be identified.
Although SVM is not sensitive to outliers, it also cannot perform feature crossing, and the kernel function needs to be carefully designed. AdaBoost dynamically changes the weight of the base learner; however, the weight is sensitive to the data distribution. GBC finds the optimal solution through gradient descent, which is dramatically influenced by an imbalance in the dataset. Although RF, XGBoost, and LightGBM achieve great performances in various fields, they are still weak when dealing with data imbalances.
In summary, these models cannot solve or are weak when solving the issue of data imbalances, mainly in learning information from negative samples, which leads these models to not correctly find the relationship between important factors. However, the proposed BCR-LightGBM sets the ratio between the positive samples and negative samples at a certain value, mitigating the impact of negative samples and achieving the best performance.
For sampling-based methods, RUS causes information loss by dropping existing samples, and ROS magnifies the impact of outliers in positive samples. SMOTE creates new samples based on the samples in the dataset but introduces noise. Although SMOTE-ENN leverages ENN to clean the samples generated by SMOTE, the data distribution may be further distorted due to the lack of prior data. BCR-LightGBM applies random under-sampling to avoid extra noise and uses an ensemble approach to learn all information in the dataset, thereby avoiding information loss and improving the robustness.

Hyperparameter Analysis
To further analyse the performance of BCR-LightGBM, we conducted experiments to analyse two important hyperparameters, i.e., the ratio between negative samples and positive samples, and the number of LightGBMs in the model. Figure 2 shows the performance of BCR-LightGBM under different ratios between negative samples and positive samples. In the figure, α is the ratio between negative samples and positive samples. When α = 1, the number of positive samples is equal to the number of negative samples. For a fair comparison, the number of LightGBMs is set at 100. Note that, if the ratio is too large, the model does not learn the correct relationship between features, thus, causing serious model degradation.
We observe that, when the ratio between negative samples and positive samples is 3:1, the model achieves the best performance in terms of the F1-score and AUC. In fact, as the ratio increases, the F1-score and AUC decrease dramatically and the gap between Precision and Recall widens. We do not use Accuracy to indicate the performance of the model because, when the data distribution is imbalanced, the indicator remains high even if all samples are predicted to be negative.
Figure 3 shows the impact of the number of LightGBMs (β) on the model performance. For simplicity, we select ROC-AUC as the indicator from the five metrics. To completely demonstrate the impact of the number of base classifiers, i.e., LightGBMs, on BCR-LightGBM, we conduct experiments with the ratio of negative samples to positive samples in {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}. From the figure, we observe that increasing the number of LightGBMs greatly improves the performance of the model, and that the model obtains a desirable performance gain even when the number of LightGBMs is relatively small. Moreover, as the number increases, the model shows strong robustness because it rapidly converges even though slight fluctuations occur. Note that the performance of BCR-LightGBM generally converges when the number of LightGBMs is less than 50, thereby demonstrating the robustness and stability of the model.

Importance Analysis of Influencing Factors
In addition to comparing the performances between BCR-LightGBM and existing models, we also attempt to explain the rules learned by the model. Figure 4 illustrates a decision tree in the proposed model. We set the number of tree layers to seven to improve the capability; for simplicity, we only visualize the top four layers. We note that the flight cabin (seg_cabin) is located at the top of the tree, indicating that the cabin is the most discriminative factor for seat selection in terms of the Gini index. Moreover, we find that some flight information, e.g., the tax of the ticket (pax_tax) and the month of travel (seg_dep_month), influences passengers' willingness to pay.
In addition, the flight history, which reflects the passenger's value, also contributes to the willingness to pay, e.g., the number of paid seat selections (select_seat_cnt_max), number of seats by the window (select_window_cnt_var), number of international tickets (tkt_i_amt_max), point additions from airline mile accumulation (pit_add_air_cnt_y1), number of economy class flights (cabin_y_cnt_max), number of first-class flights (cabin_f_cnt_max), and member level (member_level).
To illustrate the rules learned by the model, we selected two typical samples from the dataset to simulate decision-making by the model, as shown in Table 4. The positive sample is a person who pays to select a seat, and the negative sample is one who does not. The features in the table match the corresponding split points in Figure 4. For the positive sample, we observe that the passenger is a frequent flyer since the values of the corresponding factors are high, e.g., the number of economy class flights (cabin_y_cnt_var), the number of first-class flights (cabin_f_cnt_max), and the total amount of international flight mileage accumulated (tkt_i_amt_max, tkt_i_amt_min).
The passenger always pays to select a seat (select_seat_cnt_max) and prefers to sit by the window (seat_window_cnt_var). Thus, the positive sample has a high customer value. We assume that the passenger generally takes business trips due to the frequency of flights and the willingness to pay for seat selection. Intuitively, a business traveller is generally willing to pay for seat selection to acquire a seat that provides better rest. The model learns this mode following the orange arrow presented in Figure 4.
For the negative sample, we observe that the passenger flies infrequently, as indicated by the low number of flights in economy class (cabin_y_cnt_var) and in first class (cabin_f_cnt_max), the slow accumulation of points (pit_add_air_cnt_y1, pit_income_avg_amt_var), and the unwillingness to pay for seat selection (select_seat_cnt_max). Intuitively, passengers of this kind are not willing to pay for seat selection. The model learns this pattern by following the blue arrows in Figure 4.

Table 4. Two typical samples. The positive sample is a person who pays to select a seat. The negative sample is a person who does not pay to select a seat, representing people who fly infrequently. These two samples are used to demonstrate the learning mode of the decision tree in Figure 4.

We further utilize a SHAP [22] model to enhance the interpretability of BCR-LightGBM. SHAP [22] explains the predictions of ensemble models using contribution-allocation methods from cooperative game theory. The model considers the contribution of each feature to the prediction and calculates the feature importance on that basis. Compared with the feature importance derived from LightGBM, which counts the number of times a feature is used to create a split point, the SHAP model explains the influence of each feature on the prediction for each sample and indicates whether the effect is positive or negative. To obtain the influence of each feature, SHAP calculates the Shapley value [42] of each feature. Figure 5 shows the feature importance based on the Shapley value. Note that aircraft cabin (seg_cabin), ticket tax (pax_tax), ticket fare (pax_fany), the gap between the current and most recent travel dates (recent_gap_day), and total international flight mileage (dist_i_cnt_max) have the greatest impacts on the willingness of passengers to pay for seat selection. According to the SHAP values, flight information has a great influence on the prediction since several of the important features presented in Figure 5 describe the flight, i.e., aircraft cabin (seg_cabin), ticket fare (pax_fany), ticket tax (pax_tax), and the month of flight (seg_dep_month).
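As an illustration of what a Shapley value measures, the sketch below computes exact Shapley values for a toy payoff function by averaging the marginal contribution of each feature over all feature orderings. The payoff function and its numbers are hypothetical and only reuse three feature names from this section. Exhaustive enumeration is feasible only for a handful of features, and the SHAP library relies on far more efficient algorithms for tree ensembles, so this is a conceptual sketch rather than the procedure used in our experiments.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values.

    `value` maps a frozenset of feature names (a coalition) to a payoff.
    Each feature's Shapley value is its marginal contribution averaged
    over every ordering in which the features can be added.
    """
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            totals[f] += value(with_f) - value(coalition)
            coalition = with_f
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_value(coalition):
    """Hypothetical payoff: change in predicted score when features are known."""
    score = 0.0
    if "seg_cabin" in coalition:
        score += 0.30
    if "pax_tax" in coalition:
        score += 0.10
    if "seg_cabin" in coalition and "pax_tax" in coalition:
        score += 0.04  # assumed interaction, shared equally by both features
    if "recent_gap_day" in coalition:
        score -= 0.05  # a feature that pushes the prediction down
    return score


features = ["seg_cabin", "pax_tax", "recent_gap_day"]
phi = shapley_values(features, toy_value)
print(phi)
```

The resulting values sum to the payoff of the full feature set minus that of the empty set, and the sign of each value indicates whether the feature pushes the prediction up or down, which is the property used in Figure 5.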
We conclude that passengers who pay for better aircraft cabins with higher ticket fares and taxes are more likely to choose the extra seat selection service and that those who travel in fall or winter are more likely to pay for these services. Furthermore, passenger history, which denotes the customer value, also greatly influences the prediction result, e.g., the gap from the most recent flight (recent_gap_day), total mileage and international mileage (dist_all_cnt_mean, dist_i_cnt_max), and average ticket fare and international ticket fare (tkt_avg_amt_max, tkt_i_amt_max). In general, the higher the total mileage and international mileage, the higher the average and international ticket fares, and the more frequently a passenger travels, the more likely the passenger is to pay for a seat selection service.
According to the sample analysis based on a visualization of the decision tree and the SHAP-based feature importance analysis above, we conclude the following:
(1) Passengers who have a high customer value, reflected in the total fare of a ticket, the total mileage accumulated from flights, and the frequency of flights, are more likely to pay for seat selection. Thus, airlines can recommend seat selection services to them since they may pay more attention to their comfort on the plane.
(2) Passengers who choose higher-fare tickets are more willing to pay for seat selection. The fare of the ticket also indicates customer value when the customer history is difficult to acquire. Airlines can therefore estimate the willingness of a passenger to pay for seat selection based on information from a single flight and recommend seat selection services accordingly.
(3) Passengers who take international flights are more likely to pay for seat selection. We assume that this is because passengers want a more comfortable experience on long-haul flights. Therefore, airlines can recommend these services to passengers on long-haul flights.

Conclusions
Ancillary service revenue has become important for airlines in recent years. Under the impact of COVID-19, airlines urgently need to solve two problems: how to provide personalized ancillary services to passengers precisely in order to increase revenue, and how to mitigate capital shortages. In this paper, we analysed seat selection services from the perspective of predicting the willingness of passengers to pay for seat selection and proposed a machine-learning-based method to identify this willingness. Specifically, we proposed a model, named BCR-LightGBM, that identifies passengers who are willing to pay for seat selection as the basis for recommendation.
We first preprocessed the original dataset to overcome the data sparsity and the curse of dimensionality inherent in the dataset. Then, the bagging method was applied, where positive samples and negative samples were combined at a specific ratio into multiple subsets to solve the problem of data imbalance. The experimental results demonstrated that the proposed model achieved an F1-score of 0.28 and an AUC of 0.77, outperforming all compared machine-learning models and sampling-based methods.
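The ratio-based bagging step summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: we assume that every subset keeps all positive samples and draws negatives without replacement at the chosen ratio, and that the ensemble averages the outputs of its base learners. The function names, the synthetic one-feature data, and the midpoint-threshold base learner (a stand-in for LightGBM that keeps the example dependency-free) are all hypothetical.

```python
import random


def make_ratio_bags(positives, negatives, ratio, n_bags, seed=0):
    """Build n_bags subsets, each with all positives and ratio:1 sampled negatives.

    Assumption: negatives are drawn without replacement within each subset.
    Returns a list of (samples, labels) pairs.
    """
    rng = random.Random(seed)
    n_neg = min(len(negatives), ratio * len(positives))
    bags = []
    for _ in range(n_bags):
        sampled = rng.sample(negatives, n_neg)
        samples = positives + sampled
        labels = [1] * len(positives) + [0] * n_neg
        bags.append((samples, labels))
    return bags


def fit_midpoint(samples, labels):
    """Stand-in base learner (NOT LightGBM): threshold between the class means."""
    pos = [x for x, y in zip(samples, labels) if y == 1]
    neg = [x for x, y in zip(samples, labels) if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 if x > threshold else 0.0


def bagged_score(models, x):
    """Average the base-learner outputs (assumed aggregation rule)."""
    return sum(m(x) for m in models) / len(models)


# Synthetic one-feature data: 20 positives and 400 negatives (assumed numbers).
rng = random.Random(42)
positives = [rng.gauss(2.0, 1.0) for _ in range(20)]
negatives = [rng.gauss(0.0, 1.0) for _ in range(400)]

bags = make_ratio_bags(positives, negatives, ratio=3, n_bags=25)
models = [fit_midpoint(samples, labels) for samples, labels in bags]
print(bagged_score(models, 1.0))
```

Because each subset sees a different draw of negatives, the base learners differ slightly, and averaging them reduces the variance introduced by undersampling the majority class.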
Finally, we analysed two typical samples based on the visualization of a decision tree in BCR-LightGBM and applied a SHAP model to further enhance the interpretability by analysing feature importance. We found that customer value, ticket fare, and flight length had positive influences on the willingness to pay for seat selection. Based on these findings, airlines can recommend seat selection services to the corresponding passengers to increase their revenue.
The limitation of this research is that the number of samples is relatively small and cannot cover all seat selection situations around the world. Thus, our conclusions may only apply to cases similar to those contained in the dataset. In future research, we will collect more samples from different airlines to make the conclusions more convincing. We will further study the intrinsic properties of these important factors and mine knowledge from the dataset to guide the recommendation policies of airlines to increase revenue from other ancillary services, e.g., priority boarding, checked baggage, and onboard Wi-Fi.