1. Introduction
Revenue management is the application of information systems and pricing strategies to allocate the right capacity to the right customer at the right price [1]. It was first developed in 1966 by the airline industry [2] and was subsequently adopted by other service businesses, such as hotels, rental cars, golf courses, and casinos [1,2]. In the hospitality industry (rooms division), revenue management is defined as "selling the right room to the right person at the right price at the right time via the right distribution channel" [3]. Because hotels have a fixed number of rooms and offer them as a perishable product, they must accept reservations in advance in order to sell the right room to the right guest. A booking is a form of contract between a hotel and its client [4], and it gives the client the right to cancel that contract. For hotels, advance bookings are the main indicator of forecast performance [5]. However, cancelations affect hotels more than guests: a hotel must hold rooms for clients who honor their bookings, yet it loses revenue when a client cancels or does not show up [4]. A booking cancelation occurs when a client terminates the contract before arrival, while a no-show occurs when a client fails to check in without informing the hotel of a change in plans.
A booking may also be canceled for understandable reasons such as bad weather, vacation rescheduling, sudden illness, or a change in meeting location. However, [6,7] pointed out that, currently, a sizeable share of cancelations is caused by deal-seeking clients hunting for the best bargain. These customers often keep looking for a better deal on the same service or product even after they have booked. In some cases, clients even make multiple bookings to secure their alternatives and then cancel all but one [4]. As a result, cancelations have a substantial effect on demand-management decisions within a revenue management framework.
While accurate forecasts are a critical instrument for revenue management performance, forecasts are undoubtedly affected by cancelations [4]. Booking cancelations can account for up to 20% of all bookings received by a given hotel [8], rising to 60% in the case of airport and roadside hotels [9]. Faced with such high cancelation rates, hotel managers have implemented overbooking strategies and restrictive cancelation policies to mitigate losses [3,4,10]. However, such strategies can harm a hotel's revenue as well as its public image. For example, overbooking can force a hotel to deny service to a client, which can damage that client's perception of the hotel and push them toward a competitor [11]. Restrictive cancelation policies, particularly non-refundable rates and 48 h advance cancelation deadlines [10], reduce both the number of bookings and income, owing to the steep price discounts they require [6,10].
In machine learning (ML), supervised learning is ordinarily divided into two types of problem [11]: "regression", where the output is quantitative (e.g., stock market prediction), and "classification", where the output is categorical or discrete (e.g., forecasting whether a hotel customer "will cancel" or "will not cancel" a booking). Several studies in the literature have already proposed strategies to mitigate the consequences of cancelations for revenue and inventory allocation, cancelation policies, and overbooking [5,12,13]. However, most of this research has focused on the airline industry, which differs from the hospitality industry in several respects [14,15,16,17,18,19]. For instance, in the airline industry, the demand forecast is used to determine the number of seats allocated to a particular class (such as economy, business, and semi-business classes) [16]. Furthermore, in the airline industry, the task is to predict the optimal booking limits for a particular reservation class, whereas in hotel booking, tourists book a specific room according to their budget and the facilities they are looking for [14]. Consequently, in the hospitality industry, external factors such as location, weather conditions, and nearby attractions play an important role, whereas in the airline industry these factors matter far less. In recent years, research related to the hospitality industry has gained wider attention [20,21]. Most of it has used traditional statistical methods such as regression [9], while some has exploited the advantages conferred by machine learning methods and techniques [21]. The same applies to research on demand forecasting that anticipates cancelations, particularly in hospitality [8,9,22,23]. Moreover, only three studies have used hotel-specific data (property management system (PMS) data) [9,22,24], while two other studies used passenger name record (PNR) data, an airline industry standard format (International Civil Aviation Organization, 2010).
Much of the literature has also treated booking cancelation as a "regression problem". The prediction of hotel cancelations using machine learning remains limited, and only a few studies have considered it a classification problem [22,24,25]. In fact, the authors in [8] stated that "it is hard to say that one can predict whether a booking will be canceled or not with high accuracy". Moreover, António et al. [24] showed that it is possible to treat hotel cancelation prediction as a classification problem using machine learning approaches, achieving high accuracy; they evaluated a set of machine learning classifiers for four separate resort hotels in Portugal. The authors in [25] tested the effectiveness of machine learning models in a real environment, building a prototype with automated machine learning that mined property management system (PMS) data on past forecast hits and misses.
Since booking cancelations can be addressed as either regression or classification problems, it is important to know when to choose each approach. When the only aim is to estimate cancelation rates, the task can be treated as a regression problem; when the aim is to estimate the likelihood of an individual booking being canceled and to understand the reasons for the cancelation, it should be treated as a classification problem [26]. Classification also allows for the estimation of an overall cancelation rate [26]. Another reason to treat booking cancelation as a classification problem is that a quantitative output can also be derived from the class output [24]. For instance, in [24], the authors noted that the number of bookings predicted as "will cancel" on a certain day can be subtracted from the demand to obtain the net demand, while the cancelation rate can be calculated by dividing the number of bookings predicted as "will cancel" by the total number of bookings for that day. In this study, we likewise treat hotel cancelation as a classification problem.
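As a toy illustration (the labels below are invented, not drawn from the study's data), deriving the net demand and the cancelation rate from class predictions might look like this:

```python
# Illustrative sketch: quantitative outputs derived from "will cancel"
# class labels, as suggested in [24]. One predicted label per booking
# received for a given day.
predictions = ["will cancel", "will not cancel", "will cancel",
               "will not cancel", "will not cancel"]

total_bookings = len(predictions)
predicted_cancels = predictions.count("will cancel")

# Net demand: bookings minus those predicted to cancel.
net_demand = total_bookings - predicted_cancels
# Cancelation rate: share of bookings predicted to cancel.
cancelation_rate = predicted_cancels / total_bookings

print(net_demand)        # 3
print(cancelation_rate)  # 0.4
```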
Moreover, in ML, classification algorithms typically assume that every class has a similar number of examples, an assumption that often fails in practice because of class imbalance. In an imbalanced dataset, the class with fewer examples is called the minority class, and the class with many examples is called the majority class. Machine learning algorithms trained on imbalanced datasets tend to ignore this skewed class distribution, which ultimately results in poor performance on the minority class (the model learns more about the majority class during training, creating a bias toward it) [27]. In hotel booking cancelations, the minority class is the "will cancel booking" class; thus, if we train classifiers on imbalanced cancelation data, the classifiers will mostly learn about the majority, "will not cancel booking", class. This can have a significant effect on a hotel's revenue and reputation: in most cases, hotel administrators will assume that a particular booking will not be canceled, because the classifier was trained in a way that favors that prediction, when in reality the opposite may occur. A classifier trained on an imbalanced dataset therefore becomes a liability for hotel administrators, who are then unable to properly identify which bookings might be canceled and to take the actions needed to protect the hotel's revenue and its image in the eyes of customers. This imbalanced class distribution is inherent to hotel booking cancelation classification. The issue has not been addressed in previous studies, and it needs to be addressed so that hotel administrators can create better policies and take targeted actions to increase revenue.
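A minimal sketch of this failure mode, using synthetic labels rather than the hotel data: on a 90/10 imbalanced sample, a model that always predicts the majority class ("will not cancel") still reaches 90% accuracy while detecting none of the cancelations.

```python
# Synthetic demonstration of majority-class bias on imbalanced data.
# 0 = "will not cancel" (majority), 1 = "will cancel" (minority).
from collections import Counter

y_true = [0] * 90 + [1] * 10   # 90/10 imbalance
y_pred = [0] * 100             # degenerate majority-class predictor

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall_minority = (sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
                   / Counter(y_true)[1])

print(accuracy)         # 0.9 -- looks strong on paper
print(recall_minority)  # 0.0 -- every cancelation is missed
```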
To overcome the abovementioned shortcomings, this study introduces the synthetic minority oversampling technique combined with edited nearest neighbors (SMOTE-ENN) to address class imbalance in hotel booking cancelations. The algorithm first generates examples for the minority class using SMOTE. It then applies the edited nearest neighbors (ENN) noise-removal rule [28] to reduce the overlap between classes, eliminating any sample whose class differs from that of at least two of its three nearest neighbors [29]. The methodological contribution of this research is therefore the introduction of SMOTE-ENN, a combination of over-sampling and under-sampling techniques, to the class imbalance problem in hotel booking cancelations: over-sampling creates examples for the minority class, while the ENN under-sampling step discards noise from the dataset. We present a hybrid approach that combines this resampling method with machine learning algorithms for hotel cancelation prediction. Our approach first uses SMOTE-ENN to adjust the class distribution and then applies machine learning algorithms to predict cancelations. The first experiment normalized the data; the second balanced the class distribution using SMOTE-ENN; and the third compared the proposed methodology against current ones. Furthermore, we used feature selection and feature engineering to identify the features with the greatest impact on prediction. The remainder of this paper is organized as follows:
Section 2 presents a literature review related to the hospitality industry and hotel cancelations. Section 3 describes the procedure for hotel cancelation prediction, first summarizing the experimental dataset and our resampling method (SMOTE-ENN) and then, as the fundamental contribution of this study, presenting the hybrid approach for hotel cancelation prediction. Section 4 reports the experimental results and compares them with existing methods. Section 5 presents the conclusions. Finally, implications, as well as limitations and future research issues, are presented in Section 6 and Section 7, respectively.
2. Related Works
Booking cancelation is a well-known issue in revenue management, and it is relevant to the service industry in general and the hospitality industry in particular. Customers' increasing use of the internet has changed the way they search for and buy services. Current customer behavior has a considerable influence on contemporary research into booking cancelations, particularly research on the effects of cancelations on revenue and inventory allocation and on cancelation and overbooking policies [12,13]. That said, there is minimal literature on booking cancelations in the hospitality industry. For instance, the authors in [23] presented a neural network model and a regression neural network model for predicting customer cancelations; both models achieved good predictive capability and could be useful in service capacity scheduling. The authors in [20] used competitive sets in a recursive approach for forecasting daily hotel occupancy. Other authors [30] applied a linear approximation technique to decide price and seat control simultaneously in the airline industry. The authors in [8] used a data mining method to forecast cancelations at any time and addressed customer behavior at different stages of booking.
Rapid advancements in affordable data storage, the availability of huge amounts of data, and cheaper, more powerful computing have all contributed to the success of ML [26]. In turn, this has motivated industries to develop robust ML models for analyzing big and complex data [27]. Machine learning tools facilitate the identification of beneficial opportunities and risks [28], which has made ML adoption progress rapidly and strengthened its use in nearly every field [29]. However, in the case of hotel cancelations, only a limited number of studies have utilized ML algorithms. For instance, the authors in [31] used data science methods to synthesize the current findings on booking cancelations in travel- and tourism-related industries and identified new topics for booking cancelation research. The authors in [32] employed big data to improve hotel demand forecasts and their deviation due to booking cancelations; their study suggests that, by identifying cancelation factors, the model helps hotel management understand cancelation patterns, adjust a hotel's cancelation policies, and tackle overbooking according to clients' booking behavior. Other authors addressed hotel cancelation as a classification problem and showed that a classification model can achieve suitable accuracy [24]. They included four hotels in their study to predict cancelation rates and presented an automated machine learning-based support system for predicting booking cancelations, developing two prototypes and observing their performance. Their system allowed hotels to predict overall demand, helping them to decide which bookings to accept or reject and to make key changes to booking and room prices.
None of the previous studies explored the issue of class imbalance in hotel cancelation prediction. As such, in this research, we combined the SMOTE-ENN resampling method with machine learning classifiers to predict hotel booking cancelation patterns.
4. Modelling and Performance Evaluation
At the end of the SMOTE-ENN step, we trained a random forest classifier. Since the importance and weight of each feature differed per hotel, a separate model had to be developed for every hotel. As different algorithms yield different results, models were built using several classification methods, and the ones showing the best performance indicators were selected. As the target variable "IsCanceled" in the dataset can take two values (0: no; 1: yes), two-class classification methods were chosen: logistic regression (LR), decision tree (DT), AdaBoost (AB), gradient boosting (GD), and random forest (RF).
All approaches were implemented in Python 3.7, and the experiments were run on a Windows 10 machine with 16 GB of RAM, a 4 GB NVIDIA GTX 1650 Ti graphics card, and a Core i7 processor. SMOTE and SMOTE-ENN were implemented with the imbalanced-learn package [42,43], and LR, DT, AB, GD, and RF with the Scikit-learn package [44]. imbalanced-learn is an open-source Python library that provides many techniques for managing class imbalance, while Scikit-learn is a free machine learning library for Python.
To demonstrate the viability of our approach, we compared the performance of the stand-alone standard machine learning methods, the standard ML methods with SMOTE, and the standard ML methods with SMOTE-ENN. The standard methods predict hotel cancelation directly from the data, i.e., no resampling is applied before the data are sent to the classifiers. For the second group of methods, we applied the oversampling method (SMOTE) before sending the data to the same classifiers. For the third group, we applied the hybrid under- and oversampling method (SMOTE-ENN) before sending the data to the same set of classifiers, so as to assess classifier performance once the class distribution had been adjusted. Additionally, this study used 10-fold cross-validation with a different arrangement of folds for each run to obtain average performance. We used the GridSearchCV function in Scikit-learn [44], which allows the cross-validation scheme to be chosen as needed (10-fold in this study), to tune the parameters of RF.
We used different classification metrics to assess the performance of the proposed strategy on test data: accuracy, precision, recall, the AUC-ROC curve, the AUC score, the F1 score, and the G-mean [45]. We also included a precision–recall (PR) curve, since some studies suggest that the ROC curve can be misleading on an imbalanced dataset and lead to incorrect interpretations of a method's performance [46]. The reason for this behavior is that ROC and PR differ in focus: the PR curve targets the minority class, while the ROC curve encompasses both classes. The precision–recall AUC (PR-AUC) score was used to summarize the model's performance in a single number [47]. We compared our results with the standard random forest and random forest with SMOTE, and concluded that adding SMOTE-ENN before the classifier increased random forest performance while addressing the class imbalance problem in hotel cancelation prediction. We searched over different values for the random forest classifier, namely criterion: {'entropy', 'gini'}, max_features: {'log2', 'auto'}, min_samples_leaf: {1, 2, 3, 4, 5}, min_samples_split: {4, 5, 6, 7, 8}, and n_estimators: {100, 150, 200, 250, 300, 350, 400, 450}. All these values were passed as parameters to the GridSearchCV function, which fitted the model 8000 times on the dataset to find the optimal parameters for the random forest classifier.
Table 2 shows the list of parameters of the random forest classifier.
We assessed the performance of the classification model using the counts of correctly and incorrectly classified examples from the dataset, arranged in a square table known as a confusion matrix. A "true positive" means the classifier predicted positive and the example was in fact positive; a "false positive" means the classifier predicted positive but the example was negative; a "false negative" means the classifier predicted negative but the example was positive; and a "true negative" means the classifier predicted negative and the example was indeed negative.
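A small example of this matrix with scikit-learn (the labels are invented; 1 denotes "will cancel"):

```python
# Building the confusion matrix described above.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, rows are actual classes and columns predictions:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```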
AUC-ROC curve: the receiver operating characteristic (ROC) is a widely used performance metric in binary classification [48]. It plots the true positive rate against the false positive rate at different thresholds, separating signal from noise. The area under the curve (AUC) measures the separability of a binary classification model and summarizes the ROC curve in a single number. Several related rates are needed to interpret the AUC-ROC curve.
True negative rate: the proportion of negative examples that are correctly classified as negative.
False positive rate: the proportion of negative examples that are incorrectly classified as positive, out of all negative examples.
False negative rate: the proportion of positive examples that are incorrectly classified as negative.
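These rates, together with the G-mean listed among our metrics, can be computed directly from confusion matrix counts (the counts below are hypothetical, for illustration only):

```python
# Rate metrics from confusion matrix counts.
import math

tn, fp, fn, tp = 95, 5, 4, 96  # hypothetical counts

tnr = tn / (tn + fp)  # true negative rate (specificity)
fpr = fp / (fp + tn)  # false positive rate
fnr = fn / (fn + tp)  # false negative rate
tpr = tp / (tp + fn)  # true positive rate (sensitivity / recall)

# G-mean: geometric mean of sensitivity and specificity, in [0, 1].
g_mean = math.sqrt(tpr * tnr)

print(round(tnr, 3), round(fpr, 3), round(fnr, 3), round(g_mean, 3))
# 0.95 0.05 0.04 0.955
```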
Figure 6 shows the ROC curves (Figure 6A,C) and the precision–recall curves (Figure 6B,D) for the H1 and H2 hotels. We can observe from the figures that, after the imbalance problem was addressed, the performance of the classifier improved significantly. Even for the H2 hotel, whose data were not heavily imbalanced, performance improved after SMOTE-ENN was applied before feeding the data into the classifier. The H1 hotel was initially highly imbalanced, yet its accuracy also increased to a certain extent.
To assess the performance of the different classifiers, we used the data from [24] as a case study in this research. We also report the results of SMOTE with the classifiers to give a fuller picture of the effect of applying SMOTE-ENN.
The results of SMOTE-ENN were promising. Across both hotels, the lowest accuracy was 86.3%, obtained for the H1 hotel with logistic regression, while random forest achieved more than 95% accuracy for both hotels. All methods registered better accuracy than the standard and standard + SMOTE classifiers, except for LR + SMOTE, which obtained slightly better accuracy than LR + SMOTE-ENN. Taking AUC as the assessment measure, all standard + SMOTE-ENN methods registered better results than the standard and standard + SMOTE classifiers. In terms of overall performance, RF + SMOTE-ENN was the most accurate algorithm. In terms of precision and recall, LR + SMOTE-ENN beat all other algorithms, including the standard and standard + SMOTE classifiers. For the F1 score and PR-AUC, RF + SMOTE-ENN was the best among all algorithms. The G-mean, the geometric mean of sensitivity and specificity, takes values between 0 and 1, with values closer to 1 indicating a better classifier; RF + SMOTE-ENN achieved the best values of 95% and 96.3% for hotels H1 and H2, respectively.
Another significant measure is the false positive rate. This rate matters when a hotel takes action against a booking classified as "going to be canceled". In such cases, the model that generates the fewest false predictions benefits the hotel, which then spends fewer resources on bookings that were not actually going to be canceled. With this criterion taken into account, RF + SMOTE-ENN should be chosen for hotel cancelation prediction, as it produces the smallest number of false predictions among all the algorithms.
For hotels to increase their revenue and make sound decisions about room allocation, it is important that they accurately predict in advance which customers might cancel their bookings. Since hotel cancelation problems normally suffer from class imbalance, it is equally important to address this issue before applying any classifier, so that the model does not become biased toward the majority class [33]. Our inclusion of SMOTE-ENN in the hotel cancelation problem could benefit the hospitality industry whenever existing datasets suffer from class imbalance. Batista et al. [28] investigated numerous combinations of oversampling and under-sampling strategies against commonly used ones and noted that ENN was more effective at down-sampling the majority class than the other methods in their study. Their strategy removes samples from both the majority and minority classes: any sample misclassified by its three closest neighbors is eliminated from the training set, which improves the class distribution for both classes and helps the classifier compared with SMOTE alone.
Table 3 and Table 4 show the performance of the different classifiers. We can observe that standard + SMOTE-ENN improved performance compared with the standard and standard + SMOTE classifiers, and, among all the classifiers, random forest achieved the best results. Overall, SMOTE-ENN is able to enhance the prediction performance of the classifiers by a significant amount. The tables also present the true negative rate (TNR), false positive rate (FPR), and false negative rate (FNR) for both hotels. For both hotels, RF + SMOTE-ENN achieved the highest TNR, almost 95% and 97%, which demonstrates that this classifier classifies negative examples more accurately than the other classifiers. Similarly, RF + SMOTE-ENN achieved the lowest false positive rates, 4.54% and 3.01%, meaning that only about 4.5% and 3% of all negative examples were misclassified as positive; this is an important measure for hotel booking cancelations. Furthermore, RF + SMOTE-ENN also achieved the lowest false negative rate, the extent to which positive examples were misclassified as negative, with FNR values of 4.49% and 4.87% for the two hotels, the smallest among all classifiers. Finally, we performed a statistical test on the classifiers included in this study, using the 5×2-cv combined F-test to establish statistical significance; this approach is recommended for comparing classifiers on a single dataset [10].
Table 5 and Table 6 display the statistical significance of the different classifiers included in this study. We calculated the p-value of RF versus every other classifier. All comparisons yielded p-values below the significance threshold of α = 0.05, which shows that the performance of RF differs significantly from that of each of the other classifiers.