# Ticket Sales Prediction and Dynamic Pricing Strategies in Public Transport


## Abstract


## 1. Introduction

## 2. Related Work

- Demand forecasting. Liu et al. [11] proposed a probabilistic framework for modeling the travel preferences of airline customers and predicting the personalized demand of airline passengers, i.e., the destination and the airline that an individual customer will choose. Experimental results on two real-world datasets demonstrated the effectiveness of their approach on both travel topic discovery and customer travel prediction. Mumbower et al. [12] estimated the fluctuation in flight prices using a database of online prices and seat map displays. The results show how airlines can design optimal promotions by using factors such as booking date and departure time and date (day of the week). In particular, a linear regression method was used to predict the number of bookings for a specific flight. Abdelghany and Guzhva [13] used a time-series modeling approach for airport short-term demand forecasting. The model evaluates how different factors, such as seasonality, fuel price, airline strategies, incidents and financial conditions, affect airport activity levels. Yeboah et al. [14] developed an explanatory model of pre-travel information-seeking behaviors in a British urban environment, using binomial logistic regression. The considered factors include socio-demographics, trip context, frequency of public transport use, information sources used, and smartphone ownership and use. Abdella et al. [15] showed that existing models are generally based on a limited number of factors, such as historical ticket price data, ticket purchase date and departure date, which are not effective enough for ticket/demand prediction.
- Dynamic pricing. Many market leaders, including Groupon [16] and Walmart [17], extensively investigate and use dynamic pricing algorithms to obtain and maintain a competitive advantage over time. Kemper and Breuer [18] combined the mathematical principles of dynamic pricing with empirical marketing research methods on a dataset of auctions (from ebay.de) to determine demand functions of football tickets during the 2013–14 Bundesliga season. Through Monte Carlo simulations, they evaluated the effects of a pricing approach in terms of revenues, number of purchased tickets, and average ticket price, discovering that the stadium attendees’ willingness to pay could be significantly higher than the current ticket prices. Sato and Sawaki [19] presented a revenue management model of dynamic pricing for a competitive route. Specifically, the authors assume that passengers choose among several transport modes and that each mode offers multiple alternative schedules. Using a multinomial logit model to describe a customer’s discrete choice, they derive an optimal pricing policy so as to maximize the expected total revenue over the alternative schedules of high-speed rail. Lou et al. [20] proposed a self-learning approach to determine optimal pricing strategies for toll roads. The approach recursively learns motorists’ willingness to pay and then specifies toll rates to maximize the freeway’s throughput. Simulation experiments were performed to demonstrate and validate the proposed approach and to provide insights on when to convert high-occupancy lanes to toll lanes. Rana and Oliveira [21] examined the problem of establishing a pricing policy that maximizes the revenue from selling a given inventory by a fixed deadline. They propose a methodology to optimize revenue in which demand is learned and pricing decisions are updated in real time, showing that reinforcement learning can be used to model the problem with inter-dependent demands.

## 3. Problem Definitions and Goal

## 4. Proposed Methodology

- (1) Data collection: given a bus ticketing platform, the user-generated event logs are collected.
- (2) Process mining: the event logs, after being processed to fix wrong data and missing values, are analyzed by a process mining algorithm to discover models and patterns of platform users.
- (3) Discovery of purchase factors: the event logs are analyzed to discover the main factors that influence a user’s buying decisions. In particular, the goal of this step is to identify the correlation between the information present when booking a ticket and the sale of the ticket.
- (4) Prediction model: the main factors that influence a user’s buying decisions are used to train machine learning algorithms that predict whether or not a user will buy a ticket.
- (5) Dynamic pricing: the discovered factors are used to define dynamic pricing strategies, which aim to increase the number of purchased tickets and the total revenue of a bus company.
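As an illustration of step (3), the correlation between one booking attribute and the sale outcome could be measured with a Pearson coefficient. The sketch below is only a minimal example under stated assumptions: the toy log, the choice of days before departure as the attribute, and the helper name `pearson` are hypothetical and are not taken from the platform data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy log: (days before departure, 1 if a ticket was bought else 0).
toy_events = [(1, 1), (2, 1), (3, 1), (10, 0), (20, 1), (30, 0), (45, 0), (60, 0)]
days_before_departure = [days for days, _ in toy_events]
purchased = [flag for _, flag in toy_events]

r = pearson(days_before_departure, purchased)
print(f"correlation between days before departure and purchase: {r:.2f}")
```

In this toy log the coefficient is negative, meaning that bookings attempted closer to departure end in a purchase more often; real event logs may of course behave differently.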

#### 4.1. Steps 1–2: Data Collection and Process Mining

#### 4.2. Step 3: Discovery of Purchase Factors

#### 4.3. Step 4: Prediction Model

#### 4.4. Step 5: Dynamic Pricing

## 5. A Case Study

#### 5.1. Data Description

#### 5.2. Steps 1–2: Data Collection and Process Mining

#### 5.3. Step 3: Discovery of Purchase Factors

#### 5.4. Step 4: Prediction Model

#### 5.5. Step 5: Dynamic Pricing

#### 5.5.1. Event Logs Generator

- Select a $route$ randomly from $RD$ (line 3) in accordance with a weighted random distribution.
- Select a $departure\_day$ randomly among all those present in $RD$ for the chosen $route$ (line 5).
- Select the number of $days\_before\_departure$ randomly among all those present in $DC$ (line 7).
- Calculate the $booking\_day$ starting from the $departure\_day$ and subtracting $days\_before\_departure$ days (line 9).
- Determine the $number\_of\_tickets$ the user will want to buy randomly among all those present in $TC$ (line 11). Most users buy only one ticket; in rare cases, they buy more than one.
- Define the $event$ by grouping all the information just generated ($route$, $departure\_day$, $booking\_day$, $number\_of\_tickets$) (line 12).
- Add the $event$ to $E$ (line 13).

Algorithm 1: Event logs generator.
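The generator steps listed above can be sketched in Python as follows. This is a rough illustration, not the authors' implementation: the layout of `RD` (a dict with a `weight` and a list of `departure_days` per route), the representation of `DC` and `TC` as value-to-weight dicts, the weighted sampling from `DC` and `TC`, and the toy values are all assumptions. The event also stores `days_before_departure` for convenience.

```python
import random
from datetime import date, timedelta

def generate_events(n_events, RD, DC, TC, seed=None):
    """Generate synthetic booking events following the steps of the generator."""
    rng = random.Random(seed)
    routes = list(RD)
    route_weights = [RD[route]["weight"] for route in routes]
    dbd_values, dbd_weights = zip(*DC.items())
    ticket_values, ticket_weights = zip(*TC.items())

    E = []
    for _ in range(n_events):
        # Weighted random choice of the route.
        route = rng.choices(routes, weights=route_weights, k=1)[0]
        # Uniform choice among the departure days known for that route.
        departure_day = rng.choice(RD[route]["departure_days"])
        # Assumption: DC and TC are sampled according to their weights.
        days_before_departure = rng.choices(dbd_values, weights=dbd_weights, k=1)[0]
        booking_day = departure_day - timedelta(days=days_before_departure)
        number_of_tickets = rng.choices(ticket_values, weights=ticket_weights, k=1)[0]
        E.append({
            "route": route,
            "departure_day": departure_day,
            "booking_day": booking_day,
            "days_before_departure": days_before_departure,
            "number_of_tickets": number_of_tickets,
        })
    return E

# Hypothetical toy inputs, for illustration only.
RD = {
    "A-B": {"weight": 3, "departure_days": [date(2020, 7, 1), date(2020, 7, 2)]},
    "A-C": {"weight": 1, "departure_days": [date(2020, 7, 3)]},
}
DC = {1: 5, 7: 3, 30: 1}   # days before departure -> weight
TC = {1: 9, 2: 1}          # number of tickets -> weight (one ticket is most common)

for event in generate_events(3, RD, DC, TC, seed=0):
    print(event)
```

Passing a `seed` makes a run reproducible, which helps when the same synthetic log has to be replayed under different pricing strategies.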

#### 5.5.2. Event Logs Processing

- Retrieve the information about $days\_before\_departure$, $booking\_day$ and $number\_of\_tickets$ from the current $event$ (lines 5–7). Also retrieve the $bus\_occupation$ information from $OM$ (line 8).
- Set a price for the $event$ according to the $pricing\_strategy$ (line 10).
- Establish the probability of purchase based on the $event$ information ($days\_before\_departure$, $booking\_day$, $bus\_occupation$ and $price$ of the ticket) and based on users’ behavior as described in Section 5.3 (line 12). Since for each user’s choice (e.g., $days\_before\_departure$) we can determine the probability of purchasing a ticket, we combine these probabilities to define a cumulative probability of the $event$.
- Generate a $random$ number between 0 and 1 (line 13).
- If the random number is less than the probability of purchase and there are free seats on the considered bus, the tickets are considered sold (lines 14–19). Otherwise, we consider that the user has left the platform without buying any tickets. If the user has purchased tickets, the numbers of purchased events and tickets are increased (lines 15–16) and the total revenue is updated (line 17). Finally, the occupancy of the considered bus is updated based on the number of tickets sold (line 18).

Algorithm 2: Event logs processing.
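A possible Python rendering of the processing loop described above is sketched here. Several details are assumptions made for illustration: the occupancy map `OM` is keyed by (route, departure day), every bus has the same `capacity`, occupancy is expressed as a percentage, and the pricing strategy and purchase probability are passed in as functions. The toy probability multiplies per-factor probabilities, which is only one way of combining them; the text does not specify the combination rule.

```python
import random

def process_events(E, pricing_strategy, purchase_probability, capacity, seed=None):
    """Replay events one by one and decide whether each one ends in a sale."""
    rng = random.Random(seed)
    OM = {}  # occupancy map: (route, departure_day) -> seats already sold
    purchased_events = 0
    purchased_tickets = 0
    total_revenue = 0.0

    for event in E:
        bus = (event["route"], event["departure_day"])
        tickets = event["number_of_tickets"]
        seats_sold = OM.get(bus, 0)
        bus_occupation = 100.0 * seats_sold / capacity  # percentage of seats sold

        price = pricing_strategy(event, bus_occupation)
        probability = purchase_probability(event, bus_occupation, price)

        # Sale happens only if the draw succeeds and enough seats are free.
        if rng.random() < probability and seats_sold + tickets <= capacity:
            purchased_events += 1
            purchased_tickets += tickets
            total_revenue += price * tickets
            OM[bus] = seats_sold + tickets
        # Otherwise the user leaves the platform without buying.

    return {
        "purchased_events": purchased_events,
        "purchased_tickets": purchased_tickets,
        "total_revenue": total_revenue,
        "occupancy": OM,
    }

# Hypothetical strategy and probability model, for illustration only.
def flat_price(event, bus_occupation):
    return 10.0

def toy_probability(event, bus_occupation, price):
    p_days = 0.8 if event["days_before_departure"] <= 10 else 0.4
    p_price = 0.9 if price <= 10.0 else 0.5
    return p_days * p_price  # assumed combination rule: simple product

toy_log = [
    {"route": "A-B", "departure_day": "2020-07-01",
     "days_before_departure": 5, "number_of_tickets": 1},
    {"route": "A-B", "departure_day": "2020-07-01",
     "days_before_departure": 20, "number_of_tickets": 2},
]
print(process_events(toy_log, flat_price, toy_probability, capacity=50, seed=1))
```

Keeping the strategy and the probability model as parameters lets the same event log be replayed under each pricing strategy without changing the loop.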

#### 5.5.3. Strategies Evaluation

- Standard: A basic strategy similar to the one used by the bus company. This strategy sets ticket prices according to the rules present in the original data.
- Fix-Low: A strategy that always establishes a low ticket price. We are able to do this because for each route we know the price ranges and we can divide them into three categories (low, medium and high). With this strategy, given a route, we select one of the prices that fall into the low category.
- Fix-Med: A strategy that always establishes a medium ticket price.
- Fix-Hig: A strategy that always establishes a high ticket price.
- Dyn-Day: A dynamic strategy that sets the price based on how many days before departure ($DBD$) the user tries to book a ticket. Specifically, the formula used is the following:$$price(DBD)=\begin{cases}HIGH, & \text{if } DBD\le 10\\ MEDIUM, & \text{if } 10<DBD\le 30\\ LOW, & \text{otherwise}\end{cases}$$
- Dyn-Occ: A dynamic strategy that sets the price based on the occupancy rate ($OCCR$) of the bus the user wants to book. Specifically, the formula used is the following:$$price(OCCR)=\begin{cases}HIGH, & \text{if } OCCR>30\\ MEDIUM, & \text{if } 10<OCCR\le 30\\ LOW, & \text{otherwise}\end{cases}$$
- Dyn-Mix: A strategy that sets the price based on the days before departure ($DBD$) and on the bus occupancy rate ($OCCR$). Specifically, the formula used is the following:$$price(DBD,OCCR)=\begin{cases}HIGH, & \text{if } DBD\le 10 \text{ or } OCCR>30\\ MEDIUM, & \text{if } 10<DBD\le 30 \text{ or } 10<OCCR\le 30\\ LOW, & \text{otherwise}\end{cases}$$
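The three dynamic strategies can be written as small Python functions that return a price category. This sketch assumes that $OCCR$ is expressed as a percentage, that the lower bounds of the MEDIUM ranges are strict, and that the HIGH branch of the occupancy rule reads $OCCR > 30$; the function names are hypothetical, and mapping a category to an actual fare for a given route is left out.

```python
def dyn_day(dbd):
    """Dyn-Day: price category from the days before departure (DBD)."""
    if dbd <= 10:
        return "HIGH"
    if dbd <= 30:          # here 10 < dbd <= 30
        return "MEDIUM"
    return "LOW"

def dyn_occ(occr):
    """Dyn-Occ: price category from the bus occupancy rate (OCCR, in percent)."""
    if occr > 30:
        return "HIGH"
    if occr > 10:          # here 10 < occr <= 30
        return "MEDIUM"
    return "LOW"

def dyn_mix(dbd, occr):
    """Dyn-Mix: HIGH or MEDIUM as soon as either factor falls in that range."""
    if dbd <= 10 or occr > 30:
        return "HIGH"
    if dbd <= 30 or occr > 10:
        return "MEDIUM"
    return "LOW"

# A booking made 40 days ahead on a bus that is 20% full.
print(dyn_day(40), dyn_occ(20), dyn_mix(40, 20))
```

Because the HIGH condition is checked first, a booking that satisfies both a HIGH and a MEDIUM condition under Dyn-Mix is priced HIGH.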

## 6. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Grimaldi, R.; Augustin, K.; Beria, P. Intercity coach liberalisation. The cases of Germany and Italy. In World Conference on Transport Research-WCTR 2016; Elsevier: Amsterdam, The Netherlands, 2017; pp. 474–490.
2. Gremm, C. Impacts of the German interurban bus market deregulation on regional railway services. In Proceedings of the International Conference Series on Competition and Ownership in Land Passenger Transport, Stockholm, Sweden, 1 January 2017.
3. Belcastro, L.; Marozzo, F.; Talia, D. Programming Models and Systems for Big Data Analysis. Int. J. Parallel Emergent Distrib. Syst. **2019**, 34, 632–652.
4. Talia, D.; Trunfio, P.; Marozzo, F. Data Analysis in the Cloud: Models, Techniques and Applications; Elsevier: Amsterdam, The Netherlands, 2015; pp. 1–138.
5. Branda, F.; Marozzo, F.; Talia, D. Discovering Travelers’ Purchasing Behavior from Public Transport Data. In Proceedings of the 6th International Conference on Machine Learning, Optimization and Data Science-LOD 2020, Siena, Italy, 19–23 July 2020; pp. 702–713.
6. Saharan, S.; Bawa, S.; Kumar, N. Dynamic pricing techniques for Intelligent Transportation System in smart cities: A systematic review. Comput. Commun. **2020**, 150, 603–625.
7. Bayoumi, A.E.M.; Saleh, M.; Atiya, A.F.; Aziz, H.A. Dynamic pricing for hotel revenue management using price multipliers. J. Revenue Pricing Manag. **2013**, 12, 271–285.
8. Abrate, G.; Fraquelli, G.; Viglia, G. Dynamic pricing strategies: Evidence from European hotels. Int. J. Hosp. Manag. **2012**, 31, 160–168.
9. Hall, J.M.; Kopalle, P.K.; Krishna, A. Retailer dynamic pricing and ordering decisions: Category management versus brand-by-brand approaches. J. Retail. **2010**, 86, 172–183.
10. Dutta, G.; Mitra, K. A literature review on dynamic pricing of electricity. J. Oper. Res. Soc. **2017**, 68, 1131–1145.
11. Liu, J.; Liu, B.; Liu, Y.; Chen, H.; Feng, L.; Xiong, H.; Huang, Y. Personalized air travel prediction: A multi-factor perspective. ACM Trans. Intell. Syst. Technol. (TIST) **2017**, 9, 1–26.
12. Mumbower, S.; Garrow, L.A.; Higgins, M.J. Estimating flight-level price elasticities using online airline data. Transp. Res. Part A Policy Pract. **2014**, 66, 196–212.
13. Abdelghany, A.; Guzhva, V. A time-series modelling approach for airport short-term demand forecasting. J. Airpt. Manag. **2010**, 5, 72–87.
14. Yeboah, G.; Cottrill, C.D.; Nelson, J.D.; Corsar, D.; Markovic, M.; Edwards, P. Understanding factors influencing public transport passengers’ pre-travel information-seeking behaviour. Public Transp. **2019**, 11, 135–158.
15. Abdella, J.A.; Zaki, N.; Shuaib, K.; Khan, F. Airline ticket price and demand prediction: A survey. J. King Saud Univ. Comput. Inf. Sci. **2019**.
16. Cheung, W.C.; Simchi-Levi, D.; Wang, H. Dynamic pricing and demand learning with limited price experimentation. Oper. Res. **2017**, 65, 1722–1731.
17. Ganti, R.; Sustik, M.; Tran, Q.; Seaman, B. Thompson sampling for dynamic pricing. arXiv **2018**, arXiv:1802.03050.
18. Kemper, C.; Breuer, C. How efficient is dynamic pricing for sport events? Designing a dynamic pricing model for Bayern Munich. Int. J. Sport Financ. **2016**, 11, 4–15.
19. Sato, K.; Sawaki, K. Dynamic pricing of high-speed rail with transport competition. J. Revenue Pricing Manag. **2012**, 11, 548–559.
20. Lou, Y.; Yin, Y.; Laval, J.A. Optimal dynamic pricing strategies for high-occupancy/toll lanes. Transp. Res. Part C Emerg. Technol. **2011**, 19, 64–74.
21. Rana, R.; Oliveira, F.S. Real-time dynamic pricing in a non-stationary environment using model-free reinforcement learning. Omega **2014**, 47, 116–126.
22. Diamantini, C.; Genga, L.; Marozzo, F.; Potena, D.; Trunfio, P. Discovering Mobility Patterns of Instagram Users through Process Mining Techniques. In Proceedings of the IEEE International Conference on Information Reuse and Integration, San Diego, CA, USA, 4–6 August 2017; pp. 485–492.
23. Pearson, K. Determination of the coefficient of correlation. Science **1909**, 30, 23–25.
24. O’Brien, R.M. A caution regarding rules of thumb for variance inflation factors. Qual. Quant. **2007**, 41, 673–690.
25. Kotsiantis, S.; Kanellopoulos, D.; Pintelas, P. Handling imbalanced datasets: A review. GESTS Int. Trans. Comput. Sci. Eng. **2006**, 30, 25–36.
26. Belcastro, L.; Marozzo, F.; Talia, D.; Trunfio, P. Using Scalable Data Mining for Predicting Flight Delays. ACM Trans. Intell. Syst. Technol. **2016**, 8, 5:1–5:20.
27. Maron, M.E. Automatic indexing: An experimental inquiry. J. ACM **1961**, 8, 404–417.
28. Walker, S.H.; Duncan, D.B. Estimation of the probability of an event as a function of several independent variables. Biometrika **1967**, 54, 167–179.
29. Safavian, S.R.; Landgrebe, D. A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. **1991**, 21, 660–674.
30. Breiman, L. Random forests. Mach. Learn. **2001**, 45, 5–32.
31. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13 August 2016; pp. 785–794.

**Figure 3.** No. of purchased tickets considering (**A**) departure month, (**B**) departure day of the week, and (**C**) route attributes.

**Figure 6.** No. and percentage of purchased tickets considering the days before departure (DBD) and the booking day of the week (BDOW).

**Figure 7.** No. and percentage of purchased tickets considering the occupancy rate for a bus (OCCR), the number of attempts (NA), and the fare of a ticket (HMLF).

**Figure 8.** Accuracy obtained (**A**) and number of predicted tickets (**B**) by Naïve Bayes (NB), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF) and eXtreme Gradient Boosting (XGBoost).

**Figure 9.** Comparative analysis among pricing strategies, evaluating the number of purchased tickets and the relative revenue.

| Algorithms | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Naïve Bayes | 0.61 | 0.64 | 0.61 | 0.59 |
| Logistic Regression | 0.61 | 0.61 | 0.61 | 0.61 |
| Decision Tree | 0.86 | 0.86 | 0.86 | 0.86 |
| Random Forest | 0.93 | 0.93 | 0.93 | 0.93 |
| XGBoost | 0.95 | 0.95 | 0.95 | 0.95 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Branda, F.; Marozzo, F.; Talia, D.
Ticket Sales Prediction and Dynamic Pricing Strategies in Public Transport. *Big Data Cogn. Comput.* **2020**, *4*, 36.
https://doi.org/10.3390/bdcc4040036
