Study of Delay Prediction in the US Airport Network
Abstract
1. Introduction
2. Literature Review
2.1. Data Used
2.2. Features Chosen
2.3. Used Machine Learning Techniques
2.4. Evaluation Methods Used
3. Methodology
3.1. Datasets Used
3.2. Data Preprocessing
3.3. Hyperparameter Tuning
3.4. Model Evaluation
4. Results
4.1. Data Processing
4.2. Logistic Regression Results
4.3. Random Forest
4.4. Gradient Boosting Machine
4.5. Feed-Forward Neural Network
4.6. Model Comparison
5. Discussion
6. Conclusions
- For arrival flight delay prediction in the United States airport network, publicly available flight and weather data are sufficient to build usable machine learning models.
- Among the four machine learning models compared in this study (logistic regression, random forest, gradient boosting machine, and feed-forward neural network), the gradient boosting machine performs best. It scores highest on every evaluation metric, with clear margins in accuracy, precision, F1-score, PR AUC, and ROC AUC, and narrow ones in recall (0.88 vs. 0.87 for the neural network) and specificity (0.72 vs. 0.71 for the random forest).
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
US | United States |
ML | Machine Learning |
AI | Artificial Intelligence |
LR | Logistic Regression |
RF | Random Forest |
GBM | Gradient Boosting Machine |
NN | Feed-Forward Neural Network |
Attribute Name | Type | Example |
---|---|---|
Quarter | Integer | 1, 2, 3, 4 |
Month | Integer | 1, 2, 3, … 12 |
Day of month | Integer | 1, 2, 3, … 31 |
Day of week | Integer | 1, 2, 3, … 7 |
Hour of day | Integer | 0, 1, 2, 3, … 23 |
Minute of the hour | Integer | 0, 1, 2, 3, … 59 |
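The calendar attributes in the table above can all be derived from a single timestamp. The following is a minimal sketch, not the authors' preprocessing code: the function name `temporal_features` and the dictionary keys are hypothetical, and the day-of-week convention (1 = Monday, 7 = Sunday) is an assumption.

```python
from datetime import datetime


def temporal_features(ts: datetime) -> dict:
    """Split a timestamp into the calendar attributes listed in the table."""
    return {
        "quarter": (ts.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, ...
        "month": ts.month,
        "day_of_month": ts.day,
        # Assumption: ISO convention, 1 = Monday ... 7 = Sunday.
        "day_of_week": ts.isoweekday(),
        "hour_of_day": ts.hour,
        "minute_of_hour": ts.minute,
    }


# Example timestamp taken from the date-time attribute table of this paper.
features = temporal_features(datetime(2017, 1, 1, 1, 0, 0))
print(features)
```

Keeping each component as its own integer column lets tree-based models split on, say, hour of day independently of the month.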
Attribute | Type | Example |
---|---|---|
Planned departure time | Date-time | 1 January 2017 01:00:00 |
Planned departure local hour | Integer | 0, 1, 2, 3, … 23 |
Planned arrival time | Date-time | 2 January 2017 02:00:00 |
Planned arrival local hour | Integer | 0, 1, 2, 3, … 23 |
Actual departure time | Date-time | 3 January 2017 03:00:00 |
Actual departure local hour | Integer | 0, 1, 2, 3, … 23 |
Actual arrival time | Date-time | 4 January 2017 04:00:00 |
Actual arrival local hour | Integer | 0, 1, 2, 3, … 23 |
Wheels on time | Date-time | 5 January 2017 05:00:00 |
Wheels on local hour | Integer | 0, 1, 2, 3, … 23 |
Wheels off time | Date-time | 6 January 2017 06:00:00 |
Wheels off local hour | Integer | 0, 1, 2, 3, … 23 |
Attribute | Type | Example |
---|---|---|
Carrier | String | AA, AS, B6, DL, EV |
Tail number | Categorical | N001AA, N104AA, N10575 |
Flight number | Integer | 2330, 1590, 1320, 2202 |
Origin airport | Categorical | ABE, STX, LAX, ORD |
Destination airport | Categorical | STS, SMF, SUN, RSW |
Flight distance | Integer | 650, 482, 518, 510 |
Seating capacity | Integer | 140, 196, 186, 176 |
Attribute | Type | Example |
---|---|---|
Wind speed | Double | 3.58, 4.56, 6.71 |
Wind direction | Double | 118.31, 91.77, 209.13 |
Air temperature | Double | 8.23, 11.25, 15.33, 10.87 |
Atmospheric pressure | Double | 1019.21, 1020.42, 1011.99 |
Visibility | Double | 5042.91, 871.25, 10840.76 |
Dew point | Double | 7.54, 11.15, 13.73, 7.20 |
Precipitation | Double | 2.82, 3.36, 4.81, 2.89 |
Cloud cover | Integer | 100, 85, 66, 74, 12 |
Wind gust | Integer | 3, 12, 24, 30 |
Total snow | Double | 0.0, 22.6, 7.4, 5.4 |
Symbol | Airport | State |
---|---|---|
ATL | Hartsfield-Jackson Atlanta | Georgia |
DEN | Denver International Airport | Colorado |
DFW | Dallas/Fort Worth | Texas |
LAS | Harry Reid International | Nevada |
LAX | Los Angeles International | California |
MSP | Minneapolis–Saint Paul | Minnesota |
ORD | Chicago O’Hare International | Illinois |
PHX | Phoenix Sky Harbor | Arizona |
SEA | Seattle-Tacoma International | Washington |
SFO | San Francisco International | California |
Attribute | Type | Example |
---|---|---|
Quarter | Integer | 1, 2, 3, 4 |
Month | Integer | 1, 2, 3, … 12 |
Day of month | Integer | 1, 2, 3, … 31 |
Day of week | Integer | 1, 2, 3, … 7 |
Planned departure local hour | Integer | 0, 1, 2, 3, … 23 |
Planned arrival local hour | Integer | 0, 1, 2, 3, … 23 |
Flight distance | Integer | 650, 482, 518, 510 |
Seating capacity | Integer | 140, 196, 186, 176 |
Origin airport | String | ATL, DEN, LAX, ORD |
Destination airport | String | SFO, SEA, PHX, MSP |
Flight time | Integer | 60, 120, 135, 85 |
Flight speed | Double | 3.93, 5092.3 |
Carrier | String | AA, AS, B6, DL, EV |
Attribute | Type | Example |
---|---|---|
Wind speed | Double | 3.58, 4.56, 6.71 |
Wind direction | Double | 118.31, 91.77, 209.13 |
Air temperature | Double | 8.23, 11.25, 15.33, 10.87 |
Atmospheric pressure | Double | 1019.21, 1020.42, 1011.99 |
Visibility | Double | 5042.91, 871.25, 10840.76 |
Dew point | Double | 7.54, 11.15, 13.73, 7.20 |
Precipitation | Double | 2.82, 3.36, 4.81, 2.89 |
Cloud cover | Integer | 100, 85, 66, 74, 12 |
Wind gust | Integer | 3, 12, 24, 30 |
Total snow | Double | 0.0, 22.6, 7.4, 5.4 |
Algorithm | Parameter | Symbol | Values |
---|---|---|---|
Logistic Regression | Elastic net regularization | | 0, 0.25, 0.5, 0.75, 1 |
Logistic Regression | Regularization | | 0, 0.25, 0.5, 0.75, 1 |
Random Forest | Maximum depth | max_depth | 1, 3, 5, 7, 10 |
Random Forest | Number of trees | num_trees | 1, 3, 5, 7, 10, 25, 50 |
Gradient Boosting Machine | Maximum depth | max_depth | 1, 3, 5, 7, 9, 11, 13, 15, … 29 |
Gradient Boosting Machine | Row sampling rate | sample_rate | 0.20, 0.21, 0.22, 0.23, … 1.00 |
Gradient Boosting Machine | Column sampling rate | col_sample_rate | 0.20, 0.21, 0.22, 0.23, … 1.00 |
Gradient Boosting Machine | Column sample rate change per level | col_sample_rate_change_per_level | 0.90, 0.91, 0.92, 0.93, … 1.10 |
Gradient Boosting Machine | Minimum observations per leaf | min_rows | 1, 2, 4, 8, 16, 32, 64, … 2048 |
Gradient Boosting Machine | Bins for continuous features | nbins | 16, 32, 64, 128, 256, … 1024 |
Gradient Boosting Machine | Bins for categorical features | nbins | 16, 32, 64, 128, … 4096 |
Gradient Boosting Machine | Minimum error improvement | min_split_improvement | 0, , , |
Feed-Forward Neural Network | Hidden layers and nodes | hidden | 32→32→32; 64→64; 100→100→100 |
Feed-Forward Neural Network | Input dropout ratio | input_dropout_ratio | 0, 0.05 |
Feed-Forward Neural Network | Learning rate | rate | 0.01, 0.02 |
Feed-Forward Neural Network | Learning rate annealing | rate_annealing | , , |
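To make the size of such a search space concrete, the random forest rows of the table above can be expanded into explicit candidate configurations. This is only an illustrative sketch: exhaustive Cartesian enumeration is an assumption (the search strategy actually used is not shown in this section), and `expand_grid` is a hypothetical helper.

```python
from itertools import product

# Random forest search space, copied from the hyperparameter table.
rf_grid = {
    "max_depth": [1, 3, 5, 7, 10],
    "num_trees": [1, 3, 5, 7, 10, 25, 50],
}


def expand_grid(grid: dict) -> list:
    """Return every combination of the listed values as a list of dicts."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


rf_candidates = expand_grid(rf_grid)
print(len(rf_candidates), "random forest configurations")
print(rf_candidates[0], rf_candidates[-1])
```

The same helper applied to the gradient boosting rows would produce a very large number of combinations, which is why wide grids like that one are commonly sampled at random instead of enumerated in full.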
Actual / Predicted | Delay | No delay |
---|---|---|
Delay | True Positive (TP) | False Negative (FN) |
No delay | False Positive (FP) | True Negative (TN) |
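The threshold-dependent metrics reported in this paper (accuracy, precision, recall, F1-score, specificity) follow directly from the four cells of the confusion matrix above. The sketch below uses the standard textbook definitions; the function name `classification_metrics` is hypothetical, and the example counts are the logistic regression confusion matrix reported in the results.

```python
def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute standard binary classification metrics from raw counts."""
    precision = _ratio(tp, tp + fp)    # share of predicted delays that are real
    recall = _ratio(tp, tp + fn)       # share of real delays that are caught
    return {
        "accuracy": _ratio(tp + tn, tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "specificity": _ratio(tn, tn + fp),  # share of on-time flights kept
    }


# Counts from the logistic regression confusion matrix in this paper.
lr_metrics = classification_metrics(tp=7137, fn=4143, fp=15040, tn=28025)
for name, value in lr_metrics.items():
    print(f"{name:<12}{value:.3f}")
```

Because delayed flights are the minority class here, precision and recall are more informative than accuracy alone.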
Parameter | Symbol | Value |
---|---|---|
Elastic net regularization | | 0 |
Regularization | | 0 |
Actual / Predicted | Delay | No delay |
---|---|---|
Delay | 7137 | 4143 |
No delay | 15,040 | 28,025 |
Metric | Logistic Regression | Random Forest | Gradient Boosting Machine | Feed-Forward Neural Network |
---|---|---|---|---|
Accuracy | 0.65 | 0.70 | 0.75 | 0.47 |
Precision | 0.32 | 0.37 | 0.45 | 0.26 |
Recall | 0.63 | 0.66 | 0.88 | 0.87 |
F1-Score | 0.42 | 0.47 | 0.60 | 0.40 |
Specificity | 0.65 | 0.71 | 0.72 | 0.37 |
PR AUC | 0.38 | 0.48 | 0.68 | 0.43 |
ROC AUC | 0.69 | 0.75 | 0.89 | 0.73 |
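A quick way to read the comparison table above is to find, for each metric, the best model and its margin over the runner-up. The sketch below only re-arranges the numbers from the table; the helper `best_and_margin` and the abbreviations used as dictionary keys are illustrative choices.

```python
# Scores copied from the model comparison table
# (LR, RF, GBM, NN as defined in the Abbreviations section).
scores = {
    "Accuracy":    {"LR": 0.65, "RF": 0.70, "GBM": 0.75, "NN": 0.47},
    "Precision":   {"LR": 0.32, "RF": 0.37, "GBM": 0.45, "NN": 0.26},
    "Recall":      {"LR": 0.63, "RF": 0.66, "GBM": 0.88, "NN": 0.87},
    "F1-Score":    {"LR": 0.42, "RF": 0.47, "GBM": 0.60, "NN": 0.40},
    "Specificity": {"LR": 0.65, "RF": 0.71, "GBM": 0.72, "NN": 0.37},
    "PR AUC":      {"LR": 0.38, "RF": 0.48, "GBM": 0.68, "NN": 0.43},
    "ROC AUC":     {"LR": 0.69, "RF": 0.75, "GBM": 0.89, "NN": 0.73},
}


def best_and_margin(metric_scores: dict) -> tuple:
    """Return the top-scoring model and its lead over the second best."""
    ranked = sorted(metric_scores.items(), key=lambda kv: kv[1], reverse=True)
    (best_model, top), (_, second) = ranked[0], ranked[1]
    return best_model, round(top - second, 2)


summary = {metric: best_and_margin(vals) for metric, vals in scores.items()}
for metric, (model, margin) in summary.items():
    print(f"{metric:<12} best={model:<4} margin over runner-up={margin:.2f}")
```

The margins make visible where the ranking is decisive and where two models are nearly tied.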
Parameter | Symbol | Value |
---|---|---|
Maximum depth | max_depth | 10 |
Number of trees | num_trees | 50 |
Actual / Predicted | Delay | No delay |
---|---|---|
Delay | 7428 | 3852 |
No delay | 12,626 | 30,439 |
Parameter | Symbol | Value |
---|---|---|
Maximum depth | max_depth | 17 |
Row sampling rate | sample_rate | 0.91 |
Column sampling rate | col_sample_rate | 0.33 |
Column sample rate change per level | col_sample_rate_change_per_level | 0.95 |
Minimum observations per leaf | min_rows | 16 |
Bins for continuous features | nbins | 512 |
Bins for categorical features | nbins | 64 |
Minimum error improvement | min_split_improvement | 0 |
Actual / Predicted | Delay | No delay |
---|---|---|
Delay | 9901 | 1350 |
No delay | 12,152 | 31,019 |
Parameter | Symbol | Value |
---|---|---|
Hidden layers and nodes | hidden | 64→64 |
Input dropout ratio | input_dropout_ratio | 0.05 |
Learning rate | rate | 0.02 |
Learning rate annealing | rate_annealing |
Actual / Predicted | Delay | No delay |
---|---|---|
Delay | 9842 | 1409 |
No delay | 27,407 | 15,764 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kiliç, K.; Sallan, J.M. Study of Delay Prediction in the US Airport Network. Aerospace 2023, 10, 342. https://doi.org/10.3390/aerospace10040342