Data Transformation in the Predict-Then-Optimize Framework: Enhancing Decision Making under Uncertainty
Abstract
1. Introduction
2. Related Literature
3. Methods
3.1. Problem Setting
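Algorithm 1 The predict-then-optimize framework
1: Input: training dataset, prediction model, and optimization model
2: Output: solution
3: Predicted values: obtained from the prediction model trained on the training dataset
4: Solution: obtained by solving the optimization model with the predicted values as inputs
5: Return the solution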
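The steps of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the one-feature least-squares predictor (`fit_linear`) and the top-k selection rule (`select_top_k`) are hypothetical stand-ins chosen only to make the two stages concrete, and every function and parameter name is an assumption.

```python
from typing import Callable, List, Sequence


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Hypothetical prediction model: ordinary least squares with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def select_top_k(predicted: Sequence[float], k: int) -> List[int]:
    """Hypothetical optimization model: choose the k items with the largest predicted value."""
    ranked = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    return sorted(ranked[:k])


def predict_then_optimize(
    train_x: Sequence[float],
    train_y: Sequence[float],
    new_x: Sequence[float],
    k: int,
) -> List[int]:
    # Train the prediction model on the training dataset (the input in line 1).
    predictor = fit_linear(train_x, train_y)
    # Line 3: predicted values for the unknown parameters.
    predicted = [predictor(x) for x in new_x]
    # Line 4: solve the optimization model using the predictions.
    solution = select_top_k(predicted, k)
    # Line 5: return the solution.
    return solution
```

For example, `predict_then_optimize([1, 2, 3, 4], [2, 4, 6, 8], [5, 1, 3], 2)` fits the line y = 2x, predicts 10, 2, and 6 for the three new items, and selects the items at indices 0 and 2. The two stages are decoupled here: the predictor is fitted on prediction error alone and does not see the downstream decision.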
3.2. Prediction Models
3.2.1. Linear Regression Model with the Response Variable Transformation
3.2.2. Decision Tree Model with the Response Variable Transformation
3.2.3. Random Forest Model with the Response Variable Transformation
3.3. Optimization Model
4. Evaluation
4.1. Experiment Settings
4.2. Evaluation of Models
4.3. Discussion and Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

Model | Optimal m (before) | Optimal m (after) | MSE (before) | MSE (after) | MAE (before) | MAE (after) | Before | After
---|---|---|---|---|---|---|---|---
Linear regression | 832,465.15 | 832,465.15 | 3982.00 | 3982.00 | 1991.00 | 1991.00 | −4.437 | −4.437
Decision tree | 164,131.00 | 161,589.00 | 3890.00 | 4021.00 | 774.50 | 782.50 | −0.318 | −0.242
Random forest | 88,552.13 | 86,156.87 | 4039.50 | 4044.00 | 566.43 | 561.75 | 0.246 | 0.264

Model | Optimal m (before) | Optimal m (after) | MSE (before) | MSE (after) | MAE (before) | MAE (after) | Before | After
---|---|---|---|---|---|---|---|---
Linear regression | 331,327.38 | 331,327.38 | 1532.25 | 1532.25 | 383.06 | 383.06 | −4.123 | −4.123
Decision tree | 74,373.25 | 70,775.75 | 1449.75 | 1551.50 | 163.69 | 162.56 | −0.296 | −0.225
Random forest | 36,072.06 | 34,296.20 | 1530.00 | 1564.00 | 108.77 | 107.63 | 0.229 | 0.245

Model | Optimal m (before) | Optimal m (after) | MSE (before) | MSE (after) | MAE (before) | MAE (after) | Before | After
---|---|---|---|---|---|---|---|---
Linear regression | 3070.09 | 3070.09 | 262.67 | 262.67 | 87.54 | 87.54 | −0.757 | −0.757
Decision tree | 2885.41 | 2857.03 | 328.33 | 339.00 | 99.07 | 101.19 | −0.326 | −0.293
Random forest | 1442.53 | 1423.77 | 396.00 | 397.33 | 67.26 | 66.51 | 0.298 | 0.303

Model | Optimal m (before) | Optimal m (after) | MSE (before) | MSE (after) | MAE (before) | MAE (after) | Before | After
---|---|---|---|---|---|---|---|---
Linear regression | 2331.20 | 2331.20 | 165.00 | 165.00 | 55.00 | 55.00 | −0.744 | −0.744
Decision tree | 2428.06 | 2243.19 | 185.67 | 216.67 | 70.17 | 68.24 | −0.321 | −0.288
Random forest | 1145.56 | 1143.81 | 255.33 | 255.67 | 42.86 | 43.21 | 0.294 | 0.298
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tian, X.; Guan, Y.; Wang, S. Data Transformation in the Predict-Then-Optimize Framework: Enhancing Decision Making under Uncertainty. Mathematics 2023, 11, 3782. https://doi.org/10.3390/math11173782