Statistical Predictive Hybrid Choice Modeling: Exploring Embedded Neural Architecture
Abstract
1. Introduction
2. Related Studies
2.1. DCM with Deep Neural Network
2.2. Embedding Representation
2.3. Attention Mechanism
3. Model Architecture
3.1. MNL as an ANN
3.2. Role of Temporal Attention in Utility Function
3.3. The Proposed Architecture
3.3.1. Extended Utility with Temporal Attention
3.3.2. Incorporating Temporal Attention into Utility Function
3.3.3. Fine-Tuning Model Constraints for Improved Predictions and Interpretation
- Unique Embedding Dimension Constraints
- Sparse Embedding Constraints
- Temporal Attention Weight Constraint
- Regularization of Attention Mechanism
- Consistency Constraints
- Dynamic Embedding Constraints
- Cross-Validation Stability Constraint
- Interpretability Constraints
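A few of the constraint families listed above can be illustrated with a minimal sketch. This is not the paper's implementation; the penalty coefficients and array values below are hypothetical, chosen only to show one common way a temporal attention weight constraint (softmax normalization so weights are positive and sum to one), regularization of the attention mechanism (an L2 penalty on the attention scores), and a sparse embedding constraint (an L1 penalty on embedding entries) might be imposed:

```python
import numpy as np

def softmax(z):
    """Map unconstrained scores to positive weights that sum to one."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Temporal attention weight constraint: unconstrained scores are passed
# through a softmax, guaranteeing the weights form a valid distribution.
scores = np.array([0.4, -1.2, 2.0])          # hypothetical attention scores
attn_weights = softmax(scores)

# Regularization of the attention mechanism: an L2 penalty on the scores
# discourages any single time step from dominating (coefficient assumed).
attn_penalty = 0.01 * np.sum(scores ** 2)

# Sparse embedding constraint: an L1 penalty drives embedding entries
# toward zero, encouraging sparse, more interpretable representations.
embedding = np.array([[0.5, 0.0, -0.2],      # hypothetical embedding matrix
                      [0.0, 1.1, 0.0]])
sparsity_penalty = 0.05 * np.abs(embedding).sum()
```

In training, penalties of this kind would be added to the choice model's negative log-likelihood before optimization.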
4. Application and Simulation Study
4.1. Specification of Data Features and Model Design
4.2. Model Tuning
4.3. Simulation Study
- We simulate n = 300 observations and generate explanatory variables (continuous and categorical) that serve as our input data.
- These variables are generated in Python (e.g., np.random.normal() and np.random.randint() for continuous and categorical variables, respectively).
- For the attention weights, we use the heuristics-based approach, also called uniform weights, for each continuous variable X.
- The embedded layers are obtained from PyTorch’s nn.Embedding.
- We choose two sets of coefficients, one derived from step 3 and the other from step 4. Next, we compute the utility functions (as defined in Equation (6)).
- Subsequently, we apply the attention mechanism using attention parameters (0.3, 0.6, 0.9) and attention-specific parameters (0.1, 0.2, 0.3).
- The simulation process is repeated multiple times to obtain a sufficient number of simulated datasets.
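The simulation steps above can be sketched as follows. This is a reconstruction, not the authors' code: the number of alternatives, the embedding dimension, and the coefficient values are assumptions, PyTorch's nn.Embedding is emulated with a NumPy lookup table so the sketch stays self-contained, and the Equation (6) utility is approximated as a linear index with attention-weighted continuous terms plus an embedded categorical part:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 300          # observations, as in the simulation
n_alts = 3       # hypothetical number of choice alternatives
emb_dim = 4      # hypothetical embedding dimension
n_levels = 10    # hypothetical number of category levels

# Generate continuous and categorical explanatory variables
X_cont = rng.normal(size=(n, 5))                    # cf. np.random.normal()
X_cat = rng.integers(0, n_levels, size=(n, 2))      # cf. np.random.randint()

# Emulate an embedding layer (PyTorch's nn.Embedding) with one random
# lookup table per categorical variable, then concatenate the vectors
emb_tables = [rng.normal(size=(n_levels, emb_dim)) for _ in range(X_cat.shape[1])]
X_emb = np.concatenate([tbl[X_cat[:, j]] for j, tbl in enumerate(emb_tables)],
                       axis=1)

# Heuristics-based ("uniform") attention weights over the continuous variables
attn = np.full(X_cont.shape[1], 1.0 / X_cont.shape[1])

# Two hypothetical coefficient sets, standing in for those from steps 3 and 4
beta_cont = rng.normal(size=(X_cont.shape[1], n_alts))
beta_emb = rng.normal(size=(X_emb.shape[1], n_alts))

# Utility: attention-weighted continuous part plus embedded categorical part
V = (X_cont * attn) @ beta_cont + X_emb @ beta_emb

# MNL choice probabilities (numerically stabilized softmax) and choices
P = np.exp(V - V.max(axis=1, keepdims=True))
P /= P.sum(axis=1, keepdims=True)
choices = np.array([rng.choice(n_alts, p=p) for p in P])
```

Repeating this draw with fresh random seeds yields the multiple simulated datasets described in the last step.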
5. Results and Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Van Cranenburgh, S. Blending Computer Vision into Discrete Choice Models. Preprint. 2020. Available online: https://transp-or.epfl.ch/heart/2020/abstracts/HEART_2020_paper_109.pdf (accessed on 5 May 2024).
- Ben-Akiva, M.; Lerman, S. Discrete Choice Analysis: Theory and Application to Travel Demand; MIT Press Series in Transportation Studies; MIT Press: Cambridge, MA, USA, 1985; ISBN 9780262022170.
- Hagenauer, J.; Helbich, M. A comparative study of machine learning classifiers for modeling travel mode choice. Expert Syst. Appl. 2017, 78, 273–282.
- Acuna-Agost, R.; Delahaye, T.; Lheritier, A.; Bocamazo, M. Airline itinerary choice modelling using machine learning. In Proceedings of the International Choice Modelling Conference, Cape Town, South Africa, 3–5 April 2017.
- Guo, C.; Berkhahn, F. Entity embeddings of categorical variables. arXiv 2016, arXiv:1604.06737.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Otsuka, M.; Osogami, T. A deep choice model. In Proceedings of the AAAI, Phoenix, AZ, USA, 12–17 February 2016; pp. 850–856.
- Brathwaite, T.; Vij, A.; Walker, J.L. Machine learning meets microeconomics: The case of decision trees and discrete choice. arXiv 2017, arXiv:1711.04826.
- Sajjad, I.; Nafisah, I.A.; Almazah, M.M.A.; Alamri, O.A.; Dar, J.G. A Symmetrical Analysis of Decision Making: Introducing the Gaussian Negative Binomial Mixture with a Latent Class Choice Model. Symmetry 2024, 16, 908.
- Nickel, M.; Tresp, V.; Kriegel, H.P. A three-way model for collective learning on multi-relational data. In Proceedings of the International Conference on Machine Learning, Washington, DC, USA, 28 June–2 July 2011.
- Wang, Z.; Li, H.; Rajagopal, R. Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding. arXiv 2020, arXiv:2001.11101.
- Sifringer, B.; Lurkin, V.; Alahi, A. Enhancing discrete choice models with representation learning. Transp. Res. Part B Methodol. 2020, 140, 236–261.
- Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 701–710.
- Verwimp, L.; Pelemans, J.; Wambacq, P. Expanding n-gram training data for language models based on morpho-syntactic transformations. Comput. Linguist. Neth. J. 2015, 5, 49–64.
- Han, Y.; Zegras, C.; Pereira, F.C.; Ben-Akiva, M. A neural-embedded choice model: TasteNet-MNL modeling taste heterogeneity with flexibility and interpretability. arXiv 2020, arXiv:2002.00922.
- Alwosheel, A.; van Cranenburgh, S.; Chorus, C.G. Is your dataset big enough? Sample size requirements when using artificial neural networks for discrete choice analysis. J. Choice Model. 2018, 28, 167–182.
- Iranitalab, A.; Khattak, A. Comparison of four statistical and machine learning methods for crash severity prediction. Accid. Anal. Prev. 2017, 108, 27–36.
- Wang, Y. A new concept using LSTM Neural Networks for dynamic system identification. In Proceedings of the 2017 American Control Conference (ACC), Seattle, WA, USA, 24–26 May 2017; pp. 5324–5329.
- Van Cranenburgh, S.; Wang, S.; Vij, A.; Pereira, F.; Walker, J. Choice modelling in the age of machine learning: Discussion paper. J. Choice Model. 2022, 42, 100340.
- Camacho-Collados, J.; Pilehvar, M.T. From word to sense embeddings: A survey on vector representations of meaning. J. Artif. Intell. Res. 2018, 63, 743–788.
- Paredes, M.; Hemberg, E.; O’Reilly, U.-M.; Zegras, C. Machine learning or discrete choice models for car ownership demand estimation and prediction? In Proceedings of the 2017 5th IEEE International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS), Naples, Italy, 26–28 June 2017; pp. 780–785.
- Foudeh, P.; Salim, N. An ontology-based, fully probabilistic, scalable method for human activity recognition. arXiv 2021, arXiv:2109.02902.
- Perone, C.S.; Silveira, R.; Paula, T.S. Evaluation of sentence embeddings in downstream and linguistic probing tasks. arXiv 2018, arXiv:1806.06259.
- Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. Adv. Neural Inf. Process. Syst. 2013, 26.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
- Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864.
- Wong, M.; Farooq, B. ResLogit: A residual neural network logit model for data-driven choice modelling. Transp. Res. Part C Emerg. Technol. 2021, 126, 103050.
- Feng, S.; Cong, G.; An, B.; Chee, Y.M. POI2Vec: Geographical latent representation for predicting future visitors. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- De Brébisson, A.; Simon, É.; Auvolat, A.; Vincent, P.; Bengio, Y. Artificial neural networks applied to taxi destination prediction. arXiv 2015, arXiv:1508.00021.
- Wang, Z.; Xiao, D.; Fang, F.; Govindan, R.; Pain, C.; Guo, Y. Model identification of reduced order fluid dynamics systems using deep learning. Int. J. Numer. Methods Fluids 2018, 86, 255–268.
- Wang, B.; Shaaban, K.; Kim, I. Revealing the hidden features in traffic prediction via entity embedding. Pers. Ubiquitous Comput. 2021, 25, 21–31.
Attributes | Type | Description
---|---|---
Index | Categorical | Index or identifier
Arrival Time | Continuous | Time of arrival
Creation Time | Continuous | Creation time
X | Continuous | Accelerometer reading along the x-axis
Y | Continuous | Accelerometer reading along the y-axis
Z | Continuous | Accelerometer reading along the z-axis
User | Categorical | User identifier
Model | Categorical | Smartphone model
Device | Categorical | Device identifier
Gestures | Categorical | Activity class (Sit, Stand, Walk, Bike, Stairs Up, Stairs Down)
Statistic | Phone Accelerometer (x-Axis) | Phone Accelerometer (y-Axis) | Phone Accelerometer (z-Axis) | Static Accelerometer (x-Axis) | Static Accelerometer (y-Axis) | Static Accelerometer (z-Axis)
---|---|---|---|---|---|---
Min | −3.3424 | −3.7771 | −4.0476 | −20.9079 | −19.6133 | −1.1880
 | −0.0284 | −0.0824 | −0.1370 | −6.1700 | −0.5390 | 7.2400
 | 0.0003 | 0.0004 | −0.0001 | −5.0087 | 0.1263 | 8.1730
Mean | 0.0006 | −0.0089 | −0.0129 | −3.9921 | 0.2276 | 8.2881
 | 0.1233 | 0.0528 | 0.0466 | −0.3352 | 0.7853 | 9.7201
Max | 2.7197 | 7.6496 | 4.5979 | 17.9290 | 19.6127 | 24.3962
SD | 0.4722 | 0.4227 | 0.4722 | 3.6867 | 1.2643 | 2.0311
Range | 6.0621 | 11.4267 | 8.6455 | 38.8362 | 39.2260 | 38.8369
Skewness | −0.6177 | 0.0738 | 0.3547 | 0.5281 | 0.8364 | 0.1281
Kurtosis | 9.6207 | 8.2604 | 10.5227 | 3.5859 | 6.8863 | 4.0182
Models | Attention Weights | F1-Score | Accuracy | Recall | Precision
---|---|---|---|---|---
ECM | (0.3, 0.1) | 71.35 | 73.01 | 77.64 | 82.29
ECM | (0.6, 0.2) | 76.54 | 79.15 | 73.99 | 81.05
ECM | (0.9, 0.3) | 77.11 | 80.08 | 81.71 | 81.74
ECMAM | (0.3, 0.1) | 71.90 | 85.73 | 78.90 | 88.97
ECMAM | (0.6, 0.2) | 78.17 | 85.90 | 80.65 | 87.15
ECMAM | (0.9, 0.3) | 77.89 | 89.01 | 88.57 | 88.03
Acceleration | Parameters | Betas | Weights | Bias | St. Errors | t-Stats | p-Value
---|---|---|---|---|---|---|---
Phone | AT | 0.2378 | 0.7891 | 0.5132 | 0.1347 | 1.7653 | 0.0294
Phone | CT | 0.3948 | 1.2145 | 0.8412 | 0.1654 | 1.5297 | 0.0543
Phone | X | 0.5482 | 1.0473 | 0.6956 | 0.2123 | 2.3746 | 0.0172
Phone | Y | 0.4159 | 0.9123 | 0.7210 | 0.1784 | 2.1238 | 0.0121
Phone | Z | 0.1076 | 0.5321 | 0.3010 | 0.0496 | 1.8552 | 0.0053
Phone | User | 0.6243 | 1.3265 | 1.0178 | 0.2487 | 1.9421 | 0.0065
Phone | Model | 0.8153 | 1.5123 | 0.9064 | 0.3011 | 0.1976 | 0.0041
Phone | Device | 0.7498 | 1.1247 | 0.7836 | 0.1987 | 0.7543 | 0.0087
Phone | Gt | 0.8965 | 1.7435 | 1.2134 | 0.3564 | 0.8621 | 0.0002
Static | AT | 0.2598 | 0.7490 | 0.5592 | 0.1223 | 1.9874 | 0.0298
Static | CT | 0.3456 | 1.2564 | 0.8709 | 0.1762 | 2.2413 | 0.0317
Static | X | 0.4568 | 1.0342 | 0.6543 | 0.2521 | 2.0981 | 0.0001
Static | Y | 0.1423 | 0.9367 | 0.7823 | 0.1892 | 2.4591 | 0.0000
Static | Z | 0.6912 | 0.5132 | 0.3891 | 0.0973 | 2.3145 | 0.0000
Static | User | 0.8791 | 1.4553 | 1.0235 | 0.2786 | 1.6709 | 0.0000
Static | Model | 0.7210 | 1.3421 | 0.9389 | 0.3097 | 1.8093 | 0.0219
Static | Device | 0.9234 | 1.1892 | 0.8024 | 0.2065 | 1.9803 | 0.0391
Static | Gt | 0.5623 | 1.7845 | 1.7845 | 0.3812 | 2.9390 | 0.0147
Model | DCM | MNL | NestedLogit | Entity Embedding | Attention Mechanism | Proposed |
---|---|---|---|---|---|---|
Log-Loss | 0.454 | 0.384 | 0.403 | 0.323 | 0.382 | 0.252 |
AUC | 0.786 | 0.838 | 0.811 | 0.872 | 0.915 | 0.946 |
Accuracy | 0.762 | 0.815 | 0.794 | 0.858 | 0.887 | 0.923 |
Precision | 0.734 | 0.796 | 0.787 | 0.847 | 0.864 | 0.915 |
Recall | 0.792 | 0.843 | 0.814 | 0.875 | 0.906 | 0.948 |
F1 Score | 0.758 | 0.818 | 0.798 | 0.867 | 0.877 | 0.935 |
Model | DCM | MNL | NestedLogit | Entity Embedding | Attention Mechanism | Proposed |
---|---|---|---|---|---|---|
Log-Loss | 0.451 | 0.383 | 0.401 | 0.324 | 0.384 | 0.252 |
AUC | 0.784 | 0.837 | 0.813 | 0.876 | 0.912 | 0.948 |
Accuracy | 0.768 | 0.813 | 0.798 | 0.853 | 0.886 | 0.925 |
Precision | 0.732 | 0.796 | 0.783 | 0.845 | 0.862 | 0.918 |
Recall | 0.794 | 0.842 | 0.817 | 0.872 | 0.907 | 0.943 |
F1 Score | 0.757 | 0.814 | 0.792 | 0.869 | 0.873 | 0.937 |
Model | DCM | MNL | NestedLogit | Entity Embedding | Attention Mechanism | Proposed |
---|---|---|---|---|---|---|
Log-Loss | 0.552 | 0.423 | 0.461 | 0.243 | 0.481 | 0.184 |
AUC | 0.785 | 0.736 | 0.785 | 0.778 | 0.894 | 0.962 |
Accuracy | 0.782 | 0.852 | 0.763 | 0.756 | 0.872 | 0.907 |
Precision | 0.767 | 0.808 | 0.747 | 0.745 | 0.896 | 0.916 |
Recall | 0.672 | 0.766 | 0.764 | 0.892 | 0.843 | 0.918 |
F1 Score | 0.719 | 0.774 | 0.833 | 0.859 | 0.857 | 0.922 |
Model | DCM | MNL | NestedLogit | Entity Embedding | Attention Mechanism | Proposed |
---|---|---|---|---|---|---|
Log-Loss | 0.453 | 0.381 | 0.407 | 0.326 | 0.382 | 0.257 |
AUC | 0.787 | 0.837 | 0.813 | 0.873 | 0.914 | 0.944 |
Accuracy | 0.763 | 0.813 | 0.798 | 0.859 | 0.882 | 0.924 |
Precision | 0.738 | 0.799 | 0.785 | 0.843 | 0.861 | 0.913 |
Recall | 0.793 | 0.843 | 0.811 | 0.877 | 0.906 | 0.947 |
F1 Score | 0.758 | 0.810 | 0.793 | 0.863 | 0.872 | 0.938 |
Models | Training Time (Phone) | Memory Usage, MB (Phone) | Inference Speed, ms (Phone) | Training Time (Static) | Memory Usage, MB (Static) | Inference Speed, ms (Static)
---|---|---|---|---|---|---
DCM | 56.12 | 4851 | 19.06 | 81.65 | 12,334 | 34.45
MNL | 64.32 | 12,295 | 16.78 | 72.79 | 2482 | 32.09
NestedLogit | 109.09 | 1987 | 78.42 | 88.02 | 3462 | 23.06
Entity Embedding | 87.09 | 670 | 36.76 | 56.32 | 5932 | 38.25
Attention Mechanism | 66.98 | 763 | 10.38 | 67.95 | 458 | 25.95
Proposed | 27.46 | 139 | 7.64 | 33.61 | 309 | 22.06
Models | Explainability (Phone) | Interpretability (Phone) | Complexity (Phone) | Explainability (Static) | Interpretability (Static) | Complexity (Static)
---|---|---|---|---|---|---
DCM | 0.721 | 0.802 | 0.601 | 0.816 | 0.781 | 0.651
MNL | 0.784 | 0.754 | 0.715 | 0.793 | 0.794 | 0.734
NestedLogit | 0.883 | 0.802 | 0.654 | 0.876 | 0.673 | 0.662
Entity Embedding | 0.915 | 0.915 | 0.726 | 0.893 | 0.876 | 0.745
Attention Mechanism | 0.833 | 0.881 | 0.813 | 0.885 | 0.793 | 0.813
Proposed | 0.945 | 0.976 | 0.955 | 0.913 | 0.977 | 0.966
Models | Phone Acceleration | Static Acceleration |
---|---|---|
DCM | 0.7143 | 0.6214 |
MNL | 0.5123 | 0.7231 |
NestedLogit | 0.6219 | 0.8036 |
Entity Embedding | 0.8913 | 0.9324 |
Attention mechanism | 0.7576 | 0.8643 |
Proposed | 0.9356 | 0.9481 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nafisah, I.A.; Sajjad, I.; Alshahrani, M.A.; Alamri, O.A.; Almazah, M.M.A.; Dar, J.G. Statistical Predictive Hybrid Choice Modeling: Exploring Embedded Neural Architecture. Mathematics 2024, 12, 3115. https://doi.org/10.3390/math12193115