A Multimodal Learning Approach for Protecting the Metro System of Medellin Colombia Against Corrupted User Traffic Data
Highlights
- The novel concepts of Self-Supervised Tabular Learning and Large Multimodal Models are integrated to create a multimodal learning solution for auditing the metro system of Medellin, Colombia.
- Corrupted user traffic is detected offline on publicly available data, then explained and corroborated using SHAP values and the image understanding capabilities of a Large Multimodal Model.
- A visibility layer is added to support informed policy making, also shedding light on the strengths and opportunities of the current publicly available data.
- Each abnormal passenger behavior is not only flagged but also accompanied by a thorough justification, which strengthens the robustness of the detections.
Abstract
1. Introduction
- A multimodal AI solution that functions as an offline data inspector for the control dashboards of the Colombian metro. Using late information fusion [20], the system integrates posts from X, which represent external influencing factors, with numerical user traffic data. This offline inspection serves as a visibility tool to flag and explain potential anomalies that may have been overlooked and that require further analysis.
- The design of Deep Autoencoders using the embeddings produced by the Self-Supervised Learning (S-SL) TabNet architecture, in tandem with the calculation of SHAP values for unsupervised learning, while also exploring the use of Large Multimodal Models (LMMs) such as LLaVA to provide additional context on the cause of each detected anomaly.
- We close the gap in integrating novel AI approaches such as multimodal learning, S-SL, and LMMs into intelligent metro systems.
2. Related Work
3. Methodology
3.1. User Traffic Data from the Colombian Metro
- 731 samples for training: 30 June 2022–30 June 2024.
- 184 for validation: 1 July 2024–31 December 2024.
- 90 for testing: 1 January 2025–31 March 2025.
- It is a weekday and a holiday.
- It is a weekday and not a holiday.
- It is a weekend and a holiday.
- It is a weekend and not a holiday.
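The chronological split and the four day types listed above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `frame` layout, the `period` helper, and the short `holidays` list are hypothetical, and a real run would use the official Colombian holiday calendar and the published traffic data.

```python
import pandas as pd

# One row per calendar day across the three periods listed above.
frame = pd.DataFrame({"date": pd.date_range("2022-06-30", "2025-03-31", freq="D")})

# Illustrative holiday list (an assumption, not the full official calendar).
holidays = pd.to_datetime(["2025-01-01", "2025-01-06", "2025-03-24"])

# Monday is 0 in pandas, so Saturday and Sunday are 5 and 6.
frame["is_weekend"] = (frame["date"].dt.dayofweek >= 5).astype(int)
frame["is_holiday"] = frame["date"].isin(holidays).astype(int)


def period(df, start, end):
    """Return the rows whose date falls in [start, end], both ends included."""
    return df[df["date"].between(pd.Timestamp(start), pd.Timestamp(end))]


train = period(frame, "2022-06-30", "2024-06-30")
val = period(frame, "2024-07-01", "2024-12-31")
test = period(frame, "2025-01-01", "2025-03-31")

# Count the four day types (weekday/weekend crossed with holiday/non-holiday).
day_types = test.groupby(["is_weekend", "is_holiday"]).size()
print(len(train), len(val), len(test))
print(day_types)
```

A full calendar range gives 732 days for the training period, one more than the 731 samples reported above, so the published data presumably lacks one day; the sketch does not try to guess which one. Splitting by date rather than at random keeps the validation and test periods strictly after the training period.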
3.2. Deep Autoencoders Based on Self-Supervised Learning
Reconstruction Threshold ()
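As a rough illustration of how a reconstruction threshold separates normal from anomalous days, the sketch below flags samples whose reconstruction error exceeds a cutoff. The threshold definition used in this work did not survive in the text here, so the 0.99 quantile of validation errors is purely an assumption, as are the function names, the 22-column layout, and the noisy stand-in that replaces a trained autoencoder.

```python
import numpy as np


def reconstruction_errors(x, x_hat):
    """Mean squared reconstruction error, one value per sample (row)."""
    return np.mean((x - x_hat) ** 2, axis=1)


def fit_threshold(validation_errors, quantile=0.99):
    """Assumed rule: take a high quantile of the validation errors as the cutoff."""
    return float(np.quantile(validation_errors, quantile))


rng = np.random.default_rng(0)

# Stand-in for an autoencoder: reconstructions equal the input plus small noise.
x_val = rng.normal(size=(184, 22))
x_val_hat = x_val + rng.normal(scale=0.05, size=x_val.shape)
x_test = rng.normal(size=(90, 22))
x_test_hat = x_test + rng.normal(scale=0.05, size=x_test.shape)
x_test_hat[10] += 1.0  # simulate one badly reconstructed (corrupted) day

threshold = fit_threshold(reconstruction_errors(x_val, x_val_hat))
test_errors = reconstruction_errors(x_test, x_test_hat)

# Indices of test samples whose error exceeds the threshold.
flagged = np.flatnonzero(test_errors > threshold)
print(threshold, flagged)
```

Fitting the cutoff on validation data only keeps the test period untouched until the final inspection.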
3.3. SHAP Values for Deep Autoencoders
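To make the idea of attributing a reconstruction error to individual features concrete, here is a small sketch that computes exact Shapley values by enumerating all feature coalitions. It is not the SHAP library and not necessarily the estimator used in this work: the toy `reconstruct` function (which replaces every feature with the sample mean), the zero baseline, and all names are assumptions chosen so the example stays self-contained. Exact enumeration is only practical for a handful of features.

```python
import itertools
import math

import numpy as np


def reconstruct(z):
    """Toy stand-in for an autoencoder: every feature becomes the sample mean."""
    return np.full_like(z, z.mean())


def error(z):
    """Value function: total squared reconstruction error of one sample."""
    return float(np.sum((z - reconstruct(z)) ** 2))


def mix(x, baseline, keep):
    """Keep the features in `keep` from x and take the rest from the baseline."""
    return np.array([x[j] if j in keep else baseline[j] for j in range(len(x))])


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values via enumeration of all coalitions."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                keep = set(subset)
                gain = (value_fn(mix(x, baseline, keep | {i}))
                        - value_fn(mix(x, baseline, keep)))
                phi[i] += weight * gain
    return phi


x = np.array([1.0, 1.0, 1.0, 5.0])  # the last feature is the outlier
baseline = np.zeros(4)              # assumed reference point
phi = shapley_values(error, x, baseline)
print(phi, phi.sum(), error(x) - error(baseline))
```

The attributions sum to the difference between the error of the sample and the error of the baseline, and the outlying feature receives the largest share.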
3.4. Large Multimodal Models: LLaVA Framework
4. Experimental Results and Discussion
4.1. Anomalies Detected by the Deep Autoencoders
4.2. SHAP-Based Explainability and Image Understanding Using LLaVA
4.3. Limitations of the Proposed Solution
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Tan, S.; De, D.; Song, W.Z.; Yang, J.; Das, S.K. Survey of Security Advances in Smart Grid: A Data Driven Approach. IEEE Commun. Surv. Tutorials 2017, 19, 397–422.
2. Cui, J.; Chen, Y.; Zhong, H.; He, D.; Wei, L.; Bolodurina, I.; Liu, L. Lightweight Encryption and Authentication for Controller Area Network of Autonomous Vehicles. IEEE Trans. Veh. Technol. 2023, 72, 14756–14770.
3. Wu, F.; Zheng, C.; Zhou, S.; Lu, Y.; Wu, Z.; Zheng, S. An interpretable approach to passenger flow prediction and irregular passenger travel patterns understanding in metro system. Expert Syst. Appl. 2025, 265, 125991.
4. Introducing the Subway Origin-Destination Ridership Dataset. 2024. Available online: https://www.mta.info/article/introducing-subway-origin-destination-ridership-dataset (accessed on 14 July 2025).
5. MTA Subway Hourly Ridership: 2020–2024. 2025. Available online: https://data.ny.gov/Transportation/MTA-Subway-Hourly-Ridership-2020-2024/wujg-7c2s (accessed on 14 July 2025).
6. Safer Subways: Governor Hochul Announces Budget Investments to Protect Subway Riders and Transit Workers. 2025. Available online: https://www.governor.ny.gov/news/safer-subways-governor-hochul-announces-budget-investments-protect-subway-riders-and-transit (accessed on 14 July 2025).
7. Datos Abiertos–Metro de Medellin–Afluencia. Available online: https://datosabiertos-metrodemedellin.opendata.arcgis.com/search?tags=afluencia (accessed on 14 July 2025).
8. Tarifas Metro de Medellín. Available online: https://www.metrodemedellin.gov.co/usuarios (accessed on 20 August 2025).
9. Yun, H.; Lee, E.H. Party politics in transport policy with a large language model. Transp. Policy 2025, 171, 487–496.
10. Datos Abiertos–Metro de Medellin–Tableros de Control. Available online: https://datosabiertos-metrodemedellin.opendata.arcgis.com/ (accessed on 14 July 2025).
11. Yoon, J.; Zhang, Y.; Jordon, J.; van der Schaar, M. VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain. In Proceedings of the Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 11033–11043.
12. Arik, S.Ö.; Pfister, T. TabNet: Attentive Interpretable Tabular Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 6679–6687.
13. Gui, J.; Chen, T.; Zhang, J.; Cao, Q.; Sun, Z.; Luo, H.; Tao, D. A Survey on Self-Supervised Learning: Algorithms, Applications, and Future Trends. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 9052–9071.
14. Mahdavifar, S.; Ghorbani, A.A. Application of deep learning to cybersecurity: A survey. Neurocomputing 2019, 347, 149–176.
15. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
16. Antwarg, L.; Miller, R.M.; Shapira, B.; Rokach, L. Explaining anomalies detected by autoencoders using Shapley Additive Explanations. Expert Syst. Appl. 2021, 186, 115736.
17. Liu, H.; Li, C.; Li, Y.; Lee, Y.J. Improved Baselines with Visual Instruction Tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 26296–26306.
18. Deng, Z.; Ma, W.; Han, Q.L.; Zhou, W.; Zhu, X.; Wen, S.; Xiang, Y. Exploring DeepSeek: A Survey on Advances, Applications, Challenges and Future Directions. IEEE/CAA J. Autom. Sin. 2025, 12, 872–893.
19. Civica. Medio de Pago Para que te Muevas por la Ciudad. Available online: https://civica.metrodemedellin.gov.co/ (accessed on 14 July 2025).
20. Hangloo, S.; Arora, B. Multimodal fusion techniques: Review, data representation, information fusion, and application areas. Neurocomputing 2025, 649, 130827.
21. Wang, H.; Li, L.; Pan, P.; Wang, Y.; Jin, Y. Online detection of abnormal passenger out-flow in urban metro system. Neurocomputing 2019, 359, 327–340.
22. Wang, J.; Zhang, Y.; Wei, Y.; Hu, Y.; Piao, X.; Yin, B. Metro Passenger Flow Prediction via Dynamic Hypergraph Convolution Networks. IEEE Trans. Intell. Transp. Syst. 2021, 22, 7891–7903.
23. Zhang, Q.; Liu, X.; Spurgeon, S.; Yu, D. A two-layer modelling framework for predicting passenger flow on trains: A case study of London underground trains. Transp. Res. Part A Policy Pract. 2021, 151, 119–139.
24. Wei, X.; Zhang, Y.; Zhang, X.; Ge, Q.; Yin, B. Real-time passenger flow anomaly detection in metro system. IET Intell. Transp. Syst. 2023, 17, 2020–2033.
25. Bapaume, T.; Côme, E.; Ameli, M.; Roos, J.; Oukhellou, L. Forecasting passenger flows and headway at train level for a public transport line: Focus on atypical situations. Transp. Res. Part C Emerg. Technol. 2023, 153, 104195.
26. Zhang, Y.; Li, S.; Yuan, Y.; Yang, L. Multi-step look ahead deep reinforcement learning approach for automatic train regulation of urban rail transit lines with energy-saving. Eng. Appl. Artif. Intell. 2025, 145, 110181.
27. Apley, D.W.; Zhu, J. Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2020, 82, 1059–1086.
28. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021.
29. un año de reactivación y aportes del Metro al Valle de Aburrá. Available online: https://www.metrodemedellin.gov.co/al-dia/noticias/2022-reactivacion-y-aportes-del-metro-al-valle-de-aburra (accessed on 15 July 2025).
30. Berahmand, K.; Daneshfar, F.; Salehi, E.S.; Li, Y.; Xu, Y. Autoencoders and their applications in machine learning: A survey. Artif. Intell. Rev. 2024, 57, 28.
31. Almaraz-Rivera, J.G.; Cantoral-Ceballos, J.A.; Botero, J.F. Enhancing IoT Network Security: Unveiling the Power of Self-Supervised Learning against DDoS Attacks. Sensors 2023, 23, 8701.
32. Almaraz-Rivera, J.G.; Cantoral-Ceballos, J.A.; Botero, J.F.; Muñoz, F.J.; Martinez, B.D. Hyphatia: A Card-Not-Present Fraud Detection System Based on Self-Supervised Tabular Learning. IEEE Open J. Comput. Soc. 2025, 6, 812–821.
33. Martins, A.; Astudillo, R. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; Balcan, M.F., Weinberger, K.Q., Eds.; Proceedings of Machine Learning Research. Curran Associates, Inc.: Red Hook, NY, USA, 2016; Volume 48, pp. 1614–1623.
34. Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language Modeling with Gated Convolutional Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Precup, D., Teh, Y.W., Eds.; Proceedings of Machine Learning Research. Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 70, pp. 933–941.
35. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Advances in Neural Information Processing Systems; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32.
36. Zhang, D.; Yu, Y.; Dong, J.; Li, C.; Su, D.; Chu, C.; Yu, D. MM-LLMs: Recent Advances in MultiModal Large Language Models. arXiv 2024.
37. Huang, D.; Yan, C.; Li, Q.; Peng, X. From Large Language Models to Large Multimodal Models: A Literature Review. Appl. Sci. 2024, 14, 5068.
38. Liu, H.; Li, C.; Wu, Q.; Lee, Y.J. Visual Instruction Tuning. In Proceedings of the Advances in Neural Information Processing Systems; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2023; Volume 36, pp. 34892–34916.
39. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Meila, M., Zhang, T., Eds.; Proceedings of Machine Learning Research. Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 139, pp. 8748–8763.
40. Chiang, W.L.; Li, Z.; Lin, Z.; Sheng, Y.; Wu, Z.; Zhang, H.; Zheng, L.; Zhuang, S.; Zhuang, Y.; Gonzalez, J.E.; et al. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. 2023. Available online: https://lmsys.org/blog/2023-03-30-vicuna/ (accessed on 25 August 2025).
41. Lin, D.; Broere, W.; Cui, J. Metro systems and urban development: Impacts and implications. Tunn. Undergr. Space Technol. 2022, 125, 104509.
42. Stiller, D.; Wurm, M.; Sapena, M.; Nieland, S.; Dech, S.; Taubenböck, H. Does formal public transport serve the city well? The importance of semiformal transport for the accessibility in Medellín, Colombia. PLoS ONE 2025, 20, e0321691.
43. Lee, E.H. eXplainable DEA approach for evaluating performance of public transport origin-destination pairs. Res. Transp. Econ. 2024, 108, 101491.
44. How to Use Advanced Search. Available online: https://help.x.com/en/using-x/x-advanced-search (accessed on 28 July 2025).
45. Chakraborty, N.; Ornik, M.; Driggs-Campbell, K. Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art. ACM Comput. Surv. 2025, 57, 1–35.

| Work | Self-Supervised Learning | Explainable AI | Large Multimodal Models | External Data Sources |
|---|---|---|---|---|
| Wang et al., 2019 [21] | ✗ | ✗ | ✗ | ✔ |
| Wang et al., 2021 [22] | ✗ | ✗ | ✗ | ✗ |
| Zhang et al., 2021 [23] | ✗ | ✔ | ✗ | ✔ |
| Wei et al., 2023 [24] | ✗ | ✗ | ✗ | ✔ |
| Bapaume et al., 2023 [25] | ✔ | ✔ | ✗ | ✔ |
| Wu et al., 2025 [3] | ✗ | ✔ | ✗ | ✔ |
| Zhang et al., 2025 [26] | ✗ | ✗ | ✗ | ✔ |
| This work | ✔ | ✔ | ✔ | ✔ |

| Feature | Description |
|---|---|
| Date information | |
| Is weekend | Boolean numerical value to indicate if the day is a weekday (0) or weekend (1). |
| Is holiday | Boolean numerical value to indicate if the day is a holiday (1) in the country or not (0). |
| Time information | |
| 04:00:00, 05:00:00,…, 23:00:00 | Granularity at the hour level. The normal operation time of the metro spans from 4 am to 11 pm. |
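The feature layout in the table above (two date flags plus one column per operating hour) can be sketched as follows. The `build_row` helper, the example counts, the single-entry holiday set, and the choice to fill missing hours with 0 are all assumptions made for illustration.

```python
from datetime import date

# One column per operating hour, from 04:00:00 to 23:00:00.
HOUR_COLUMNS = [f"{hour:02d}:00:00" for hour in range(4, 24)]


def build_row(day, hourly_counts, holidays):
    """Assemble one daily sample: two date flags followed by the hourly columns."""
    row = {
        "is_weekend": int(day.weekday() >= 5),  # Saturday is 5, Sunday is 6
        "is_holiday": int(day in holidays),
    }
    for column in HOUR_COLUMNS:
        # Assumption: hours without a recorded count are stored as 0.
        row[column] = hourly_counts.get(column, 0)
    return row


holidays = {date(2025, 1, 1)}                  # illustrative holiday set
counts = {"04:00:00": 120, "05:00:00": 950}    # hypothetical passenger counts
example = build_row(date(2025, 1, 1), counts, holidays)
print(len(example), example["is_weekend"], example["is_holiday"])
```

With this layout each day becomes a single row of 22 values, which is the shape a tabular model would consume.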

| Variable | Selected Value |
|---|---|
| Batch size | 32 |
| Virtual batch size | 32 |
| Maximum learning rate | 5 × |
| Minimum learning rate | 1 × |
| Number of blocks (steps) | 3 |
| Number of shared layers | 2 |
| Number of independent layers | 2 |
| Momentum | 0.3 |
| Embedding dimension | 8 |
| Feature re-usage () | 2 |
| Dimension of the attention layer | 8 |

| Line A | Line B |
|---|---|
| 2 January 2025 | 1 January 2025 |
| 3 January 2025 | 2 January 2025 |
| 7 January 2025 | 3 January 2025 |
| 9 January 2025 | 7 January 2025 |
| 5 March 2025 | 10 January 2025 |
| 24 March 2025 | 24 March 2025 |
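Assuming the table above lists the dates flagged for each metro line (its caption did not survive here), a quick set comparison shows which dates appear for both lines and which are specific to one. The variable names are illustrative.

```python
from datetime import date

# Dates copied from the Line A and Line B columns of the table above.
line_a = {date(2025, 1, 2), date(2025, 1, 3), date(2025, 1, 7),
          date(2025, 1, 9), date(2025, 3, 5), date(2025, 3, 24)}
line_b = {date(2025, 1, 1), date(2025, 1, 2), date(2025, 1, 3),
          date(2025, 1, 7), date(2025, 1, 10), date(2025, 3, 24)}

both = sorted(line_a & line_b)    # dates listed for both lines
only_a = sorted(line_a - line_b)  # dates listed only for Line A
only_b = sorted(line_b - line_a)  # dates listed only for Line B

for label, days in (("both", both), ("only A", only_a), ("only B", only_b)):
    print(label, [d.isoformat() for d in days])
```

Dates shared by both lines are natural candidates for a system-wide external cause, while line-specific dates may point to local events.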

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Almaraz-Rivera, J.G.; Cantoral-Ceballos, J.A.; Botero, J.F.; Muñoz, F.J.; Martinez, B.D. A Multimodal Learning Approach for Protecting the Metro System of Medellin Colombia Against Corrupted User Traffic Data. Smart Cities 2025, 8, 198. https://doi.org/10.3390/smartcities8060198

