LTVPGA: Distilled Graph Attention for Lightweight Traffic Violation Prediction
Abstract
1. Introduction
2. Research Area and Materials
2.1. Research Area
2.2. Data Sources and Preprocessing
- (1) Road Network Graph
- (1) Preprocessing Pipeline:
- Segmentation: Following traffic safety guidelines, we partitioned the network into 500 m long fundamental units, converting 8550 raw segments into 12,580 spatial units.
- Graph Construction: These units form nodes, with edges connecting spatially adjacent units.
- Adjacency Representation: The graph was encoded as a symmetric adjacency matrix of order N × N (N = 12,580), where each entry indicates whether two units are connected (Figure 2 provides an example of an 11-node subgraph).
- (2) Feature Engineering:
- Road Classification: This serves as a composite proxy for lane count, speed limits, and traffic volume.
- Intersection Presence: This identifies elevated violation risks from conflict points and operational complexity.
- These topological features directly modulate the graph attention mechanisms in our model.
- (2) Traffic Violation Dataset
- (3) Historical Meteorological Dataset
- (4) Historical Calendar Dataset
- (5) Point-of-Interest Dataset
3. Methods
3.1. Knowledge Distillation
3.2. LTVPGA Model
3.2.1. Overview of LTVPGA
3.2.2. LTVPGA Algorithm
- Soft targets: The teacher’s probabilistic distributions.
- Hard targets: Ground-truth violation labels.
Algorithm 1: LTVPGA Training
4. Experiments and Results
4.1. Parameter Configuration
- (1) Train–test partitioning: The initial 1747 days (≈4.8 years) are allocated for training, and the subsequent 70 days (10 weeks) are reserved for testing, maintaining temporal continuity.
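The partition above amounts to a simple chronological split. A minimal sketch (the total of 1817 days follows from 1747 + 70 in the text; the data series itself is a stand-in):

```python
def temporal_split(series, train_days=1747, test_days=70):
    """Chronological split: no shuffling, so temporal continuity is preserved."""
    assert len(series) >= train_days + test_days
    return series[:train_days], series[train_days:train_days + test_days]

days = list(range(1817))  # stand-in for ~4.8 years + 10 weeks of daily observations
train, test = temporal_split(days)
```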
- (2) Training parameters: More than 95% of the loss reduction occurred during the first 25 epochs, followed by marginal gains until epoch 40 (Figure 6); training beyond 40 epochs yields no significant improvement, so 50 epochs are adopted, safely past the point where the validation loss plateaus. The temperature hyperparameter T softens the teacher's output probability distribution, primarily to preserve critical "dark knowledge". Because the teacher and student models are trained on data from the same source, their predictive distributions are highly similar. We therefore set T following the empirical guidance of [36,38], which benefits from "dark knowledge" while suppressing noise from over-smoothing. For deployment, the temperature is reset so that the model outputs deterministic predictions.
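The effect of the temperature T can be illustrated with a small softmax sketch (the logits here are invented for illustration; the paper's exact T setting follows [36,38]):

```python
import numpy as np

def softmax_with_temperature(logits, T):
    """Temperature-scaled softmax: higher T flattens the distribution."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [4.0, 1.0, 0.5]
p_hard = softmax_with_temperature(logits, T=1.0)  # deployment: near-deterministic
p_soft = softmax_with_temperature(logits, T=4.0)  # distillation: softened targets
```

At higher T, the minor classes receive non-trivial probability mass, which is exactly the "dark knowledge" about class similarities the student learns from.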
- (3) Distillation loss: Dual supervision is integrated:
- Soft loss: Kullback–Leibler divergence transfers the teacher's inductive biases (e.g., feature correlations, uncertainty awareness). The soft-loss weight is set to prioritize learning from the teacher's soft targets, which transfer critical "dark knowledge" about class similarities [36] and help avoid the local optima caused by the student's limited capacity.
- Hard loss: Cross-entropy anchors predictions to ground-truth labels and mitigates the risk of inheriting the teacher's overfitting. The hard-loss weight is set to impose minimally sufficient boundary constraints while maximizing dark-knowledge transfer efficiency; this soft-to-hard ratio is a de facto standard in knowledge distillation whose efficacy has been consistently demonstrated in lightweight models [36,38].
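A minimal sketch of this dual-supervision objective, assuming illustrative weights `alpha` and `beta` and toy probability vectors (the paper's exact soft/hard ratio follows [36,38] and is not reproduced here):

```python
import numpy as np

def kd_loss(student_probs, teacher_probs, true_label, alpha=0.7, beta=0.3, eps=1e-12):
    """Distillation loss = alpha * KL(teacher || student) + beta * cross-entropy(label)."""
    s = np.asarray(student_probs, dtype=float) + eps
    t = np.asarray(teacher_probs, dtype=float) + eps
    soft = float(np.sum(t * np.log(t / s)))  # soft targets: KL to the teacher
    hard = float(-np.log(s[true_label]))     # hard targets: cross-entropy to ground truth
    return alpha * soft + beta * hard

loss = kd_loss([0.6, 0.3, 0.1], [0.7, 0.2, 0.1], true_label=0)
```

A student whose distribution matches the teacher's incurs only the hard-label term, so the soft term acts purely as a regularizer pulling the student toward the teacher's inductive biases.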
- (4) Optimization specification: The Adam optimizer is employed with a fixed learning rate (lr = 0.001) and a batch size of 1 to prevent temporal information leakage while preserving topological dependency integrity.
4.2. Evaluation Metrics
- Memory consumption during training (GB);
- Average training time per epoch (s);
- Total training duration (min).
4.3. Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Wang, F.; Lin, Y.; Ioannou, P.A.; Vlacic, L.; Liu, X.; Eskandarian, A.; Lv, Y.; Na, X.; Cebon, D.; Ma, J.; et al. Transportation 5.0: The DAO to Safe, Secure, and Sustainable Intelligent Transportation Systems. IEEE Trans. Intell. Transp. Syst. 2023, 24, 10262–10278. [Google Scholar] [CrossRef]
- Zhang, G.; Yau, K.K.; Chen, G. Risk factors associated with traffic violations and accident severity in China. Accid. Anal. Prev. 2013, 59, 18–25. [Google Scholar] [CrossRef]
- Anderson, T.K. Kernel density estimation and K-means clustering to profile road accident hotspots. Accid. Anal. Prev. 2009, 41, 359–364. [Google Scholar] [CrossRef]
- Caliendo, C.; Guida, M.; Parisi, A. A crash-prediction model for multilane roads. Accid. Anal. Prev. 2007, 39, 657–670. [Google Scholar] [CrossRef]
- Lin, L.; Wang, Q.; Sadek, A.W. A novel variable selection method based on frequent pattern tree for real-time traffic accident risk prediction. Transp. Res. Part C Emerg. Technol. 2015, 55, 444–459. [Google Scholar] [CrossRef]
- Chen, Q.; Song, X.; Fan, Z.; Xia, T.; Yamada, H.; Shibasaki, R. A context-aware nonnegative matrix factorization framework for traffic accident risk estimation via heterogeneous data. In Proceedings of the 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), Miami, FL, USA, 10–12 April 2018. [Google Scholar]
- Yu, R.; Liu, X.Q. Study on Traffic Accidents Prediction Model Based on RBF Neural Network. In Proceedings of the International Conference on Information Engineering & Computer Science, Xi’an, China, 22 October 2010. [Google Scholar]
- Zeng, K.H.; Chou, S.H.; Chan, F.H.; Niebles, J.C.; Sun, M. Agent-centric risk assessment: Accident anticipation and risky region localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Xiao, T.; Lu, H.; Wang, J.; Wang, K. Predicting and interpreting spatial accidents through MDLSTM. Int. J. Environ. Res. Public Health 2021, 18, 1430. [Google Scholar] [CrossRef]
- Najjar, A.; Kaneko, S.I.; Miyanaga, Y. Combining satellite imagery and open data to map road safety. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017. [Google Scholar]
- Li, P.; Abdel-Aty, M.; Yuan, J. Real-time crash risk prediction on arterials based on LSTM-CNN. Accid. Anal. Prev. 2019, 135, 105371. [Google Scholar] [CrossRef]
- Li, W.S.; Zou, T.T.; Wang, H.Y.; Huang, H. Traffic accident quantity prediction model based on dual-scale long short-term memory network. J. Zhejiang Univ. (Eng. Sci.) 2020, 54, 1613–1619. [Google Scholar]
- Yuan, Z.; Zhou, X.; Yang, T. Hetero-convlstm: A deep learning approach to traffic accident prediction on heterogeneous spatio-temporal data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA, 19–23 August 2018. [Google Scholar]
- Liu, Y.; Zheng, H.; Feng, X.; Chen, Z. Short-term traffic flow prediction with Conv-LSTM. In Proceedings of the 2017 9th International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 11–13 October 2017. [Google Scholar]
- Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 13–19 July 2018. [Google Scholar]
- Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Li, M.; Zhu, Z. Spatial-Temporal Fusion Graph Neural Networks for Traffic Flow Forecasting. AAAI Tech. Track Data Min. Knowl. Manag. 2021, 35, 4189–4196. [Google Scholar] [CrossRef]
- Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans. Intell. Transp. Syst. 2020, 21, 3848–3858. [Google Scholar] [CrossRef]
- Sheng, Z.; Xu, Y.; Xue, S.; Li, D. Graph-based spatial-temporal convolutional network for vehicle trajectory prediction in autonomous driving. IEEE Trans. Intell. Transp. Syst. 2023, 23, 17654–17665. [Google Scholar] [CrossRef]
- Chen, B.; Ma, Y.; Wang, J.; Jia, T.; Liu, X.; Lam, W.H.K. Graph convolutional networks with learnable spatial weightings for traffic forecasting applications. Transp. A Transp. Sci. 2025, 21, 2239377. [Google Scholar] [CrossRef]
- Wu, M.; Jia, H.; Luo, D.; Luo, H.; Zhao, F.; Li, G. A multi-attention dynamic graph convolution network with cost-sensitive learning approach to road-level and minute-level traffic accident prediction. IET Intell. Transp. Syst. 2023, 17, 270–284. [Google Scholar] [CrossRef]
- Liu, Z.; Chen, Y.; Xia, F.; Bian, J.; Zhu, B.; Shen, G.; Kong, X. TAP: Traffic Accident Profiling via Multi-Task Spatio-Temporal Graph Representation Learning. ACM Trans. Knowl. Discov. Data 2023, 17, 1–25. [Google Scholar] [CrossRef]
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. In Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Bai, J.; Zhu, J.; Song, Y.; Zhao, L.; Hou, Z.; Du, R.; Li, H. A3T-GCN: Attention Temporal Graph Convolutional Network for Traffic Forecasting. ISPRS Int. J. Geo-Inf. 2021, 10, 485. [Google Scholar] [CrossRef]
- Zhou, Y.; Wang, Y.; Zhang, F.; Zhou, H.; Sun, K.; Yu, Y. GATR: A Road Network Traffic Violation Prediction Method Based on Graph Attention Network. Int. J. Environ. Res. Public Health 2023, 20, 3432. [Google Scholar] [CrossRef]
- Qin, J.; Wang, Q.; Tao, T. Structural reinforcement-based graph convolutional networks. Connect. Sci. 2022, 34, 2807–2821. [Google Scholar] [CrossRef]
- You, J.; Ying, Z.; Leskovec, J. Optimizing memory efficiency for graph convolutional networks on edge computing platforms. In Proceedings of the 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Seoul, Republic of Korea, 27 February–3 March 2021. [Google Scholar]
- Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
- Wang, J.; Gao, M.; Zhai, W.; Rida, I.; Zhu, X.; Li, Q. Knowledge Generation and Distillation for Road Segmentation in Intelligent Transportation Systems. IEEE Trans. Intell. Transp. Syst. 2025, 6, 1–13. [Google Scholar] [CrossRef]
- Yi, P.; Li, Z.X.; Chen, L.; Yang, C.; Cai, X. Prune and Distill: A Novel Knowledge Distillation Method for GCNs-Based Recommender Systems. IEEE Access 2025, 13, 92365–92375. [Google Scholar] [CrossRef]
- Joshi, C.; Liu, F.; Xun, X.; Lin, J.; Sheng, C. On Representation Knowledge Distillation for Graph Neural Networks. arXiv 2021, arXiv:2111.04964. [Google Scholar] [CrossRef]
- Zhang, S.; Liu, Y.; Sun, Y.; Shah, N. Graph-less neural networks: Teaching old mlps new tricks via distillation. In Proceedings of the Tenth International Conference on Learning Representations, Virtual Event, 25–29 April 2022. [Google Scholar]
- Hu, Y.; You, H.; Wang, Z.; Zhou, E.; Gao, Y. Graph-MLP: Node classification without message passing in graph. arXiv 2021, arXiv:2106.04051. [Google Scholar] [CrossRef]
- Izadi, M.; Safayani, M.; Mirzaei, A. Knowledge Distillation on Spatial-Temporal Graph Convolutional Network for Traffic Prediction. arXiv 2024, arXiv:2401.11798. [Google Scholar] [CrossRef]
- Nardini, F.M.; Rulli, C.; Trani, S.; Venturini, R. Distilled Neural Networks for Efficient Learning to Rank. IEEE Trans. Knowl. Data Eng. 2023, 35, 4695–4712. [Google Scholar]
- Gou, J.; Yu, B.; Maybank, S.; Tao, D. Knowledge Distillation: A Survey. Int. J. Comput. Vis. 2021, 129, 1789–1819. [Google Scholar] [CrossRef]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar] [CrossRef]
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. arXiv 2021, arXiv:2012.12877. [Google Scholar] [CrossRef]
- Bodnar, C.; Bruinsma, W.P.; Lucic, A.; Stanley, M.; Allen, A.; Brandstetter, J.; Garvan, P.; Riechert, M.; Weyn, J.A.; Dong, H.; et al. A foundation model for the Earth system. Nature 2025, 641, 1180–1187. [Google Scholar] [CrossRef]
| Metric | GATR | LTVPGA |
|---|---|---|
| Memory Consumption (GB) | 6.9414 | 1.0469 |
| Training Epochs | 200 | 50 |
| Avg. Time per Epoch (s) | 433.19 | 10.09 |
| Total Training Duration (min) | 1443.98 | 8.4076 |
| Model | RMSE | MAE | MAPE |
|---|---|---|---|
| Conv-LSTM | 1.9180 | 0.1597 | 0.37% |
| GATR | 1.7078 | 0.1388 | 0.31% |
| LTVPGA | 1.7653 | 0.1492 | 0.32% |
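The three error metrics reported above can be computed as follows (the values here are toy inputs; the table's numbers come from the paper's test set):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    d = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    d = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(d)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, skipping zero-valued targets (e.g., violation-free units)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0
    return float(np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask]))) * 100.0

y_true = [3.0, 5.0, 2.0, 4.0]
y_pred = [2.5, 5.5, 2.0, 3.0]
```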
© 2025 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Y.; Zhou, Y.; Zhang, F. LTVPGA: Distilled Graph Attention for Lightweight Traffic Violation Prediction. ISPRS Int. J. Geo-Inf. 2025, 14, 332. https://doi.org/10.3390/ijgi14090332