Integrating Temporal Event Prediction and Large Language Models for Automatic Commentary Generation in Video Games
Abstract
1. Introduction
- A hybrid architecture for automated soccer commentary generation: A hybrid framework designed specifically for soccer games combines time-series event prediction with natural language generation. This framework enables automatic, event-driven commentary generation and reduces reliance on manual annotations and scripted templates.
- Real-time commentary generation based on three-second-ahead event prediction: By using a temporal prediction model to anticipate events three seconds in advance, the system generates commentary in real time with strong contextual consistency and responsiveness. Unlike template-based methods, which generate commentary retrospectively, this approach reduces commentary latency, increases the diversity and fluency of the output, and alleviates the repetitiveness typical of template-based generation.
2. Related Works
2.1. Research on Game Commentary Generation
2.2. Challenges in Soccer Commentary Generation
2.3. Time-Series Models for Event Prediction
- It uses multiscale convolutional kernels to simultaneously capture both short- and long-range temporal features.
- The CNN-based architecture is lightweight and fast, making it suitable for real-time applications.
- It effectively captures the flow of event transitions across time, which is particularly useful for modeling complex gameplay sequences.
2.4. Importance of Balanced Datasets
2.5. Summary
3. Proposed Method
3.1. Overview
3.2. Game Data Collection and Preprocessing
3.3. OS-CNN-Based Prediction of Future Events
- Convolution operations using kernels of various sizes to extract features at multiple temporal scales.
- Concatenation of outputs from all kernel branches to merge the multi-scale features.
- Batch normalization to stabilize learning and to improve convergence.
- ReLU activation to introduce non-linearity.
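The four operations listed above can be illustrated with a minimal NumPy forward pass through one multi-scale block. This is a sketch under stated assumptions, not the authors' implementation: it assumes kernel sizes {1, 2, 3} (the OS-CNN set of 1, 2, and primes up to Max_kernel_size = 3), an arbitrary eight filters per branch, random weights, and batch normalization without learnable scale and shift. Dropout and the other layers are omitted. The input uses the 123 × 3 (features × time) window from Appendix C.

```python
import numpy as np

def conv1d_same(x, w):
    """x: (c_in, T); w: (c_out, c_in, k). Cross-correlation with zero padding so
    the output also has length T (even k gets one extra padding step on the right)."""
    c_out, c_in, k = w.shape
    left = (k - 1) // 2
    xp = np.pad(x, ((0, 0), (left, k - 1 - left)))
    return np.stack([np.tensordot(w, xp[:, t:t + k], axes=([1, 2], [0, 1]))
                     for t in range(x.shape[1])], axis=1)

def os_block(x, branch_weights, eps=1e-5):
    """x: (batch, c_in, T). One convolution branch per kernel size; branch outputs
    are concatenated along channels, batch-normalized per channel, then ReLU'd."""
    y = np.concatenate(
        [np.stack([conv1d_same(s, w) for s in x]) for w in branch_weights], axis=1)
    mean = y.mean(axis=(0, 2), keepdims=True)   # per-channel batch statistics
    var = y.var(axis=(0, 2), keepdims=True)
    return np.maximum((y - mean) / np.sqrt(var + eps), 0.0)

rng = np.random.default_rng(0)
kernel_sizes = [1, 2, 3]   # assumption: {1, 2} plus primes up to 3
c_in, filters = 123, 8     # 123 features per frame; 8 filters is an arbitrary choice
weights = [rng.standard_normal((filters, c_in, k)) * 0.1 for k in kernel_sizes]
x = rng.standard_normal((4, c_in, 3))  # a batch of 4 windows, 3 time steps each
out = os_block(x, weights)
print(out.shape)  # (4, 24, 3)
```

Each branch sees a different temporal receptive field, and concatenation lets the next layer mix short- and longer-range features; the real model stacks three such blocks before pooling and the classifier.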
3.4. Event-Prediction-Based Commentary Generation Using LLMs
- Event Triggering: The process begins with a predicted ball event. The decision module checks whether the predicted class label denotes a valid event, i.e., a goal kick, free kick, corner kick, or throw-in (class labels 2, 3, 4, and 5, respectively). If so, the pipeline proceeds to the prompt construction stage; if the label is 0 (no event), the system skips commentary generation to conserve computational resources.
- Prompt Construction: When an event is triggered, the game information module supplies the essential state of the current time step: the time step itself, the team in possession of the ball, and the player in possession. These details, combined with the event type from the prediction output, are used to fill a predefined prompt template containing the following structured placeholders:
- {time_event}: Current time step of the event;
- {event_type}: Type of ball event (e.g., goal kick);
- {team}: Ball-owned team;
- {ball owned player}: Player currently in possession of the ball;
- {benefiting_team}: Team benefiting from the event.
- This template is designed to keep the generated commentary informative, grounded in the current game situation, and expressive.
- LLM Processing: The constructed prompt is fed into the LLM module (LLaMA 3.3), which generates fluent, natural, and contextually relevant commentary reflecting the predicted ball event and the gameplay context. The generative capabilities of the LLM allow diverse narrative expressions, reducing the repetitiveness of purely template-based systems.
- Final Commentary Delivery: The generated commentary is delivered through the real-time commentary output module, synchronized to maintain a three-second lead relative to the predicted event occurrence. This gives viewers timely, natural narration aligned with live gameplay, enhancing immersion and maintaining a smooth narrative flow.
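The event-triggering and prompt-construction steps can be sketched in Python as follows. This is a minimal illustration, assuming the non-goal template from Appendix A (whose placeholders {timestep}, {player}, {team}, {event_type}, {benefiting_team}, and {chosen_emotion} differ in naming from the list above) and a hypothetical helper named build_prompt. The "a"/"an" article before the emotion word is chosen automatically so that the output matches the substituted prompts in Appendix A (e.g., "an excited tone"). The LLaMA 3.3 call and the output module are not shown.

```python
from typing import Optional

# Predicted class label -> event name (labels as used in the dataset tables).
EVENT_NAMES = {2: "goal kick", 3: "free kick", 4: "corner kick", 5: "throw-in"}

# Non-goal template from Appendix A, with an {article} slot added for "a"/"an".
NON_GOAL_TEMPLATE = (
    "At minute {timestep}, {player} from {team} was involved in a sequence "
    "that led to a {event_type} awarded to {benefiting_team}. "
    "Write a vivid and expressive commentary using {article} {chosen_emotion} tone. "
    "Describe how this shift impacts the match’s momentum or atmosphere, "
    "without explicitly mentioning any scoring or direct error. "
    "Keep the output under 75 words."
)

def build_prompt(label: int, timestep: int, player: str, team: str,
                 benefiting_team: str, emotion: str) -> Optional[str]:
    """Return the filled prompt, or None if commentary generation is skipped."""
    event_type = EVENT_NAMES.get(label)
    if event_type is None:
        # Label 0 (no event) or an unexpected label: bypass the LLM entirely.
        return None
    # Choose the article so the prompt reads naturally ("an excited tone").
    article = "an" if emotion and emotion[0].lower() in "aeiou" else "a"
    return NON_GOAL_TEMPLATE.format(
        timestep=timestep, player=player, team=team, event_type=event_type,
        benefiting_team=benefiting_team, article=article, chosen_emotion=emotion)

prompt = build_prompt(2, 25, "No. 10", "Real Madrid", "Manchester United", "excited")
print(prompt)
```

Returning None for label 0 mirrors the bypass described in the Event Triggering step: no prompt is built, so no LLM inference is spent on frames without a predicted event.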
4. Experimental Setup and Results
4.1. Implementation Details
4.2. Dataset Description
4.2.1. Data Collection for Event Prediction
- Balanced Dataset: Down-sampled from 1247 matches to ensure equal class distribution (837 samples per class).
- Imbalanced Dataset: Constructed by randomly sampling 100 matches, yielding a naturally imbalanced dataset that reflects real-world game scenarios. During training, SMOTE was applied to balance the classes so that the model could learn the characteristics of each event type.
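The SMOTE step mentioned above can be sketched in NumPy as follows. This is a simplified illustration rather than the configuration used in the study: it assumes Euclidean nearest neighbours within each class, k = 5 by default, and oversampling of every minority class up to the majority-class count. In practice, the imbalanced-learn library provides a ready-made SMOTE class.

```python
import numpy as np

def smote(X, y, k=5, seed=0):
    """Oversample each minority class to the majority-class count by interpolating
    between a sample and one of its k nearest neighbours from the same class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        if n == target or n < 2:  # majority class, or too few samples to interpolate
            continue
        Xc = X[y == c]
        # Pairwise squared Euclidean distances within the class; exclude self-matches.
        d = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d, np.inf)
        kk = min(k, n - 1)
        neighbours = np.argsort(d, axis=1)[:, :kk]          # (n, kk) neighbour indices
        n_new = target - n
        base = rng.integers(0, n, size=n_new)               # seed samples
        pick = neighbours[base, rng.integers(0, kk, size=n_new)]
        lam = rng.random((n_new, 1))                        # interpolation factors in [0, 1)
        X_parts.append(Xc[base] + lam * (Xc[pick] - Xc[base]))
        y_parts.append(np.full(n_new, c, dtype=y.dtype))
    return np.concatenate(X_parts), np.concatenate(y_parts)
```

Because every synthetic point lies on a segment between two real samples of the same class, SMOTE adds variety without copying samples verbatim; it should be applied to the training split only, so that the test set keeps its natural class distribution.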
4.2.2. Feature Selection and Preprocessing
4.3. Evaluation Metrics
4.3.1. Event Prediction Metrics
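- Recall (1): $\mathrm{Recall} = \dfrac{TP}{TP + FN}$
- Precision (2): $\mathrm{Precision} = \dfrac{TP}{TP + FP}$
- F1-Score (3): $\mathrm{F1} = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
Here, TP, FP, and FN denote the numbers of true positives, false positives, and false negatives for a given event class.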
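Equations (1)–(3) can be computed per class directly from a confusion matrix. The sketch below is a minimal NumPy version, assuming rows index the true class and columns the predicted class; the function name is illustrative.

```python
import numpy as np

def per_class_prf(cm):
    """cm[i, j] = number of samples of true class i predicted as class j.
    Returns per-class precision, recall, and F1 (0.0 where undefined)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp  # belonging to the class but predicted elsewhere
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    return precision, recall, f1

p, r, f = per_class_prf([[8, 2], [1, 9]])
```

The FP term explains why precision is so sensitive to class imbalance. As a hypothetical illustration, if 1% of 291,255 no-event windows were misclassified as goal kicks, about 2,913 false positives would compete with at most 133 true goal kicks, capping goal-kick precision near 133 / (133 + 2,913) ≈ 4.4%, however good the recall.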
4.3.2. Commentary Evaluation Metrics
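Among the abbreviations is TTR (type/token ratio), a standard lexical-diversity measure: the number of distinct word types divided by the total number of word tokens. A minimal sketch follows, assuming lowercased word tokens extracted with a simple regular expression; the tokenization actually used for evaluation is not specified here.

```python
import re

def type_token_ratio(text: str) -> float:
    """Distinct word types divided by total word tokens (0.0 for empty text)."""
    tokens = re.findall(r"\w+", text.lower())  # assumption: lowercase \w+ tokens
    return len(set(tokens)) / len(tokens) if tokens else 0.0

print(type_token_ratio("Tension boils over! Tension rises."))  # 0.8
```

TTR tends to fall as texts grow longer, because common words recur, so comparisons between systems are most meaningful for outputs of similar length.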
4.4. Event Prediction Results
4.4.1. Confusion Matrix Analysis: Balanced Dataset
4.4.2. Confusion Matrix Analysis: Imbalanced Dataset
4.4.3. Loss Analysis for Balanced and Imbalanced Dataset
4.4.4. Results Summary
4.5. Commentary Generation with Prompt Engineering
Improved Prompt Design
4.6. Commentary Evaluation Results
4.7. Latency Analysis for Real-Time Commentary Generation
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
LLM | Large Language Models |
SVM | Support Vector Machine |
ANN | Artificial Neural Network |
OS-CNN | Omni-Scale Convolutional Neural Network |
Seq2Seq | Sequence-to-Sequence |
LSTM | Long Short-Term Memory |
LLaMA 3.3 | Large Language Model Meta AI 3.3 |
GRF | Google Research Football |
GPT-3.5 | Generative Pre-trained Transformer 3.5 |
ChatGPT | Chat Generative Pre-trained Transformer |
API | Application Programming Interface |
CNN | Convolutional Neural Network |
BART | Bidirectional and Auto-Regressive Transformer |
EfficientDet | Efficient Object Detection |
HUD | Heads-Up Display |
FPS | First-Person Shooter |
LLaVA v1.5 | Large Language and Vision Assistant version 1.5 |
TCN | Temporal Convolutional Network |
RNN | Recurrent Neural Network |
GRU | Gated Recurrent Unit |
XGBoost | eXtreme Gradient Boosting |
MCC | Matthews Correlation Coefficient |
AUC | Area Under the Curve |
VLM | Vision Language Model |
TTR | Type/Token Ratio |
SMOTE | Synthetic Minority Over-sampling Technique |
Appendix A
Match Introduction |
---|
You are a football commentator covering a match at the Santiago Bernabeu between the home side, Real Madrid, and the visiting team, Manchester United. I will provide you with match information, and you will generate concise and vivid commentary. Use a maximum of two sentences per turn. Only mention players, teams, or events I have specified, and do not invent any new details. Avoid repetition, and do not refer to the commentary itself. This is the beginning of the match. Write a vivid opening line to set the scene, clearly stating that Real Madrid is the home team and Manchester United is the away team. |
Goal Event |
---|
At minute {timestep}, {player} from {team} scored a goal in the match against {benefiting_team}. Write a vivid and expressive commentary using a {chosen_emotion} tone. Clearly describe how this goal shifts the momentum or atmosphere of the match without exaggeration or subjective judgment. Mention the updated score: {final_score}. Keep the output under 75 words. |
Non-Goal Event |
---|
At minute {timestep}, {player} from {team} was involved in a sequence that led to a {event_type} awarded to {benefiting_team}. Write a vivid and expressive commentary using a {chosen_emotion} tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
Timestep | Team | Player | Event | Score | Emotion | Substituted Prompt |
---|---|---|---|---|---|---|
25 | Real Madrid | 10 | goal kick | 1-0 | excited | At minute 25, No. 10 from Real Madrid was involved in a sequence that led to a goal kick awarded to Manchester United. Write a vivid and expressive commentary using an excited tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
12 | Real Madrid | 5 | goal kick | 1-1 | calm | At minute 12, No. 5 from Real Madrid was involved in a sequence that led to a goal kick awarded to Manchester United. Write a vivid and expressive commentary using a calm tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
19 | Real Madrid | 8 | goal kick | 0-1 | nervous | At minute 19, No. 8 from Real Madrid was involved in a sequence that led to a goal kick awarded to Manchester United. Write a vivid and expressive commentary using a nervous tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
26 | Real Madrid | 4 | free kick | 1-0 | excited | At minute 26, No. 4 from Real Madrid was involved in a sequence that led to a free kick awarded to Manchester United. Write a vivid and expressive commentary using an excited tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
33 | Real Madrid | 5 | free kick | 1-1 | calm | At minute 33, No. 5 from Real Madrid was involved in a sequence that led to a free kick awarded to Manchester United. Write a vivid and expressive commentary using a calm tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
40 | Real Madrid | 7 | free kick | 0-1 | nervous | At minute 40, No. 7 from Real Madrid was involved in a sequence that led to a free kick awarded to Manchester United. Write a vivid and expressive commentary using a nervous tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
47 | Real Madrid | 4 | corner kick | 1-0 | excited | At minute 47, No. 4 from Real Madrid was involved in a sequence that led to a corner kick awarded to Manchester United. Write a vivid and expressive commentary using an excited tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
54 | Real Madrid | 3 | corner kick | 1-1 | calm | At minute 54, No. 3 from Real Madrid was involved in a sequence that led to a corner kick awarded to Manchester United. Write a vivid and expressive commentary using a calm tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
61 | Real Madrid | 5 | corner kick | 0-1 | nervous | At minute 61, No. 5 from Real Madrid was involved in a sequence that led to a corner kick awarded to Manchester United. Write a vivid and expressive commentary using a nervous tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
68 | Real Madrid | 7 | throw-in | 1-0 | excited | At minute 68, No. 7 from Real Madrid was involved in a sequence that led to a throw-in awarded to Manchester United. Write a vivid and expressive commentary using an excited tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
75 | Real Madrid | 8 | throw-in | 1-1 | calm | At minute 75, No. 8 from Real Madrid was involved in a sequence that led to a throw-in awarded to Manchester United. Write a vivid and expressive commentary using a calm tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
82 | Real Madrid | 10 | throw-in | 0-1 | nervous | At minute 82, No. 10 from Real Madrid was involved in a sequence that led to a throw-in awarded to Manchester United. Write a vivid and expressive commentary using a nervous tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
Match Summary |
---|
It is the end of the match at the Santiago Bernabéu between the home team, Real Madrid, and the visiting side, Manchester United. The final score is [final_score]. Summarize the key moments of the match in 1–2 natural sentences. Do not invent any events. Only refer to those I have provided. Avoid subjective opinions, emotional evaluations, or unnecessary praise. Focus on factual closure that matches the tone of a professional sports broadcast. |
Appendix B
Time (min) | Event | Latency (s) | Generated Commentary |
---|---|---|---|
3 | Throw-in | 3.85 | [Czaplicki [6]] The match at Santiago Bernabeu is underway and Real Madrid’s Player 8 has already made his presence felt, kicking the ball out of bounds just 3 min in. Manchester United will now have a throw-in opportunity to regain possession of the ball. |
3 | Throw-in | 3.35 | [Proposed Approach] Pandemonium erupts in the 3rd minute as Player 8’s desperate lunge sparks chaos! Manchester United pounces, sensing vulnerability, and wins a crucial throw-in, igniting a frenzied counter-attack! |
16 | Free kick | 3.42 | [Czaplicki [6]] Real Madrid has been awarded a free kick here at the Santiago Bernabeu in the 16th minute after a foul was committed on their player, number 10. The Manchester United defence will need to be cautious as Real Madrid looks to capitalize on this scoring opportunity. |
16 | Free kick | 2.69 | [Proposed Approach] Tension boils over as Manchester United’s defense falters! A burst of pressure sparks chaos, and suddenly Real Madrid seize control, sensing blood in the water. The crowd erupts! |
22 | Goal | 4.18 | [Czaplicki [6]] Real Madrid have taken an early lead here at the Santiago Bernabeu, with Player 10 scoring the opening goal in the 22nd minute to put the home team up 1-0 against Manchester United. The crowd is electric as the Spanish giants take control of the match, putting pressure on the visiting Red Devils to respond. |
22 | Goal | 3.58 | [Proposed Approach] UNBELIEVABLE! Player 10 unleashes a rocket of a shot, curves it in with precision! The Santiago Bernabéu ERUPTS! Real Madrid takes the lead, 1-0, in the 22nd minute! |
26 | Goal kick | 3.53 | [Czaplicki [6]] The referee has blown his whistle and signaled for a goal kick, which will be taken by Real Madrid, allowing them to regain possession of the ball from their own end of the Santiago Bernabeu pitch. Manchester United’s players are already positioning themselves in anticipation of the restart, looking to put pressure on the home team. |
26 | Goal kick | 2.70 | [Proposed Approach] Tension boils over! A sudden lapse sparks chaos, and Real Madrid seize the initiative with a goal kick! The Bernabéu faithful sense opportunity, as momentum swings dramatically in their favor! |
33 | Throw-in | 3.66 | [Czaplicki [6]] The crowd here at Santiago Bernabeu is waiting anxiously for the game to resume as Manchester United’s player has cleared the ball, momentarily relieving pressure on their defense. Real Madrid will now have a throw-in, looking to regain possession and continue their attack. |
33 | Throw-in | 3.08 | [Proposed Approach] Tension boils over as Manchester United seizes momentum! A flash of uncertainty sparks among Real Madrid’s defenders, and suddenly the visitors are on the front foot, sensing opportunity. The throw-in is theirs, fueling a surge of energy that electrifies the Bernabéu air. |
Appendix C
Data Preparation Pseudocode
Game Event Data Preparation Pipeline
Input: Raw GRF match logs under folders episode_done_*; each file stores per-frame states.
Output: Two .ts datasets: D_dim (dimension × time = 123 × 3) and D_flat (flattened [123 × 3]); labels Y ∈ {0, 2, 3, 4, 5} (0 = no event).
for each file f in episode_done_* do
    S ← ParseFrames(RemoveWrappers(Read(f)))  ▷ list of per-frame dictionaries
    E ← { t | game_mode_{t−1} = 0 ∧ game_mode_t ∈ {2, 3, 4, 5} }  ▷ event indices
    for each t ∈ E do
        if t < 5 then continue
        W ← [t − 5, t − 4, t − 3]  ▷ 3 preceding frames, stride = 1
        x ← ExtractFeatures(S, W)  ▷ ball state; team/player kinematics; yellow cards; tiredness; score; game_mode
        x ← ZScoreNormalize(x)  ▷ across all time steps
        Append(D_dim, ToDimTime(x; 123 × 3))
        Append(D_flat, Flatten(x))  ▷ [123 × 3]-length vector
        Append(Y, game_mode_t)  ▷ label = current game_mode
    end for
    R ← RandomSample(NonEventIndices(S) \ E, |E|)  ▷ balancing with no-event samples
    for each r ∈ R do
        if r < 5 then continue
        W ← [r − 5, r − 4, r − 3]; x ← ExtractFeatures(S, W); x ← ZScoreNormalize(x)
        Append(D_dim, ToDimTime(x)); Append(D_flat, Flatten(x)); Append(Y, 0)
    end for
end for
WriteTS(D_dim, headers); WriteTS(D_flat, headers); LogParseFailures()
return D_dim, D_flat, Y
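The core of the pipeline above can be sketched in Python for a single match whose features have already been extracted into a (frames × features) array. File parsing (Read, RemoveWrappers, ParseFrames), ExtractFeatures, and WriteTS are omitted. Three details are assumptions: the z-score is computed per feature across the three time steps of each window, the no-event pool consists of frames with game_mode = 0, and frames earlier than t = 5 are filtered out before sampling rather than after.

```python
import numpy as np

EVENT_MODES = {2, 3, 4, 5}  # goal kick, free kick, corner kick, throw-in

def event_indices(game_mode):
    """Frames where game_mode switches from 0 (normal play) to an event mode."""
    return [t for t in range(1, len(game_mode))
            if game_mode[t - 1] == 0 and game_mode[t] in EVENT_MODES]

def zscore(window, eps=1e-8):
    """Per-feature z-score across the time axis of one (features, time) window."""
    mean = window.mean(axis=1, keepdims=True)
    std = window.std(axis=1, keepdims=True)
    return (window - mean) / (std + eps)

def build_samples(features, game_mode, seed=0):
    """features: (num_frames, num_features) array for one match.
    Returns X with shape (num_samples, num_features, 3) and integer labels y."""
    rng = np.random.default_rng(seed)
    events = [t for t in event_indices(game_mode) if t >= 5]
    # No-event pool: normal-play frames with at least 5 frames of history.
    pool = [t for t in range(5, len(game_mode)) if game_mode[t] == 0]
    n_neg = min(len(events), len(pool))
    negatives = rng.choice(pool, size=n_neg, replace=False) if n_neg else []
    samples = [(t, int(game_mode[t])) for t in events] + [(int(r), 0) for r in negatives]
    X = [zscore(features[t - 5:t - 2].T) for t, _ in samples]  # frames t-5, t-4, t-3
    y = np.array([label for _, label in samples], dtype=int)
    X = np.stack(X) if X else np.empty((0, features.shape[1], 3))
    return X, y

rng = np.random.default_rng(0)
game_mode = [0] * 20
game_mode[3] = 2                  # event too early (t < 5): skipped
game_mode[8] = game_mode[9] = 3   # one free-kick onset at t = 8
game_mode[15] = 5                 # throw-in onset at t = 15
X, y = build_samples(rng.standard_normal((20, 6)), game_mode)
X_flat = X.reshape(len(X), -1)    # flattened variant, as in D_flat
print(X.shape, y.tolist())        # (4, 6, 3) [3, 5, 0, 0]
```

Only the onset frame of each event (the 0 → event transition) is labeled, so consecutive frames of the same stoppage are not counted twice.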
References
- Li, L.; Uttarapong, J.; Freeman, G.; Wohn, D.Y. Spontaneous, yet studious: Esports commentators’ live performance and self-presentation practices. Proc. ACM Hum.-Comput. Interact. 2020, 4, 103. [Google Scholar] [CrossRef]
- Baughman, A.; Morales, E.; Agarwal, R.; Akay, G.; Feris, R.; Johnson, T.; Hammer, S.; Karlinsky, L. Large scale generative AI text applied to sports and music. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 4784–4792. [Google Scholar]
- Ma, S.; Cui, L.; Dai, D.; Wei, F.; Sun, X. Livebot: Generating live video comments based on visual and textual contexts. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 6810–6817. [Google Scholar]
- Rao, J.; Wu, H.; Liu, C.; Wang, Y.; Xie, W. MatchTime: Towards automatic soccer game commentary generation. arXiv 2024, arXiv:2406.18530. [Google Scholar] [CrossRef]
- Xu, J.H.; Cai, Y.; Fang, Z.; Paliyawan, P. Promoting mental well-being for audiences in a live-streaming game by highlight-based bullet comments. arXiv 2021, arXiv:2108.08083. [Google Scholar]
- Czaplicki, M. Live Commentary in a Football Video Game Generated by an AI. Master’s Thesis, University of Twente, Enschede, The Netherlands, 2023. [Google Scholar]
- Kościołek, J. Enhancing live commentary generation in soccer video games through event prediction with machine learning methods. In Proceedings of the TScIT 41, Enschede, The Netherlands, 5 July 2024; pp. 1–8. [Google Scholar]
- Vatsal, S.; Dubey, H. A survey of prompt engineering methods in large language models for different NLP tasks. arXiv 2024, arXiv:2407.12994. [Google Scholar] [CrossRef]
- Nimpattanavong, C.; Taveekitworachai, P.; Khan, I.; Nguyen, T.V.; Thawonmas, R.; Choensawat, W.; Sookhanaphibarn, K. Am I fighting well? Fighting game commentary generation with ChatGPT. In Proceedings of the 13th International Conference on Advances in Information Technology (IAIT '23), Bangkok, Thailand, 6–9 December 2023; pp. 1–7. [Google Scholar]
- Renella, N.; Eger, M. Towards automated video game commentary using generative AI. In Proceedings of the AIIDE Workshop on Experimental Artificial Intelligence in Games, Salt Lake City, UT, USA, 8 October 2023; pp. 1–9. [Google Scholar]
- Wang, Z.; Yoshinaga, N. From eSports data to game commentary: Datasets, models, and evaluation metrics. In Proceedings of the DEIM Forum, Tokyo, Japan, 1–3 March 2021; pp. 1–6. [Google Scholar]
- Li, C.; Gandhi, S.; Harrison, B. End-to-end Let’s Play commentary generation using multi-modal video representations. In Proceedings of the 14th International Conference on the Foundations of Digital Games, San Luis Obispo, CA, USA, 26–30 August 2019; pp. 1–7. [Google Scholar] [CrossRef]
- Kameko, H.; Mori, S.; Tsuruoka, Y. Learning a game commentary generator with grounded move expressions. In Proceedings of the 2015 IEEE Conference on Computational Intelligence and Games (CIG), Tainan, Taiwan, 31 August–2 September 2015; pp. 177–184. [Google Scholar]
- Zang, H.; Yu, Z.; Wan, X. Automated chess commentator powered by neural chess engine. arXiv 2019, arXiv:1909.10413. [Google Scholar] [CrossRef]
- Ishigaki, T.; Topic, G.; Hamazono, Y.; Noji, H.; Kobayashi, I.; Miyao, Y.; Takamura, H. Generating racing game commentary from vision, language, and structured data. In Proceedings of the 14th International Conference on Natural Language Generation, Scotland, UK, 20–24 September 2021; pp. 103–113. [Google Scholar]
- Mamoru, M.D.L.S.; Panditha, A.D.; Perera, W.A.S.S.J.; Ganegoda, G.U. Automated commentary generation based on FPS gameplay analysis. In Proceedings of the 7th International Conference on Information Technology Research (ICITR), Moratuwa, Sri Lanka, 7–9 December 2022; pp. 1–5. [Google Scholar] [CrossRef]
- Ishigaki, T.; Topic, G.; Hamazono, Y.; Kobayashi, I.; Miyao, Y.; Takamura, H. Audio commentary system for real-time racing game play. In Proceedings of the 16th International Natural Language Generation Conference: System Demonstrations, Prague, Czech Republic, 11–15 September 2023; pp. 9–10. [Google Scholar]
- Stournaras, G. Generating Automatic Commentary in Video Games Using Large Language and Vision Language Models. Master’s Thesis, University of Twente, Enschede, The Netherlands, 2024. [Google Scholar]
- Kurach, K.; Raichuk, A.; Stańczyk, P.; Zając, M.; Bachem, O.; Espeholt, L.; Riquelme, C.; Vincent, D.; Michalski, M.; Bousquet, O.; et al. Google Research Football: A novel reinforcement learning environment. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 4501–4510. [Google Scholar]
- Schmidhuber, J.; Gagliolo, M.; Wierstra, D. Evolino for recurrent support vector machines. arXiv 2005, arXiv:cs/0512062. [Google Scholar] [CrossRef]
- Wunderlich, T. Gradient-Based Learning and Regularization in Spiking Neurons. Ph.D. Thesis, Technische Universität Berlin, Berlin, Germany, 2024. [Google Scholar]
- Kharoubi, R.; Mkhadri, A.; Oualkacha, K. High-dimensional penalized Bernstein support vector machines. arXiv 2023, arXiv:2303.09066. [Google Scholar] [CrossRef]
- Yan, W. Toward automatic time-series forecasting using neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 1028–1039. [Google Scholar]
- Muñoz-Zavala, A.E.; Macías-Díaz, J.E.; Alba-Cuéllar, D.; Guerrero-Díaz-de-León, J.A. A literature review on some trends in artificial neural networks for modeling and simulation with time series. Algorithms 2024, 17, 76. [Google Scholar] [CrossRef]
- Wang, Y.; Wu, H.; Dong, J.; Liu, Y.; Long, M.; Wang, J. Deep time series models: A comprehensive survey and benchmark. arXiv 2024, arXiv:2407.13278. [Google Scholar] [CrossRef]
- Petneházi, G. Recurrent neural networks for time series forecasting. arXiv 2019, arXiv:1901.00069. [Google Scholar] [CrossRef]
- Shi, J.; Jain, M.; Narasimhan, G. Time series forecasting (TSF) using various deep learning models. arXiv 2022, arXiv:2204.11115. [Google Scholar] [CrossRef]
- Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar] [CrossRef]
- Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-variation modeling for general time series analysis. arXiv 2022, arXiv:2210.02186. [Google Scholar]
- Liu, Y.; Wu, H.; Wang, J.; Long, M. Non-stationary transformers: Exploring the stationarity in time series forecasting. Adv. Neural Inf. Process. Syst. 2022, 35, 9881–9893. [Google Scholar]
- Dai, M.; Yuan, J.; Liu, H.; Wang, J. TSCF: An improved deep forest model for time series classification. Neural Process. Lett. 2024, 56, 13. [Google Scholar] [CrossRef]
- Chen, X.; Qiu, P.; Zhu, W.; Li, H.; Wang, H.; Sotiras, A.; Wang, Y.; Razi, A. TimeMIL: Advancing multivariate time series classification via a time-aware multiple instance learning. arXiv 2024, arXiv:2405.03140. [Google Scholar]
- Tang, W.; Long, G.; Liu, L.; Zhou, T.; Blumenstein, M.; Jiang, J. Omni-Scale CNNs: A simple and effective kernel size configuration for time series classification. arXiv 2020, arXiv:2002.10061. [Google Scholar]
- Narin, A. Performance comparison of balanced and unbalanced cancer datasets using pre-trained convolutional neural network. arXiv 2020, arXiv:2012.05585. [Google Scholar] [CrossRef]
- Wei, Q.; Dunbrack, R.L., Jr. The role of balanced training and testing data sets for binary classifiers in bioinformatics. PLoS ONE 2013, 8, e67863. [Google Scholar] [CrossRef]
- Velarde, G.; Weichert, M.; Deshmunkh, A.; Deshmane, S.; Sudhir, A.; Sharma, K.; Joshi, V. Tree boosting methods for balanced and imbalanced classification and their robustness over time in risk assessment. Intell. Syst. Appl. 2024, 22, 200354. [Google Scholar] [CrossRef]
- Aguilar-Ruiz, J.S.; Michalak, M. Classification performance assessment for imbalanced multiclass data. Sci. Rep. 2024, 14, 10759. [Google Scholar] [CrossRef]
- Ustuner, M.; Sanli, F.B.; Abdikan, S. Balanced vs imbalanced training data: Classifying RapidEye data with support vector machines. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2016, 41, 379–384. [Google Scholar] [CrossRef]
- Sen, P. Speech disfluencies occur at higher perplexities. In Proceedings of the Workshop on the Cognitive Aspects of the Lexicon, Online, 12 December 2020; pp. 92–97. [Google Scholar]
- Baevski, A.; Auli, M. Adaptive input representations for neural language modeling. arXiv 2018, arXiv:1809.10853. [Google Scholar]
- Toral, A.; Pecina, P.; Wang, L.; van Genabith, J. Linguistically-augmented perplexity-based data selection for language models. Comput. Speech Lang. 2015, 32, 11–26. [Google Scholar] [CrossRef]
- Li, J.; Galley, M.; Brockett, C.; Gao, J.; Dolan, B. A diversity-promoting objective function for neural conversation models. arXiv 2015, arXiv:1510.03055. [Google Scholar]
- Lu, X. The relationship of lexical richness to the quality of ESL learners’ oral narratives. Mod. Lang. J. 2012, 96, 190–208. [Google Scholar] [CrossRef]
- Zhu, Y.; Lu, S.; Zheng, L.; Guo, J.; Zhang, W.; Wang, J.; Yu, Y. Texygen: A benchmarking platform for text generation models. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 1097–1100. [Google Scholar]
- Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations Using RNN Encoder–Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
Ref. | Environment | Event Detection Method | Commentary Generation Method | Evaluation Method | Real-Time Commentary |
---|---|---|---|---|---|
Ishigaki et al. [15] | Racing Game | Softmax | LSTM | Auto + Human | × |
Ishigaki et al. [17] | Racing Game | API | BART | Human | √ |
Renella and Eger [10] | League of Legends | EfficientDet | ChatGPT | Auto + Human | √ |
Wang and Yoshinaga [11] | League of Legends | API | Seq2Seq | Auto + Human | × |
Nimpattanavong et al. [9] | Fighting Game | API | ChatGPT | Auto + Human | × |
Kameko et al. [13] | Shogi | Rule-based Parsing + External Knowledge Grounding | Log-linear Language Model | Auto + Human | × |
Mamoru et al. [16] | FPS | HUD Parsing | T5 Transformer | Auto + Human | × |
Li et al. [12] | Platformer | LSTM | Seq2Seq + Attention | Auto | × |
Zang et al. [14] | Chess | Neural Chess Engine (CNN) | LSTM + Attention | Auto + Human | × |
Ref. | Environment | Event Detection Method | Commentary Generation Method | Evaluation Method | Real-Time Commentary | Commentary Delay | Hybrid Model |
---|---|---|---|---|---|---|---|
Czaplicki [6] | Google Research Football [19] | API | GPT-3.5 | Auto + Human | × | √ | × |
Stournaras [18] | Google Research Football [19] | API | LLaVA-v1.5 | Auto | × | √ | × |
Proposed Approach | Google Research Football [19] | OS-CNN | LLaMA 3.3 | Auto | √ | × | √ |
Model Type | Long-Term Dependency Modeling | Inference Efficiency | Model Complexity | Real-Time Suitability | Key Feature |
---|---|---|---|---|---|
RNN | Moderate | Low | Low | × | Recurrent structure |
GRU | Good | Low | Medium | × | Gated memory cells |
Transformer | Excellent | Low | High | × | Self-attention |
OS-CNN | Good (multi-scale receptive fields) | High | Low | √ | Multi-scale CNN hierarchy |
Category | Info |
---|---|
Operating System | Ubuntu 22.04.3 LTS |
GPU | NVIDIA GeForce RTX 3090 |
Library Stack | PyTorch 2.4.1, CUDA 11.8, cuDNN 9 |
Large Language Model | LLaMA 3.3 |
Dataset Processing Time | ~150 h |
Event Prediction Training Time | ~3 days |
Commentary Generation Test Time | ~3 h |
Component | Value/Description |
---|---|
Model Layers | 3 OS-Blocks (each with multiple kernel branches), 1 final FC layer |
Prime Kernel Sizes | Automatically generated from start_kernel_size = 1 to Max_kernel_size = 3 |
Filters per Kernel | Automatically determined via generate_layer_parameter_list() based on parameter budget |
Layer Params | paramenter_number_of_layer_list = [1*4] → roughly four parameters per channel |
Activation | ReLU (after BatchNorm) |
Dropout | 0.3 (applied after ReLU) |
Pooling Layer | AdaptiveAvgPool1D (output size = 1) |
Loss Function | Class-Weighted Focal Loss (γ = 2.0), using log1p-scaled class weights clipped to [1.0, 10.0] to address severe class imbalance during event prediction. |
Optimizer | AdamW |
Learning Rate | 0.0005 |
Weight Decay | 1 × 10⁻⁶ |
Scheduler | CosineAnnealingLR (T_max = 2000, η_min = 1 × 10⁻⁵) |
Regularization | L2 regularization added (λ = 1 × 10⁻⁴) |
Gradient Clipping | max_norm = 5 |
Batch Size | Dynamic: min(samples/10, 16), minimum batch size = 2 |
Epochs | 100 |
Data Sampling | SMOTE for class balancing |
Early Stopping | Stop after five epochs with no F1 improvement |
Pretraining | Supports loading pretrained weights if available (optional) |
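The loss configuration in the table above (class-weighted focal loss with γ = 2.0 and log1p-scaled class weights clipped to [1.0, 10.0]) can be sketched in NumPy as follows. The table does not give the exact weight formula, so w_c = log1p(N_max / N_c) is an assumption, as are the function names. Class indices must be contiguous (e.g., labels {0, 2, 3, 4, 5} remapped to 0–4), and the batch mean divides by the batch size rather than by the summed weights.

```python
import numpy as np

def class_weights(counts, lo=1.0, hi=10.0):
    """Assumed scaling: log1p of the inverse class frequency, clipped to [lo, hi]."""
    counts = np.asarray(counts, dtype=float)
    return np.clip(np.log1p(counts.max() / counts), lo, hi)

def focal_loss(logits, targets, weights, gamma=2.0):
    """Batch mean of -w[y] * (1 - p_y)**gamma * log(p_y)."""
    z = logits - logits.max(axis=1, keepdims=True)             # stable log-softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    log_pt = log_p[np.arange(len(targets)), targets]            # log-prob of true class
    pt = np.exp(log_pt)
    return float(np.mean(-weights[targets] * (1.0 - pt) ** gamma * log_pt))

# Imbalanced training counts from the dataset table:
# no event, goal kick, free kick, corner kick, throw-in.
w = class_weights([291_255, 133, 708, 76, 234])
```

The (1 − p_y)^γ factor shrinks the loss of examples the model already classifies confidently, which are mostly no-event windows, while the clipped weights raise the contribution of rare events such as corner kicks without letting any single class dominate.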
Dataset | Set Type | Matches | Event Type | Raw Event Count | Sampled Event Count |
---|---|---|---|---|---|
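Balanced | Test | 1247 matches | No event (0) | 3,647,134 | 837 |
Balanced | Test | 1247 matches | Goal kick (2) | 1,658 | 837 |
Balanced | Test | 1247 matches | Free kick (3) | 9,025 | 837 |
Balanced | Test | 1247 matches | Corner kick (4) | 837 | 837 |
Balanced | Test | 1247 matches | Throw-in (5) | 2,394 | 837 |
Balanced | Train | 1247 matches | No event (0) | 3,647,134 | 837 |
Balanced | Train | 1247 matches | Goal kick (2) | 1,658 | 837 |
Balanced | Train | 1247 matches | Free kick (3) | 9,025 | 837 |
Balanced | Train | 1247 matches | Corner kick (4) | 837 | 837 |
Balanced | Train | 1247 matches | Throw-in (5) | 2,394 | 837 |
Imbalanced | Test | 100 matches | No event (0) | 291,255 | 291,255 |
Imbalanced | Test | 100 matches | Goal kick (2) | 133 | 291,255 |
Imbalanced | Test | 100 matches | Free kick (3) | 708 | 291,255 |
Imbalanced | Test | 100 matches | Corner kick (4) | 76 | 291,255 |
Imbalanced | Test | 100 matches | Throw-in (5) | 234 | 291,255 |
Imbalanced | Train | 100 matches | No event (0) | 291,255 | 291,255 |
Imbalanced | Train | 100 matches | Goal kick (2) | 133 | 133 |
Imbalanced | Train | 100 matches | Free kick (3) | 708 | 708 |
Imbalanced | Train | 100 matches | Corner kick (4) | 76 | 76 |
Imbalanced | Train | 100 matches | Throw-in (5) | 234 | 234 |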
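In the balanced dataset, 837 samples per class equals the raw corner-kick count, the smallest event class, which is consistent with undersampling every class to the minority-class size. A minimal NumPy sketch of such random undersampling follows; the function name and the uniform, without-replacement sampling are assumptions rather than the documented procedure.

```python
import numpy as np

def undersample_to_minority(y, seed=0):
    """Return sorted indices that keep, for every class, as many samples as the
    smallest class has, drawn uniformly at random without replacement."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = [rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
            for c in classes]
    return np.sort(np.concatenate(keep))

# Toy label vector with the same class ordering as the table (0, 2, 3, 4, 5).
y = np.repeat([0, 2, 3, 4, 5], [50, 16, 90, 8, 24])
idx = undersample_to_minority(y)
```

Undersampling discards most no-event and free-kick windows, which is why the balanced dataset is drawn from many more matches (1247) than the imbalanced one (100).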
Name | Description | Number |
---|---|---|
Ball | Position: [x, y, z] | 3 |
Ball Direction | Movement vector: [x, y, z] | 3 |
Ball Rotation | Rotation (in radians): [x, y, z] | 3 |
Ball Owned Team | Owning team | 1 |
Ball Owned Player | Player index | 1 |
Team | Positions: [x, y] | 44 |
Team Direction | Movement vectors: [x, y] | 44 |
Team Tired Factor | Fatigue levels (0 = energetic, 1 = exhausted) | 22 |
Team Yellow Card | Yellow cards count | 22 |
Team Active | Active status in game | 22 |
Team Roles | Roles (e.g., forward, defender) | 22 |
reward | The immediate reward received at the current time step, reflecting outcomes such as goals or intermediate progress (e.g., checkpoints). | 1 |
cumulative_reward | The accumulated reward over an episode or a given horizon, used to evaluate overall performance. | 1 |
Designated | Designated player index (e.g., ball owner) | 2 |
Sticky Actions | Current active actions (10-dimensional one-hot vector: move directions, sprint, dribble) | 286 |
Score | Score (number of goals) | 2 |
Steps Left | Remaining steps in match | 1 |
Game Mode | Current game mode (e.g., no event, kick-off, goal kick, free kick, corner, throw-in, penalty) | 1 |
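Summed, the Number column gives 481 values per time step. A minimal sketch of concatenating one time step's features into a single input vector follows. The dictionary keys are hypothetical names, the concatenation order simply follows the table, and categorical fields (roles, game mode) are kept as raw codes rather than one-hot encoded; all three are assumptions, not the paper's preprocessing code.

```python
from __future__ import annotations

# Per-time-step feature sizes copied from the table above.
# The keys are hypothetical names chosen for this sketch.
FEATURE_DIMS = {
    "ball": 3,
    "ball_direction": 3,
    "ball_rotation": 3,
    "ball_owned_team": 1,
    "ball_owned_player": 1,
    "team": 44,               # presumably 22 players x (x, y)
    "team_direction": 44,
    "team_tired_factor": 22,
    "team_yellow_card": 22,
    "team_active": 22,
    "team_roles": 22,
    "reward": 1,
    "cumulative_reward": 1,
    "designated": 2,
    "sticky_actions": 286,
    "score": 2,
    "steps_left": 1,
    "game_mode": 1,
}


def flatten_observation(obs: dict) -> list[float]:
    """Concatenate one time step's features in the fixed order of FEATURE_DIMS."""
    vector: list[float] = []
    for name, dim in FEATURE_DIMS.items():
        value = obs[name]
        # Scalars (e.g., reward) are treated as length-1 lists.
        values = list(value) if isinstance(value, (list, tuple)) else [value]
        if len(values) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(values)}")
        vector.extend(float(v) for v in values)
    return vector


dummy = {name: [0.0] * dim for name, dim in FEATURE_DIMS.items()}
print(len(flatten_observation(dummy)))  # 481
```

Stacking these vectors over a sliding window of time steps would give the kind of multivariate sequence a time-series classifier such as OS-CNN consumes; the window length is not given in this section.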
Method | Goal Kick (2) P ↑ | Goal Kick (2) R ↑ | Goal Kick (2) F1 ↑ | Free Kick (3) P ↑ | Free Kick (3) R ↑ | Free Kick (3) F1 ↑ | Corner Kick (4) P ↑ | Corner Kick (4) R ↑ | Corner Kick (4) F1 ↑ | Throw-In (5) P ↑ | Throw-In (5) R ↑ | Throw-In (5) F1 ↑ | No-Event (0) P ↑ | No-Event (0) R ↑ | No-Event (0) F1 ↑ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kościołek [7] | 0.54 | 0.53 | 0.53 | 0.50 | 0.49 | 0.49 | 0.56 | 0.59 | 0.58 | 0.63 | 0.65 | 0.64 | 0.40 | 0.38 | 0.39 |
GRU [45] | 0.72 | 0.70 | 0.71 | 0.74 | 0.73 | 0.73 | 0.71 | 0.77 | 0.74 | 0.93 | 0.88 | 0.91 | 0.69 | 0.70 | 0.69 |
Transformer [46] | 0.68 | 0.75 | 0.71 | 0.70 | 0.71 | 0.70 | 0.70 | 0.72 | 0.71 | 0.90 | 0.92 | 0.91 | 0.69 | 0.59 | 0.64 |
Proposed Approach | 0.76 | 0.68 | 0.72 | 0.69 | 0.73 | 0.71 | 0.68 | 0.83 | 0.75 | 0.88 | 0.96 | 0.92 | 0.76 | 0.56 | 0.64 |
Method | Goal Kick (2) P (%) ↑ | Goal Kick (2) R (%) ↑ | Goal Kick (2) F1 (%) ↑ | Free Kick (3) P (%) ↑ | Free Kick (3) R (%) ↑ | Free Kick (3) F1 (%) ↑ | Corner Kick (4) P (%) ↑ | Corner Kick (4) R (%) ↑ | Corner Kick (4) F1 (%) ↑ | Throw-In (5) P (%) ↑ | Throw-In (5) R (%) ↑ | Throw-In (5) F1 (%) ↑ | No-Event (0) P (%) ↑ | No-Event (0) R (%) ↑ | No-Event (0) F1 (%) ↑ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kościołek [7] | 0.09 | 31.58 | 0.17 | 0.35 | 31.07 | 0.69 | 0.06 | 32.89 | 0.12 | 0.22 | 43.59 | 0.43 | 99.69 | 30.97 | 47.26 |
GRU [45] | 0.70 | 52.63 | 1.38 | 0.57 | 65.68 | 1.13 | 0.54 | 39.47 | 1.07 | 0.73 | 73.08 | 1.45 | 99.85 | 58.84 | 74.04 |
Transformer [46] | 0.85 | 53.38 | 1.68 | 0.39 | 85.17 | 0.78 | 0.57 | 35.53 | 1.12 | 0.76 | 80.77 | 1.50 | 99.89 | 34.28 | 51.04 |
Proposed Approach | 0.95 | 50.38 | 1.87 | 0.69 | 49.72 | 1.36 | 0.86 | 50.00 | 1.68 | 1.37 | 71.37 | 2.68 | 99.81 | 74.56 | 85.36 |
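In the imbalanced results, minority-class precision stays near 1% even when recall is around 50%: each rare event competes with 291,255 no-event steps, so even a small false-positive rate on no-event steps produces far more false positives than true positives. F1 is the harmonic mean of precision and recall, so the proposed approach's no-event row (P = 99.81%, R = 74.56%) gives F1 ≈ 85.36%, matching the table. The sketch below computes one-vs-rest per-class scores from label lists; the toy data are invented for illustration, and the function names are hypothetical.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_prf(y_true, y_pred, label):
    """One-vs-rest precision, recall and F1 for a single class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)


# Toy imbalanced case: 10 goal kicks (label 2) among 1,000 steps.
# The model finds 5 of them but also flags 495 no-event steps as goal kicks.
y_true = [2] * 10 + [0] * 990
y_pred = [2] * 5 + [0] * 5 + [2] * 495 + [0] * 495
print(per_class_prf(y_true, y_pred, label=2))  # precision 0.01, recall 0.5, F1 ~ 0.0196

# Reproducing the proposed approach's no-event F1 from its P and R.
print(round(f1_from_pr(0.9981, 0.7456), 4))  # 0.8536
```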
Dataset | Method | Model | Loss |
---|---|---|---|
Balanced | Kościołek [7] | SVM | 1.1738 |
Balanced | Base Model [45,46] | GRU | 0.6046 |
Balanced | Base Model [45,46] | Transformer | 0.6381 |
Balanced | Proposed Approach | OS-CNN | 0.6658 |
Imbalanced | Kościołek [7] | SVM | 2.3568 |
Imbalanced | Base Model [45,46] | GRU | 1.9026 |
Imbalanced | Base Model [45,46] | Transformer | 2.1953 |
Imbalanced | Proposed Approach | OS-CNN | 0.2596 |
Dataset | Method | Model | Recall ↑ | Precision ↑ | F1-Score ↑ |
---|---|---|---|---|---|
Balanced | Kościołek [7] | SVM | 0.5286 | 0.5260 | 0.5271 |
Balanced | Base Model [45,46] | GRU | 0.7539 | 0.7562 | 0.7547 |
Balanced | Base Model [45,46] | Transformer | 0.7359 | 0.7354 | 0.7344 |
Balanced | Proposed Approach | OS-CNN | 0.7531 | 0.7515 | 0.7470 |
Imbalanced | Kościołek [7] | SVM | 0.3099 | 0.9930 | 0.4708 |
Imbalanced | Base Model [45,46] | GRU | 0.5886 | 0.9946 | 0.7376 |
Imbalanced | Base Model [45,46] | Transformer | 0.3445 | 0.9950 | 0.5085 |
Imbalanced | Proposed Approach | OS-CNN | 0.7448 | 0.9942 | 0.8503 |
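The overall imbalanced-set scores are consistent with support-weighted averaging of the per-class scores. Weighting each class by its raw count in the imbalanced test split (133, 708, 76, 234 and 291,255 events) reproduces the proposed approach's 0.9942, 0.7448 and 0.8503. It also explains why every method's overall precision exceeds 0.99: the no-event class carries about 99.6% of the total weight. The averaging scheme is inferred from this arithmetic rather than stated in this section.

```python
# Proposed approach, imbalanced split: per-class scores from the per-class
# table (percent converted to fractions) and supports from the raw test counts.
PER_CLASS = {
    # event: (precision, recall, f1, support)
    "goal_kick": (0.0095, 0.5038, 0.0187, 133),
    "free_kick": (0.0069, 0.4972, 0.0136, 708),
    "corner_kick": (0.0086, 0.5000, 0.0168, 76),
    "throw_in": (0.0137, 0.7137, 0.0268, 234),
    "no_event": (0.9981, 0.7456, 0.8536, 291_255),
}


def weighted_average(per_class, index):
    """Average column `index` over classes, weighting each class by its support."""
    total = sum(row[3] for row in per_class.values())
    return sum(row[index] * row[3] for row in per_class.values()) / total


for name, idx in (("precision", 0), ("recall", 1), ("f1", 2)):
    print(name, round(weighted_average(PER_CLASS, idx), 4))
# precision 0.9942, recall 0.7448, f1 0.8503 (the table's overall OS-CNN row)
```

On the balanced split every class has the same support (837), so support-weighted and unweighted (macro) averages coincide there.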
Event Type | Prompt |
---|---|
Opening Commentary | You are a football commentator, who is getting information about a football match at Santiago Bernabeu between Real Madrid and Manchester United. I will give you info about the match, and you will write commentary. Be truthful and concise, max two sentences per turn. Only add information about teams, players, or goals that I have specified. You may use as much of my input as required in the output, use what you need to sound natural. Avoid repeating yourself too often. Do not mention the commentary itself. |
Goal Event | In [time] minute, there is a goal for [team], scored by [player], the actual score is [score]. |
Corner Kick Event | In [time] minute, the ball is kicked on corner by [player], [team] will have a chance. |
Free Kick Event | In [time] minute, there is a free kick for [team] due to a foul on [player]. |
Throw-in Event | In [time] minute, the ball was kicked out from the field by [player] from [team]. |
Goal Kick Event | In [time] minute, the ball is out of play, and a goal kick is awarded to [team]. |
Match Summary | It is the end of the match, the score is [score], summarize the match. |
Event Type | Prompt |
---|---|
Goal Event | At minute {timestep}, {player} from {team} scored a goal in the match against {benefiting_team}. Write a vivid and expressive commentary using a {chosen_emotion} tone. Clearly describe how this goal shifts the momentum or atmosphere of the match without exaggeration or subjective judgment. Mention the updated score: {final_score}. Keep the output under 75 words. |
Non-goal Event | At minute {timestep}, {player} from {team} was involved in a sequence that led to a {event_type} awarded to {benefiting_team}. Write a vivid and expressive commentary using a {chosen_emotion} tone. Describe how this shift impacts the match’s momentum or atmosphere, without explicitly mentioning any scoring or direct error. Keep the output under 75 words. |
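The two templates above are parameterized strings, so filling them is a plain substitution step before the LLM call. The sketch below copies the template text verbatim and fills it with Python's `str.format`. The tone list `EMOTIONS`, the random choice of tone, and the helper `build_prompt` are hypothetical: this section only shows that a `{chosen_emotion}` slot exists, not how it is chosen. The LLM call itself is out of scope here.

```python
import random

# Templates copied from the table above; placeholders use str.format syntax.
GOAL_TEMPLATE = (
    "At minute {timestep}, {player} from {team} scored a goal in the match "
    "against {benefiting_team}. Write a vivid and expressive commentary using "
    "a {chosen_emotion} tone. Clearly describe how this goal shifts the momentum "
    "or atmosphere of the match without exaggeration or subjective judgment. "
    "Mention the updated score: {final_score}. Keep the output under 75 words."
)
NON_GOAL_TEMPLATE = (
    "At minute {timestep}, {player} from {team} was involved in a sequence that "
    "led to a {event_type} awarded to {benefiting_team}. Write a vivid and "
    "expressive commentary using a {chosen_emotion} tone. Describe how this shift "
    "impacts the match’s momentum or atmosphere, without explicitly mentioning "
    "any scoring or direct error. Keep the output under 75 words."
)

# Hypothetical tone vocabulary for illustration only.
EMOTIONS = ["excited", "tense", "calm"]


def build_prompt(event_type, timestep, player, team, benefiting_team,
                 final_score=None, rng=None):
    """Fill the goal or non-goal template for one predicted event."""
    rng = rng or random.Random()
    fields = dict(timestep=timestep, player=player, team=team,
                  benefiting_team=benefiting_team,
                  chosen_emotion=rng.choice(EMOTIONS))
    if event_type == "goal":
        if final_score is None:
            raise ValueError("goal prompts need final_score")
        return GOAL_TEMPLATE.format(final_score=final_score, **fields)
    # str.format ignores unused keyword arguments, so one dict serves both templates.
    return NON_GOAL_TEMPLATE.format(event_type=event_type, **fields)


print(build_prompt("corner kick", 71, "Player A", "Team A", "Team A",
                   rng=random.Random(0)))
```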
Method | Perplexity ↑ | Distinct-1 ↑ | Distinct-2 ↑ | Lexical Diversity ↑ | Self-BLEU ↓ | Time (s) ↓ |
---|---|---|---|---|---|---|
Czaplicki [6] | 17.36 | 0.29 | 0.56 | 0.29 | 0.53 | 3.89 |
Proposed Approach | 52.95 | 0.45 | 0.74 | 0.45 | 0.39 | 3.08 |
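Distinct-n is the number of unique n-grams divided by the total number of n-grams in the generated commentary, so higher values mean less repetition. The identical Distinct-1 and Lexical Diversity values in the table suggest that lexical diversity is measured here as the type-token ratio, which equals Distinct-1. Self-BLEU, by contrast, measures n-gram overlap between generated outputs, so lower values indicate more varied commentary. The sketch assumes lowercase whitespace tokenization and corpus-level pooling of n-grams; both are assumptions, not the paper's stated setup.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(texts, n):
    """Unique n-grams / total n-grams, pooled over all texts (corpus level)."""
    pooled = []
    for text in texts:
        pooled.extend(ngrams(text.lower().split(), n))
    return len(set(pooled)) / len(pooled) if pooled else 0.0


commentary = ["What a strike from the edge", "What a save from the keeper"]
print(distinct_n(commentary, 1), distinct_n(commentary, 2))
```

Repeated template phrases such as "What a ... from the" lower both scores, which is the repetitiveness these metrics are meant to expose.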
Percentage of human votes marking as ‘Appropriate’, by event-prediction model:

Event No. | Event Type | OS-CNN | Transformer | GRU | SVM |
---|---|---|---|---|---|
1 | Throw-in (16 min) | 100% | 30% | 10% | 0% |
2 | Free kick (19 min) | 70% | 60% | 30% | 10% |
3 | Goal kick (34 min) | 60% | 40% | 40% | 0% |
4 | Goal kick (45 min) | 100% | 40% | 10% | 0% |
5 | Free kick (65 min) | 100% | 10% | 20% | 10% |
6 | Free kick (68 min) | 80% | 50% | 0% | 10% |
7 | Corner kick (71 min) | 100% | 0% | 0% | 0% |
8 | Free kick (75 min) | 90% | 30% | 10% | 0% |
9 | Free kick (81 min) | 100% | 30% | 10% | 0% |
10 | Throw-in (88 min) | 50% | 20% | 10% | 0% |
Average |  | 85% | 31% | 14% | 3% |
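The Average row is the unweighted mean of the ten per-event percentages for each model. Recomputing it from the rows above confirms the reported values:

```python
# Per-event 'Appropriate' percentages copied from the table, events 1-10.
VOTES = {
    "OS-CNN":      [100, 70, 60, 100, 100, 80, 100, 90, 100, 50],
    "Transformer": [30, 60, 40, 40, 10, 50, 0, 30, 30, 20],
    "GRU":         [10, 30, 40, 10, 20, 0, 0, 10, 10, 10],
    "SVM":         [0, 10, 0, 0, 10, 10, 0, 0, 0, 0],
}

for model, pct in VOTES.items():
    print(f"{model}: {sum(pct) / len(pct):.0f}%")
# OS-CNN: 85%, Transformer: 31%, GRU: 14%, SVM: 3%
```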
Component | Description | Time per Sample |
---|---|---|
Data Processing | Data loading, formatting, feature scaling | 0.131 s |
Event Prediction | GRU inference | 0.1987 ms |
Event Prediction | Transformer inference | 0.5195 ms |
Event Prediction | OS-CNN inference | 0.1909 ms |
LLM Commentary Generation | LLaMA-based text generation | 3.080 s |
Total Latency | Preprocessing + OS-CNN + LLM | 3.156 s |
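The table shows that event prediction takes well under a millisecond, while LLM generation accounts for 3.080 s of the 3.156 s total. A per-stage breakdown like this can be collected with a small timing harness. The sketch below uses `time.perf_counter`; the stage functions in the usage example are hypothetical stand-ins, not the paper's preprocessing, OS-CNN, or LLaMA components.

```python
import time


def time_pipeline(sample, stages):
    """Run (name, fn) stages in order, feeding each stage the previous output.

    Returns (final_output, timings) where timings maps each stage name to
    its wall-clock duration in seconds, plus a "total" entry.
    """
    timings = {}
    x = sample
    for name, fn in stages:
        start = time.perf_counter()
        x = fn(x)
        timings[name] = time.perf_counter() - start
    timings["total"] = sum(timings.values())
    return x, timings


# Hypothetical stand-in stages for illustration only.
stages = [
    ("preprocess", lambda obs: [v / 100 for v in obs]),   # feature scaling
    ("predict", lambda feats: 4 if sum(feats) > 1 else 0),  # fake event class
    ("generate", lambda event: f"event {event}"),          # fake commentary
]
output, timings = time_pipeline([50, 70], stages)
print(output, {k: f"{v * 1e3:.4f} ms" for k, v in timings.items()})
```

Averaging such timings over many samples, rather than timing a single run, gives more stable per-sample figures.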
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sheng, X.; Yu, A.; Zhang, M.; An, G.; Park, J.; Cho, K. Integrating Temporal Event Prediction and Large Language Models for Automatic Commentary Generation in Video Games. Mathematics 2025, 13, 2738. https://doi.org/10.3390/math13172738