Modeling Viewing Engagement in Long-Form Video Through the Lens of Expectation-Confirmation Theory
Abstract
1. Introduction
2. Literature Review
2.1. Long-Form Video Recommendation Systems
2.2. Factors Influencing Viewing Engagement
2.3. Expectation-Confirmation Theory
3. Research Design
3.1. Problem Statement
3.2. Measurement of Expectation
3.3. Perceived Experience Construction
3.4. Viewing Engagement Prediction
| Algorithm 1: Forward Propagation algorithm in LVVEP |
| Input: sequence of viewing records; sequence of viewing engagement; sequence of video plots; complete set of long-form videos; target video; block number; environment; user demographic features; video features. Output: predicted engagement for the target video. |
| 1: |
| 2: # Step 1: Encode historical plots and target video plot |
| 3: |
| 4: |
| 5: # Step 2: Compute semantic similarity between historical and target plots |
| 6: |
| 7: # Step 3: Estimate user expectation on target video |
| 8: |
| 9: |
| 10: |
| 11: # Step 4: Extract dynamic user preference via attention-based GRU |
| 12: Initialize |
| 13: for to do |
| 14: |
| 15: |
| 16: |
| 17: |
| 18: |
| 19: end for |
| 20: |
| 21: # Step 5: Construct matching degree between user preference and plot semantics |
| 22: |
| 23: for to do |
| 24: |
| 25: |
| 26: |
| 27: end for |
| 28: |
| 29: |
| 30: # Step 6: Predict engagement via confirmation model |
| 31: |
| 32: return |
| 33: end function |
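
The expressions for Steps 2 and 3 of Algorithm 1 are not recoverable from the table above, so the following is only a minimal sketch of one plausible reading: cosine similarity between each historical plot embedding and the target plot embedding, followed by a softmax-weighted average of past engagement as the expectation. The function names, the softmax weighting, and the toy vectors are all assumptions made for illustration, not the authors' formulation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_expectation(history_embeddings, history_engagement, target_embedding):
    """Assumed Step 2-3: similarity-weighted average of past engagement.

    history_embeddings: plot embeddings of previously watched videos
    history_engagement: engagement observed for each of those videos
    target_embedding:   plot embedding of the target video
    """
    similarities = [cosine_similarity(h, target_embedding) for h in history_embeddings]
    weights = softmax(similarities)
    return sum(w * e for w, e in zip(weights, history_engagement))


# Toy example: the first watched video has the same plot direction as the
# target, so its engagement dominates the expectation.
history = [[1.0, 0.0], [0.0, 1.0]]
engagement = [1.0, 0.0]
target = [1.0, 0.0]
expectation = estimate_expectation(history, engagement, target)
print(round(expectation, 3))
```

In this reading, a target video whose plot resembles videos the user watched at length inherits a high expectation, which the confirmation step can then compare against the perceived experience.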
4. Experimental Design
4.1. Experimental Data


4.2. Evaluation Metrics
4.3. Baselines and Experimental Environment
5. Experimental Results
5.1. Main Experimental Results
5.2. Prediction Performance
5.3. Recommendation Performance
5.4. Ablation Study
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ECT | Expectation-Confirmation Theory |
| LVVEP | Long-Form Video Viewing Engagement Prediction |
| GRU | Gated Recurrent Unit |
| BERT | Bidirectional Encoder Representations from Transformers |

| Model | MAE (Movie) | RMSE (Movie) | MAE (TV Series) | RMSE (TV Series) |
|---|---|---|---|---|
| RF | 0.343 | 0.473 | 0.432 | 0.549 |
| XGBOOST | 0.320 | 0.394 | 0.464 | 0.582 |
| LIGHTGBM | 0.294 | 0.368 | 0.358 | 0.426 |
| DEEPFM | 0.252 | 0.341 | 0.375 | 0.453 |
| GRU4Rec | 0.262 | 0.332 | 0.324 | 0.394 |
| DIN | 0.251 | 0.344 | 0.317 | 0.403 |
| DIEN | 0.255 | 0.337 | 0.304 | 0.392 |
| BST | 0.216 | 0.279 | 0.222 | 0.298 |
| DIFM | 0.253 | 0.296 | 0.307 | 0.412 |
| EDCN | 0.262 | 0.353 | 0.313 | 0.402 |
| LVVEP | 0.187 | 0.246 | 0.203 | 0.282 |
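
The two error metrics in the table above can be sketched as follows, together with the relative error reduction of LVVEP over BST, the baseline with the lowest error in every column. The helper names (`mae`, `rmse`, `relative_reduction`) are illustrative assumptions; the numeric inputs are copied from the table.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error over paired observations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error over paired observations."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error is lower than baseline_error."""
    return (baseline_error - model_error) / baseline_error


# Errors reported in the table for BST (strongest baseline) and LVVEP.
bst = {"movie_mae": 0.216, "movie_rmse": 0.279, "tv_mae": 0.222, "tv_rmse": 0.298}
lvvep = {"movie_mae": 0.187, "movie_rmse": 0.246, "tv_mae": 0.203, "tv_rmse": 0.282}

reductions = {key: relative_reduction(bst[key], lvvep[key]) for key in bst}
for key, value in reductions.items():
    print(f"{key}: {value:.1%}")
```

On the reported figures this gives reductions of roughly 13.4% (MAE) and 11.8% (RMSE) on the movie dataset, and 8.6% (MAE) and 5.4% (RMSE) on the TV series dataset.
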
| Model | Accuracy (Movie) | Precision (Movie) | Recall (Movie) | F1 (Movie) | Accuracy (TV Series) | Precision (TV Series) | Recall (TV Series) | F1 (TV Series) |
|---|---|---|---|---|---|---|---|---|
| RF | 0.19 | 0.15 | 0.18 | 0.16 | 0.15 | 0.12 | 0.13 | 0.12 |
| XGBOOST | 0.25 | 0.20 | 0.23 | 0.21 | 0.25 | 0.22 | 0.24 | 0.23 |
| LIGHTGBM | 0.26 | 0.22 | 0.22 | 0.22 | 0.26 | 0.23 | 0.20 | 0.21 |
| DEEPFM | 0.25 | 0.23 | 0.25 | 0.24 | 0.25 | 0.22 | 0.24 | 0.23 |
| GRU4Rec | 0.27 | 0.27 | 0.28 | 0.27 | 0.26 | 0.26 | 0.25 | 0.25 |
| DIN | 0.26 | 0.29 | 0.28 | 0.28 | 0.26 | 0.27 | 0.27 | 0.27 |
| DIEN | 0.28 | 0.26 | 0.27 | 0.26 | 0.25 | 0.26 | 0.25 | 0.25 |
| BST | 0.32 | 0.33 | 0.33 | 0.33 | 0.32 | 0.33 | 0.30 | 0.31 |
| DIFM | 0.30 | 0.29 | 0.30 | 0.29 | 0.30 | 0.29 | 0.29 | 0.29 |
| EDCN | 0.24 | 0.25 | 0.20 | 0.22 | 0.26 | 0.25 | 0.21 | 0.23 |
| LVVEP | 0.41 | 0.37 | 0.40 | 0.38 | 0.40 | 0.36 | 0.39 | 0.37 |
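
The classification metrics in the table above can be illustrated with a short sketch. It assumes that engagement is discretized into classes and that precision, recall, and F1 are macro-averaged across those classes; the averaging scheme is not stated in the surviving text, so treat that choice, the function name `macro_metrics`, and the toy labels as assumptions.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Toy example with three hypothetical engagement classes (0, 1, 2).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
report = macro_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in report.items()})
```

Macro-averaging weights every engagement class equally, so a model that only predicts the most common class scores poorly on precision, recall, and F1 even when its accuracy looks acceptable.
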
| Model | HitRate@5 (Movie) | HitRate@10 (Movie) | HitRate@15 (Movie) | HitRate@5 (TV Series) | HitRate@10 (TV Series) | HitRate@15 (TV Series) |
|---|---|---|---|---|---|---|
| RF | 0.05 | 0.08 | 0.15 | 0.03 | 0.06 | 0.10 |
| XGBOOST | 0.06 | 0.08 | 0.14 | 0.04 | 0.08 | 0.11 |
| LIGHTGBM | 0.08 | 0.12 | 0.15 | 0.04 | 0.10 | 0.13 |
| DEEPFM | 0.09 | 0.14 | 0.21 | 0.05 | 0.12 | 0.14 |
| GRU4Rec | 0.08 | 0.15 | 0.21 | 0.09 | 0.12 | 0.15 |
| DIN | 0.12 | 0.15 | 0.24 | 0.08 | 0.15 | 0.18 |
| DIEN | 0.12 | 0.14 | 0.23 | 0.08 | 0.16 | 0.20 |
| BST | 0.14 | 0.19 | 0.26 | 0.11 | 0.16 | 0.24 |
| DIFM | 0.13 | 0.16 | 0.22 | 0.07 | 0.14 | 0.19 |
| EDCN | 0.13 | 0.17 | 0.24 | 0.08 | 0.15 | 0.22 |
| LVVEP | 0.15 | 0.21 | 0.29 | 0.11 | 0.18 | 0.28 |
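
HitRate@K in the table above is commonly defined as the share of test cases whose held-out video appears among the top K recommended items. The sketch below follows that common definition; whether the evaluation behind the table uses exactly this protocol is an assumption, and `hit_rate_at_k` and the toy rankings are hypothetical.

```python
def hit_rate_at_k(ranked_lists, targets, k):
    """Fraction of cases where the held-out target is within the top-k ranked items.

    ranked_lists: one ranked list of candidate video ids per test case
    targets:      the held-out video id for each test case
    k:            cutoff rank
    """
    hits = sum(1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k])
    return hits / len(targets)


# Toy example: two users, three candidates each, ranked by predicted engagement.
ranked = [["a", "b", "c"], ["d", "e", "f"]]
held_out = ["b", "f"]
print(hit_rate_at_k(ranked, held_out, k=2))
print(hit_rate_at_k(ranked, held_out, k=3))
```

Because the cutoff only widens as K grows, HitRate@K is non-decreasing in K, which matches the left-to-right increase within each row of the table.
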
| Variant | Accuracy (Movie) | Precision (Movie) | Recall (Movie) | F1 (Movie) | Accuracy (TV Series) | Precision (TV Series) | Recall (TV Series) | F1 (TV Series) |
|---|---|---|---|---|---|---|---|---|
| (0) LVVEP | 0.41 | 0.37 | 0.40 | 0.38 | 0.40 | 0.36 | 0.39 | 0.37 |
| (1) Remove Video Content | 0.27 | 0.29 | 0.26 | 0.27 | 0.26 | 0.22 | 0.24 | 0.23 |
| (2) Replace Content Similarity with Label | 0.36 | 0.32 | 0.33 | 0.32 | 0.35 | 0.31 | 0.30 | 0.30 |
| (3) Remove Expectation | 0.31 | 0.28 | 0.30 | 0.29 | 0.32 | 0.33 | 0.29 | 0.31 |
| (4) Replace GRU with LSTM | 0.42 | 0.37 | 0.37 | 0.37 | 0.39 | 0.39 | 0.38 | 0.38 |
| (5) Remove Perceived Experience | 0.17 | 0.19 | 0.11 | 0.14 | 0.11 | 0.12 | 0.07 | 0.09 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chen, Y.; Zhang, J. Modeling Viewing Engagement in Long-Form Video Through the Lens of Expectation-Confirmation Theory. Appl. Sci. 2025, 15, 11252. https://doi.org/10.3390/app152011252

