VFGF: Virtual Frame-Augmented Guided Prediction Framework for Long-Term Egocentric Activity Forecasting
Abstract
1. Introduction
2. Related Work
2.1. Egocentric Action Recognition
2.2. Egocentric Video Prediction
2.3. Summary
3. Proposed Method
3.1. Visual Semantic Preprocessing
3.2. Recurrent Sequence Prediction
3.3. Feature Guidance Module
3.4. Training Objective Function
4. Experimental Results
4.1. Dataset
4.2. Compared Methods
4.3. Experimental Setting
4.4. Comparison with Other Methods
4.5. Ablation Experiments
4.5.1. Module Effectiveness
4.5.2. Selection of Virtual Frame Threshold
4.6. Parameters and FLOPs
4.7. Generalization Experiments
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
VFGF | Virtual Frame-Augmented Guided Prediction Framework
LTA | Long-Term Action Anticipation
FGM | Feature Guidance Module
GRU | Gated Recurrent Unit
VR | Virtual Reality
RGB | Red, Green, Blue
LSTM | Long Short-Term Memory
H3M | Hierarchical Multitask MLP Mixer Model
I-CVAE | Intention-Conditioned Variational Autoencoder
CDFSL | Cross-Domain Few-Shot Learning
TSN | Temporal Segment Network
VFA | Virtual Frame Augmentation
References
- Koppula, H.S.; Saxena, A. Anticipating Human Activities Using Object Affordances for Reactive Robotic Response. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 14–29. [Google Scholar] [CrossRef] [PubMed]
- Damen, D.; Doughty, H.; Farinella, G.M.; Fidler, S.; Furnari, A.; Kazakos, E.; Moltisanti, D.; Munro, J.; Perrett, T.; Price, W.; et al. Scaling Egocentric Vision: The EPIC-KITCHENS Dataset. arXiv 2018, arXiv:1804.02748. [Google Scholar] [CrossRef]
- Ji, Y.; Yang, Y.; Shen, F.; Shen, H.T.; Li, X. A Survey of Human Action Analysis in HRI Applications. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 2114–2128. [Google Scholar] [CrossRef]
- Xing, Y.; Golodetz, S.; Everitt, A.; Markham, A.; Trigoni, N. Multiscale Human Activity Recognition and Anticipation Network. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 451–465. [Google Scholar] [CrossRef]
- Ma, Y.; Zhu, X.; Zhang, S.; Yang, R.; Wang, W.; Manocha, D. TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents. Proc. Conf. AAAI Artif. Intell. 2019, 33, 6120–6127. [Google Scholar] [CrossRef]
- Rasouli, A.; Kotseruba, I.; Tsotsos, J.K. Pedestrian Action Anticipation Using Contextual Feature Fusion in Stacked RNNs. arXiv 2020, arXiv:2005.06582. [Google Scholar] [CrossRef]
- Grauman, K.; Westbury, A.; Byrne, E.; Cartillier, V.; Chavis, Z.; Furnari, A.; Girdhar, R.; Hamburger, J.; Jiang, H.; Kukreja, D.; et al. Ego4D: Around the World in 3,000 Hours of Egocentric Video. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 1–32. [Google Scholar] [CrossRef]
- Li, Y.; Liu, M.; Rehg, J.M. In the Eye of the Beholder: Gaze and Actions in First Person Video. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 6731–6747. [Google Scholar] [CrossRef] [PubMed]
- Stein, S.; McKenna, S.J. Combining Embedded Accelerometers with Computer Vision for Recognizing Food Preparation Activities. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, New York, NY, USA, 8–12 September 2013; pp. 729–738. [Google Scholar] [CrossRef]
- Kuehne, H.; Arslan, A.B.; Serre, T. The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 780–787. [Google Scholar] [CrossRef]
- Farha, Y.A.; Richard, A.; Gall, J. When Will You Do What?—Anticipating Temporal Occurrences of Activities. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5343–5352. [Google Scholar] [CrossRef]
- Furnari, A.; Farinella, G.M. Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 4021–4036. [Google Scholar] [CrossRef]
- Tai, T.-M.; Fiameni, G.; Lee, C.-K.; See, S.; Lanz, O. Unified Recurrence Modeling for Video Action Anticipation. In Proceedings of the 26th International Conference on Pattern Recognition (ICPR), Montréal, QC, Canada, 21–25 August 2022; pp. 3273–3279. [Google Scholar] [CrossRef]
- Qi, Z.; Wang, S.; Su, C.; Su, L.; Huang, Q.; Tian, Q. Self-Regulated Learning for Egocentric Video Activity Anticipation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 6715–6730. [Google Scholar] [CrossRef]
- Moniruzzaman, M.; Yin, Z.; He, Z.; Leu, M.C.; Qin, R. Jointly-Learnt Networks for Future Action Anticipation via Self-Knowledge Distillation and Cycle Consistency. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 3243–3256. [Google Scholar] [CrossRef]
- Liu, T.; Lam, K.-M. A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-Shot Representation Forecasting. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 13894–13903. [Google Scholar] [CrossRef]
- Mascaro, E.V.; Ahn, H.; Lee, D. Intention-Conditioned Long-Term Human Egocentric Action Anticipation. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–7 January 2023; pp. 6037–6046. [Google Scholar] [CrossRef]
- Ma, Z.; Zhang, F.; Nan, Z.; Ge, Y. Intention Action Anticipation Model with Guide-Feedback Loop Mechanism. Knowl. Based Syst. 2024, 292, 111626. [Google Scholar] [CrossRef]
- Zhang, C.; Fu, C.; Wang, S.; Agarwal, N.; Lee, K.; Choi, C.; Sun, C. Object-Centric Video Representation for Long-Term Action Anticipation. In Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 4–8 January 2024; pp. 6737–6747. [Google Scholar] [CrossRef]
- Kim, S.; Huang, D.; Xian, Y.; Hilliges, O.; Van Gool, L.; Wang, X. PALM: Predicting Actions through Language Models. In Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2025; pp. 140–158. [Google Scholar] [CrossRef]
- Cao, C.; Sun, Z.; Lv, Q.; Min, L.; Zhang, Y. VS-TransGRU: A Novel Transformer-GRU-Based Framework Enhanced by Visual-Semantic Fusion for Egocentric Action Anticipation. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 11605–11618. [Google Scholar] [CrossRef]
- Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar] [CrossRef]
- Spriggs, E.H.; De La Torre, F.; Hebert, M. Temporal Segmentation and Activity Classification from First-Person Sensing. In Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Miami Beach, FL, USA, 20–25 June 2009; pp. 17–24. [Google Scholar] [CrossRef]
- Weng, J.; Jiang, X.; Zheng, W.-L.; Yuan, J. Early Action Recognition with Category Exclusion Using Policy-Based Reinforcement Learning. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 4626–4638. [Google Scholar] [CrossRef]
- Liu, Z.; Ning, J.; Cao, Y.; Wei, Y.; Zhang, Z.; Lin, S.; Hu, H. Video Swin Transformer. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 3192–3201. [Google Scholar] [CrossRef]
- Ou, Y.; Mi, L.; Chen, Z. Object-Relation Reasoning Graph for Action Recognition. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 20101–20110. [Google Scholar] [CrossRef]
- Wang, X.; Hu, J.-F.; Lai, J.-H.; Zhang, J.; Zheng, W.-S. Progressive Teacher-Student Learning for Early Action Prediction. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 3551–3560. [Google Scholar] [CrossRef]
- Liu, T.; Zhao, R.; Jia, W.; Lam, K.-M.; Kong, J. Holistic-Guided Disentangled Learning with Cross-Video Semantics Mining for Concurrent First-Person and Third-Person Activity Recognition. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 5211–5225. [Google Scholar] [CrossRef]
- Li, H.; Zheng, W.-S.; Zhang, J.; Hu, H.; Lu, J.; Lai, J.-H. Egocentric Action Recognition by Automatic Relation Modeling. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 489–507. [Google Scholar] [CrossRef]
- Xaviar, S.; Yang, X.; Ardakanian, O. Centaur: Robust Multimodal Fusion for Human Activity Recognition. IEEE Sens. J. 2024, 24, 18578–18591. [Google Scholar] [CrossRef]
- Hatano, M.; Hachiuma, R.; Fujii, R.; Saito, H. Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition. In Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2025; pp. 182–199. [Google Scholar] [CrossRef]
- Wang, H.; Yang, J.; Yu, B.; Zhan, Y.; Tao, D.; Ling, H. Distilling Interaction Knowledge for Semi-Supervised Egocentric Action Recognition. Pattern Recognit. 2025, 157, 110927. [Google Scholar] [CrossRef]
- Ke, Q.; Fritz, M.; Schiele, B. Time-Conditioned Action Anticipation in One Shot. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 9917–9926. [Google Scholar] [CrossRef]
- Lee, S.; Kim, H.G.; Hwi Choi, D.; Kim, H.-I.; Ro, Y.M. Video Prediction Recalling Long-Term Motion Context via Memory Alignment Learning. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Event, 19–25 June 2021; pp. 3053–3062. [Google Scholar] [CrossRef]
- Nawhal, M.; Jyothi, A.A.; Mori, G. Rethinking Learning Approaches for Long-Term Action Anticipation. In Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2022; pp. 558–576. [Google Scholar] [CrossRef]
- Gong, D.; Lee, J.; Kim, M.; Ha, S.J.; Cho, M. Future Transformer for Long-Term Action Anticipation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 3042–3051. [Google Scholar] [CrossRef]
- Wang, L.; Xiong, Y.; Wang, Z.; Qiao, Y.; Lin, D.; Tang, X.; Van Gool, L. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. In Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; pp. 20–36. [Google Scholar] [CrossRef]
- Carreira, J.; Zisserman, A. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4724–4733. [Google Scholar] [CrossRef]
- Jiang, H.; Sun, D.; Jampani, V.; Yang, M.-H.; Learned-Miller, E.; Kautz, J. Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 9000–9008. [Google Scholar] [CrossRef]
- Vondrick, C.; Pirsiavash, H.; Torralba, A. Anticipating Visual Representations from Unlabeled Video. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 98–106. [Google Scholar] [CrossRef]
- Furnari, A.; Battiato, S.; Farinella, G.M. Leveraging Uncertainty to Rethink Loss Functions and Evaluation Measures for Egocentric Action Anticipation. In Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2019; pp. 389–405. [Google Scholar] [CrossRef]
- Berrada, L.; Zisserman, A.; Kumar, M.P. Smooth Loss Functions for Deep Top-k Classification. arXiv 2018, arXiv:1802.07595. [Google Scholar] [CrossRef]
- Sener, F.; Singhania, D.; Yao, A. Temporal Aggregate Representations for Long-Range Video Understanding. In Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; pp. 154–171. [Google Scholar] [CrossRef]
- Gao, J.; Yang, Z.; Nevatia, R. RED: Reinforced Encoder-Decoder Networks for Action Anticipation. In Proceedings of the 2017 British Machine Vision Conference (BMVC), London, UK, 4–7 September 2017; pp. 1–13. [Google Scholar] [CrossRef]
- De Geest, R.; Tuytelaars, T. Modeling Temporal Structure with LSTM for Online Action Detection. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1549–1557. [Google Scholar] [CrossRef]
- Ma, S.; Sigal, L.; Sclaroff, S. Learning Activity Progression in LSTMs for Activity Detection and Early Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1942–1950. [Google Scholar] [CrossRef]
- Jain, A.; Singh, A.; Koppula, H.S.; Soh, S.; Saxena, A. Recurrent Neural Networks for Driver Activity Anticipation via Sensory-Fusion Architecture. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 3118–3125. [Google Scholar] [CrossRef]
- Wu, Y.; Zhu, L.; Wang, X.; Yang, Y.; Wu, F. Learning to Anticipate Egocentric Actions by Imagination. IEEE Trans. Image Process. 2021, 30, 1143–1152. [Google Scholar] [CrossRef] [PubMed]
- Osman, N.; Camporese, G.; Coscia, P.; Ballan, L. SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Virtual Event, 11–17 October 2021; pp. 3430–3438. [Google Scholar] [CrossRef]
- Girdhar, R.; Grauman, K. Anticipative Video Transformer. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Virtual Event, 11–17 October 2021; pp. 13485–13495. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Pramanick, S.; Song, Y.; Nag, S.; Lin, K.Q.; Shah, H.; Shou, M.Z.; Chellappa, R.; Zhang, P. EgoVLPv2: Egocentric Video-Language Pre-Training with Fusion in the Backbone. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 5262–5274. [Google Scholar] [CrossRef]
- Akbari, H.; Yuan, L.; Qian, R.; Chuang, W.-H.; Chang, S.-F.; Cui, Y.; Gong, B. VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Virtual, 6–14 December 2021; pp. 1–16. [Google Scholar]
- Li, G.; Chen, Y.; Wu, Y.; Zhao, K.; Pollefeys, M.; Tang, S. EgoM2P: Egocentric Multimodal Multitask Pretraining. arXiv 2025, arXiv:2506.07886. [Google Scholar]
- Wu, Y.; Zhang, S.; Li, P. Multi-Modal Emotion Recognition in Conversation Based on Prompt Learning with Text-Audio Fusion Features. Sci. Rep. 2025, 15, 8855. [Google Scholar] [CrossRef]
- Assefa, M.; Jiang, W.; Zhan, J.; Gedamu, K.; Yilma, G.; Ayalew, M.; Adhikari, D. Audio-Visual Contrastive and Consistency Learning for Semi-Supervised Action Recognition. IEEE Trans. Multimed. 2024, 26, 3491–3504. [Google Scholar] [CrossRef]
Top-5 activity accuracy (%) at different anticipation times, and verb/noun/activity Top-5 accuracy and mean Top-5 recall (%) at 1 s ("/" denotes results not reported).

Model | 2 s | 1.75 s | 1.5 s | 1.25 s | 1 s | 0.75 s | 0.5 s | 0.25 s | Verb Acc. | Noun Acc. | Act. Acc. | Verb Rec. | Noun Rec. | Act. Rec.
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
DMR [40] | / | / | / | / | 16.86 | / | / | / | 73.66 | 29.99 | 16.86 | 24.50 | 20.89 | 03.23 |
ATSN [2] | / | / | / | / | 16.29 | / | / | / | 77.30 | 39.93 | 16.29 | 33.08 | 32.77 | 07.60 |
MCE [41] | / | / | / | / | 26.11 | / | / | / | 73.35 | 38.86 | 26.11 | 34.62 | 32.59 | 06.50 |
SVM [42] | / | / | / | / | 25.42 | / | / | / | 72.70 | 38.41 | 25.42 | 41.90 | 34.69 | 05.32 |
ActionBanks [43] | / | / | / | / | 28.60 | / | / | / | / | / | 28.60 | / | / | /
ED [44] | 21.53 | 22.22 | 23.20 | 24.78 | 25.75 | 26.69 | 27.66 | 29.74 | 75.46 | 42.96 | 25.75 | 41.77 | 42.59 | 10.97 |
FN [45] | 23.47 | 24.07 | 24.68 | 25.66 | 26.27 | 26.87 | 27.88 | 28.96 | 74.84 | 40.87 | 26.27 | 35.30 | 37.77 | 06.64 |
RL [46] | 25.95 | 26.49 | 27.15 | 28.48 | 29.61 | 30.81 | 31.86 | 32.84 | 76.79 | 44.53 | 29.61 | 40.80 | 40.87 | 10.64 |
EL [47] | 24.68 | 25.68 | 26.41 | 27.35 | 28.56 | 30.27 | 31.50 | 33.55 | 75.66 | 43.72 | 28.56 | 38.70 | 40.32 | 08.62 |
RU-LSTM [12] | 29.44 | 30.71 | 32.33 | 33.41 | 35.32 | 36.34 | 37.37 | 38.98 | 79.55 | 51.79 | 35.32 | 43.72 | 49.90 | 15.10 |
LAI [48] | / | / | 32.50 | 33.60 | 35.60 | 36.70 | 38.50 | 39.40 | 80.00 | 52.80 | 35.60 | / | / | / |
SRL [14] | 30.15 | 31.28 | 32.36 | 34.05 | 35.52 | 36.77 | 38.60 | 40.49 | / | / | 35.52 | / | / | / |
HRO [16] | 31.30 | 32.67 | 34.26 | 35.87 | 37.42 | 38.36 | 39.89 | 42.36 | 81.53 | 54.51 | 37.42 | 45.16 | 51.78 | 17.50 |
SF-RULSTM [49] | 30.58 | / | 32.83 | / | 36.09 | / | 37.87 | / | / | / | 36.09 | / | / | / |
AVT [50] | / | / | / | / | 37.60 | / | / | / | / | / | 37.60 | / | / | / |
IAAM [18] | / | / | / | / | 35.45 | / | / | / | 80.26 | 53.10 | 35.45 | 44.84 | 53.43 | 17.14 |
VS-TransGRU:ES [21] | 32.13 | 33.57 | 35.10 | 36.69 | 38.78 | 39.12 | 39.87 | 40.95 | 79.78 | 55.69 | 38.78 | 46.23 | 52.91 | 18.29 |
VFGF (Ours) | 32.54 | 33.72 | 35.27 | 36.84 | 38.53 | 40.58 | 41.75 | 44.11 | 80.75 | 56.36 | 38.53 | 47.82 | 57.75 | 17.91 |
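The table reports Top-5 accuracy at several anticipation horizons and, at 1 s, the class-averaged ("mean") Top-5 recall for verbs, nouns, and activities. As a reader's reference, the following is a minimal sketch of how these two metrics are commonly computed from a matrix of class scores; the function names and toy data are illustrative and are not taken from the paper's code.

```python
import numpy as np

def topk_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: (N, C) array of class scores; labels: (N,) array of ground-truth indices.
    """
    topk = np.argsort(scores, axis=1)[:, -k:]          # indices of the k largest scores per sample
    hits = (topk == labels[:, None]).any(axis=1)       # True where the true class is in the top k
    return float(hits.mean())

def mean_topk_recall(scores, labels, k=5):
    """Class-averaged Top-k recall: Top-k recall is computed per class and then averaged,
    so rare classes weigh as much as frequent ones."""
    topk = np.argsort(scores, axis=1)[:, -k:]
    hits = (topk == labels[:, None]).any(axis=1)
    classes = np.unique(labels)
    per_class = [hits[labels == c].mean() for c in classes]
    return float(np.mean(per_class))

# Toy usage: 4 samples, 6 classes.
rng = np.random.default_rng(0)
scores = rng.random((4, 6))
labels = np.array([2, 0, 5, 1])
print(topk_accuracy(scores, labels, k=5), mean_topk_recall(scores, labels, k=5))
```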
Ablation results: Top-5 and Top-1 accuracy (%) at different anticipation times ("√" = component enabled, "-" = not used).

Exp. | VFA | FGM | + | Top-5 @ 2 s | Top-5 @ 1.75 s | Top-5 @ 1.5 s | Top-5 @ 1 s | Top-5 @ 0.5 s | Top-5 @ 0.25 s | Top-1 @ 2 s | Top-1 @ 1.75 s | Top-1 @ 1.5 s | Top-1 @ 1 s | Top-1 @ 0.5 s | Top-1 @ 0.25 s
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
baseline | - | - | - | 24.32 | 25.06 | 26.29 | 29.18 | 31.42 | 33.75 | 10.52 | 11.15 | 11.95 | 13.47 | 16.19 | 18.43 |
+VFA | √ | - | - | 24.76 | 25.51 | 27.04 | 30.01 | 32.22 | 34.43 | 10.58 | 11.29 | 12.13 | 14.18 | 17.56 | 19.16 |
+FGM | - | √ | - | 26.85 | 27.36 | 28.29 | 31.54 | 34.60 | 37.48 | 11.71 | 12.13 | 13.37 | 14.58 | 18.62 | 20.92 |
+VFA&FGM | √ | √ | - | 27.66 | 28.60 | 29.79 | 32.57 | 36.02 | 38.91 | 12.57 | 13.09 | 14.20 | 16.25 | 18.91 | 21.61 |
+All components | √ | √ | √ | 28.52 | 29.38 | 30.06 | 33.64 | 36.77 | 39.26 | 13.09 | 13.37 | 14.32 | 16.29 | 19.20 | 22.72 |
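To read the ablation trend at a glance, the short sketch below recomputes the absolute Top-5 gains of the full model ("+All components") over the baseline at each horizon directly from the tabulated values; it is plain arithmetic on the reported figures, not part of the authors' pipeline.

```python
# Absolute Top-5 accuracy gains of "+All components" over the baseline, per horizon.
horizons = [2, 1.75, 1.5, 1, 0.5, 0.25]                   # anticipation time (s)
baseline = [24.32, 25.06, 26.29, 29.18, 31.42, 33.75]     # Top-5 accuracy (%), baseline
full     = [28.52, 29.38, 30.06, 33.64, 36.77, 39.26]     # Top-5 accuracy (%), all components

for t, b, f in zip(horizons, baseline, full):
    print(f"{t:>5} s: +{f - b:.2f} points")               # e.g. 0.25 s: +5.51 points
```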
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).