Non-Local Temporal Difference Network for Temporal Action Detection
Abstract
1. Introduction
2. Related Work
3. Approach
3.1. Chunk Convolution
3.2. Multiple Temporal Coordination
3.3. Temporal Difference
4. Training and Inference
4.1. Training
4.2. Inference
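Inference in temporal action detection typically ends by suppressing overlapping proposals; the reference list here includes Soft-NMS (Bodla et al.). A minimal 1-D Gaussian Soft-NMS sketch over temporal segments — the segment format and the `sigma` and `score_thresh` values are illustrative assumptions, not taken from the paper:

```python
import math

def soft_nms_1d(segments, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS over temporal segments given as (start, end, score)."""
    segs = sorted(segments, key=lambda s: s[2], reverse=True)
    kept = []
    while segs:
        best = segs.pop(0)          # highest-scoring remaining segment
        kept.append(best)
        decayed = []
        for s, e, sc in segs:
            # temporal IoU between the kept segment and the candidate
            inter = max(0.0, min(best[1], e) - max(best[0], s))
            union = (best[1] - best[0]) + (e - s) - inter
            iou = inter / union if union > 0 else 0.0
            # Gaussian decay: heavy overlap -> strong score suppression
            sc *= math.exp(-(iou * iou) / sigma)
            if sc >= score_thresh:
                decayed.append((s, e, sc))
        segs = sorted(decayed, key=lambda x: x[2], reverse=True)
    return kept
```

Unlike hard NMS, overlapping proposals are down-weighted rather than discarded, which tends to help recall at the higher tIoU thresholds reported in the tables below.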
5. Experiments
5.1. Datasets
5.2. Evaluation Metrics
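Temporal action detection is conventionally scored with mean Average Precision (mAP) at several temporal IoU (tIoU) thresholds, as in the result tables below: a prediction counts as correct at threshold t if its tIoU with a ground-truth instance is at least t. A minimal sketch of the tIoU computation (the function name and segment format are illustrative):

```python
def temporal_iou(pred, gt):
    """tIoU between two segments given as (start, end) in seconds."""
    inter_start = max(pred[0], gt[0])
    inter_end = min(pred[1], gt[1])
    intersection = max(0.0, inter_end - inter_start)
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0

# 4 s of overlap over an 8 s union:
print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # 0.5
```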
5.3. Feature Extraction and Implementation Details
6. Results
6.1. Comparison with State-of-the-Art Methods
THUMOS'14 (mAP, %, at tIoU thresholds 0.3–0.7):

Method | Year | Backbone | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | AVG |
---|---|---|---|---|---|---|---|---|
S-CNN [20] | CVPR-2016 | DTF | 36.3 | 28.7 | 19.0 | 10.3 | 5.3 | 19.9 |
TURN [21] | ICCV-2017 | Flow | 44.1 | 34.9 | 25.6 | - | - | - |
R-C3D [24] | ICCV-2017 | C3D | 44.8 | 35.6 | 28.9 | - | - | - |
BSN [30] | ECCV-2018 | TSN | 53.5 | 45.0 | 36.9 | 28.4 | 20.0 | 36.8 |
TAL-Net [11] | CVPR-2018 | I3D | 53.2 | 48.5 | 42.8 | 33.8 | 20.8 | 39.8 |
GTAN [25] | CVPR-2019 | P3D | 57.8 | 47.2 | 38.8 | - | - | - |
P-GCN [34] | ICCV-2019 | TSN | 60.1 | 54.3 | 45.5 | 33.5 | 19.8 | 42.6 |
BMN [31] | ICCV-2019 | TSN | 56.0 | 47.4 | 38.8 | 29.7 | 20.5 | 36.8 |
A2Net [12] | TIP-2020 | I3D | 58.6 | 54.1 | 45.5 | 32.5 | 17.2 | 41.6 |
G-TAD [35] | CVPR-2020 | TSN | 54.5 | 47.6 | 40.2 | 30.8 | 23.4 | 39.3 |
BU-MR [33] | ECCV-2020 | TSN | 53.9 | 50.7 | 45.4 | 38.0 | 28.5 | 43.3 |
VSGN [45] | ICCV-2021 | TSN | 66.7 | 60.4 | 52.4 | 41.0 | 30.4 | 50.2 |
CSA [46] | ICCV-2021 | TSN | 64.4 | 58.0 | 49.2 | 38.2 | 27.8 | 47.5 |
AFSD [27] | CVPR-2021 | I3D | 67.3 | 62.4 | 55.5 | 43.7 | 31.1 | 52.0 |
MUSES [47] | CVPR-2021 | I3D | 68.3 | 63.8 | 54.3 | 41.8 | 26.2 | 50.9 |
RefactorNet [16] | CVPR-2022 | I3D | 70.7 | 65.4 | 58.6 | 47.0 | 32.1 | 54.8 |
ActionFormer [4] | 2022 | I3D | 75.5 | 72.5 | 65.6 | 56.6 | 42.7 | 62.6 |
RCL [1] | CVPR-2022 | TSN | 70.1 | 62.3 | 52.9 | 42.7 | 30.7 | 51.7 |
AES [43] | CVPR-2022 | SF R50 | 69.4 | 64.3 | 56.0 | 46.4 | 34.9 | 54.2 |
BCNet [44] | AAAI-2022 | I3D | 71.5 | 67.0 | 60.0 | 48.9 | 33.0 | 56.1 |
NTD (Ours) | - | I3D | 82.7 | 78.7 | 71.6 | 58.3 | 42.8 | 66.8 |
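On THUMOS'14, the AVG column is the arithmetic mean of the five per-threshold mAP values; reproducing it for the NTD (Ours) row:

```python
# Mean of the five per-threshold mAP values reported for NTD (Ours):
ntd = [82.7, 78.7, 71.6, 58.3, 42.8]  # mAP at tIoU 0.3, 0.4, 0.5, 0.6, 0.7
print(round(sum(ntd) / len(ntd), 1))  # 66.8, matching the AVG column
```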
ActivityNet-1.3 (mAP, %):

Method | Year | 0.5 | 0.75 | 0.95 | AVG |
---|---|---|---|---|---|
TAL-Net [11] | CVPR-2018 | 38.2 | 18.3 | 1.3 | 20.2 |
BSN [30] | ECCV-2018 | 46.5 | 30.0 | 8.0 | 30.0 |
GTAN [25] | CVPR-2019 | 52.6 | 34.1 | 8.9 | 34.3 |
BMN [31] | ICCV-2019 | 50.1 | 34.8 | 8.3 | 33.9 |
BC-GNN [48] | ECCV-2020 | 50.6 | 34.8 | 9.4 | 34.3 |
G-TAD [35] | CVPR-2020 | 50.4 | 34.6 | 9.0 | 34.1 |
TCANet [49] | CVPR-2021 | 52.3 | 36.7 | 6.9 | 35.5 |
BSN++ [32] | AAAI-2021 | 51.3 | 35.7 | 8.3 | 34.9 |
MUSES [47] | CVPR-2021 | 50.0 | 35.0 | 6.6 | 34.0 |
ActionFormer [4] | 2022 | 53.5 | 36.2 | 8.2 | 35.6 |
BCNet [44] | AAAI-2022 | 53.2 | 36.2 | 10.6 | 35.5 |
AES [43] | CVPR-2022 | 50.1 | 35.8 | 10.5 | 35.1 |
RCL [1] | CVPR-2022 | 51.7 | 35.3 | 8.0 | 34.4 |
NTD (Ours) | - | 54.4 | 37.4 | 8.2 | 36.2 |
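Note that on ActivityNet-1.3 the AVG column is not the mean of the three printed thresholds; under the standard ActivityNet protocol it averages mAP over ten tIoU thresholds from 0.5 to 0.95 in steps of 0.05. A quick check with the NTD row's values:

```python
# The mean of the three printed columns does not reproduce the reported AVG:
shown = [54.4, 37.4, 8.2]                 # NTD mAP at tIoU 0.5 / 0.75 / 0.95
print(round(sum(shown) / len(shown), 1))  # 33.3 (reported AVG is 36.2)

# The ActivityNet protocol averages over these ten thresholds instead:
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
print(thresholds)
```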
6.2. Ablation Study of MTC Module
6.3. Ablation Study of TD Module
6.4. Ablation Study of CC Module
6.5. Ablation Study of Combination Strategies
6.6. Qualitative Results
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
1. Wang, Q.; Zhang, Y.; Zheng, Y.; Pan, P. RCL: Recurrent Continuous Localization for Temporal Action Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 13566–13575.
2. Dai, R.; Das, S.; Minciullo, L.; Garattoni, L.; Francesca, G.; Bremond, F. PDAN: Pyramid Dilated Attention Network for Action Detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Nashville, TN, USA, 19–25 June 2021; pp. 2970–2979.
3. Dai, X.; Singh, B.; Ng, J.Y.H.; Davis, L. TAN: Temporal Aggregation Network for Dense Multi-Label Action Recognition. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Honolulu, HI, USA, 7–11 January 2019; pp. 151–160.
4. Zhang, C.; Wu, J.; Li, Y. ActionFormer: Localizing Moments of Actions with Transformers. arXiv 2022, arXiv:2202.07925.
5. Tan, J.; Tang, J.; Wang, L.; Wu, G. Relaxed Transformer Decoders for Direct Action Proposal Generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 13526–13535.
6. Dai, R.; Das, S.; Kahatapitiya, K.; Ryoo, M.S.; Bremond, F. MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 20041–20051.
7. He, L.; Todorovic, S. DESTR: Object Detection with Split Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 9377–9386.
8. Li, Y.; Wu, C.Y.; Fan, H.; Mangalam, K.; Xiong, B.; Malik, J.; Feichtenhofer, C. MViTv2: Improved Multiscale Vision Transformers for Classification and Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 4804–4814.
9. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
10. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
11. Chao, Y.W.; Vijayanarasimhan, S.; Seybold, B.; Ross, D.A.; Deng, J.; Sukthankar, R. Rethinking the Faster R-CNN Architecture for Temporal Action Localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1130–1139.
12. Yang, L.; Peng, H.; Zhang, D.; Fu, J.; Han, J. Revisiting Anchor Mechanisms for Temporal Action Localization. IEEE Trans. Image Process. 2020, 29, 8535–8548.
13. Chen, G.; Zheng, Y.D.; Wang, L.; Lu, T. DCAN: Improving Temporal Action Detection via Dual Context Aggregation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 22 February–1 March 2022; Volume 36, pp. 248–257.
14. Gao, S.H.; Cheng, M.M.; Zhao, K.; Zhang, X.Y.; Yang, M.H.; Torr, P. Res2Net: A New Multi-Scale Backbone Architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 652–662.
15. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
16. Xia, K.; Wang, L.; Zhou, S.; Zheng, N.; Tang, W. Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 13884–13893.
17. Simonyan, K.; Zisserman, A. Two-Stream Convolutional Networks for Action Recognition in Videos. Adv. Neural Inf. Process. Syst. 2014, 27.
18. Caba Heilbron, F.; Escorcia, V.; Ghanem, B.; Carlos Niebles, J. ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7 June 2015; pp. 961–970.
19. Idrees, H.; Zamir, A.R.; Jiang, Y.G.; Gorban, A.; Laptev, I.; Sukthankar, R.; Shah, M. The THUMOS Challenge on Action Recognition for Videos "in the Wild". Comput. Vis. Image Underst. 2017, 155, 1–23.
20. Shou, Z.; Wang, D.; Chang, S.F. Temporal Action Localization in Untrimmed Videos via Multi-Stage CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1049–1058.
21. Gao, J.; Yang, Z.; Chen, K.; Sun, C.; Nevatia, R. TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3628–3636.
22. Gao, J.; Chen, K.; Nevatia, R. CTAP: Complementary Temporal Action Proposal Generation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 68–83.
23. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Adv. Neural Inf. Process. Syst. 2015, 28.
24. Xu, H.; Das, A.; Saenko, K. R-C3D: Region Convolutional 3D Network for Temporal Activity Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5783–5792.
25. Long, F.; Yao, T.; Qiu, Z.; Tian, X.; Luo, J.; Mei, T. Gaussian Temporal Awareness Networks for Action Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 344–353.
26. Liu, Q.; Wang, Z. Progressive Boundary Refinement Network for Temporal Action Detection. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11612–11619.
27. Lin, C.; Xu, C.; Luo, D.; Wang, Y.; Tai, Y.; Wang, C.; Li, J.; Huang, F.; Fu, Y. Learning Salient Boundary Feature for Anchor-Free Temporal Action Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 3320–3329.
28. Liu, X.; Wang, Q.; Hu, Y.; Tang, X.; Zhang, S.; Bai, S.; Bai, X. End-to-End Temporal Action Detection with Transformer. IEEE Trans. Image Process. 2022, 31, 5427–5441.
29. Zhao, Y.; Xiong, Y.; Wang, L.; Wu, Z.; Tang, X.; Lin, D. Temporal Action Detection with Structured Segment Networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2914–2923.
30. Lin, T.; Zhao, X.; Su, H.; Wang, C.; Yang, M. BSN: Boundary Sensitive Network for Temporal Action Proposal Generation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
31. Lin, T.; Liu, X.; Li, X.; Ding, E.; Wen, S. BMN: Boundary-Matching Network for Temporal Action Proposal Generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 3889–3898.
32. Su, H.; Gan, W.; Wu, W.; Qiao, Y.; Yan, J. BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; pp. 2602–2610.
33. Zhao, P.; Xie, L.; Ju, C.; Zhang, Y.; Wang, Y.; Tian, Q. Bottom-Up Temporal Action Localization with Mutual Regularization. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 539–555.
34. Zeng, R.; Huang, W.; Tan, M.; Rong, Y.; Zhao, P.; Huang, J.; Gan, C. Graph Convolutional Networks for Temporal Action Localization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 7094–7103.
35. Xu, M.; Zhao, C.; Rojas, D.S.; Thabet, A.; Ghanem, B. G-TAD: Sub-Graph Localization for Temporal Action Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10156–10165.
36. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
37. Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS–Improving Object Detection with One Line of Code. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5561–5569.
38. Carreira, J.; Zisserman, A. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6299–6308.
39. Kay, W.; Carreira, J.; Simonyan, K.; Zhang, B.; Hillier, C.; Vijayanarasimhan, S.; Viola, F.; Green, T.; Back, T.; Natsev, P.; et al. The Kinetics Human Action Video Dataset. arXiv 2017, arXiv:1705.06950.
40. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
41. Tran, D.; Wang, H.; Torresani, L.; Ray, J.; LeCun, Y.; Paluri, M. A Closer Look at Spatiotemporal Convolutions for Action Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6450–6459.
42. Alwassel, H.; Giancola, S.; Ghanem, B. TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 3173–3183.
43. Liu, X.; Bai, S.; Bai, X. An Empirical Study of End-to-End Temporal Action Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 20010–20019.
44. Yang, H.; Wu, W.; Wang, L.; Jin, S.; Xia, B.; Yao, H.; Huang, H. Temporal Action Proposal Generation with Background Constraint. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 22 February–1 March 2022; Volume 36, pp. 3054–3062.
45. Zhao, C.; Thabet, A.K.; Ghanem, B. Video Self-Stitching Graph Network for Temporal Action Localization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 13658–13667.
46. Sridhar, D.; Quader, N.; Muralidharan, S.; Li, Y.; Dai, P.; Lu, J. Class Semantics-Based Attention for Action Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 13739–13748.
47. Liu, X.; Hu, Y.; Bai, S.; Ding, F.; Bai, X.; Torr, P.H. Multi-Shot Temporal Event Localization: A Benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 12596–12606.
48. Bai, Y.; Wang, Y.; Tong, Y.; Yang, Y.; Liu, Q.; Liu, J. Boundary Content Graph Neural Network for Temporal Action Proposal Generation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 121–137.
49. Qing, Z.; Su, H.; Gan, W.; Wang, D.; Wu, W.; Wang, X.; Qiao, Y.; Yan, J.; Gao, C.; Sang, N. Temporal Context Aggregation Network for Temporal Action Proposal Refinement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 485–494.
Number | Strategy | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | AVG |
---|---|---|---|---|---|---|---|
6 | MAX | 81.3 | 77.3 | 69.9 | 59.4 | 43.2 | 66.2 |
5 | MAX | 81.2 | 77.3 | 70.0 | 58.2 | 44.4 | 66.2 |
4 | MAX | 82.7 | 78.7 | 71.6 | 58.3 | 42.8 | 66.8 |
3 | MAX | 81.7 | 77.8 | 71.2 | 60.0 | 44.9 | 67.1 |
2 | MAX | 81.6 | 78.0 | 70.5 | 57.7 | 43.2 | 66.2 |
1 | MAX | 81.0 | 77.0 | 69.4 | 59.0 | 43.8 | 66.1 |
4 | AVG | 81.9 | 77.9 | 69.8 | 57.2 | 42.7 | 65.9 |
4 | Conv1D | 81.8 | 77.3 | 70.1 | 58.9 | 44.4 | 66.5 |
K | S | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | AVG |
---|---|---|---|---|---|---|---|
3 | 3 | 82.1 | 77.5 | 71.2 | 58.0 | 43.3 | 66.4 |
5 | 3 | 82.1 | 78.4 | 71.3 | 59.6 | 43.4 | 66.9 |
7 | 3 | 82.7 | 78.7 | 71.6 | 58.3 | 42.8 | 66.8 |
9 | 3 | 81.1 | 77.3 | 70.5 | 57.7 | 44.4 | 66.2 |
11 | 3 | 81.6 | 77.2 | 70.6 | 58.3 | 43.8 | 66.3 |
7 | 5 | 81.9 | 77.9 | 70.2 | 58.1 | 43.4 | 66.3 |
7 | 7 | 81.2 | 77.3 | 70.5 | 58.4 | 43.7 | 66.2 |
SC | DC | D = 1 | D = 3 | D = 6 | D = 9 | D = 13 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | AVG |
---|---|---|---|---|---|---|---|---|---|---|---|---|
√ | √ | 81.7 | 77.5 | 70.9 | 58.5 | 43.3 | 66.4 | |||||
√ | √ | √ | 82.1 | 78.0 | 70.3 | 57.6 | 43.4 | 66.3 | ||||
√ | √ | √ | √ | 82.7 | 78.7 | 71.6 | 58.3 | 42.8 | 66.8 | |||
√ | √ | √ | √ | √ | 81.7 | 77.2 | 69.7 | 57.9 | 42.6 | 65.8 | ||
√ | √ | √ | √ | √ | √ | 81.4 | 77.8 | 70.8 | 58.7 | 43.2 | 66.4 | |
√ | √ | √ | √ | 82.1 | 77.9 | 70.5 | 58.5 | 43.9 | 66.6 |
Strategy | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | AVG |
---|---|---|---|---|---|---|
CC → MTC → TD | 82.7 | 78.7 | 71.6 | 58.3 | 42.8 | 66.8 |
CC → TD → MTC | 81.9 | 77.7 | 70.3 | 57.6 | 42.5 | 66.0 |
TD → CC → MTC | 81.8 | 77.8 | 70.4 | 58.3 | 42.5 | 66.2 |
TD → MTC → CC | 81.7 | 77.6 | 70.0 | 56.8 | 43.2 | 65.8 |
MTC → TD → CC | 81.9 | 78.1 | 70.7 | 57.7 | 43.5 | 66.4 |
MTC → CC → TD | 82.2 | 77.6 | 70.4 | 58.7 | 43.7 | 66.5 |
Stack (avg) | 82.2 | 77.9 | 70.4 | 57.8 | 43.5 | 66.3 |
Stack (max) | 82.0 | 77.7 | 70.1 | 58.2 | 43.7 | 66.3 |
Concat | 81.1 | 76.9 | 69.8 | 58.3 | 44.0 | 66.0 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
He, Y.; Han, X.; Zhong, Y.; Wang, L. Non-Local Temporal Difference Network for Temporal Action Detection. Sensors 2022, 22, 8396. https://doi.org/10.3390/s22218396