Supervised Learning Applications of Action Recognition and Action Prediction

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 15 November 2026

Special Issue Editor


Guest Editor
School of Information and Engineering, Zhengzhou University, Zhengzhou 450001, China
Interests: multimedia systems design; digital image processing and pattern recognition

Special Issue Information

Dear Colleagues,

Human action recognition and prediction are fundamental research topics in computer vision and intelligent systems, and their applications in video surveillance, healthcare monitoring, human–computer interaction, robotics, sports analytics, and autonomous driving are becoming increasingly relevant. Over the past decade, deep learning methods, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), graph convolutional networks (GCNs), and Transformers, have significantly advanced the state of the art, enabling more accurate and robust recognition and prediction of human actions from video, skeleton, and multimodal data. More recently, the rapid development of large language models (LLMs) and multimodal large language models (MLLMs) has opened new directions in video understanding, offering powerful capabilities in visual reasoning, open-vocabulary action recognition, video captioning, and instruction following for action analysis.

This Special Issue aims to collect cutting-edge research on both supervised learning approaches and emerging LLM/MLLM-driven methodologies for action recognition and prediction. Particular emphasis is placed on how large-scale foundation models are transforming video understanding—from novel model architectures and efficient training strategies to zero-shot and few-shot action recognition, cross-modal knowledge transfer, and LLM-assisted temporal reasoning. Both theoretical contributions and practical application studies that push the boundaries of current methods are welcomed, including, but not limited to, the integration of multimodal large language models for enhanced action understanding.

In this Special Issue, original research articles and reviews are welcomed. Research areas may include (but are not limited to) the following:

  • Supervised deep learning models for video-based action recognition and prediction;
  • Skeleton-based and pose-based action recognition using deep learning and graph neural networks;
  • Temporal action detection, localization, and segmentation;
  • Early action prediction and future activity anticipation;
  • Multimodal fusion strategies for action understanding (RGB, depth, skeleton, audio, and text);
  • Attention mechanisms and Transformer architectures for action analysis;
  • Large language models (LLMs) and multimodal large language models (MLLMs) for video understanding;
  • Zero-shot, few-shot, and open-vocabulary action recognition leveraging foundation models;
  • Video captioning, video question answering, and video language alignment for action analysis;
  • LLM-assisted temporal reasoning and action chain prediction;
  • Knowledge distillation and efficient deployment of large models for action recognition;
  • Action recognition in complex, real-world, and domain-specific scenarios;
  • Applications in autonomous driving, healthcare, sports analytics, smart environments, and human–robot interaction.

Dr. Yun Tie
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • action recognition
  • action prediction
  • supervised learning
  • deep learning
  • video understanding
  • skeleton-based recognition
  • temporal action detection
  • multimodal fusion
  • large language models (LLMs)
  • multimodal large language models (MLLMs)
  • foundation models
  • zero-shot action recognition
  • video-language alignment

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (1 paper)


Research

24 pages, 8539 KB  
Article
Temporally Consistent Student Behavior Recognition in Smart Classrooms via Attention-Guided Perception and State Estimation
by Shuzhao Zong, Chenyang He, Peng Sun and Chenliang Ma
Electronics 2026, 15(12), 2644; https://doi.org/10.3390/electronics15122644 (registering DOI) - 15 Jun 2026
Abstract
Recognizing student behaviors in classroom videos remains challenging due to complex backgrounds, frequent occlusions, subtle inter-class motion differences, and temporal jitter in frame-wise predictions. To address these issues, this paper proposes a hybrid student behavior recognition framework that integrates a Multi-branch Spatiotemporal Attention Network (MSTA-Net) with a Behavior State Kalman Filter (BSKF). At the perceptual level, MSTA-Net employs decoupled channel, spatial, and short-term temporal attention branches to enhance discriminative behavioral features while suppressing irrelevant background information. At the cognitive level, BSKF reformulates behavior recognition as a continuous state estimation problem in a high-dimensional probability space, where behavioral inertia is exploited to smooth noisy observations and improve temporal consistency. Experimental results on the SCB-Dataset and real-world classroom video sequences demonstrate that the proposed method achieves an accuracy of 94.7% and a real-time inference speed of 33 FPS. Compared with purely deep learning-based models, the proposed framework reduces the Action Category Switching (ACS) rate by 50%, indicating substantially improved robustness in long-term behavior recognition. These results suggest that coupling attention-based perception with Kalman-based state estimation provides an effective and efficient solution for reliable student behavior analysis in intelligent classroom environments.
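
As a rough illustration of the general idea in this abstract (smoothing noisy frame-wise class probabilities with a Kalman-style state estimate), the sketch below runs an independent scalar Kalman filter per class under a random-walk model. It is not the BSKF described in the paper: the function names smooth_probabilities and count_switches, the noise variances, and the toy input are all assumptions made for this example.

```python
def smooth_probabilities(frames, process_var=1e-3, obs_var=5e-2):
    """Smooth per-frame class probabilities with one scalar Kalman filter per class.

    Assumed model (illustrative only): x_t = x_{t-1} + w,  z_t = x_t + v,
    with Var(w) = process_var and Var(v) = obs_var.
    """
    if not frames:
        return []
    state = list(frames[0])            # initial estimate = first observation
    var = [obs_var] * len(state)       # initial uncertainty per class
    smoothed = [list(state)]
    for obs in frames[1:]:
        for k, z in enumerate(obs):
            var[k] += process_var                # predict: uncertainty grows
            gain = var[k] / (var[k] + obs_var)   # Kalman gain
            state[k] += gain * (z - state[k])    # update with the new observation
            var[k] *= 1.0 - gain                 # posterior uncertainty
        total = sum(state)
        # Renormalize so each output row is a probability vector.
        smoothed.append([s / total for s in state] if total > 0 else list(state))
    return smoothed


def count_switches(frames):
    """Count changes of the arg-max label between consecutive frames."""
    labels = [max(range(len(p)), key=p.__getitem__) for p in frames]
    return sum(a != b for a, b in zip(labels, labels[1:]))


# Toy input (hypothetical): class 0 dominates, with a one-frame flip to class 1.
frames = [[0.8, 0.2], [0.8, 0.2], [0.45, 0.55], [0.8, 0.2], [0.8, 0.2]]
print(count_switches(frames))                        # 2
print(count_switches(smooth_probabilities(frames)))  # 0
```

A small process variance relative to the observation variance encodes the assumption that behaviors change slowly, so a single contradictory frame moves the estimate only part of the way toward the new observation.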