Search Results (646)

Search Parameters:
Keywords = inter-frame

45 pages, 19804 KB  
Article
Target-Aware Safety-Residual Reinforcement Learning for Cooperative Multi-UAV Pursuit in Complex Environments
by Shun Li, Bo Yu, Dongying Liu, Dayu Gao, Peizheng He, Gongbo Chen and Lin Xu
Machines 2026, 14(7), 733; https://doi.org/10.3390/machines14070733 (registering DOI) - 29 Jun 2026
Abstract
Multi-UAV cooperative persistent tracking in complex obstacle environments requires agents to approach dynamic targets while ensuring obstacle avoidance and flight safety; however, standard multi-agent reinforcement learning (MARL) methods typically rely on a single policy to implicitly handle both objectives, making it difficult to balance task performance and risk control. To address this issue, this paper proposes a Target-Aware Safety-Residual Pursuit Reinforcement Learning (TASRP) framework for constrained three-dimensional environments. A continuous-control 3D tracking environment is constructed in IsaacLab, where two multirotor UAVs cooperatively track a dynamic target under random, target-blocking, and gate-like obstacle layouts, boundary constraints, and inter-agent collision risks, with each UAV producing a four-dimensional action composed of normalized thrust and body-frame torques. TASRP adopts a dual-head residual policy in which a pursuit branch generates nominal actions, and a safety branch predicts corrective residuals, together with a risk-aware gating mechanism, a target-guided teacher for obstacle detouring, and a dual-critic safety-constrained optimization scheme. Under clean observations, TASRP achieves task success rates of 75–79%, obstacle crash rates of 13–15%, and boundary crash rates of 1–2% across three representative scenarios. Under noisy observations, TASRP achieves 72.1% task success, 20.3% obstacle crash, and 2.8% boundary crash, outperforming MAPPO (61.2%, 61.2%, 5.6%) and HAPPO (58.1%, 73.5%, 4.1%). These results indicate that explicitly decoupling target-oriented control and safety correction enables a more effective and robust performance–safety trade-off under both clean and moderately noisy observations. Full article
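
The dual-head residual idea in this abstract (a nominal pursuit action plus a gated safety correction) can be illustrated with a minimal sketch. The sigmoid gate, its `threshold` and `sharpness` values, and the [-1, 1] action range are assumptions made for illustration; the abstract does not specify how TASRP's risk-aware gate is built.

```python
import math

def gate(risk, threshold=0.3, sharpness=10.0):
    """Hypothetical smooth gate in (0, 1): near 0 at low risk, near 1 at high risk."""
    return 1.0 / (1.0 + math.exp(-sharpness * (risk - threshold)))

def combine(nominal, residual, risk):
    """Add the gated safety residual to the nominal pursuit action.

    nominal, residual: 4-element lists (thrust plus three body-frame torques).
    The clip to [-1, 1] assumes normalized actions.
    """
    g = gate(risk)
    return [max(-1.0, min(1.0, a + g * r)) for a, r in zip(nominal, residual)]

pursuit = [0.5, 0.0, 0.0, 0.0]
safety = [-0.4, 0.1, 0.0, 0.0]
low_risk = combine(pursuit, safety, risk=0.0)   # stays close to the pursuit action
high_risk = combine(pursuit, safety, risk=1.0)  # safety residual almost fully applied
```

With this shape, the safety branch has little influence in open space and takes over smoothly as the risk estimate rises.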

27 pages, 3395 KB  
Article
A Computer-Vision Biological Early Warning System for Marine Pollution Detection Using Aurelia aurita as a Biosensor: Per-Animal Anomaly Detection of Diesel Exposure
by Aleksandr Grekov, Kirill Paraev, Iuliia Baiandina, Aleksei Baiandin and Elena Vyshkvarkova
J. Mar. Sci. Eng. 2026, 14(13), 1189; https://doi.org/10.3390/jmse14131189 (registering DOI) - 28 Jun 2026
Abstract
Marine pollution monitoring increasingly relies on Biological Early Warning Systems (BEWSs), which use living organisms as continuous, integrative sentinels of water quality. The moon jellyfish Aurelia aurita is a sensitive but under-exploited candidate for this role. We present a computer-vision BEWS pipeline that is unsupervised at inference time and operates without labelled pollution-response data, converting side-view aquarium video of single A. aurita medusae into a binary pollution alarm. Per-frame YOLO bounding-box detections are reduced to a continuous bell-area signal and a centroid trajectory, from which eleven pulsation, kinematic, and detection-quality features are extracted on 60 s sliding windows. A per-animal baseline is fitted on a clean-water baseline (recommended ≥15 min), and a two-layer detector—fast outlier detection on the mean absolute z-score with a k-of-N rule, plus one-sided CUSUM (cumulative sum) accumulation—flags any sustained deviation. Validation on six adult medusae exposed to diesel-WAF detected all six animals (95% CI 54–100%) and produced no false alarms in 203 clean-window opportunities (exact 95% upper bound 1.8%; rule-of-three estimate ≈1.5%). First-alarm latencies ranged from 1.0 to 23.7 min, and the observed responses were described as three descriptive patterns in this pilot dataset: sharp step-change, slow drift, and mixed. The deployed anomaly scoring step contains no neural-network weights, runs in under 300 lines of Python, and is designed for field-portable use in settings where a stationary side-view camera can be positioned alongside an aquarium, although field validation remains required. Per-animal anomaly detection accommodates the strong inter-individual variability of the diesel-WAF response that limits supervised clean-versus-polluted classification at this sample size. Full article
(This article belongs to the Section Ocean Engineering)
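
The two-layer detector described in the abstract (a k-of-N rule on the mean absolute z-score, plus one-sided CUSUM accumulation) can be sketched as follows. All thresholds (`z_thresh`, `k`, `n`, `ref`, `slack`, `h`) are hypothetical values chosen for illustration, not the settings used in the paper.

```python
from collections import deque
from statistics import mean, stdev

def fit_baseline(windows):
    """Per-feature mean and standard deviation from clean-water windows."""
    cols = list(zip(*windows))
    return [mean(c) for c in cols], [stdev(c) for c in cols]

def score(x, means, stds):
    """Mean absolute z-score of one feature window (assumes non-zero stds)."""
    return mean(abs(v - m) / s for v, m, s in zip(x, means, stds))

def detect(scores, z_thresh=3.0, k=3, n=5, ref=1.0, slack=0.5, h=5.0):
    """Return the index of the first alarm, or None if no alarm is raised."""
    recent = deque(maxlen=n)   # fast layer: outlier flags of the last n windows
    cusum = 0.0                # slow layer: one-sided cumulative sum
    for i, s in enumerate(scores):
        recent.append(s > z_thresh)
        cusum = max(0.0, cusum + (s - ref - slack))
        if sum(recent) >= k or cusum > h:
            return i
    return None

clean = detect([0.8] * 20)               # no sustained deviation
step = detect([0.8] * 5 + [4.0] * 5)     # sharp step-change, caught by k-of-N
drift = detect([0.8] * 5 + [2.5] * 10)   # slow drift, caught by CUSUM
```

The fast layer reacts to abrupt changes, while the CUSUM layer accumulates small deviations that never cross the outlier threshold on their own.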

22 pages, 2587 KB  
Article
Measurement-Oriented 3D Reconstruction and Attitude Estimation of Free-Tumbling Space Targets via Cooperative Multi-View Observation
by Di Zhao, Zhe Yue, Wensong Zhang, Jianping Yuan, Weihua Ma, Haofei Ban, Sen Li and Weiwei Lei
Aerospace 2026, 13(7), 583; https://doi.org/10.3390/aerospace13070583 (registering DOI) - 27 Jun 2026
Viewed by 99
Abstract
Accurate attitude measurement of non-cooperative space targets is essential for on-orbit servicing, active debris removal, and autonomous rendezvous missions. To address the challenges associated with unknown geometry, rapid tumbling motion, and the limited observability of single-view systems, this study proposes a cooperative multi-view measurement framework for three-dimensional reconstruction and attitude estimation. Multiple spacecraft are deployed to form a stable observation configuration, and multi-view image sequences are acquired to strengthen geometric constraints. A learning-based multi-view stereo reconstruction module is used to estimate depth information and reconstruct point clouds, which are further processed through iterative closest point (ICP) registration to derive inter-frame attitude variations. An extended Kalman filter (EKF) is then introduced to improve temporal consistency and suppress measurement noise. Validation is conducted in a numerical simulation using a simplified Fengyun-1 (FY-1) satellite model under a three-spacecraft cooperative fly-around scenario. The simulation results demonstrate that the proposed method achieves high-precision attitude estimation, with attitude errors below 0.3 and positional errors within 0.05 m. Comparative experiments show that the method maintains stable measurement performance under varying observation distances and viewing configurations. The proposed framework provides a reliable and robust measurement solution for dynamic attitude determination of free-tumbling space targets. Full article
(This article belongs to the Section Astronautics & Space Science)
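
As a small illustration of how an inter-frame attitude variation can be read off a registration result, the sketch below converts a 3x3 rotation matrix (such as the rotation part of an ICP transform) into a rotation angle via the standard trace identity. The function name and the choice of degrees are assumptions for illustration; the abstract does not describe the paper's attitude parameterization.

```python
import math

def rotation_angle_deg(R):
    """Rotation angle of a 3x3 rotation matrix, from trace(R) = 1 + 2*cos(angle)."""
    trace = R[0][0] + R[1][1] + R[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against round-off
    return math.degrees(math.acos(cos_angle))

# Example: a 10-degree rotation about the z-axis between two frames.
a = math.radians(10.0)
R_z = [[math.cos(a), -math.sin(a), 0.0],
       [math.sin(a),  math.cos(a), 0.0],
       [0.0,          0.0,         1.0]]
angle = rotation_angle_deg(R_z)
```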
17 pages, 1242 KB  
Article
Local Twitches During Ultrasound-Guided Fascial Hydrorelease Occur Within Stacking Fascia: A Retrospective Analysis of a Large Video Archive
by Hiroaki Kimura, Tadanao Hiroki, Tadashi Kobayashi and Hideaki Obata
Med. Sci. 2026, 14(3), 350; https://doi.org/10.3390/medsci14030350 (registering DOI) - 27 Jun 2026
Viewed by 70
Abstract
Background/Objectives: Ultrasound-guided fascial hydrorelease (FHR) occasionally elicits a brief localized contraction (“local twitch”) at the moment the needle tip contacts a fascial layer; the anatomical basis of this reaction has not yet been systematically characterized. This study examined local twitch occurrence relative to stacking fascia (yes/no) at the needle tip (primary outcome), as well as the anatomical distribution and per-video capture rate (secondary outcomes). Methods: We retrospectively analyzed 11,205 ultrasound videos from a single pain clinic (October 2015–March 2026). Twitches were identified by prospective clinical observation and computational screening (frame-difference-based Profile Match classifier; 417 candidates over 30 review rounds). The stacking fascia status was independently determined by two FHR-experienced clinicians, with discordant cases jointly adjudicated. Results: Inter-rater agreement was 86/90 (95.6%; 95% CI 89.0–98.8%); one case was reassessed, deemed not to be a twitch, and excluded. In the final cohort (n = 89), local twitches occurred at stacking fascia in 89/89 (100%; 95% CI 95.9–100%). Events were concentrated in gluteal/pelvic (51%) and lumbar paraspinal (29%) regions, with a per-video capture rate of 0.98% (110/11,205; 95% CI 0.81–1.18%). Conclusions: Local twitches during ultrasound-guided FHR essentially always coincide with the needle tip lying within stacking fascia, identifying this as the structural locus within this cohort. This figure represents inclusion-criterion-bound selectivity within the twitch-positive subset, not the positive predictive value of stacking fascia for twitch occurrence. Full article
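
The lower confidence bound reported for the 89/89 result can be reproduced with the closed form of the exact (Clopper–Pearson) interval for the all-successes case. The abstract does not name the interval method, so treating it as an exact binomial interval is an assumption; the sketch only shows that this choice gives the same figure.

```python
def exact_lower_bound_all_successes(n, conf=0.95):
    """Two-sided Clopper-Pearson lower bound when all n trials succeed.

    With x = n successes the bound solves p**n = alpha / 2, so p = (alpha / 2) ** (1 / n).
    """
    alpha = 1.0 - conf
    return (alpha / 2.0) ** (1.0 / n)

lower = exact_lower_bound_all_successes(89)
print(f"89/89: 95% CI {100 * lower:.1f}-100%")  # prints "89/89: 95% CI 95.9-100%"
```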

32 pages, 16203 KB  
Article
Sub-Frame Contact-Onset Estimation in a Self-Calibrated BJT Thermal Pixel Array Using a Four-Frame erfc Template
by Yinglei Ma and Fei Xiao
Sensors 2026, 26(13), 4074; https://doi.org/10.3390/s26134074 (registering DOI) - 26 Jun 2026
Viewed by 225
Abstract
Low-cost bipolar-junction-transistor (BJT) thermal pixel arrays provide robust, force-free contact sensing for tactile skins, but their slow frame rate confines contact-timing resolution to the inter-frame interval—252 ms at the 4 Hz rate of the 16 × 16 array studied here—well below the needs of contact-aware control. We propose a four-frame complementary-error-function (erfc) template, derived from one-dimensional semi-infinite heat conduction, that jointly estimates the contact amplitude, the thermal-diffusion parameter, and the sub-frame contact-onset offset (τ1), solved by a grid-initialized semi-analytic Levenberg–Marquardt scheme (Path A) at deterministic single-pass cost. On 42 contacts from five subjects, the per-contact Cramér–Rao lower bound for τ1 is 16.2 ms, and the empirical cross-contact dispersion is 83.5 ms; both are internal, model-derived quantities, since no synchronised external timing reference was available. A two-layer rejection pipeline separates 19/19 valid contacts from 2/2 hardware faults; transfers to four held-out subjects (23/23) without retuning; attains an overall AUC of 0.878 on a five-class synthetic disturbance library—ramp and saturating-exponential remain acknowledged failure modes; and rejects 5/6 disturbance trials in a real-airflow stress session. Larger independent cohorts and externally synchronised timing validation remain parameters for future work. Full article
(This article belongs to the Section Intelligent Sensors)
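
To make the sub-frame onset idea concrete, the sketch below fits a four-frame erfc template by brute-force grid search over the diffusion parameter and the onset offset, with the amplitude solved in closed form for each grid point. This is a simplified stand-in, not the paper's Levenberg–Marquardt scheme (Path A). The template form `A * erfc(b / sqrt(t - tau))`, the grids, and the 0.25 s frame spacing are assumptions for illustration.

```python
import math

def template(t, b, tau):
    """Unit-amplitude erfc response at time t for contact onset tau (zero before onset)."""
    if t <= tau:
        return 0.0
    return math.erfc(b / math.sqrt(t - tau))

def fit_onset(frames, dt, b_grid, tau_grid):
    """Grid search over (b, tau); amplitude by linear least squares at each point."""
    times = [i * dt for i in range(len(frames))]
    best = None
    for b in b_grid:
        for tau in tau_grid:
            g = [template(t, b, tau) for t in times]
            gg = sum(v * v for v in g)
            if gg == 0.0:
                continue  # template is all zero, amplitude undefined
            amp = sum(y * v for y, v in zip(frames, g)) / gg
            sse = sum((y - amp * v) ** 2 for y, v in zip(frames, g))
            if best is None or sse < best[0]:
                best = (sse, amp, b, tau)
    if best is None:
        raise ValueError("no usable grid point")
    return best[1:]  # (amplitude, b, tau)

# Synthetic, noise-free check: four frames generated from known parameters.
b_grid = [0.05 * k for k in range(1, 21)]
tau_grid = [0.01 * k for k in range(25)]   # onset offsets inside one frame interval
true_b, true_tau = b_grid[5], tau_grid[10]
frames = [2.0 * template(i * 0.25, true_b, true_tau) for i in range(4)]
amp, b_hat, tau_hat = fit_onset(frames, 0.25, b_grid, tau_grid)
```

Because the amplitude enters linearly, only two parameters need to be searched, which is one reason a semi-analytic solve is attractive for so few samples.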

23 pages, 10651 KB  
Article
Reusable Adjoint-Octree MLFMA for Full-Wave Radar Signature Analysis of Multi-State UAV Formations
by Haili Zhang, Song Ye, Gen Wang, Chuanyu Fan and Shuangbing Liu
Eng 2026, 7(7), 308; https://doi.org/10.3390/eng7070308 - 25 Jun 2026
Viewed by 150
Abstract
This study presents a reusable adjoint-octree multilevel fast multipole algorithm (MLFMA) for full-wave radar scattering analysis of multi-state unmanned aerial vehicle (UAV) formations. The method is motivated by remote-sensing applications in which dense angular sampling or long motion sequences are required for physically reliable signature generation. Instead of rebuilding a global octree for the full formation at every motion state, the proposed approach assigns each sub-target an independent target-attached local octree that translates and rotates with the rigid body. This preserves mesh–cell affiliation in the body-fixed frame and separates the system operator into a state-invariant intra-target near-field component and a state-dependent inter-target far-field component. Consequently, near-field matrices and sparse approximate inverse preconditioners are assembled once and reused throughout the state sequence, while only inter-target far-field coupling terms are updated. The method is evaluated for six representative UAV formations at 3.5 GHz using monostatic radar cross section (RCS) over a full azimuth sweep. Across all tested formations, the proposed solver reproduces the RCS behavior of conventional MLFMA while substantially reducing computational cost. For Formation A, the center-state total time decreases from 251.4 s to 66.06 s; for Formation C, it decreases from 470.95 s to 76.06 s. Over 100-state sequences, the resulting acceleration reaches approximately 11.8-fold and 15.2-fold, respectively. Jitter-envelope analysis further shows that orientation perturbation produces stronger signature uncertainty than planar displacement. The proposed framework therefore provides an efficient and physically consistent forward solver for radar remote-sensing studies of cooperative UAV formations. Full article
(This article belongs to the Section Electrical and Electronic Engineering)

27 pages, 36204 KB  
Article
Full-Field 3D Displacement Measurement of Suspended Ceiling Systems Under Seismic Loading Using a Consumer-Grade Multi-Camera Framework
by Mearge Kahsay Seyfu, Yuan-Sen Yang, Cameron C. W. Flude, David T. Lau, Jeffrey Erochko and Hung-Wei Liu
Sensors 2026, 26(13), 4011; https://doi.org/10.3390/s26134011 - 24 Jun 2026
Viewed by 198
Abstract
Suspended ceiling systems are among the most seismically vulnerable non-structural components in buildings, posing significant life-safety risks and economic losses, yet understanding their full-field kinematic behavior under seismic loading remains a major experimental challenge. Conventional contact sensors offer limited spatial coverage and can alter the dynamic properties of lightweight panels due to mass loading. In contrast, non-contact optical alternatives are rarely feasible in shake-table environments due to restricted viewing angles, extensive areal coverage requirements, and the risk of equipment damage from falling panels. This study proposes an end-to-end three-dimensional displacement measurement framework for large-scale shake-table testing of suspended ceiling systems, employing consumer-grade cameras with purpose-built tools that cover the complete experimental workflow, including motion-based video trimming, semi-automated calibration, a robust multi-stage image-tracking pipeline that maintains trajectory continuity under extreme inter-frame displacements, and a ceiling system motion visualization and analysis tool. The framework was validated through a full-scale shake-table experiment continuously tracking 324 spatial nodes across 81 ceiling panels, achieving an RMSE below 3 mm in all spatial directions and exact peak-frequency agreement in 9 out of 10 test cases. A parallel processing architecture reduced total processing time from over 27 h to under 10 min without GPU acceleration, and six-degree-of-freedom rigid-body analysis resolved the complete panel failure sequence from constrained oscillation through multi-axis rotation to gravitational free fall, a level of kinematic detail unattainable with conventional instrumentation. This framework establishes a practical, scalable foundation for full-field seismic performance assessment of non-structural systems where conventional instrumentation is physically or logistically infeasible. 
Full article
(This article belongs to the Special Issue Advanced Sensors for Image Processing and Analysis)

17 pages, 1312 KB  
Article
DCP-TS: A Unified Spatiotemporal Framework for Real-Time Desmoking and Flicker Suppression in Laparoscopic Surgical Videos
by Chun-Hsien Wu, Chih-Yi Lin and Yi-Chun Du
Bioengineering 2026, 13(7), 714; https://doi.org/10.3390/bioengineering13070714 - 23 Jun 2026
Viewed by 194
Abstract
Surgical smoke generated by energy-based instruments during minimally invasive surgery severely degrades intraoperative visibility in laparoscopic procedures, prolonging operation time and elevating surgical risk. Although deep-learning desmoking methods have improved spatial clarity, most operate frame-by-frame and produce temporal artifacts—flicker, brightness drift, and color instability—that hinder clinical adoption. To our knowledge, no prior framework has jointly addressed spatial restoration and temporal consistency within a unified surgical smoke removal pipeline. We proposed DCP-TS, a unified spatiotemporal framework that coupled a Dark Channel Prior (DCP)-guided conditional generative adversarial network (cGAN) with an inference-time module integrating optical flow alignment, exponential moving-average luminance smoothing, and adaptive gamma correction. A key novelty was that this stabilizer was smoke-aware and operated entirely at inference time, requiring no retraining or post-processing, which distinguished it from generic video temporal-consistency methods. On laparoscopic colorectal surgery videos, DCP-TS achieved a PSNR of 23.39 dB, SSIM of 0.62, NIQE of 4.17, and BRISQUE of 23.66, outperforming DehazeFormer and Colores et al. across all metrics. Temporal analysis showed an approximate 28% reduction in inter-frame luminance variation, and a double-blind reader study with five experienced laparoscopic surgeons confirmed substantial improvements in brightness stability (4.37 vs. 2.86) and overall perceptual quality (4.18 vs. 3.51 on a 5-point Likert scale). The system ran at 22 fps with ~3.9 GB GPU memory on standard operating-room hardware, supporting real-time intraoperative deployment. DCP-TS demonstrated that physics-guided spatiotemporal modeling could transform frame-by-frame desmoking into a clinically promising, perceptually more continuous video stream. Full article
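
Two of the inference-time steps named in the abstract, exponential moving-average luminance smoothing and gamma correction toward the smoothed level, can be sketched in a few lines. Optical-flow alignment is omitted, frames are reduced to flat lists of luminance values in (0, 1), and `alpha` is a hypothetical smoothing factor; none of this is taken from the DCP-TS implementation.

```python
import math

def ema(values, alpha=0.2):
    """Exponential moving average of per-frame mean luminance."""
    out, state = [], None
    for v in values:
        state = v if state is None else alpha * v + (1.0 - alpha) * state
        out.append(state)
    return out

def gamma_for(mean_lum, target):
    """Gamma such that mean_lum ** gamma == target (both strictly between 0 and 1)."""
    return math.log(target) / math.log(mean_lum)

def stabilize(frame, target):
    """Apply one gamma curve to a frame so its brightness moves toward the target."""
    g = gamma_for(sum(frame) / len(frame), target)
    return [p ** g for p in frame]

smoothed = ema([0.5, 0.7, 0.5], alpha=0.5)   # flicker at frame 2 is damped
corrected = stabilize([0.25, 0.25], target=0.5)
```

For a non-uniform frame the gamma curve maps the mean only approximately, so in practice the correction would be iterated or computed on a robust brightness statistic.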

22 pages, 392 KB  
Article
A Low-Power JPEG XS Frame Buffer Codec for On-Chip Display Systems
by Piotr Chodorowski and Dariusz Kania
Appl. Sci. 2026, 16(12), 6263; https://doi.org/10.3390/app16126263 - 22 Jun 2026
Viewed by 262
Abstract
Power consumption in portable display systems is significantly affected by the energy cost of frame buffer memory accesses between the graphics processing unit (GPU) and the display processing unit (DPU). This paper presents the design and FPGA implementation of a visually lossless frame buffer codec based on the JPEG XS standard, intended for integration into on-chip systems to reduce memory bandwidth and associated power consumption. The codec is implemented in VHDL and targets the AMD Artix UltraScale+ xcau15p-2ffvb676e device. The codec supports both the standard ISO/IEC 21122 entropy coding path and a simplified non-standard Golomb–Rice mode intended for closed on-chip systems. Post-place-and-route results at PPC = 4 show that the Standard Precinct codec occupies 22.0% of device LUTs, while the proposed Golomb–Rice variant requires only 15.8%. At a compression ratio of 11:1, the codec achieves a PSNR of 40.20 dB, consistent with visually lossless operation reported for JPEG XS. Power estimation at 200 MHz shows that the Golomb–Rice mode reduces total codec power consumption by 44 mW (4.7%) relative to the Standard Precinct mode, with the decoder contributing the majority of this saving. The proposed solution is applicable to portable devices with built-in displays, including smartphones, tablets, and augmented reality headsets, where tile-based frame buffer compression is required without inter-frame dependencies. Full article
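
For readers unfamiliar with Golomb–Rice coding, the sketch below shows the textbook form: a unary quotient followed by a k-bit remainder. It illustrates the general technique only; the abstract describes the paper's mode as a simplified, non-standard variant, whose details are not given here.

```python
def rice_encode(value, k):
    """Encode a non-negative integer as unary quotient + '0' + k-bit remainder."""
    q = value >> k
    r = value & ((1 << k) - 1)
    bits = "1" * q + "0"
    if k:
        bits += format(r, "0{}b".format(k))
    return bits

def rice_decode(bits, k):
    """Decode one Rice codeword from the start of a bit string."""
    q = bits.index("0")                          # length of the unary prefix
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r

code = rice_encode(9, 2)   # quotient 2 -> "110", remainder 1 -> "01"
```

Small values get short codes, and both directions need only shifts and masks, which suits hardware with tight area and power budgets.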

21 pages, 20806 KB  
Article
Research on Spanning Tree Topology Optimization and Pyramid-Based Fine Alignment Algorithm for Multi-View Point Cloud Registration
by Chang Deng, Pingqing Fan and Hongzhou Chen
Information 2026, 17(6), 611; https://doi.org/10.3390/info17060611 - 19 Jun 2026
Viewed by 256
Abstract
Multi-view point cloud registration is a fundamental technology for 3D reconstruction and indoor robot navigation and remains a core challenge for robust environmental perception. Its key difficulty lies in achieving globally consistent alignment of multiple partially overlapping point clouds efficiently and reliably. To address the limitations of existing methods, including low registration accuracy under small overlaps, severe error accumulation in long sequences, and the difficulty of balancing computational efficiency with global consistency, this paper proposes a multi-view point cloud registration framework that integrates spanning tree-based global topology constraints with a multi-scale pyramid-based local refinement strategy, specifically validated for indoor environments. First, a Voxel-Guided Normal Consistency Keypoint Extraction (VG-NCKE) method is presented. It leverages voxel grids to guide stable computation of local geometric features and filters candidate keypoints using a neighborhood normal direction consistency metric, effectively improving keypoint repeatability and spatial uniformity on unevenly distributed point clouds. Second, a coarse registration strategy with global constraints is constructed based on the Overlap Confidence-weighted Minimum Spanning Tree (OC-WST). It quantifies inter-frame overlap reliability as edge weights and employs Prim’s algorithm to build the minimum spanning tree as the topological skeleton for global registration. By prioritizing high-overlap frame pairs, the method suppresses error propagation and reduces the complexity of multi-view registration. Additionally, a multi-scale pyramid ICP fine registration algorithm is designed. It adopts a point-to-plane error model instead of the traditional point-to-point distance metric and performs progressive optimization through a three-layer point cloud pyramid from coarse to fine. This expands the convergence basin and gradually improves alignment accuracy, mitigating the sensitivity of single-scale ICP to initial poses. Extensive experiments on the indoor 3DMatch dataset and real indoor LiDAR sequences demonstrate that the proposed method outperforms competing approaches in terms of registration accuracy, computational efficiency, and long-sequence robustness, validating its effectiveness for indoor multi-view point cloud registration tasks. Full article
(This article belongs to the Section Information Applications)
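
The spanning-tree step in this abstract can be sketched with Prim's algorithm over a graph of frames. Using `1 - overlap` as the edge cost, so that high-overlap pairs are picked first, is an assumption for illustration; the abstract does not give the exact OC-WST weight definition.

```python
import heapq

def overlap_mst(n, overlap):
    """Prim's algorithm on n frames.

    overlap: dict {(i, j): confidence in [0, 1]} for candidate frame pairs.
    Returns tree edges (parent, child) in the order they were added, rooted at frame 0.
    """
    adj = {i: [] for i in range(n)}
    for (i, j), conf in overlap.items():
        cost = 1.0 - conf          # hypothetical cost: prefer high-overlap pairs
        adj[i].append((cost, j))
        adj[j].append((cost, i))
    visited = {0}
    heap = [(cost, 0, j) for cost, j in adj[0]]
    heapq.heapify(heap)
    edges = []
    while heap and len(visited) < n:
        cost, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        edges.append((u, v))
        for c2, w in adj[v]:
            if w not in visited:
                heapq.heappush(heap, (c2, v, w))
    return edges

pairs = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.3, (2, 3): 0.7, (0, 3): 0.2}
tree = overlap_mst(4, pairs)
```

Registering along the tree edges means each frame reaches the root through its most reliable neighbours, and the weak pairs (0, 2) and (0, 3) are never used.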

16 pages, 600 KB  
Review
Inter-Hemispheric Coordination and Ageing in Visual Working Memory: A Distributed Framework
by Jean-François Delvenne
Brain Sci. 2026, 16(6), 641; https://doi.org/10.3390/brainsci16060641 - 16 Jun 2026
Viewed by 204
Abstract
Visual working memory (VWM) declines with age and has been explained by multiple mechanisms, including reduced precision, capacity limitations, binding deficits, and altered attentional control. However, these accounts are typically framed within a unitary processing architecture and do not fully capture the distributed nature of visual cognition. This review advances a coordination-based framework in which age-related differences in VWM are understood as partly reflecting reduced efficiency in integrating and regulating representations across the two cerebral hemispheres. Behavioural, electrophysiological, and neurophysiological evidence is synthesised to characterise the role of inter-hemispheric communication in VWM. Age-related changes in corpus callosum structure and function are then considered in relation to these coordination processes. Deficits in precision, capacity, binding, and attention are proposed to reflect different behavioural expressions of a common limitation in coordinating distributed representations, providing a unifying account of multiple behavioural signatures, particularly under conditions that place high demands on inter-hemispheric coordination. The framework offers a mechanistic explanation of the task-dependent nature of ageing effects and generates testable predictions for future research, highlighting the role of network-level coordination mechanisms in cognitive ageing. Full article
(This article belongs to the Special Issue Ageing and Visual Working Memory: Cognitive and Neural Perspectives)

24 pages, 8539 KB  
Article
Temporally Consistent Student Behavior Recognition in Smart Classrooms via Attention-Guided Perception and State Estimation
by Shuzhao Zong, Chenyang He, Peng Sun and Chenliang Ma
Electronics 2026, 15(12), 2644; https://doi.org/10.3390/electronics15122644 - 15 Jun 2026
Viewed by 208
Abstract
Recognizing student behaviors in classroom videos remains challenging due to complex backgrounds, frequent occlusions, subtle inter-class motion differences, and temporal jitter in frame-wise predictions. To address these issues, this paper proposes a hybrid student behavior recognition framework that integrates a Multi-branch Spatiotemporal Attention Network (MSTA-Net) with a Behavior State Kalman Filter (BSKF). At the perceptual level, MSTA-Net employs decoupled channel, spatial, and short-term temporal attention branches to enhance discriminative behavioral features while suppressing irrelevant background information. At the cognitive level, BSKF reformulates behavior recognition as a continuous state estimation problem in a high-dimensional probability space, where behavioral inertia is exploited to smooth noisy observations and improve temporal consistency. Experimental results on the SCB-Dataset and real-world classroom video sequences demonstrate that the proposed method achieves an accuracy of 94.7% and a real-time inference speed of 33 FPS. Compared with purely deep learning-based models, the proposed framework reduces the Action Category Switching (ACS) rate by 50%, indicating substantially improved robustness in long-term behavior recognition. These results suggest that coupling attention-based perception with Kalman-based state estimation provides an effective and efficient solution for reliable student behavior analysis in intelligent classroom environments. Full article
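
The idea of treating frame-wise class probabilities as noisy observations of a slowly changing state can be illustrated with a per-class scalar Kalman filter under a random-walk model. This is a minimal stand-in, not the paper's BSKF; the noise values `q` and `r` and the switch counter are hypothetical.

```python
def smooth_labels(prob_seq, q=0.01, r=0.25):
    """Filter each class probability with a scalar Kalman filter, then take the argmax.

    prob_seq: list of per-frame probability vectors.
    q: process noise (how fast behavior may change); r: observation noise.
    """
    n = len(prob_seq[0])
    x = list(prob_seq[0])   # state estimate per class
    p = [1.0] * n           # estimate variance per class
    labels = []
    for z in prob_seq:
        for c in range(n):
            p[c] += q                       # predict (random walk)
            k = p[c] / (p[c] + r)           # Kalman gain
            x[c] += k * (z[c] - x[c])       # update with the frame-wise observation
            p[c] *= 1.0 - k
        labels.append(max(range(n), key=lambda c: x[c]))
    return labels

def switches(labels):
    """Number of category changes between consecutive frames."""
    return sum(a != b for a, b in zip(labels, labels[1:]))

# One noisy frame in the middle of a stable behavior.
seq = [[0.9, 0.1]] * 3 + [[0.4, 0.6]] + [[0.9, 0.1]] * 3
raw = [max(range(2), key=lambda c: z[c]) for z in seq]
smoothed = smooth_labels(seq)
```

Because every class shares the same gain in this sketch, the filtered vector still sums to one whenever the observations do, so no renormalization step is needed.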

19 pages, 1688 KB  
Article
Deep Learning-Based Evaluation of Maxillary Dental Midline Deviation on Orthodontic Frontal Photographs
by Sercan Taskin, Serra Aksoy, Mine Gecgelen Cesur, Pinar Demircioglu and Ismail Bogrekci
Bioengineering 2026, 13(6), 687; https://doi.org/10.3390/bioengineering13060687 - 15 Jun 2026
Viewed by 338
Abstract
Aim: This study aimed to detect the maxillary dental midline region on orthodontic frontal photographs using a YOLOv8-based deep learning approach and to evaluate how the detection outputs affect the classification performance of various machine learning algorithms in distinguishing symmetric from asymmetric midline conditions. Materials and Methods: A total of 146 standardized frontal photographs (72 with midline deviation ≥ 2 mm from the facial midline, defined by the soft-tissue nasion–subnasal line; 74 symmetric) were analyzed. YOLOv8 was used to obtain bounding-box and keypoint predictions, which were converted into a numerical feature vector and used to train 11 classifiers (including Naive Bayes, Logistic Regression with L1 and ElasticNet penalties, Support Vector Machine, and AdaBoost). Performance was assessed using accuracy (with 95% Wilson confidence intervals), precision, recall, F1-score, and ROC-AUC. Hyperparameters of the downstream classifiers were tuned by grid search with five-fold cross-validation within the training set (n = 126), and the final classifiers were evaluated on a held-out test set (n = 20). Because the YOLOv8 object detector was trained on the full image dataset before feature extraction, the classification metrics reported here should be regarded as exploratory only. Results: YOLOv8 achieved mAP@0.5 = 0.995 for midline detection. Naive Bayes attained the highest classification accuracy of 75% (95% CI: 53–89%) with ROC-AUC = 0.75. AdaBoost achieved 65% (95% CI: 43–82%). Several models defaulted to majority-class prediction (accuracy = 40%), indicating insufficient feature discriminability. Conclusions: YOLOv8 detected the maxillary dental midline under the present internal experimental conditions. However, because leakage-free outer k-fold validation of the complete detection-plus-classification pipeline was not performed, the classification results should be considered preliminary. Future work should address information leakage, incorporate facial reference frame normalization, include inter-observer reliability assessment, and validate the approach on larger datasets.
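As a worked illustration of the Wilson confidence intervals mentioned in the abstract, the sketch below computes a 95% Wilson score interval for a proportion. The counts are an assumption inferred from the reported figures (75% of n = 20 taken as 15 correct, 65% as 13 correct); the function name `wilson_interval` is hypothetical and is not taken from the article. Rounded to whole percentages, the results agree with the reported 53–89% and 43–82% intervals.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Return the (low, high) Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumed counts: 15 of 20 correct (75%) and 13 of 20 correct (65%).
for correct in (15, 13):
    low, high = wilson_interval(correct, 20)
    print(f"{correct}/20 correct: {low:.1%} to {high:.1%}")
```

With only 20 test images the interval spans more than 30 percentage points, which is consistent with the abstract's framing of the classification results as preliminary.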

22 pages, 3318 KB  
Article
Research on Global Seismic Reliability Analysis of Steel Frames Based on Machine Learning
by Ziyang Wu, Dewei Kong, Mingming Jia and Xianbao Li
Buildings 2026, 16(12), 2379; https://doi.org/10.3390/buildings16122379 - 14 Jun 2026
Viewed by 280
Abstract
Seismic reliability assessment of steel frame structures using nonlinear finite element analysis is often hindered by implicit limit state functions and high computational cost. To address these challenges, this study proposes a machine learning-based framework for global seismic reliability analysis. A nine-story steel frame model is established and validated through modal and pushover analysis. Global sensitivity analysis using the Sobol’ method is performed to identify key parameters governing the maximum inter-story drift ratio. Three machine learning models (PSO-SVR, PSO-XGBoost, and PSO-BPNN) are trained with the selected features and integrated into Monte Carlo simulation (MCS) for reliability calculation. The results show that the PSO-BPNN model achieves the highest accuracy, with a maximum error of 1.0259% relative to direct MCS, outperforming the conventional MLE-based approach, which yields errors up to 11.9383% due to the non-standard distribution of the structural response. The impact of training sample size on model performance is also examined, with 1000 samples identified as a practical threshold for acceptable prediction accuracy. Existing code-based design methods require modification based on the total probability approach for global reliability analysis. This study offers an efficient and precise methodology for seismic reliability design of steel frame structures, particularly when structural responses deviate from standard parametric distributions.
(This article belongs to the Special Issue Resilience Analysis and Intelligent Simulation in Civil Engineering)
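The general idea of embedding a surrogate model in Monte Carlo simulation can be sketched as follows. This is a minimal illustration, not the authors' implementation: `surrogate_drift` is a hypothetical linear stand-in for a trained model such as PSO-BPNN, and the two standard-normal inputs, the coefficients, and the drift limit of 0.015 are all assumed values chosen so that the estimate can be checked against a closed-form answer.

```python
import random
from statistics import NormalDist

# Hypothetical surrogate coefficients (assumptions, not from the article).
MEAN_DRIFT = 0.01
COEF_X1 = 0.002
COEF_X2 = 0.001


def surrogate_drift(x1, x2):
    """Stand-in for a trained surrogate: maps two standard-normal inputs to a
    maximum inter-story drift ratio."""
    return MEAN_DRIFT + COEF_X1 * x1 + COEF_X2 * x2


def mcs_failure_probability(limit, n_samples=200_000, seed=0):
    """Estimate P(drift > limit) by sampling inputs and querying the surrogate."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        if surrogate_drift(x1, x2) > limit:
            failures += 1
    return failures / n_samples


def exact_failure_probability(limit):
    """Closed-form answer, available only because this toy surrogate is linear
    in Gaussian inputs."""
    sigma = (COEF_X1 ** 2 + COEF_X2 ** 2) ** 0.5
    return 1.0 - NormalDist(MEAN_DRIFT, sigma).cdf(limit)


estimate = mcs_failure_probability(limit=0.015)
print(f"MCS estimate: {estimate:.4f}, exact: {exact_failure_probability(0.015):.4f}")
```

Each surrogate call replaces a nonlinear finite element run, which is what makes large sample counts affordable. Counting exceedances directly also avoids assuming a parametric distribution for the response, the limitation the abstract attributes to the MLE-based approach.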

18 pages, 29379 KB  
Data Descriptor
A Markerless RGB-Based Dataset of Continuous Hand Joint Kinematics in Functional Grasping Tasks
by Shubham Yadav and Jyotindra Narayan
Data 2026, 11(6), 142; https://doi.org/10.3390/data11060142 - 12 Jun 2026
Viewed by 367
Abstract
The majority of currently available hand kinematic databases have been gathered using expensive marker-based systems or are restricted to a particular gesture-recognition task, failing to capture the dynamic nature of joints when the hand is engaged with an object. To address this gap, we introduce the RGB-based Hand Joint Kinematics (RGB-HJK) dataset, a publicly available collection of continuous, frame-level 3D joint angle trajectories, recorded while ten healthy adults (six male, four female; age 25.8 ± 3.2 years; BMI 22.8 ± 2.0 kg/m²) performed five standardized object interaction grasps: Power Grasp (cylindrical bottle), Tripod Grasp (pen), Static Power Hold (smartphone), Precision Pinch (thin paper), and Lateral Pinch (book). Data were collected using a standard RGB camera and the MediaPipe Hands markerless pipeline at 26.95 ± 0.29 Hz, a rate that was stable across all subjects. Each participant completed five trials for each grasp type. After active-hold filtering, 28,111 validated frames remained, with a 100% detection rate across all 250 trials. Intra-subject repeatability was good (mean SD 7.9° across all joint-grasp combinations), and inter-subject variability was within the range expected from normal anatomical diversity. Importantly, kinematic validation of the Index Proximal Interphalangeal (PIP) joint (61.8° ± 18.4°) showed values consistent with ranges reported in previous studies using instrumented gloves and depth sensors. Principal Component Analysis (PCA) confirmed clear linear separability among the five grasp configurations. Unlike existing datasets, the RGB-HJK method does not compromise the natural sense of touch and is free of hardware occlusions, thereby providing an easily accessible ecological baseline.
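The abstract does not state how joint angles were derived from the landmarks. One common approach, shown here purely as an assumed sketch, takes three consecutive 3D landmarks along a finger and measures the angle between the proximal and distal bone vectors. The function name `flexion_angle_deg` and the sample coordinates are hypothetical.

```python
from math import acos, degrees, sqrt


def flexion_angle_deg(proximal, joint, distal):
    """Angle in degrees between the bone entering a joint and the bone leaving it.

    Each argument is an (x, y, z) landmark. A straight finger gives 0 degrees.
    """
    u = tuple(j - p for p, j in zip(proximal, joint))  # proximal bone vector
    v = tuple(d - j for j, d in zip(joint, distal))    # distal bone vector
    norm_u = sqrt(sum(c * c for c in u))
    norm_v = sqrt(sum(c * c for c in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("landmarks must be distinct")
    cos_theta = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding drift
    return degrees(acos(cos_theta))


# Hypothetical landmarks for an index finger (MCP, PIP, DIP), arbitrary units.
mcp, pip, dip = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 0.8, 0.1)
print(f"PIP flexion: {flexion_angle_deg(mcp, pip, dip):.1f} degrees")
```

Applying such a function to every frame of a trial would give a continuous, frame-level angle trajectory of the kind the dataset describes.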
