Search Results (2,596)

Search Parameters:
Keywords = video dataset

25 pages, 10933 KB  
Article
Combining Video Magnification with Machine Learning-Based Source Identification for Contactless Heart Rate Monitoring
by Tiago de Avelar, Vicente M. Garção and Hugo Plácido da Silva
Sensors 2026, 26(9), 2706; https://doi.org/10.3390/s26092706 - 27 Apr 2026
Abstract
Conventional contact-based monitoring of heart rate (HR) presents challenges such as patient discomfort, skin irritation, and poor long-term adherence, motivating the development of contactless, video-based sensing systems. This study proposes a robust hybrid framework combining advanced signal processing with machine learning to enhance HR estimation accuracy from facial video. The methodology integrates a two-stage geometric stabilization pipeline with dense facial tessellation to mitigate motion. Eulerian Video Magnification (EVM) amplifies subtle color variations, followed by chrominance-based ROI filtering. Signal recovery utilizes a sliding-window Principal Component Analysis (PCA) for local coherence, followed by Second-Order Blind Identification (SOBI), with a Light Gradient Boosting Machine (LightGBM) classifier employed to automatically identify physiological sources. Validated on the challenging COHFACE dataset, the approach achieves a Mean Absolute Error (MAE) of 1.50 bpm, a Root Mean Square Error (RMSE) of 3.07 bpm, and a Pearson Correlation Coefficient (PCC) of 0.97 on the test set. The method demonstrates robustness across diverse lighting conditions, outperforming traditional algorithms and achieving parity with state-of-the-art deep learning models, while offering an interpretable solution for contactless health monitoring. Full article
(This article belongs to the Special Issue Machine Learning in Biomedical Signal Processing)
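The sliding-window PCA and spectral HR readout in this pipeline are easy to illustrate in isolation. Below is a minimal Python sketch on synthetic RGB traces; the frame rate, window length, and heart-rate band are assumptions, and the paper's EVM, SOBI, and LightGBM stages are omitted.

```python
import numpy as np
from sklearn.decomposition import PCA

fs = 30.0                                     # assumed camera frame rate (Hz)
t = np.arange(0, 20, 1 / fs)                  # 20 s of synthetic ROI traces
pulse = 0.02 * np.sin(2 * np.pi * 1.2 * t)    # ~72 bpm pulsatile component
rgb = np.stack([pulse + 0.01 * np.random.randn(t.size) for _ in range(3)], axis=1)

win, hop = int(10 * fs), int(1 * fs)          # 10 s windows, 1 s hop (assumed)
for start in range(0, rgb.shape[0] - win + 1, hop):
    seg = rgb[start:start + win]
    comp = PCA(n_components=1).fit_transform(seg - seg.mean(0)).ravel()
    spec = np.abs(np.fft.rfft(comp * np.hanning(win)))
    freqs = np.fft.rfftfreq(win, 1 / fs)
    band = (freqs >= 0.7) & (freqs <= 4.0)    # plausible HR band: 42-240 bpm
    hr = 60 * freqs[band][np.argmax(spec[band])]
    print(f"window @ {start / fs:4.1f}s -> HR ~ {hr:5.1f} bpm")
```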
20 pages, 1844 KB  
Article
AI-Enhanced Prognostic Model for Predicting Polyp Recurrence and Guiding Post-Polypectomy Surveillance Intervals Using the ERCPMP-V5 Dataset
by Sri Harsha Boppana, Sachin Sravan Kumar Komati, Ritwik Raj, Gautam Maddineni, Raja Chandra Chakinala, Pradeep Yarra, Venkata C. K. Sunkesula and Cyrus David Mintz
J. Clin. Med. 2026, 15(9), 3303; https://doi.org/10.3390/jcm15093303 - 26 Apr 2026
Abstract
Introduction: Colorectal cancer remains a leading cause of cancer-related morbidity and mortality, with adenomatous polyps representing a common precursor. Post-polypectomy polyp recurrence represents a significant risk of colorectal cancer, driving periodic colonoscopy surveillance and polypectomy as needed. In this study, we explore a multimodal machine learning approach that integrates endoscopic imaging with clinical and pathology data to improve recurrence risk prediction and support individualized surveillance planning. Methods: We developed and evaluated a multimodal artificial intelligence (AI) model to predict post-polypectomy colorectal polyp recurrence using the ERCPMP-v5 dataset. The cohort included 217 patients with 796 high-resolution endoscopic RGB images and 21 endoscopic videos; video data were converted to still frames at 2 frames per second. Images and frames were resized to 224 × 224 pixels and normalized. Patient-level demographic, morphological (Paris, Kudo Pit, JNET), anatomical, and pathological variables were encoded using standard scaling for continuous features and one-hot encoding for categorical features. Visual representations were extracted using a pretrained Vision Transformer backbone (ViT-Base-Patch16-224) with frozen weights. Structured metadata (79 variables) was encoded using a multilayer perceptron. A late fusion framework used image and metadata representations to generate a recurrence probability via a sigmoid classifier; probabilities were thresholded at 0.5 for binary prediction. Model performance was evaluated on a held-out test set using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). We additionally compared fusion performance with image-only and metadata-only baselines. Predicted probabilities were translated to surveillance recommendations using risk tiers: low risk (0.00 ≤ p < 0.20), moderate risk (0.20 ≤ p < 0.50), and high risk (p ≥ 0.50). Results: On the test set, the multimodal fusion model achieved 90.4% accuracy, 86.7% precision, 83.1% recall, 84.9% F1-score, and an AUC of 0.920. The image-only model achieved 84.6% accuracy (AUC 0.880), and the metadata-only model achieved 81.9% accuracy (AUC 0.850), indicating improved performance with multimodal fusion. Risk stratification enabled surveillance recommendations of 1–3 years for low risk, 6–12 months for moderate risk, and 3–6 months for high risk. Conclusions: A late-fusion multimodal model integrating endoscopic imaging with structured clinical and pathology variables demonstrated excellent performance for predicting post-polypectomy recurrence and generated actionable risk-based surveillance intervals. This approach may support individualized follow-up planning and more efficient allocation of surveillance resources, while prioritizing timely evaluation for patients at higher predicted risk. Full article
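As a structural sketch of the late-fusion head described above (frozen image features concatenated with an MLP encoding of the 79 structured variables, then a sigmoid classifier with the stated risk tiers), here is a hypothetical PyTorch version. The 768-dimensional image input matches ViT-Base, but the hidden sizes are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Late-fusion head: frozen image features + tabular MLP -> recurrence prob."""
    def __init__(self, img_dim=768, meta_dim=79, hidden=128):  # hidden size assumed
        super().__init__()
        self.meta_mlp = nn.Sequential(
            nn.Linear(meta_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(img_dim + hidden, 1)   # sigmoid classifier

    def forward(self, img_feat, meta):
        z = torch.cat([img_feat, self.meta_mlp(meta)], dim=-1)
        return torch.sigmoid(self.head(z)).squeeze(-1)

model = LateFusion()
p = model(torch.randn(4, 768), torch.randn(4, 79))   # batch of 4 patients
# Risk tiers from the abstract: p < 0.20 low, 0.20 <= p < 0.50 moderate, p >= 0.50 high.
tier = ["low" if v < 0.20 else "moderate" if v < 0.50 else "high" for v in p]
print(p.detach().numpy().round(3), tier)
```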
16 pages, 1132 KB  
Article
Mamba-Based Video Analysis for Blood Pressure Estimation
by Walaa Othman, Batol Hamoud, Nikolay Shilov, Alexey Kashevnik and Alexander Mayatin
Big Data Cogn. Comput. 2026, 10(5), 133; https://doi.org/10.3390/bdcc10050133 - 26 Apr 2026
Viewed by 34
Abstract
Blood pressure monitoring is important for overall health assessment, yet traditional cuff-based methods are intrusive and unsuitable for continuous monitoring. This paper proposes a contactless approach for blood pressure estimation from facial videos using a bidirectional Mamba-based architecture with uncertainty quantification. Our method processes 64-frame video segments through a hierarchical 3D convolutional encoder to extract spatiotemporal features, then applies bidirectional state-space modeling to capture temporal dynamics efficiently. The model was evaluated on the Vitals for Vision (V4V) dataset, achieving mean absolute errors of 13.15 mmHg for systolic and 9.56 mmHg for diastolic blood pressure, outperforming prior methods while requiring significantly fewer computational resources than attention-based approaches. While these results do not meet clinical-grade diagnostic standards, they demonstrate the feasibility of contactless blood pressure estimation for non-clinical applications such as wellness monitoring, preliminary health screening, and continuous remote observation, where unobtrusive and computationally efficient monitoring is desirable. Full article
(This article belongs to the Section Data Mining and Machine Learning)
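A shape-level sketch of the described pipeline follows: 64-frame clips pass through a hierarchical 3D convolutional encoder, then a bidirectional sequence model, then a systolic/diastolic regression head. Note the stand-in: a bidirectional GRU replaces the paper's Mamba state-space blocks (which need a dedicated package), and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class VideoBPSketch(nn.Module):
    """64-frame clip -> 3D-conv features -> bidirectional sequence model -> SBP/DBP.
    A bidirectional GRU stands in for the paper's Mamba SSM blocks."""
    def __init__(self, feat=64):
        super().__init__()
        self.encoder = nn.Sequential(                     # hierarchical 3D conv encoder
            nn.Conv3d(3, 16, 3, stride=(1, 2, 2), padding=1), nn.ReLU(),
            nn.Conv3d(16, feat, 3, stride=(1, 2, 2), padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((64, 1, 1)),             # keep all 64 time steps
        )
        self.temporal = nn.GRU(feat, feat, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * feat, 2)                # [systolic, diastolic]

    def forward(self, clip):                              # clip: (B, 3, 64, H, W)
        f = self.encoder(clip).squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, 64, feat)
        seq, _ = self.temporal(f)
        return self.head(seq.mean(dim=1))

print(VideoBPSketch()(torch.randn(2, 3, 64, 64, 64)).shape)  # torch.Size([2, 2])
```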
16 pages, 4919 KB  
Article
EA-UNET: An Enhanced and Efficient Model for Left-Turn Lane
by Haowei Wang, Haixin Liu, Fei Wang, Xingbin Chen, Baogang Li and Jiang Liu
Sensors 2026, 26(9), 2642; https://doi.org/10.3390/s26092642 - 24 Apr 2026
Viewed by 122
Abstract
Left-turn lanes are critical elements of urban intersections. Accurate and efficient lane detection is essential for the safe navigation of autonomous vehicles. To address the limitations of existing semantic segmentation algorithms—specifically, inadequate detection accuracy, high computational cost, and vulnerability to environmental disturbances—we propose a lightweight deep convolutional neural network named EA-UNet. First, we replace the standard U-Net encoder with EfficientNet-B0 to enhance feature extraction efficiency. Second, we introduce a novel contextual coordination module, termed MP-ASPP, which integrates a Convolutional Block Attention Module (CBAM) to further refine attention mechanisms. Finally, a comprehensive real-world dataset was constructed by collecting videos and images of left-turn waiting areas during real-vehicle testing. Experimental results demonstrate that EA-UNet significantly outperforms the baseline U-Net and other state-of-the-art models, achieving accurate and efficient segmentation of left-turn lanes even in complex scenes. Full article
(This article belongs to the Section Vehicular Sensing)
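The encoder swap that EA-UNet builds on (a U-Net with an EfficientNet-B0 backbone) is available off the shelf; a minimal sketch using the segmentation_models_pytorch package follows. The paper's MP-ASPP context module and CBAM attention are not included here.

```python
# pip install segmentation-models-pytorch
import torch
import segmentation_models_pytorch as smp

# U-Net with an EfficientNet-B0 encoder, the starting point EA-UNet modifies.
model = smp.Unet(
    encoder_name="efficientnet-b0",
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,               # binary mask: left-turn lane vs. background
)
mask_logits = model(torch.randn(1, 3, 256, 256))
print(mask_logits.shape)     # torch.Size([1, 1, 256, 256])
```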
21 pages, 1928 KB  
Article
Road Traffic Anomaly Detection by Human-Attention-Assisted Text–Vision Learning
by Yachuang Chai and Wushouer Silamu
Sensors 2026, 26(9), 2638; https://doi.org/10.3390/s26092638 - 24 Apr 2026
Viewed by 138
Abstract
With the rapid development of society, the number of road vehicles has increased significantly, making traffic accidents an increasingly severe problem. Timely and accurate detection of road traffic anomalies or accidents is crucial for reducing fatalities and alleviating traffic congestion. Consequently, the detection of road traffic anomalies has become a focal point of research in recent years. With the assistance of computer technologies such as deep learning, researchers have developed more accurate and effective detection methods. However, the small proportion of anomaly-prone areas in surveillance video frames, combined with the complex and difficult-to-capture patterns of accidents, presents new challenges for applying deep models to traffic anomaly detection from a surveillance perspective. In light of this, this paper extends the TADS dataset we previously proposed with annotations that support text-assisted video representation learning, a popular approach, in order to develop a more efficient detection method. Utilizing the well-known video-text model CLIP, we construct a detection model that leverages the unique text and eye-gaze annotation data in the TADS dataset to learn anomaly representations more effectively. Experimental results demonstrate the superiority of our model for detecting traffic anomalies from a surveillance perspective, as well as the utility of the text and eye-gaze data included in the dataset. Full article
(This article belongs to the Section Sensing and Imaging)
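The underlying CLIP image-text matching step can be sketched in a few lines with the Hugging Face transformers API. The prompts below are illustrative placeholders, and the paper's fine-tuning on TADS text and eye-gaze annotations is not reproduced.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Zero-shot scoring of a surveillance frame against normal/anomaly prompts.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frame = Image.new("RGB", (224, 224))          # stand-in for a video frame
prompts = ["a normal traffic scene", "a road traffic accident"]
inputs = processor(text=prompts, images=frame, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))
```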
30 pages, 1431 KB  
Article
Feasibility Analysis of Static-Image-Based Traffic Accident Detection Under Domain Shift for Edge-AI Surveillance Systems
by Chien-Chung Wu and Wei-Cheng Chen
Electronics 2026, 15(9), 1803; https://doi.org/10.3390/electronics15091803 - 23 Apr 2026
Viewed by 120
Abstract
Traffic accident detection is a critical component of intelligent transportation systems (ITS), enabling timely incident response and traffic management. While most existing approaches rely on temporal information from video sequences, such methods are not always applicable in resource-constrained surveillance environments. This study investigates the feasibility of detecting traffic accidents from single static images by formulating the task as a binary classification problem. Representative architectures, including Vision Transformer (ViT), Swin Transformer, and ResNet-50, are systematically evaluated on the Car Crash Dataset (CCD) under multiple training configurations. To assess generalization capability, cross-domain evaluation is conducted using an external crash video dataset (ECVD) constructed to approximate real-world deployment conditions. Experimental results show that all models achieve strong performance under in-domain evaluation. However, cross-domain testing reveals substantial performance degradation, particularly in recall, indicating limited generalization capability under domain shift. Qualitative analysis further shows that missed detections are associated with weak visual cues, occlusion, and complex traffic environments, while false positives are caused by visually ambiguous patterns resembling accident scenarios. Unlike prior studies that primarily report performance improvements, this work provides empirical evidence that model behavior in static-image-based accident detection is governed by dataset composition rather than architectural design. Therefore, static-image-based accident detection should be interpreted as a coarse-level screening tool rather than a fully reliable decision-making system. This study highlights the importance of data-centric design and cross-domain evaluation for improving real-world applicability. Full article
(This article belongs to the Section Computer Science & Engineering)
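Formulating accident detection as binary classification over a single frame amounts to replacing a backbone's classification head, as in this torchvision sketch of the ResNet-50 configuration; data loading and training on CCD are omitted.

```python
import torch
import torch.nn as nn
from torchvision import models

# Single-frame accident/no-accident classifier from a pretrained ResNet-50.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 2)   # accident vs. non-accident

logits = model(torch.randn(1, 3, 224, 224))
print(logits.softmax(dim=-1))
```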
33 pages, 17932 KB  
Article
Early Detection of Aggressive Human Behavior in Video Streams Using Deep Spatiotemporal Models
by Aida Issembayeva, Anargul Shaushenova, Ardak Nurpeisova, Aidar Ispussinov, Buldyryk Suleimenova, Anargul Bekenova, Aliya Satybaldieva, Aigul Zholmukhanova and Galiya Mauina
Computers 2026, 15(5), 267; https://doi.org/10.3390/computers15050267 - 23 Apr 2026
Viewed by 184
Abstract
In this paper, we propose a spatiotemporal approach for binary classification of violent and non-violent behavior in real-world settings. The experimental pipeline includes video preprocessing, stratified data splitting, generation of temporally structured clips, and comparative evaluation of baseline models, including a convolutional neural network. We also developed a Residual Adaptive Motion Temporal Binary Heat Network model that combines frame color characteristics, residual motion descriptions, temporal feature fusion, an early risk assessment mechanism, and interpretable localization maps. Experiments were conducted on a balanced dataset of 2000 video clips. The proposed model demonstrated the best early warning performance: a supervision rate of 0.6, an F1 score of 0.9527, and a balanced accuracy of 0.9533. With full supervision, the F1 score was 0.9342, and the area under the receiver operating characteristic curve (AUC) was 0.9871. The practical significance of the work is that the proposed approach can be used as a decision support tool for the preliminary identification of potentially dangerous video fragments with subsequent manual verification, without the assumption of autonomous use in high-risk scenarios. Full article
(This article belongs to the Special Issue Deep Learning and Explainable Artificial Intelligence (2nd Edition))
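The early risk assessment idea (deciding before the clip ends) can be shown independently of the network: a minimal sketch that alerts once the running mean of per-frame violence probabilities crosses a threshold. The threshold and minimum-frame values are assumptions, not the paper's mechanism.

```python
import numpy as np

def early_warning(frame_probs, threshold=0.6, min_frames=8):
    """Return the earliest frame count at which the running mean of
    per-frame violence probabilities reaches the threshold, else None."""
    run = np.cumsum(frame_probs) / np.arange(1, len(frame_probs) + 1)
    for i, p in enumerate(run):
        if i + 1 >= min_frames and p >= threshold:
            return i + 1          # alert fires here, before the clip ends
    return None

probs = np.concatenate([np.full(10, 0.2), np.full(20, 0.9)])  # synthetic clip
print(early_warning(probs))       # 24: fires partway through the clip
```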
23 pages, 3022 KB  
Article
Pedestrian Physiological Response Map Prediction Model for Street Audiovisual Environments Using LSTM Networks
by Jingwen Xing, Xuyuan He, Xinxin Li, Tianci Wang, Siqing Mao and Luyao Li
Buildings 2026, 16(9), 1648; https://doi.org/10.3390/buildings16091648 - 22 Apr 2026
Viewed by 138
Abstract
Existing studies of street-related emotional perception mainly rely on static scene evaluations, which cannot capture the cumulative effects of environmental exposure during continuous walking. To address this limitation, this study proposes a method for predicting pedestrian physiological responses in sequential audiovisual street environments. Four real-world walking routes were selected, with outbound and return directions treated as independent paths, yielding eight paths and 32 valid samples. EEG, ECG, sound pressure level, first-person video, and GPS data were synchronously collected to construct a 1 s multimodal time-series dataset. Pearson correlation, Kendall correlation, and mutual information analyses were used to examine linear, monotonic, and nonlinear relationships between environmental variables and physiological indicators, and the resulting weights were incorporated into a Long Short-Term Memory (LSTM) model for multi-step prediction. Visual elements and noise exposure were the main factors influencing physiological responses. Among the models, the mutual-information-weighted LSTM performed best, achieving an R2 of 0.77 for heart rate variability (RMSSD), whereas prediction of the EEG ratio (β/α and θ/β) remained limited. An additional independent street sample outside the training set was then used to generate a dual-dimensional EEG-ECG physiological response map, demonstrating the model’s potential for identifying emotional risk segments and supporting street-level micro-renewal. Full article
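A shape-level sketch of the weighted-LSTM setup follows: 1 Hz multimodal feature vectors are scaled by per-feature weights (a stand-in for the mutual-information weighting) and fed to an LSTM that predicts several steps ahead. The feature count, hidden size, and horizon are assumptions.

```python
import torch
import torch.nn as nn

n_feat, horizon = 12, 5                     # assumed feature count / steps ahead
mi_weights = torch.rand(n_feat)             # stand-in for mutual-information weights

class StreetLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):                   # x: (B, T, n_feat), 1 Hz series
        out, _ = self.lstm(x * mi_weights)  # apply per-feature weights
        return self.head(out[:, -1])        # predict the next `horizon` seconds

print(StreetLSTM()(torch.randn(2, 60, n_feat)).shape)   # torch.Size([2, 5])
```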
8 pages, 2823 KB  
Proceeding Paper
Innovative Filipino Sign Language Translation and Interpretation with MediaPipe
by Zylwyn A. Alejo, Nathan Cyvel Jann R. Fuentes, Maria Patricia Z. Lungay, Alpha Isabel D. Maniquez, Paul Emmanuel G. Empas and John Paul T. Cruz
Eng. Proc. 2026, 134(1), 75; https://doi.org/10.3390/engproc2026134075 - 22 Apr 2026
Viewed by 295
Abstract
Filipino Sign Language (FSL) serves as a vital means of communication for the Deaf and hard-of-hearing in the Philippines. However, its societal use remains limited due to the scarcity of qualified interpreters and the general lack of FSL literacy among the population. Therefore, this study aims to address the gap between FSL development and automated FSL translation by employing machine learning and computer vision techniques. A model was trained using the FSL-105 dataset, which comprises video clips of gestures related to greetings and colors, and utilized MediaPipe for real-time detection of hand, face, and body landmarks. Through iterative training with transfer learning, the model’s performance improved from an initial accuracy of 80% to a final accuracy of 98.75%. The results demonstrate that the MediaPipe-based model can reliably interpret FSL gestures, positioning it as a potentially accessible assistive tool for the Deaf and hard of hearing community. This technology holds promise for applications in education, healthcare, and public service, offering new opportunities to promote the social inclusion of Filipino Deaf communities through more inclusive communication. Full article
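The landmark extraction step that feeds such a model is reproducible with MediaPipe's Holistic solution; a minimal sketch follows. The trained gesture classifier itself is not reproduced, and the zero-filling convention for unseen body parts is an assumption.

```python
import cv2
import mediapipe as mp
import numpy as np

# Per-frame hand/pose landmark features, as consumed by the FSL classifier.
holistic = mp.solutions.holistic.Holistic(static_image_mode=False)

def frame_to_features(frame_bgr):
    res = holistic.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    def pts(lms, n):  # flatten landmarks; zero-fill when a part is not detected
        return (np.array([[p.x, p.y, p.z] for p in lms.landmark]).ravel()
                if lms else np.zeros(n * 3))
    return np.concatenate([pts(res.left_hand_landmarks, 21),
                           pts(res.right_hand_landmarks, 21),
                           pts(res.pose_landmarks, 33)])

print(frame_to_features(np.zeros((480, 640, 3), np.uint8)).shape)  # (225,)
```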
30 pages, 6186 KB  
Article
CABIF-Net: Robust Confidence-Based Audio-Visual Fusion for Fine-Grained Bird Recognition
by Zilong Li, Yan Zhang, Danju Lv and Yueyun Yu
Biology 2026, 15(8), 661; https://doi.org/10.3390/biology15080661 - 21 Apr 2026
Viewed by 278
Abstract
Fine-grained bird identification is crucial for ecosystem monitoring, species conservation, and habitat assessment. However, in real-world environments, there are challenges such as imbalances in modality quality and interference from background noise. To improve fine-grained audio-visual bird classification under heterogeneous modality conditions, we propose an audio-visual feature fusion framework named CABIF-Net. This framework introduces a confidence-based Top-K mean pooling module to select key frames to optimize the visual representations at the video level. Through a Confidence Calibration module, it dynamically assesses the reliability of the visual and audio modalities and integrates a Bidirectional Inter-modulation Fusion module to achieve controllable cross-modal information interaction. Experiments were conducted on the publicly available SSW60 dataset, characterized by severe noise and imbalance in modality quality, and the self-built Birds21 dataset with balanced modality quality. The experimental results show that the classification accuracies were 85.76% and 96.67%, respectively, outperforming existing unimodal methods and several mainstream fusion strategies. Weight distribution and visualization analyses further indicate that the proposed method can adaptively adjust the modality contributions based on discriminative evidence at the sample level. This study provides an effective framework for fine-grained audio-visual bird species recognition. Full article
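The confidence-based Top-K mean pooling idea is compact enough to sketch directly: keep the K highest-confidence frames and average their features into a video-level representation. K and the feature dimension below are assumptions.

```python
import torch

def topk_mean_pool(frame_feats, conf, k=8):
    """frame_feats: (T, D) per-frame features; conf: (T,) confidence scores.
    Average the features of the k most confident frames."""
    idx = conf.topk(k).indices
    return frame_feats[idx].mean(dim=0)     # (D,) video-level feature

feats, conf = torch.randn(32, 256), torch.rand(32)
print(topk_mean_pool(feats, conf).shape)    # torch.Size([256])
```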
23 pages, 53680 KB  
Article
A Movement Description Language for Functional Training Exercise Analysis
by Lúcia Sousa, Daniel Canedo, Pedro Santos and António Neves
J. Funct. Morphol. Kinesiol. 2026, 11(2), 162; https://doi.org/10.3390/jfmk11020162 - 21 Apr 2026
Viewed by 152
Abstract
Objective: Functional training exercises involve complex multi-joint movements that challenge traditional rule-based or data-driven recognition systems. This paper introduces a Movement Description Language (MDL) designed to formally represent, analyze, and evaluate such exercises using camera-based pose estimation and interpretable, composable structures. Methods: The proposed MDL models each exercise as a finite-state machine defined by pose-derived angle proxy transitions, allowing movements to be described in a modular and reusable way. The framework is demonstrated with MediaPipe landmark extraction from monocular video, although the MDL remains compatible with any pose estimation algorithm, and focuses on exercise phase detection and repetition counting. Experimental validation was conducted on a dataset of 1513 videos of 12 functional exercises (squats, deadlifts, lunges, shoulder presses, planks, push-ups, pull-ups, bent-over rows, box jumps, thrusters, overhead squats, and burpees) obtained from public pose datasets, competition footage, and recordings of 9 participants in real-world environments. Results: Automated repetition counts were compared against manually annotated ground truth, showing an overall repetition-counting accuracy of 97.2%, with a mean per-exercise accuracy of 98.8% (range 95–100%). The MDL successfully handled both simple and compound exercises, maintaining reliable phase detection despite variations in execution speed, camera perspective, and environmental conditions. Conclusions: The system was implemented using real-time pose estimation to demonstrate the practical execution of the MDL framework. The proposed MDL provides a transparent, extensible, and computationally efficient framework for functional exercise analysis. By bridging human-readable movement semantics with executable motion logic, it enables interpretable automatic repetition counting and phase detection, offering an alternative to black-box recognition approaches. The results support its potential for scalable deployment in training, monitoring, and movement analysis applications. The proposed system is not intended for biomechanical measurement or clinical-grade kinematic analysis, but rather for interpretable modeling of exercise structure and repetition detection using approximate pose-derived signals. Full article
(This article belongs to the Section Kinesiology and Biomechanics)
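The finite-state-machine idea at the core of the MDL can be illustrated with a toy example: a squat modeled as two states over a knee-angle proxy, with one repetition per up-down-up cycle. The angle thresholds are assumptions, not the MDL's actual definitions.

```python
import numpy as np

def count_reps(knee_angle, down_thr=100.0, up_thr=160.0):
    """Two-state machine over a knee-angle signal (degrees):
    up -> down when the angle drops below down_thr,
    down -> up (one rep) when it rises above up_thr."""
    state, reps = "up", 0
    for a in knee_angle:
        if state == "up" and a < down_thr:
            state = "down"
        elif state == "down" and a > up_thr:
            state, reps = "up", reps + 1
    return reps

t = np.linspace(0, 6 * np.pi, 300)
angles = 130 + 40 * np.cos(t)          # three synthetic squat cycles
print(count_reps(angles))              # 3
```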
14 pages, 127365 KB  
Article
CGS-BR: Construction and Benchmarking of a Respiratory Behavior Dataset for the Chinese Giant Salamander
by Dingwei Mao, Yan Zhou, Maochun Wang, Chenyang Shi, Yuanqiong Chen and Qinghua Luo
Animals 2026, 16(8), 1272; https://doi.org/10.3390/ani16081272 - 21 Apr 2026
Viewed by 195
Abstract
The Chinese giant salamander (Andrias davidianus) is a nationally protected species in China, and its respiratory behavior serves as a key indicator of its physiological state, health status, and biological rhythm. However, research on intelligent monitoring of its respiratory behavior remains limited due to several challenges, including the species’ nocturnal habits, resulting in low image contrast and poor quality in dark environments; extremely subtle breathing movements; and high-cost manual annotation, leading to a scarcity of high-quality annotated visual data. These factors severely constrain the application of deep learning techniques in this field. To support research on respiratory behavior monitoring in the Chinese giant salamander, this study constructs and releases the CGS-BR dataset, which is the first vision-based dataset dedicated specifically to respiratory behavior detection in this species. The dataset was collected under controlled simulated breeding conditions and consists of 1732 images extracted from 215 high-definition video clips. Following a standardized procedure, each complete respiratory cycle is manually annotated into four stages: head-up, diving, exhalation, and inhalation. To validate the effectiveness of this dataset, this study selects YOLOv8n as the baseline model, which balances detection accuracy, speed, and parameter count, enabling efficient giant salamander respiratory detection under limited resources. By comparing it with several representative models, we provide a reliable evaluation of the dataset’s applicability. CGS-BR aims to provide fundamental data support for research on respiratory monitoring in the Chinese giant salamander, laying the foundation for subsequent applications in conservation management, captive breeding, health monitoring, and early disease warning. Full article
(This article belongs to the Special Issue Artificial Intelligence as a Useful Tool in Behavioural Studies)
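A baseline in the spirit of the paper's YOLOv8n benchmark can be set up with the ultralytics package as sketched below; the data file name and hyperparameters are placeholders, not the authors' released configuration.

```python
# pip install ultralytics
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                    # the paper's baseline architecture
# "cgs_br.yaml" is a placeholder data config listing the image splits and the
# four respiratory-stage classes (head-up, diving, exhalation, inhalation).
model.train(data="cgs_br.yaml", epochs=100, imgsz=640)
metrics = model.val()
print(metrics.box.map50)                      # mAP@0.5 on the validation split
```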
27 pages, 3995 KB  
Article
Video-Based Arabic Sign Language Recognition with Mediapipe and Deep Learning Techniques
by Dana El-Rushaidat, Nour Almohammad, Raine Yeh and Kinda Fayyad
J. Imaging 2026, 12(4), 177; https://doi.org/10.3390/jimaging12040177 - 20 Apr 2026
Viewed by 320
Abstract
This paper addresses the critical communication barrier experienced by deaf and hearing-impaired individuals in the Arab world through the development of an affordable, video-based Arabic Sign Language (ArSL) recognition system. Designed for broad accessibility, the system eliminates specialized hardware by leveraging standard mobile or laptop cameras. Our methodology employs Mediapipe for real-time extraction of hand, face, and pose landmarks from video streams. These anatomical features are then processed by a hybrid deep learning model integrating Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), specifically Bidirectional Long Short-Term Memory (BiLSTM) layers. The CNN component captures spatial features, such as intricate hand shapes and body movements, within individual frames. Concurrently, BiLSTMs model long-term temporal dependencies and motion trajectories across consecutive frames. This integrated CNN-BiLSTM architecture is critical for generating a comprehensive spatiotemporal representation, enabling accurate differentiation of complex signs where meaning relies on both static gestures and dynamic transitions, thus preventing misclassification that CNN-only or RNN-only models would incur. Rigorously evaluated on the author-created JUST-SL dataset and the publicly available KArSL dataset, the system achieved 96% overall accuracy for JUST-SL and an impressive 99% for KArSL. These results demonstrate the system’s superior accuracy compared to previous research, particularly for recognizing full Arabic words, thereby significantly enhancing communication accessibility for the deaf and hearing-impaired community. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
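The CNN-BiLSTM pattern described above (per-frame spatial features, then bidirectional temporal modeling) can be sketched structurally in PyTorch; all layer sizes and the class count below are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CnnBiLstm(nn.Module):
    """Per-frame CNN features -> BiLSTM across time -> sign class."""
    def __init__(self, n_classes=64, feat=128):   # class count assumed
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat),
        )
        self.bilstm = nn.LSTM(feat, feat, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * feat, n_classes)

    def forward(self, clips):                    # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        f = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.bilstm(f)
        return self.fc(seq[:, -1])               # classify the whole sign

print(CnnBiLstm()(torch.randn(2, 16, 3, 64, 64)).shape)  # torch.Size([2, 64])
```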
20 pages, 45555 KB  
Article
FAIRHiveFrames-1K: A Public FAIR Dataset of 1265 Annotated Hive Frame Images with Preliminary YOLOv8 and YOLOv11 Baselines
by Vladimir Kulyukin, Reagan Hill and Aleksey Kulyukin
Sensors 2026, 26(8), 2518; https://doi.org/10.3390/s26082518 - 19 Apr 2026
Viewed by 163
Abstract
In precision apiculture, the portable digital camera is a cost-effective sensor for capturing hive images or videos used to quantify different colony variables. Openly accessible, well-annotated, interoperable cell-level image datasets are still the exception rather than the norm. This shortage constitutes a major barrier to AI-driven approaches aimed at automating image-based comb analysis. In this article, we present FAIRHiveFrames-1K, a publicly available dataset of 1265 annotated hive frame images (1920 × 1080 PNG) designed to facilitate research in AI-intensive image-based comb analysis automation. The dataset, derived from a 2013–2022 U.S. Department of Agriculture–Agricultural Research Service multi-sensor research reservoir, includes 124,669 annotated regions of interest for seven biologically meaningful categories consistent with comb analysis literature and standard hive inspection protocols. FAIRHiveFrames-1K is curated according to FAIR principles (Findable, Accessible, Interoperable, Reusable) and distributed under CC-BY 4.0 with standard annotation formats, fixed training and validation splits, and reproducible benchmarking artifacts. To establish preliminary baseline performance, we iteratively tuned four YOLO architectures (YOLOv8n, YOLOv8s, YOLOv11n, YOLOv11s) under a shared tuning protocol over the period of dataset growth. Full article
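The shared tuning protocol across the four architectures maps naturally onto a loop over ultralytics model checkpoints, as in this hypothetical sketch; the data file, epoch count, and image size are placeholders, not the paper's protocol.

```python
# pip install ultralytics
from ultralytics import YOLO

# "hiveframes.yaml" stands in for a data config pointing at the dataset's
# fixed training/validation splits and seven annotation categories.
results = {}
for arch in ("yolov8n.pt", "yolov8s.pt", "yolo11n.pt", "yolo11s.pt"):
    model = YOLO(arch)
    model.train(data="hiveframes.yaml", epochs=50, imgsz=1280)
    results[arch] = model.val().box.map50     # mAP@0.5 per architecture
print(results)
```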
20 pages, 6708 KB  
Article
Nighttime Image Dehazing for Urban Monitoring via a Mixed-Norm Variational Model
by Xianglei Liu, Yahao Wu, Runjie Wang and Yuhang Liu
Appl. Sci. 2026, 16(8), 3929; https://doi.org/10.3390/app16083929 - 17 Apr 2026
Viewed by 234
Abstract
As modern urban systems advance, video surveillance has become indispensable for ensuring high-quality urban development. Nighttime images acquired in urban monitoring scenarios are often degraded by haze and non-uniform illumination, resulting in reduced visibility, color distortion, and blurred structural boundaries. To address these issues, this paper proposes a nighttime image dehazing framework that combines mixed-norm variational atmospheric-light estimation with adaptive boundary-constrained transmission refinement. Specifically, an L2-Lp mixed-norm regularization model is introduced to improve atmospheric-light estimation under complex nighttime illumination and suppress halo diffusion and color distortion around strong light sources. In addition, an adaptive boundary-constrained transmission refinement strategy with weighted soft-threshold shrinkage is developed to reduce residual artifacts while preserving structural edges. Experimental results on synthetic and real nighttime haze datasets demonstrate that the proposed method consistently outperforms representative state-of-the-art methods in both visual quality and quantitative metrics, showing superior robustness and restoration performance for nighttime urban monitoring applications. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
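The recovery step common to dehazing methods of this family is the inversion of the atmospheric scattering model I = J*t + A*(1 - t). The sketch below applies it with fixed placeholders where the paper's mixed-norm atmospheric-light estimate and refined transmission map would go.

```python
import numpy as np

def recover_scene(I, A, t, t_min=0.1):
    """Invert I = J*t + A*(1 - t) for the scene radiance J,
    lower-bounding t to avoid amplifying noise in dense haze."""
    t = np.clip(t, t_min, 1.0)[..., None]
    return np.clip((I - A) / t + A, 0.0, 1.0)

I = np.random.rand(4, 4, 3)                   # hazy nighttime image (placeholder)
A = np.array([0.8, 0.8, 0.9])                 # atmospheric light (would be estimated)
t = np.full((4, 4), 0.6)                      # transmission map (would be refined)
print(recover_scene(I, A, t).shape)           # (4, 4, 3)
```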