Search Results (307)

Search Parameters:
Keywords = video analytics

19 pages, 5725 KB  
Article
Real-Time 3D Scene Understanding for Road Safety: Depth Estimation and Object Detection for Autonomous Vehicle Awareness
by Marcel Simeonov, Andrei Kurdiumov and Milan Dado
Vehicles 2026, 8(2), 28; https://doi.org/10.3390/vehicles8020028 - 2 Feb 2026
Viewed by 115
Abstract
Accurate depth perception is vital for autonomous driving and roadside monitoring. Traditional stereo vision methods are cost-effective but often fail under challenging conditions such as low texture, reflections, or complex lighting. This work presents a perception pipeline built around FoundationStereo, a Transformer-based stereo depth estimation model. At low resolutions, FoundationStereo achieves real-time performance (up to 26 FPS) on embedded platforms like NVIDIA Jetson AGX Orin with TensorRT acceleration and power-of-two input sizes, enabling deployment in roadside cameras and in-vehicle systems. For Full HD stereo pairs, the same model delivers dense and precise environmental scans, complementing LiDAR while maintaining a high level of accuracy. YOLO11 object detection and segmentation is deployed in parallel for object extraction. Detected objects are removed from depth maps generated by FoundationStereo prior to point cloud generation, producing cleaner 3D reconstructions of the environment. This approach demonstrates that advanced stereo networks can operate efficiently on embedded hardware. Rather than replacing LiDAR or radar, it complements existing sensors by providing dense depth maps in situations where other sensors may be limited. By improving depth completeness, robustness, and enabling filtered point clouds, the proposed system supports safer navigation, collision avoidance, and scalable roadside infrastructure scanning for autonomous mobility. Full article
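
A minimal sketch of the object-removal step described above, assuming a metric depth map and boolean instance masks (e.g., from a YOLO11 segmentation model); the function, mask layout, and intrinsics are illustrative assumptions, not the authors' code:

```python
import numpy as np

def filtered_point_cloud(depth, object_masks, fx, fy, cx, cy):
    """Back-project a depth map into a 3D point cloud, excluding detected objects.

    depth: (H, W) array of metric depths from the stereo network.
    object_masks: iterable of (H, W) boolean masks for detected/segmented objects.
    fx, fy, cx, cy: pinhole camera intrinsics (assumed known from calibration).
    """
    keep = depth > 0                        # valid-depth pixels only
    for mask in object_masks:               # drop pixels covered by detected objects
        keep &= ~mask
    v, u = np.nonzero(keep)
    z = depth[v, u]
    x = (u - cx) * z / fx                   # standard pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)      # (N, 3) points of the static scene
```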

26 pages, 3401 KB  
Article
Toward an Integrated IoT–Edge Computing Framework for Smart Stadium Development
by Nattawat Pattarawetwong, Charuay Savithi and Arisaphat Suttidee
J. Sens. Actuator Netw. 2026, 15(1), 15; https://doi.org/10.3390/jsan15010015 - 1 Feb 2026
Viewed by 168
Abstract
Large sports stadiums require robust real-time monitoring due to high crowd density, complex spatial configurations, and limited network infrastructure. This research evaluates a hybrid edge–cloud architecture implemented in a national stadium in Thailand. The proposed framework integrates diverse surveillance subsystems, including automatic number plate recognition, face recognition, and panoramic cameras, with edge-based processing to enable real-time situational awareness during high-attendance events. A simulation based on the stadium’s physical layout and operational characteristics is used to analyze coverage patterns, processing locations, and network performance under realistic event scenarios. The results show that geometry-informed sensor deployment ensures continuous visual coverage and minimizes blind zones without increasing camera density. Furthermore, relocating selected video processing tasks from the cloud to the edge reduces uplink bandwidth requirements by approximately 50–75%, depending on the processing configuration, and stabilizes data transmission during peak network loads. These findings suggest that processing location should be considered a primary architectural design factor in smart stadium systems. The combination of edge-based processing with centralized cloud coordination offers a practical model for scalable, safety-oriented monitoring solutions in high-density public venues. Full article
(This article belongs to the Section Big Data, Computing and Artificial Intelligence)
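
A back-of-the-envelope illustration of the edge-vs-cloud uplink trade-off described above; the camera count, bitrates, and forwarding fraction are assumptions for intuition, not the paper's measured values:

```python
# Assumed deployment: with cloud processing every camera uploads its full stream;
# with edge processing cameras upload compact analytics messages, and only a
# fraction of streams is still forwarded for cloud-side review.
CAMERAS = 40
STREAM_MBPS = 4.0        # assumed per-camera video bitrate
METADATA_MBPS = 0.05     # assumed per-camera analytics bitrate (counts, alerts)
FORWARD_FRACTION = 0.25  # assumed share of streams still sent upstream

cloud_uplink = CAMERAS * STREAM_MBPS
edge_uplink = CAMERAS * (METADATA_MBPS + FORWARD_FRACTION * STREAM_MBPS)
print(f"cloud: {cloud_uplink:.0f} Mbps  edge: {edge_uplink:.0f} Mbps  "
      f"reduction: {100 * (1 - edge_uplink / cloud_uplink):.0f}%")
# ~74% here, within the 50-75% range the study reports.
```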

20 pages, 501 KB  
Article
Travel Influencers and Tourism Marketing: Content Strategies, Engagement and Transparency in Destination Promotion
by Elena Fernández-Blanco, Mercedes Ramos Gutiérrez and Sandra Lizzeth Hernández Zelaya
Tour. Hosp. 2026, 7(2), 34; https://doi.org/10.3390/tourhosp7020034 - 31 Jan 2026
Viewed by 241
Abstract
Background: Influencer marketing has become one of the most effective strategies in digital communication due to its capacity to generate trust, credibility and endorsement within segmented online communities. Within the tourism sector, travel influencers have been progressively integrated as key agents in destination and brand promotion, contributing to both the construction of tourism-related perceptions and travel decision-making. This study aims to analyse how travel influencers communicate and promote tourist destinations, focusing on their profiles, content formats, commercial transparency and audience engagement. Methods: The research is based on a quantitative content analysis of publications by leading Spanish travel influencers identified through the Forbes Best Content Creators 2025 ranking. The observation period covered March to July 2025. Analysis was structured around four analytical blocks comprising 17 variables related to influencer profile, format and content, commercial transparency and ethics, and interaction. Results: The results reveal consistent behavioural patterns associated with gender, destination type and narrative style. Male influencers are more frequently linked to adventure-oriented storytelling and natural landscapes, whereas female influencers tend to emphasise urban and cultural experiences. Short-form video emerges as the dominant format, generating higher interaction levels, while engagement proves to be a more informative indicator of effectiveness than follower count. Conclusions: The findings underscore the importance of prioritising specialisation, narrative coherence, authenticity and transparency when destinations and brands integrate influencers into their communication strategies. Full article
(This article belongs to the Special Issue Digital Transformation in Hospitality and Tourism)

32 pages, 27435 KB  
Review
Artificial Intelligence in Adult Cardiovascular Medicine and Surgery: Real-World Deployments and Outcomes
by Dimitrios E. Magouliotis, Noah Sicouri, Laura Ramlawi, Massimo Baudo, Vasiliki Androutsopoulou and Serge Sicouri
J. Pers. Med. 2026, 16(2), 69; https://doi.org/10.3390/jpm16020069 - 30 Jan 2026
Viewed by 241
Abstract
Artificial intelligence (AI) is rapidly reshaping adult cardiac surgery, enabling more accurate diagnostics, personalized risk assessment, advanced surgical planning, and proactive postoperative care. Preoperatively, deep-learning interpretation of ECGs, automated CT/MRI segmentation, and video-based echocardiography improve early disease detection and refine risk stratification beyond conventional tools such as EuroSCORE II and the STS calculator. AI-driven 3D reconstruction, virtual simulation, and augmented-reality platforms enhance planning for structural heart and aortic procedures by optimizing device selection and anticipating complications. Intraoperatively, AI augments robotic precision, stabilizes instrument motion, identifies anatomy through computer vision, and predicts hemodynamic instability via real-time waveform analytics. Integration of the Hypotension Prediction Index into perioperative pathways has already demonstrated reductions in ventilation duration and improved hemodynamic control. Postoperatively, machine-learning early-warning systems and physiologic waveform models predict acute kidney injury, low-cardiac-output syndrome, respiratory failure, and sepsis hours before clinical deterioration, while emerging closed-loop control and remote monitoring tools extend individualized management into the recovery phase. Despite these advances, current evidence is limited by retrospective study designs, heterogeneous datasets, variable transparency, and regulatory and workflow barriers. Nonetheless, rapid progress in multimodal foundation models, digital twins, hybrid OR ecosystems, and semi-autonomous robotics signals a transition toward increasingly precise, predictive, and personalized cardiac surgical care. With rigorous validation and thoughtful implementation, AI has the potential to substantially improve safety, decision-making, and outcomes across the entire cardiac surgical continuum. Full article

26 pages, 2167 KB  
Article
AI-Powered Service Robots for Smart Airport Operations: Real-World Implementation and Performance Analysis in Passenger Flow Management
by Eleni Giannopoulou, Panagiotis Demestichas, Panagiotis Katrakazas, Sophia Saliverou and Nikos Papagiannopoulos
Sensors 2026, 26(3), 806; https://doi.org/10.3390/s26030806 - 25 Jan 2026
Viewed by 325
Abstract
Rising air travel demand necessitates innovative solutions to enhance passenger experience while optimizing airport operational efficiency. This paper presents the pilot-scale implementation and evaluation of an AI-powered service robot ecosystem integrated with thermal cameras and 5G wireless connectivity at Athens International Airport. The system addresses critical challenges in passenger flow management through real-time crowd analytics, congestion detection, and personalized robotic assistance. Eight strategically deployed thermal cameras monitor passenger movements across check-in areas, security zones, and departure entrances while employing privacy-by-design principles through thermal imaging technology that reduces personally identifiable information capture. A humanoid service robot, equipped with Robot Operating System navigation capabilities and natural language processing interfaces, provides real-time passenger assistance including flight information, wayfinding guidance, and congestion avoidance recommendations. The wi.move platform serves as the central intelligence hub, processing video streams through advanced computer vision algorithms to generate actionable insights including passenger count statistics, flow rate analysis, queue length monitoring, and anomaly detection. A formal trial evaluation conducted on 10 April 2025, with extended operational monitoring from April to June 2025, demonstrated strong technical performance: an application round-trip latency of 42.9 milliseconds, perfect (100%) service reliability and availability, and passenger satisfaction scores exceeding 4.3/5 across all evaluated dimensions. Results indicate promising potential for scalable deployment across major international airports, with identified requirements for sixth-generation network capabilities to support enhanced multi-robot coordination and advanced predictive analytics functionalities in future implementations. Full article
(This article belongs to the Section Sensors and Robotics)

18 pages, 3987 KB  
Article
Low-Latency Autonomous Surveillance in Defense Environments: A Hybrid RTSP-WebRTC Architecture with YOLOv11
by Juan José Castro-Castaño, William Efrén Chirán-Alpala, Guillermo Alfonso Giraldo-Martínez, José David Ortega-Pabón, Edison Camilo Rodríguez-Amézquita, Diego Ferney Gallego-Franco and Yeison Alberto Garcés-Gómez
Computers 2026, 15(1), 62; https://doi.org/10.3390/computers15010062 - 16 Jan 2026
Viewed by 363
Abstract
This article presents the Intelligent Monitoring System (IMS), an AI-assisted, low-latency surveillance platform designed for defense environments. The study addresses the need for real-time autonomous situational awareness by integrating high-speed video transmission with advanced computer vision analytics in constrained network settings. The IMS employs a hybrid transmission architecture based on RTSP for ingestion and WHEP/WebRTC for distribution, orchestrated via MediaMTX, with the objective of achieving end-to-end latencies below one second. The methodology includes a comparative evaluation of video streaming protocols (JPEG-over-WebSocket, HLS, WebRTC, etc.) and AI frameworks, alongside the modular architectural design and prolonged experimental validation. The detection module integrates YOLOv11 models fine-tuned on the VisDrone dataset to optimize performance for small objects, aerial views, and dense scenes. Experimental results, obtained through over 300 h of operational tests using IP cameras and aerial platforms, confirmed the stability and performance of the chosen architecture, maintaining latencies close to 500 ms. The YOLOv11 family was adopted as the primary detection framework, providing an effective trade-off between accuracy and inference performance in real-time scenarios. The YOLOv11n model was trained and validated on a Tesla T4 GPU, and YOLOv11m will be validated on the target platform in subsequent experiments. The findings demonstrate the technical viability and operational relevance of the IMS as a core component for autonomous surveillance systems in defense, satisfying strict requirements for speed, stability, and robust detection of vehicles and pedestrians. Full article

22 pages, 1147 KB  
Article
Toward Objective Assessment of Positive Affect: EEG and HRV Indices Distinguishing High and Low Arousal Positive Affect
by Yuri Nakagawa, Tipporn Laohakangvalvit, Toshitaka Matsubara, Keiko Tagai and Midori Sugaya
Sensors 2026, 26(2), 521; https://doi.org/10.3390/s26020521 - 13 Jan 2026
Viewed by 231
Abstract
Positive affect comprises distinct affective states that differ in arousal level, such as high-arousal positive affect (HAPA) and low-arousal positive affect (LAPA), which have been shown to be associated with different effects and effective contexts. In studies of positive affect, it is therefore important not only to assess overall positivity but also to distinguish between different types of positive affect. Existing assessments rely mainly on self-reports, which may be unreliable for individuals with limited self-report abilities. The aim of this study was to examine whether physiological indices can discriminate between HAPA and LAPA. Participants were presented with eight video stimuli designed to elicit either HAPA or LAPA, and self-report measures were used as manipulation checks to define the affective conditions, while heart rate variability (HRV) and electroencephalography (EEG) were recorded. HRV indices did not show significant differences between the two affective conditions. In contrast, analyses of EEG relative power revealed significant differences between the HAPA and LAPA conditions. These findings demonstrate that, under the present experimental conditions, physiological differences between low- and high-arousal positive affect can be captured in EEG signals using relative power, a simple and reproducible analytical index, whereas no such differences were observed in HRV indices. Full article
(This article belongs to the Special Issue Feature Papers in Smart Sensing and Intelligent Sensors 2025)
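
A minimal sketch of the EEG relative-power feature highlighted above, assuming Welch PSD estimation and conventional band edges (the study's exact bands and preprocessing may differ):

```python
import numpy as np
from scipy.signal import welch

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}  # assumed edges

def relative_band_power(eeg, fs):
    """Relative power per band for one EEG channel: band power / total power."""
    freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs)   # Welch PSD estimate
    total = np.trapz(psd, freqs)
    return {name: np.trapz(psd[(freqs >= lo) & (freqs < hi)],
                           freqs[(freqs >= lo) & (freqs < hi)]) / total
            for name, (lo, hi) in BANDS.items()}
```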

21 pages, 75033 KB  
Article
From Stones to Screen: Open-Source 3D Modeling and AI Video Generation for Reconstructing the Coëby Necropolis
by Jean-Baptiste Barreau and Philippe Gouézin
Heritage 2026, 9(1), 24; https://doi.org/10.3390/heritage9010024 - 10 Jan 2026
Viewed by 447
Abstract
This study presents a comprehensive digital workflow for the archaeological investigation and heritage enhancement of the Coëby megalithic necropolis (Brittany, France). The necropolis dates to the Middle Neolithic, between the 4th and 3rd millennia BC, a chronology established through stratigraphy, material culture, and radiocarbon dating. Focusing on cairns TRED 8 and TRED 9, two excavation units, we combined field archaeology, photogrammetry, and topographic data with open-source 3D geometric modeling to reconstruct the monuments’ original volumes and test construction hypotheses. The methodology leveraged the free software Blender (version 3.0.1) and its Bagapie extension for the procedural simulation of lithic block distribution within the tumular masses, ensuring both metric accuracy and realistic texturing. Beyond static reconstruction, the research explores innovative dynamic and narrative visualization techniques. We employed the FILM model for smooth video interpolation of the construction sequences and utilized the Wan 2.1 AI model to generate immersive video scenes of Neolithic life based on archaeologically informed prompts. The entire process, from data acquisition to final visualization, was conducted using free and open-source tools, guaranteeing full methodological reproducibility and alignment with open science principles. Our results include detailed 3D reconstructions that elucidate the complex architectural sequences of the cairns, as well as dynamic visualizations that enhance the understanding of their construction logic. This study demonstrates the analytical potential of open-source 3D modeling and AI-based visualization for megalithic archaeology. Full article
(This article belongs to the Topic 3D Documentation of Natural and Cultural Heritage)

27 pages, 3118 KB  
Article
Development of a Measurement Procedure for Emotional States Detection Based on Single-Channel Ear-EEG: A Proof-of-Concept Study
by Marco Arnesano, Pasquale Arpaia, Simone Balatti, Gloria Cosoli, Matteo De Luca, Ludovica Gargiulo, Nicola Moccaldi, Andrea Pollastro, Theodore Zanto and Antonio Forenza
Sensors 2026, 26(2), 385; https://doi.org/10.3390/s26020385 - 7 Jan 2026
Cited by 1 | Viewed by 535
Abstract
Real-time emotion monitoring is increasingly relevant in healthcare, automotive, and workplace applications, where adaptive systems can enhance user experience and well-being. This study investigates the feasibility of classifying emotions along the valence–arousal dimensions of the Circumplex Model of Affect using EEG signals acquired from a single mastoid channel positioned near the ear. Twenty-four participants viewed emotion-eliciting videos and self-reported their affective states using the Self-Assessment Manikin. EEG data were recorded with an OpenBCI Cyton board and both spectral and temporal features (including power in multiple frequency bands and entropy-based complexity measures) were extracted from the single ear-channel. A dual analytical framework was adopted: classical statistical analyses (ANOVA, Mann–Whitney U) and artificial neural networks combined with explainable AI methods (Gradient × Input, Integrated Gradients) were used to identify features associated with valence and arousal. Results confirmed the physiological validity of single-channel ear-EEG, and showed that absolute β- and γ-band power, spectral ratios, and entropy-based metrics consistently contributed to emotion classification. Overall, the findings demonstrate that reliable and interpretable affective information can be extracted from minimal EEG configurations, supporting their potential for wearable, real-world emotion monitoring. Nonetheless, practical considerations—such as long-term comfort, stability, and wearability of ear-EEG devices—remain important challenges and motivate future research on sustained use in naturalistic environments. Full article
(This article belongs to the Section Wearables)
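
As an example of the entropy-based complexity measures mentioned above, a sketch of normalized spectral entropy, one common choice (the study's exact entropy features may differ):

```python
import numpy as np
from scipy.signal import welch

def spectral_entropy(signal, fs):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    _, psd = welch(signal, fs=fs, nperseg=2 * fs)
    p = psd / psd.sum()      # treat the PSD as a probability distribution
    p = p[p > 0]             # avoid log(0)
    return -np.sum(p * np.log2(p)) / np.log2(psd.size)
```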

19 pages, 1187 KB  
Article
Dual-Pipeline Machine Learning Framework for Automated Interpretation of Pilot Communications at Non-Towered Airports
by Abdullah All Tanvir, Chenyu Huang, Moe Alahmad, Chuyang Yang and Xin Zhong
Aerospace 2026, 13(1), 32; https://doi.org/10.3390/aerospace13010032 - 28 Dec 2025
Viewed by 334
Abstract
Accurate estimation of aircraft operations, such as takeoffs and landings, is critical for airport planning and resource allocation, yet it remains particularly challenging at non-towered airports, where no dedicated surveillance infrastructure exists. Existing solutions, including video analytics, acoustic sensors, and transponder-based systems, are often costly, incomplete, or unreliable in environments with mixed traffic and inconsistent radio usage, highlighting the need for a scalable, infrastructure-free alternative. To address this gap, this study proposes a novel dual-pipeline machine learning framework that classifies pilot radio communications using both textual and spectral features to infer operational intent. A total of 2489 annotated pilot transmissions collected from a U.S. non-towered airport were processed through automatic speech recognition (ASR) and Mel-spectrogram extraction. We benchmarked multiple traditional classifiers and deep learning models, including ensemble methods, long short-term memory (LSTM) networks, and convolutional neural networks (CNNs), across both feature pipelines. Results show that spectral features paired with deep architectures consistently achieved the highest performance, with F1-scores exceeding 91% despite substantial background noise, overlapping transmissions, and speaker variability. These findings indicate that operational intent can be inferred reliably from existing communication audio alone, offering a practical, low-cost path toward scalable aircraft operations monitoring and supporting emerging virtual tower and automated air traffic surveillance applications. Full article
(This article belongs to the Special Issue AI, Machine Learning and Automation for Air Traffic Control (ATC))
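
A minimal sketch of the spectral feature pipeline described above, using librosa for log-Mel extraction; the sampling rate and Mel-band count are assumptions, not the paper's configuration:

```python
import numpy as np
import librosa

def mel_features(wav_path, sr=16000, n_mels=64):
    """Log-Mel spectrogram of one recorded pilot transmission, as CNN input."""
    y, _ = librosa.load(wav_path, sr=sr, mono=True)              # resample to a common rate
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)                  # (n_mels, frames)
```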

16 pages, 15460 KB  
Article
A Parallel Algorithm for Background Subtraction: Modeling Lognormal Pixel Intensity Distributions on GPUs
by Sotirios Diamantas, Ethan Reaves and Bryant Wyatt
Mathematics 2026, 14(1), 43; https://doi.org/10.3390/math14010043 - 22 Dec 2025
Viewed by 265
Abstract
Background subtraction is a core preprocessing step for video analytics, enabling downstream tasks such as detection, tracking, and scene understanding in applications ranging from surveillance to transportation. However, real-time deployment remains challenging when illumination changes, shadows, and dynamic backgrounds produce heavy-tailed pixel variations that are difficult to capture with simple Gaussian assumptions. In this work, we propose a fully parallel GPU implementation of a per-pixel background model that represents temporal pixel deviations with lognormal distributions. During a short training phase, a circular buffer of n frames (as small as n=3) is used to estimate, for every pixel, robust log-domain parameters (μ,σ). During testing, each incoming frame is compared against a robust reference (per-pixel median), and a lognormal cumulative distribution function yields a probabilistic foreground score that is thresholded to produce a binary mask. We evaluate the method on multiple videos under varying illumination and motion conditions and compare qualitatively with widely used mixture of Gaussians baselines (MOG and MOG2). Our method achieves, on average, 87 fps with a buffer size of 10, and reaches about 188 fps with a buffer size of 3, on an NVIDIA 3080 Ti. Finally, we discuss the accuracy–latency trade-off with larger buffers. Full article
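
A CPU sketch of the per-pixel model described above: deviations from a per-pixel median reference are modeled in the log domain, and a lognormal CDF turns each new deviation into a foreground score. The paper's implementation is a parallel GPU kernel; the epsilon and threshold values here are illustrative:

```python
import numpy as np
from scipy.stats import lognorm

EPS = 1e-6  # avoids log(0); illustrative choice

def train(frames):
    """frames: (n, H, W) buffer, n as small as 3. Returns the per-pixel median
    reference and log-domain parameters (mu, sigma) of absolute deviations."""
    ref = np.median(frames, axis=0)
    log_dev = np.log(np.abs(frames - ref) + EPS)
    return ref, log_dev.mean(axis=0), log_dev.std(axis=0) + EPS

def foreground_mask(frame, ref, mu, sigma, thresh=0.95):
    """If log(deviation) ~ N(mu, sigma), then deviation ~ lognorm(s=sigma, scale=e^mu);
    pixels whose deviation is improbably large for the background become foreground."""
    dev = np.abs(frame - ref) + EPS
    score = lognorm.cdf(dev, s=sigma, scale=np.exp(mu))
    return score > thresh  # binary mask
```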

24 pages, 27907 KB  
Article
Efficient Object-Related Scene Text Grouping Pipeline for Visual Scene Analysis in Large-Scale Investigative Data
by Enrique Shinohara, Jorge García, Luis Unzueta and Peter Leškovský
Electronics 2026, 15(1), 12; https://doi.org/10.3390/electronics15010012 - 19 Dec 2025
Viewed by 314
Abstract
Law Enforcement Agencies (LEAs) typically analyse vast collections of media files, extracting visual information that helps them to advance investigations. While recent advancements in deep learning-based computer vision algorithms have revolutionised the ability to detect multi-class objects and text instances (characters, words, numbers) from in-the-wild scenes, their association remains relatively unexplored. Previous studies focus on clustering text given its semantic relationship or layout, rather than its relationship with objects. In this paper, we present an efficient, modular pipeline for contextual scene text grouping with three complementary strategies: 2D planar segmentation, multi-class instance segmentation and promptable segmentation. The strategies address common scenes where related text instances frequently share the same 2D planar surface and object (vehicle, banner, etc.). Evaluated on a custom dataset of 1100 images, the overall grouping performance remained consistently high across all three strategies (B-Cubed F1 92–95%; Pairwise F1 80–82%), with adjusted Rand indices between 0.08 and 0.23. Our results demonstrate clear trade-offs between computational efficiency and contextual generalisation, where geometric methods offer reliability, semantic approaches provide scalability and class-agnostic strategies offer the most robust generalisation. The dataset used for testing will be made available upon request. Full article
(This article belongs to the Special Issue Deep Learning-Based Scene Text Detection)
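
For reference, a sketch of the B-Cubed F1 metric reported above, under its standard definition (item-level precision and recall averaged over all items):

```python
def bcubed_f1(pred, gold):
    """B-Cubed F1 for a clustering; pred and gold map each item to a cluster id."""
    items = list(gold)

    def avg_item_score(a, b):  # precision of clustering a judged against b
        total = 0.0
        for i in items:
            cluster = [j for j in items if a[j] == a[i]]
            total += sum(b[j] == b[i] for j in cluster) / len(cluster)
        return total / len(items)

    p = avg_item_score(pred, gold)
    r = avg_item_score(gold, pred)
    return 2 * p * r / (p + r)

# e.g. bcubed_f1({"a": 1, "b": 1, "c": 2}, {"a": 1, "b": 2, "c": 2})
```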

49 pages, 6627 KB  
Article
LEARNet: A Learning Entropy-Aware Representation Network for Educational Video Understanding
by Chitrakala S, Nivedha V V and Niranjana S R
Entropy 2026, 28(1), 3; https://doi.org/10.3390/e28010003 - 19 Dec 2025
Viewed by 465
Abstract
Educational videos contain long periods of visual redundancy, where only a few frames convey meaningful instructional information. Conventional video models, which are designed for dynamic scenes, often fail to capture these subtle pedagogical transitions. We introduce LEARNet, an entropy-aware framework that models educational video understanding as the extraction of high-information instructional content from low-entropy visual streams. LEARNet combines a Temporal Information Bottleneck (TIB) for selecting pedagogically significant keyframes with a Spatial–Semantic Decoder (SSD) that produces fine-grained annotations refined through a proposed Relational Consistency Verification Network (RCVN). This architecture enables the construction of EVUD-2M, a large-scale benchmark with multi-level semantic labels for diverse instructional formats. LEARNet achieves substantial redundancy reduction (70.2%) while maintaining high annotation fidelity (F1 = 0.89, mAP@50 = 0.88). Grounded in information-theoretic principles, LEARNet provides a scalable foundation for tasks such as lecture indexing, visual content summarization, and multimodal learning analytics. Full article
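
An intuition-level sketch of entropy-aware keyframe selection: keep frames whose histogram entropy shifts noticeably. This is a crude stand-in for the paper's learned Temporal Information Bottleneck, not its actual method; the jump threshold is an assumption:

```python
import numpy as np

def frame_entropy(gray):
    """Shannon entropy of an 8-bit grayscale frame's intensity histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def select_keyframes(frames, jump=0.15):
    """Keep indices of frames whose entropy differs noticeably from the last kept frame."""
    kept, last = [], None
    for idx, frame in enumerate(frames):
        h = frame_entropy(frame)
        if last is None or abs(h - last) > jump:
            kept.append(idx)
            last = h
    return kept
```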

34 pages, 15045 KB  
Article
Integration of Road Data Collected Using LSB Audio Steganography
by Adam Stančić, Ivan Grgurević, Marko Matulin and Marko Periša
Technologies 2025, 13(12), 597; https://doi.org/10.3390/technologies13120597 - 18 Dec 2025
Viewed by 377
Abstract
Modern traffic-monitoring systems increasingly rely on supplemental analytical data to complement video recordings, yet such data are rarely integrated into video containers without altering the original footage. This paper proposes a lightweight audio-based approach for embedding road-condition information using a Least Significant Bit (LSB) steganography framework. The method operates by serializing sensor data, encoding it into the LSB positions of synthetically generated audio, and subsequently compressing the audio track while preserving imperceptibility and video integrity. A series of controlled experiments evaluates how waveform type, sampling rate, amplitude, and frequency influence the storage efficiency and quality of WAV and FLAC stego-audio files. Additional tests examine the impact of embedding capacity and output-quality settings on compression behavior. Results reveal clear trade-offs between audio quality, data capacity, and file size, demonstrating that the proposed framework enables efficient, secure, and scalable integration of metadata into surveillance recordings. The findings establish practical guidelines for deploying LSB-based audio embedding in real traffic-monitoring environments. Full article
(This article belongs to the Special Issue IoT-Enabling Technologies and Applications—2nd Edition)
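
A minimal sketch of the core LSB embedding idea on 16-bit PCM samples; the paper's full framework additionally serializes sensor records, synthesizes the carrier audio, and compresses the result, none of which is shown here:

```python
import numpy as np

def embed_lsb(samples: np.ndarray, payload: bytes) -> np.ndarray:
    """Write payload bits into the least significant bit of int16 PCM samples."""
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    if bits.size > samples.size:
        raise ValueError("payload exceeds carrier capacity")
    out = samples.copy()
    out[:bits.size] = (out[:bits.size] & ~1) | bits  # clear each LSB, then set it
    return out

def extract_lsb(samples: np.ndarray, n_bytes: int) -> bytes:
    """Read n_bytes back from the samples' least significant bits."""
    bits = (samples[:n_bytes * 8] & 1).astype(np.uint8)
    return np.packbits(bits).tobytes()
```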

28 pages, 29179 KB  
Article
Improving Accuracy in Industrial Safety Monitoring: Combining UWB Localization and AI-Based Image Analysis
by Francesco Di Rienzo, Giustino Claudio Miglionico, Pietro Ducange, Francesco Marcelloni, Nicolò Salti and Carlo Vallati
J. Sens. Actuator Netw. 2025, 14(6), 118; https://doi.org/10.3390/jsan14060118 - 11 Dec 2025
Viewed by 857
Abstract
Industry 4.0 advanced technologies are increasingly used to monitor workers and reduce accident risks to ensure workplace safety. In this paper, we present an on-premise, rule-based safety management system that exploits the fusion of data from an Ultra-Wideband (UWB) Real-Time Locating System (RTLS) and AI-based video analytics to enforce context-aware safety policies. Data fusion from heterogeneous sources is exploited to broaden the set of safety rules that can be enforced and to improve resiliency. Unlike prior work that addresses PPE detection or indoor localization in isolation, the proposed system integrates a UWB-based RTLS with AI-based PPE detection through a rule-based aggregation engine, enabling context-aware safety policies that neither technology can enforce alone. In order to demonstrate the feasibility of the proposed approach and showcase its potential, a proof-of-concept implementation is developed. The implementation is used to validate the system, showing that it can process video streams on edge devices and track workers’ positions with sufficient accuracy using a commercial solution. The efficacy of the system is assessed through a set of seven safety rules implemented in a controlled laboratory scenario, showing that the proposed approach enhances situational awareness and robustness, compared with a single-source approach. An extended validation is further employed to confirm practical reliability under more challenging operational conditions, including varying camera perspectives, diverse worker clothing, and real-world outdoor conditions. Full article
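
A toy illustration of the kind of context-aware rule such an aggregation engine can enforce by fusing both sources; all names, zones, and the rule itself are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class WorkerState:
    worker_id: str
    zone: str              # from the UWB RTLS
    helmet_detected: bool  # from the AI-based PPE detector

HELMET_ZONES = {"press_area", "crane_bay"}  # hypothetical hazardous zones

def check_helmet_rule(state: WorkerState) -> list[str]:
    """Fires only when position (RTLS) and PPE status (camera) are combined:
    neither source alone could enforce this rule."""
    if state.zone in HELMET_ZONES and not state.helmet_detected:
        return [f"ALERT: {state.worker_id} is in {state.zone} without a helmet"]
    return []
```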