Search Results (859)

Search Parameters:
Keywords = video streaming

19 pages, 1010 KiB  
Article
Online Video Streaming from the Perspective of Transaction Cost Economics
by Amit Malhan, Pankaj Chaudhary and Robert Pavur
J. Theor. Appl. Electron. Commer. Res. 2025, 20(3), 199; https://doi.org/10.3390/jtaer20030199 - 4 Aug 2025
Viewed by 132
Abstract
In recent years, online streaming has encountered the challenge of retaining its user base. This study considers the role of transaction cost economics theory in consumer choices to continue subscribing. Participants reported on their top three streaming services, resulting in 797 responses, accounting for multiple selections by each respondent. Respondents could choose their top three services from a list of Netflix, Disney, Hulu, Amazon Prime Video, HBO Max, and Apple TV+. The study’s conclusions highlight the impact of uncertainty, a negative measure of streaming quality, on online subscription-based video streaming. Additionally, asset specificity, reflecting uniqueness and exclusive content, is found to be positively related to continuing a subscription. This research distinguishes itself by examining individuals who are already subscribers, providing insights through the lens of Transaction Cost Economics for marketing professionals seeking a deeper understanding of consumer behavior in the online streaming landscape.

19 pages, 1109 KiB  
Article
User Preference-Based Dynamic Optimization of Quality of Experience for Adaptive Video Streaming
by Zixuan Feng, Yazhi Liu and Hao Zhang
Electronics 2025, 14(15), 3103; https://doi.org/10.3390/electronics14153103 - 4 Aug 2025
Viewed by 133
Abstract
With the rapid development of video streaming services, adaptive bitrate (ABR) algorithms have become a core technology for ensuring optimal viewing experiences. Traditional ABR strategies, predominantly rule-based or reinforcement learning-driven, typically employ uniform quality assessment metrics that overlook users’ subjective preference differences regarding factors such as video quality and stalling. To address this limitation, this paper proposes an adaptive video bitrate selection system that integrates preference modeling with reinforcement learning. By incorporating a preference learning module, the system models and scores user viewing trajectories, using these scores to replace conventional rewards and guide the training of the Proximal Policy Optimization (PPO) algorithm, thereby achieving policy optimization that better aligns with users’ perceived experiences. Simulation results on DASH network bandwidth traces demonstrate that the proposed optimization method improves overall Quality of Experience (QoE) by over 9% compared to other mainstream algorithms.
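
For readers who want to see the shape of this idea in code, the sketch below replaces a hand-tuned QoE reward with a score from a stand-in preference model; the feature set, weights, and class names are illustrative assumptions, not the authors' implementation, and the resulting score would feed a PPO update in place of the usual reward.

```python
import numpy as np

class PreferenceScorer:
    """Hypothetical preference model: maps chunk-level trajectory features
    (bitrate, bitrate switch magnitude, rebuffering time) to a scalar score.
    In the paper this is a learned module; here it is a stand-in linear scorer."""
    def __init__(self, w_quality=1.0, w_switch=0.5, w_stall=4.0):
        self.w = np.array([w_quality, -w_switch, -w_stall])

    def score(self, bitrate_mbps, switch_mbps, stall_s):
        feats = np.array([bitrate_mbps, abs(switch_mbps), stall_s])
        return float(self.w @ feats)

def preference_reward(scorer, prev_bitrate, bitrate, stall_s):
    """Replace the conventional QoE reward with the preference score,
    which a PPO training loop would then use as its reward signal."""
    return scorer.score(bitrate, bitrate - prev_bitrate, stall_s)

scorer = PreferenceScorer()          # weights are illustrative, not from the paper
r = preference_reward(scorer, prev_bitrate=3.0, bitrate=1.5, stall_s=0.8)
print(f"preference-based reward: {r:.2f}")
```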

24 pages, 1751 KiB  
Article
Robust JND-Guided Video Watermarking via Adaptive Block Selection and Temporal Redundancy
by Antonio Cedillo-Hernandez, Lydia Velazquez-Garcia, Manuel Cedillo-Hernandez, Ismael Dominguez-Jimenez and David Conchouso-Gonzalez
Mathematics 2025, 13(15), 2493; https://doi.org/10.3390/math13152493 - 3 Aug 2025
Viewed by 225
Abstract
This paper introduces a robust and imperceptible video watermarking framework designed for blind extraction in dynamic video environments. The proposed method operates in the spatial domain and combines multiscale perceptual analysis, adaptive Just Noticeable Difference (JND)-based quantization, and temporal redundancy via multiframe embedding. Watermark bits are embedded selectively in blocks with high perceptual masking using a QIM strategy, and the corresponding DCT coefficients are estimated directly from the spatial domain to reduce complexity. To enhance resilience, each bit is redundantly inserted across multiple keyframes selected based on scene transitions. Extensive simulations over 21 benchmark videos (CIF, 4CIF, HD) validate that the method achieves superior performance in robustness and perceptual quality, with an average Bit Error Rate (BER) of 1.03%, PSNR of 50.1 dB, SSIM of 0.996, and VMAF of 97.3 under compression, noise, cropping, and temporal desynchronization. The system outperforms several recent state-of-the-art techniques in both quality and speed, requiring no access to the original video during extraction. These results confirm the method’s viability for practical applications such as copyright protection and secure video streaming.
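
The QIM embedding step described above can be illustrated compactly. The sketch below hides one bit in a mid-frequency DCT coefficient of an 8x8 block and recovers it by nearest-lattice decoding; the coefficient index and the step size stand in for the paper's JND-derived values.

```python
import numpy as np
from scipy.fft import dctn, idctn

def qim_embed(block, bit, delta):
    """Embed one watermark bit into the (2, 1) DCT coefficient of an 8x8 block
    via quantization index modulation; 'delta' plays the role of the JND-derived
    step. Coefficient choice and step are illustrative, not the paper's values."""
    coeffs = dctn(block, norm="ortho")
    c = coeffs[2, 1]
    # Quantize to the lattice associated with the bit value (0 or 1).
    coeffs[2, 1] = delta * np.round((c - bit * delta / 2) / delta) + bit * delta / 2
    return idctn(coeffs, norm="ortho")

def qim_extract(block, delta):
    c = dctn(block, norm="ortho")[2, 1]
    # Decode by choosing the nearer of the two quantization lattices.
    d0 = abs(c - delta * np.round(c / delta))
    d1 = abs(c - (delta * np.round((c - delta / 2) / delta) + delta / 2))
    return 0 if d0 <= d1 else 1

block = np.random.rand(8, 8) * 255
marked = qim_embed(block, bit=1, delta=12.0)
print(qim_extract(marked, delta=12.0))  # -> 1
```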
(This article belongs to the Section E: Applied Mathematics)

28 pages, 5699 KiB  
Article
Multi-Modal Excavator Activity Recognition Using Two-Stream CNN-LSTM with RGB and Point Cloud Inputs
by Hyuk Soo Cho, Kamran Latif, Abubakar Sharafat and Jongwon Seo
Appl. Sci. 2025, 15(15), 8505; https://doi.org/10.3390/app15158505 - 31 Jul 2025
Viewed by 148
Abstract
Recently, deep learning algorithms have been increasingly applied in construction for activity recognition, particularly for excavators, to automate processes and enhance safety and productivity through continuous monitoring of earthmoving activities. These deep learning algorithms analyze construction videos to classify excavator activities for earthmoving purposes. However, previous studies have solely focused on single-source external videos, which limits the activity recognition capabilities of the deep learning algorithm. This paper introduces a novel multi-modal deep learning-based methodology for recognizing excavator activities, utilizing multi-stream input data. It processes point clouds and RGB images using the two-stream long short-term memory convolutional neural network (CNN-LSTM) method to extract spatiotemporal features, enabling the recognition of excavator activities. A comprehensive dataset comprising 495,000 video frames of synchronized RGB and point cloud data was collected across multiple construction sites under varying conditions. The dataset encompasses five key excavator activities: Approach, Digging, Dumping, Idle, and Leveling. To assess the effectiveness of the proposed method, the performance of the two-stream CNN-LSTM architecture is compared with that of single-stream CNN-LSTM models on the same RGB and point cloud datasets, separately. The results demonstrate that the proposed multi-stream approach achieved an accuracy of 94.67%, outperforming existing state-of-the-art single-stream models, which achieved 90.67% accuracy for the RGB-based model and 92.00% for the point cloud-based model. These findings underscore the potential of the proposed activity recognition method, making it highly effective for automatic real-time monitoring of excavator activities, thereby laying the groundwork for future integration into digital twin systems for proactive maintenance and intelligent equipment management.
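
A minimal two-stream CNN-LSTM with late fusion, of the kind the abstract describes, can be sketched as follows; the layer sizes, the depth-map rendering of the point-cloud stream, and the five-class head are assumptions for illustration, not the authors' network.

```python
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """Per-modality encoder: a small CNN applied per frame, followed by an LSTM
    over the frame sequence. Layer sizes are illustrative, not the paper's."""
    def __init__(self, in_channels, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(32, feat_dim, batch_first=True)

    def forward(self, x):                     # x: (B, T, C, H, W)
        b, t = x.shape[:2]
        f = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(f)
        return h[-1]                          # (B, feat_dim)

class TwoStreamClassifier(nn.Module):
    """Late fusion of the RGB stream and a point-cloud stream (assumed here to be
    rendered as single-channel depth maps) for five excavator activities."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.rgb = StreamEncoder(3)
        self.pcd = StreamEncoder(1)
        self.head = nn.Linear(256, n_classes)

    def forward(self, rgb_clip, pcd_clip):
        return self.head(torch.cat([self.rgb(rgb_clip), self.pcd(pcd_clip)], dim=1))

model = TwoStreamClassifier()
logits = model(torch.randn(2, 8, 3, 64, 64), torch.randn(2, 8, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 5])
```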
(This article belongs to the Special Issue AI-Based Machinery Health Monitoring)

30 pages, 37977 KiB  
Article
Text-Guided Visual Representation Optimization for Sensor-Acquired Video Temporal Grounding
by Yun Tian, Xiaobo Guo, Jinsong Wang and Xinyue Liang
Sensors 2025, 25(15), 4704; https://doi.org/10.3390/s25154704 - 30 Jul 2025
Viewed by 266
Abstract
Video temporal grounding (VTG) aims to localize a semantically relevant temporal segment within an untrimmed video based on a natural language query. The task continues to face challenges arising from cross-modal semantic misalignment, which is largely attributed to redundant visual content in sensor-acquired video streams, linguistic ambiguity, and discrepancies in modality-specific representations. Most existing approaches rely on intra-modal feature modeling, processing video and text independently throughout the representation learning stage. However, this isolation undermines semantic alignment by neglecting the potential of cross-modal interactions. In practice, a natural language query typically corresponds to spatiotemporal content in video signals collected through camera-based sensing systems, encompassing a particular sequence of frames and its associated salient subregions. We propose a text-guided visual representation optimization framework tailored to enhance semantic interpretation over video signals captured by visual sensors. This framework leverages textual information to focus on spatiotemporal video content, thereby narrowing the cross-modal gap. Built upon the unified cross-modal embedding space provided by CLIP, our model leverages video data from sensing devices to structure representations and introduces two dedicated modules to semantically refine visual representations across spatial and temporal dimensions. First, we design a Spatial Visual Representation Optimization (SVRO) module to learn spatial information within intra-frames. It selects salient patches related to the text, capturing more fine-grained visual details. Second, we introduce a Temporal Visual Representation Optimization (TVRO) module to learn temporal relations from inter-frames. Temporal triplet loss is employed in TVRO to enhance attention on text-relevant frames and capture clip semantics. Additionally, a self-supervised contrastive loss is introduced at the clip–text level to improve inter-clip discrimination by maximizing semantic variance during training. Experiments on Charades-STA, ActivityNet Captions, and TACoS, widely used benchmark datasets, demonstrate that our method outperforms state-of-the-art methods across multiple metrics.
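
The core of the text-guided selection idea is ranking visual units by similarity to the query embedding. The sketch below does this at the frame level with stand-in embeddings; in the paper these would come from CLIP's image and text encoders, and the same ranking could be applied to patches within a frame.

```python
import torch
import torch.nn.functional as F

def select_text_relevant_frames(frame_embs, text_emb, k=4):
    """Rank frames by cosine similarity to the query embedding and keep the top-k.
    frame_embs: (T, D) frame features, text_emb: (D,) query feature; in practice
    both would be produced by CLIP's encoders (assumed here, not shown)."""
    sims = F.cosine_similarity(frame_embs, text_emb.unsqueeze(0), dim=1)  # (T,)
    topk = torch.topk(sims, k)
    return topk.indices.sort().values, sims

# Toy example with random stand-in embeddings of dimension 512.
frames = F.normalize(torch.randn(32, 512), dim=1)
query = F.normalize(torch.randn(512), dim=0)
idx, sims = select_text_relevant_frames(frames, query, k=4)
print(idx.tolist())   # indices of the frames most relevant to the query
```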
(This article belongs to the Section Sensing and Imaging)

27 pages, 1128 KiB  
Article
Adaptive Multi-Hop P2P Video Communication: A Super Node-Based Architecture for Conversation-Aware Streaming
by Jiajing Chen and Satoshi Fujita
Information 2025, 16(8), 643; https://doi.org/10.3390/info16080643 - 28 Jul 2025
Viewed by 344
Abstract
This paper proposes a multi-hop peer-to-peer (P2P) video streaming architecture designed to support dynamic, conversation-aware communication. The primary contribution is a decentralized system built on WebRTC that eliminates reliance on a central media server by employing super node aggregation. In this architecture, video streams from multiple peer nodes are dynamically routed through a group of super nodes, enabling real-time reconfiguration of the network topology in response to conversational changes. To support this dynamic behavior, the system leverages WebRTC data channels for control signaling and overlay restructuring, allowing efficient dissemination of topology updates and coordination messages among peers. A key focus of this study is the rapid and efficient reallocation of network resources immediately following conversational events, ensuring that the streaming overlay remains aligned with ongoing interaction patterns. While the automatic detection of such events is beyond the scope of this work, we assume that external triggers are available to initiate topology updates. To validate the effectiveness of the proposed system, we construct a simulation environment using Docker containers and evaluate its streaming performance under dynamic network conditions. The results demonstrate the system’s applicability to adaptive, naturalistic communication scenarios. Finally, we discuss future directions, including the seamless integration of external trigger sources and enhanced support for flexible, context-sensitive interaction frameworks.
(This article belongs to the Special Issue Second Edition of Advances in Wireless Communications Systems)

27 pages, 705 KiB  
Article
A Novel Wavelet Transform and Deep Learning-Based Algorithm for Low-Latency Internet Traffic Classification
by Ramazan Enisoglu and Veselin Rakocevic
Algorithms 2025, 18(8), 457; https://doi.org/10.3390/a18080457 - 23 Jul 2025
Viewed by 345
Abstract
Accurate and real-time classification of low-latency Internet traffic is critical for applications such as video conferencing, online gaming, financial trading, and autonomous systems, where millisecond-level delays can degrade user experience. Existing methods for low-latency traffic classification, reliant on raw temporal features or static statistical analyses, fail to capture dynamic frequency patterns inherent to real-time applications. These limitations hinder accurate resource allocation in heterogeneous networks. This paper proposes a novel framework integrating wavelet transform (WT) and artificial neural networks (ANNs) to address this gap. Unlike prior works, we systematically apply WT to commonly used temporal features—such as throughput, slope, ratio, and moving averages—transforming them into frequency-domain representations. This approach reveals hidden multi-scale patterns in low-latency traffic, akin to structured noise in signal processing, which traditional time-domain analyses often overlook. These wavelet-enhanced features train a multilayer perceptron (MLP) ANN, enabling dual-domain (time–frequency) analysis. We evaluate our approach on a dataset comprising FTP, video streaming, and low-latency traffic, including mixed scenarios with up to four concurrent traffic types. Experiments demonstrate 99.56% accuracy in distinguishing low-latency traffic (e.g., video conferencing) from FTP and streaming, outperforming k-NN, CNNs, and LSTMs; in mixed-traffic scenarios, the model achieves 74.2–92.8% accuracy. Notably, our method eliminates reliance on deep packet inspection (DPI), offering ISPs a privacy-preserving and scalable solution for prioritizing time-sensitive traffic. By bridging signal processing and deep learning, this work advances efficient bandwidth allocation and improves quality of service in heterogeneous network environments.
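
As a rough illustration of the wavelet-plus-MLP pipeline, the sketch below decomposes a throughput series with PyWavelets, summarizes each sub-band by its energy and spread, and feeds the features to a small MLP; the wavelet family, decomposition level, and toy traffic classes are assumptions, not the paper's configuration.

```python
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def wavelet_features(throughput, wavelet="db4", level=3):
    """Decompose a per-flow throughput series into wavelet sub-bands and summarize
    each band by its energy and standard deviation."""
    coeffs = pywt.wavedec(throughput, wavelet, level=level)
    feats = []
    for band in coeffs:
        feats += [np.sum(band ** 2), np.std(band)]
    return np.array(feats)

# Toy training set: class 0 = smooth bulk transfer, class 1 = bursty low-latency flow.
rng = np.random.default_rng(0)
X = np.array([wavelet_features(rng.normal(5, 0.2, 128)) for _ in range(50)] +
             [wavelet_features(np.abs(rng.normal(0, 2, 128))) for _ in range(50)])
y = np.array([0] * 50 + [1] * 50)

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0).fit(X, y)
print(clf.score(X, y))
```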
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

23 pages, 3578 KiB  
Article
High-Precision Chip Detection Using YOLO-Based Methods
by Ruofei Liu and Junjiang Zhu
Algorithms 2025, 18(7), 448; https://doi.org/10.3390/a18070448 - 21 Jul 2025
Viewed by 267
Abstract
Machining chips are directly related to both the machining quality and tool condition. However, detecting chips from images in industrial settings poses challenges in terms of model accuracy and computational speed. We first present a novel framework called GM-YOLOv11-DNMS to track the chips, followed by a video-level post-processing algorithm for chip counting in videos. GM-YOLOv11-DNMS has two main improvements: (1) it replaces the CNN layers with a ghost module in YOLOv11n, significantly reducing the computational cost while maintaining the detection performance, and (2) it uses a new dynamic non-maximum suppression (DNMS) method, which dynamically adjusts the thresholds to improve the detection accuracy. The post-processing method uses a trigger signal from rising edges to improve chip counting in video streams. Experimental results show that the ghost module reduces the FLOPs from 6.48 G to 5.72 G compared to YOLOv11n, with a negligible accuracy loss, while the DNMS algorithm improves the debris detection precision across different YOLO versions. The proposed framework achieves precision, recall, and mAP@0.5 values of 97.04%, 96.38%, and 95.56%, respectively, in image-based detection tasks. In video-based experiments, the proposed video-level post-processing algorithm combined with GM-YOLOv11-DNMS achieves a crack–debris counting accuracy of 90.14%. This lightweight and efficient approach is particularly effective in detecting small-scale objects within images and accurately analyzing dynamic debris in video sequences, providing a robust solution for automated debris monitoring in machine tool processing applications.
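
The idea behind a dynamic NMS threshold can be shown with a toy rule that loosens the IoU threshold when many candidate boxes are present; this is a simple stand-in for the paper's DNMS, not its exact formulation.

```python
import numpy as np

def iou(a, b):
    """IoU of one box 'a' against an array of boxes 'b', boxes as [x1, y1, x2, y2]."""
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def dynamic_nms(boxes, scores, base_thr=0.5, crowd_thr=0.65, crowd_size=10):
    """Greedy NMS whose IoU threshold is loosened in crowded scenes so that
    genuinely overlapping chips are not merged; the adjustment rule is a
    hypothetical stand-in for the paper's DNMS."""
    thr = crowd_thr if len(boxes) > crowd_size else base_thr
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= thr]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(dynamic_nms(boxes, scores))  # [0, 2]: the heavily overlapping box is suppressed
```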
(This article belongs to the Special Issue Machine Learning Models and Algorithms for Image Processing)

40 pages, 1540 KiB  
Review
A Survey on Video Big Data Analytics: Architecture, Technologies, and Open Research Challenges
by Thi-Thu-Trang Do, Quyet-Thang Huynh, Kyungbaek Kim and Van-Quyet Nguyen
Appl. Sci. 2025, 15(14), 8089; https://doi.org/10.3390/app15148089 - 21 Jul 2025
Viewed by 615
Abstract
The exponential growth of video data across domains such as surveillance, transportation, and healthcare has raised critical challenges in scalability, real-time processing, and privacy preservation. While existing studies have addressed individual aspects of Video Big Data Analytics (VBDA), an integrated, up-to-date perspective remains limited. This paper presents a comprehensive survey of system architectures and enabling technologies in VBDA. It categorizes system architectures into four primary types: centralized, cloud-based infrastructures, edge computing, and hybrid cloud–edge. It also analyzes key enabling technologies, including real-time streaming, scalable distributed processing, intelligent AI models, and advanced storage for managing large-scale multimodal video data. In addition, the study provides a functional taxonomy of core video processing tasks, including object detection, anomaly recognition, and semantic retrieval, and maps these tasks to real-world applications. Based on the survey findings, the paper proposes ViMindXAI, a hybrid AI-driven platform that combines edge and cloud orchestration, adaptive storage, and privacy-aware learning to support scalable and trustworthy video analytics. Our analysis highlights emerging trends such as the shift toward hybrid cloud–edge architectures, the growing importance of explainable AI and federated learning, and the urgent need for secure and efficient video data management. These findings point to key directions for designing next-generation VBDA platforms that enhance real-time, data-driven decision-making in domains such as public safety, transportation, and healthcare. These platforms facilitate timely insights, rapid response, and regulatory alignment through scalable and explainable analytics. This work provides a robust conceptual foundation for future research on adaptive and efficient decision-support systems in video-intensive environments.

39 pages, 2628 KiB  
Article
A Decentralized Multi-Venue Real-Time Video Broadcasting System Integrating Chain Topology and Intelligent Self-Healing Mechanisms
by Tianpei Guo, Ziwen Song, Haotian Xin and Guoyang Liu
Appl. Sci. 2025, 15(14), 8043; https://doi.org/10.3390/app15148043 - 19 Jul 2025
Viewed by 478
Abstract
The rapid growth in large-scale distributed video conferencing, remote education, and real-time broadcasting poses significant challenges to traditional centralized streaming systems, particularly regarding scalability, cost, and reliability under high concurrency. Centralized approaches often encounter bottlenecks, increased bandwidth expenses, and diminished fault tolerance. This paper proposes a novel decentralized real-time broadcasting system employing a peer-to-peer (P2P) chain topology based on IPv6 networking and the Secure Reliable Transport (SRT) protocol. By exploiting the global addressing capability of IPv6, our solution simplifies direct node interconnections, effectively eliminating complexities associated with Network Address Translation (NAT). Furthermore, we introduce an innovative chain-relay transmission method combined with distributed node management strategies, substantially reducing reliance on central servers and minimizing deployment complexity. Leveraging SRT’s low-latency UDP transmission, packet retransmission, congestion control, and AES-128/256 encryption, the proposed system ensures robust security and high video stream quality across wide-area networks. Additionally, a WebSocket-based real-time fault detection algorithm coupled with a rapid fallback self-healing mechanism is developed, enabling millisecond-level fault detection and swift restoration of disrupted links. Extensive performance evaluations using Video Multi-Resolution Fidelity (VMRF) metrics across geographically diverse and heterogeneous environments confirm significant performance gains. Specifically, our approach achieves substantial improvements in latency, video quality stability, and fault tolerance over existing P2P methods, along with over tenfold enhancements in frame rates compared with conventional RTMP-based solutions, thereby demonstrating its efficacy, scalability, and cost-effectiveness for real-time video streaming applications.
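
A heartbeat-style fault detector of the kind mentioned above can be sketched without any networking stack; the timings, promotion policy, and callback are illustrative assumptions, and in the real system the heartbeats would arrive over the WebSocket control channel.

```python
import asyncio
import time

class LinkMonitor:
    """Declare a downstream relay failed if no heartbeat arrives within
    'timeout_ms', then let the caller promote a fallback node."""
    def __init__(self, timeout_ms=300):
        self.timeout = timeout_ms / 1000.0
        self.last_seen = time.monotonic()

    def on_heartbeat(self):
        self.last_seen = time.monotonic()

    async def watch(self, on_failure):
        while True:
            await asyncio.sleep(self.timeout / 3)
            if time.monotonic() - self.last_seen > self.timeout:
                on_failure()
                return

async def demo():
    monitor = LinkMonitor(timeout_ms=300)
    task = asyncio.create_task(
        monitor.watch(lambda: print("relay failed -> promote fallback node")))
    for _ in range(3):                   # healthy period: heartbeats every 100 ms
        await asyncio.sleep(0.1)
        monitor.on_heartbeat()
    await task                           # heartbeats stop; failure fires ~300 ms later

asyncio.run(demo())
```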

19 pages, 709 KiB  
Article
Fusion of Multimodal Spatio-Temporal Features and 3D Deformable Convolution Based on Sign Language Recognition in Sensor Networks
by Qian Zhou, Hui Li, Weizhi Meng, Hua Dai, Tianyu Zhou and Guineng Zheng
Sensors 2025, 25(14), 4378; https://doi.org/10.3390/s25144378 - 13 Jul 2025
Viewed by 367
Abstract
Sign language is a complex and dynamic visual language that requires the coordinated movement of various body parts, such as the hands, arms, and limbs—making it an ideal application domain for sensor networks to capture and interpret human gestures accurately. To address the intricate task of precise and expedient sign language recognition (SLR) from raw videos, this study introduces a novel deep learning approach by devising a multimodal framework for SLR. Specifically, feature extraction models are built based on two modalities: skeleton and RGB images. In this paper, we first propose a Multi-Stream Spatio-Temporal Graph Convolutional Network (MSGCN) that relies on three modules: a decoupling graph convolutional network, a self-emphasizing temporal convolutional network, and a spatio-temporal joint attention module. These modules are combined to capture the spatio-temporal information in multi-stream skeleton features. Second, we propose a 3D ResNet model based on deformable convolution (D-ResNet) to model complex spatial and temporal sequences in the original raw images. Finally, a gating mechanism-based Multi-Stream Fusion Module (MFM) is employed to merge the results of the two modalities. Extensive experiments are conducted on the public datasets AUTSL and WLASL, achieving competitive results compared to state-of-the-art systems.
(This article belongs to the Special Issue Intelligent Sensing and Artificial Intelligence for Image Processing)

21 pages, 4859 KiB  
Article
Improvement of SAM2 Algorithm Based on Kalman Filtering for Long-Term Video Object Segmentation
by Jun Yin, Fei Wu, Hao Su, Peng Huang and Yuetong Qixuan
Sensors 2025, 25(13), 4199; https://doi.org/10.3390/s25134199 - 5 Jul 2025
Viewed by 555
Abstract
The Segment Anything Model 2 (SAM2) has achieved state-of-the-art performance in pixel-level object segmentation for both static and dynamic visual content. Its streaming memory architecture maintains spatial context across video sequences, yet struggles with long-term tracking due to its static inference framework. SAM 2’s fixed temporal window approach indiscriminately retains historical frames, failing to account for frame quality or dynamic motion patterns. This leads to error propagation and tracking instability in challenging scenarios involving fast-moving objects, partial occlusions, or crowded environments. To overcome these limitations, this paper proposes SAM2Plus, a zero-shot enhancement framework that integrates Kalman filter prediction, dynamic quality thresholds, and adaptive memory management. The Kalman filter models object motion using physical constraints to predict trajectories and dynamically refine segmentation states, mitigating positional drift during occlusions or velocity changes. Dynamic thresholds, combined with multi-criteria evaluation metrics (e.g., motion coherence, appearance consistency), prioritize high-quality frames while adaptively balancing confidence scores and temporal smoothness. This reduces ambiguities among similar objects in complex scenes. SAM2Plus further employs an optimized memory system that prunes outdated or low-confidence entries and retains temporally coherent context, ensuring constant computational resources even for infinitely long videos. Extensive experiments on two video object segmentation (VOS) benchmarks demonstrate SAM2Plus’s superiority over SAM 2. It achieves an average improvement of 1.0 in J&F metrics across all 24 direct comparisons, with gains exceeding 2.3 points on SA-V and LVOS datasets for long-term tracking. The method delivers real-time performance and strong generalization without fine-tuning or additional parameters, effectively addressing occlusion recovery and viewpoint changes. By unifying motion-aware physics-based prediction with spatial segmentation, SAM2Plus bridges the gap between static and dynamic reasoning, offering a scalable solution for real-world applications such as autonomous driving and surveillance systems.
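
A constant-velocity Kalman filter over an object's center is the standard building block for the motion prediction described above; the sketch below is a generic filter with assumed noise levels, not the SAM2Plus implementation, and its predictions could gate or refine mask proposals when detections drop out.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter over an object's (cx, cy) center.
    Noise magnitudes and the one-frame time step are illustrative assumptions."""
    def __init__(self, q=1e-2, r=1.0):
        self.x = np.zeros(4)                  # state: [cx, cy, vx, vy]
        self.P = np.eye(4) * 10.0
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0   # dt = 1 frame
        self.H = np.eye(2, 4)                 # we observe the center only
        self.Q = np.eye(4) * q
        self.R = np.eye(2) * r

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                     # predicted center

    def update(self, z):
        y = np.asarray(z, float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

kf = ConstantVelocityKF()
for cx in [10, 12, 14]:                       # object moving right at ~2 px/frame
    kf.predict(); kf.update([cx, 20])
print(kf.predict())                           # roughly [16, 20]: usable during occlusion
```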

24 pages, 76230 KiB  
Article
Secure and Efficient Video Management: A Novel Framework for CCTV Surveillance Systems
by Swarnalatha Camalapuram Subramanyam, Ansuman Bhattacharya and Koushik Sinha
IoT 2025, 6(3), 38; https://doi.org/10.3390/iot6030038 - 4 Jul 2025
Viewed by 424
Abstract
This paper presents a novel video encoding and decoding method aimed at enhancing security and reducing storage requirements, particularly for CCTV systems. The technique merges two video streams of matching frame dimensions into a single stream, optimizing disk space usage without compromising video quality. The combined video is secured using an advanced encryption standard (AES)-based shift algorithm that rearranges pixel positions, preventing unauthorized access. During decoding, the AES shift is reversed, enabling precise reconstruction of the original videos. This approach provides a space-efficient and secure solution for managing multiple video feeds while ensuring accurate recovery of the original content. The experimental results demonstrate that the transmission time for the encoded video is consistently shorter than transmitting the video streams separately. This, in turn, leads to about a 54% reduction in energy consumption across diverse outdoor and indoor video datasets, highlighting significant improvements in both transmission efficiency and energy savings by our proposed scheme.
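
The merge-then-scramble idea can be illustrated on toy frames. In the sketch below, two equal-sized frames are interleaved column-wise and the pixels of the combined frame are permuted with a key-derived permutation; the SHA-256-seeded PRNG permutation stands in for the paper's AES-based shift, and the packing layout is an assumption.

```python
import hashlib
import numpy as np

def merge_frames(f1, f2):
    """Interleave two equal-sized frames column-wise into one wider frame
    (one plausible packing; the paper's exact layout may differ)."""
    merged = np.empty((f1.shape[0], f1.shape[1] * 2, f1.shape[2]), dtype=f1.dtype)
    merged[:, 0::2] = f1
    merged[:, 1::2] = f2
    return merged

def keyed_permutation(n, key):
    """Derive a pixel permutation from a key; a SHA-256-seeded PRNG stands in
    for the paper's AES-based shift."""
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return np.random.default_rng(seed).permutation(n)

def scramble(frame, key):
    perm = keyed_permutation(frame.shape[0] * frame.shape[1], key)
    flat = frame.reshape(-1, frame.shape[2])
    return flat[perm].reshape(frame.shape), perm

def unscramble(frame, perm):
    flat = frame.reshape(-1, frame.shape[2])
    out = np.empty_like(flat)
    out[perm] = flat
    return out.reshape(frame.shape)

f1 = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
f2 = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
enc, perm = scramble(merge_frames(f1, f2), key=b"demo-key")
dec = unscramble(enc, perm)
assert np.array_equal(dec[:, 0::2], f1) and np.array_equal(dec[:, 1::2], f2)
print("round trip OK")
```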

12 pages, 7718 KiB  
Technical Note
Nearshore Depth Inversion Bathymetry from Coastal Webcam: A Novel Technique Based on Wave Celerity Estimation
by Umberto Andriolo, Alberto Azevedo, Gil Gonçalves and Rui Taborda
Remote Sens. 2025, 17(13), 2274; https://doi.org/10.3390/rs17132274 - 2 Jul 2025
Viewed by 347
Abstract
Nearshore bathymetry is key to most oceanographic studies and coastal engineering works. This work proposes a new methodology to assess nearshore wave celerity and infer bathymetry from video images. Shoaling and breaking wave patterns were detected distinctly on the Timestacks, and wave celerity was estimated from wave trajectories. The wave type separation enabled the implementation of specific domain formulations for depth inversion: linear for shoaling and non-linear for breaking waves. The technique was validated over a rocky bottom using video acquired from an online streaming webcam over a period of two days, with significant wave heights varying between 1.7 m and 3.5 m. The results were corroborated against ground-truth data available up to a depth of 10 m, yielding a mean bias of 0.05 m and a mean root mean square error (RMSE) of 0.43 m. In particular, the RMSE was lower than 15% in the outer surf zone, where breaking processes occur. Overall, the depth-normalized RMSE was always lower than 20%, with the major inaccuracy due to some local depressions, which were not resolved. The developed technique can be readily applied to images collected by coastal monitoring stations worldwide and is applicable to drone video acquisitions.
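
The two depth-inversion branches mentioned above map directly onto textbook relations: linear dispersion for shoaling waves and a solitary-wave celerity for breaking waves. The sketch below solves each for depth; the breaking-wave formula is a common approximation and may differ from the paper's exact formulation.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def depth_from_shoaling(celerity, period):
    """Invert the linear dispersion relation c = (g/omega) * tanh(k h), with
    k = omega / c, for depth h. Valid while c*omega/g < 1 (non-breaking waves)."""
    omega = 2 * np.pi / period
    arg = celerity * omega / G
    if arg >= 1:
        raise ValueError("celerity too high for linear theory at this period")
    return (celerity / omega) * np.arctanh(arg)

def depth_from_breaking(celerity, wave_height):
    """Solitary-wave approximation for the surf zone: c ~ sqrt(g * (h + H)),
    so h = c^2/g - H, where H is the wave height."""
    return celerity ** 2 / G - wave_height

print(round(depth_from_shoaling(celerity=8.0, period=10.0), 2))    # ~7.2 m
print(round(depth_from_breaking(celerity=6.0, wave_height=1.5), 2))  # ~2.2 m
```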
(This article belongs to the Special Issue Remote Sensing Application in Coastal Geomorphology and Processes II)

22 pages, 3885 KiB  
Article
Enhancing Drone Navigation and Control: Gesture-Based Piloting, Obstacle Avoidance, and 3D Trajectory Mapping
by Ben Taylor, Mathew Allen, Preston Henson, Xu Gao, Haroon Malik and Pingping Zhu
Appl. Sci. 2025, 15(13), 7340; https://doi.org/10.3390/app15137340 - 30 Jun 2025
Viewed by 478
Abstract
Autonomous drone navigation presents challenges for users unfamiliar with manual flight controls, increasing the risk of collisions. This research addresses this issue by developing a multifunctional drone control system that integrates hand gesture recognition, obstacle avoidance, and 3D mapping to improve accessibility and safety. The system utilizes Google’s MediaPipe Hands software library, which employs machine learning to track 21 key landmarks of the user’s hand, enabling gesture-based control of the drone. Each recognized gesture is mapped to a flight command, eliminating the need for a traditional controller. The obstacle avoidance system, utilizing the Flow Deck V2 and Multi-Ranger Deck, detects objects within a safety threshold and autonomously moves the drone a predefined avoidance distance away to prevent collisions. A mapping system continuously logs the drone’s flight path and detects obstacles, enabling 3D visualization of the drone’s trajectory after landing. Also, an AI-Deck streams live video, enabling navigation beyond the user’s direct line of sight. Experimental validation with the Crazyflie drone demonstrates seamless integration of these systems, providing a beginner-friendly experience where users can fly drones safely without prior expertise. This research enhances human–drone interaction, making drone technology more accessible for education, training, and intuitive navigation.
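
A minimal gesture-to-command loop with MediaPipe Hands might look like the sketch below; the finger-counting heuristic and the command table are assumptions for illustration, and sending the resulting commands to the Crazyflie is left out.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
TIPS = [mp_hands.HandLandmark.INDEX_FINGER_TIP, mp_hands.HandLandmark.MIDDLE_FINGER_TIP,
        mp_hands.HandLandmark.RING_FINGER_TIP, mp_hands.HandLandmark.PINKY_TIP]
PIPS = [mp_hands.HandLandmark.INDEX_FINGER_PIP, mp_hands.HandLandmark.MIDDLE_FINGER_PIP,
        mp_hands.HandLandmark.RING_FINGER_PIP, mp_hands.HandLandmark.PINKY_PIP]

def count_extended_fingers(landmarks):
    """Treat a finger as extended when its tip lies above its PIP joint
    (image y grows downward). A crude heuristic, not the paper's gesture set."""
    return sum(landmarks[t].y < landmarks[p].y for t, p in zip(TIPS, PIPS))

# Hypothetical gesture-to-command mapping; the actual flight commands would be
# sent to the drone, which is outside the scope of this sketch.
COMMANDS = {0: "hover", 1: "takeoff", 2: "land", 3: "forward", 4: "back"}

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.6) as hands:
    ok, frame = cap.read()
    if ok:
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            lm = result.multi_hand_landmarks[0].landmark
            print("command:", COMMANDS[count_extended_fingers(lm)])
cap.release()
```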
