Search Results (887)

Search Parameters:
Keywords = video-surveillance

30 pages, 1456 KiB  
Article
Adaptive Stochastic GERT Modeling of UAV Video Transmission for Urban Monitoring Systems
by Serhii Semenov, Magdalena Krupska-Klimczak, Michał Frontczak, Jian Yu, Jiang He and Olena Chernykh
Appl. Sci. 2025, 15(17), 9277; https://doi.org/10.3390/app15179277 - 23 Aug 2025
Abstract
The growing use of unmanned aerial vehicles (UAVs) for real-time video surveillance in smart city and smart region infrastructures requires reliable and delay-aware data transmission models. In urban environments, UAV communication links are subject to stochastic variability, leading to jitter, packet loss, and unstable video delivery. This paper presents a novel approach based on the Graphical Evaluation and Review Technique (GERT) for modeling the transmission of video frames from UAVs over uncertain network paths with probabilistic feedback loops and lognormally distributed delays. The proposed model enables both analytical and numerical evaluation of key Quality-of-Service (QoS) metrics, including mean transmission time and jitter, under varying levels of channel variability. Additionally, the structure of the GERT-based framework allows integration with artificial intelligence mechanisms, particularly for adaptive routing and delay prediction in urban conditions. Spectral analysis of the system’s characteristic function is also performed to identify instability zones and guide buffer design. The results demonstrate that the approach supports flexible, parameterized modeling of UAV video transmission and can be extended to intelligent, learning-based control strategies in complex smart city environments. This makes it suitable for a wide range of applications, including traffic monitoring, infrastructure inspection, and emergency response. Beyond QoS optimization, the framework explicitly accommodates security and privacy-preserving operations (e.g., encryption, authentication, on-board redaction), enabling secure UAV video transmission in urban networks.
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
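The delay model described above lends itself to a quick Monte Carlo check. The sketch below is a minimal, illustrative simulation, not the paper's GERT formulation: each transmission attempt draws a lognormal per-hop delay, and with an assumed retry probability the frame re-enters the feedback loop. All parameter values (`mu`, `sigma`, `p_retry`, `max_attempts`) are placeholders, not values from the article.

```python
import random
import statistics

def simulate_transmission(mu=0.0, sigma=0.5, p_retry=0.2, max_attempts=10, rng=None):
    """Total delivery time of one frame: each attempt adds a lognormally
    distributed delay; with probability p_retry the frame re-enters the
    feedback loop and is retransmitted (capped at max_attempts)."""
    rng = rng or random
    total = 0.0
    for _ in range(max_attempts):
        total += rng.lognormvariate(mu, sigma)  # per-attempt delay
        if rng.random() >= p_retry:             # success: leave the loop
            break
    return total

def qos_metrics(n=20000, **kw):
    """Estimate mean transmission time and jitter (std. dev.) by sampling."""
    rng = random.Random(42)  # fixed seed for reproducibility
    times = [simulate_transmission(rng=rng, **kw) for _ in range(n)]
    return statistics.mean(times), statistics.stdev(times)
```

Sweeping `sigma` or `p_retry` in such a simulation shows how channel variability inflates both mean delay and jitter, which is the kind of QoS sensitivity the GERT model evaluates analytically.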

16 pages, 1632 KiB  
Article
Toward an Augmented Reality Representation of Collision Risks in Harbors
by Mario Miličević, Igor Vujović, Miro Petković and Ana Kuzmanić Skelin
Appl. Sci. 2025, 15(17), 9260; https://doi.org/10.3390/app15179260 - 22 Aug 2025
Abstract
In ports with a significant density of non-AIS vessels, there is an increased risk of collisions. This is because physical limitations restrict the maneuverability of AIS vessels, while small vessels that do not have AIS are unpredictable. To help with collision prevention, we propose an augmented reality system that detects vessels in a video stream and estimates their speed with a single side-mounted camera. The goal is to visualize a cone for risk assessment. Speed is estimated from geometric relations between the camera and the ship, which are used to compute distances between points over a known time interval. The most important part of the proposal is vessel speed estimation by a monocular camera, validated against laser speed measurement. This will help port authorities to manage risks. This system differs from similar trials in that it uses a single stationary camera linked to the authorities rather than to the bridge crew.
(This article belongs to the Section Marine Science and Engineering)
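Once the camera-to-ship geometry yields an image-to-world scale, the core speed computation reduces to distance over time. A minimal sketch under the simplifying assumption of a fixed metres-per-pixel scale (in the paper this scale comes from the camera geometry and is validated against laser measurements; the function name and parameters are illustrative):

```python
import math

def estimate_speed(p1, p2, metres_per_pixel, dt):
    """Speed (m/s) from two image positions of the same vessel feature,
    observed dt seconds apart, assuming a known uniform image-to-world
    scale in metres per pixel."""
    dx = (p2[0] - p1[0]) * metres_per_pixel
    dy = (p2[1] - p1[1]) * metres_per_pixel
    return math.hypot(dx, dy) / dt  # Euclidean ground distance over time
```

For example, a feature moving 100 px at 0.05 m/px over 2 s gives 2.5 m/s. In practice the scale varies with distance from a side-mounted camera, so a real implementation would compute it per point from the projection geometry.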
24 pages, 9450 KiB  
Article
Industrial-AdaVAD: Adaptive Industrial Video Anomaly Detection Empowered by Edge Intelligence
by Jie Xiao, Haocheng Shen, Yasan Ding and Bin Guo
Mathematics 2025, 13(17), 2711; https://doi.org/10.3390/math13172711 - 22 Aug 2025
Abstract
The rapid advancement of Artificial Intelligence of Things (AIoT) has driven an urgent demand for intelligent video anomaly detection (VAD) to ensure industrial safety. However, traditional approaches struggle to detect unknown anomalies in complex and dynamic environments due to the scarcity of abnormal samples and limited generalization capabilities. To address these challenges, this paper presents an adaptive VAD framework powered by edge intelligence tailored for resource-constrained industrial settings. Specifically, a lightweight feature extractor is developed by integrating residual networks with channel attention mechanisms, achieving a 58% reduction in model parameters through dense connectivity and output pruning. A multidimensional evaluation strategy is introduced to dynamically select optimal models for deployment on heterogeneous edge devices. To enhance cross-scene adaptability, we propose a multilayer adversarial domain adaptation mechanism that effectively aligns feature distributions across diverse industrial environments. Extensive experiments on a real-world coal mine surveillance dataset demonstrate that the proposed framework achieves an accuracy of 86.7% with an inference latency of 23 ms per frame on edge hardware, improving both detection efficiency and transferability.
23 pages, 28832 KiB  
Article
Micro-Expression-Based Facial Analysis for Automated Pain Recognition in Dairy Cattle: An Early-Stage Evaluation
by Shuqiang Zhang, Kashfia Sailunaz and Suresh Neethirajan
AI 2025, 6(9), 199; https://doi.org/10.3390/ai6090199 - 22 Aug 2025
Abstract
Timely, objective pain recognition in dairy cattle is essential for welfare assurance, productivity, and ethical husbandry yet remains elusive because evolutionary pressure renders bovine distress signals brief and inconspicuous. Without verbal self-reporting, cows suppress overt cues, so automated vision is indispensable for on-farm triage. Although earlier systems tracked whole-body posture or static grimace scales, frame-level detection of facial micro-expressions has not been explored fully in livestock. We translate micro-expression analytics from automotive driver monitoring to the barn, linking modern computer vision with veterinary ethology. Our two-stage pipeline first detects faces and 30 landmarks using a custom You Only Look Once (YOLO) version 8-Pose network, achieving a 96.9% mean average precision (mAP) at an Intersection over Union (IoU) threshold of 0.50 for detection and 83.8% Object Keypoint Similarity (OKS) for keypoint placement. Cropped eye, ear, and muzzle patches are encoded using a pretrained MobileNetV2, generating 3840-dimensional descriptors that capture millisecond muscle twitches. Sequences of five consecutive frames are fed into a 128-unit Long Short-Term Memory (LSTM) classifier that outputs pain probabilities. On a held-out validation set of 1700 frames, the system records 99.65% accuracy and an F1-score of 0.997, with only three false positives and three false negatives. Tested on 14 unseen barn videos, it attains 64.3% clip-level accuracy (i.e., overall accuracy for the whole video clip) and 83% precision for the pain class, using a hybrid aggregation rule that combines a 30% mean probability threshold with micro-burst counting to temper false alarms. As an early exploration from our proof-of-concept study on a subset of our custom dairy farm datasets, these results show that micro-expression mining can deliver scalable, non-invasive pain surveillance across variations in illumination, camera angle, background, and individual morphology. Future work will explore attention-based temporal pooling, curriculum learning for variable window lengths, domain-adaptive fine-tuning, and multimodal fusion with accelerometry on the complete datasets to elevate the performance toward clinical deployment.
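The hybrid aggregation rule (a 30% mean-probability threshold combined with micro-burst counting) can be sketched as follows. The burst length, per-frame burst threshold, and minimum burst count below are illustrative assumptions; the abstract states only that the two criteria are combined.

```python
def clip_is_pain(frame_probs, mean_thresh=0.30, burst_len=3,
                 burst_prob=0.8, min_bursts=1):
    """Clip-level decision from per-frame pain probabilities: flag pain
    only if the mean probability exceeds mean_thresh AND at least
    min_bursts runs of >= burst_len consecutive high-probability frames
    (micro-bursts) occur. Requiring both tempers false alarms."""
    mean_ok = sum(frame_probs) / len(frame_probs) >= mean_thresh
    bursts, run = 0, 0
    for p in frame_probs:
        run = run + 1 if p >= burst_prob else 0
        if run == burst_len:   # count each burst once, when it first qualifies
            bursts += 1
    return mean_ok and bursts >= min_bursts
```

A clip of uniformly moderate probabilities (e.g., all 0.5) passes the mean test but has no micro-burst, so it is not flagged; that is the false-alarm-tempering behavior the rule is designed for.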
22 pages, 4034 KiB  
Article
An Online Modular Framework for Anomaly Detection and Multiclass Classification in Video Surveillance
by Jonathan Flores-Monroy, Gibran Benitez-Garcia, Mariko Nakano-Miyatake and Hiroki Takahashi
Appl. Sci. 2025, 15(17), 9249; https://doi.org/10.3390/app15179249 - 22 Aug 2025
Abstract
Video surveillance systems are a key tool for the identification of anomalous events, but they still rely heavily on human analysis, which limits their efficiency. Current video anomaly detection models aim to automatically detect such events. However, most of them provide only a binary classification (normal or anomalous) and do not identify the specific type of anomaly. Although recent proposals address anomaly classification, they typically require full video analysis, making them unsuitable for online applications. In this work, we propose a modular framework for the joint detection and classification of anomalies, designed to operate on individual clips within continuous video streams. The architecture integrates interchangeable modules (feature extractor, detector, and classifier) and is adaptable to both offline and online scenarios. Specifically, we introduce a multi-category classifier that processes only anomalous clips, enabling efficient clip-level classification. Experiments conducted on the UCF-Crime dataset validate the effectiveness of the framework, achieving 74.77% clip-level accuracy and 58.96% video-level accuracy, surpassing prior approaches and confirming its applicability in real-world surveillance environments.
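The modular design (interchangeable extractor, detector, and classifier, with the classifier applied only to clips the detector flags) can be sketched as below. The callables and the threshold are hypothetical stand-ins for the paper's actual components, shown only to illustrate the gating structure that makes clip-level classification efficient.

```python
class AnomalyPipeline:
    """Clip-level pipeline with interchangeable modules. The multi-category
    classifier runs only on clips whose anomaly score passes the detector
    threshold, so normal clips skip the classification cost entirely."""

    def __init__(self, extractor, detector, classifier, threshold=0.5):
        self.extractor = extractor      # clip -> feature representation
        self.detector = detector        # features -> anomaly score in [0, 1]
        self.classifier = classifier    # features -> anomaly category label
        self.threshold = threshold

    def process_clip(self, clip):
        feats = self.extractor(clip)
        score = self.detector(feats)
        if score < self.threshold:
            return "normal", score      # classifier skipped for normal clips
        return self.classifier(feats), score
```

Because each module is a plain callable, any of the three can be swapped (e.g., a different feature extractor for online vs. offline operation) without touching the rest of the pipeline.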
16 pages, 2587 KiB  
Article
Video Display Improvement by Using Collaborative Edge Devices with YOLOv11
by Byoungkug Kim, Soohyun Wang and Jaeho Lee
Appl. Sci. 2025, 15(17), 9241; https://doi.org/10.3390/app15179241 - 22 Aug 2025
Abstract
Efficient human detection in video streams is essential for various IoT applications, including surveillance, smart cities, intelligent transportation systems (ITSs), and industrial automation. However, resource-constrained IoT devices often face limitations in handling deep learning-based object detection. This study proposes a collaborative edge computing framework utilizing multiple Raspberry Pi-based IoT devices to improve YOLOv11-based human detection performance. By distributing video frames across multiple edge devices, the proposed system effectively balances the computational load, resulting in an increase in the FPS (Frames Per Second) for processed video outputs. The experimental results confirm that as more edge devices collaborate, overall video processing efficiency improves, demonstrating the feasibility of distributed object detection for scalable and cost-effective IoT-based video analytics. In particular, the proposed approach holds significant potential for ITS applications such as pedestrian monitoring at intersections, real-time incident detection, and enhancing traffic safety by enabling responsive and decentralized analysis at the edge.
(This article belongs to the Special Issue Advances in Intelligent Transportation and Its Applications)
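Distributing frames across collaborating devices can be as simple as round-robin assignment; the abstract does not specify the scheduling policy, so the sketch below is one plausible, assumed scheme rather than the paper's implementation.

```python
def dispatch_frames(frames, n_devices):
    """Round-robin assignment of video frames to edge devices. Each device
    then runs detection on its own queue, so aggregate throughput scales
    roughly with the number of devices (ignoring network overhead)."""
    queues = [[] for _ in range(n_devices)]
    for i, frame in enumerate(frames):
        queues[i % n_devices].append(frame)
    return queues
```

With this policy, each of three Raspberry Pis would receive every third frame; if one device sustains N FPS alone, three devices can approach 3N FPS on the merged output, minus the cost of frame distribution and result reassembly in order.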

16 pages, 2923 KiB  
Article
Method for Dairy Cow Target Detection and Tracking Based on Lightweight YOLO v11
by Zhongkun Li, Guodong Cheng, Lu Yang, Shuqing Han, Yali Wang, Xiaofei Dai, Jianyu Fang and Jianzhai Wu
Animals 2025, 15(16), 2439; https://doi.org/10.3390/ani15162439 - 20 Aug 2025
Abstract
With the development of precision livestock farming, in order to achieve the goal of fine management and improve the health and welfare of dairy cows, research on dairy cow motion monitoring has become particularly important. In this study, considering the problems of a large number of model parameters, poor multi-target tracking accuracy, and the nonlinear motion of dairy cows in dairy farming scenes, a lightweight detection model based on improved YOLO v11n was proposed and four tracking algorithms were compared. Firstly, the Ghost module was used to replace the standard convolutions in the YOLO v11n network, and the attention mechanism was replaced with the more lightweight ELA, which together reduced the number of model parameters by 18.59%. Then, the SDIoU loss function was used to address the influence of differing cow target sizes. With the above improvements, the improved model achieved increases of 2.0 and 2.3 percentage points in mAP@75 and mAP@50-95, respectively. Secondly, the performance of four tracking algorithms, ByteTrack, BoT-SORT, OC-SORT, and BoostTrack, was systematically compared. The results show that 97.02% MOTA and 89.81% HOTA could be achieved when combined with the OC-SORT tracking algorithm. Given the constraints of on-farm equipment, the improved object detection model in this paper reduces the number of model parameters while offering better performance. The OC-SORT tracking algorithm enables the tracking and localization of cows through video surveillance alone, creating the necessary conditions for the continuous monitoring of cows.
(This article belongs to the Section Animal System and Management)
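The detection metrics quoted above (mAP@75, mAP@50-95) are built on box overlap, and IoU variants such as SDIoU extend the same quantity with size- and distance-aware terms. For reference, the standard Intersection over Union computation on axis-aligned boxes:

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2).
    Returns 0.0 for disjoint boxes; 1.0 for identical boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0
```

A detection counts toward mAP@75 only if its IoU with a ground-truth box is at least 0.75; mAP@50-95 averages this over thresholds from 0.50 to 0.95.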

34 pages, 3909 KiB  
Article
UWB Radar-Based Human Activity Recognition via EWT–Hilbert Spectral Videos and Dual-Path Deep Learning
by Hui-Sup Cho and Young-Jin Park
Electronics 2025, 14(16), 3264; https://doi.org/10.3390/electronics14163264 - 17 Aug 2025
Abstract
Ultrawideband (UWB) radar has emerged as a compelling solution for noncontact human activity recognition. This study proposes a novel framework that leverages adaptive signal decomposition and video-based deep learning to classify human motions with high accuracy using a single UWB radar. The raw radar signals were processed by empirical wavelet transform (EWT) to isolate the dominant frequency components in a data-driven manner. These components were further analyzed using the Hilbert transform to produce time–frequency spectra that capture motion-specific signatures through subtle phase variations. Instead of treating each spectrum as an isolated image, the resulting sequence was organized into a temporally coherent video, capturing spatial and temporal motion dynamics. The video data were used to train the SlowFast network—a dual-path deep learning model optimized for video-based action recognition. The proposed system achieved an average classification accuracy exceeding 99% across five representative human actions. The experimental results confirmed that the EWT–Hilbert-based preprocessing enhanced feature distinctiveness, while the SlowFast architecture enabled efficient and accurate learning of motion patterns. The proposed framework is intuitive, computationally efficient, and scalable, demonstrating strong potential for deployment in real-world scenarios such as smart healthcare, ambient-assisted living, and privacy-sensitive surveillance environments.

14 pages, 1394 KiB  
Article
Pulmonary Benign Metastasizing Leiomyoma: A Retrospective Analysis of Seven Cases Including a Rare Coexistence with In Situ Mucinous Adenocarcinoma
by Zeguang Ye, Xi Wu, Can Fang and Min Zhu
Biomedicines 2025, 13(8), 1971; https://doi.org/10.3390/biomedicines13081971 - 13 Aug 2025
Abstract
Background: Pulmonary benign metastasizing leiomyoma (PBML) is a rare condition characterized by histologically benign smooth muscle tumors occurring at extrauterine sites, often in women with a history of uterine leiomyoma. While PBML generally exhibits indolent behavior, its pathogenesis, management, and malignant potential remain unclear. Methods: This study retrospectively analyzes the clinical characteristics, imaging features, diagnostic approaches, pathological findings, treatment strategies, and outcomes of seven patients with PBML treated at our institution between January 2016 and May 2025. Results: Seven patients were included, with a mean age at diagnosis of 48.9 ± 5.6 years. Two patients presented with respiratory symptoms. Imaging revealed multiple bilateral pulmonary nodules in four patients and solitary nodules in three. Six patients were diagnosed via video-assisted thoracoscopic surgery, and one through computed tomography-guided percutaneous biopsy. Immunohistochemistry revealed positivity for SMA and Desmin in all cases, ER in six, and PR in five, with the Ki-67 labeling index ≤3% in six patients. One patient had a coexisting in situ mucinous adenocarcinoma within the PBML lesion. All had a history of uterine leiomyoma. After diagnosis, one patient received hormonal therapy, and another underwent right adnexectomy. The remaining patients were managed with surveillance without additional treatment. During follow-up, one patient developed distant organ metastasis. Conclusions: PBML is a rare, typically indolent condition with potential for metastasis. Accurate diagnosis relies on imaging, histopathology, and immunohistochemistry. This study reports a unique case of PBML coexisting with intratumoral in situ mucinous adenocarcinoma, a previously unreported finding that may broaden the known histopathological spectrum.
(This article belongs to the Section Cancer Biology and Oncology)

17 pages, 840 KiB  
Article
Improving Person Re-Identification via Feature Erasing-Driven Data Augmentation
by Shangdong Zhu and Huayan Zhang
Mathematics 2025, 13(16), 2580; https://doi.org/10.3390/math13162580 - 12 Aug 2025
Abstract
Person re-identification (Re-ID) has attracted considerable attention in the field of computer vision, primarily due to its critical role in video surveillance and public security applications. However, most existing Re-ID approaches rely on image-level erasing techniques, which may inadvertently remove fine-grained visual cues that are essential for accurate identification. To mitigate this limitation, we propose an effective feature erasing-based data augmentation framework that aims to explore discriminative information within individual samples and improve overall recognition performance. Specifically, we first introduce a diagonal swapping augmentation strategy to increase the diversity of the training samples. Secondly, we design a feature erasing-driven method applied to the extracted pedestrian feature to capture identity-relevant information at the feature level. Finally, extensive experiments demonstrate that our method achieves competitive performance compared to many representative approaches.

17 pages, 918 KiB  
Article
LTGS-Net: Local Temporal and Global Spatial Network for Weakly Supervised Video Anomaly Detection
by Minghao Li, Xiaohan Wang, Haofei Wang and Min Yang
Sensors 2025, 25(16), 4884; https://doi.org/10.3390/s25164884 - 8 Aug 2025
Abstract
Video anomaly detection has important application value in the field of intelligent surveillance; however, the sparsity of anomalous events and the expense of labeling have made weakly supervised methods a research hotspot. Most current methods still process temporal and spatial features independently, which makes it difficult to fully capture their complex spatio-temporal dependencies and limits the accuracy and robustness of detection. To address this, we propose the Local Temporal and Global Spatial Network (LTGS) for weakly supervised video anomaly detection. The LTGS architecture incorporates a clip-level temporal feature relation module and a video-level spatial feature module, which collaboratively enhance discriminative representations. Through joint training of these modules, we develop a feature encoder specifically tailored for video anomaly detection. To further refine clip-level annotations and better align them with actual events, we employ a dynamic label updating strategy. These updated labels are utilized to optimize the model and enhance its robustness. Extensive experiments on two widely used public datasets, ShanghaiTech and UCF-Crime, validate the effectiveness of the proposed LTGS method. Experimental results demonstrate that the LTGS achieves an AUC of 96.69% on the ShanghaiTech dataset and 82.33% on the UCF-Crime dataset, outperforming various state-of-the-art algorithms in anomaly detection tasks.
(This article belongs to the Section Sensor Networks)
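The dynamic label updating strategy is not detailed in the abstract; one common realization of such refinement is an exponential-moving-average blend of the current clip-level pseudo-labels with fresh model scores. The sketch below is that generic pattern with hypothetical parameters, not the LTGS implementation:

```python
def update_labels(labels, scores, momentum=0.9, threshold=0.5):
    """EMA-style refinement of clip-level pseudo-labels. Each label drifts
    toward the model's current hard decision (score >= threshold -> 1.0),
    with momentum damping the update so labels change gradually."""
    return [momentum * lbl + (1.0 - momentum) * (1.0 if s >= threshold else 0.0)
            for lbl, s in zip(labels, scores)]
```

Damped updates of this kind keep noisy early predictions from overwriting the weak video-level supervision while still letting clip labels converge toward actual event boundaries over training.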

23 pages, 4350 KiB  
Article
Gardens Fire Detection Based on the Symmetrical SSS-YOLOv8 Network
by Bo Liu, Junhua Wang, Qing An, Yanglu Wan, Jianing Zhou and Xijiang Chen
Symmetry 2025, 17(8), 1269; https://doi.org/10.3390/sym17081269 - 8 Aug 2025
Abstract
Fire detection primarily relies on sensors such as smoke detectors, heat detectors, and flame detectors. However, due to cost constraints, it is impractical to deploy such a large number of sensors for fire detection in outdoor gardens and landscapes. To address this challenge, and aiming to enhance fire detection accuracy in gardens while achieving a lightweight design, this paper proposes an improved symmetry SSS-YOLOv8 model for lightweight fire detection in garden video surveillance. Firstly, the SPDConv layer from ShuffleNetV2 is used to preserve flame or smoke information, combined with the Conv_Maxpool layer to reduce computational complexity. Subsequently, the SE module is introduced into the backbone feature extraction network to enhance features specific to fire and smoke. ShuffleNetV2 and the SE module are configured into a symmetric local network structure to enhance the extraction of flame or smoke features. Finally, WIoU is introduced as the bounding box regression loss function to further ensure the detection performance of the symmetry SSS-YOLOv8 model. Experimental results demonstrate that the improved symmetry SSS-YOLOv8 model achieves precision and recall rates for garden flame and smoke detection both exceeding 0.70. Compared to the YOLOv8n model, it exhibits a 2.1 percentage point increase in mAP, while its parameter count is only 1.99 M, reduced to 65.7% of the original model. The proposed model demonstrates superior detection accuracy for garden fires compared to other YOLO series models of the same type, as well as to SSD and Faster R-CNN models.
(This article belongs to the Section Computer)

24 pages, 23817 KiB  
Article
Dual-Path Adversarial Denoising Network Based on UNet
by Jinchi Yu, Yu Zhou, Mingchen Sun and Dadong Wang
Sensors 2025, 25(15), 4751; https://doi.org/10.3390/s25154751 - 1 Aug 2025
Abstract
Digital image quality is crucial for reliable analysis in applications such as medical imaging, satellite remote sensing, and video surveillance. However, traditional denoising methods struggle to balance noise removal with detail preservation and lack adaptability to various types of noise. We propose a novel three-module architecture for image denoising, comprising a generator, a dual-path-UNet-based denoiser, and a discriminator. The generator creates synthetic noise patterns to augment training data, while the dual-path-UNet denoiser uses multiple receptive field modules to preserve fine details and dense feature fusion to maintain global structural integrity. The discriminator provides adversarial feedback to enhance denoising performance. This dual-path adversarial training mechanism addresses the limitations of traditional methods by simultaneously capturing both local details and global structures. Experiments on the SIDD, DND, and PolyU datasets demonstrate superior performance. We compare our architecture with the latest state-of-the-art GAN variants through comprehensive qualitative and quantitative evaluations. These results confirm the effectiveness of noise removal with minimal loss of critical image details. The proposed architecture enhances image denoising capabilities in complex noise scenarios, providing a robust solution for applications that require high image fidelity. By enhancing adaptability to various types of noise while maintaining structural integrity, this method provides a versatile tool for image processing tasks that require preserving detail.
(This article belongs to the Section Sensing and Imaging)

22 pages, 554 KiB  
Systematic Review
Smart Homes: A Meta-Study on Sense of Security and Home Automation
by Carlos M. Torres-Hernandez, Mariano Garduño-Aparicio and Juvenal Rodriguez-Resendiz
Technologies 2025, 13(8), 320; https://doi.org/10.3390/technologies13080320 - 30 Jul 2025
Abstract
This review examines advancements in smart home security through the integration of home automation technologies. Various security systems, including surveillance cameras, smart locks, and motion sensors, are analyzed, highlighting their effectiveness in enhancing home security. These systems enable users to monitor and control their homes in real-time, providing an additional layer of security. The document also examines how these security systems can enhance the quality of life for users by providing greater convenience and control over their domestic environment. The ability to receive instant alerts and access video recordings from anywhere allows users to respond quickly to unexpected situations, thereby increasing their sense of security and well-being. Additionally, the challenges and future trends in this field are addressed, emphasizing the importance of designing solutions that are intuitive and easy to use. As technology continues to evolve, it is crucial for developers and manufacturers to focus on creating products that seamlessly integrate into users’ daily lives, facilitating their adoption and use. This comprehensive state-of-the-art review, based on the Scopus database, provides a detailed overview of the current status and future potential of smart home security systems. It highlights how ongoing innovation in this field can lead to the development of more advanced and efficient solutions that not only protect homes but also enhance the overall user experience.
(This article belongs to the Special Issue Smart Systems (SmaSys2024))

15 pages, 1943 KiB  
Article
Multimodal Latent Representation Learning for Video Moment Retrieval
by Jinkwon Hwang, Mingyu Jeon and Junyeong Kim
Sensors 2025, 25(14), 4528; https://doi.org/10.3390/s25144528 - 21 Jul 2025
Abstract
The rise of artificial intelligence (AI) has revolutionized the processing and analysis of video sensor data, driving advancements in areas such as surveillance, autonomous driving, and personalized content recommendations. However, leveraging video data presents unique challenges, particularly in the time-intensive feature extraction process required for model training. This challenge is intensified in research environments lacking advanced hardware resources like GPUs. We propose a new method called the multimodal latent representation learning framework (MLRL) to address these limitations. MLRL enhances the performance of downstream tasks by conducting additional representation learning on pre-extracted features. By integrating and augmenting multimodal data, our method effectively predicts latent representations, leveraging pre-extracted features to reduce model training time and improve task performance. We validate the efficacy of MLRL on the video moment retrieval task using the QVHighlight dataset, benchmarking against the QD-DETR model. Our results demonstrate significant improvements, highlighting the potential of MLRL to streamline video data processing by leveraging pre-extracted features to bypass the time-consuming extraction process of raw sensor data and enhance model accuracy in various sensor-based applications.
