Search Results (151)

Search Parameters:
Keywords = real-time video-surveillance system

22 pages, 554 KiB  
Systematic Review
Smart Homes: A Meta-Study on Sense of Security and Home Automation
by Carlos M. Torres-Hernandez, Mariano Garduño-Aparicio and Juvenal Rodriguez-Resendiz
Technologies 2025, 13(8), 320; https://doi.org/10.3390/technologies13080320 - 30 Jul 2025
Viewed by 430
Abstract
This review examines advancements in smart home security through the integration of home automation technologies. Various security systems, including surveillance cameras, smart locks, and motion sensors, are analyzed, highlighting their effectiveness in enhancing home security. These systems enable users to monitor and control their homes in real time, providing an additional layer of security. The review also considers how these security systems can enhance the quality of life for users by providing greater convenience and control over their domestic environment. The ability to receive instant alerts and access video recordings from anywhere allows users to respond quickly to unexpected situations, thereby increasing their sense of security and well-being. Additionally, the challenges and future trends in this field are addressed, emphasizing the importance of designing solutions that are intuitive and easy to use. As technology continues to evolve, it is crucial for developers and manufacturers to focus on creating products that seamlessly integrate into users’ daily lives, facilitating their adoption and use. This comprehensive state-of-the-art review, based on the Scopus database, provides a detailed overview of the current status and future potential of smart home security systems. It highlights how ongoing innovation in this field can lead to the development of more advanced and efficient solutions that not only protect homes but also enhance the overall user experience.
(This article belongs to the Special Issue Smart Systems (SmaSys2024))

40 pages, 1540 KiB  
Review
A Survey on Video Big Data Analytics: Architecture, Technologies, and Open Research Challenges
by Thi-Thu-Trang Do, Quyet-Thang Huynh, Kyungbaek Kim and Van-Quyet Nguyen
Appl. Sci. 2025, 15(14), 8089; https://doi.org/10.3390/app15148089 - 21 Jul 2025
Viewed by 589
Abstract
The exponential growth of video data across domains such as surveillance, transportation, and healthcare has raised critical challenges in scalability, real-time processing, and privacy preservation. While existing studies have addressed individual aspects of Video Big Data Analytics (VBDA), an integrated, up-to-date perspective remains limited. This paper presents a comprehensive survey of system architectures and enabling technologies in VBDA. It categorizes system architectures into four primary types: centralized, cloud-based, edge computing, and hybrid cloud–edge. It also analyzes key enabling technologies, including real-time streaming, scalable distributed processing, intelligent AI models, and advanced storage for managing large-scale multimodal video data. In addition, the study provides a functional taxonomy of core video processing tasks, including object detection, anomaly recognition, and semantic retrieval, and maps these tasks to real-world applications. Based on the survey findings, the paper proposes ViMindXAI, a hybrid AI-driven platform that combines edge and cloud orchestration, adaptive storage, and privacy-aware learning to support scalable and trustworthy video analytics. Our analysis highlights emerging trends such as the shift toward hybrid cloud–edge architectures, the growing importance of explainable AI and federated learning, and the urgent need for secure and efficient video data management. These findings point to key directions for designing next-generation VBDA platforms that enhance real-time, data-driven decision-making in domains such as public safety, transportation, and healthcare. Such platforms facilitate timely insights, rapid response, and regulatory alignment through scalable and explainable analytics. This work provides a robust conceptual foundation for future research on adaptive and efficient decision-support systems in video-intensive environments.
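The four architecture types the survey identifies differ mainly in where inference runs. As a purely illustrative sketch (the `Task` fields, thresholds, and task names below are hypothetical, not from the paper), a hybrid cloud–edge deployment might route work like this:

```python
# Hypothetical routing policy for a hybrid cloud-edge video-analytics platform.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    latency_budget_ms: float   # how quickly a result is needed
    privacy_sensitive: bool    # e.g., raw faces must not leave the site

def route(task: Task) -> str:
    """Toy placement policy: privacy and latency pin work to the edge."""
    if task.privacy_sensitive:
        return "edge"    # keep raw footage on premises
    if task.latency_budget_ms < 100:
        return "edge"    # real-time alerts cannot tolerate WAN round-trips
    return "cloud"       # batch analytics go where compute is elastic

for t in (Task("fall_alert", 50, False),
          Task("face_redaction", 500, True),
          Task("weekly_retrieval_index", 60_000, False)):
    print(t.name, "->", route(t))
```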

29 pages, 1184 KiB  
Article
Perception-Based H.264/AVC Video Coding for Resource-Constrained and Low-Bit-Rate Applications
by Lih-Jen Kau, Chin-Kun Tseng and Ming-Xian Lee
Sensors 2025, 25(14), 4259; https://doi.org/10.3390/s25144259 - 8 Jul 2025
Viewed by 393
Abstract
With the rapid expansion of Internet of Things (IoT) and edge computing applications, efficient video transmission under constrained bandwidth and limited computational resources has become increasingly critical. In such environments, perception-based video coding plays a vital role in maintaining acceptable visual quality while minimizing bit rate and processing overhead. Although newer video coding standards have emerged, H.264/AVC remains the dominant compression format in many deployed systems, particularly in commercial CCTV surveillance, due to its compatibility, stability, and widespread hardware support. Motivated by these practical demands, this paper proposes a perception-based video coding algorithm specifically tailored for low-bit-rate H.264/AVC applications. By targeting regions most relevant to the human visual system, the proposed method enhances perceptual quality while optimizing resource usage, making it particularly suitable for embedded systems and bandwidth-limited communication channels. In general, regions containing human faces and those exhibiting significant motion are of primary importance for human perception and should receive higher bit allocation to preserve visual quality. To this end, macroblocks (MBs) containing human faces are detected using the Viola–Jones algorithm, which leverages AdaBoost for feature selection and a cascade of classifiers for fast and accurate detection. This approach is favored over deep learning-based models due to its low computational complexity and real-time capability, making it ideal for latency- and resource-constrained IoT and edge environments. Motion-intensive macroblocks are identified by comparing their motion intensity against the average motion level of preceding reference frames. Based on these criteria, a dynamic quantization parameter (QP) adjustment strategy is applied to assign finer quantization to perceptually important regions of interest (ROIs) in low-bit-rate scenarios. The experimental results show that the proposed method achieves superior subjective visual quality and objective Peak Signal-to-Noise Ratio (PSNR) compared to the standard JM software and other state-of-the-art algorithms under the same bit rate constraints. Moreover, the approach introduces only a marginal increase in computational complexity, highlighting its efficiency. Overall, the proposed algorithm offers an effective balance between visual quality and computational performance, making it well suited for video transmission in bandwidth-constrained, resource-limited IoT and edge computing environments.
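To make the ROI idea concrete, here is a minimal sketch of the face-driven part of such a scheme using OpenCV's bundled Viola–Jones cascade; the QP values and the per-macroblock map are illustrative assumptions, not the paper's rate-control logic:

```python
# Sketch: mark face-bearing 16x16 macroblocks for finer quantization.
import cv2
import numpy as np

MB = 16  # H.264 macroblock size in luma samples

def roi_qp_map(frame_bgr, base_qp=38, roi_qp=30):
    """Return a per-macroblock QP map: finer quantization where faces appear."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    mb_rows, mb_cols = gray.shape[0] // MB, gray.shape[1] // MB
    qp = np.full((mb_rows, mb_cols), base_qp, dtype=np.int32)
    for (x, y, w, h) in faces:
        r0, r1 = y // MB, (y + h + MB - 1) // MB   # ceil to cover partial MBs
        c0, c1 = x // MB, (x + w + MB - 1) // MB
        qp[r0:r1, c0:c1] = roi_qp                  # spend more bits on faces
    return qp  # would be fed to an encoder that accepts per-MB QP offsets
```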

21 pages, 4859 KiB  
Article
Improvement of SAM2 Algorithm Based on Kalman Filtering for Long-Term Video Object Segmentation
by Jun Yin, Fei Wu, Hao Su, Peng Huang and Yuetong Qixuan
Sensors 2025, 25(13), 4199; https://doi.org/10.3390/s25134199 - 5 Jul 2025
Viewed by 546
Abstract
The Segment Anything Model 2 (SAM2) has achieved state-of-the-art performance in pixel-level object segmentation for both static and dynamic visual content. Its streaming memory architecture maintains spatial context across video sequences, yet struggles with long-term tracking due to its static inference framework. SAM2’s fixed temporal window approach indiscriminately retains historical frames, failing to account for frame quality or dynamic motion patterns. This leads to error propagation and tracking instability in challenging scenarios involving fast-moving objects, partial occlusions, or crowded environments. To overcome these limitations, this paper proposes SAM2Plus, a zero-shot enhancement framework that integrates Kalman filter prediction, dynamic quality thresholds, and adaptive memory management. The Kalman filter models object motion using physical constraints to predict trajectories and dynamically refine segmentation states, mitigating positional drift during occlusions or velocity changes. Dynamic thresholds, combined with multi-criteria evaluation metrics (e.g., motion coherence, appearance consistency), prioritize high-quality frames while adaptively balancing confidence scores and temporal smoothness. This reduces ambiguities among similar objects in complex scenes. SAM2Plus further employs an optimized memory system that prunes outdated or low-confidence entries and retains temporally coherent context, ensuring constant computational resources even for infinitely long videos. Extensive experiments on two video object segmentation (VOS) benchmarks demonstrate SAM2Plus’s superiority over SAM2. It achieves an average improvement of 1.0 in J&F metrics across all 24 direct comparisons, with gains exceeding 2.3 points on SA-V and LVOS datasets for long-term tracking. The method delivers real-time performance and strong generalization without fine-tuning or additional parameters, effectively addressing occlusion recovery and viewpoint changes. By unifying motion-aware physics-based prediction with spatial segmentation, SAM2Plus bridges the gap between static and dynamic reasoning, offering a scalable solution for real-world applications such as autonomous driving and surveillance systems.
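The motion model at the core of such an approach can be as small as a constant-velocity Kalman filter over the box center. The sketch below is a generic textbook filter with assumed noise levels, not SAM2Plus's implementation:

```python
# Toy constant-velocity Kalman filter over an object-center trajectory.
import numpy as np

class CVKalman:
    def __init__(self, cx, cy, dt=1.0):
        self.x = np.array([cx, cy, 0.0, 0.0])          # state: position + velocity
        self.P = np.eye(4) * 10.0                      # state covariance
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
        self.Q = np.eye(4) * 0.01                      # assumed process noise
        self.R = np.eye(2) * 1.0                       # assumed measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                              # predicted center

    def update(self, z):
        y = np.asarray(z, float) - self.H @ self.x     # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

During an occlusion, a tracker in this spirit would skip `update` and keep calling `predict`, letting the motion model bridge the gap until the object reappears.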

802 KiB  
Proceeding Paper
Video Surveillance and Artificial Intelligence for Urban Security in Smart Cities: A Review of a Selection of Empirical Studies from 2018 to 2024
by Abdellah Dardour, Essaid El Haji and Mohamed Achkari Begdouri
Comput. Sci. Math. Forum 2025, 10(1), 15; https://doi.org/10.3390/cmsf2025010015 - 16 Jun 2025
Viewed by 141
Abstract
The rapid growth of information and communication technologies, in particular big data, artificial intelligence (AI), and the Internet of Things (IoT), has made it possible to make smart cities a tangible reality. In this context, real-time video surveillance plays a crucial role in improving public safety. This article presents a systematic review of studies focused on the detection of acts of aggression and crime in these cities. By studying 100 indexed scientific articles, dating from 2018 to 2024, we examine the most recent methods and techniques, with an emphasis on the use of machine learning and deep learning for the processing of real-time video streams. The works examined cover several technological axes, such as convolutional neural networks (CNNs), fog computing, and integrated IoT systems, while also addressing challenges such as anomaly detection, which is frequently complicated by its contextual and uncertain nature. Finally, this article offers suggestions to guide future research, with the aim of improving the accuracy and efficiency of intelligent monitoring systems.

57 pages, 4508 KiB  
Review
Person Recognition via Gait: A Review of Covariate Impact and Challenges
by Abdul Basit Mughal, Rafi Ullah Khan, Amine Bermak and Atiq ur Rehman
Sensors 2025, 25(11), 3471; https://doi.org/10.3390/s25113471 - 30 May 2025
Viewed by 879
Abstract
Human gait identification is a biometric technique that permits recognizing an individual from a long distance based on features such as movement, timing, and clothing. This approach is particularly useful in video surveillance scenarios, where biometric systems allow people to be easily recognized without intruding on their privacy. In the domain of computer vision, one of the essential and most difficult tasks is tracking a person across multiple camera views, specifically, re-identifying the same person across diverse scenes. However, the accuracy of the gait identification system is significantly affected by covariate factors, such as different view angles, clothing, walking speeds, occlusion, and low-lighting conditions. Previous studies have often overlooked the influence of these factors, leaving a gap in the comprehensive understanding of gait recognition systems. This paper provides a comprehensive review of the most effective gait recognition methods, assessing their performance across various image source databases while highlighting the limitations of existing datasets. Additionally, it explores the influence of key covariate factors, such as viewing angle, clothing, and environmental conditions, on model performance. The paper also compares traditional gait recognition methods with advanced deep learning techniques, offering theoretical insights into the impact of covariates and addressing real-world application challenges. The contrasts and discussions presented provide valuable insights for developing a robust and improved gait-based identification framework for future advancements.
(This article belongs to the Special Issue Artificial Intelligence and Sensor-Based Gait Recognition)

16 pages, 2556 KiB  
Article
Deep Learning Method with Domain-Task Adaptation and Client-Specific Fine-Tuning YOLO11 Model for Counting Greenhouse Tomatoes
by Igor Glukhikh, Dmitry Glukhikh, Anna Gubina and Tatiana Chernysheva
Appl. Syst. Innov. 2025, 8(3), 71; https://doi.org/10.3390/asi8030071 - 27 May 2025
Viewed by 777
Abstract
This article discusses the tasks involved in the operational assessment of the volume of produced goods, such as tomatoes. The large-scale implementation of computer vision systems in greenhouses requires approaches that reduce cost, time, and complexity, particularly in creating training data and preparing neural network models. Publicly available models like YOLO often lack the accuracy needed for specific tasks. This study proposes a method for the sequential training of detection models, incorporating Domain-Task Adaptation and Client-Specific Fine-Tuning. The model is initially trained on a large, specialized dataset for tasks like tomato detection, followed by fine-tuning with a small custom dataset reflecting real greenhouse conditions. This results in the lightweight YOLO11n model achieving high validation accuracy (mAP50 > 0.83, Precision > 0.75, Recall > 0.73) while reducing computational resource requirements. Additionally, a custom training dataset was developed that captures the unique challenges of greenhouse environments, such as dense vegetation and occlusions. An algorithm for counting tomatoes was also created, which processes video frames to accurately count only the visible tomatoes in the front row of plants. This algorithm can be utilized in mobile video surveillance systems, enhancing monitoring efficiency in greenhouses.
(This article belongs to the Section Artificial Intelligence)
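The two-stage recipe maps naturally onto the ultralytics training API. A minimal sketch, assuming the `ultralytics` package is installed; the dataset YAML file names are hypothetical placeholders:

```python
# Two-stage fine-tuning sketch: domain adaptation, then client-specific tuning.
from ultralytics import YOLO

# Stage 1: domain-task adaptation on a large public tomato-detection dataset.
model = YOLO("yolo11n.pt")                       # generic pretrained weights
model.train(data="tomato_public.yaml", epochs=100, imgsz=640)

# Stage 2: client-specific fine-tuning on a small set of images from the
# target greenhouse (dense vegetation, occlusions); the second train() call
# continues from the stage-1 weights, here at a lower learning rate.
model.train(data="client_greenhouse.yaml", epochs=30, imgsz=640, lr0=1e-4)

metrics = model.val()                            # reports mAP50, precision, recall
print(metrics.box.map50)
```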

16 pages, 5532 KiB  
Article
Intelligent System Study for Asymmetric Positioning of Personnel, Transport, and Equipment Monitoring in Coal Mines
by Diana Novak, Yuriy Kozhubaev, Hengbo Kang, Haodong Cheng and Roman Ershov
Symmetry 2025, 17(5), 755; https://doi.org/10.3390/sym17050755 - 14 May 2025
Viewed by 447
Abstract
The paper presents a study of an intelligent system for personnel positioning, transport, and equipment monitoring in the mining industry using a convolutional neural network (CNN) and OpenPose technology. The proposed framework operates through a three-stage pipeline: OpenPose-based skeleton extraction from surveillance video streams, capturing 18 key body joints at 30 fps; multimodal feature fusion, combining skeletal key points and proximity sensor data to achieve environmental context awareness and obtain relevant feature values; and hierarchical pose alerting, using an attention-enhanced bidirectional LSTM (trained on 5000 annotated fall instances) for fall warning. The experiment conducted demonstrated that the combined use of the aforementioned technologies allows the system to determine the location and behavior of personnel, calculate the distance to hazardous areas in real time, and analyze personnel postures to identify possible risks such as falls or immobility. The system’s capacity to track the location of vehicles and equipment enhances operational efficiency, thereby mitigating the risk of accidents. Additionally, the system provides real-time alerts, identifying abnormal behavior, equipment malfunctions, and safety hazards, thus promoting enhanced mine management efficiency, improved safe working conditions, and a reduction in accidents.
(This article belongs to the Special Issue Symmetry and Asymmetry in Computer Vision and Graphics)
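Stage three of such a pipeline can be sketched as an attention-weighted BiLSTM over flattened OpenPose skeletons (18 joints × 2 coordinates per frame). The PyTorch sketch below uses assumed layer sizes, not the paper's configuration:

```python
# Attention-enhanced BiLSTM classifying a window of skeletons as fall / no-fall.
import torch
import torch.nn as nn

class FallBiLSTM(nn.Module):
    def __init__(self, joints=18, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(joints * 2, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)    # score each time step
        self.head = nn.Linear(2 * hidden, 2)    # fall / no-fall logits

    def forward(self, x):                       # x: (batch, frames, 36)
        h, _ = self.lstm(x)                     # (batch, frames, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # temporal attention weights
        ctx = (w * h).sum(dim=1)                # attention-pooled context
        return self.head(ctx)

logits = FallBiLSTM()(torch.randn(4, 30, 36))   # 4 clips of 30 frames each
```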

26 pages, 7868 KiB  
Article
A System for Real-Time Detection of Abandoned Luggage
by Ivan Vrsalovic, Jonatan Lerga and Marina Ivasic-Kos
Sensors 2025, 25(9), 2872; https://doi.org/10.3390/s25092872 - 2 May 2025
Viewed by 870
Abstract
In this paper, we propose a system for the real-time automatic detection of abandoned luggage in an airport, recorded by surveillance cameras. To do this, we use an adapted YOLOv11-s model and a proposed algorithm for detecting unattended luggage. The system uses the OpenCV library for the video processing of the recorded footage, a detector, and an algorithm that analyzes the movement of a person and their luggage and evaluates their spatial and temporal relationships to determine whether the luggage is truly abandoned. We used several popular deep convolutional neural network architectures for object detection, e.g., YOLOv8, YOLOv11, and a DETR encoder–decoder transformer with a ResNet-50 deep convolutional backbone; we fine-tuned them on our dataset and compared their performance in detecting people and luggage in surveillance scenes recorded by an airport surveillance camera. The fine-tuned models significantly improved the detection of people and luggage captured by the airport surveillance camera in our custom dataset. The fine-tuned YOLOv8 and YOLOv11 models achieved excellent real-time results on a challenging dataset consisting only of small and medium-sized objects. They achieved a real-time mean average precision (mAP) of over 88%, while their precision for medium-sized objects exceeded 96%. However, the YOLOv11-s model achieved the highest precision in detecting small objects, at 85.8%, which is why we selected it as a component of the abandoned luggage detection system. The abandoned luggage detection algorithm was tested in various scenarios where luggage may be left behind, as well as in potentially suspicious situations, and showed promising results.
(This article belongs to the Special Issue Sensors for Pattern Recognition and Computer Vision)
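The spatial-temporal rule at the heart of such an algorithm can be illustrated in a few lines; the thresholds and the track format below are assumptions, not the authors' exact criteria:

```python
# Illustrative rule: luggage is flagged once no person has been within
# OWNER_RADIUS of it for ABANDON_SECONDS.
import math
import time

OWNER_RADIUS = 150.0      # pixels: how close an "owner" must be (assumed)
ABANDON_SECONDS = 30.0    # dwell time before raising an alert (assumed)

last_attended = {}        # luggage track id -> last time a person was nearby

def check_abandoned(people, luggage, now=None):
    """people/luggage: dicts of track_id -> (cx, cy). Returns abandoned ids."""
    now = time.time() if now is None else now
    abandoned = []
    for lid, (lx, ly) in luggage.items():
        near = any(math.hypot(lx - px, ly - py) < OWNER_RADIUS
                   for px, py in people.values())
        if near or lid not in last_attended:
            last_attended[lid] = now          # attended, or first sighting
        elif now - last_attended[lid] > ABANDON_SECONDS:
            abandoned.append(lid)             # unattended too long: alert
    return abandoned
```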

38 pages, 28331 KiB  
Article
Robustness Benchmark Evaluation and Optimization for Real-Time Vehicle Detection Under Multiple Adverse Conditions
by Jianming Cai, Yifan Gao and Jinjun Tang
Appl. Sci. 2025, 15(9), 4950; https://doi.org/10.3390/app15094950 - 29 Apr 2025
Viewed by 787
Abstract
This paper presents a robustness benchmark evaluation and optimization for vehicle detection. Real-time vehicle detection has become an essential means of data perception in the transportation field, covering various aspects such as intelligent transportation systems, video surveillance, and autonomous driving. However, evaluating and optimizing the robustness of vehicle detection in real traffic scenarios remains challenging. When data distributions change, for example under adverse weather or sensor damage, model reliability cannot be guaranteed. We first conducted a large-scale robustness benchmark evaluation for vehicle detection. Analysis revealed that adverse weather, motion, and occlusion are the most detrimental factors to vehicle detection performance. The impact of color changes and noise, while present, is relatively less pronounced. Moreover, the robustness of vehicle detection is closely linked to its baseline performance and model size. As corruption severity intensifies, model performance drops sharply. When the data distribution of images changes, the features of the vehicles that the model focuses on are weakened, significantly reducing the activation level of the targets. Through this evaluation, we provide guidance and direction for optimizing detection robustness. Based on these findings, we propose TDIRM, a traffic-degraded image restoration model based on stable diffusion, designed to efficiently restore degraded images in real traffic scenarios and thereby enhance the robustness of vehicle detection. The model introduces an image semantics encoder (ISE) module to extract features that align with the latent description of the real background while excluding degradation-related information. Additionally, a triple control embedding attention (TCE) module is proposed to fully integrate all condition controls. Through a triple condition control mechanism, TDIRM achieves restoration results with high fidelity and consistency. Experimental results demonstrate that TDIRM improves vehicle detection mAP by 6.92% on real dense fog data, especially for small distant vehicles that were severely obscured by fog. By enabling semantic-structural-content collaborative optimization within the diffusion framework, TDIRM establishes a novel paradigm for traffic scene image restoration.
(This article belongs to the Special Issue Advances in Autonomous Driving and Smart Transportation)
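The benchmark's core loop, corrupt at increasing severity and re-measure, can be sketched as follows; the fog model here is a crude alpha blend rather than the paper's corruption suite, and `detection_score` is a placeholder to be replaced by a real detector evaluation:

```python
# Sketch of a severity-sweep robustness evaluation for one corruption type.
import numpy as np

def add_fog(img, severity):                     # img: HxWx3 uint8, severity 1..5
    alpha = 0.15 * severity                     # heavier veil at higher severity
    fog = np.full_like(img, 255)                # uniform white overlay
    return (img * (1 - alpha) + fog * alpha).astype(np.uint8)

def detection_score(img):
    """Placeholder: plug in your detector's per-frame score (e.g., mAP proxy)."""
    raise NotImplementedError

def robustness_curve(frames):
    """Map severity level -> mean detection score over the corrupted frames."""
    return {s: float(np.mean([detection_score(add_fog(f, s)) for f in frames]))
            for s in range(1, 6)}
```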

13 pages, 4627 KiB  
Article
Detection of Abnormal Pedestrian Flows with Automatic Contextualization Using Pre-Trained YOLO11n
by Adrián Núñez-Vieyra, Juan C. Olivares-Rojas, Rogelio Ferreira-Escutia, Arturo Méndez-Patiño, José A. Gutiérrez-Gnecchi and Enrique Reyes-Archundia
Math. Comput. Appl. 2025, 30(2), 44; https://doi.org/10.3390/mca30020044 - 17 Apr 2025
Viewed by 501
Abstract
Recently, video surveillance systems have evolved from expensive, human-operated monitoring systems that were only useful after a crime had been committed to systems that monitor 24/7, in real time, and with less and less human involvement. This is partly due to the use of smart cameras, the improvement of the Internet, and AI-based algorithms that allow objects in images to be classified and tracked and, in some cases, identified as threats. Threats are often associated with abnormal or unexpected situations, such as the presence of unauthorized persons in a given place or at a given time, behavior by one or more persons that differs from that of the majority, or simply an unexpected number of people in the place; detecting them depends largely on the available information about their context, i.e., the place, date, and time of capture. In this work, we propose a model to automatically contextualize video capture scenarios, generating data such as location, date, time, and the flow of people in the scene. A strategy to measure the accuracy of the data generated for such contextualization is also proposed. The pre-trained YOLO11n algorithm and the Bot-SORT algorithm gave the best results in person detection and tracking, respectively.
(This article belongs to the Special Issue New Trends in Computational Intelligence and Applications 2024)
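The detector/tracker pairing the authors found best maps directly onto the ultralytics tracking API. A minimal sketch, assuming the `ultralytics` package; the video file name, the 30 fps assumption, and the per-minute flow counter are illustrative additions:

```python
# Pretrained YOLO11n for person detection + Bot-SORT tracking, with a toy
# pedestrian-flow counter (unique track ids seen per minute of footage).
from collections import defaultdict
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
seen_per_minute = defaultdict(set)             # minute index -> unique track ids

for i, result in enumerate(model.track(source="plaza.mp4", classes=[0],
                                       tracker="botsort.yaml", stream=True)):
    if result.boxes.id is None:                # tracker may yield no ids yet
        continue
    minute = i // (30 * 60)                    # assuming ~30 fps footage
    for tid in result.boxes.id.int().tolist():
        seen_per_minute[minute].add(tid)

flow = {m: len(ids) for m, ids in seen_per_minute.items()}
print(flow)                                    # pedestrian flow per minute
```

A contextualization model in the paper's spirit would then compare each minute's count against what is expected for that place, date, and time.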

30 pages, 14418 KiB  
Article
LAVID: A Lightweight and Autonomous Smart Camera System for Urban Violence Detection and Geolocation
by Mohammed Azzakhnini, Houda Saidi, Ahmed Azough, Hamid Tairi and Hassan Qjidaa
Computers 2025, 14(4), 140; https://doi.org/10.3390/computers14040140 - 7 Apr 2025
Viewed by 878
Abstract
With the rise of digital video technologies and the proliferation of processing methods and storage systems, video-surveillance systems have received increasing attention over the last decade. However, the spread of cameras installed in public and private spaces makes it more difficult for human operators to perform real-time analysis of the large amounts of data produced by surveillance systems. Due to the advancement of artificial intelligence methods, many automatic video analysis tasks like violence detection have been studied from a research perspective, and are even beginning to be commercialized in industrial solutions. Nevertheless, most of these solutions adopt centralized architectures with costly servers utilized to process streaming videos sent from different cameras. Centralized architectures do not present the ideal solution due to the high cost, processing time issues, and network bandwidth overhead. In this paper, we propose a lightweight autonomous system for the detection and geolocation of violent acts. Our proposed system, named LAVID, is based on a depthwise separable convolution model (DSCNN) combined with a bidirectional long short-term memory network (BiLSTM) and implemented on a lightweight smart camera. We provide in this study a lightweight video-surveillance system consisting of low-cost autonomous smart cameras that are capable of detecting and identifying harmful behavior and geolocating violent acts that occur over a covered area in real time. Our proposed system, implemented using Raspberry Pi boards, represents a cost-effective solution with interoperability features, making it an ideal IoT solution to be integrated with other smart city infrastructure. Furthermore, our approach, implemented using optimized deep learning models and evaluated on several public datasets, has shown good results in terms of accuracy compared to state-of-the-art methods while reducing power and computational requirements.
(This article belongs to the Section Internet of Things (IoT) and Industrial IoT)
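The DSCNN-plus-BiLSTM pairing can be sketched structurally in PyTorch: depthwise-separable convolutions keep the per-frame feature extractor cheap enough for a Raspberry Pi-class device, and a BiLSTM aggregates frames over time. All layer sizes below are assumptions, not LAVID's actual architecture:

```python
# Structural sketch: depthwise-separable CNN per frame, BiLSTM over the clip.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, cin, cout):
        super().__init__()
        self.depthwise = nn.Conv2d(cin, cin, 3, padding=1, groups=cin)
        self.pointwise = nn.Conv2d(cin, cout, 1)   # cheap channel mixing

    def forward(self, x):
        return torch.relu(self.pointwise(self.depthwise(x)))

class ViolenceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            DepthwiseSeparableConv(3, 32), nn.MaxPool2d(4),
            DepthwiseSeparableConv(32, 64), nn.AdaptiveAvgPool2d(1))
        self.lstm = nn.LSTM(64, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, 2)              # violent / non-violent

    def forward(self, clip):                       # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        f = self.features(clip.flatten(0, 1)).flatten(1)   # (B*T, 64)
        h, _ = self.lstm(f.view(b, t, -1))                 # (B, T, 128)
        return self.head(h[:, -1])                 # classify from last step

print(ViolenceNet()(torch.randn(2, 16, 3, 112, 112)).shape)  # -> (2, 2)
```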

21 pages, 5409 KiB  
Article
Discriminative Deformable Part Model for Pedestrian Detection with Occlusion Handling
by Shahzad Siddiqi, Muhammad Faizan Shirazi and Yawar Rehman
AI 2025, 6(4), 70; https://doi.org/10.3390/ai6040070 - 3 Apr 2025
Viewed by 890
Abstract
Efficient pedestrian detection plays an important role in many practical daily life applications, such as autonomous cars, video surveillance, and intelligent driving assistance systems. The main goal of pedestrian detection systems, especially in vehicles, is to prevent accidents. By recognizing pedestrians in real time, these systems can alert drivers or even autonomously apply brakes, minimizing the possibility of collisions. However, occlusion is a major obstacle to pedestrian detection. Pedestrians are typically occluded by trees, street poles, cars, and other pedestrians. State-of-the-art detection methods are based on fully visible or little-occluded pedestrians; hence, their performance declines with increasing occlusion level. To meet this challenge, a pedestrian detector capable of handling occlusion is preferred. To increase the detection accuracy for occluded pedestrians, we propose a new method called the Discriminative Deformable Part Model (DDPM), which breaks the human image into deformable parts via machine learning. In existing works, the decomposition of the human image into deformable parts has been guided by human intuition. In our novel approach, machine learning is used for deformable objects such as humans, combining the benefits and removing the drawbacks of previous works. We also propose a new pedestrian dataset based on Eastern clothes to accommodate the detector’s evaluation under different intra-class variations of pedestrians. The proposed method achieves a higher detection accuracy on the Pascal VOC and VisDrone Detection datasets when compared with other popular detection methods.

29 pages, 8325 KiB  
Article
Insights into Mosquito Behavior: Employing Visual Technology to Analyze Flight Trajectories and Patterns
by Ning Zhao, Lifeng Wang and Ke Wang
Electronics 2025, 14(7), 1333; https://doi.org/10.3390/electronics14071333 - 27 Mar 2025
Cited by 1 | Viewed by 556
Abstract
Mosquitoes, as vectors of numerous serious infectious diseases, require rigorous behavior monitoring for effective disease prevention and control. Simultaneously, precise surveillance of flying insect behavior is also crucial in agricultural pest management. This study proposes a three-dimensional trajectory reconstruction method for mosquito behavior analysis based on video data. By employing multiple synchronized cameras to capture mosquito flight images, using background subtraction to extract moving targets, applying Kalman filtering to predict target states, and integrating the Hungarian algorithm for multi-target data association, the system can automatically reconstruct three-dimensional mosquito flight trajectories. Experimental results demonstrate that this approach achieves high-precision flight path reconstruction, with a detection accuracy exceeding 95%, an F1-score of 0.93, and fast processing speeds that enable real-time tracking. The mean error of three-dimensional trajectory reconstruction is only 10 ± 4 mm, offering significant improvements in detection accuracy, tracking robustness, and real-time performance over traditional two-dimensional methods. These findings provide technological support for optimizing vector control strategies and enhancing precision pest control and can be further extended to ecological monitoring and agricultural pest management, thus bearing substantial significance for both public health and agriculture.
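Per camera, the pipeline's stages (background subtraction, Kalman prediction, Hungarian association) can be sketched with OpenCV and SciPy; the thresholds are assumptions, and the cross-camera 3D triangulation step is omitted:

```python
# 2D sketch of one camera's tracking stages for tiny moving targets.
import cv2
import numpy as np
from scipy.optimize import linear_sum_assignment

subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16)

def detect_centroids(frame):
    """Foreground blobs -> (N, 2) array of centroids."""
    mask = subtractor.apply(frame)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    pts = [c.mean(axis=0)[0] for c in contours if cv2.contourArea(c) > 2]
    return np.array(pts) if pts else np.empty((0, 2))

def associate(predicted, detected, max_dist=20.0):
    """Match Kalman-predicted positions to detections (Hungarian algorithm)."""
    if len(predicted) == 0 or len(detected) == 0:
        return []
    cost = np.linalg.norm(predicted[:, None] - detected[None, :], axis=2)
    rows, cols = linear_sum_assignment(cost)      # optimal 1-to-1 assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_dist]
```

Matched pairs would update each track's Kalman filter; centroids from multiple synchronized views would then be triangulated into the 3D trajectory.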

22 pages, 7677 KiB  
Article
Universal Low-Frequency Noise Black-Box Attack on Visual Object Tracking
by Hanting Hou, Huan Bao, Kaimin Wei and Yongdong Wu
Symmetry 2025, 17(3), 462; https://doi.org/10.3390/sym17030462 - 19 Mar 2025
Viewed by 500
Abstract
Adversarial attacks on visual object tracking aim to degrade tracking accuracy by introducing imperceptible perturbations into video frames, exploiting vulnerabilities in neural networks. In real-world symmetrical double-blind engagements, both attackers and defenders operate with mutual unawareness of strategic parameters or initiation timing. Black-box attacks based on iterative optimization show excellent applicability in this scenario. However, existing state-of-the-art adversarial attacks based on iterative optimization suffer from high computational costs and limited effectiveness. To address these challenges, this paper proposes the Universal Low-frequency Noise black-box attack method (ULN), which generates perturbations through the discrete cosine transform to disrupt structural features critical for tracking while mimicking compression artifacts. Extensive experimentation on four state-of-the-art trackers, including transformer-based models, demonstrates the method’s severe degradation effects. GRM’s expected average overlap drops by 97.77% on VOT2018, while SiamRPN++’s AUC and Precision on OTB100 decline by 76.55% and 78.9%, respectively. The attack achieves real-time performance with a computational cost reduction of over 50% compared to iterative methods, operating efficiently on embedded devices such as the Raspberry Pi 4B. By maintaining a structural similarity index measure above 0.84, the perturbations blend seamlessly with common compression artifacts, evading traditional spatial filtering defenses. Cross-platform experiments validate its consistent threat across diverse hardware environments, with attack success rates exceeding 40% even under resource constraints. These results underscore the dual capability of ULN as both a stealthy and practical attack vector, and emphasize the urgent need for robust defenses in safety-critical applications such as autonomous driving and aerial surveillance. The efficiency of the method, when combined with its ability to exploit low-frequency vulnerabilities across architectures, establishes a new benchmark for adversarial robustness in visual tracking systems.
(This article belongs to the Section Computer)
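The low-frequency idea can be illustrated by placing random energy only in the lowest DCT coefficients, so the perturbation resembles compression artifacts. The band size and scaling below are assumptions, and the tracker-driven optimization loop that would shape this noise into an actual attack is omitted, so this is a conceptual sketch only:

```python
# Conceptual sketch: random noise confined to the low-frequency DCT band.
import numpy as np
from scipy.fft import idctn

def low_freq_noise(shape, band=8, eps=4.0, seed=None):
    """Perturbation whose energy lives in the lowest DCT coefficients."""
    rng = np.random.default_rng(seed)
    coeffs = np.zeros(shape)
    coeffs[:band, :band] = rng.uniform(-1, 1, (band, band))  # low-freq block only
    noise = idctn(coeffs, norm="ortho")          # back to the pixel domain
    return eps * noise / np.abs(noise).max()     # scale to +/- eps intensity

frame = np.zeros((224, 224))                     # stand-in grayscale frame
adv = np.clip(frame + low_freq_noise(frame.shape, seed=0), 0, 255)
```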
