Search Results (21)

Search Parameters:
Keywords = video saliency object detection

17 pages, 9404 KB  
Article
SimpleTrackV2: Rethinking the Timing Characteristics for Multi-Object Tracking
by Yan Ding, Yuchen Ling, Bozhi Zhang, Jiaxin Li, Lingxi Guo and Zhe Yang
Sensors 2024, 24(18), 6015; https://doi.org/10.3390/s24186015 - 17 Sep 2024
Cited by 1 | Viewed by 2020
Abstract
Multi-object tracking tasks aim to assign unique trajectory codes to targets in video frames. Most detection-based tracking methods use Kalman filtering algorithms for trajectory prediction, directly utilizing associated target features for trajectory updates. However, this approach often fails under camera jitter and transient target loss in real-world scenarios. This paper rethinks state prediction and fusion based on target temporal features to address these issues and proposes the SimpleTrackV2 algorithm, building on the previously designed SimpleTrack. Firstly, to address the poor prediction performance of linear motion models in complex scenes, we designed a target state prediction algorithm called LSTM-MP, based on long short-term memory (LSTM). This algorithm encodes the target's historical motion information using LSTM and decodes it with a multilayer perceptron (MLP) to achieve target state prediction. Secondly, to mitigate the effect of occlusion on target state saliency, we designed a spatiotemporal attention-based target appearance feature fusion (TSA-FF) algorithm for target state fusion. TSA-FF calculates adaptive fusion coefficients to enhance target state fusion, thereby improving the accuracy of subsequent data association. To demonstrate the effectiveness of the proposed method, we compared SimpleTrackV2 with the baseline model SimpleTrack on the MOT17 dataset. We also conducted ablation experiments on TSA-FF and LSTM-MP for SimpleTrackV2, exploring the optimal number of fusion frames and the impact of different loss functions on model performance. The experimental results show that SimpleTrackV2 handles camera jitter and target occlusion better, achieving improvements of 1.6%, 3.2%, and 6.1% in MOTA, IDF1, and HOTA, respectively, compared to the SimpleTrack algorithm. Full article
(This article belongs to the Section Sensing and Imaging)
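A minimal sketch of the LSTM-encoder-plus-MLP-decoder idea behind LSTM-MP, for readers who want to prototype it; the class name, layer sizes, and the 4-D bounding-box state are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical LSTM-MP-style predictor: an LSTM encodes the target's state
# history and an MLP decodes the final hidden state into the next state.
import torch
import torch.nn as nn

class LSTMMotionPredictor(nn.Module):
    def __init__(self, state_dim=4, hidden_dim=64):   # state_dim=4 is an assumed box state
        super().__init__()
        self.encoder = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, history):               # history: (B, T, state_dim)
        _, (h_n, _) = self.encoder(history)   # h_n: (1, B, hidden_dim)
        return self.decoder(h_n[-1])          # predicted next state: (B, state_dim)

# Example: predict the next state from the last 10 observed states of 2 targets.
pred = LSTMMotionPredictor()(torch.randn(2, 10, 4))
```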

27 pages, 3278 KB  
Article
Deep Learning Approach for Human Action Recognition Using a Time Saliency Map Based on Motion Features Considering Camera Movement and Shot in Video Image Sequences
by Abdorreza Alavigharahbagh, Vahid Hajihashemi, José J. M. Machado and João Manuel R. S. Tavares
Information 2023, 14(11), 616; https://doi.org/10.3390/info14110616 - 15 Nov 2023
Cited by 7 | Viewed by 3618
Abstract
In this article, a hierarchical method for action recognition based on temporal and spatial features is proposed. In current human action recognition (HAR) methods, camera movement, sensor movement, sudden scene changes, and scene movement can increase motion feature errors and decrease accuracy. Another important aspect to take into account in a HAR method is the required computational cost. The proposed method provides a preprocessing step to address these challenges. As a preprocessing step, the method uses optical flow to detect camera movements and shots in input video image sequences. In the temporal processing block, the optical flow technique is combined with the absolute value of frame differences to obtain a time saliency map. The detection of shots, cancellation of camera movement, and the building of a time saliency map minimise movement detection errors. The time saliency map is then passed to the spatial processing block to segment the moving persons and/or objects in the scene. Because the search region for spatial processing is limited based on the temporal processing results, the computations in the spatial domain are drastically reduced. In the spatial processing block, the scene foreground is extracted in three steps: silhouette extraction, active contour segmentation, and colour segmentation. Key points are selected at the borders of the segmented foreground. The final features used are the intensity and angle of the optical flow at the detected key points. Using key point features for action detection reduces the computational cost of the classification step and the required training time. Finally, the features are submitted to a Recurrent Neural Network (RNN) to recognise the involved action. The proposed method was tested using four well-known action datasets, KTH, Weizmann, HMDB51, and UCF101, and its efficiency was evaluated. Since the proposed approach segments salient objects based on motion, edges, and colour features, it can be added as a preprocessing step to most current HAR systems to improve performance. Full article
(This article belongs to the Special Issue Computer Vision for Security Applications)
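As a rough illustration of the temporal processing block described above, the sketch below blends dense optical-flow magnitude with absolute frame differences into a normalized time saliency map; the Farneback parameters and the blending weight alpha are assumptions, not values from the paper.

```python
# Hedged sketch: combine optical-flow magnitude and frame differencing
# into a per-frame time saliency map.
import cv2
import numpy as np

def time_saliency(prev_gray, curr_gray, alpha=0.5):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    motion = np.linalg.norm(flow, axis=2)                      # flow magnitude
    diff = cv2.absdiff(curr_gray, prev_gray).astype(np.float32)
    norm = lambda m: cv2.normalize(m, None, 0.0, 1.0, cv2.NORM_MINMAX)
    return alpha * norm(motion) + (1.0 - alpha) * norm(diff)   # values in [0, 1]
```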

21 pages, 4424 KB  
Article
A Salient Object Detection Method Based on Boundary Enhancement
by Falin Wen, Qinghui Wang, Ruirui Zou, Ying Wang, Fenglin Liu, Yang Chen, Linghao Yu, Shaoyi Du and Chengzhi Yuan
Sensors 2023, 23(16), 7077; https://doi.org/10.3390/s23167077 - 10 Aug 2023
Viewed by 2174
Abstract
Visual saliency refers to the human ability to quickly focus on important parts of the visual field, which is a crucial aspect of image processing, particularly in fields like medical imaging and robotics. Understanding and simulating this mechanism is crucial for solving complex visual problems. In this paper, we propose a salient object detection method based on boundary enhancement, which is applicable to both 2D and 3D sensor data. To address the problem of large-scale variation of salient objects, our method introduces a multi-level feature aggregation module that enhances the expressive ability of fixed-resolution features by utilizing adjacent features to complement each other. Additionally, we propose a multi-scale information extraction module to capture local contextual information at different scales for back-propagated level-by-level features, which allows for better measurement of the composition of the feature map after back-fusion. To tackle the low confidence issue of boundary pixels, we also introduce a boundary extraction module to extract the boundary information of salient regions. This information is then fused with salient target information to further refine the saliency prediction results. During the training process, our method uses a mixed loss function to constrain model training at two levels: pixels and images. The experimental results demonstrate that our boundary-enhanced salient object detection method performs well on targets of different scales, multiple targets, linear targets, and targets in complex scenes. We compare our method with the best-performing method on four conventional datasets and achieve an average improvement of 6.2% in the mean absolute error (MAE) indicator. Overall, our approach shows promise for improving the accuracy and efficiency of salient object detection in a variety of settings, including those involving 2D/3D semantic analysis and reconstruction/inpainting of image/video/point cloud data. Full article
(This article belongs to the Special Issue Machine Learning Based 2D/3D Sensors Data Understanding and Analysis)
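The abstract does not spell out the two-level "mixed loss"; a common pairing, sketched below under that assumption, is a pixel-level binary cross-entropy term plus an image-level soft-IoU term.

```python
# Assumed mixed loss: pixel-wise BCE + image-level soft IoU on (B, 1, H, W) maps.
import torch
import torch.nn.functional as F

def mixed_loss(logits, target, eps=1e-6):
    bce = F.binary_cross_entropy_with_logits(logits, target)     # pixel level
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    union = (prob + target - prob * target).sum(dim=(1, 2, 3))
    iou_loss = 1.0 - (inter + eps) / (union + eps)               # image level
    return bce + iou_loss.mean()
```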

22 pages, 2661 KB  
Article
Human Action Representation Learning Using an Attention-Driven Residual 3DCNN Network
by Hayat Ullah and Arslan Munir
Algorithms 2023, 16(8), 369; https://doi.org/10.3390/a16080369 - 31 Jul 2023
Cited by 10 | Viewed by 2554
Abstract
The recognition of human activities using vision-based techniques has become a crucial research field in video analytics. Over the last decade, there have been numerous advancements in deep learning algorithms aimed at accurately detecting complex human actions in video streams. While these algorithms have demonstrated impressive performance in activity recognition, they often exhibit a bias towards either model performance or computational efficiency. This biased trade-off between robustness and efficiency poses challenges when addressing complex human activity recognition problems. To address this issue, this paper presents a computationally efficient yet robust approach, exploiting saliency-aware spatial and temporal features for human action recognition in videos. To achieve effective representation of human actions, we propose an efficient approach called the dual-attentional Residual 3D Convolutional Neural Network (DA-R3DCNN). Our proposed method utilizes a unified channel-spatial attention mechanism, allowing it to efficiently extract significant human-centric features from video frames. By combining dual channel-spatial attention layers with residual 3D convolution layers, the network becomes more discerning in capturing spatial receptive fields containing objects within the feature maps. To assess the effectiveness and robustness of our proposed method, we have conducted extensive experiments on four well-established benchmark datasets for human action recognition. The quantitative results obtained validate the efficiency of our method, showcasing significant improvements in accuracy of up to 11% as compared to state-of-the-art human action recognition methods. Additionally, our evaluation of inference time reveals that the proposed method achieves up to a 74× improvement in frames per second (FPS) compared to existing approaches, thus showing the suitability and effectiveness of the proposed DA-R3DCNN for real-time human activity recognition. Full article
(This article belongs to the Special Issue Algorithms for Image Processing and Machine Vision)
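A generic channel-spatial attention block for 3D-CNN feature maps, in the spirit of the dual attention described above; the reduction ratio, kernel size, and ordering of the two attention steps are assumptions rather than the DA-R3DCNN definition.

```python
# Illustrative unified channel-spatial attention over (B, C, T, H, W) features.
import torch
import torch.nn as nn

class ChannelSpatialAttention3D(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(                      # channel re-weighting
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv3d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(                      # spatial re-weighting
            nn.Conv3d(1, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel(x)
        return x * self.spatial(x.mean(dim=1, keepdim=True))
```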

20 pages, 2466 KB  
Article
SDebrisNet: A Spatial–Temporal Saliency Network for Space Debris Detection
by Jiang Tao, Yunfeng Cao and Meng Ding
Appl. Sci. 2023, 13(8), 4955; https://doi.org/10.3390/app13084955 - 14 Apr 2023
Cited by 19 | Viewed by 4505
Abstract
The rapidly growing number of space activities is generating a large amount of space debris, which greatly threatens the safety of space operations. Therefore, space-based space debris surveillance is crucial for the early avoidance of spacecraft emergencies. With the progress in computer vision technology, space debris detection using optical sensors has become a promising solution. However, detecting space debris at long ranges is challenging due to its limited imaging size and unknown movement characteristics. In this paper, we propose a space debris saliency detection algorithm called SDebrisNet. The algorithm utilizes a convolutional neural network (CNN) to take into account both spatial and temporal data from sequential video images, which helps in detecting small, moving space debris. Firstly, taking into account the limited resources of space-based computational platforms, a MobileNet-based space debris feature extraction structure was constructed to make the overall model more lightweight. In particular, an enhanced spatial feature module is introduced to strengthen the spatial details of small objects. Secondly, based on attention mechanisms, a constrained self-attention (CSA) module is applied to learn the spatiotemporal data from the sequential images. Finally, a space debris dataset was constructed for algorithm evaluation. The experimental results demonstrate that the method proposed in this paper is robust for detecting moving space debris with a low signal-to-noise ratio in the video. Compared to the NODAMI method, SDebrisNet shows improvements of 3.5% and 1.7% in terms of detection probability and the false alarm rate, respectively. Full article
(This article belongs to the Special Issue Vision-Based Autonomous Unmanned Systems: Challenges and Approaches)
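For the lightweight, MobileNet-based feature extraction mentioned above, a stand-in backbone can be taken from torchvision as below; this is not the paper's SDebrisNet trunk, and the frame size is arbitrary.

```python
# Assumed stand-in: MobileNetV3-Small convolutional trunk as a frame encoder.
import torch
from torchvision.models import mobilenet_v3_small

backbone = mobilenet_v3_small(weights=None).features   # lightweight conv layers only
frames = torch.randn(4, 3, 256, 256)                   # a short stack of video frames
feats = backbone(frames)                               # -> (4, 576, 8, 8) feature maps
```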

17 pages, 5104 KB  
Article
Video Saliency Object Detection with Motion Quality Compensation
by Hengsen Wang, Chenglizhao Chen, Linfeng Li and Chong Peng
Electronics 2023, 12(7), 1618; https://doi.org/10.3390/electronics12071618 - 30 Mar 2023
Cited by 2 | Viewed by 2615
Abstract
Video saliency object detection is one of the classic research problems in computer vision, yet existing works rarely focus on the impact of input quality on model performance. As optical flow is a key input for video saliency detection models, its quality significantly affects model performance. Traditional optical flow models only calculate the optical flow between two consecutive video frames, ignoring the motion state of objects over a period of time, leading to low-quality optical flow and reduced performance of video saliency object detection models. Therefore, this paper proposes a new optical flow model that improves the quality of optical flow by expanding the flow perception range and uses the resulting high-quality optical flow to enhance the performance of video saliency object detection models. Experimental results show that the proposed optical flow model can significantly improve optical flow quality, with the S-M values on the DAVSOD dataset increasing by about 39%, 49%, and 44% compared to optical flow models such as PWCNet, SpyNet, and LFNet. In addition, experiments fine-tuning the benchmark model LIMS demonstrate that improving input quality can further improve model performance. Full article
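The paper's optical flow model itself is not described here; purely as a naive illustration of widening the "flow perception range" beyond two consecutive frames, the sketch below averages Farneback flow over the last few frame pairs.

```python
# Naive stand-in (not the paper's model): temporally smoothed optical flow.
import cv2
import numpy as np

def widened_flow(gray_frames, K=4):
    """gray_frames: grayscale frames, oldest first; uses the last K+1 frames."""
    recent = gray_frames[-(K + 1):]
    flows = [cv2.calcOpticalFlowFarneback(a, b, None, 0.5, 3, 15, 3, 5, 1.2, 0)
             for a, b in zip(recent[:-1], recent[1:])]
    return np.mean(flows, axis=0)          # averaged (H, W, 2) flow field
```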

20 pages, 4998 KB  
Article
Quality-Driven Dual-Branch Feature Integration Network for Video Salient Object Detection
by Xiaofei Zhou, Hanxiao Gao, Longxuan Yu, Defu Yang and Jiyong Zhang
Electronics 2023, 12(3), 680; https://doi.org/10.3390/electronics12030680 - 29 Jan 2023
Cited by 4 | Viewed by 2137
Abstract
Video salient object detection has attracted growing interest in recent years. However, some existing video saliency models often suffer from the inappropriate utilization of spatial and temporal cues and the insufficient aggregation of different-level features, leading to remarkable performance degradation. Therefore, we propose a quality-driven dual-branch feature integration network focusing on the adaptive fusion of multi-modal cues and the sufficient aggregation of multi-level spatiotemporal features. Firstly, we employ the quality-driven multi-modal feature fusion (QMFF) module to combine the spatial and temporal features. Particularly, the quality scores estimated from each level's spatial and temporal cues are not only used to weigh the two modal features but also to adaptively integrate the coarse spatial and temporal saliency predictions into the guidance map, which further enhances the two modal features. Secondly, we deploy the dual-branch-based multi-level feature aggregation (DMFA) module to integrate multi-level spatiotemporal features, where the two branches, a progressive decoder branch and a direct concatenation branch, sufficiently explore the cooperation of multi-level spatiotemporal features. In particular, in order to provide an adaptive fusion for the outputs of the two branches, we design the dual-branch fusion (DF) unit, where the channel weight of each output can be learned jointly from the two outputs. The experiments conducted on four video datasets clearly demonstrate the effectiveness and superiority of our model against state-of-the-art video saliency models. Full article
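A hedged sketch of the quality-driven fusion idea: per-sample quality scores weight the spatial and temporal features and their coarse saliency maps. The scoring head, softmax normalization, and tensor shapes are assumptions, not the paper's QMFF module.

```python
# Assumed quality-driven fusion of spatial/temporal features (B, C, H, W)
# and their coarse saliency maps (B, 1, H, W).
import torch
import torch.nn as nn

class QualityFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(channels, 1))   # one quality score per sample

    def forward(self, f_sp, f_tm, sal_sp, sal_tm):
        q = torch.softmax(torch.cat([self.score(f_sp), self.score(f_tm)], dim=1), dim=1)
        q_sp, q_tm = q[:, :1, None, None], q[:, 1:, None, None]
        fused = q_sp * f_sp + q_tm * f_tm              # quality-weighted features
        guidance = q_sp * sal_sp + q_tm * sal_tm       # quality-weighted coarse maps
        return fused, guidance
```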

16 pages, 851 KB  
Article
Human-like Attention-Driven Saliency Object Estimation in Dynamic Driving Scenes
by Lisheng Jin, Bingdong Ji and Baicang Guo
Machines 2022, 10(12), 1172; https://doi.org/10.3390/machines10121172 - 7 Dec 2022
Cited by 1 | Viewed by 2243
Abstract
Identifying a notable object and predicting its importance in front of a vehicle are crucial for automated systems’ risk assessment and decision making. However, current research has rarely exploited the driver’s attentional characteristics. In this study, we propose an attention-driven saliency object estimation (SOE) method that uses the attention intensity of the driver as a criterion for determining the salience and importance of objects. First, we design a driver attention prediction (DAP) network with a 2D-3D mixed convolution encoder–decoder structure. Second, we fuse the DAP network with faster R-CNN and YOLOv4 at the feature level and name them SOE-F and SOE-Y, respectively, using a shared-bottom multi-task learning (MTL) architecture. By transferring the spatial features onto the time axis, we are able to eliminate the drawback of the bottom features being extracted repeatedly and achieve a uniform image-video input in SOE-F and SOE-Y. Finally, the parameters in SOE-F and SOE-Y are classified into two categories, domain invariant and domain adaptive, and then the domain-adaptive parameters are trained and optimized. The experimental results on the DADA-2000 dataset demonstrate that the proposed method outperforms the state-of-the-art methods in several evaluation metrics and can more accurately predict driver attention. In addition, driven by a human-like attention mechanism, SOE-F and SOE-Y can identify and detect the salience, category, and location of objects, providing risk assessment and a decision basis for autonomous driving systems. Full article
(This article belongs to the Section Vehicle Engineering)

18 pages, 3040 KB  
Article
OTNet: A Small Object Detection Algorithm for Video Inspired by Avian Visual System
by Pingge Hu, Xingtong Wang, Xiaoteng Zhang, Yueyang Cang and Li Shi
Mathematics 2022, 10(21), 4125; https://doi.org/10.3390/math10214125 - 4 Nov 2022
Cited by 3 | Viewed by 3144
Abstract
Small object detection is one of the most challenging and non-negligible fields in computer vision. Inspired by the location–focus–identification process of the avian visual system, we present our location-focused small-object-detection algorithm for video or image sequences, OTNet. The model contains three modules, corresponding to the forms of saliency that drive the strongest response of the OT, which together calculate the saliency map. The three modules are responsible for temporal–spatial feature extraction, spatial feature extraction and memory matching, respectively. We tested our model on the AU-AIR dataset and achieved up to a 97.95% recall rate, 85.73% precision rate and 89.94 F1 score with lower computational complexity. Our model is also able to work as a plugin module for other object detection models to improve their performance on bird's-eye-view images, especially for detecting smaller objects. We managed to improve the detection performance by up to 40.01%. The results show that our model performs well on common detection metrics while simulating the avian brain's visual information processing for object localization. Full article
(This article belongs to the Special Issue Mathematical Method and Application of Machine Learning)

17 pages, 4937 KB  
Article
Infrared Bird Target Detection Based on Temporal Variation Filtering and a Gaussian Heat-Map Perception Network
by Fan Zhao, Renjie Wei, Yu Chao, Sidi Shao and Cuining Jing
Appl. Sci. 2022, 12(11), 5679; https://doi.org/10.3390/app12115679 - 2 Jun 2022
Cited by 8 | Viewed by 2883
Abstract
Flying bird detection has recently attracted increasing attention in computer vision. However, compared to conventional object detection tasks, it is much more challenging to detect flying birds in infrared videos due to small target size, complex backgrounds, and dim shapes. In order to solve the problem of poor detection performance caused by insufficient feature information of small and dim birds, this paper proposes a method for detecting birds in outdoor environments that combines image pre-processing and deep learning, namely temporal variation filtering (TVF) and a Gaussian heat-map perception network (GHPNet), respectively. TVF separates the dynamic background from moving creatures. Because a bird's appearance is brightest at the center and gradually darker outwards, a size-adaptive Gaussian kernel is used to generate the ground truth of the region of interest (ROI). In order to fuse features from different scales and to highlight the saliency of the target, GHPNet integrates VGG-16 and a maximum-no-pooling filter into a U-Net architecture. The comparative experiments demonstrate that the proposed method outperforms state-of-the-art methods in detecting bird targets in real-world infrared images. Full article
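The size-adaptive Gaussian ground truth described above can be prototyped as below; tying the spread sigma to one third of the target size is an assumption.

```python
# Size-adaptive Gaussian heat map peaking at the target centre (cx, cy).
import numpy as np

def gaussian_heatmap(h, w, cx, cy, target_size):
    sigma = max(target_size / 3.0, 1.0)     # assumed size-to-sigma rule
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

roi_gt = gaussian_heatmap(128, 128, cx=40, cy=70, target_size=9)   # values in (0, 1]
```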

19 pages, 4031 KB  
Article
Saliency Detection with Moving Camera via Background Model Completion
by Yu-Pei Zhang and Kwok-Leung Chan
Sensors 2021, 21(24), 8374; https://doi.org/10.3390/s21248374 - 15 Dec 2021
Cited by 2 | Viewed by 3023
Abstract
Detecting saliency in videos is a fundamental step in many computer vision systems. Saliency here refers to the significant target(s) in the video. The object of interest is further analyzed for high-level applications. Saliency can be segregated from the background if the two exhibit different visual cues. Therefore, saliency detection is often formulated as background subtraction. However, saliency detection is challenging. For instance, a dynamic background can result in false positive errors, while camouflage will result in false negative errors. With moving cameras, the captured scenes are even more complicated to handle. We propose a new framework, called saliency detection via background model completion (SD-BMC), that comprises a background modeler and a deep learning background/foreground segmentation network. The background modeler generates an initial clean background image from a short image sequence. Based on the idea of video completion, a good background frame can be synthesized even in the co-existence of a changing background and moving objects. We adopt a background/foreground segmenter that was pre-trained on a specific video dataset; it can also detect saliency in unseen videos. The background modeler can adjust the background image dynamically when the background/foreground segmenter output deteriorates while processing a long video. To the best of our knowledge, our framework is the first to adopt video completion for background modeling and saliency detection in videos captured by moving cameras. The F-measure results, obtained from the pan-tilt-zoom (PTZ) videos, show that our proposed framework outperforms some deep learning-based background subtraction models by 11% or more. On more challenging videos, our framework also outperforms many high-ranking background subtraction methods by more than 3%. Full article
(This article belongs to the Section Sensing and Imaging)

18 pages, 9636 KB  
Article
Video Desnowing and Deraining via Saliency and Dual Adaptive Spatiotemporal Filtering
by Yongji Li, Rui Wu, Zhenhong Jia, Jie Yang and Nikola Kasabov
Sensors 2021, 21(22), 7610; https://doi.org/10.3390/s21227610 - 16 Nov 2021
Cited by 5 | Viewed by 2812
Abstract
Outdoor vision sensing systems often struggle with poor weather conditions, such as snow and rain, which pose a great challenge to existing video desnowing and deraining methods. In this paper, we propose a novel video desnowing and deraining model that utilizes the saliency information of moving objects to address this problem. First, we remove the snow and rain from the video by low-rank tensor decomposition, which makes full use of the spatial location information and the correlation between the three channels of the color video. Second, because existing algorithms often regard sparse snowflakes and rain streaks as moving objects, this paper injects saliency information into moving object detection, which reduces the false alarms and missed alarms of moving objects. At the same time, feature point matching is used to mine the redundant information of moving objects in continuous frames, and we propose a dual adaptive minimum filtering algorithm in the spatiotemporal domain to remove snow and rain in front of moving objects. Both qualitative and quantitative experimental results show that the proposed algorithm is more competitive than other state-of-the-art snow and rain removal methods. Full article
(This article belongs to the Section Sensing and Imaging)

15 pages, 2472 KB  
Article
Compressed Video Quality Index Based on Saliency-Aware Artifact Detection
by Liqun Lin, Jing Yang, Zheng Wang, Liping Zhou, Weiling Chen and Yiwen Xu
Sensors 2021, 21(19), 6429; https://doi.org/10.3390/s21196429 - 26 Sep 2021
Cited by 7 | Viewed by 7350
Abstract
Video coding technology reduces the storage and transmission bandwidth required by video services by lowering the bitrate of the video stream. However, the compressed video signals may involve perceivable information loss, especially when the video is overcompressed. In such cases, viewers can observe visually annoying artifacts, namely Perceivable Encoding Artifacts (PEAs), which degrade their perceived video quality. To monitor and measure these PEAs (including blurring, blocking, ringing and color bleeding), we propose an objective video quality metric named Saliency-Aware Artifact Measurement (SAAM) without any reference information. The SAAM metric first introduces video saliency detection to extract regions of interest and further splits these regions into a finite number of image patches. For each image patch, a data-driven model is utilized to evaluate the intensities of PEAs. Finally, these intensities are fused into an overall metric using Support Vector Regression (SVR). In the experiments, we compared the SAAM metric with other popular video quality metrics on four publicly available databases: LIVE, CSIQ, IVP and FERIT-RTRK. The results reveal the promising quality prediction performance of the SAAM metric, which is superior to most popular compressed video quality evaluation models. Full article
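The final fusion step, regressing an overall quality index from PEA intensities with Support Vector Regression, could be prototyped as below; the four-value feature layout, RBF kernel, and placeholder training data are assumptions, and the per-patch artifact detectors are omitted.

```python
# Assumed SVR fusion of averaged artifact intensities into a quality score.
import numpy as np
from sklearn.svm import SVR

# One row per video: mean intensities of [blurring, blocking, ringing, color bleeding].
X_train = np.random.rand(50, 4)          # placeholder features
y_train = np.random.rand(50)             # placeholder subjective quality scores

model = SVR(kernel="rbf", C=1.0).fit(X_train, y_train)
quality = model.predict(np.random.rand(1, 4))   # predicted index for a new video
```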

23 pages, 28729 KB  
Article
A Biologically Motivated, Proto-Object-Based Audiovisual Saliency Model
by Sudarshan Ramenahalli
AI 2020, 1(4), 487-509; https://doi.org/10.3390/ai1040030 - 3 Nov 2020
Cited by 3 | Viewed by 4511
Abstract
The natural environment and our interaction with it are essentially multisensory, where we may deploy visual, tactile and/or auditory senses to perceive, learn and interact with our environment. Our objective in this study is to develop a scene analysis algorithm using multisensory information, specifically vision and audio. We develop a proto-object-based audiovisual saliency map (AVSM) for the analysis of dynamic natural scenes. A specialized audiovisual camera with a 360° field of view, capable of locating sound direction, is used to collect spatiotemporally aligned audiovisual data. We demonstrate that the performance of a proto-object-based audiovisual saliency map in detecting and localizing salient objects/events is in agreement with human judgment. In addition, the proto-object-based AVSM, which we compute as a linear combination of visual and auditory feature conspicuity maps, captures a higher number of valid salient events compared to unisensory saliency maps. Such an algorithm can be useful in surveillance, robotic navigation, video compression and related applications. Full article
(This article belongs to the Special Issue Frontiers in Artificial Intelligence)
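The combination step described above is literally a weighted linear mix of visual and auditory conspicuity maps; the sketch below normalizes both maps first, and the weights are arbitrary assumptions.

```python
# Linear audiovisual saliency combination with assumed weights.
import numpy as np

def audiovisual_saliency(visual_map, auditory_map, w_v=0.6, w_a=0.4):
    norm = lambda m: (m - m.min()) / (m.max() - m.min() + 1e-8)
    return w_v * norm(visual_map) + w_a * norm(auditory_map)
```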

23 pages, 98116 KB  
Article
Motion Saliency Detection for Surveillance Systems Using Streaming Dynamic Mode Decomposition
by Thien-Thu Ngo, VanDung Nguyen, Xuan-Qui Pham, Md-Alamgir Hossain and Eui-Nam Huh
Symmetry 2020, 12(9), 1397; https://doi.org/10.3390/sym12091397 - 21 Aug 2020
Cited by 6 | Viewed by 2852
Abstract
Intelligent surveillance systems enable secured visibility features in the smart city era. One of the major models for pre-processing in intelligent surveillance systems is known as saliency detection, which provides facilities for multiple tasks such as object detection, object segmentation, video coding, image re-targeting, image-quality assessment, and image compression. Traditional models focus on improving detection accuracy at the cost of high complexity. However, these models are computationally expensive for real-world systems. To cope with this issue, we propose a fast motion-saliency detection method for surveillance systems under various background conditions. Our method is derived from streaming dynamic mode decomposition (s-DMD), which is a powerful tool in data science. First, s-DMD computes a set of modes in a streaming manner to derive spatial–temporal features, and a raw saliency map is generated from the sparse reconstruction process. Second, the final saliency map is refined using a difference-of-Gaussians filter in the frequency domain. The effectiveness of the proposed method is validated on a standard benchmark dataset. The experimental results show that the proposed method achieves competitive accuracy with lower complexity than state-of-the-art methods, which satisfies the requirements of real-time applications. Full article
(This article belongs to the Section Computer)
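The difference-of-Gaussians refinement in the frequency domain can be prototyped with SciPy as below; the two sigmas are assumptions, and the streaming-DMD stage that produces the raw saliency map is not shown.

```python
# Band-pass (difference-of-Gaussians) refinement applied in the Fourier domain.
import numpy as np
from scipy.ndimage import fourier_gaussian

def dog_refine(raw_saliency, sigma_small=2.0, sigma_large=8.0):
    F = np.fft.fft2(raw_saliency)
    band = fourier_gaussian(F, sigma_small) - fourier_gaussian(F, sigma_large)
    return np.real(np.fft.ifft2(band))     # refined saliency map
```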
