
Applications of Video Processing and Computer Vision Sensor

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (15 May 2021) | Viewed by 25360

Special Issue Editors


Dr. Yong Ju Jung
Guest Editor
School of Computing, Gachon University, Seongnam, Republic of Korea
Interests: image processing; computer vision

Prof. Joohyung Lee
Guest Editor
School of Computing, Gachon University, Seongnam, Republic of Korea
Interests: video streaming; edge intelligence; IoT system

Prof. Dr. Giorgio Fumera
Guest Editor
Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, 09123 Cagliari, Italy
Interests: pattern recognition; image processing; intelligent video surveillance

Dr. S. H. Shah Newaz
Guest Editor
School of Computing and Informatics, Universiti Teknologi Brunei (UTB), Brunei
Interests: edge computing; Internet of Things; green networking

Special Issue Information

Dear Colleagues,

With the recent proliferation of deep learning technology and the democratization of AI through edge computing devices, AI-powered computer vision sensors have attracted great interest in both academia and industry. Accordingly, we have witnessed a recent boom in AI-based vision applications (e.g., video surveillance and self-driving cars) across many fields. The goal of this Special Issue is to invite researchers who tackle important and challenging issues in AI-based vision applications involving video processing and computer vision sensors. In particular, this Special Issue aims to present recent progress in AI-enabled computational photography and machine vision, as well as smart camera systems and applications for the intelligent edge. Topics of interest include, but are not limited to:

1. Deep Learning-Based Computational Photography:

  • Image/video manipulation (camera ISP, inpainting, relighting, super-resolution, deblurring, dehazing, artifact removal, etc.);
  • Image-to-image translation;
  • Video-to-video translation;
  • Image/video restoration and enhancement on mobile devices;
  • Image fusion for single- and multi-camera systems;
  • Hyperspectral imaging;
  • Depth estimation;

2. AI-Enabled Machine Vision:

  • Object detection and real-time tracking;
  • Anomaly detection;
  • Crowd monitoring and crowd behaviour analysis;
  • Face detection, recognition, and modeling;
  • Human activity recognition;
  • Emotion recognition;

3. Smart Camera Systems and Applications for Intelligent Edge:

  • Resource management in edge devices for video surveillance;
  • Architecture and efficient operation procedures for edge devices to support video surveillance;
  • Intelligent edge system and protocol design for video surveillance;
  • Deep learning model optimization for the intelligent edge;
  • Input filtering for object detection and tracking on the intelligent edge.

Dr. Yong Ju Jung
Prof. Joohyung Lee
Prof. Dr. Giorgio Fumera
Dr. S. H. Shah Newaz
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Computational photography
  • Image/video understanding and recognition
  • Deep learning algorithms
  • AI-enabled machine vision
  • Smart camera systems
  • Intelligent edge
  • Edge computing devices

Published Papers (7 papers)


Research

19 pages, 1439 KiB  
Article
A Hierarchy-Based System for Recognizing Customer Activity in Retail Environments
by Jiahao Wen, Luis Guillen, Toru Abe and Takuo Suganuma
Sensors 2021, 21(14), 4712; https://doi.org/10.3390/s21144712 - 9 Jul 2021
Cited by 3 | Viewed by 2148
Abstract
Customer activity (CA) in retail environments, which spans the various shopper situations that arise in store spaces, provides valuable information for store management and marketing planning. Several systems have been proposed for customer activity recognition (CAR) from in-store camera videos, and most of them use machine learning-based end-to-end (E2E) CAR models due to their remarkable performance. Usually, such E2E models are trained for target conditions (i.e., particular CA types in specific store spaces). Accordingly, the existing systems cannot easily adapt to changes in target conditions, because they require entire retraining of their specialized E2E models and the concurrent use of additional E2E models for new target conditions. This paper proposes a novel CAR system based on a hierarchy that organizes CA types into different levels of abstraction, from lowest to highest. The proposed system consists of multiple CAR models, each of which performs the CAR tasks belonging to a certain level of the hierarchy on the lower level's output, and thus conducts CAR for videos through the models level by level. Since these models are separate, the system can deal efficiently with changes in target conditions by modifying individual models. Experimental results show the effectiveness of the proposed system in adapting to different target conditions. Full article
(This article belongs to the Special Issue Applications of Video Processing and Computer Vision Sensor)
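As a minimal illustration of the hierarchy idea described in the abstract, the sketch below chains per-level CAR models so that a single level can be retrained or swapped without touching the others. All class names and the placeholder recognizers are hypothetical, not the authors' implementation.

```python
# Minimal sketch: each level's recognizer consumes the output of the level
# below, so one level can be replaced individually. Names are hypothetical.

class LevelModel:
    """One CAR model handling a single abstraction level of the hierarchy."""
    def __init__(self, name, recognize_fn):
        self.name = name
        self.recognize_fn = recognize_fn   # stand-in for a trained model

    def recognize(self, lower_level_output):
        return self.recognize_fn(lower_level_output)

class HierarchicalCAR:
    """Chains level models from lowest (raw video) to highest abstraction."""
    def __init__(self, levels):
        self.levels = list(levels)         # ordered low -> high

    def replace_level(self, index, new_model):
        # Adapting to a new target condition touches only one model.
        self.levels[index] = new_model

    def run(self, video_frames):
        output = video_frames
        for level in self.levels:
            output = level.recognize(output)
        return output

# Example wiring with placeholder recognizers:
pipeline = HierarchicalCAR([
    LevelModel("person detection", lambda frames: ["person boxes"]),
    LevelModel("primitive actions", lambda boxes: ["walking", "reaching"]),
    LevelModel("customer activity", lambda actions: "comparing products"),
])
print(pipeline.run(["frame0", "frame1"]))
```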

17 pages, 5959 KiB  
Article
AdaMM: Adaptive Object Movement and Motion Tracking in Hierarchical Edge Computing System
by Jingyeom Kim, Joohyung Lee and Taeyeon Kim
Sensors 2021, 21(12), 4089; https://doi.org/10.3390/s21124089 - 14 Jun 2021
Cited by 4 | Viewed by 2559
Abstract
This paper presents a novel adaptive object movement and motion tracking (AdaMM) framework for a hierarchical edge computing system, aimed at reducing the GPU memory footprint of deep learning (DL)-based video surveillance services. DL-based object movement and motion tracking requires a significant amount of resources, such as (1) GPU processing power for the inference phase and (2) GPU memory for model loading. Even when no object appears in the video, if the DL model is loaded, GPU memory must remain allocated to it. Moreover, in many cases, video surveillance tries to capture events that occur only rarely (e.g., abnormal object behaviors), so such standby GPU memory is easily wasted. To alleviate this problem, the proposed AdaMM framework categorizes the tasks used in the object movement and motion tracking procedure, in increasing order of required processing and memory resources, as task (1) frame difference calculation, task (2) object detection, and task (3) object motion and movement tracking. The framework adaptively releases the unnecessary standby object motion and movement tracking model to save GPU memory, utilizing the lighter tasks, such as frame difference calculation and object detection, in a hierarchical manner. Consequently, object movement and motion tracking is triggered only if an object is detected within the specified threshold time; otherwise, the GPU memory for the task (3) model can be released. Likewise, object detection is performed only if the frame difference over time exceeds the specified threshold. We implemented the proposed AdaMM framework on commercial edge devices in a three-tier system, with the first edge node handling tasks (1) and (2), the second edge node handling task (3), and the cloud sending a push alarm. A measurement-based experiment reveals that the proposed framework achieves a maximum GPU memory reduction of 76.8% compared to the baseline system, at the cost of a 2680 ms delay for loading the object movement and motion tracking model. Full article
(This article belongs to the Special Issue Applications of Video Processing and Computer Vision Sensor)
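The gating logic in the abstract can be sketched as follows: the cheap frame difference (task 1) triggers detection (task 2), a detection triggers loading of the heavy tracking model (task 3), and the tracking model is released after an idle period. The thresholds and the detector/tracker stubs below are placeholder assumptions, not values from the paper.

```python
import time
import numpy as np

DIFF_THRESHOLD = 12.0    # mean absolute pixel difference (assumed value)
RELEASE_AFTER_S = 30.0   # idle time before freeing GPU memory (assumed value)

tracker = None           # heavy DL model, loaded only on demand
last_detection_t = 0.0
prev_frame = None

def detect_objects(frame):
    return []            # stand-in for a real detector (task 2)

def load_tracker():
    return object()      # stand-in for loading the task-3 model onto the GPU

def process(frame):
    """Process one video frame through the hierarchical gating chain."""
    global prev_frame, tracker, last_detection_t
    if prev_frame is not None:
        diff = np.mean(np.abs(frame.astype(np.float32) - prev_frame))
        if diff > DIFF_THRESHOLD:                    # task (1) gates task (2)
            if detect_objects(frame):                # task (2) gates task (3)
                last_detection_t = time.time()
                if tracker is None:
                    tracker = load_tracker()
    # Release the standby tracking model after an idle period.
    if tracker is not None and time.time() - last_detection_t > RELEASE_AFTER_S:
        tracker = None                               # frees standby GPU memory
    prev_frame = frame
```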

20 pages, 3165 KiB  
Article
Deep-Framework: A Distributed, Scalable, and Edge-Oriented Framework for Real-Time Analysis of Video Streams
by Alessandro Sassu, Jose Francisco Saenz-Cogollo and Maurizio Agelli
Sensors 2021, 21(12), 4045; https://doi.org/10.3390/s21124045 - 11 Jun 2021
Cited by 3 | Viewed by 3857
Abstract
Edge computing is the best approach for meeting the exponential demand and the real-time requirements of many video analytics applications. Since most recent advances in extracting information from images and video rely on computation-heavy deep learning algorithms, there is a growing need for solutions that allow the deployment and use of new models on scalable and flexible edge architectures. In this work, we present Deep-Framework, a novel open-source framework for developing edge-oriented real-time video analytics applications based on deep learning. Deep-Framework has a scalable multi-stream architecture based on Docker and abstracts away from the user the complexity of cluster configuration, service orchestration, and GPU resource allocation. It provides Python interfaces for integrating deep learning models developed with the most popular frameworks, as well as high-level APIs based on standard HTTP and WebRTC interfaces for consuming the extracted video data on clients running in browsers or on any other web-based platform. Full article
(This article belongs to the Special Issue Applications of Video Processing and Computer Vision Sensor)
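As a rough idea of the consumption pattern described above (a web client pulling extracted analytics over plain HTTP), the sketch below polls a hypothetical endpoint. The host, port, and `/results` path are assumptions for illustration; the actual endpoints should be taken from the Deep-Framework documentation.

```python
import json
import urllib.request

# Minimal sketch of an HTTP client consuming extracted video data.
# Base URL and path are hypothetical, not the framework's real API.

def fetch_latest_results(base_url="http://edge-node:8000"):
    with urllib.request.urlopen(f"{base_url}/results") as resp:
        return json.load(resp)   # e.g., per-stream detections as JSON

if __name__ == "__main__":
    results = fetch_latest_results()
    for stream_id, detections in results.items():
        print(stream_id, detections)
```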

21 pages, 5771 KiB  
Article
Overcome the Brightness and Jitter Noises in Video Inter-Frame Tampering Detection
by Han Pu, Tianqiang Huang, Bin Weng, Feng Ye and Chenbin Zhao
Sensors 2021, 21(12), 3953; https://doi.org/10.3390/s21123953 - 8 Jun 2021
Cited by 5 | Viewed by 2181
Abstract
Digital video forensics plays a vital role in judicial forensics, media reporting, e-commerce, finance, and public security. Although many methods have been developed, there is currently no efficient solution for real-life videos with illumination noise and jitter noise. To solve this issue, we propose a detection method for video inter-frame forgery that adapts to brightness changes and jitter. For videos with severe brightness changes, we relax the brightness constancy constraint and adopt intensity normalization to propose a new optical flow algorithm. For videos with large jitter noise, we introduce motion entropy to detect the jitter and extract the stable texture-change-fraction feature for double-checking. Experimental results on public benchmark datasets show that, compared with previous algorithms, the proposed method is more accurate and robust for videos with significant brightness variation or heavy jitter. Full article
(This article belongs to the Special Issue Applications of Video Processing and Computer Vision Sensor)
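The intensity-normalization idea can be sketched as follows: consecutive frames are normalized to a common brightness before optical flow is computed, so illumination changes are not mistaken for tampering. The mean/std normalization and OpenCV's Farneback flow below are generic stand-ins for the paper's custom optical flow algorithm, and the target statistics are assumed values.

```python
import cv2
import numpy as np

def normalize_intensity(gray, target_mean=128.0, target_std=40.0):
    """Map a grayscale frame to a common brightness distribution."""
    g = gray.astype(np.float32)
    g = (g - g.mean()) / (g.std() + 1e-6) * target_std + target_mean
    return np.clip(g, 0, 255).astype(np.uint8)

def flow_between(frame_a, frame_b):
    """Dense optical flow between brightness-equalized consecutive frames."""
    a = normalize_intensity(cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY))
    b = normalize_intensity(cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY))
    return cv2.calcOpticalFlowFarneback(a, b, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
```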

19 pages, 3857 KiB  
Article
Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation
by Gayoung Jung, Jonghun Lee and Incheol Kim
Sensors 2021, 21(9), 3164; https://doi.org/10.3390/s21093164 - 2 May 2021
Cited by 6 | Viewed by 2478
Abstract
Video scene graph generation (VidSGG), the creation of video scene graphs that support deeper and better visual scene understanding, is a challenging task. Segment-based and sliding-window-based methods have been proposed to perform this task, but all have certain limitations. This study proposes a novel deep neural network model called VSGG-Net for video scene graph generation. The model uses a sliding-window scheme to detect object tracklets of various lengths throughout the entire video. In particular, the proposed model presents a new tracklet pair proposal method that evaluates the relatedness of object tracklet pairs using a pretrained neural network and statistical information. To effectively utilize the spatio-temporal context, the model performs low-level visual context reasoning using a spatio-temporal context graph and a graph neural network, as well as high-level semantic context reasoning. To improve detection performance for sparse relationships, the proposed model applies a class weighting technique that increases the weight of sparse relationships. This study demonstrates the positive effect and high performance of the proposed model through experiments on the benchmark datasets VidOR and VidVRD. Full article
(This article belongs to the Special Issue Applications of Video Processing and Computer Vision Sensor)
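A common way to realize the class weighting described above is to give rare relationship classes inverse-frequency weights in the classification loss; the sketch below shows this with dummy counts. The exact weighting scheme used by VSGG-Net may differ.

```python
import torch
import torch.nn as nn

# Inverse-frequency class weights: rare predicates get larger loss weights,
# so sparse relationships are not ignored. Counts here are dummy data.
relation_counts = torch.tensor([9000., 500., 40., 10.])
weights = relation_counts.sum() / (len(relation_counts) * relation_counts)
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 4)            # 8 tracklet pairs, 4 relation classes
labels = torch.randint(0, 4, (8,))
loss = criterion(logits, labels)      # sparse classes contribute more per sample
```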

15 pages, 1826 KiB  
Article
A Supervised Video Hashing Method Based on a Deep 3D Convolutional Neural Network for Large-Scale Video Retrieval
by Hanqing Chen, Chunyan Hu, Feifei Lee, Chaowei Lin, Wei Yao, Lu Chen and Qiu Chen
Sensors 2021, 21(9), 3094; https://doi.org/10.3390/s21093094 - 29 Apr 2021
Cited by 17 | Viewed by 3138
Abstract
Recently, with the popularization of camera tools such as mobile phones and the rise of various short video platforms, a huge number of videos are being uploaded to the Internet at all times, making a video retrieval system with fast retrieval speed and high precision very necessary. Content-based video retrieval (CBVR) has therefore aroused the interest of many researchers. A typical CBVR system contains two essential parts: video feature extraction and similarity comparison. Feature extraction from video is very challenging; previous video retrieval methods are mostly based on extracting features from single video frames, resulting in the loss of temporal information. Hashing methods are extensively used in multimedia information retrieval due to their retrieval efficiency, but most of them are currently applied only to image retrieval. To solve these problems in video retrieval, we build an end-to-end framework called deep supervised video hashing (DSVH), which employs a 3D convolutional neural network (CNN) to obtain the spatio-temporal features of videos and then trains a set of hash functions by supervised hashing to map the video features into a binary space, yielding compact binary codes for the videos. Finally, we use a triplet loss for network training. We conducted extensive experiments on three public video datasets, UCF-101, JHMDB, and HMDB-51; the results show that the proposed method has advantages over many state-of-the-art video retrieval methods. Compared with the DVH method, the mAP value on the UCF-101 dataset is improved by 9.3%, and the minimum improvement, on the JHMDB dataset, is 0.3%. We also demonstrate the stability of the algorithm on the HMDB-51 dataset. Full article
(This article belongs to the Special Issue Applications of Video Processing and Computer Vision Sensor)
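The overall pipeline the abstract outlines (3D CNN features, a tanh hash layer, triplet training, then thresholding to binary codes) can be sketched as follows. Layer sizes and the 48-bit code length are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class Hash3DCNN(nn.Module):
    """Toy 3D CNN with a tanh hash layer producing relaxed binary codes."""
    def __init__(self, hash_bits=48):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.hash_layer = nn.Linear(32, hash_bits)

    def forward(self, clip):                     # clip: (N, 3, T, H, W)
        h = self.features(clip).flatten(1)
        return torch.tanh(self.hash_layer(h))    # values in (-1, 1)

model = Hash3DCNN()
triplet = nn.TripletMarginLoss(margin=1.0)
anchor, positive, negative = (torch.randn(2, 3, 8, 32, 32) for _ in range(3))
loss = triplet(model(anchor), model(positive), model(negative))
binary_codes = torch.sign(model(anchor))         # ±1 codes after thresholding
```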

20 pages, 1700 KiB  
Article
Smart Video Surveillance System Based on Edge Computing
by Antonio Carlos Cob-Parro, Cristina Losada-Gutiérrez, Marta Marrón-Romera, Alfredo Gardel-Vicente and Ignacio Bravo-Muñoz
Sensors 2021, 21(9), 2958; https://doi.org/10.3390/s21092958 - 23 Apr 2021
Cited by 25 | Viewed by 5979
Abstract
New processing methods based on artificial intelligence (AI) and deep learning are replacing traditional computer vision algorithms. The most advanced systems can process huge amounts of data in large computing facilities. In contrast, this paper presents a smart video surveillance system that executes AI algorithms on low-power embedded devices. The computer vision algorithm, typical of surveillance applications, aims to detect, count, and track people's movements in the area. This application requires a distributed smart camera system. The proposed AI application detects people in the surveillance area using a MobileNet-SSD architecture. In addition, using a robust Kalman filter bank, the algorithm can keep track of people in the video while also providing people-counting information. The detection results are excellent considering the constraints imposed on the process. The selected architecture for the edge node is based on an UpSquared2 device that includes a vision processing unit (VPU) capable of accelerating AI CNN inference. The results section reports the image processing time when multiple video cameras are connected to the same edge node, people detection precision and recall curves, and the energy consumption of the system. The discussion of the results shows the usefulness of deploying this smart camera node throughout a distributed surveillance system. Full article
(This article belongs to the Special Issue Applications of Video Processing and Computer Vision Sensor)
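The tracking side can be sketched as a bank of constant-velocity Kalman filters, one per detected person, with people counting given by the number of active filters. Detection (MobileNet-SSD on the VPU) is stubbed out, and the matrices below are a generic textbook formulation, not the paper's tuning.

```python
import numpy as np

class KalmanTrack:
    """Constant-velocity Kalman filter tracking one person's center point."""
    def __init__(self, cx, cy):
        self.x = np.array([cx, cy, 0., 0.])          # position + velocity
        self.P = np.eye(4) * 10.0                    # state covariance
        self.F = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                           [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        self.Q, self.R = np.eye(4) * 0.01, np.eye(2) * 1.0

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):                             # z: measured (cx, cy)
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

tracks = [KalmanTrack(120, 80)]       # one filter per detected person
for t in tracks:
    t.predict()
    t.update((124, 83))               # nearest MobileNet-SSD detection (stub)
print("people count:", len(tracks))
```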
