Topic Editors

Collaborative Robotics and Intelligent Systems (CoRIS) Institute, Oregon State University, Corvallis, OR 97331, USA
Intelligent Media Lab, Sejong University, Seoul 05006, Republic of Korea
Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-Dong, Pohang 790-784, Republic of Korea

Lightweight Deep Neural Networks for Video Analytics

Abstract submission deadline
closed (31 October 2023)
Manuscript submission deadline
closed (31 December 2023)
Viewed by 23214

Topic Information

Dear Colleagues, 

The tremendous amount of video data generated by camera networks installed in industries, offices, and public places meets the requirements of Big Data. For instance, a simple multi-view network with two cameras, each acquiring video from a different view at 25 frames per second (fps), generates 180,000 frames (90,000 per camera) in an hour. Surveillance networks acquire video data around the clock from multi-view cameras, making it challenging to extract useful information from these Big Data. Indeed, significant effort is required when searching for salient information in such huge-sized video data. One critical application in video analytics is detecting anomalous events such as traffic accidents, crimes, or illegal activities in surveillance videos. Generally, anomalous events occur rarely compared to normal activities. Therefore, developing intelligent, lightweight computer vision algorithms for automatic video analytics is a pressing need to reduce wasted labor and time. Such automatic methods will not only be helpful in monitoring but will also reduce the human effort required for manual observation on a 24-hour basis. Thus, automatic techniques are required to extract pertinent information from videos without human involvement, such as video summarization, human action/activity recognition, video object tracking, violence detection, and crowd counting. These technologies can be integrated for efficient and instant analysis of surveillance, entertainment, and medical video Big Data.

This Topic invites original works revealing the latest research advancements in conventional machine learning and deep learning methods that deeply analyze patterns from video streams for various real-time applications. The topics of interest include but are not limited to the following:

  • Human action and activity recognition;
  • Efficient spatiotemporal feature representation;
  • Lightweight 2D and 3D convolutional neural networks for video analytics;
  • RGB/depth/skeleton-sensor-based action recognition;
  • Video summarization and scene segmentation;
  • Violence/anomaly detection;
  • Embedded vision for video analytics;
  • Sensors/multi-sensor integrations for activity recognition;
  • Activity localization, detection, and context analysis;
  • IoT-assisted computationally intelligent methods for activity recognition;
  • Cloud/Fog computing for action and activity recognition;
  • Benchmark datasets for action/activity/anomaly recognition.

Dr. Amin Ullah
Dr. Tanveer Hussain
Dr. Mohammad Farhad Bulbul
Topic Editors

Keywords

  • artificial intelligence
  • machine learning
  • computer vision 
  • deep learning
  • video analytics
  • feature extraction
  • feature selection
  • action recognition
  • activity recognition
  • video object detection
  • video object tracking
  • anomaly detection
  • embedded vision

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
AI (ai) | 3.1 | 7.2 | 2020 | 17.6 days | CHF 1600
Algorithms (algorithms) | 1.8 | 4.1 | 2008 | 15 days | CHF 1600
Information (information) | 2.4 | 6.9 | 2010 | 14.9 days | CHF 1600
Multimodal Technologies and Interaction (mti) | 2.4 | 4.9 | 2017 | 14.5 days | CHF 1600
Sensors (sensors) | 3.4 | 7.3 | 2001 | 16.8 days | CHF 2600
Mathematics (mathematics) | 2.3 | 4.0 | 2013 | 17.1 days | CHF 2600

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to enjoy the benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (6 papers)

18 pages, 1063 KiB  
Article
Vision-Based Concrete-Crack Detection on Railway Sleepers Using Dense U-Net Model
by Md. Al-Masrur Khan, Seong-Hoon Kee and Abdullah-Al Nahid
Algorithms 2023, 16(12), 568; https://doi.org/10.3390/a16120568 - 15 Dec 2023
Cited by 5 | Viewed by 1996
Abstract
Crack inspection in railway sleepers is crucial for ensuring rail safety and avoiding deadly accidents. Traditional methods for detecting cracks on railway sleepers are very time-consuming and lack efficiency. Therefore, researchers are now paying attention to vision-based algorithms, especially Deep Learning algorithms. In this work, we adopted the U-net for the first time for detecting cracks on a railway sleeper and proposed a modified U-net architecture named Dense U-net for segmenting the cracks. In the Dense U-net structure, we established several short connections between the encoder and decoder blocks, which enabled the architecture to obtain better pixel information flow. Thus, the model extracted the necessary information in more detail to predict the cracks. We collected images from railway sleepers, compiled them into a dataset, and finally trained the model on the images. The model achieved an overall F1-score, precision, recall, and IoU of 86.5%, 88.53%, 84.63%, and 76.31%, respectively. We compared our suggested model with the original U-net, and the results demonstrate that our model performed better than the U-net in both quantitative and qualitative terms. Moreover, we considered the necessity of crack severity analysis and measured several parameters of the cracks. Engineers must know the severity of the cracks to identify the most severely affected locations and take the necessary steps to repair the badly affected sleepers.
(This article belongs to the Topic Lightweight Deep Neural Networks for Video Analytics)
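To give a concrete picture of the "short connections between encoder and decoder blocks" described in the abstract, here is a minimal PyTorch sketch of a decoder stage that fuses an extra skip from a shallower encoder level. The module and channel sizes are hypothetical placeholders, not the authors' exact Dense U-net.

```python
# Minimal sketch (assumed PyTorch) of a U-Net-style decoder block that fuses an
# extra "short" connection from a shallower encoder stage, in the spirit of the
# dense encoder-decoder links described in the abstract. Names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class DenseDecoderBlock(nn.Module):
    """Decoder stage that concatenates the upsampled features with two
    encoder feature maps (the usual skip plus one extra short connection)."""
    def __init__(self, up_ch, skip_ch, short_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(up_ch, out_ch, kernel_size=2, stride=2)
        self.fuse = conv_block(out_ch + skip_ch + short_ch, out_ch)

    def forward(self, x, skip, short):
        x = self.up(x)
        # Resize the extra short connection so all maps share the same spatial size.
        short = F.interpolate(short, size=skip.shape[-2:], mode="bilinear",
                              align_corners=False)
        return self.fuse(torch.cat([x, skip, short], dim=1))

# Example: fuse a 256-channel bottleneck with 128- and 64-channel encoder maps.
block = DenseDecoderBlock(up_ch=256, skip_ch=128, short_ch=64, out_ch=128)
x = torch.randn(1, 256, 32, 32)       # upsampling input
skip = torch.randn(1, 128, 64, 64)    # same-level encoder features
short = torch.randn(1, 64, 128, 128)  # shallower encoder features
print(block(x, skip, short).shape)    # torch.Size([1, 128, 64, 64])
```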

16 pages, 5162 KiB  
Article
Transfer Learning-Based YOLOv3 Model for Road Dense Object Detection
by Chunhua Zhu, Jiarui Liang and Fei Zhou
Information 2023, 14(10), 560; https://doi.org/10.3390/info14100560 - 12 Oct 2023
Cited by 2 | Viewed by 1706
Abstract
Stemming from the overlap of objects and undertraining due to few samples, road dense object detection suffers from poor object identification performance and an inability to recognize edge objects. To address this, a transfer learning-based YOLOv3 approach for identifying dense objects on the road is proposed. Firstly, the Darknet-53 network structure is adopted to obtain a pre-trained YOLOv3 model. Then, transfer training is introduced for the output layer using a dedicated dataset of 2000 images containing vehicles. In the proposed model, a random function is used to initialize and optimize the weights of the transfer training model, which is designed separately from the pre-trained YOLOv3. The object detection classifier replaces the fully connected layer, which further improves the detection effect. The reduced size of the network model further reduces training and detection time, so the model can be better applied to real scenarios. The experimental results demonstrate that the object detection accuracy of the presented approach is 87.75% on the Pascal VOC 2007 dataset, which is superior to the traditional YOLOv3 and to YOLOv5 by 4% and 0.59%, respectively. Additionally, a test was carried out on UA-DETRAC, a public road vehicle detection dataset. The object detection accuracy of the presented approach reaches 79.23% on images, which is 4.13% better than the traditional YOLOv3 and 1.36% better than the relatively new YOLOv5. Moreover, the detection speed of the proposed YOLOv3 method reaches 31.2 fps on images, which is 7.6 fps faster than the traditional YOLOv3 and 1.5 fps faster than the newer YOLOv7. The proposed YOLOv3 performs 67.36 billion floating-point operations per second when detecting video, which is clearly less than the traditional YOLOv3 and the newer YOLOv5.
(This article belongs to the Topic Lightweight Deep Neural Networks for Video Analytics)
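The general transfer-training recipe described above, keeping a pre-trained backbone fixed while re-initializing and training only a new output head on a small dataset, can be sketched generically. The snippet below is an assumed PyTorch/torchvision illustration that uses a ResNet-50 classification backbone as a stand-in for Darknet-53 and a simple classification head; it is not the authors' YOLOv3 implementation.

```python
# Sketch (assumed PyTorch/torchvision): freeze a pre-trained backbone and train
# only a freshly (randomly) initialized head on a small, task-specific dataset,
# mirroring the general transfer-training idea described in the abstract.
# A ResNet-50 backbone stands in here for Darknet-53, which torchvision lacks.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()          # drop the original classifier
for p in backbone.parameters():
    p.requires_grad = False          # keep the pre-trained weights fixed

# Hypothetical new output head for the downstream (e.g., vehicle) classes,
# initialized from scratch so only it is learned during transfer training.
num_classes = 4
head = nn.Sequential(
    nn.Linear(2048, 512),
    nn.ReLU(inplace=True),
    nn.Linear(512, num_classes),
)

optimizer = torch.optim.SGD(head.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One transfer-training step: frozen backbone features -> trainable head."""
    with torch.no_grad():
        feats = backbone(images)     # (N, 2048) features from the frozen backbone
    logits = head(feats)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with dummy data:
print(train_step(torch.randn(2, 3, 224, 224), torch.tensor([0, 2])))
```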

18 pages, 42289 KiB  
Article
Automatic Sequential Stitching of High-Resolution Panorama for Android Devices Using Precapture Feature Detection and the Orientation Sensor
by Yaseen, Oh-Jin Kwon, Jinhee Lee, Faiz Ullah, Sonain Jamil and Jae Soo Kim
Sensors 2023, 23(2), 879; https://doi.org/10.3390/s23020879 - 12 Jan 2023
Cited by 2 | Viewed by 2891
Abstract
Image processing on smartphones, which are resource-limited devices, is challenging. Panorama generation on modern mobile phones is a requirement of most mobile phone users. This paper presents an automatic sequential image stitching algorithm with high-resolution panorama generation and addresses the issue of stitching failure on smartphone devices. A robust method is used to automatically control the events involved in panorama generation, from image capture to image stitching, on the Android operating system. The image frames are captured at a fixed spatial interval using the orientation sensor included in smartphone devices. A feature-based stitching algorithm is used for panorama generation, with a novel modification to address the issue of stitching failure (caused by an inability to find local features) when performing sequential stitching on mobile devices. We also address the issue of distortion in sequential stitching. Ultimately, in this study, we built an Android application that can construct a high-resolution panorama sequentially with automatic frame capture based on the orientation sensor and device rotation. We present a novel research methodology (called "Sense-Panorama") for panorama construction along with a development guide for smartphone developers. Based on our experiments, performed on a Samsung Galaxy SM-N960N with a Qualcomm Snapdragon 845 system on chip (SoC) and a 4 × 2.8 GHz Kryo 385 CPU, our method can generate a high-resolution panorama. Compared to existing methods, the results show improved visual quality in both subjective and objective evaluation.
(This article belongs to the Topic Lightweight Deep Neural Networks for Video Analytics)
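For readers unfamiliar with feature-based sequential stitching, the following is a minimal OpenCV-Python sketch of pairwise stitching via ORB features and a RANSAC homography. It omits the orientation-sensor capture control and the authors' failure-handling modification, so treat it purely as background illustration.

```python
# Minimal sketch (assumed OpenCV-Python) of pairwise feature-based stitching:
# detect ORB keypoints, match them, estimate a homography with RANSAC, and warp
# the new frame onto the running panorama. Sensor-driven capture and the
# paper's failure handling are not shown.
import cv2
import numpy as np

def stitch_pair(pano, frame, min_matches=10):
    orb = cv2.ORB_create(nfeatures=2000)
    k1, d1 = orb.detectAndCompute(cv2.cvtColor(pano, cv2.COLOR_BGR2GRAY), None)
    k2, d2 = orb.detectAndCompute(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), None)
    if d1 is None or d2 is None:
        return None  # stitching failure: no local features found

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d2, d1), key=lambda m: m.distance)
    if len(matches) < min_matches:
        return None  # too few correspondences for a reliable homography

    src = np.float32([k2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None

    h, w = pano.shape[:2]
    # Warp the new frame into the panorama's coordinate frame and overlay it.
    canvas = cv2.warpPerspective(frame, H, (w + frame.shape[1], h))
    canvas[:h, :w] = pano
    return canvas

# Usage: iteratively fold each captured frame into the growing panorama.
# pano = cv2.imread("frame0.jpg")
# for path in ["frame1.jpg", "frame2.jpg"]:
#     result = stitch_pair(pano, cv2.imread(path))
#     pano = result if result is not None else pano
```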

21 pages, 2333 KiB  
Article
A Video Summarization Model Based on Deep Reinforcement Learning with Long-Term Dependency
by Xu Wang, Yujie Li, Haoyu Wang, Longzhao Huang and Shuxue Ding
Sensors 2022, 22(19), 7689; https://doi.org/10.3390/s22197689 - 10 Oct 2022
Cited by 5 | Viewed by 4058
Abstract
Deep summarization models have succeeded in the video summarization field thanks to the development of gated recurrent unit (GRU) and long short-term memory (LSTM) technology. However, for some long videos, GRUs and LSTMs cannot effectively capture long-term dependencies. This paper proposes a deep summarization network with auxiliary summarization losses to address this problem. We introduce an unsupervised auxiliary summarization loss module with LSTM and a swish activation function to capture the long-term dependencies for video summarization, which can be easily integrated with various networks. The proposed model is an unsupervised deep reinforcement learning framework that does not depend on any labels or user interactions. Additionally, we implement a reward function (R(S)) that jointly considers the consistency, diversity, and representativeness of generated summaries. Furthermore, the proposed model is lightweight and can be successfully deployed on mobile devices, enhancing the experience of mobile users and reducing the load on servers. We conducted experiments on two benchmark datasets, and the results demonstrate that our proposed unsupervised approach obtains better summaries than existing video summarization methods. Furthermore, the proposed algorithm generates higher F-scores, with a nearly 6.3% increase on the SumMe dataset and a 2.2% increase on the TVSum dataset compared to the DR-DSN model.
(This article belongs to the Topic Lightweight Deep Neural Networks for Video Analytics)
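To make the reward idea concrete, below is a small PyTorch sketch of diversity and representativeness rewards in the spirit of DR-DSN, on which this paper builds. The paper's additional consistency term and its exact R(S) are not reproduced, so the functions and sizes here are illustrative assumptions only.

```python
# Sketch (assumed PyTorch) of two common summarization rewards in the DR-DSN
# family: diversity (selected frames should be dissimilar to each other) and
# representativeness (selected frames should be close to all frames). The
# paper's consistency term and exact formulation are not reproduced.
import torch
import torch.nn.functional as F

def diversity_reward(features, picks):
    """features: (T, D) frame features; picks: indices of selected frames."""
    sel = F.normalize(features[picks], dim=1)
    sim = sel @ sel.t()                          # pairwise cosine similarity
    n = sel.size(0)
    off_diag = sim[~torch.eye(n, dtype=torch.bool)]
    return (1.0 - off_diag).mean()               # higher when selected frames differ

def representativeness_reward(features, picks):
    """Rewards summaries whose frames lie close to every frame in the video."""
    dist = torch.cdist(features, features[picks])      # (T, |picks|)
    return torch.exp(-dist.min(dim=1).values.mean())

features = torch.randn(120, 256)                 # e.g., CNN features of 120 frames
picks = torch.tensor([3, 40, 75, 110])           # frames chosen by the policy
reward = diversity_reward(features, picks) + representativeness_reward(features, picks)
print(float(reward))
```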

22 pages, 5677 KiB  
Article
A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset
by Mohammad Farhad Bulbul, Amin Ullah, Hazrat Ali and Daijin Kim
Sensors 2022, 22(18), 6841; https://doi.org/10.3390/s22186841 - 9 Sep 2022
Viewed by 2905
Abstract
Deep models for recognizing human actions from depth video sequences are scarce compared to RGB- and skeleton-based models. This scarcity limits research advancements based on depth data, as training deep models with small-scale data is challenging. In this work, we propose a sequence classification deep model using depth video data for scenarios in which the video data are limited. Rather than summarizing the content of each frame into a single class, our method directly classifies a depth video, i.e., a sequence of depth frames. Firstly, the proposed system transforms an input depth video into three sequences of multi-view temporal motion frames. Together with these three temporal motion sequences, the input depth frame sequence offers a four-stream representation of the input depth action video. Next, the DenseNet121 architecture is employed, along with ImageNet pre-trained weights, to extract discriminating frame-level action features from the depth and temporal motion frames. The four extracted sets of frame-level feature vectors, one per stream, are fed into four bi-directional LSTM (BiLSTM) networks. The temporal features are further analyzed through multi-head self-attention (MHSA) to capture multi-view sequence correlations. Finally, the concatenation of their outputs is processed through dense layers to classify the input depth video. The experimental results on two small-scale benchmark depth datasets, MSRAction3D and DHA, demonstrate that the proposed framework is efficacious even with insufficient training samples and is superior to existing depth data-based action recognition methods.
(This article belongs to the Topic Lightweight Deep Neural Networks for Video Analytics)
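The per-stream pipeline described above (ImageNet-pretrained DenseNet121 frame features, a BiLSTM over time, then multi-head self-attention before fusion) can be sketched as follows in assumed PyTorch/torchvision code; the hidden sizes, pooling, and fusion details are placeholders rather than the authors' exact configuration.

```python
# Sketch (assumed PyTorch/torchvision) of one stream of the described pipeline:
# frame features from an ImageNet-pretrained DenseNet121, a BiLSTM over time,
# then multi-head self-attention and temporal pooling. Real use would run four
# such streams (depth + three motion views) and concatenate them before the
# classifier; sizes here are placeholders.
import torch
import torch.nn as nn
from torchvision import models

class StreamEncoder(nn.Module):
    def __init__(self, hidden=256, heads=4):
        super().__init__()
        densenet = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
        self.backbone = nn.Sequential(densenet.features,
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())  # -> 1024-d
        self.bilstm = nn.LSTM(1024, hidden, batch_first=True, bidirectional=True)
        self.mhsa = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)

    def forward(self, frames):                     # frames: (B, T, 3, 224, 224)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.bilstm(feats)                # (B, T, 2*hidden)
        att, _ = self.mhsa(seq, seq, seq)          # self-attention over time
        return att.mean(dim=1)                     # (B, 2*hidden) stream descriptor

num_classes = 20
streams = nn.ModuleList(StreamEncoder() for _ in range(4))
classifier = nn.Sequential(nn.Linear(4 * 512, 256), nn.ReLU(), nn.Linear(256, num_classes))

# Dummy forward pass with four streams of 8 frames each:
inputs = [torch.randn(1, 8, 3, 224, 224) for _ in range(4)]
logits = classifier(torch.cat([enc(x) for enc, x in zip(streams, inputs)], dim=1))
print(logits.shape)   # torch.Size([1, 20])
```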

20 pages, 14627 KiB  
Article
LLDNet: A Lightweight Lane Detection Approach for Autonomous Cars Using Deep Learning
by Md. Al-Masrur Khan, Md Foysal Haque, Kazi Rakib Hasan, Samah H. Alajmani, Mohammed Baz, Mehedi Masud and Abdullah-Al Nahid
Sensors 2022, 22(15), 5595; https://doi.org/10.3390/s22155595 - 26 Jul 2022
Cited by 16 | Viewed by 6545
Abstract
Lane detection plays a vital role in making the idea of the autonomous car a reality. Traditional lane detection methods need extensive hand-crafted features and post-processing techniques, which make the models feature-specific and susceptible to instability under variations in road scenes. In recent years, Deep Learning (DL) models, especially Convolutional Neural Network (CNN) models, have been proposed and utilized to perform pixel-level lane segmentation. However, most of these methods focus on achieving high accuracy on structured roads in good weather conditions and do not emphasize testing on defective roads, especially those with blurry lane lines, no lane lines, or cracked pavement, which are predominant in the real world. Moreover, many of these CNN-based models have complex structures and require high-end systems to operate, which makes them unsuitable for implementation on embedded devices. Considering these shortcomings, in this paper we introduce a novel CNN model named LLDNet, based on an encoder–decoder architecture, that is lightweight and has been tested in adverse weather as well as adverse road conditions. A channel attention module and a spatial attention module are integrated into the designed architecture to refine the feature maps and achieve outstanding results with a lower number of parameters. We trained our model on a hybrid dataset created by combining two separate datasets and compared the model with several state-of-the-art encoder–decoder architectures. Numerical results on the utilized dataset show that our model surpasses the compared methods in terms of dice coefficient, IoU, and model size. Moreover, we carried out extensive experiments on videos of different roads in Bangladesh. The visualization results show that our model can detect lanes accurately on both structured and defective roads and in adverse weather conditions. The experimental results indicate that our designed method is capable of detecting lanes accurately and is ready for practical implementation.
(This article belongs to the Topic Lightweight Deep Neural Networks for Video Analytics)
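Channel and spatial attention modules of the kind mentioned above are commonly implemented in the CBAM style; the following assumed PyTorch sketch shows that generic pattern, not LLDNet's exact module definitions.

```python
# Sketch (assumed PyTorch) of CBAM-style channel and spatial attention used to
# refine feature maps cheaply; this is the generic pattern, not LLDNet's exact
# modules.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):                              # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))             # squeeze by average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))              # squeeze by max pooling
        w = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * w                                   # reweight channels

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))    # reweight spatial positions

feat = torch.randn(1, 64, 80, 160)                     # e.g., an encoder feature map
refined = SpatialAttention()(ChannelAttention(64)(feat))
print(refined.shape)                                   # torch.Size([1, 64, 80, 160])
```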
