
Multimodal Deep Learning Methods for Video Analytics

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (15 December 2019) | Viewed by 52463

Special Issue Editor


Dr. Seungmin Rho
Guest Editor
Department of Industrial Security, Chung-Ang University, Seoul 06974, Republic of Korea
Interests: databases; big data analysis; music retrieval; multimedia systems; machine learning; knowledge management; computational intelligence

Special Issue Information

Dear Colleagues,

Video capturing devices are now ubiquitous, and the video data they produce covers virtually every aspect of our daily lives. This material ranges from edited content (movies, serials, etc.) at one end to a huge amount of unedited content (consumer videos, ego-centric videos, etc.) at the other. As a result, videos contain rich information and knowledge that can be extracted and analyzed for a wide variety of applications. Video analytics is a broad field that encompasses the design and development of systems capable of automatically analyzing videos to detect spatial and temporal events of interest.

In the last few years, deep learning algorithms have shown tremendous performance in many research areas, especially computer vision and natural language processing (NLP). Deep learning-based algorithms have attained a level of performance in tasks such as image recognition, speech recognition and NLP that was beyond expectation a decade ago. In multimodal deep learning, data obtained from different sources are used to learn features over multiple modalities, which yields a shared representation across those modalities. Using multiple modalities is expected to result in superior performance. In video analytics, for example, audio, visual and (possibly) textual data can be analyzed jointly.
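As an illustration of this shared-representation idea, a minimal PyTorch sketch might encode each modality separately and project both into a joint embedding before classification. The module, dimensions and class count below are illustrative assumptions, not taken from any paper in this issue.

```python
import torch
import torch.nn as nn

class AudioVisualFusion(nn.Module):
    """Minimal two-stream fusion: each modality is encoded separately,
    then combined into a shared representation used for classification."""
    def __init__(self, audio_dim=128, visual_dim=512, shared_dim=256, num_classes=7):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, shared_dim), nn.ReLU())
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, shared_dim), nn.ReLU())
        self.classifier = nn.Linear(2 * shared_dim, num_classes)

    def forward(self, audio_feat, visual_feat):
        a = self.audio_enc(audio_feat)      # (batch, shared_dim)
        v = self.visual_enc(visual_feat)    # (batch, shared_dim)
        joint = torch.cat([a, v], dim=1)    # shared multimodal representation
        return self.classifier(joint)

# Example with per-clip audio and visual feature vectors (random stand-ins)
model = AudioVisualFusion()
logits = model(torch.randn(4, 128), torch.randn(4, 512))  # (4, 7)
```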

The objectives of this Special Issue are to gather work on video analytics using multimodal deep learning-based methods and to introduce work on new, large-scale, real-world applications of video analytics.

We solicit original research and survey papers addressing topics including (but not limited to) the following:

  • Analysis of first-person/wearable videos using multimodal deep learning techniques
  • Analysis of web videos, ego-centric videos, surveillance videos, movies or any other type of video using multimodal deep learning techniques
  • Data collections, benchmarking, and performance evaluation of deep learning-based video analytics
  • Multimodal deep convolutional neural networks for audio-visual emotion recognition
  • Multimodal deep learning frameworks with cross weights
  • Multimodal information fusion via deep learning or machine learning methods

The topics in video analytics may include (but are not limited to):

  • Object detection and recognition
  • Action recognition
  • Event detection
  • Video highlights, summary and storyboard generation
  • Segmentation and tracking
  • Authoring and editing of videos
  • Scene understanding
  • People analysis
  • Security issues in surveillance videos

Dr. Seungmin Rho
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Audio-Visual Emotion Recognition
  • Deep Learning
  • Natural Language Processing
  • Video Analytics

Published Papers (8 papers)


Research

11 pages, 3294 KiB  
Article
Cover the Violence: A Novel Deep-Learning-Based Approach Towards Violence-Detection in Movies
by Samee Ullah Khan, Ijaz Ul Haq, Seungmin Rho, Sung Wook Baik and Mi Young Lee
Appl. Sci. 2019, 9(22), 4963; https://doi.org/10.3390/app9224963 - 18 Nov 2019
Cited by 80 | Viewed by 8088
Abstract
Movies, built around diverse ideas, have become one of the major sources of entertainment in the current era. Action movies have received the most attention in recent years; they often contain violent scenes, an undesirable feature for some individuals that is nevertheless used to create excitement and fantasy. These violent scenes can have a negative impact on children and are uncomfortable even for adults. The best way to prevent underage viewers from watching violent scenes in movies is to remove those scenes. In this paper, we propose a violence detection scheme for movies comprising three steps. First, the entire movie is segmented into shots, and a representative frame from each shot is selected based on its level of saliency. Next, the selected frames are passed through a lightweight deep learning model, fine-tuned using a transfer learning approach, to classify the shots of a movie as violent or non-violent. Finally, all non-violent scenes are merged in sequence to generate a violence-free movie that can be watched by children as well as by viewers sensitive to violence. The proposed model is evaluated on three violence benchmark datasets, and experiments show that it provides faster and more accurate detection of violent scenes in movies than state-of-the-art methods.
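A minimal sketch of the shot-level pipeline described in this abstract could look like the following; the shot boundaries, saliency model and shot classifier are left as placeholders (`saliency_fn`, `classify_fn`) and are not the authors' actual models.

```python
import numpy as np

def violence_free_cut(frames, shot_boundaries, saliency_fn, classify_fn):
    """Three-step sketch: pick the most salient frame per shot, classify it as
    violent or non-violent, and keep only the non-violent shots in order."""
    kept = []
    for start, end in shot_boundaries:                    # shots as (start, end) frame indices
        shot = frames[start:end]
        key_idx = int(np.argmax([saliency_fn(f) for f in shot]))
        if classify_fn(shot[key_idx]) == "non_violence":  # keep only non-violent shots
            kept.extend(shot)
    return kept                                           # frames of the violence-free movie
```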

20 pages, 8168 KiB  
Article
A Study on Development of the Camera-Based Blind Spot Detection System Using the Deep Learning Methodology
by Donghwoon Kwon, Ritesh Malaiya, Geumchae Yoon, Jeong-Tak Ryu and Su-Young Pi
Appl. Sci. 2019, 9(14), 2941; https://doi.org/10.3390/app9142941 - 23 Jul 2019
Cited by 10 | Viewed by 6102
Abstract
One recent news headline reported that a pedestrian was killed by an autonomous vehicle because its safety features did not correctly detect an object on the road. Following this accident, some global automobile companies announced plans to postpone development of autonomous vehicles, and there is no doubt about the importance of safety features for such vehicles. For this reason, our research goal is the development of a very safe and lightweight camera-based blind spot detection system that can be applied to future autonomous vehicles. The blind spot detection system was implemented with open source software. Approximately 2000 vehicle images and 9000 non-vehicle images were used to train a Fully Connected Network (FCN) model, together with data processing techniques such as the Histogram of Oriented Gradients (HOG), heat maps, and thresholding. We achieved 99.43% training accuracy and 98.99% testing accuracy with the FCN model. The source code for all of these methodologies was then deployed to an off-the-shelf embedded board for testing on an actual road. Testing was conducted under various conditions, and we confirmed an average detection accuracy of 93.75% with three false positives.
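A sketch of the HOG, heat-map and thresholding stages mentioned above, assuming a sliding-window search and a trained patch classifier (`classify_fn`, standing in for the paper's fully connected network), might look as follows; window size, stride and threshold are illustrative values.

```python
import numpy as np
from skimage.feature import hog

def vehicle_heatmap(gray_frame, classify_fn, window=64, stride=16, threshold=2):
    """Slide a window over the frame, score each patch with the classifier,
    accumulate detections into a heat map, and threshold it to suppress
    sporadic false positives."""
    heat = np.zeros_like(gray_frame, dtype=float)
    h, w = gray_frame.shape
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            patch = gray_frame[y:y + window, x:x + window]
            feat = hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
            if classify_fn(feat):                 # 1 if the patch looks like a vehicle
                heat[y:y + window, x:x + window] += 1
    return heat >= threshold                      # boolean mask of likely vehicle regions
```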

19 pages, 8018 KiB  
Article
A Video-Based Fire Detection Using Deep Learning Models
by Byoungjun Kim and Joonwhoan Lee
Appl. Sci. 2019, 9(14), 2862; https://doi.org/10.3390/app9142862 - 18 Jul 2019
Cited by 129 | Viewed by 13533
Abstract
Fire is an abnormal event which can cause significant damage to lives and property. In this paper, we propose a deep learning-based fire detection method using video sequences, which imitates the human fire detection process. The proposed method uses a Faster Region-based Convolutional Neural Network (R-CNN) to detect suspected regions of fire (SRoFs) and of non-fire based on their spatial features. The summarized features within the bounding boxes in successive frames are then accumulated by a Long Short-Term Memory (LSTM) network to classify whether there is a fire in a short-term period. The decisions for successive short-term periods are combined by majority voting into a final decision for a long-term period. In addition, the areas of both flame and smoke are calculated, and their temporal changes are reported together with the final fire decision to interpret the dynamic fire behavior. Experiments show that the proposed long-term video-based method successfully improves fire detection accuracy compared with still-image-based or short-term video-based methods by reducing both false detections and misdetections.
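The temporal stage described above (per-frame region features fed to an LSTM, then majority voting over clip decisions) could be sketched as below; the Faster R-CNN detector is assumed to run upstream, and the feature and hidden dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FireTemporalClassifier(nn.Module):
    """LSTM over per-frame region features to decide fire / no-fire per short clip."""
    def __init__(self, feat_dim=256, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)

    def forward(self, region_feats):               # (clips, frames, feat_dim)
        _, (h_n, _) = self.lstm(region_feats)
        return self.head(h_n[-1])                  # fire / no-fire logits per clip

def long_term_decision(clip_logits):
    """Majority vote over successive short-term clip decisions."""
    votes = clip_logits.argmax(dim=1)              # 1 = fire
    return int(votes.float().mean().item() > 0.5)

model = FireTemporalClassifier()
clips = torch.randn(5, 16, 256)                    # 5 clips of 16 frames each (random stand-ins)
print(long_term_decision(model(clips)))
```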

13 pages, 2578 KiB  
Article
Classification of Marine Vessels with Multi-Feature Structure Fusion
by Erhu Zhang, Kelu Wang and Guangfeng Lin
Appl. Sci. 2019, 9(10), 2153; https://doi.org/10.3390/app9102153 - 27 May 2019
Cited by 20 | Viewed by 3705
Abstract
The classification of marine vessels is one of the important problems in maritime traffic. To fully exploit the complementarity between different features and to identify marine vessels more effectively, a novel feature structure fusion method based on spectral regression discriminant analysis (SF-SRDA) is proposed. Firstly, we selected the convolutional neural network features that best describe the characteristics of ships and constructed graph-based features using a similarity metric. Then we weighted the concatenated multi-feature and fused their structures according to a linear relationship assumption. Finally, we constructed an optimization formulation that solves for the fused features and structure using spectral regression discriminant analysis. Experiments on the VAIS dataset show that the proposed SF-SRDA method can reduce the feature dimension from the original 102,400 dimensions to 5, that the classification accuracy for visible images can reach 87.60%, and that that for infrared images can reach 74.68% during daytime. The experimental results demonstrate that the proposed method not only extracts the optimal features from the original redundant feature space but also greatly reduces the feature dimension. Furthermore, the classification performance of SF-SRDA is also promising.

13 pages, 1429 KiB  
Article
Parallel Image Captioning Using 2D Masked Convolution
by Chanrith Poleak and Jangwoo Kwon
Appl. Sci. 2019, 9(9), 1871; https://doi.org/10.3390/app9091871 - 7 May 2019
Cited by 2 | Viewed by 2977
Abstract
Automatically generating a novel description of an image is a challenging and important problem that brings together advanced research in both computer vision and natural language processing. In recent years, image captioning has significantly improved its performance by using long short-term memory (LSTM) as the decoder of the language model. However, despite this improvement, LSTM has its own shortcomings: its structure is complicated and its nature is inherently sequential. This paper proposes a model that uses a simple convolutional network for both the encoder and decoder of an image captioning system, instead of the current state-of-the-art approach. Our experiments with this model on the Microsoft Common Objects in Context (MSCOCO) captioning dataset yielded results that are competitive with the state-of-the-art image captioning model across different evaluation metrics, while being much simpler and enabling parallel graphics processing unit (GPU) computation during training, resulting in a faster training time.
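The key to training such a convolutional decoder in parallel is masking: each output position may only see earlier caption tokens. The paper uses 2D masked convolutions; the 1D causal-convolution block below is only a minimal illustration of the masking principle, with illustrative channel and kernel sizes.

```python
import torch
import torch.nn as nn

class CausalConvBlock(nn.Module):
    """Masked (causal) convolution over caption tokens: left-only padding means
    no position can see future tokens, so all positions train in parallel."""
    def __init__(self, channels=256, kernel_size=3):
        super().__init__()
        self.pad = kernel_size - 1                  # pad on the left only
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):                           # x: (batch, channels, length)
        x = nn.functional.pad(x, (self.pad, 0))     # no access to future tokens
        return torch.relu(self.conv(x))

block = CausalConvBlock()
tokens = torch.randn(2, 256, 20)                    # embedded caption prefix (random stand-in)
out = block(tokens)                                  # (2, 256, 20)
```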

16 pages, 1141 KiB  
Article
Bilinear CNN Model for Fine-Grained Classification Based on Subcategory-Similarity Measurement
by Xinghua Dai, Shengrong Gong, Shan Zhong and Zongming Bao
Appl. Sci. 2019, 9(2), 301; https://doi.org/10.3390/app9020301 - 16 Jan 2019
Cited by 9 | Viewed by 4301
Abstract
One of the challenges in fine-grained classification is that subcategories with significant similarity are hard to distinguish, because existing algorithms treat all subcategories equally. To solve this problem, a fine-grained image classification method combining a bilinear convolutional neural network (B-CNN) with a measurement of subcategory similarities is proposed. Firstly, an improved weakly supervised localization method is designed to obtain the bounding box of the main object, which allows the model to eliminate the influence of background noise and obtain more accurate features. Then, sample features in the training set are computed by the B-CNN so that a fuzzy similarity matrix measuring interclass similarities can be obtained. To further improve classification accuracy, the loss function is designed by weighting a triplet loss and a softmax loss. Extensive experiments on two benchmark datasets, Stanford Cars-196 and Caltech-UCSD Birds-200-2011 (CUB-200-2011), show that the proposed method outperforms several state-of-the-art weakly supervised classification models in accuracy.
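A weighted combination of a softmax (cross-entropy) term and a triplet term, as described above, could be sketched in PyTorch as follows; the weight, margin, feature dimension and class count are illustrative assumptions, and triplet sampling via the similarity matrix is not shown.

```python
import torch
import torch.nn as nn

# Softmax term for subcategory labels plus a triplet term that separates
# highly similar subcategories; `alpha` weights the two terms.
triplet = nn.TripletMarginLoss(margin=0.5)
ce = nn.CrossEntropyLoss()

def combined_loss(logits, labels, anchor, positive, negative, alpha=0.5):
    return ce(logits, labels) + alpha * triplet(anchor, positive, negative)

# anchor/positive come from the same subcategory; the negative comes from a
# similar but different subcategory chosen via the similarity matrix.
logits, labels = torch.randn(8, 196), torch.randint(0, 196, (8,))
feats = [torch.randn(8, 512) for _ in range(3)]
loss = combined_loss(logits, labels, *feats)
```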

19 pages, 3670 KiB  
Article
Deep Learning Based Computer Generated Face Identification Using Convolutional Neural Network
by L. Minh Dang, Syed Ibrahim Hassan, Suhyeon Im, Jaecheol Lee, Sujin Lee and Hyeonjoon Moon
Appl. Sci. 2018, 8(12), 2610; https://doi.org/10.3390/app8122610 - 13 Dec 2018
Cited by 43 | Viewed by 8984
Abstract
Generative adversarial networks (GANs) are an emerging class of generative models that have made impressive progress in recent years in generating photorealistic facial images. As a result, it has become more and more difficult to distinguish computer-generated face images from real ones, even with the human eye. If generated images are used with the intent to mislead and deceive readers, they could cause severe ethical, moral, and legal issues. Moreover, it is challenging to collect a dataset of computer-generated faces that is large enough for research purposes, because the number of realistic computer-generated images is still limited and scattered across the internet. The development of a novel decision support system for analyzing and detecting computer-generated face images produced by GANs is therefore crucial. In this paper, we propose a customized convolutional neural network, CGFace, specifically designed for the computer-generated face detection task by customizing the number of convolutional layers so that it performs well in detecting computer-generated face images. An imbalanced framework (IF-CGFace) is then created by altering CGFace's layer structure to address the imbalanced data issue: features are extracted from CGFace layers and used to train AdaBoost and eXtreme Gradient Boosting (XGB) classifiers. Next, we explain the process of generating a large computer-generated dataset based on the state-of-the-art PCGAN and BEGAN models. Various experiments are then carried out to show that the proposed model with augmented input yields the highest accuracy at 98%. Finally, we provide comparative results by applying the proposed CNN architecture to images generated by other GAN studies.
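The imbalanced-framework idea (replacing the CNN's softmax head with a boosted classifier trained on intermediate-layer features) might be sketched as below; the features here are random stand-ins rather than actual CGFace activations, AdaBoost is used where XGBoost could equally be plugged in, and the class ratio is an illustrative assumption.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 128))         # stand-in for CNN-layer features
labels = (rng.random(1000) < 0.1).astype(int)   # imbalanced: ~10% computer-generated

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, stratify=labels, random_state=0)
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```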

21 pages, 6127 KiB  
Article
Temporal Modeling on Multi-Temporal-Scale Spatiotemporal Atoms for Action Recognition
by Guangle Yao, Tao Lei, Xianyuan Liu and Ping Jiang
Appl. Sci. 2018, 8(10), 1835; https://doi.org/10.3390/app8101835 - 6 Oct 2018
Cited by 2 | Viewed by 2732
Abstract
As an important branch of video analysis, human action recognition has attracted extensive research attention in the computer vision and artificial intelligence communities. In this paper, we propose to model the temporal evolution of multi-temporal-scale atoms for action recognition. An action can be considered a temporal sequence of action units. These action units, which we refer to as action atoms, can capture the key semantic and characteristic spatiotemporal features of actions at different temporal scales. We first investigate Res3D, a powerful 3D CNN architecture, and create variants of Res3D for different temporal scales. At each temporal scale, we design strategies to transfer the knowledge learned from RGB to optical flow (OF) and build RGB and OF streams to extract deep spatiotemporal information using Res3D. Then we propose an unsupervised method to mine action atoms in the deep spatiotemporal space. Finally, we use long short-term memory (LSTM) to model the temporal evolution of atoms for action recognition. The experimental results show that our proposed multi-temporal-scale spatiotemporal atom modeling method achieves recognition performance comparable to that of state-of-the-art methods on two challenging action recognition datasets: UCF101 and HMDB51.
