Special Issue "Computer Vision, Deep Learning and Machine Learning with Applications"

A special issue of Future Internet (ISSN 1999-5903). This special issue belongs to the section "Big Data and Augmented Intelligence".

Deadline for manuscript submissions: 30 November 2021.

Special Issue Editors

Prof. Dr. Remus Brad
Guest Editor
Computer Science and Electrical Engineering Department, Lucian Blaga University of Sibiu, Sibiu, Romania
Interests: computer networks; smart city; image processing; pattern recognition; computer vision
Dr. Arpad Gellert
Co-Guest Editor
Computer Science and Electrical Engineering Department, Lucian Blaga University of Sibiu, Emil Cioran 4, 550025 Sibiu, Romania
Interests: image processing; smart buildings; smart factories; web mining; computer architecture

Special Issue Information

Dear Colleagues,

In recent years, interest in automation, in both industrial and domestic applications, has increased with the development of new methods and the growth of computer processing capabilities. One of the major pillars of today’s research goes beyond image processing, into the broader concept of computer vision. Moreover, the latest advances in neural networks and machine learning have added new facets and applications to the domain, ranging from industrial process control to the challenge of self-driving in the automotive field, passing through medical imaging, information retrieval, and digital forensics.

The scope of this Special Issue is to collect the latest work in the fields of computer vision, machine learning, and deep learning, and to gather researchers from different areas who are working to solve the challenging and demanding problems posed by industry and an ever-changing world. Potential topics include but are not limited to:

  • Image processing and applications;
  • Medical imaging;
  • Autonomous vehicles;
  • Convolutional neural networks;
  • Digital forensics;
  • Smart home;
  • Smart city;
  • Industrial quality control.

Prof. Dr. Remus Brad
Dr. Arpad Gellert
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Future Internet is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • computer vision
  • deep learning
  • artificial intelligence
  • machine learning
  • convolutional neural networks
  • automation
  • smart home
  • smart city
  • medical imaging

Published Papers (12 papers)


Research


Article
COVIDNet: Implementing Parallel Architecture on Sound and Image for High Efficacy
Future Internet 2021, 13(11), 269; https://doi.org/10.3390/fi13110269 - 26 Oct 2021
Abstract
The present work relates to the implementation of a core parallel architecture in a deep learning algorithm. At present, deep learning technology forms the main interdisciplinary basis of healthcare, hospital hygiene, biology, and medicine. This work establishes a baseline over the training hyperparameter space, supporting both image and sound inputs, and further develops a parallel architectural model that uses multiple inputs with and without the patient’s involvement. For the chest X-ray image input, the model architecture includes variables such as the number of nodes in each layer and the dropout rate. Sound is converted into Fourier-transform Mel-spectrogram images with the correct pixel range so that it can be accepted by the convolutional neural network in embarrassingly parallel sequences. COVIDNet, the end-user tool, takes as input a chest X-ray image and a cough audio file, which can be either a natural or a forced cough. Three binary classification models (COVID-19 CXR, non-COVID-19 CXR, COVID-19 cough) were trained. The COVID-19 CXR model classifies between healthy lungs and COVID-19, while the non-COVID-19 CXR model classifies between non-COVID-19 pneumonia and healthy lungs. The COVID-19 CXR model has an accuracy of 95% and was trained on 1681 COVID-19-positive images and 10,895 healthy-lung images, whereas the non-COVID-19 CXR model has an accuracy of 91% and was trained on 7478 non-COVID-19 pneumonia images and 10,895 healthy-lung images. All models are binary classifiers because of the scarcity of available data: medical image datasets are usually highly imbalanced, and obtaining them is costly and time-consuming. Therefore, data augmentation was performed on the medical image datasets used. The effects of the parallel architecture and of optimization on the design were also investigated.
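
As a hedged illustration of the audio branch described above, the sketch below converts a cough recording into a Mel-spectrogram image with a CNN-friendly pixel range using librosa; the file name, sampling rate, and number of Mel bands are assumptions, not the authors' exact settings.

```python
# A hedged sketch (not the authors' exact pipeline) of converting a cough
# recording into a Mel-spectrogram image with a 0-255 pixel range for a CNN.
import numpy as np
import librosa

def cough_to_mel_image(wav_path, sr=22050, n_mels=128):
    """Load an audio file and return its Mel-spectrogram scaled to 0-255."""
    y, sr = librosa.load(wav_path, sr=sr)                          # waveform
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)                  # log (dB) scale
    img = 255.0 * (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)
    return img.astype(np.uint8)                                    # image-like array for the CNN

# Hypothetical usage: image = cough_to_mel_image("cough_sample.wav")
```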

Article
Rumor Detection Based on Attention CNN and Time Series of Context Information
Future Internet 2021, 13(11), 267; https://doi.org/10.3390/fi13110267 - 25 Oct 2021
Abstract
This study aims to explore the time series context and sentiment polarity features of rumors’ life cycles, and how to use them to optimize the CNN model parameters and improve the classification effect. The proposed model is a convolutional neural network embedded with an attention mechanism based on sentiment polarity and time series information. First, the whole life cycle of a rumor is divided into 20 groups by the time series algorithm, and each group of texts is trained with Doc2Vec to obtain a text vector. Second, the SVM algorithm is used to obtain the sentiment polarity features of each group. Last, the CNN model with the spatial attention mechanism is used to classify the rumors. The experimental results show that the proposed model, which incorporates time series and sentiment polarity features, is very effective for rumor detection and can also greatly reduce the number of iterations needed for model training. The accuracy, precision, recall, and F1 score of the attention CNN are better than those of the latest benchmark model.
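
A minimal sketch, under assumptions, of the two feature-extraction steps named in the abstract: Doc2Vec vectors for each time-series group (gensim) and an SVM for sentiment polarity (scikit-learn). The toy data, hyperparameters, and the averaging of post vectors per group are illustrative choices, not the paper's configuration.

```python
# Illustrative per-group features: Doc2Vec text vectors plus an SVM sentiment model.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.svm import SVC

# Hypothetical data: 20 time-series groups, each a list of tokenized posts.
groups = [[["rumor", "spreads", "fast"], ["official", "denied", "it"]]] * 20

def group_vectors(groups, vector_size=100):
    """Train Doc2Vec over all posts and return one averaged vector per group."""
    tagged = [TaggedDocument(words=post, tags=[f"{g}_{i}"])
              for g, posts in enumerate(groups) for i, post in enumerate(posts)]
    model = Doc2Vec(tagged, vector_size=vector_size, min_count=1, epochs=40)
    return [sum(model.infer_vector(p) for p in posts) / len(posts) for posts in groups]

vectors = group_vectors(groups)

# Sentiment polarity per group: an SVM trained on a labelled corpus (X_train,
# y_train are hypothetical and would come from annotated sentiment data).
# svm = SVC(kernel="rbf").fit(X_train, y_train)
# polarity = [svm.predict([v])[0] for v in vectors]
```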

Article
Dynamic Detection and Recognition of Objects Based on Sequential RGB Images
Future Internet 2021, 13(7), 176; https://doi.org/10.3390/fi13070176 - 07 Jul 2021
Abstract
Conveyors are commonly used in industrial production lines and automated sorting systems. Many applications require fast, reliable, and dynamic detection and recognition of the objects on conveyors. Aiming at this goal, we design a framework that involves three subtasks: one-class instance segmentation (OCIS), multiobject tracking (MOT), and zero-shot fine-grained recognition of 3D objects (ZSFGR3D). A new level set map network (LSMNet) and a multiview redundancy-free feature network (MVRFFNet) are proposed for the first and third subtasks, respectively. The level set map (LSM) is used to annotate instances instead of the traditional multichannel binary mask, and each peak of the LSM represents one instance. Based on the LSM, LSMNet can adopt a pix2pix architecture to segment instances. MVRFFNet is a generalized zero-shot learning (GZSL) framework based on the Wasserstein generative adversarial network for 3D object recognition. Multi-view features of an object are combined into a compact registered feature. By treating the registered features as the category attributes in the GZSL setting, MVRFFNet learns a mapping function that maps the original retrieval features into a new redundancy-free feature space. To validate the performance of the proposed methods, a segmentation dataset and a fine-grained classification dataset of objects on a conveyor were established. Experimental results on these datasets show that LSMNet achieves a recall accuracy close to that of the lightweight instance segmentation framework You Only Look At CoefficienTs (YOLACT), while its computing speed on an NVIDIA GTX1660TI GPU is 80 fps, much faster than YOLACT’s 25 fps. Redundancy-free features generated by MVRFFNet perform much better than the original features in the retrieval task.
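
The level set map is the paper's own annotation format; purely as a loosely hedged illustration, the sketch below builds a single-channel map with one peak per instance from binary instance masks using a normalized distance transform. This matches the "one peak per instance" description but is not necessarily the authors' construction.

```python
# One plausible (assumed) single-channel "level map" with a peak per instance.
import numpy as np
from scipy.ndimage import distance_transform_edt

def instances_to_level_map(masks):
    """masks: (N, H, W) binary instance masks -> (H, W) map, one peak per instance."""
    level_map = np.zeros(masks.shape[1:], dtype=np.float32)
    for mask in masks.astype(bool):
        dist = distance_transform_edt(mask)        # distance to the instance border
        if dist.max() > 0:
            dist = dist / dist.max()               # peak value 1.0 inside each instance
        level_map = np.maximum(level_map, dist)    # overlay all instances
    return level_map

# Example: three synthetic rectangular instances on a 64x64 canvas.
masks = np.zeros((3, 64, 64), dtype=np.uint8)
masks[0, 5:20, 5:20] = 1
masks[1, 30:50, 10:30] = 1
masks[2, 40:60, 40:60] = 1
lsm = instances_to_level_map(masks)                # one local maximum per object
```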

Article
A Pattern Mining Method for Teaching Practices
Future Internet 2021, 13(5), 106; https://doi.org/10.3390/fi13050106 - 23 Apr 2021
Abstract
When integrating digital technology into teaching, many teachers experience similar challenges. Nevertheless, sharing experiences is difficult, as it is usually not possible to transfer teaching scenarios directly from one subject to another; subject-specific characteristics make them difficult to reuse. To address this problem, instructional scenarios can be described as patterns, an approach that has already been applied in educational contexts. Patterns capture proven teaching strategies and describe teaching scenarios in a unified, reusable structure. Since priorities for content, methods, and tools differ in each subject, we show an approach to develop a domain-independent graph database that collects digital teaching practices, derived from a taxonomic structure via the intermediate step of an ontology. Furthermore, we outline a method to identify effective teaching practices from interdisciplinary data as patterns mined from the graph database using an association rule algorithm. The results show that an association-based analysis approach can derive initial indications of effective teaching scenarios.
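
A small sketch of association rule mining over teaching-practice records using mlxtend's Apriori implementation; the one-hot practice attributes and thresholds are made-up placeholders, not data from the study.

```python
# Illustrative association rule mining on hypothetical teaching-practice records.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical one-hot table: each row is a documented teaching scenario.
records = pd.DataFrame(
    [{"quiz_tool": 1, "group_work": 1, "video": 0, "positive_feedback": 1},
     {"quiz_tool": 1, "group_work": 0, "video": 1, "positive_feedback": 1},
     {"quiz_tool": 0, "group_work": 1, "video": 1, "positive_feedback": 0},
     {"quiz_tool": 1, "group_work": 1, "video": 1, "positive_feedback": 1}]
).astype(bool)

frequent = apriori(records, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```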

Article
Coronary Centerline Extraction from CCTA Using 3D-UNet
Future Internet 2021, 13(4), 101; https://doi.org/10.3390/fi13040101 - 19 Apr 2021
Abstract
The mesh-type coronary model, obtained from three-dimensional reconstruction of the sequence of images produced by computed tomography (CT), can be used to obtain useful diagnostic information, such as extracting the projection of the lumen (planar development along an artery). In this paper, we focus on automated coronary centerline extraction from cardiac computed tomography angiography (CCTA), proposing a 3D version of the U-Net architecture trained with a novel loss function and with augmented patches. We have obtained promising results for accuracy (between 90% and 95%) and overlap (between 90% and 94%) with various network training configurations on data from the Rotterdam Coronary Artery Centerline Extraction benchmark. We have also demonstrated the ability of the proposed network to learn despite the huge class imbalance and sparse annotation present in the training data.
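
For orientation, a compact two-level 3D U-Net is sketched below in PyTorch; it only illustrates the encoder-decoder idea with 3D convolutions and skip connections and is not the authors' exact architecture, loss function, or patch-augmentation pipeline.

```python
# A compact, illustrative 3D U-Net (two resolution levels) for voxel-wise prediction.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, 3, padding=1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, 3, padding=1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
    )

class TinyUNet3D(nn.Module):
    def __init__(self, in_ch=1, base=16):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.pool = nn.MaxPool3d(2)
        self.enc2 = conv_block(base, base * 2)
        self.up = nn.ConvTranspose3d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv3d(base, 1, kernel_size=1)          # voxel-wise centerline logit

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))    # skip connection
        return self.head(d1)

# Example on a random CCTA-like patch (batch, channel, depth, height, width):
logits = TinyUNet3D()(torch.randn(1, 1, 32, 32, 32))           # -> (1, 1, 32, 32, 32)
```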

Article
Person Re-Identification Based on Attention Mechanism and Context Information Fusion
Future Internet 2021, 13(3), 72; https://doi.org/10.3390/fi13030072 - 13 Mar 2021
Abstract
Person re-identification (ReID) plays a significant role in video surveillance analysis. In the real world, due to illumination, occlusion, and deformation, pedestrian feature extraction is the key to person ReID. Considering the shortcomings of existing methods in pedestrian feature extraction, a method based on an attention mechanism and context information fusion is proposed. A lightweight attention module with a small number of parameters is introduced into the ResNet50 backbone network; it enhances the salient characteristics of persons and suppresses irrelevant information. To address the loss of person context information caused by excessive network depth, a context information fusion module is designed to sample the shallow feature maps of pedestrians and cascade them with the high-level feature maps. To improve robustness, the model is trained by combining the margin sample mining loss with the cross-entropy loss. Experiments are carried out on the Market1501 and DukeMTMC-reID datasets; our method achieves a rank-1 accuracy of 95.9% on the Market1501 dataset and 90.1% on the DukeMTMC-reID dataset, outperforming current mainstream methods when only global features are used.
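
As an illustration of a lightweight attention module of the kind that can be inserted into a ResNet50 backbone, the sketch below implements squeeze-and-excitation-style channel attention in PyTorch; the paper's own module may differ, so treat the design and the reduction ratio as assumptions.

```python
# Illustrative lightweight channel attention (squeeze-and-excitation style).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                  # "squeeze": global context
        self.fc = nn.Sequential(                             # "excitation": few parameters
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                         # re-weight feature channels

# Example: re-weight a ResNet50 stage-4 feature map of shape (B, 2048, 8, 4).
feat = torch.randn(2, 2048, 8, 4)
out = ChannelAttention(2048)(feat)                           # same shape, attended
```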

Article
Adaptive Weighted Multi-Level Fusion of Multi-Scale Features: A New Approach to Pedestrian Detection
Future Internet 2021, 13(2), 38; https://doi.org/10.3390/fi13020038 - 02 Feb 2021
Abstract
Great achievements have been made in pedestrian detection through deep learning. For detectors based on deep learning, making better use of features has become the key to their detection performance. While current pedestrian detectors have made efforts in feature utilization to improve their detection performance, feature utilization is still inadequate. To solve this problem, we propose the Multi-Level Feature Fusion Module (MFFM) and its Multi-Scale Feature Fusion Unit (MFFU) sub-module, which connect feature maps of the same scale and of different scales by using horizontal and vertical connections and shortcut structures. All of these connections are accompanied by learnable weights; thus, they can be used as adaptive multi-level and multi-scale feature fusion modules to fuse the best features. We then build a complete pedestrian detector, the Adaptive Feature Fusion Detector (AFFDet), an anchor-free one-stage pedestrian detector that can make full use of features for detection. As a result, compared with other methods, our method performs better on the challenging Caltech Pedestrian Detection Benchmark (Caltech) and has quite competitive speed. It is the current state-of-the-art one-stage pedestrian detection method.
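
A hedged sketch of adaptive multi-scale feature fusion with learnable weights, the general idea behind such fusion modules; channel counts, the softmax normalization, and nearest-neighbor upsampling are illustrative assumptions rather than the paper's exact MFFM/MFFU design.

```python
# Illustrative learnable-weight fusion of multi-scale feature maps into the finest scale.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.weights = nn.Parameter(torch.ones(len(in_channels)))   # learnable fusion weights

    def forward(self, feats):
        target = feats[0].shape[-2:]                                 # finest resolution
        w = torch.softmax(self.weights, dim=0)                       # normalized weights
        fused = 0
        for i, (f, p) in enumerate(zip(feats, self.proj)):
            f = F.interpolate(p(f), size=target, mode="nearest")     # align scales
            fused = fused + w[i] * f
        return fused

# Example with three pyramid levels from a hypothetical backbone:
feats = [torch.randn(1, 256, 64, 64), torch.randn(1, 512, 32, 32), torch.randn(1, 1024, 16, 16)]
out = WeightedFusion([256, 512, 1024], 256)(feats)                   # -> (1, 256, 64, 64)
```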

Article
Collaborative Filtering Based on a Variational Gaussian Mixture Model
Future Internet 2021, 13(2), 37; https://doi.org/10.3390/fi13020037 - 01 Feb 2021
Abstract
Collaborative filtering (CF) is a widely used method in recommendation systems. Linear models are still the mainstream of collaborative filtering research, but non-linear probabilistic models go beyond the capacity of linear models. For example, variational autoencoders (VAEs) have been extensively used in CF and have achieved excellent results. However, the prior distribution over the latent codes of VAEs in traditional CF is too simple, which makes the latent representations of users and items too poor. This paper proposes a variational autoencoder that uses a Gaussian mixture model for the latent factor distribution in CF, called GVAE-CF. On this basis, an optimization function suitable for GVAE-CF is proposed. In our experimental evaluation, we show that the recommendation performance of GVAE-CF outperforms previously proposed VAE-based models on several popular benchmark datasets in terms of recall and normalized discounted cumulative gain (NDCG), thus proving the effectiveness of the algorithm.
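
The evaluation above relies on recall and NDCG; below is a small, self-contained NDCG@k implementation in NumPy with an illustrative relevance vector.

```python
# A minimal NDCG@k metric for ranked recommendations.
import numpy as np

def ndcg_at_k(ranked_relevance, k):
    """ranked_relevance: relevance of items in the predicted ranking order."""
    rel = np.asarray(ranked_relevance, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))        # 1/log2(rank+1)
    dcg = float(np.sum(rel * discounts))
    ideal = np.sort(np.asarray(ranked_relevance, dtype=float))[::-1][:k]
    idcg = float(np.sum(ideal * discounts[:ideal.size]))
    return dcg / idcg if idcg > 0 else 0.0

# A user with 3 held-out relevant items; the recommender ranked them 1st, 3rd, and 6th:
print(ndcg_at_k([1, 0, 1, 0, 0, 1, 0, 0, 0, 0], k=10))           # ~0.87
```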

Article
Evaluation of Deep Convolutional Generative Adversarial Networks for Data Augmentation of Chest X-ray Images
Future Internet 2021, 13(1), 8; https://doi.org/10.3390/fi13010008 - 31 Dec 2020
Abstract
Medical image datasets are usually imbalanced due to the high costs of obtaining the data and the time-consuming annotation. Training a deep neural network model on such datasets to accurately classify the medical condition does not yield the desired results, as the models often over-fit the majority class samples. Data augmentation is often performed on the training data to address this issue, using position augmentation techniques such as scaling, cropping, flipping, padding, rotation, translation, and affine transformation, and color augmentation techniques such as brightness, contrast, saturation, and hue, to increase the dataset size. Radiologists generally use chest X-rays for the diagnosis of pneumonia. Due to patient privacy concerns, access to such data is often protected. In this study, we performed data augmentation on the Chest X-ray dataset to generate artificial chest X-ray images of the under-represented class through generative modeling techniques such as the Deep Convolutional Generative Adversarial Network (DCGAN). With just 1341 chest X-ray images labeled as Normal, artificial samples that retain characteristics similar to the original data were created with this technique. Evaluating the model resulted in a Fréchet Inception Distance (FID) score of 1.289. We further show the superior performance of a CNN classifier trained on the DCGAN-augmented dataset.
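
A standard DCGAN-style generator for 64x64 single-channel (grayscale) images, as commonly used for this kind of augmentation; the latent dimension and layer widths are assumptions rather than the exact configuration evaluated in the paper.

```python
# Illustrative DCGAN generator producing 64x64 grayscale images from a latent vector.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, feat * 8, 4, 1, 0, bias=False),     # 1x1 -> 4x4
            nn.BatchNorm2d(feat * 8), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False),  # 8x8
            nn.BatchNorm2d(feat * 4), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),  # 16x16
            nn.BatchNorm2d(feat * 2), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),      # 32x32
            nn.BatchNorm2d(feat), nn.ReLU(True),
            nn.ConvTranspose2d(feat, 1, 4, 2, 1, bias=False),             # 64x64, 1 channel
            nn.Tanh(),                                                    # pixels in [-1, 1]
        )

    def forward(self, z):                  # z: (batch, z_dim, 1, 1)
        return self.net(z)

fake_xrays = Generator()(torch.randn(16, 100, 1, 1))                      # (16, 1, 64, 64)
```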

Article
Portfolio Learning Based on Deep Learning
Future Internet 2020, 12(11), 202; https://doi.org/10.3390/fi12110202 - 18 Nov 2020
Abstract
Traditional portfolio theory divides stocks into different categories using indicators such as industry, market value, and liquidity, and then selects representative stocks from them. In this paper, we propose a novel portfolio learning approach based on deep learning and apply it to China’s stock market. Specifically, the method is based on the similarity of deep features extracted from candlestick charts. First, we obtained complete stock information from Tushare, a professional financial data interface. The raw time series data are then plotted as candlestick charts to build an image dataset for studying the stock market. Next, the method extracts high-dimensional features from the candlestick charts through an autoencoder. After that, K-means is used to cluster these high-dimensional features. Finally, we choose one stock from each category according to the Sharpe ratio, and a low-risk, high-return portfolio is obtained. Extensive experiments are conducted on stocks in the Chinese stock market for evaluation. The results demonstrate that the proposed portfolio outperforms the market’s leading funds and the Shanghai Stock Exchange Composite Index (SSE Index) in a number of metrics.
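
A minimal sketch of the selection step described above: cluster stocks by their learned chart features with K-means and keep the stock with the highest Sharpe ratio in each cluster. The feature matrix and returns are random placeholders standing in for autoencoder features and real price data.

```python
# Illustrative cluster-then-select step: K-means on features, Sharpe ratio per cluster.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_stocks, n_days, n_clusters = 50, 252, 5
features = rng.normal(size=(n_stocks, 32))        # stand-in for autoencoder chart features
daily_returns = rng.normal(0.0005, 0.02, size=(n_stocks, n_days))

labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(features)
sharpe = daily_returns.mean(axis=1) / daily_returns.std(axis=1) * np.sqrt(252)   # annualized

portfolio = [int(np.arange(n_stocks)[labels == c][np.argmax(sharpe[labels == c])])
             for c in range(n_clusters)]
print("one stock per cluster:", portfolio)
```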

Article
Comparison of Machine Learning and Deep Learning Models for Network Intrusion Detection Systems
Future Internet 2020, 12(10), 167; https://doi.org/10.3390/fi12100167 - 30 Sep 2020
Abstract
The development of robust anomaly-based network intrusion detection systems, which are preferred over static signature-based ones, is vital for cybersecurity. A flexible and dynamic security system is required to tackle new attacks. Current intrusion detection systems (IDSs) struggle to attain both a high detection rate and a low false alarm rate. To address this issue, in this paper, we propose an IDS using different machine learning (ML) and deep learning (DL) models. The paper presents a comparative analysis of different ML and DL models on the Coburg intrusion detection datasets (CIDDS). First, we compare different ML- and DL-based models on the CIDDS dataset. Second, we propose an ensemble model that combines the best ML and DL models to achieve high performance metrics. Finally, we benchmark our best models on the CIC-IDS2017 dataset and compare them with state-of-the-art models. While popular IDS datasets like KDD99 and NSL-KDD fail to represent recent attacks and suffer from network biases, CIDDS, used in this research, encompasses labeled flow-based data in a simulated office environment with both updated attacks and normal usage. Furthermore, both accuracy and interpretability must be considered when implementing AI models. Both ML and DL models achieved an accuracy of 99% on the CIDDS dataset with a high detection rate, a low false alarm rate, and relatively low training costs. Feature importance was also studied using the classification and regression tree (CART) model. Our models performed well in 10-fold cross-validation and independent testing. CART and a convolutional neural network (CNN) with embedding achieved slightly better performance on the CIC-IDS2017 dataset than previous models. Together, these results suggest that ML and DL methods are robust and complementary techniques for an effective network intrusion detection system.
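
A short sketch of the interpretability and validation steps mentioned above: a CART (decision tree) classifier evaluated with 10-fold cross-validation and inspected via feature importances, run here on synthetic flow-like data rather than CIDDS itself.

```python
# Illustrative CART model with 10-fold cross-validation and feature importances.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, n_informative=5, random_state=0)

cart = DecisionTreeClassifier(max_depth=8, random_state=0)
scores = cross_val_score(cart, X, y, cv=10)                  # 10-fold cross-validation
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

cart.fit(X, y)
for i in np.argsort(cart.feature_importances_)[::-1][:5]:    # top-5 most important features
    print(f"feature_{i}: importance {cart.feature_importances_[i]:.3f}")
```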

Review


Review
Computer Vision for Fire Detection on UAVs—From Software to Hardware
Future Internet 2021, 13(8), 200; https://doi.org/10.3390/fi13080200 - 31 Jul 2021
Abstract
Fire hazard is a condition with potentially catastrophic consequences. Artificial intelligence, through computer vision, in combination with UAVs, has contributed dramatically to identifying this risk and avoiding it in a timely manner. This work is a literature review on UAVs that use computer vision to detect fire. The review covers the last decade and records the types of UAVs, the hardware and software used, and the proposed datasets. The literature search was conducted through the Scopus database. The review shows that multi-copters were the most common type of vehicle and that the combination of an RGB with a thermal camera was part of most applications. In addition, the trend toward the use of Convolutional Neural Networks (CNNs) is increasing. In the last decade, many applications and a wide variety of hardware and methods have been implemented and studied. Many efforts have been made to effectively avoid the risk of fire. The fact that state-of-the-art methodologies continue to be researched leads to the conclusion that the need for a more effective solution continues to arouse interest.

Planned Papers

The list below represents only planned manuscripts. Some of these manuscripts have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.

Title: A Deep Learning-based Approach for the Discrimination of Crop and Weed in Agricultural Robots

Abstract: A major threat to agricultural productivity is the weed that grows together with crops. It takes part of the nutrients and water in the soil and can inhibit the growth of crops. Herbicides have traditionally been used for weed control, but their chemical compounds often contaminate the crops, posing a significant health risk for humans. In this paper, we propose a deep learning-based approach for discriminating the weed from the crops. An embedded system has been deployed on a tractor and controls the land plow machinery to selectively discard the weed from the soil. The concept, implementation, and preliminary field results are presented.
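
Purely as a hypothetical illustration of the kind of classifier such a system might use, the sketch below fine-tunes a torchvision ResNet-18 head for a two-class crop-vs-weed problem; the backbone, input size, and training step are assumptions, not the authors' implementation.

```python
# Illustrative transfer-learning setup for a binary crop-vs-weed image classifier.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18()                              # pre-trained weights could be loaded here
model.fc = nn.Linear(model.fc.in_features, 2)          # two classes: crop, weed

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of 224x224 RGB patches.
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```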
