Special Issue "Data Stream Mining and Processing"

A special issue of Data (ISSN 2306-5729).

Deadline for manuscript submissions: closed (10 November 2018)

Special Issue Editors

Guest Editor
Prof. Dr.Sc. Dmytro Peleshko

IT Step University, Lviv, Ukraine
Interests: computer vision; artificial intelligence; machine learning; video data stream processing; neural networks; deep learning; IoT; pattern recognition; big data modelling
Guest Editor
Prof. Dr.Sc. Olena Vynokurova

IT Step University, Lviv, Ukraine
Interests: machine learning; computational intelligence; hybrid systems; wavelet neural networks; deep learning; prediction; clustering; classification; IoT; pattern recognition
Guest Editor
Associate Prof. CSc. Sergii Babichev

1. Department of Informatics, Jan Evangelista Purkyně University in Ústí nad Labem, Czech Republic;
2. IT Step University, Lviv, Ukraine
Interests: data mining of complex data; objective clustering; bioinformatics; gene expression profile processing; gene regulatory network reconstruction and simulation

Special Issue Information

Dear Colleagues,

This Special Issue of Data is dedicated mainly to selected papers from the 2018 IEEE International Conference of Data Stream Mining and Processing, held in Lviv, Ukraine, 21–25 August 2018. Expanded versions of papers presented at the conference will be invited for submission to this Special Issue. However, the Special Issue is not limited to conference materials: original papers on the topics listed below may also be published.

Topics include:

  • Hybrid Systems of Computational Intelligence

Information processing systems that combine different approaches of Computational Intelligence, for example, artificial neural networks trained by evolutionary algorithms, neuro-fuzzy systems, wavelet-neuro-fuzzy systems, neuro-neo-fuzzy systems, particle swarm algorithms, evolving systems, deep learning, etc.

  • Machine Vision and Pattern Recognition

Video streams fed from video cameras in online mode under conditions of environmental uncertainty and variability.

  • Dynamic Data Mining and Data Stream Mining

Data Mining problems (classification, clustering, prediction, identification, etc.) when information is fed in an online mode in the form of data streams.

  • Big Data and Data Science Using Intelligent Approaches

Systems of Computational Intelligence (artificial neural networks, fuzzy reasoning systems, evolutionary algorithms) in Big Data processing tasks (high-dimensional data) where data are stored in very large databases (VLDB) or fed in an unlimited data stream. Natural Language Processing: using machine learning to extract semantic objects from natural language; deep learning methods for natural language understanding.

Prof. Dr.Sc. Dmytro Peleshko
Prof. Dr.Sc. Olena Vynokurova
Associate Prof. CSc. Sergii Babichev
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Data is an international peer-reviewed open access quarterly journal published by MDPI, indexed in the Emerging Sources Citation Index (ESCI) - Web of Science and Inspec (IET).

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) is waived for well-prepared manuscripts submitted to this issue. Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Additional Information for Authors

Authors are obliged to expand their conference papers by adding 60% new research results, changing the title, and partly changing the abstract and conclusions. Moreover, the journal paper should include a reference to the corresponding paper in the conference proceedings.

Technical Program Committee

List of the reviewers

Aizenberg I., D.Sc., Prof. (New York, USA)

Antoshchuk S., D.Sc., Prof. (Odesa, Ukraine)

Bidyuk P., D.Sc., Prof. (Kyiv, Ukraine)

Bodyanskiy Ye., D.Sc., Prof. (Kharkiv, Ukraine)

Boyun V., D.Sc., Prof. (Kyiv, Ukraine)

Churyumov G., D.Sc., Prof., IEEE Senior Member (Kharkiv, Ukraine)

Dyvak M., D.Sc., Prof. (Ternopil, Ukraine)

Gozhiy O., D.Sc., Assoc. Prof. (Mykolaiv, Ukraine)

Hnatushenko V., D.Sc., Prof., IEEE Senior Member (Dnipro, Ukraine)

Kharchenko V., D.Sc., Prof. (Kharkiv, Ukraine)

Lytvynenko V., D.Sc., Prof. (Kherson, Ukraine)

Lyubchik L., D.Sc., Prof., IEEE Member (Kharkiv, Ukraine)

Mashkov V., D.Sc., Assoc. Prof. (Ústí nad Labem, Czech Republic)

Mashtalir V., D.Sc., Prof. (Kharkiv, Ukraine)

Petlenkov E., Ph.D., Prof. (Tallinn, Estonia)

Rekik A., Ph.D. (Sfax, Tunisia)

Romanyshyn Yu., D.Sc., Prof. (Lviv, Ukraine)

Sachenko A., D.Sc., Prof. (Ternopil, Ukraine)

Setlak G., D.Sc., Prof. (Rzeszów, Poland)

Shelevytsky I., D.Sc., Prof. (Kryvyi Rih, Ukraine)

Sokolovsky Ya., D.Sc., Prof. (Lviv, Ukraine)

Stepashko V., D.Sc., Prof. (Kyiv, Ukraine)

Štěpnička M., Ph.D., Assoc. Prof. (Ostrava, Czech Republic)

Vassiljeva K., Ph.D., Assoc. Prof. (Tallinn, Estonia)

Wójcik W., Dr. hab. inż. (Lublin, Poland)

Kulishova N., Ph.D., Assoc. Prof. (Kharkiv, Ukraine)

Volkova V., Ph.D., Assoc. Prof. (Kyiv, Ukraine)

Yatsymirskyy M., D.Sc., Prof. (Łódź, Poland)

Alekseyev V., Ph.D., Assoc. Prof. (Lviv, Ukraine)

Dumin O., Ph.D., Assoc. Prof., IEEE Ukraine Section (Kharkiv) (Kharkiv, Ukraine)

Panchenko T., Ph.D., Assoc. Prof., Member of the Board of Directors at ACM Ukrainian Chapter (Kyiv, Ukraine)

Andrew Smith, Ph.D. (Dublin, Ireland)

Bohdan Pavlyshenko, Ph.D., Assoc. Prof. (Lviv, Ukraine)

Mike Hinchey, Ph.D., President, International Federation for Information Processing (IFIP); Professor of Software Engineering, University of Limerick; Emeritus Director, Lero, the Irish Software Research Centre; Chair, IEEE UK & Ireland section (Limerick, Ireland)

Minho Jo, Ph.D., Chairman of IoT and Cognitive Networks Lab and Professor of Department of Computer Convergence Software at Korea University (Sejong Metro, South Korea)

Keywords

  • Big Data
  • Artificial Intelligence
  • Data Mining
  • Data Science
  • Deep learning
  • Machine Vision
  • Pattern Recognition
  • Computational Intelligence
  • Hybrid Systems

Published Papers (16 papers)

Research

Open Access Article
Diagnosis of Intermittently Faulty Units at System Level
Received: 3 March 2019 / Revised: 16 March 2019 / Accepted: 17 March 2019 / Published: 22 March 2019
Abstract
Diagnosis at a system level mostly aims to identify only permanently faulty units. In this paper, we consider the case in which both permanently and intermittently faulty units can occur in the system. Identification of intermittently faulty units has some specific features, which we consider in this paper. We also suggest a method for distinguishing among different types of intermittent faults. A diagnosis procedure is suggested for each type of intermittent fault.
(This article belongs to the Special Issue Data Stream Mining and Processing)

Open Access Article
Machine-Learning Models for Sales Time Series Forecasting
Received: 3 November 2018 / Revised: 9 January 2019 / Accepted: 14 January 2019 / Published: 18 January 2019
Abstract
In this paper, we study the usage of machine-learning models for sales predictive analytics. The main goal of this paper is to consider the main approaches and case studies of using machine learning for sales forecasting. The effect of machine-learning generalization has been considered. This effect can be used to make sales predictions when there is a small amount of historical data for specific sales time series, as in the case when a new product or store is launched. A stacking approach for building a regression ensemble of single models has been studied. The results show that using stacking techniques, we can improve the performance of predictive models for sales time series forecasting.

Open Access Article
Machine Learning in Classification Time Series with Fractal Properties
Received: 13 November 2018 / Revised: 18 December 2018 / Accepted: 23 December 2018 / Published: 28 December 2018
Abstract
The article presents a novel method of fractal time series classification by meta-algorithms based on decision trees. The classification objects are fractal time series. For modeling, binomial stochastic cascade processes are chosen. Each class that was singled out unites model time series with the same fractal properties. Numerical experiments demonstrate that the best results are obtained by the random forest method with regression trees. A comparative analysis of the classification approaches based on the random forest method and traditional estimation of the degree of self-similarity is performed. The results show the advantage of machine learning methods over traditional time series evaluation. The results were used for detecting distributed denial-of-service (DDoS) attacks and demonstrated a high probability of detection.

Open Access Article
The Model and Training Algorithm of Compact Drone Autonomous Visual Navigation System
Received: 4 November 2018 / Revised: 20 December 2018 / Accepted: 22 December 2018 / Published: 28 December 2018
Abstract
Trainable visual navigation systems based on deep learning demonstrate potential for robustness to onboard camera parameters and challenging environments. However, a deep model requires substantial computational resources and large labelled training sets for successful training. Implementing autonomous navigation and training-based fast adaptation to a new environment for a compact drone is a complicated task. The article describes an original model and training algorithms adapted to a limited volume of labelled training data and constrained computational resources. The model consists of a convolutional neural network for visual feature extraction, an extreme-learning machine for estimating the position displacement, and a boosted information-extreme classifier for obstacle prediction. Unsupervised training of the convolution filters with a growing sparse-coding neural gas algorithm is proposed, together with supervised learning algorithms for constructing the decision rules, with a simulated annealing search algorithm used for fine-tuning. The use of a complex criterion for parameter optimization of the feature extractor model is considered. The resulting approach performs better trajectory reconstruction than the well-known ORB-SLAM. In particular, for sequence 7 from the KITTI dataset, the translation error is reduced by nearly 65.6% at a frame rate of 10 frames per second. In addition, testing on the independent TUM sequence shot outdoors produces a translation error not exceeding 6% and a rotation error not exceeding 3.68 degrees per 100 m. Testing was carried out on the Raspberry Pi 3+ single-board computer.

Open Access Article
Continuous Genetic Algorithms as Intelligent Assistance for Resource Distribution in Logistic Systems
Received: 14 November 2018 / Revised: 10 December 2018 / Accepted: 12 December 2018 / Published: 16 December 2018
Cited by 1
Abstract
This paper addresses the problem of resource distribution control in logistic systems influenced by uncertain demand. The considered class of logistic topologies comprises two types of actors (controlled nodes and external sources) interconnected without any structural restrictions. In this paper, the application of continuous-domain genetic algorithms (GAs) is proposed in order to support the optimization of resource reflow in the network channels. GAs allow one to perform simulation-based optimization and provide desirable operating conditions in the face of a priori unknown, time-varying demand. The effectiveness of the inventory management process governed by an order-up-to policy involves two different objectives: holding costs and service level. Using the network analytical model with the inventory management policy implemented in a centralized way, GAs search a space of candidate solutions to find optimal policy parameters for a given topology. Numerical experiments confirm the analytical assumptions.

Open Access Article
Similar Text Fragments Extraction for Identifying Common Wikipedia Communities
Received: 4 November 2018 / Revised: 9 December 2018 / Accepted: 10 December 2018 / Published: 13 December 2018
Abstract
Extraction of similar text fragments from weakly formalized data is a task of natural language processing and intelligent data analysis and is used for solving the problem of automatic identification of connected knowledge fields. To search for such common communities in Wikipedia, we propose to use, as an additional stage, a logical-algebraic model for extracting similar collocations. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words. With WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of high-frequency synonymous collocations can give an indication of key common up-to-date Wikipedia communities.

Open Access Article
The Extended Multidimensional Neo-Fuzzy System and Its Fast Learning in Pattern Recognition Tasks
Received: 16 October 2018 / Revised: 7 December 2018 / Accepted: 7 December 2018 / Published: 9 December 2018
Abstract
Methods of machine learning and data mining are becoming the cornerstone of information technologies, with real-time image and video recognition methods getting more and more attention. While computational system architectures are getting larger and more complex, their learning methods call for changes, as training datasets often reach tens and hundreds of thousands of samples, thereby increasing the learning time of such systems. It is possible to reduce computational costs by tuning the system structure to allow fast, high-accuracy learning algorithms to be applied. This paper proposes a system based on extended multidimensional neo-fuzzy units and its learning algorithm designed for data stream processing tasks. The proposed learning algorithm, based on the information entropy criterion, has significantly improved the system's approximating capabilities. Experiments have confirmed the efficiency of the proposed system in solving real-time video stream recognition tasks.

Open Access Article
A Novel Neuro-Fuzzy Model for Multivariate Time-Series Prediction
Received: 6 November 2018 / Revised: 6 December 2018 / Accepted: 6 December 2018 / Published: 8 December 2018
Abstract
Time series forecasting can be a complicated problem when the underlying process shows a high degree of complex nonlinear behavior. In some domains, such as financial data, processing related time series jointly can have significant benefits. This paper proposes a novel multivariate hybrid neuro-fuzzy model for forecasting tasks, which is based on and generalizes the neuro-fuzzy model with consequent-layer multi-variable Gaussian units and its learning algorithm. The model is distinguished by a separate consequent block for each output, which is tuned with respect to its own output error only, but benefits from extracting additional information by processing the whole input vector, including lag values of other variables. Numerical experiments show better accuracy and computational performance than competing models and separate neuro-fuzzy models for each output, and thus an ability to implicitly handle complex cross-correlation dependencies between variables.

Open Access Article
Using Recurrent Procedures in Adaptive Control System for Identify the Model Parameters of the Moving Vessel on the Cross Slipway
Received: 4 November 2018 / Revised: 27 November 2018 / Accepted: 4 December 2018 / Published: 7 December 2018
Abstract
The article analyses the problems of ensuring the coordinated operation of slipway drives that arise during the launch of a ship. A dynamic model of the load on the electric drive of the ship's cart is obtained, taking into account the peculiarities of the construction of the ship-lifting complex, which allows us to analyse the influence of external factors and random influences during the entire process of launching the ship. A linearized state-space model of the dynamics of complex vessel movement during descent is developed, which allows us to identify the mode of operation of the multi-drive system, taking into account its structure. The efficiency of recurrent identification methods (stochastic approximation and least squares) for the parameters of the linearized state-space model is analysed. A decision support system has been developed in the automated system of operational control by the module for estimating the situation and the control synthesis to ensure a coherent motion of a complex ship-carts object in a two-phase environment.

Open Access Article
Real-Time Fuzzy Data Processing Based on a Computational Library of Analytic Models
Received: 9 November 2018 / Revised: 29 November 2018 / Accepted: 30 November 2018 / Published: 4 December 2018
Abstract
This work focuses on fuzzy data processing in control and decision-making systems based on the transformation of real-time series and high-frequency data to fuzzy sets, with further implementation of diverse fuzzy arithmetic operations. Special attention was paid to the synthesis of a computational library of horizontal and vertical analytic models for fuzzy sets as the results of fuzzy arithmetic operations. Using the developed computational library makes it possible to increase the operating speed and accuracy of fuzzy data processing in real time. A computational library was formed for computing fuzzy arithmetic operations such as fuzzy maximum. Triangular fuzzy numbers were chosen as the fuzzy sets that serve as components of fuzzy data processing. The analytic models were developed based on the analysis of the intersection points between the left and right branches of the considered triangular fuzzy numbers with different relations between their parameters. Our study introduces a mask for evaluating the relations between corresponding parameters of fuzzy numbers, which allows the appropriate model to be selected from the computational library automatically. The simulation results confirm the efficiency of the proposed computational library for different applications.

Open Access Article
Multi-Agent Big-Data Lambda Architecture Model for E-Commerce Analytics
Received: 5 November 2018 / Revised: 27 November 2018 / Accepted: 27 November 2018 / Published: 1 December 2018
Abstract
We study a big-data hybrid-data-processing lambda architecture, which consolidates low-latency real-time frameworks with high-throughput Hadoop-batch frameworks over a massively distributed setup. In particular, real-time and batch-processing engines act as autonomous multi-agent systems in collaboration. We propose a Multi-Agent Lambda Architecture (MALA) for e-commerce data analytics. We address the high-latency problem of Hadoop MapReduce jobs by simultaneously processing, at the speed layer, the requests that require a quick turnaround time. At the same time, the batch layer in parallel provides comprehensive coverage of data by intelligent blending of stream and historical data through the weighted voting method. The cold-start problem of streaming services is addressed through the initial offset from historical batch data. Challenges of high-velocity data ingestion are resolved with distributed message queues. A proposed multi-agent decision-maker component is placed in the MALA stack as the gateway of the data pipeline. We demonstrate the efficiency of our batch model by implementing an array of features for an e-commerce site. The novelty of the model and its key significance is a scheme for multi-agent interaction between batch and real-time agents to produce deeper insights at low latency and at significantly lower costs. Hence, the proposed system is highly appealing for applications involving big data and caters to high-velocity streaming ingestion and a massive data pool.

Open Access Article
Television Rating Control in the Multichannel Environment Using Trend Fuzzy Knowledge Bases and Monitoring Results
Received: 3 November 2018 / Revised: 23 November 2018 / Accepted: 28 November 2018 / Published: 1 December 2018
Abstract
The purpose of this study is to control the ratio of programs of different genres when forming the broadcast grid in order to increase and maintain the rating of a channel. In the multichannel environment, television rating control consists of selecting content whose ratings are completely restored after advertising. The hybrid approach to rule set refinement based on fuzzy relational calculus simplifies the construction of expert recommendation systems. By analogy with the problem of inverted pendulum control, the managerial actions aim to retain the balance between fuzzy demand and supply. The increase or decrease trends of demand and supply are described by primary fuzzy relations. The rule-based solutions of fuzzy relational equations connect significance measures of the primary fuzzy terms. Program set refinement by solving fuzzy relational equations avoids procedures of content-based selective filtering. The solution set generation corresponds to the granulation of television time, where each solution represents the time slot and the granulated rating of the content. In automated media planning, generating the weekly TV program in the form of a granular solution reduces the time needed for programming the channel broadcast grid.

Open Access Article
Analysis of Application of Cluster Descriptions in Space of Characteristic Image Features
Received: 2 October 2018 / Revised: 10 November 2018 / Accepted: 12 November 2018 / Published: 14 November 2018
Abstract
In this paper, we investigate the properties of structural image recognition methods in the cluster space of characteristic features. Recognition based on key point descriptors such as SIFT (Scale-invariant Feature Transform), SURF (Speeded Up Robust Features), ORB (Oriented FAST and Rotated BRIEF), etc., often relies on searching for corresponding descriptor values between an input image and all etalon images, which requires many operations and much time. Recognition using previously quantized (clustered) sets of descriptor features is described. Clustering is performed across the complete set of etalon image descriptors and is followed by screening, which allows each etalon image to be represented in vector form as a distribution of clusters. With such representations, the number of computation and comparison procedures, which are the core of the recognition process, might be reduced tens of times. In turn, the preprocessing stage takes additional time for clustering. The implementation of the proposed approach was tested on the Leeds Butterfly dataset. The dependence of recognition performance and processing time on the number of clusters was investigated. It was shown that recognition may be performed up to nine times faster, with only a moderate decrease in recognition quality, compared to searching for correspondences between all existing descriptors in the etalon images and the input image without quantization.
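As an illustration of the idea in the abstract above (representing an image as a distribution of its descriptors over clusters), here is a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional descriptors, the squared Euclidean distance, and the L1 comparison of histograms are all assumptions made for this example, and the cluster centroids are assumed to have been computed beforehand by some clustering step.

```python
def nearest_cluster(descriptor, centroids):
    """Return the index of the centroid closest to the descriptor.

    Assumption: squared Euclidean distance on real-valued descriptors.
    """
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(descriptor, centroid))

    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def cluster_histogram(descriptors, centroids):
    """Represent an image as the normalized distribution of its descriptors over clusters."""
    counts = [0] * len(centroids)
    for descriptor in descriptors:
        counts[nearest_cluster(descriptor, centroids)] += 1
    total = len(descriptors)
    return [c / total for c in counts] if total else counts


def histogram_distance(h1, h2):
    """L1 distance between two cluster distributions (an assumed similarity measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Hypothetical toy data: two centroids and four descriptors of one image.
centroids = [(0.0, 0.0), (10.0, 10.0)]
image_descriptors = [(1.0, 1.0), (0.0, 2.0), (9.0, 9.0), (11.0, 10.0)]
image_vector = cluster_histogram(image_descriptors, centroids)
```

Comparing two images then takes one pass over vectors whose length equals the number of clusters, instead of matching every descriptor of one image against every descriptor of the other.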

Open AccessArticle
An Evaluation of the Information Technology of Gene Expression Profiles Processing Stability for Different Levels of Noise Components
Received: 10 September 2018 / Revised: 29 October 2018 / Accepted: 1 November 2018 / Published: 5 November 2018
PDF Full-text (15439 KB) | HTML Full-text | XML Full-text
Abstract
This paper presents the results of research concerning the evaluation of stability of information technology of gene expression profiles processing with the use of gene expression profiles, which contain different levels of noise components. The information technology is presented as a structural block-chart, [...] Read more.
This paper presents an evaluation of the stability of an information technology for processing gene expression profiles, using profiles that contain different levels of noise. The information technology is presented as a structural block chart that contains all stages of the studied data processing. The hybrid model of objective clustering based on the SOTA algorithm and the technology of gene regulatory network reconstruction were investigated to evaluate their stability with respect to the noise level. The simulation results showed that the hybrid model of objective clustering is highly stable to noise, whereas the technology of gene regulatory network reconstruction is rather sensitive to the noise level. The results indicate the importance of preprocessing gene expression profiles at the early stage of gene regulatory network reconstruction in order to remove background noise and genes that are non-informative in terms of the criteria used.
(This article belongs to the Special Issue Data Stream Mining and Processing)
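One simple way to picture a noise-stability check of the kind described above is to perturb the profiles with noise of increasing strength and measure how many cluster assignments survive. The sketch below is only an illustration under stated assumptions: it uses nearest-centroid assignment rather than the SOTA-based objective clustering from the paper, additive Gaussian noise, and hypothetical toy profiles and function names.

```python
import random
from math import dist

def assign(profiles, centroids):
    """Label each profile with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in profiles
    ]

def add_noise(profiles, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to every value."""
    return [[v + rng.gauss(0.0, sigma) for v in p] for p in profiles]

def stability(profiles, centroids, sigma, seed=0):
    """Fraction of profiles whose cluster label is unchanged after adding noise."""
    rng = random.Random(seed)
    clean = assign(profiles, centroids)
    noisy = assign(add_noise(profiles, sigma, rng), centroids)
    return sum(c == n for c, n in zip(clean, noisy)) / len(profiles)

# Hypothetical toy "expression profiles" forming two well-separated groups.
profiles = [[0.0, 0.1, 0.0], [0.2, 0.0, 0.1], [5.0, 5.1, 4.9], [4.8, 5.0, 5.2]]
centroids = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]

for sigma in (0.0, 0.5, 2.0, 5.0):
    print(sigma, stability(profiles, centroids, sigma))
```

A method that is stable to noise would keep this fraction close to 1 as sigma grows; a sensitive one would drop quickly.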

Open Access Article
Development of the Non-Iterative Supervised Learning Predictor Based on the Ito Decomposition and SGTM Neural-Like Structure for Managing Medical Insurance Costs
Received: 23 September 2018 / Revised: 24 October 2018 / Accepted: 29 October 2018 / Published: 31 October 2018
Abstract
The paper describes a new non-iterative linear supervised learning predictor. It is based on Ito decomposition and the neural-like structure of the successive geometric transformations model (SGTM). Ito decomposition (the Kolmogorov–Gabor polynomial) is used to extend the inputs of the SGTM neural-like structure, which provides high approximation properties for solving various tasks. The coefficients of this polynomial are found using the fast, non-iterative training algorithm of the SGTM linear neural-like structure. The developed method provides high speed and improved generalization. Simulation of the developed method on the medical insurance cost prediction task showed a significant increase in accuracy compared with existing methods (the common SGTM neural-like structure, multilayer perceptron, support vector machine, adaptive boosting, and linear regression). The developed method can therefore be used to process large amounts of data from a variety of fields (medicine, materials science, economics, etc.) to improve the accuracy and speed of processing.
(This article belongs to the Special Issue Data Stream Mining and Processing)
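The input extension mentioned in the abstract above can be sketched as follows: each input vector is expanded with all products of its components up to a chosen degree, so that a model which is linear in the expanded features is polynomial in the original inputs. This is only an illustrative sketch. The function names and the second-degree default are assumptions, and the SGTM training algorithm itself is not shown; the coefficients in the usage lines are made-up values, not fitted ones.

```python
from itertools import combinations_with_replacement
from math import prod

def kolmogorov_gabor_features(x, degree=2):
    """Expand x into all monomials x_i, x_i*x_j, ... up to the given degree."""
    features = []
    for d in range(1, degree + 1):
        # Each index tuple (i <= j <= ...) yields one monomial, so no term repeats.
        for idx in combinations_with_replacement(range(len(x)), d):
            features.append(prod(x[i] for i in idx))
    return features

def predict(bias, coeffs, x, degree=2):
    """Evaluate a0 + sum(a_k * monomial_k(x)) for already known coefficients."""
    feats = kolmogorov_gabor_features(x, degree)
    return bias + sum(a * f for a, f in zip(coeffs, feats))

# Two inputs expand to x1, x2, x1^2, x1*x2, x2^2.
print(kolmogorov_gabor_features([2.0, 3.0]))

# Hypothetical coefficients, chosen only to show the evaluation step.
print(predict(1.0, [1.0, 0.0, 0.0, 0.0, 0.5], [2.0, 3.0]))
```

For n inputs and degree 2 the expansion has n + n(n + 1)/2 terms, so the feature count grows quickly with the number of inputs and the degree.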

Open Access Article
Short-Term Forecasting of Electricity Supply and Demand by Using the Wavelet-PSO-NNs-SO Technique for Searching in Big Data of Iran’s Electricity Market
Received: 4 October 2018 / Revised: 18 October 2018 / Accepted: 21 October 2018 / Published: 23 October 2018
Cited by 1
Abstract
The databases of Iran’s electricity market store large volumes of data. Retail buyers and retailers will operate in Iran’s electricity market in the foreseeable future, once smart grids are fully implemented across Iran. As a result, far more electricity market data will be generated in the future than ever before. If methods are devised to search such large volumes of stored data quickly, it will be possible to improve the forecasting accuracy of important variables in Iran’s electricity market. In this paper, available methods were employed to develop a new technique, Wavelet-Neural Networks-Particle Swarm Optimization-Simulation-Optimization (WT-NNPSO-SO), with the purpose of searching the Big Data stored in the electricity market and improving the accuracy of short-term forecasting of electricity supply and demand. The electricity market data exploration approach was based on simulation-optimization algorithms. It was combined with the Wavelet-Neural Networks-Particle Swarm Optimization (Wavelet-NNPSO) method to improve the forecasting accuracy under the assumption that the Length of Training Data (LOTD) is increased. In comparison with previous techniques, the runtime of the proposed technique improved for larger data sizes due to the use of metaheuristic algorithms. The findings are discussed in the Results section.
(This article belongs to the Special Issue Data Stream Mining and Processing)
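To make the wavelet stage of such a pipeline more concrete, the sketch below splits a short load series into a smooth approximation and a detail component, and shows that the two parts reconstruct the original series. The abstract does not state which wavelet the authors use, so the single-level Haar transform here is an assumption, as are the function names and the toy load values; the neural network, particle swarm, and simulation-optimization stages are not shown.

```python
from math import sqrt

def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = sqrt(2)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / s for a, b in pairs]   # scaled pairwise averages (trend)
    detail = [(a - b) / s for a, b in pairs]   # scaled pairwise differences (fluctuation)
    return approx, detail

def haar_inverse(approx, detail):
    """Rebuild the original series from one level of Haar coefficients."""
    s = sqrt(2)
    signal = []
    for a, d in zip(approx, detail):
        signal += [(a + d) / s, (a - d) / s]
    return signal

# Hypothetical hourly load values, for illustration only.
load = [4.0, 6.0, 10.0, 12.0]
approx, detail = haar_step(load)
print(approx, detail)
print(haar_inverse(approx, detail))
```

In a forecasting setting, each component would typically be modeled separately and the forecasts recombined with the inverse transform.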

Data, EISSN 2306-5729, published by MDPI AG, Basel, Switzerland