Recent Advances of Big Data Technology

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Applications".

Deadline for manuscript submissions: closed (31 March 2016)

Special Issue Editors


Guest Editor
School of Information Technology, Deakin University, Australia
Interests: big data applications; big data security; big data processing

Guest Editor
School of Computer Science and Engineering, University of Electronic Science and Technology of China, China
Interests: information security; cloud computing; big data algorithms

Special Issue Information

Dear Colleagues,

“Big Data” has become an important topic in science, engineering, medicine, healthcare, finance, business, and, ultimately, society itself. Big Data refers to the massive amount of digital information stored or transmitted in computer systems. With the rapid growth of big data applications, it has become critical to disseminate recent research advances that meet the needs of these applications. The objective of this Special Issue is to capture the latest advances in this research field. Topics of interest include, but are not limited to, the following:

  • Big data processing (Analytics, Querying, Mining)
  • Big data storage and management
  • Technology and application of big data
  • Intelligent and unconventional methods for big data
  • High performance computing for big data
  • Novel hardware and software architectures for big data
  • Success case analysis of big data
  • Big data in business performance management
  • Big data as a service
  • Big data in enterprise models and practices
  • Big data security

Associate Professor Yong Yu
Dr. Yu Wang

Guest Editors

Published Papers (5 papers)


Research

Article
Efficient Dynamic Integrity Verification for Big Data Supporting Users Revocability
by Xinpeng Zhang, Chunxiang Xu, Xiaojun Zhang, Taizong Gu, Zhi Geng and Guoping Liu
Information 2016, 7(2), 31; https://doi.org/10.3390/info7020031 - 27 May 2016
Cited by 2 | Viewed by 4258
Abstract
With the advent of the big data era, cloud data storage and retrieval have become popular for efficient data management in large companies and organizations, which can thereby enjoy on-demand, high-quality cloud storage services. Meanwhile, for security reasons, these companies and organizations would like to verify the integrity of their data once it is stored in the cloud. To do so, they need a cloud storage auditing scheme that matches their actual demands. Current research often focuses on the situation where the data manager owns the data; in reality, however, the data belongs to the company rather than to the data managers, a situation that has been overlooked. For example, after a period, the current data manager may no longer be suitable to manage the data stored in the cloud and will be replaced by another one. The successor then needs to verify the integrity of the previously managed data, a problem that is inevitable in practice. In this paper, we fill this gap by presenting a practical, efficient, revocable, privacy-preserving public auditing scheme for cloud storage that meets the auditing requirements of data transfer in large companies and organizations. The scheme is conceptually simple and is proven to be secure even when the cloud service provider conspires with revoked users.
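To make the auditing problem concrete, here is a deliberately simplified spot-check sketch in Python. It is not the paper's scheme: it uses a secret-key HMAC tag per block (so only key holders can verify, unlike public auditing), offers no privacy protection, and does not handle revocation. All names (`outsource`, `challenge`, `respond`, `verify`, `BLOCK_SIZE`) and the sample data are hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 4  # bytes per block; deliberately tiny so the demo stays readable


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Split the file into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def tag_block(key: bytes, index: int, block: bytes) -> bytes:
    """MAC over (index, block) so a block cannot be moved to another position."""
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def outsource(key: bytes, data: bytes) -> list:
    """Data owner: compute one tag per block; blocks and tags are stored in the cloud."""
    return [(block, tag_block(key, i, block)) for i, block in enumerate(split_blocks(data))]


def challenge(num_blocks: int, sample_size: int) -> list:
    """Auditor: choose random block indices to spot-check."""
    return secrets.SystemRandom().sample(range(num_blocks), sample_size)


def respond(stored: list, indices: list) -> list:
    """Cloud: return each challenged block together with its stored tag."""
    return [(i, stored[i][0], stored[i][1]) for i in indices]


def verify(key: bytes, proof: list) -> bool:
    """Auditor: recompute every tag and compare in constant time."""
    return all(hmac.compare_digest(tag_block(key, i, block), tag) for i, block, tag in proof)


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # hypothetical key held by the company
    stored = outsource(key, b"quarterly sales ledger")
    picked = challenge(len(stored), 3)
    print(verify(key, respond(stored, picked)))  # True: data untouched
    stored[picked[0]] = (b"XXXX", stored[picked[0]][1])  # cloud corrupts one block
    print(verify(key, respond(stored, picked)))  # False: tag no longer matches
```

The toy also shows why revocation is hard: a former manager who still knows `key` could collude with the cloud to forge tags for altered blocks, which is the kind of collusion the paper's scheme is stated to withstand.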
(This article belongs to the Special Issue Recent Advances of Big Data Technology)

Article
Feature Engineering for Recognizing Adverse Drug Reactions from Twitter Posts
by Hong-Jie Dai, Musa Touray, Jitendra Jonnagaddala and Shabbir Syed-Abdul
Information 2016, 7(2), 27; https://doi.org/10.3390/info7020027 - 25 May 2016
Cited by 19 | Viewed by 8252
Abstract
Social media platforms are emerging digital communication channels that provide an easy way for ordinary people to share their health and medication experiences online. With more people discussing their health information publicly online, social media platforms present a rich source of information for exploring adverse drug reactions (ADRs). ADRs are major public health problems that cause deaths and hospitalizations of millions of people. Unfortunately, not all ADRs are identified before a drug is made available on the market. In this study, an ADR event monitoring system is developed that can recognize ADR mentions in a tweet and classify their assertion. We explored several entity recognition features, feature conjunctions, and feature selection, and analyzed their characteristics and impacts on the recognition of ADRs, which have not been studied previously. The results demonstrate that entity recognition of ADRs can achieve an F-score of 0.562 on the PSB Social Media Mining shared task dataset, outperforming the partial-matching-based method by 0.122. After feature selection, the F-score can be further improved by 0.026. This novel text mining technique, applied to shared online social media data, will open an array of opportunities for researchers to explore various health-related issues.
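The abstract reports its results as F-scores. Assuming the standard balanced F-score (F1), the harmonic mean of precision $P$ and recall $R$, and reading "outperforms by 0.122" and "improved by 0.026" as absolute differences, the reported figures imply the following values:

```latex
\begin{aligned}
F_1 &= \frac{2PR}{P + R} \\
F_1^{\text{partial-matching baseline}} &= 0.562 - 0.122 = 0.440 \\
F_1^{\text{after feature selection}} &= 0.562 + 0.026 = 0.588
\end{aligned}
```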

Article
A Big Network Traffic Data Fusion Approach Based on Fisher and Deep Auto-Encoder
by Xiaoling Tao, Deyan Kong, Yi Wei and Yong Wang
Information 2016, 7(2), 20; https://doi.org/10.3390/info7020020 - 23 Mar 2016
Cited by 28 | Viewed by 5836
Abstract
Data fusion is usually performed prior to classification in order to reduce the input space. Such dimensionality reduction techniques help to reduce the complexity of the classification model and thus improve classification performance. Traditional supervised methods demand labeled samples, but most current network traffic data is unlabeled. Therefore, better learners can be built by using both labeled and unlabeled data than by using either alone. In this paper, a novel network traffic data fusion approach based on Fisher and deep auto-encoder (DFA-F-DAE) is proposed to reduce the data dimensions and the complexity of computation. The experimental results show that DFA-F-DAE improves the generalization ability of three classification algorithms (J48, back propagation neural network (BPNN), and support vector machine (SVM)) through data dimensionality reduction. We found that DFA-F-DAE remarkably improves the efficiency of big network traffic classification.
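The abstract does not spell out how the Fisher criterion enters DFA-F-DAE. As a hedged illustration of one common reading, the Python sketch below computes the classic Fisher score of each feature (between-class scatter divided by within-class scatter) and keeps the top-ranked features. The deep auto-encoder stage is omitted entirely, and the function names, feature meanings, and data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_scores(samples, labels):
    """Return one Fisher score per feature column (higher = more discriminative).

    score_j = sum_c n_c * (mean_cj - mean_j)^2 / sum_c n_c * var_cj
    """
    by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        by_class[label].append(row)

    scores = []
    for j in range(len(samples[0])):
        overall_mean = fmean(row[j] for row in samples)
        between = 0.0  # weighted spread of class means around the overall mean
        within = 0.0   # weighted spread of values inside each class
        for rows in by_class.values():
            values = [r[j] for r in rows]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating (or constant) feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(samples, labels, k):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(samples, labels)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical flows: [mean packet size (KB), flow duration (s)], two traffic classes.
    flows = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]
    classes = ["web", "web", "p2p", "p2p"]
    print(fisher_scores(flows, classes))  # feature 0 scores far higher than feature 1
    print(top_k_features(flows, classes, 1))  # [0]
```

Ranking features this way is cheap and needs labels only for scoring; a reduced feature set could then be fed to a later stage such as an auto-encoder, though how the paper combines the two is not stated in the abstract.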

Article
A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark
by Yong Wang, Wenlong Ke and Xiaoling Tao
Information 2016, 7(1), 6; https://doi.org/10.3390/info7010006 - 15 Feb 2016
Cited by 19 | Viewed by 6307
Abstract
Currently, with the rapidly increasing scale of data in network traffic classification, how to select traffic features efficiently is becoming a major challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, their execution time is still unsatisfactory because of the numerous iterative computations during processing. To address this issue, this paper proposes an efficient feature selection method for network traffic based on Spark, a newer parallel computing framework. In our approach, the complete feature set is first preprocessed based on Fisher score, and a sequential forward search strategy is employed for subsets. The optimal feature subset is then selected using the continuous iterations of the Spark computing framework. The implementation demonstrates that, while maintaining classification accuracy, our method reduces the time cost of modeling and classification and significantly improves the execution efficiency of feature selection.
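The search step described in this abstract (Fisher-score preprocessing, then sequential forward search over feature subsets) can be sketched on a single machine as follows. This is a minimal sketch under stated assumptions, not the authors' Spark implementation: `evaluate` is a hypothetical stand-in for training and scoring a classifier on a feature subset, the candidate list is assumed to be already pre-filtered by Fisher score, and the stop-when-no-improvement rule is an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def sequential_forward_search(
    candidates: Sequence[int],
    evaluate: Callable[[List[int]], float],
    max_features: Optional[int] = None,
) -> Tuple[List[int], float]:
    """Greedy SFS: start empty, repeatedly add the feature that most improves evaluate()."""
    selected: List[int] = []
    best_score = float("-inf")
    remaining = list(candidates)
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset. These trials are
        # independent of each other, so this is the natural step to distribute.
        trials = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(trials)
        if score <= best_score:
            break  # no extension improves the subset any more: stop
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


if __name__ == "__main__":
    # Hypothetical stand-in for "train a classifier on this subset, return accuracy".
    gain = {3: 0.20, 7: 0.15, 1: 0.05}  # features 3, 7, 1 help; any other feature hurts slightly

    def toy_accuracy(subset: List[int]) -> float:
        return 0.5 + sum(gain.get(f, -0.01) for f in subset)

    # Candidates assumed to be the top of a Fisher-score ranking.
    print(sequential_forward_search([1, 3, 5, 7], toy_accuracy))  # selects [3, 7, 1]
```

Each round calls `evaluate` once per remaining candidate, so the cost is dominated by classifier training; in a Spark version those per-round trials are the part one would expect to parallelize, although the abstract does not say exactly how the authors partition the work.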

Article
Effects of Semantic Features on Machine Learning-Based Drug Name Recognition Systems: Word Embeddings vs. Manually Constructed Dictionaries
by Shengyu Liu, Buzhou Tang, Qingcai Chen and Xiaolong Wang
Information 2015, 6(4), 848-865; https://doi.org/10.3390/info6040848 - 11 Dec 2015
Cited by 47 | Viewed by 6944
Abstract
Semantic features are very important for machine learning-based drug name recognition (DNR) systems. The semantic features used in most DNR systems are based on drug dictionaries manually constructed by experts. Building large-scale drug dictionaries is time-consuming, and adding new drugs to existing dictionaries immediately after they are developed is also a challenge. In recent years, word embeddings that contain rich latent semantic information about words have been widely used to improve the performance of various natural language processing tasks. However, they have not been used in DNR systems. Compared to semantic features based on drug dictionaries, the advantage of word embeddings is that they are learned without supervision. In this paper, we investigate the effect of semantic features based on word embeddings on DNR and compare them with semantic features based on three drug dictionaries. We propose a conditional random fields (CRF)-based system for DNR. The skip-gram model, an unsupervised algorithm, is used to induce word embeddings from about 17.3 gigabytes (GB) of unlabeled biomedical text collected from MEDLINE (National Library of Medicine, Bethesda, MD, USA). The system is evaluated on the drug-drug interaction extraction (DDIExtraction) 2013 corpus. Experimental results show that word embeddings significantly improve the performance of the DNR system and are competitive with semantic features based on drug dictionaries. The F-score improves by 2.92 percentage points when word embeddings are added to the baseline system, which is comparable to the improvements from semantic features based on drug dictionaries. Furthermore, word embeddings are complementary to semantic features based on drug dictionaries. When both word embeddings and dictionary-based semantic features are added, the system achieves the best performance with an F-score of 78.37%, which outperforms the best system of the DDIExtraction 2013 challenge by 6.87 percentage points.
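The skip-gram model learns an embedding for each word by training it to predict the words that appear near it. The Python sketch below shows only the first step of that procedure, turning tokenized text into (center, context) training pairs. It is a hypothetical illustration, not the authors' pipeline: it omits the network training itself and the efficiency techniques (such as negative sampling or subsampling of frequent words) that tools like word2vec use, and the window size and example sentence are assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) training pairs for the skip-gram objective.

    Every word within `window` positions of the center word counts as context.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    sentence = "aspirin inhibits platelet aggregation".split()
    for center, context in skipgram_pairs(sentence, window=1):
        print(center, "->", context)
```

Because the pairs come from raw text alone, no labels are needed, which is why embeddings can be induced from a large unlabeled corpus such as the MEDLINE text described above.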
