Special Issue "Recent Advances of Big Data Technology"

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Applications".

Deadline for manuscript submissions: closed (31 March 2016)

Special Issue Editors

Guest Editor
Associate Professor Yong Yu

School of Computer Science and Engineering, University of Electronic Science and Technology of China, China
Interests: information security; cloud computing; big data algorithms
Guest Editor
Dr. Yu Wang

School of Information Technology, Deakin University, Australia
Interests: big data applications; big data security; big data processing

Special Issue Information

Dear Colleagues,

“Big Data” has become an important topic in science, engineering, medicine, healthcare, finance, business, and, ultimately, society itself. Big Data refers to the massive amount of digital information stored or transmitted in computer systems. With the rapid growth of big data applications, it has become critical to disseminate recent research advances that meet the needs of these applications. The objective of this Special Issue is to capture the latest advances in this research field. Topics of interest include, but are not limited to, the following:

  • Big data processing (Analytics, Querying, Mining)
  • Big data storage and management
  • Technology and application of big data
  • Intelligent and unconventional methods for big data
  • High performance computing for big data
  • Novel hardware and software architectures for big data
  • Success case analysis of big data
  • Big data in business performance management
  • Big data as a service
  • Big data in enterprise models and practices
  • Big data security

Associate Professor Yong Yu
Dr. Yu Wang

Guest Editors

Published Papers (5 papers)


Research

Open Access Article: Efficient Dynamic Integrity Verification for Big Data Supporting Users Revocability
Information 2016, 7(2), 31; doi:10.3390/info7020031
Received: 6 February 2016 / Revised: 9 April 2016 / Accepted: 12 April 2016 / Published: 27 May 2016
Abstract
With the advent of the big data era, cloud data storage and retrieval have become popular for efficient data management in large companies and organizations, allowing them to enjoy on-demand, high-quality cloud storage services. Meanwhile, for security reasons, these companies and organizations want to verify the integrity of their data after storing it in the cloud. To do so, they need a cloud storage auditing scheme that matches their actual demands. Current research often assumes that the data manager owns the data; in reality, however, the data belongs to the company rather than to its data managers, a situation that has been overlooked. For example, after a period of time the current data manager may no longer be suitable to manage the data stored in the cloud and will be replaced by another one. The successor then needs to verify the integrity of the data managed by the predecessor, a problem that is unavoidable in practice. In this paper, we fill this gap by giving a practical and efficient revocable privacy-preserving public auditing scheme for cloud storage that meets the auditing requirements of data transfer in large companies and organizations. The scheme is conceptually simple and is proven to be secure even when the cloud service provider colludes with revoked users.
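
The general shape of remote integrity checking can be illustrated with a much simpler technique than the one in this paper: a private spot check using per-block HMAC tags. This is a minimal sketch under stated assumptions, not the revocable public auditing scheme from the abstract. It has no third-party auditor, no privacy preservation, and no user revocation, and the verifier must hold the secret key, which is exactly the limitation that public auditing removes. All function names are hypothetical. The sketch only shows the common challenge-response pattern: tag blocks before upload, challenge random indices, and verify the returned blocks.

```python
import hashlib
import hmac
import secrets


def _tag(key: bytes, index: int, block: bytes) -> bytes:
    # Bind the tag to the block index so the cloud cannot swap or reorder blocks.
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def tag_blocks(key: bytes, blocks: list[bytes]) -> list[bytes]:
    """Owner side (hypothetical): compute one HMAC-SHA256 tag per block before upload."""
    return [_tag(key, i, b) for i, b in enumerate(blocks)]


def challenge(n_blocks: int, sample_size: int) -> list[int]:
    """Verifier side: choose distinct random block indices to spot-check."""
    rng = secrets.SystemRandom()
    return rng.sample(range(n_blocks), min(sample_size, n_blocks))


def respond(stored_blocks: list[bytes], stored_tags: list[bytes], indices: list[int]):
    """Cloud side: return each challenged block together with its stored tag."""
    return [(i, stored_blocks[i], stored_tags[i]) for i in indices]


def verify(key: bytes, proof) -> bool:
    """Verifier side: recompute every tag and compare in constant time."""
    return all(hmac.compare_digest(_tag(key, i, b), t) for i, b, t in proof)
```

A corrupted block is detected only if it falls in the sampled indices. For this reason, spot-checking schemes choose the sample size according to the fraction of corruption they need to detect with high probability.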
(This article belongs to the Special Issue Recent Advances of Big Data Technology)

Open Access Article: Feature Engineering for Recognizing Adverse Drug Reactions from Twitter Posts
Information 2016, 7(2), 27; doi:10.3390/info7020027
Received: 30 March 2016 / Revised: 17 May 2016 / Accepted: 18 May 2016 / Published: 25 May 2016
Cited by 2
Abstract
Social media platforms are emerging digital communication channels that give ordinary people an easy way to share their health and medication experiences online. As more people discuss their health information publicly online, social media platforms become a rich source of information for exploring adverse drug reactions (ADRs). ADRs are major public health problems that cause deaths and hospitalizations of millions of people. Unfortunately, not all ADRs are identified before a drug is made available on the market. In this study, we develop an ADR event monitoring system that recognizes ADR mentions in a tweet and classifies the tweet's assertion. We explored several entity recognition features, feature conjunctions, and feature selection, and analyzed their characteristics and their impact on ADR recognition, which had not been studied previously. The results show that ADR entity recognition achieves an F-score of 0.562 on the PSB Social Media Mining shared task dataset, outperforming the partial-matching-based method by 0.122. Feature selection improves the F-score by a further 0.026. This novel text-mining technique, which uses publicly shared social media data, opens an array of opportunities for researchers to explore various health-related issues.
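
The abstract does not define its F-score. Assuming it is the usual balanced F1 measure, it is the harmonic mean of precision P and recall R. The two reported absolute differences then imply the baseline and post-selection scores below:

```latex
F_1 = \frac{2PR}{P + R}
\qquad
F_1^{\text{partial matching}} = 0.562 - 0.122 = 0.440
\qquad
F_1^{\text{after feature selection}} = 0.562 + 0.026 = 0.588
```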
(This article belongs to the Special Issue Recent Advances of Big Data Technology)

Open Access Article: A Big Network Traffic Data Fusion Approach Based on Fisher and Deep Auto-Encoder
Information 2016, 7(2), 20; doi:10.3390/info7020020
Received: 27 January 2016 / Revised: 6 March 2016 / Accepted: 7 March 2016 / Published: 23 March 2016
Cited by 2
Abstract
Data fusion is usually performed prior to classification in order to reduce the input space. These dimensionality reduction techniques help to reduce the complexity of the classification model and thus improve classification performance. Traditional supervised methods require labeled samples, but most current network traffic data is unlabeled. Better learners can therefore be built by using both labeled and unlabeled data than by using either alone. In this paper, a novel network traffic data fusion approach based on Fisher and deep auto-encoder (DFA-F-DAE) is proposed to reduce data dimensionality and computational complexity. The experimental results show that DFA-F-DAE improves the generalization ability of three classification algorithms (J48, back propagation neural network (BPNN), and support vector machine (SVM)) through data dimensionality reduction. We found that DFA-F-DAE remarkably improves the efficiency of big network traffic classification.
(This article belongs to the Special Issue Recent Advances of Big Data Technology)

Open Access Article: A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark
Information 2016, 7(1), 6; doi:10.3390/info7010006
Received: 20 December 2015 / Revised: 27 January 2016 / Accepted: 29 January 2016 / Published: 15 February 2016
Cited by 1
Abstract
Currently, with the rapid increase in the scale of data in network traffic classification, selecting traffic features efficiently is becoming a major challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, their execution time remains unsatisfactory because processing involves numerous iterative computations. To address this issue, this paper proposes an efficient feature selection method for network traffic based on Spark, a newer parallel computing framework. In our approach, the complete feature set is first preprocessed based on Fisher score, and a sequential forward search strategy is employed to build feature subsets. The optimal feature subset is then selected through continuous iterations on the Spark computing framework. The implementation demonstrates that, while maintaining classification accuracy, our method reduces the time cost of modeling and classification and significantly improves the execution efficiency of feature selection.
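
The two-stage selection described above (Fisher-score preprocessing followed by sequential forward search) can be sketched on a single machine as follows. This sketch rests on several assumptions and is not the authors' Spark implementation. The Fisher score uses the common definition of between-class scatter divided by within-class scatter. The `top_k` cut-off and the nearest-centroid evaluator are hypothetical stand-ins for the paper's preprocessing threshold and classifier. The independent evaluation of candidate subsets inside each search step is the part that a Spark job would distribute.

```python
from collections import defaultdict
from typing import Callable, Sequence

Matrix = Sequence[Sequence[float]]


def fisher_scores(X: Matrix, y: Sequence[int]) -> list[float]:
    """Per-feature Fisher score: between-class scatter divided by within-class scatter."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores = []
    for j in range(len(X[0])):
        overall_mean = sum(row[j] for row in X) / len(X)
        between = within = 0.0
        for rows in by_class.values():
            n_k = len(rows)
            mean_k = sum(r[j] for r in rows) / n_k
            between += n_k * (mean_k - overall_mean) ** 2
            within += sum((r[j] - mean_k) ** 2 for r in rows)  # n_k * class variance
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating if class means differ, else useless.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def nearest_centroid_accuracy(X: Matrix, y: Sequence[int], features: Sequence[int]) -> float:
    """Hypothetical evaluator: training accuracy of a nearest-centroid classifier."""
    if not features:
        return 0.0
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append([row[j] for j in features])
    centroids = {c: [sum(col) / len(pts) for col in zip(*pts)] for c, pts in by_class.items()}

    def predict(point):
        return min(centroids, key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))

    hits = sum(predict([row[j] for j in features]) == label for row, label in zip(X, y))
    return hits / len(X)


def sequential_forward_search(candidates: Sequence[int],
                              evaluate: Callable[[list[int]], float],
                              max_features: int) -> tuple[list[int], float]:
    """Greedily add the candidate that most improves `evaluate`; stop when nothing improves."""
    selected: list[int] = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        # Candidate evaluations are independent of each other: the step Spark could parallelize.
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda sf: sf[0])
        if score <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best


def select_features(X: Matrix, y: Sequence[int], top_k: int, max_features: int):
    """Stage 1: keep the top_k features by Fisher score. Stage 2: forward search over them."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:top_k]
    return sequential_forward_search(
        ranked, lambda fs: nearest_centroid_accuracy(X, y, fs), max_features)
```

Ranking by Fisher score first keeps the search space small. Each forward-search step evaluates at most `top_k` subsets instead of every remaining feature.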
(This article belongs to the Special Issue Recent Advances of Big Data Technology)

Open Access Article: Effects of Semantic Features on Machine Learning-Based Drug Name Recognition Systems: Word Embeddings vs. Manually Constructed Dictionaries
Information 2015, 6(4), 848-865; doi:10.3390/info6040848
Received: 17 October 2015 / Revised: 4 December 2015 / Accepted: 4 December 2015 / Published: 11 December 2015
Cited by 5
Abstract
Semantic features are very important for machine learning-based drug name recognition (DNR) systems. The semantic features used in most DNR systems are based on drug dictionaries manually constructed by experts. Building large-scale drug dictionaries is time-consuming, and adding new drugs to existing dictionaries immediately after they are developed is also a challenge. In recent years, word embeddings, which contain rich latent semantic information about words, have been widely used to improve the performance of various natural language processing tasks. However, they have not been used in DNR systems. Compared to semantic features based on drug dictionaries, the advantage of word embeddings is that they are learned without supervision. In this paper, we investigate the effect of semantic features based on word embeddings on DNR and compare them with semantic features based on three drug dictionaries. We propose a conditional random fields (CRF)-based system for DNR. The skip-gram model, an unsupervised algorithm, is used to induce word embeddings from about 17.3 gigabytes (GB) of unlabeled biomedical text collected from MEDLINE (National Library of Medicine, Bethesda, MD, USA). The system is evaluated on the drug-drug interaction extraction (DDIExtraction) 2013 corpus. Experimental results show that word embeddings significantly improve the performance of the DNR system and are competitive with semantic features based on drug dictionaries. The F-score improves by 2.92 percentage points when word embeddings are added to the baseline system, which is comparable to the improvements from semantic features based on drug dictionaries. Furthermore, word embeddings are complementary to the semantic features based on drug dictionaries. When both word embeddings and semantic features based on drug dictionaries are added, the system achieves its best performance, with an F-score of 78.37%, which outperforms the best system of the DDIExtraction 2013 challenge by 6.87 percentage points.
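
The abstract does not say how the word embeddings are turned into CRF features. One common option, assumed here, is to add each embedding dimension as a real-valued feature alongside a binary dictionary-lookup feature. The function below is a hypothetical sketch of that idea for a single token. The feature names, the toy dictionary, and the two-dimensional vectors are illustrative only. They are not the authors' feature set and not actual skip-gram output.

```python
from typing import Mapping, Sequence


def token_features(tokens: Sequence[str], i: int,
                   drug_dictionary: set[str],
                   embeddings: Mapping[str, Sequence[float]]) -> dict[str, object]:
    """Build a CRF-style feature dict for tokens[i] (hypothetical feature names)."""
    word = tokens[i]
    lower = word.lower()
    features: dict[str, object] = {
        # Basic lexical and orthographic features.
        "word.lower": lower,
        "word.suffix3": lower[-3:],
        "word.is_title": word.istitle(),
        # Dictionary-based semantic feature: exact match against a curated drug list.
        "dict.in_drug_dictionary": lower in drug_dictionary,
    }
    # Embedding-based semantic features: one real-valued feature per dimension.
    vector = embeddings.get(lower)
    if vector is not None:
        for d, value in enumerate(vector):
            features[f"emb.{d}"] = value
    else:
        features["emb.unknown"] = True
    # Context window: the previous and next tokens, with sentence-boundary markers.
    features["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    features["next.lower"] = tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>"
    return features
```

A sentence then becomes a list of such dicts, one per token. This is the kind of per-token input that CRFsuite-based toolkits such as sklearn-crfsuite accept. Because the embedding features need no curated list, they still fire for drug names that are missing from every dictionary.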
(This article belongs to the Special Issue Recent Advances of Big Data Technology)

Planned Papers

The list below represents only planned manuscripts. Some of these manuscripts have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.

Title: Mining Adverse Drug Reactions from Twitter Posts
Author: Hong-Jie Dai
Affiliation: Graduate Institute of Biomedical Informatics, Taipei Medical University
Abstract: Social media is now often used to create public messages or posts related to users' health. With increasing social media usage, a trend has been observed of users creating posts related to adverse drug reactions (ADRs). Mining social media data for this information can support pharmacological post-marketing surveillance and monitoring. In this study, we developed a binary classifier using linear support vector machines to automatically classify Twitter posts asserting ADRs, and a named entity recognition (NER) system based on conditional random fields to recognize ADR-related information in related Twitter data. Our classifier and NER system achieved F-scores of 0.33 and 0.576, respectively, on the test set of the Pacific Symposium on Biocomputing 2016 Social Media Mining shared task.

Journal Contact

MDPI AG
Information Editorial Office
St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 34
Fax: +41 61 302 89 18