Open Access Article
Informatics 2017, 4(4), 45; https://doi.org/10.3390/informatics4040045

A Data Quality Strategy to Enable FAIR, Programmatic Access across Large, Diverse Data Collections for High Performance Data Analysis

National Computational Infrastructure, the Australian National University, Acton 2601, Australia
Received: 31 August 2017 / Revised: 1 December 2017 / Accepted: 8 December 2017 / Published: 13 December 2017
(This article belongs to the Special Issue Quality Management in Big Data)

Abstract

To ensure seamless, programmatic access to data for High Performance Computing (HPC) and analysis across multiple research domains, it is vital to have a methodology for standardization of both data and services. At the Australian National Computational Infrastructure (NCI) we have developed a Data Quality Strategy (DQS) that currently provides processes for: (1) Consistency of data structures needed for a High Performance Data (HPD) platform; (2) Quality Control (QC) through compliance with recognized community standards; (3) Benchmarking cases of operational performance tests; and (4) Quality Assurance (QA) of data through demonstrated functionality and performance across common platforms, tools and services. By implementing the NCI DQS, we have seen progressive improvement in the quality and usefulness of the datasets across the different subject domains. We have also demonstrated the ease with which modern programmatic methods can be used to access the data, either in situ or via web services, for uses ranging from traditional analysis methods through to emerging machine learning techniques. To help increase data re-usability by broader communities, particularly in high performance environments, the DQS is also used to identify the need for any extensions to the relevant international standards for interoperability and/or programmatic access.
Keywords: data quality; quality control; quality assurance; benchmarks; performance; data management policy; netCDF; high performance computing; HPC; FAIR data
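The abstract names compliance with recognized community standards as the QC step but does not say how compliance is checked. As a hedged illustration only, the following Python sketch shows one way such a check could work. It assumes that a dataset's global and per-variable attributes have already been read into plain dictionaries, for example from a netCDF file.

The names `check_dataset`, `Report` and `REQUIRED_GLOBAL_ATTRS` are hypothetical. So are the attribute list and the pass/fail rules, which are loosely modelled on the CF and ACDD netCDF metadata conventions. None of this is the NCI DQS checklist.

```python
"""Hypothetical metadata compliance check (illustrative sketch only).

The attribute rules below are assumptions loosely modelled on the CF and
ACDD netCDF conventions; they are not the NCI DQS checklist.
"""
from dataclasses import dataclass, field

# Assumed set of global attributes every dataset should carry.
REQUIRED_GLOBAL_ATTRS = ("Conventions", "title", "summary", "license")


@dataclass
class Report:
    """Collects human-readable compliance errors for one dataset."""
    errors: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # A dataset passes only if no rule produced an error.
        return not self.errors


def check_dataset(global_attrs: dict, variables: dict) -> Report:
    """Check global and per-variable attributes against the assumed rules.

    `variables` maps each variable name to a dict of that variable's
    attributes, mirroring how netCDF stores metadata.
    """
    report = Report()

    # Rule 1: each required global attribute is present and non-empty.
    for name in REQUIRED_GLOBAL_ATTRS:
        if not str(global_attrs.get(name, "")).strip():
            report.errors.append(f"missing global attribute: {name}")

    # Rule 2: if a Conventions string exists, it should declare a CF version.
    conventions = str(global_attrs.get("Conventions", ""))
    if conventions and "CF-" not in conventions:
        report.errors.append(f"Conventions does not declare CF: {conventions!r}")

    # Rule 3 (simplified): every variable has units and a descriptive name.
    for var, attrs in variables.items():
        if "units" not in attrs:
            report.errors.append(f"{var}: missing 'units'")
        if "standard_name" not in attrs and "long_name" not in attrs:
            report.errors.append(f"{var}: needs 'standard_name' or 'long_name'")

    return report


# Usage: one dataset that satisfies every rule, one that does not.
good = check_dataset(
    {"Conventions": "CF-1.6, ACDD-1.3", "title": "Example grid",
     "summary": "Demonstration dataset", "license": "CC BY 4.0"},
    {"tas": {"units": "K", "standard_name": "air_temperature"}},
)
bad = check_dataset({"title": "x"}, {"precip": {"long_name": "rainfall"}})

print(good.passed)  # True
print(bad.passed)   # False
for err in bad.errors:
    print(err)
```

The check is kept independent of any file-reading library so that the rules are easy to test in isolation. In practice, the dictionaries would be filled by whichever netCDF reader is in use, and the rule set would follow the standards the repository actually adopts.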

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Evans, B.; Druken, K.; Wang, J.; Yang, R.; Richards, C.; Wyborn, L. A Data Quality Strategy to Enable FAIR, Programmatic Access across Large, Diverse Data Collections for High Performance Data Analysis. Informatics 2017, 4, 45.

Informatics EISSN 2227-9709. Published by MDPI AG, Basel, Switzerland.