In the health industry, the use of data (including Big Data) is of growing importance. The term ‘Big Data’ characterizes data by its volume, and also by its velocity and veracity. Big Data requires effective data governance, which includes measures to manage and control the use of data and to enhance data quality, availability, and integrity. Data quality can be described in terms of data quality dimensions. Well-known dimensions include accuracy and consistency, amongst others. Since data quality depends on how the data is expected to be used, the most important data quality dimensions depend on the context of use and industry needs. There is a lack of current research focusing on data quality dimensions for Big Data within the health industry; this paper, therefore, investigates the most important data quality dimensions for Big Data within this context. An inner hermeneutic cycle research approach was used to systematically review the literature on data quality for big health datasets and to produce a list of the most important data quality dimensions. Based on a hierarchical framework for organizing data quality dimensions, the highest ranked category of dimensions was determined.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.