Open Access Article

Usage of the Term Big Data in Biomedical Publications: A Text Mining Approach

Department of Clinical Epidemiology, Biostatistics, and Bioinformatics, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, 1105AZ Amsterdam, The Netherlands
* Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2019, 3(1), 13; https://doi.org/10.3390/bdcc3010013
Received: 16 January 2019 / Revised: 30 January 2019 / Accepted: 1 February 2019 / Published: 6 February 2019
In this study, we attempt to assess the value of the term Big Data when used by researchers in their publications. For this purpose, we systematically collected a corpus of biomedical publications, some of which use the term Big Data and some of which do not. These documents were used as input to a machine learning classifier to determine how well they can be separated into two groups and to identify the most distinguishing classification features. We generated 100 classifiers that could correctly distinguish between Big Data and non-Big Data documents with an area under the Receiver Operating Characteristic (ROC) curve of 0.96. The differences between the two groups were characterized by terms specific to Big Data themes (such as ‘computational’, ‘mining’, and ‘challenges’) and also by terms that indicate the research field, such as ‘genomics’. ROC curves plotted for various time intervals showed no difference over time. We conclude that there is a detectable and stable difference between publications that use the term Big Data and those that do not. Furthermore, the use of the term Big Data within a publication seems to indicate a distinct type of research in the biomedical field. Therefore, we conclude that value can be attributed to the term Big Data when used in a publication, and that this value has not changed over time.
Keywords: Big Data; Big Data Aspects; hype; biomedical literature; text mining; Lasso Regression
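
The area under the ROC curve reported in the abstract can be read as the probability that a randomly chosen Big Data document receives a higher classifier score than a randomly chosen non-Big Data document. The following is a minimal Python sketch of that calculation, not the authors' implementation. The function name `roc_auc`, the labels, and the scores are hypothetical and chosen only for illustration; none of the values come from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    labels: iterable of 0/1 (1 = hypothetical "Big Data" document)
    scores: classifier scores, where higher means more likely class 1
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # Count positive/negative pairs that are ranked correctly.
    # A tie between a positive and a negative score counts as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six documents (not data from the study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In this toy example, eight of the nine positive/negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of documents; for a large corpus a rank-based computation would be the more practical choice.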

MDPI and ACS Style

van Altena, A.J.; Moerland, P.D.; Zwinderman, A.H.; Delgado Olabarriaga, S. Usage of the Term Big Data in Biomedical Publications: A Text Mining Approach. Big Data Cogn. Comput. 2019, 3, 13.

  • Supplementary File 1:
    ZIP-Document (ZIP, 101 KB)

  • Externally hosted supplementary file 1
    Doi: 10.6084/m9.figshare.6989645
    Link: https://figshare.com/s/762bede6954172d4d173
    Description: Journal exclusion verdicts
  • Externally hosted supplementary file 2
    Doi: 10.6084/m9.figshare.7593953
    Link: https://figshare.com/s/f93da687d747ac73394b
    Description: Normalised documents per year. Shown for both the Big Data and non-Big Data corpus. Data is normalised to the total number of documents in each of the respective corpora.
  • Externally hosted supplementary file 3
    Doi: 10.6084/m9.figshare.7593956
    Link: https://figshare.com/s/d7af33b776104f341ece
    Description: Distribution of the number of tokens per document. Plotted for both the Big Data and non-Big Data corpus. Data is normalised to the total number of documents in each of the respective corpora.