Usage of the Term Big Data in Biomedical Publications: A Text Mining Approach
Abstract: In this study, we attempt to assess the value of the term Big Data when used by researchers in their publications. For this purpose, we systematically collected a corpus of biomedical publications, some of which use the term Big Data and some of which do not. These documents were used as input to a machine learning classifier to determine how well they can be separated into two groups and to identify the most distinguishing classification features. We generated 100 classifiers that could correctly distinguish between Big Data and non-Big Data documents with an area under the Receiver Operating Characteristic (ROC) curve of 0.96. The differences between the two groups were characterized by terms specific to Big Data themes (such as ‘computational’, ‘mining’, and ‘challenges’) and also by terms that indicate the research field, such as ‘genomics’. ROC curves plotted for various time intervals showed no difference over time. We conclude that there is a detectable and stable difference between publications that use the term Big Data and those that do not. Furthermore, the use of the term Big Data within a publication seems to indicate a distinct type of research in the biomedical field. Therefore, we conclude that value can be attributed to the term Big Data when used in a publication and that this value has not changed over time.
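As an illustration of the evaluation metric reported in the abstract, the sketch below computes the area under the ROC curve for a binary classifier using the pairwise-ranking (Mann-Whitney) identity. It is not the authors' pipeline: the abstract does not specify the classifier or its features, and the function name `roc_auc`, the labels, and the scores are all hypothetical values made up for this example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking identity.

    labels: iterable of 0/1 (1 = hypothetical "Big Data" document)
    scores: classifier scores, higher meaning "more likely class 1"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one document of each class")

    # AUC equals the fraction of (positive, negative) pairs in which the
    # positive document receives the higher score; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data (assumption): three documents per class.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 8 of 9 pairs are ranked correctly -> 0.889
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, which puts the reported value of 0.96 close to the upper end of that range.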
- Supplementary File 1: ZIP-Document (ZIP, 101 KB)
- Externally hosted supplementary file 1: Journal exclusion verdicts.
- Externally hosted supplementary file 2: Normalised number of documents per year, shown for both the Big Data and non-Big Data corpora. Counts are normalised to the total number of documents in the respective corpus.
- Externally hosted supplementary file 3: Distribution of the number of tokens per document, plotted for both the Big Data and non-Big Data corpora. Counts are normalised to the total number of documents in the respective corpus.
van Altena, A.J.; Moerland, P.D.; Zwinderman, A.H.; Delgado Olabarriaga, S. Usage of the Term Big Data in Biomedical Publications: A Text Mining Approach. Big Data Cogn. Comput. 2019, 3, 13.