Open Access Article
Information 2011, 2(2), 266-276

Distribution of “Characteristic” Terms in MEDLINE Literatures

Department of Psychiatry, MC912, University of Illinois at Chicago, 1601 W. Taylor Street, Chicago, IL 60612, USA
Ingenuity Systems, Inc., 1700 Seaport Blvd. Third Floor, Redwood City, CA 94063, USA
Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 501 E. Daniel St., Champaign, IL 61820, USA
Author to whom correspondence should be addressed.
Received: 3 March 2011 / Accepted: 28 March 2011 / Published: 30 March 2011
(This article belongs to the Section Information Applications)
Given the occurrence frequency of any term within any set of articles in MEDLINE, we define “characteristic” terms as words and phrases that occur in that literature more frequently than expected by chance (at p < 0.001 or better). In this report, we studied how the cut-off criterion varied as a function of literature size and term frequency in MEDLINE as a whole, and compared the distribution of characteristic terms within a number of journal-defined, affiliation-defined and random literatures. We also investigated how the characteristic terms were distributed among MEDLINE titles, abstracts, and the last sentences of abstracts, including “regularized” terms that appear in both the title and the abstract of the same paper for at least one paper in the literature. For a set of 10 disciplinary journals, the characteristic terms comprised 18% of the total terms on average. Characteristic terms are used in several of our web-based services (Anne O’Tate and Arrowsmith), and should be useful for a variety of other information-processing tasks designed to improve text mining in MEDLINE.
Keywords: information retrieval; term occurrence; text mining; annotation; literature based discovery
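
The definition of a “characteristic” term in the abstract can be illustrated with a small sketch. The abstract does not state which statistical test is used, so the code below assumes a one-sided binomial test against a MEDLINE-wide background rate; that choice, the function names (upper_tail_p, is_characteristic) and the example counts are hypothetical and are not taken from the article.

```python
import math


def upper_tail_p(k: int, n: int, p0: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p0), summed in log space.

    Assumption: a binomial model stands in for "expected by chance";
    the article's actual test may differ.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p = math.log(p0)
    log_q = math.log1p(-p0)
    total = 0.0
    for i in range(k, n + 1):
        # log of the binomial coefficient C(n, i), via log-gamma
        log_coef = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1))
        total += math.exp(log_coef + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def is_characteristic(k: int, n: int, background_rate: float,
                      alpha: float = 0.001) -> bool:
    """True if a term found in k of n articles is over-represented.

    background_rate is the (hypothetical) fraction of all MEDLINE
    articles containing the term; alpha mirrors the p < 0.001 cut-off.
    """
    return upper_tail_p(k, n, background_rate) < alpha


if __name__ == "__main__":
    # Made-up counts: a literature of 1000 articles and a term whose
    # background rate is 1% (about 10 occurrences expected by chance).
    print(is_characteristic(30, 1000, 0.01))
    print(is_characteristic(12, 1000, 0.01))
```

Under this assumed model, the number of occurrences needed to pass the cut-off depends on both the literature size n and the background rate, which is the kind of dependence the abstract describes for the cut-off criterion.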

This is an open access article distributed under the Creative Commons Attribution License (CC BY 3.0).

MDPI and ACS Style

Smalheiser, N.R.; Zhou, W.; Torvik, V.I. Distribution of “Characteristic” Terms in MEDLINE Literatures. Information 2011, 2, 266-276.

Information EISSN 2078-2489 Published by MDPI AG, Basel, Switzerland