Open Access Article

Automatic Annotation of Narrative Radiology Reports

1 Department of Computer Engineering, Faculty of Engineering, University of Rijeka, Vukovarska 58, 51000 Rijeka, Croatia
2 School of Business Informatics and Mathematics, University of Mannheim, 68159 Mannheim, Germany
3 Faculty of Veterinary Medicine, University of Zagreb, Heinzelova 55, 10000 Zagreb, Croatia
4 Clinical Hospital Centre Rijeka, University of Rijeka, Krešimirova 42, 51000 Rijeka, Croatia
5 Center for Artificial Intelligence and Cybersecurity, University of Rijeka, Radmile Matejčić 2, 51000 Rijeka, Croatia
* Author to whom correspondence should be addressed.
Diagnostics 2020, 10(4), 196; https://doi.org/10.3390/diagnostics10040196
Received: 26 February 2020 / Revised: 27 March 2020 / Accepted: 27 March 2020 / Published: 1 April 2020
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
Narrative texts in electronic health records can be used efficiently for building clinical decision support systems only if they are correctly and automatically interpreted in accordance with a specified standard. This paper addresses the problem of developing an automated method for labeling free-form radiology reports, as a precursor to building query-capable report databases in hospitals. The analyzed dataset consists of 1295 radiology reports concerning the condition of the knee, retrospectively gathered at the Clinical Hospital Centre Rijeka, Croatia. Reports were manually labeled with one or more labels from a set of the 10 most commonly occurring clinical conditions. After initial preprocessing of the texts, two sets of text classification methods were compared: (1) traditional classification models, namely Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), and Random Forests (RF), coupled with Bag-of-Words (BoW) features (i.e., a symbolic text representation), and (2) a Convolutional Neural Network (CNN) with dense word vectors (i.e., word embeddings as a semantic text representation) as input features. We used nested 10-fold cross-validation to evaluate the performance of the competing methods in terms of accuracy, precision, recall, and F1 score. The CNN with semantic word representations as input yielded the best overall performance, with a micro-averaged F1 score of 86.7%. The CNN classifier yielded particularly encouraging results for the most represented conditions: degenerative disease (95.9%), arthrosis (93.3%), and injury (89.2%). However, as a data-hungry deep learning model, the CNN performed notably worse than the competing models on underrepresented classes with fewer training instances, such as multicausal disease or metabolic disease. LR, RF, and SVM performed comparably well, with micro-averaged F1 scores of 84.6%, 82.2%, and 82.1%, respectively.
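Because each report can carry several labels, the headline figure is a micro-averaged F1 score. A minimal sketch of how such a score is computed, assuming each report's gold and predicted labels are stored as a Python set of strings (the label names, toy data, and helper functions `label_counts`, `f1`, and `micro_macro_f1` are hypothetical and not taken from the study), pools true positives, false positives, and false negatives across all labels before computing F1. It also computes a macro average, which weights every label equally and therefore exposes weak performance on rare classes that the pooled micro score can hide.

```python
from typing import Dict, Sequence, Set, Tuple

# Hypothetical toy data: gold and predicted label sets per report.
# These values are illustrative only and are not results from the study.
LABELS = ["degenerative disease", "arthrosis", "injury", "metabolic disease"]
GOLD = [
    {"degenerative disease", "arthrosis"},
    {"injury"},
    {"degenerative disease"},
    {"metabolic disease"},
]
PRED = [
    {"degenerative disease", "arthrosis"},
    {"injury"},
    {"degenerative disease"},
    {"degenerative disease"},  # rare label missed, frequent label wrongly assigned
]


def label_counts(gold: Sequence[Set[str]], pred: Sequence[Set[str]], label: str) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives for one label."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if label in g and label in p:
            tp += 1
        elif label in p:
            fp += 1
        elif label in g:
            fn += 1
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), taken as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred, labels) -> Tuple[float, float, Dict[str, float]]:
    """Return (micro-F1, macro-F1, per-label F1) for multi-label predictions."""
    total_tp = total_fp = total_fn = 0
    per_label: Dict[str, float] = {}
    for label in labels:
        tp, fp, fn = label_counts(gold, pred, label)
        per_label[label] = f1(tp, fp, fn)
        # Micro-averaging pools the raw counts across all labels.
        total_tp += tp
        total_fp += fp
        total_fn += fn
    micro = f1(total_tp, total_fp, total_fn)
    # Macro-averaging gives every label equal weight, however rare it is.
    macro = sum(per_label.values()) / len(labels)
    return micro, macro, per_label


if __name__ == "__main__":
    micro, macro, per_label = micro_macro_f1(GOLD, PRED, LABELS)
    print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
    for label, score in per_label.items():
        print(f"  {label}: {score:.3f}")
```

With these toy inputs the script prints `micro-F1 = 0.800, macro-F1 = 0.700`: the single missed rare label barely moves the pooled counts, but it drives that label's own F1 to zero and pulls the macro average down.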
Keywords: free-form radiology report; automatic labeling; decision support system; natural language processing; machine learning; word embedding; knee

Figure 1

MDPI and ACS Style

Krsnik, I.; Glavaš, G.; Krsnik, M.; Miletić, D.; Štajduhar, I. Automatic Annotation of Narrative Radiology Reports. Diagnostics 2020, 10, 196.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
