Search Results (12)

Search Parameters:
Keywords = biomedical literature annotation

20 pages, 6329 KiB  
Article
TrialSieve: A Comprehensive Biomedical Information Extraction Framework for PICO, Meta-Analysis, and Drug Repurposing
by David Kartchner, Haydn Turner, Christophe Ye, Irfan Al-Hussaini, Batuhan Nursal, Albert J. B. Lee, Jennifer Deng, Courtney Curtis, Hannah Cho, Eva L. Duvaris, Coral Jackson, Catherine E. Shanks, Sarah Y. Tan, Selvi Ramalingam and Cassie S. Mitchell
Bioengineering 2025, 12(5), 486; https://doi.org/10.3390/bioengineering12050486 - 2 May 2025
Viewed by 1248
Abstract
This work introduces TrialSieve, a novel framework for biomedical information extraction that enhances clinical meta-analysis and drug repurposing. By extending traditional PICO (Patient, Intervention, Comparison, Outcome) methodologies, TrialSieve incorporates hierarchical, treatment group-based graphs, enabling more comprehensive and quantitative comparisons of clinical outcomes. TrialSieve was used to annotate 1609 PubMed abstracts, producing 170,557 annotations and 52,638 final spans across 20 unique annotation categories that capture a diverse range of biomedical entities relevant to systematic reviews and meta-analyses. The performance (accuracy, precision, recall, F1-score) of four natural-language processing (NLP) models (BioLinkBERT, BioBERT, KRISSBERT, PubMedBERT) and the large language model (LLM) GPT-4o was evaluated using the human-annotated TrialSieve dataset. BioLinkBERT had the best accuracy (0.875) and recall (0.679) for biomedical entity labeling, whereas PubMedBERT had the best precision (0.614) and F1-score (0.639). Error analysis showed that NLP models trained on noisy, human-annotated data can match or, in most cases, surpass human performance. This finding highlights the feasibility of fully automating biomedical information extraction, even when relying on imperfectly annotated datasets. An annotator user study (n = 39) revealed significant (p < 0.05) gains in efficiency and human annotation accuracy with the unique TrialSieve tree-based annotation approach. In summary, TrialSieve provides a foundation to improve automated biomedical information extraction for frontend clinical research.
(This article belongs to the Special Issue Artificial Intelligence for Better Healthcare and Precision Medicine)
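The abstract above reports precision, recall and F1-score for entity labeling. As a reminder of how these metrics relate, here is a minimal sketch of their conventional definitions from true-positive, false-positive and false-negative counts. It is not taken from the TrialSieve paper: the function name `precision_recall_f1` and the example counts are hypothetical, and the paper may use a different averaging scheme across categories.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Conventional definitions, with 0.0 returned when a denominator is zero.

    tp: predicted spans that match a gold annotation
    fp: predicted spans with no matching gold annotation
    fn: gold annotations the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=6, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, which is one reason the model with the best recall need not have the best F1-score.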

13 pages, 2102 KiB  
Review
A Systematic Review of Lipid-Focused Cardiovascular Disease Research: Trends and Opportunities
by Uchenna Alex Anyaegbunam, Piyush More, Jean-Fred Fontaine, Vincent ten Cate, Katrin Bauer, Ute Distler, Elisa Araldi, Laura Bindila, Philipp Wild and Miguel A. Andrade-Navarro
Curr. Issues Mol. Biol. 2023, 45(12), 9904-9916; https://doi.org/10.3390/cimb45120618 - 9 Dec 2023
Cited by 4 | Viewed by 3712
Abstract
Lipids are important modifiers of protein function, particularly as parts of lipoproteins, which transport lipophilic substances and mediate cellular uptake of circulating lipids. As such, lipids are of particular interest as blood biological markers for cardiovascular disease (CVD) as well as for conditions linked to CVD such as atherosclerosis, diabetes mellitus, obesity and dietary states. Notably, lipid research is particularly well developed in the context of CVD because of the relevance and multiple causes and risk factors of CVD. The advent of methods for high-throughput screening of biological molecules has recently resulted in the generation of lipidomic profiles that allow monitoring of lipid compositions in biological samples in an untargeted manner. These and other earlier advances in biomedical research have shaped the knowledge we have about lipids in CVD. To evaluate the knowledge acquired on the multiple biological functions of lipids in CVD and the trends in their research, we collected a dataset of references from the PubMed database of biomedical literature focused on plasma lipids and CVD in human and mouse. Using annotations from these records, we were able to categorize significant associations between lipids and particular types of research approaches, distinguish non-biological lipids used as markers, identify differential research between human and mouse models, and detect the increasingly mechanistic nature of the results in this field. Using known associations between lipids and proteins that metabolize or transport them, we constructed a comprehensive lipid–protein network, which we used to highlight proteins strongly connected to lipids found in the CVD-lipid literature. Our approach points to a series of proteins for which lipid-focused research would bring insights into CVD, including Prostaglandin G/H synthase 2 (PTGS2, a.k.a. COX2) and Acylglycerol kinase (AGK). 
In this review, we summarize our findings, putting them in a historical perspective of the evolution of lipid research in CVD.
(This article belongs to the Special Issue A Focus on Molecular Basis in Cardiac Diseases)

52 pages, 10104 KiB  
Article
Osteology of the Hamadryas Baboon (Papio hamadryas)
by Christophe Casteleyn, Estée Wydooghe and Jaco Bakker
Animals 2023, 13(19), 3124; https://doi.org/10.3390/ani13193124 - 6 Oct 2023
Cited by 2 | Viewed by 6263
Abstract
Besides living as a free-ranging primate in the Horn of Africa and the Arabian Peninsula, the hamadryas baboon has an important place in zoos and can be found in biomedical research centers worldwide. To be valuable as a non-human primate laboratory model for man, its anatomy should be portrayed in detail, allowing for the correct interpretation and translation of obtained research results. Reviewing the literature on the use of the baboon in biomedical research revealed that very limited anatomical works on this species are available. Anatomical atlases are incomplete, use archaic nomenclature and fail to provide high-definition color photographs. Therefore, the skeletons of two male hamadryas baboons were prepared by manually removing as much soft tissue as possible, followed by maceration in warm water to which enzyme-containing washing powder was added. The bones were bleached with hydrogen peroxide and degreased by means of methylene chloride. Photographs of the various bones were taken, and the anatomical structures were identified using the latest version of the Nomina Anatomica Veterinaria. As such, the present article shows 31 annotated multipanel figures. The skeleton of the hamadryas baboon generally parallels the human skeleton, but some remarkable differences have been noticed. If these are taken into consideration when evaluating the results of experiments using the hamadryas baboon, justified conclusions can be drawn.
(This article belongs to the Section Wildlife)

17 pages, 563 KiB  
Article
Proteomics Studies in Gestational Diabetes Mellitus: A Systematic Review and Meta-Analysis
by Natthida Sriboonvorakul, Jiamiao Hu, Dittakarn Boriboonhirunsarn, Leong Loke Ng and Bee Kang Tan
J. Clin. Med. 2022, 11(10), 2737; https://doi.org/10.3390/jcm11102737 - 12 May 2022
Cited by 13 | Viewed by 4323
Abstract
Gestational Diabetes Mellitus (GDM) is the most common metabolic complication during pregnancy and is associated with serious maternal and fetal complications such as pre-eclampsia and stillbirth. Further, women with GDM have an approximately 10 times higher risk of diabetes later in life. Children born to mothers with GDM also face a higher risk of childhood obesity and diabetes later in life. Early prediction/diagnosis of GDM leads to early interventions such as diet and lifestyle, which could mitigate the maternal and fetal complications associated with GDM. However, no biomarkers identified to date have been proven to be effective in the prediction/diagnosis of GDM. Proteomic approaches based on mass spectrometry have been applied in various fields of biomedical research to identify novel biomarkers. Although a number of proteomic studies in GDM now exist, the lack of a comprehensive and up-to-date meta-analysis makes it difficult for researchers to interpret the data in the existing literature. Thus, we undertook a systematic review and meta-analysis on proteomic studies and GDM. We searched MEDLINE, EMBASE, Web of Science and Scopus from inception to January 2022. We searched Medline, Embase, CINAHL and the Cochrane Library from inception to February 2021. We included cohort, case-control and observational studies reporting original data investigating the development of GDM compared to a control group. Two independent reviewers selected eligible studies for meta-analysis. Data collection and analyses were performed by two independent reviewers. The PROSPERO registration number is CRD42020185951. Of 120 articles retrieved, 24 studies met the eligibility criteria, comparing a total of 1779 pregnant women (904 GDM and 875 controls). A total of 262 GDM candidate biomarkers (CBs) were identified, with 49 CBs reported in at least two studies. We found 22 highly replicable CBs that were significantly different (nine CBs were upregulated and 12 CBs downregulated) between women with GDM and controls across various proteomic platforms, sample types, blood fractions, times of blood collection and continents. We performed further analyses on blood (plasma/serum) CBs in early pregnancy (first and/or early second trimester) and included studies with more than nine samples (nine studies in total). We found that 11 CBs were significantly upregulated, and 13 CBs significantly downregulated in women with GDM compared to controls. Subsequent pathway analysis using Database for Annotation, Visualization and Integrated Discovery (DAVID) bioinformatics resources found that these CBs were most strongly linked to pathways related to complement and coagulation cascades. Our findings provide important insights and form a strong foundation for future validation studies to establish reliable biomarkers for GDM.
(This article belongs to the Section Obstetrics & Gynecology)

13 pages, 6750 KiB  
Article
Darling: A Web Application for Detecting Disease-Related Biomedical Entity Associations with Literature Mining
by Evangelos Karatzas, Fotis A. Baltoumas, Ioannis Kasionis, Despina Sanoudou, Aristides G. Eliopoulos, Theodosios Theodosiou, Ioannis Iliopoulos and Georgios A. Pavlopoulos
Biomolecules 2022, 12(4), 520; https://doi.org/10.3390/biom12040520 - 30 Mar 2022
Cited by 16 | Viewed by 4719
Abstract
Finding, exploring and filtering frequent sentence-based associations between a disease and a biomedical entity, co-mentioned in disease-related PubMed literature, is a challenge as the volume of publications increases. Darling is a web application that uses Named Entity Recognition to identify human-related biomedical terms in PubMed articles, mentioned in OMIM, DisGeNET and Human Phenotype Ontology (HPO) disease records, and generates an interactive biomedical entity association network. Nodes in this network represent genes, proteins, chemicals, functions, tissues, diseases, environments and phenotypes. Users can search by identifiers, terms/entities or free text and explore the relevant abstracts in an annotated format.
(This article belongs to the Collection Feature Papers in Bioinformatics and Systems Biology Section)

14 pages, 347 KiB  
Review
Semantic Metadata Annotation Services in the Biomedical Domain—A Literature Review
by Julia Sasse, Johannes Darms and Juliane Fluck
Appl. Sci. 2022, 12(2), 796; https://doi.org/10.3390/app12020796 - 13 Jan 2022
Cited by 9 | Viewed by 4244
Abstract
For all research data collected, data descriptions and information about the corresponding variables are essential for data analysis and reuse. To enable cross-study comparisons and analyses, semantic interoperability of metadata is one of the most important requirements. In the area of clinical and epidemiological studies, data collection instruments such as case report forms (CRFs), data dictionaries and questionnaires are critical for metadata collection. Even though data collection instruments are often created in a digital form, they are mostly not machine readable; i.e., they are not semantically coded. As a result, the comparison between data collection instruments is complex. The German project NFDI4Health is dedicated to the development of national research data infrastructure for personal health data, and as such searches for ways to enhance semantic interoperability. Retrospective integration of semantic codes into study metadata is important, as ongoing or completed studies contain valuable information. However, this is labor intensive and should be eased by software. To understand the market and find out what techniques and technologies support retrospective semantic annotation/enrichment of metadata, we conducted a literature review. In NFDI4Health, we identified basic requirements for semantic metadata annotation software in the biomedical field and in the context of the FAIR principles. Ten relevant software systems were summarized and aligned with those requirements. We concluded that despite active research on semantic annotation systems, no system meets all requirements. Consequently, further research and software development in this area is needed, as interoperability of data dictionaries, questionnaires and data collection tools is key to reusing and combining results from independent research studies.
(This article belongs to the Special Issue Semantic Interoperability and Applications in Healthcare)

13 pages, 1144 KiB  
Article
PlasmiR: A Manual Collection of Circulating microRNAs of Prognostic and Diagnostic Value
by Spyros Tastsoglou, Marios Miliotis, Ioannis Kavakiotis, Athanasios Alexiou, Eleni C. Gkotsi, Anastasia Lambropoulou, Vasileios Lygnos, Vasiliki Kotsira, Vasileios Maroulis, Dimitrios Zisis, Giorgos Skoufos and Artemis G. Hatzigeorgiou
Cancers 2021, 13(15), 3680; https://doi.org/10.3390/cancers13153680 - 22 Jul 2021
Cited by 10 | Viewed by 3957
Abstract
Only recently, microRNAs (miRNAs) were found to exist in traceable and distinctive amounts in the human circulatory system, bringing forth the intriguing possibility of using them as minimally invasive biomarkers. miRNAs are short non-coding RNAs that act as potent post-transcriptional regulators of gene expression. Extensive studies in cancer and other disease landscapes investigate the protective/pathogenic functions of dysregulated miRNAs, as well as their biomarker potential. A specialized resource amassing experimentally verified, circulating miRNA biomarkers does not exist. We queried the existing literature to identify articles assessing diagnostic/prognostic roles of miRNAs in blood, serum, or plasma samples. Articles were scrutinized in order to exclude instances lacking sufficient experimental documentation or employing no biomarker assessment methods. We incorporated information from more than 200 biomedical articles, annotating crucial meta-information including cohort sizes, inclusion-exclusion criteria, disease/healthy confirmation methods and quantification details. miRNAs and diseases were systematically characterized using reference resources. Our circulating miRNA biomarker collection is provided as an online database, plasmiR. It consists of 1021 entries regarding 251 miRNAs and 112 diseases. More than half of plasmiR’s entries refer to cancerous and neoplastic conditions, 183 of them (32%) describing prognostic associations. plasmiR facilitates smart queries, emphasizing visualization and exploratory modes for all researchers.
(This article belongs to the Special Issue Circulating miRNAs as Tumor Biomarkers)

24 pages, 4909 KiB  
Article
A Literature-Derived Knowledge Graph Augments the Interpretation of Single Cell RNA-seq Datasets
by Deeksha Doddahonnaiah, Patrick J. Lenehan, Travis K. Hughes, David Zemmour, Enrique Garcia-Rivera, A. J. Venkatakrishnan, Ramakrishna Chilaka, Apoorv Khare, Akhil Kasaraneni, Abhinav Garg, Akash Anand, Rakesh Barve, Viswanathan Thiagarajan and Venky Soundararajan
Genes 2021, 12(6), 898; https://doi.org/10.3390/genes12060898 - 10 Jun 2021
Cited by 6 | Viewed by 5981
Abstract
Technology to generate single cell RNA-sequencing (scRNA-seq) datasets and tools to annotate them have advanced rapidly in the past several years. Such tools generally rely on existing transcriptomic datasets or curated databases of cell type defining genes, while the application of scalable natural language processing (NLP) methods to enhance analysis workflows has not been adequately explored. Here we deployed an NLP framework to objectively quantify associations between a comprehensive set of over 20,000 human protein-coding genes and over 500 cell type terms across over 26 million biomedical documents. The resultant gene-cell type associations (GCAs) are significantly stronger between a curated set of matched cell type-marker pairs than the complementary set of mismatched pairs (Mann-Whitney p = 6.15 × 10⁻⁷⁶, r = 0.24; Cohen’s d = 2.6). Building on this, we developed an augmented annotation algorithm (single cell Annotation via Literature Encoding, or scALE) that leverages GCAs to categorize cell clusters identified in scRNA-seq datasets, and we tested its ability to predict the cellular identity of 133 clusters from nine datasets of human breast, colon, heart, joint, ovary, prostate, skin, and small intestine tissues. With the optimized settings, the true cellular identity matched the top prediction in 59% of tested clusters and was present among the top five predictions for 91% of clusters. scALE slightly outperformed an existing method for reference data driven automated cluster annotation, and we demonstrate that integration of scALE can meaningfully improve the annotations derived from such methods. Further, contextualization of differential expression analyses with these GCAs highlights poorly characterized markers of well-studied cell types, such as CLIC6 and DNASE1L3 in retinal pigment epithelial cells and endothelial cells, respectively. Taken together, this study illustrates for the first time how the systematic application of a literature-derived knowledge graph can expedite and enhance the annotation and interpretation of scRNA-seq data.
(This article belongs to the Section Technologies and Resources for Genetics)

19 pages, 2303 KiB  
Article
In Silico Exploration of the Potential Role of Acetaminophen and Pesticides in the Etiology of Autism Spectrum Disorder
by Tristan Furnary, Rolando Garcia-Milian, Zeyan Liew, Shannon Whirledge and Vasilis Vasiliou
Toxics 2021, 9(5), 97; https://doi.org/10.3390/toxics9050097 - 27 Apr 2021
Viewed by 4705
Abstract
Recent epidemiological studies suggest that prenatal exposure to acetaminophen (APAP) is associated with increased risk of Autism Spectrum Disorder (ASD), a neurodevelopmental disorder affecting 1 in 59 children in the US. Maternal and prenatal exposure to pesticides from food and environmental sources has also been implicated in affecting fetal neurodevelopment. However, the underlying mechanisms for ASD are so far unknown, likely with complex and multifactorial etiology. The aim of this study was to explore the potential effects of APAP and pesticide exposure on development with regard to the etiology of ASD by highlighting common genes and biological pathways. Genes associated with APAP, pesticides, and ASD through human research were retrieved from molecular and biomedical literature databases. The interaction network of overlapping genetic associations was subjected to network topology analysis and functional annotation of the resulting clusters. These genes were over-represented in pathways and biological processes (FDR p < 0.05) related to apoptosis, metabolism of reactive oxygen species (ROS), and carbohydrate metabolism. Since these three biological processes are frequently implicated in ASD, our findings support the hypothesis that cell death processes and specific metabolic pathways, both of which appear to be targeted by APAP and pesticide exposure, may be involved in the etiology of ASD. This novel exposures-gene-disease database mining might inspire future work on understanding the biological underpinnings of various ASD risk factors.
(This article belongs to the Special Issue Prenatal Environmental Exposure and Autism Risk)

20 pages, 1521 KiB  
Article
A Comparative Analysis of Active Learning for Biomedical Text Mining
by Usman Naseem, Matloob Khushi, Shah Khalid Khan, Kamran Shaukat and Mohammad Ali Moni
Appl. Syst. Innov. 2021, 4(1), 23; https://doi.org/10.3390/asi4010023 - 15 Mar 2021
Cited by 51 | Viewed by 6939
Abstract
An enormous amount of clinical free-text information, such as pathology reports, progress reports, clinical notes and discharge summaries, has been collected at hospitals and medical care clinics. These data provide an opportunity to develop many useful machine learning applications if the data can be transferred into a learnable structure with appropriate labels for supervised learning. The annotation of these data has to be performed by qualified clinical experts, and the high cost of annotation limits their use. Active learning (AL), an underutilised machine learning technique that can label new data, is a promising candidate to address the high cost of labelling. AL has been successfully applied to labelling for speech recognition and text classification; however, there is a lack of literature investigating its use for clinical purposes. We performed a comparative investigation of various AL techniques using ML and deep learning (DL)-based strategies on three unique biomedical datasets. We investigated random sampling (RS), least confidence (LC), informative diversity and density (IDD), margin and maximum representativeness-diversity (MRD) AL query strategies. Our experiments show that AL has the potential to significantly reduce the cost of manual labelling. Furthermore, pre-labelling performed using AL expedites the labelling process by reducing the time required for labelling.
(This article belongs to the Special Issue Advanced Machine Learning Techniques, Applications and Developments)
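Two of the query strategies named in the abstract above, least confidence (LC) and margin, have simple textbook definitions. The sketch below illustrates those general definitions only. It is not the authors' implementation: the helper names (`least_confidence`, `margin`, `select_queries`) and the example probabilities are hypothetical, and the class probabilities are assumed to come from some already-trained classifier.

```python
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def margin(probs: Sequence[float]) -> float:
    """Gap between the two most likely classes; a small gap means high uncertainty."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_queries(pool_probs: Sequence[Sequence[float]], k: int,
                   strategy: str = "lc") -> List[int]:
    """Return indices of the k unlabeled items to send to a human annotator."""
    if strategy == "lc":
        # Most uncertain first: sort by descending least-confidence score.
        key = lambda i: -least_confidence(pool_probs[i])
    elif strategy == "margin":
        # Smallest margin first.
        key = lambda i: margin(pool_probs[i])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(range(len(pool_probs)), key=key)[:k]


# Hypothetical predicted class probabilities for three unlabeled documents.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # low top probability
    [0.50, 0.49, 0.01],  # two classes nearly tied
]
print(select_queries(pool, k=2, strategy="lc"))
print(select_queries(pool, k=2, strategy="margin"))
```

The two strategies can rank the same pool differently: LC looks only at the top class probability, whereas margin looks at how close the runner-up is.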

12 pages, 868 KiB  
Article
Deep Learning Based Biomedical Literature Classification Using Criteria of Scientific Rigor
by Muhammad Afzal, Beom Joo Park, Maqbool Hussain and Sungyoung Lee
Electronics 2020, 9(8), 1253; https://doi.org/10.3390/electronics9081253 - 5 Aug 2020
Cited by 8 | Viewed by 4127
Abstract
A major obstacle to supporting evidence-based clinical decision-making is accurately and efficiently recognizing appropriate and scientifically rigorous studies in the biomedical literature. We trained a multi-layer perceptron (MLP) model on a dataset with two textual features, title and abstract. The dataset, consisting of 7958 PubMed citations classified into two classes (scientific rigor and non-rigor), was used to train the proposed model. We compare our model with other promising machine learning models such as Support Vector Machine (SVM), Decision Tree, Random Forest, and Gradient Boosted Tree (GBT) approaches. Based on the higher cumulative score, deep learning was chosen and was tested on test datasets obtained by running a set of domain-specific queries. On the training dataset, the proposed deep learning model obtained significantly higher accuracy and AUC (97.3% and 0.993, respectively) than the competitors, but its recall of 95.1% was slightly lower than that of GBT. The trained model sustained its performance on the testing datasets. Unlike previous approaches, the proposed model does not require a human expert to create fresh annotated data; instead, we used studies cited in Cochrane reviews as a surrogate for quality studies in a clinical topic. We find that deep learning methods are beneficial for biomedical literature classification. Not only do such methods minimize the workload in feature engineering, but they also show better performance on large and noisy data.

14 pages, 2526 KiB  
Article
Unsupervised Learning for Concept Detection in Medical Images: A Comparative Analysis
by Eduardo Pinho and Carlos Costa
Appl. Sci. 2018, 8(8), 1213; https://doi.org/10.3390/app8081213 - 24 Jul 2018
Cited by 9 | Viewed by 3952
Abstract
As digital medical imaging becomes more prevalent and archives increase in size, representation learning presents an interesting opportunity for enhanced medical decision support systems. On the other hand, medical imaging data is often scarce and short on annotations. In this paper, we present an assessment of unsupervised feature learning approaches for images in biomedical literature which can be applied to automatic biomedical concept detection. Six unsupervised representation learning methods were built, including traditional bags of visual words, autoencoders, and generative adversarial networks. Each model was trained, and its feature space was evaluated using images from the ImageCLEF 2017 concept detection task. The highest mean F1 score of 0.108 was obtained using representations from an adversarial autoencoder, which increased to 0.111 when combined with the representations from the sparse denoising autoencoder. We conclude that it is possible to obtain more powerful representations with modern deep learning approaches than with previously popular computer vision methods. The possibility of semi-supervised learning as well as its use in medical information retrieval problems are the next steps to be strongly considered.
(This article belongs to the Special Issue Deep Learning and Big Data in Healthcare)