Article

Classification of Full Text Biomedical Documents: Sections Importance Assessment

1 Computer Science Department, University of Vigo, Escuela Superior de Ingeniería Informática, 32004 Ourense, Spain
2 CINBIO—Biomedical Research Centre, University of Vigo, 36310 Vigo, Spain
3 SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, 36310 Vigo, Spain
4 Faculdade de Engenharia da Universidade do Porto, LIAAD-INESC TEC, 4200-465 Porto, Portugal
5 ISCAP—P.PORTO, CEOS.PP, LIACC, Campus da FEUP, 4369-00 Porto, Portugal
* Author to whom correspondence should be addressed.
Current address: Escuela Superior de Ingeniería Informática, 32004 Ourense, Spain.
These authors contributed equally to this work.
Academic Editor: Luis Javier Garcia Villalba
Appl. Sci. 2021, 11(6), 2674; https://doi.org/10.3390/app11062674
Received: 8 February 2021 / Revised: 7 March 2021 / Accepted: 10 March 2021 / Published: 17 March 2021
The exponential growth of documents on the web makes it very hard for researchers to stay aware of the relevant work being done within the scientific community. The task of efficiently retrieving information has therefore become an important research topic. The objective of this study is to test how the efficiency of text classification changes when different weights are assigned beforehand to the sections that compose the documents. The proposal takes into account the place (section) where terms are located in the document, and each section has a weight that can be modified depending on the corpus. To carry out the study, an extended version of the OHSUMED corpus with full documents has been created. Using WEKA, we compared the use of abstracts only with that of full texts, and we evaluated combinations of section weights to assess the significance of each section in the scientific article classification process. Classification used SMO (Sequential Minimal Optimization), the WEKA implementation of the Support Vector Machine (SVM) algorithm. The experimental results show that the proposed combinations of preprocessing techniques and feature selection achieve promising results for the task of full text scientific document classification. We also have evidence to conclude that datasets enriched with text from certain sections achieve better results than those using only titles and abstracts.
Keywords: full text classification; preprocessing techniques; section weighing scheme; information retrieval
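
The section weighing idea described in the abstract can be illustrated with a minimal sketch. It assumes that a term's score is its count in each section multiplied by that section's weight, summed over sections. The weight values, the section names, the `weighted_term_counts` helper, and the toy document are all hypothetical and are not taken from the article, which reports its experiments with WEKA rather than Python.

```python
import re
from collections import Counter

# Hypothetical section weights; the abstract does not state actual values.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "methods": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_term_counts(sections, weights, default_weight=1.0):
    """Sum term counts over sections, scaling each count by its section weight."""
    totals = Counter()
    for name, text in sections.items():
        weight = weights.get(name, default_weight)
        for term, count in Counter(tokenize(text)).items():
            totals[term] += weight * count
    return dict(totals)


# Toy document, invented for illustration only.
doc = {
    "title": "Heart disease classification",
    "abstract": "We study heart disease in adults.",
    "methods": "Adults were screened for disease.",
}

scores = weighted_term_counts(doc, SECTION_WEIGHTS)
print(scores["disease"])  # 3.0*1 + 2.0*1 + 1.0*1 = 6.0
```

In this sketch, a term in the title contributes three times as much as the same term in the methods section. The resulting weighted vectors could then be passed to any classifier, such as an SVM; changing the values in `SECTION_WEIGHTS` is the analogue of adapting the section weights to a given corpus.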
MDPI and ACS Style

Oliveira Gonçalves, C.A.; Camacho, R.; Gonçalves, C.T.; Seara Vieira, A.; Borrajo Diz, L.; Lorenzo Iglesias, E. Classification of Full Text Biomedical Documents: Sections Importance Assessment. Appl. Sci. 2021, 11, 2674. https://doi.org/10.3390/app11062674

AMA Style

Oliveira Gonçalves CA, Camacho R, Gonçalves CT, Seara Vieira A, Borrajo Diz L, Lorenzo Iglesias E. Classification of Full Text Biomedical Documents: Sections Importance Assessment. Applied Sciences. 2021; 11(6):2674. https://doi.org/10.3390/app11062674

Chicago/Turabian Style

Oliveira Gonçalves, Carlos A., Rui Camacho, Célia T. Gonçalves, Adrián Seara Vieira, Lourdes Borrajo Diz, and Eva Lorenzo Iglesias. 2021. "Classification of Full Text Biomedical Documents: Sections Importance Assessment" Applied Sciences 11, no. 6: 2674. https://doi.org/10.3390/app11062674
