Medical text categorization is a specialized area of text classification. Medical text includes medical records and medical literature, both of which are important clinical information resources. However, medical text contains complex medical vocabulary and medical measurements, which lead to high dimensionality and data sparsity, so text classification in the medical domain is more challenging than in general domains. To address these problems, this paper proposes a unified neural network method. For the sentence representation, a convolutional layer extracts features from the sentence, and a bidirectional gated recurrent unit (BIGRU) is used to access both the preceding and succeeding sentence features. An attention mechanism is then employed to obtain a sentence representation weighted by the important words. For the document representation, the method uses a BIGRU to encode the sentence representations obtained in the previous step and then decodes them through an attention mechanism to obtain a document representation weighted by the important sentences. Finally, the category of the medical text is obtained through a classifier. Experiments are conducted on four medical text datasets: two medical record datasets and two medical literature datasets. The results show that the method is effective.
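The two attention steps described above (words pooled into a sentence vector, then sentences pooled into a document vector) can be illustrated with a minimal sketch. It is not the paper's implementation: the convolutional layer, the BIGRU encoders, and the classifier are omitted, the input vectors simply stand in for encoder outputs, and the scoring function (a dot product between a context vector and the tanh of each state) is an assumption, as are the names `attention_pool`, `document_vector`, `word_context`, and `sent_context`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, context):
    """Pool a list of state vectors into one vector.

    Assumed scoring: score_t = context . tanh(state_t). The weights are the
    softmax of the scores, and the result is the weighted sum of the states.
    """
    scores = [
        sum(c * math.tanh(x) for c, x in zip(context, state))
        for state in states
    ]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [
        sum(w * state[i] for w, state in zip(weights, states))
        for i in range(dim)
    ]
    return pooled, weights


def document_vector(doc, word_context, sent_context):
    """Apply attention at the word level, then at the sentence level.

    `doc` is a list of sentences; each sentence is a list of word-level
    state vectors (placeholders for encoder outputs).
    """
    sentence_vectors = [
        attention_pool(sentence, word_context)[0] for sentence in doc
    ]
    return attention_pool(sentence_vectors, sent_context)


# Toy document: two sentences with 2-dimensional word states (made-up values).
doc = [
    [[0.1, 0.2], [0.9, 0.4]],
    [[0.3, 0.3]],
]
doc_vec, sentence_weights = document_vector(doc, [1.0, 0.0], [0.0, 1.0])
```

In this sketch `sentence_weights` plays the role of the important sentence weights, and the weights returned at the word level play the role of the important word weights; in a trained model the context vectors would be learned rather than fixed by hand.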
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.