BTCAM: Attention-Based BiLSTM for Imbalanced Classification of Political Inquiry Messages
Abstract
1. Introduction
- As illustrated in Figure 1, political inquiry messages are usually short, with most containing fewer than 100 words. This length limitation leaves the associations and contextual information between elements incomplete, which can cause ambiguity during classification. For example, the message "There is construction at night near the community. The sound is very loud and affects rest. I hope it can be solved." could involve either "Urban and rural construction" or "Transportation"; deciding between them requires more contextual information. If the message concerns road repair or expansion, it leans toward "Transportation"; if it concerns the construction of facilities within the community, it falls under "Urban and rural construction".
- Another problem is data imbalance. Figure 2 shows the proportions of the different categories of political inquiry messages: "Urban and rural construction" is the largest category at 22.7%, followed by "Labor and social security", while "Transportation" is the smallest at 8.2%. This pronounced skew in the category distribution negatively impacts model performance.
- We propose a topic-based data augmentation algorithm (TDA). Experiments demonstrate that TDA not only enhances the linguistic quality of the generated sentences but also substantially improves the efficiency of data processing.
- We design a streamlined convolutional block attention mechanism (SCBAM) and demonstrate its effectiveness in short-text classification. This plug-and-play module can be seamlessly integrated into diverse model architectures with minimal computational overhead.
- We propose BTCAM, an attention-based BiLSTM for the imbalanced classification of political inquiry messages. Extensive experiments show that BTCAM outperforms state-of-the-art baselines.
2. Related Work
2.1. Text Classification
2.2. Data Augmentation
2.3. Attention Mechanism
3. Methodology
3.1. Preliminaries
3.1.1. Notation
3.1.2. Topic Modeling
3.2. Framework Overview of BTCAM
3.2.1. Topic-Based Data Augmentation
3.2.2. Embedding Layer
3.2.3. Context-Information-Extraction Layer
3.2.4. Dual-Gated Convolution Mechanism Layer
3.2.5. Output Layer
Algorithm 1: The pseudocode of BTCAM
Input: A set of messages, where each element is an individual message.
Output: Predicted labels for the input messages.
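Since the body of Algorithm 1 is not reproduced above, the following PyTorch sketch shows one plausible reading of the BTCAM pipeline described in Section 3.2 (embedding layer, BiLSTM context extraction, attention, output layer). All class names and layer sizes are illustrative assumptions, and the single-head attention here merely stands in for the full SCBAM module; this is not the authors' code.

```python
# Minimal sketch of the BTCAM forward pipeline (illustrative; names are assumptions).
import torch
import torch.nn as nn

class BTCAMSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128, num_classes=7):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)       # embedding layer
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,               # context-information
                              batch_first=True, bidirectional=True)  # extraction layer
        # Stand-in for SCBAM: a single learned attention over time steps.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)   # output layer

    def forward(self, token_ids):
        x = self.embedding(token_ids)                # (batch, seq_len, embed_dim)
        h, _ = self.bilstm(x)                        # (batch, seq_len, 2*hidden_dim)
        weights = torch.softmax(self.attn(h), dim=1) # attention weights per position
        context = (weights * h).sum(dim=1)           # weighted sum -> sentence vector
        return self.classifier(context)              # class logits

# Usage: logits = BTCAMSketch(vocab_size=20000)(torch.randint(0, 20000, (4, 32)))
```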
4. Experiment
- RQ1: Does BTCAM outperform other baseline models?
- RQ2: Is the combination of topic-based data augmentation and the streamlined convolutional block attention mechanism crucial for BTCAM?
- RQ3: What is the impact of different attention mechanisms?
- RQ4: What is the efficiency and quality of the TDA algorithm?
4.1. Datasets
4.2. Evaluation Metrics
- TP (true positive): The number of samples correctly predicted as positive.
- TN (true negative): The number of samples that are correctly predicted as negative.
- FP (false positive): The number of samples incorrectly predicted as positive.
- FN (false negative): The number of samples incorrectly predicted as negative.
- Accuracy: the proportion of all samples that are predicted correctly: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
- Precision: the proportion of samples predicted as positive that are actually positive: $\text{Precision} = \frac{TP}{TP + FP}$
- Recall: the proportion of actual positive samples that are correctly predicted as positive: $\text{Recall} = \frac{TP}{TP + FN}$
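As a quick sanity check of these definitions, the snippet below computes them with scikit-learn on hypothetical labels (an illustrative tooling choice; the paper does not name an evaluation library):

```python
# Sanity check of the metric definitions with scikit-learn (toy example data).
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical gold labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # hypothetical model predictions

print(accuracy_score(y_true, y_pred))    # (TP + TN) / total = 0.75
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 0.75
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 0.75
# For the multi-class message categories, macro averaging would apply:
# precision_score(y_true, y_pred, average="macro")
```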
4.3. Baselines and Experiment Setup
- TextCNN [38]: This model is a classic baseline for text classification.
- TextRNN [39]: This method applies recurrent neural networks, typically bidirectional LSTMs or GRUs, to text classification.
- FastText [40]: This model extends word2vec to handle sentence and document classification, with a very fast training speed.
- BiLSTM_ATT [41]: This model combines a BiLSTM with an attention mechanism that captures the most salient information in the text.
- DPCNN [42]: This model deepens the convolutional network to capture long-distance dependencies in text, yielding better classification results.
- BERT [13]: This method is a pre-trained deep bidirectional language model based on the Transformer architecture; it improves performance by learning contextualized word representations from large-scale text.
4.4. Results and Discussion
4.4.1. Comparison of Baseline Models (RQ1)
- Without data augmentation, FastText, DPCNN, and TextCNN perform poorly on both the training and validation datasets. Although TextRNN, BiLSTM_ATT, and BERT improve training accuracy, recall remains as low as 0.632 and 0.651.
- After using the proposed TDA algorithm to augment the minority categories, every model achieves training accuracy above 0.9 and validation accuracy and recall above 0.8. TDA thus improves the performance of all models substantially.
- Compared with the baselines, BTCAM converges faster and achieves the best performance. On the training dataset, its accuracy reaches 0.996 and its loss drops to 0.030. On the validation dataset, its precision and recall both exceed 0.93.
- To further verify the generalizability of BTCAM, additional experiments were conducted on the public THUCNews dataset, a widely used benchmark for Chinese short-text classification. As shown in Table 5, BTCAM outperforms all baselines by a significant margin, thus confirming its adaptability beyond government-specific data.
4.4.2. Ablation Study (RQ2)
- When the TDA component is removed, training accuracy falls to 0.882, and validation accuracy and recall drop to 0.601 and 0.581, respectively. TDA therefore has a large impact on the performance of BTCAM, especially on recall, which shows that the TDA component is essential for BTCAM.
- When the SCBAM component is removed, training accuracy is 0.983, and validation accuracy and recall fall to 0.922 and 0.913, respectively. The model's performance drops by varying degrees, most noticeably on the validation dataset. This shows that SCBAM effectively improves the generalization ability of the model and is therefore necessary for BTCAM.
4.4.3. Analysis of Attention Mechanism (RQ3)
4.4.4. Analysis of Efficiency and Quality (RQ4)
- Compared with EDA, the execution time of TDA is reduced from 10,488 s to 129 s, and the efficiency of the algorithm has been greatly improved.
- Models perform best on the data generated by the TDA algorithm. The reason is that TDA produces higher-quality samples with clearer boundaries between categories, which demonstrates the superiority of the TDA algorithm.
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Jeong, H.; Lee, T.H.; Hong, S.-G. A corpus analysis of electronic petitions for improving the responsiveness of public services: Focusing on Busan petition. Korean J. Local Gov. Stud. 2017, 21, 423–436.
- Kim, N.; Hong, S. Automatic classification of citizen requests for transportation using deep learning: Case study from Boston city. Inf. Process. Manag. 2021, 58, 102410.
- Dalianis, H.; Sjöbergh, J.; Sneiders, E. Comparing manual text patterns and machine learning for classification of e-mails for automatic answering by a government agency. In Computational Linguistics and Intelligent Text Processing; Springer: Berlin/Heidelberg, Germany, 2011; pp. 234–243.
- Wirtz, B.W.; Müller, W.M. An integrated artificial intelligence framework for public management. Public Manag. Rev. 2018, 21, 1076–1100.
- Hong, S.G.; Kim, H.J.; Choi, H.R. An analysis of civil traffic complaints using text mining. Int. Inf. Inst. (Tokyo) Inf. 2016, 19, 4995.
- Hagen, L. Content analysis of e-petitions with topic modeling: How to train and evaluate LDA models? Inf. Process. Manag. 2018, 54, 1292–1307.
- Ryu, S.; Hong, S.; Lee, T.; Kim, N. A pattern analysis of bus civil complaint in Busan city using the text network analysis. Korean Assoc. Comput. Account. 2018, 16, 19–43.
- Fu, J.; Lee, S. A multi-class SVM classification system based on learning methods from indistinguishable Chinese official documents. Expert Syst. Appl. 2012, 39, 3127–3134.
- Dieng, A.B.; Wang, C.; Gao, J.; Paisley, J. TopicRNN: A recurrent neural network with long-range semantic dependency. arXiv 2016, arXiv:1611.01702.
- Hadi, M.U.; Qureshi, R.; Ahmed, A.; Iftikhar, N. A lightweight CORONA-NET for COVID-19 detection in X-ray images. Expert Syst. Appl. 2023, 225, 120023.
- Ozbek, A. Prediction of daily average seawater temperature using data-driven and deep learning algorithms. Neural Comput. Appl. 2023, 36, 365–383.
- Wang, Y.; Li, X.; Cheng, Y.; Du, Y.; Huang, D.; Chen, X.; Fan, Y. A neural probabilistic bounded confidence model for opinion dynamics on social networks. Expert Syst. Appl. 2024, 247, 123315.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional Transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (Long and Short Papers), pp. 4171–4186.
- Alves, A.L.F.; de Souza Baptista, C.; Firmino, A.A.; de Oliveira, M.G.; de Paiva, A.C. A spatial and temporal sentiment analysis approach applied to Twitter microtexts. J. Inf. Data Manag. 2015, 6, 118.
- Lai, S.; Xu, L.; Liu, K.; Zhao, J. Recurrent convolutional neural networks for text classification. Proc. AAAI Conf. Artif. Intell. 2015, 29, 1.
- Duan, Y.; Yao, L. The automatic classification and matching of informal texts on political media integration platform. Intell. Theory Pract. 2020, 43, 7.
- Sidi, W.; Guangwei, H.; Siyu, Y.; Yun, S. Research on automatic forwarding method of government website mailbox based on text classification. Data Anal. Knowl. Discov. 2020, 4, 51–59.
- Li, M.; Yin, K.; Wu, Y.; Guo, C.; Li, X. Research on government message text classification based on natural language processing. Comput. Knowl. Technol. 2021, 17, 160–161.
- Chen, J.; Tam, D.; Raffel, C.; Bansal, M.; Yang, D. An empirical survey of data augmentation for limited data learning in NLP. Trans. Assoc. Comput. Linguist. 2023, 11, 191–211.
- Feng, S.Y.; Gangal, V.; Wei, J.; Chandar, S.; Vosoughi, S.; Mitamura, T.; Hovy, E. A survey of data augmentation approaches for NLP. arXiv 2021, arXiv:2105.03075.
- Zhang, L.; Yang, Z.; Yang, D. TreeMix: Compositional constituency-based data augmentation for natural language understanding. arXiv 2022, arXiv:2205.06153.
- Hariharan, B.; Girshick, R. Low-shot visual recognition by shrinking and hallucinating features. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3018–3027.
- Chen, J.; Yang, Z.; Yang, D. MixText: Linguistically-informed interpolation of hidden space for semi-supervised text classification. arXiv 2020, arXiv:2004.12239.
- Sennrich, R.; Haddow, B.; Birch, A. Improving neural machine translation models with monolingual data. arXiv 2015, arXiv:1511.06709.
- Yang, Y.; Malaviya, C.; Fernandez, J.; Swayamdipta, S.; Bras, R.L.; Wang, J.P.; Bhagavatula, C.; Choi, Y.; Downey, D. Generative data augmentation for commonsense reasoning. arXiv 2020, arXiv:2004.11546.
- Wei, J.; Zou, K. EDA: Easy data augmentation techniques for boosting performance on text classification tasks. arXiv 2019, arXiv:1901.11196.
- Karimi, A.; Rossi, L.; Prati, A. AEDA: An easier data augmentation technique for text classification. arXiv 2021, arXiv:2108.13230.
- Fu, Y.; Xu, D.; He, K.; Li, H.; Zhang, T. Image inpainting based on edge features and attention mechanism. In Proceedings of the 2022 5th International Conference on Image and Graphics Processing (ICIGP), Beijing, China, 7–9 January 2022; pp. 64–71.
- Jia, K. Sentiment classification of microblog: A framework based on BERT and CNN with attention mechanism. Comput. Electr. Eng. 2022, 101, 108032.
- Zheng, Y.; Shao, Z.; Gao, Z.; Deng, M.; Zhai, X. Optimizing the online learners’ verbal intention classification efficiency based on the multi-head attention mechanism algorithm. Int. J. Found. Comput. Sci. 2022, 33, 717–733.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Deerwester, S.; Dumais, S.T.; Furnas, G.W.; Landauer, T.K.; Harshman, R. Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 1990, 41, 391–407.
- Hofmann, T. Probabilistic latent semantic analysis. arXiv 2013, arXiv:1301.6705.
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
- Khan, S.; Durrani, S.; Shahab, M.B.; Johnson, S.J.; Camtepe, S. Joint user and data detection in grant-free NOMA with attention-based BiLSTM network. arXiv 2022, arXiv:2209.06392.
- Li, J.; Sun, M. Scalable term selection for text categorization. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, 28–30 June 2007; pp. 774–782.
- Kim, Y. Convolutional neural networks for sentence classification. arXiv 2014, arXiv:1408.5882.
- Liu, P.; Qiu, X.; Huang, X. Recurrent neural network for text classification with multi-task learning. arXiv 2016, arXiv:1605.05101.
- Joulin, A.; Grave, E.; Bojanowski, P.; Mikolov, T. Bag of tricks for efficient text classification. arXiv 2016, arXiv:1607.01759.
- Zhou, P.; Shi, W.; Tian, J.; Qi, Z.; Li, B.; Hao, H.; Xu, B. Attention-based bidirectional long short-term memory networks for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Volume 2, pp. 207–212.
- Salerno, M.; Sargeni, F.; Bonaiuto, V. DPCNN: A modular chip for large CNN arrays. In Proceedings of the ISCAS’95—International Symposium on Circuits and Systems, Seattle, WA, USA, 30 April–3 May 1995; Volume 1, pp. 417–420.
| Symbol | Description |
|---|---|
| x | The embedding of a message |
| y | The label of a message |
| M | Collection of documents |
| N | Length of a document |
| α | Dirichlet prior for the per-message topic distribution |
| β | Dirichlet prior for the per-topic word distribution |
| θ | Multinomial distribution of topics in a given message |
| φ | Multinomial distribution of words in a given topic |
| Z | The topic assignments of messages |
| W | The words of messages |
| T | Collection of topics |
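To ground this notation, the snippet below is a minimal topic-modeling sketch using gensim's LDA implementation (an assumed library choice; the toy corpus and parameters are illustrative, and `alpha`/`eta` correspond to the Dirichlet priors α and β in the table):

```python
# Minimal LDA sketch with gensim (assumed library; hypothetical toy corpus).
from gensim import corpora
from gensim.models import LdaModel

# Toy pre-tokenized messages (real Chinese messages would be segmented first).
docs = [
    ["road", "construction", "noise", "night"],
    ["bus", "route", "traffic", "delay"],
    ["school", "enrollment", "education", "policy"],
]

dictionary = corpora.Dictionary(docs)              # maps words to integer ids
corpus = [dictionary.doc2bow(d) for d in docs]     # bag-of-words representation

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=3,
               alpha="auto", eta="auto", random_state=0)

# Topic-term sets like these feed the topic-term insertion/replacement in TDA.
for topic_id, terms in lda.show_topics(num_topics=3, num_words=4, formatted=False):
    print(topic_id, [w for w, _ in terms])
```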
| Operation | Specific Method |
|---|---|
| Topic-term insertion (TI) | Randomly select a word from the sentence that belongs to the set of category topic terms and insert one of its randomly chosen synonyms at a random position in the sentence. |
| Topic-term replacement (TR) | Randomly select n words from the sentence that belong to the set of category topic terms and replace each with a randomly chosen synonym. |
| Random swap (RS) | Randomly swap the positions of two words in the sentence. |
| Insertion punctuation (IP) | With probability p, insert n punctuation marks drawn from {".", ";", "?", ":", "!", ","}. |
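A minimal Python sketch of these four operations follows; the synonym dictionary and topic-term set are hypothetical inputs (in BTCAM they would come from the topic model of Section 3.1.2), so this is an illustration of the operations, not the authors' implementation:

```python
# Minimal sketch of the four TDA operations (synonym/topic sources are placeholders).
import random

def topic_term_insertion(words, topic_terms, synonyms):
    """TI: insert a random synonym of a randomly chosen topic term."""
    candidates = [w for w in words if w in topic_terms and w in synonyms]
    if not candidates:
        return words
    word = random.choice(candidates)
    out = words[:]
    out.insert(random.randrange(len(out) + 1), random.choice(synonyms[word]))
    return out

def topic_term_replacement(words, topic_terms, synonyms, n=1):
    """TR: replace up to n topic terms with random synonyms."""
    out = words[:]
    idxs = [i for i, w in enumerate(out) if w in topic_terms and w in synonyms]
    for i in random.sample(idxs, min(n, len(idxs))):
        out[i] = random.choice(synonyms[out[i]])
    return out

def random_swap(words):
    """RS: swap the positions of two randomly chosen words."""
    out = words[:]
    if len(out) >= 2:
        i, j = random.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def insert_punctuation(words, n=1, p=0.3):
    """IP: with probability p, insert n random punctuation marks."""
    marks = [".", ";", "?", ":", "!", ","]
    out = words[:]
    if random.random() < p:
        for _ in range(n):
            out.insert(random.randrange(len(out) + 1), random.choice(marks))
    return out
```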
| Hyperparameter | Value |
|---|---|
| Max sequence length | 32 |
| Max epochs | 15 |
| Learning rate |  |
| Batch size | 32 |
| Dropout | 0.7 |
| Word vector dimension | 300 |
| Model | Train Loss | Train Accuracy | Val Accuracy | Val Recall |
|---|---|---|---|---|
| TDA+FastText | 0.270 | 0.910 | 0.869 | 0.828 |
| TDA+DPCNN | 0.213 | 0.931 | 0.855 | 0.842 |
| TDA+TextCNN | 0.081 | 0.957 | 0.892 | 0.884 |
| TDA+TextRNN | 0.043 | 0.968 | 0.911 | 0.910 |
| TDA+BiLSTM_ATT | 0.041 | 0.967 | 0.927 | 0.918 |
| TDA+BERT | 0.037 | 0.988 | 0.928 | 0.921 |
| BTCAM (ours) | 0.030 | 0.996 | 0.939 | 0.931 |
| Model | Train Loss | Train Accuracy | Val Accuracy | Val Recall |
|---|---|---|---|---|
| FastText | 1.102 | 0.745 | 0.732 | 0.509 |
| DPCNN | 0.915 | 0.801 | 0.789 | 0.667 |
| TextCNN | 0.883 | 0.832 | 0.818 | 0.693 |
| TextRNN | 0.508 | 0.865 | 0.841 | 0.746 |
| BiLSTM_ATT | 0.335 | 0.892 | 0.872 | 0.751 |
| BERT | 0.301 | 0.910 | 0.901 | 0.814 |
| TDA+FastText | 0.295 | 0.915 | 0.892 | 0.875 |
| TDA+DPCNN | 0.228 | 0.928 | 0.910 | 0.892 |
| TDA+TextCNN | 0.098 | 0.948 | 0.925 | 0.914 |
| TDA+TextRNN | 0.051 | 0.961 | 0.938 | 0.927 |
| TDA+BiLSTM_ATT | 0.048 | 0.963 | 0.946 | 0.935 |
| TDA+BERT | 0.042 | 0.975 | 0.952 | 0.941 |
| BTCAM (ours) | 0.035 | 0.981 | 0.962 | 0.951 |
| Category | Precision | Recall | F1 |
|---|---|---|---|
| Labor and social security | 0.912 | 0.956 | 0.932 |
| Environmental protection | 0.983 | 0.913 | 0.943 |
| Transportation | 0.911 | 0.926 | 0.917 |
| Business and tourism | 0.932 | 0.922 | 0.928 |
| Education and sports | 0.930 | 0.924 | 0.926 |
| Urban and rural construction | 0.935 | 0.948 | 0.934 |
| Health and birth control | 0.938 | 0.934 | 0.930 |
| Arrangement | Train Loss | Train Accuracy | Val Accuracy | Val Recall |
|---|---|---|---|---|
| w/o TDA | 0.431 | 0.882 | 0.601 | 0.581 |
| w/o SCBAM | 0.0982 | 0.983 | 0.922 | 0.913 |
| BTCAM (ours) | 0.030 | 0.996 | 0.939 | 0.931 |
| Arrangement | Train Loss | Train Accuracy | Val Accuracy | Val Recall |
|---|---|---|---|---|
| CBAM | 0.051 | 0.983 | 0.911 | 0.921 |
| SE + spatial | 0.072 | 0.970 | 0.923 | 0.920 |
| SCBAM& (MaxPool & spatial) | 0.054 | 0.981 | 0.921 | 0.922 |
| SCBAM+ (MaxPool + spatial) | 0.030 | 0.996 | 0.939 | 0.931 |
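For reference, the following PyTorch sketch shows one plausible SCBAM-style block matching the best-performing variant above: a channel attention branch using only max pooling (the "streamlined" simplification relative to CBAM's parallel max and average pooling) followed by a spatial attention branch. The exact layer composition and sizes are assumptions, not the authors' implementation.

```python
# Sketch of an SCBAM-style block (assumed structure: max-pool-only channel
# attention followed by spatial attention, as a streamlined CBAM variant).
import torch
import torch.nn as nn

class SCBAMSketch(nn.Module):
    def __init__(self, channels, reduction=4, kernel_size=7):
        super().__init__()
        # Channel attention: global max pooling only, instead of CBAM's
        # parallel max + average pooling.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: convolution over the channel-pooled feature map.
        self.spatial_conv = nn.Conv2d(1, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                          # x: (batch, channels, H, W)
        b, c, _, _ = x.shape
        ca = torch.sigmoid(self.channel_mlp(x.amax(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * ca                                 # reweight channels
        sa = torch.sigmoid(self.spatial_conv(x.amax(dim=1, keepdim=True)))
        return x * sa                              # reweight spatial positions

# Usage: out = SCBAMSketch(64)(torch.randn(2, 64, 8, 8))
```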
Execution time of each augmentation method and the resulting training accuracy of each model:

| Method | Execution Time (s) | FastText | TextCNN | BiLSTM_ATT | TextRNN | BTCAM |
|---|---|---|---|---|---|---|
| EDA | 10,488 | 0.592 | 0.721 | 0.720 | 0.761 | 0.791 |
| AEDA | 8.36 | 0.613 | 0.742 | 0.884 | 0.881 | 0.897 |
| TDA | 129 | 0.910 | 0.957 | 0.961 | 0.968 | 0.996 |