Article

Research on Public Service Request Text Classification Based on BERT-BiLSTM-CNN Feature Fusion

1 School of Information Science and Technology, Hainan Normal University, Haikou 571127, China
2 State-Owned Assets Management Office, Hainan Normal University, Haikou 571158, China
3 Information Network and Data Center, Hainan Normal University, Haikou 571158, China
4 Key Laboratory of Data Science and Smart Education, Ministry of Education, Haikou 571158, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(14), 6282; https://doi.org/10.3390/app14146282
Submission received: 5 June 2024 / Revised: 13 July 2024 / Accepted: 16 July 2024 / Published: 18 July 2024

Abstract

Convolutional neural networks (CNNs) face challenges in capturing long-distance text correlations, and Bidirectional Long Short-Term Memory (BiLSTM) networks exhibit limited feature extraction capabilities for the text classification of public service requests. To address these problems, this work adopts an ensemble learning approach to integrate model components efficiently. This study presents a method for classifying public service request text using a hybrid neural network model called BERT-BiLSTM-CNN. First, BERT (Bidirectional Encoder Representations from Transformers) preprocesses the text to obtain vector representations. Then, BiLSTM captures contextual and sequential information. Next, a CNN extracts local features from the text. Finally, classification results are obtained through Softmax. Comparative analysis shows that fusing these three models outperforms other hybrid neural network architectures on multiple classification tasks and is highly effective for public service request text classification.

1. Introduction

Within natural language processing, text classification is a widely explored field. Some approaches use conventional machine learning techniques, including Naive Bayes [1], Support Vector Machine (SVM) [2], and K-Nearest Neighbor (KNN) [3,4]. Although these methods can achieve specific classification purposes, they suffer from long training times, slow processing speeds, low efficiency, and poor performance. As deep learning has advanced, its classification approaches have come to yield far better results than conventional machine learning techniques. Commonly used models include the Convolutional Neural Network (CNN) applied to text [5], the Recurrent Neural Network (RNN) [6,7], the Long Short-Term Memory network (LSTM) [8], Bidirectional Long Short-Term Memory (BiLSTM) [9], and the Attention Mechanism [10,11,12]. However, these models still struggle to capture complex semantic and syntactic information in text and typically rely on Word2Vec word vectors [13]. Because Word2Vec suffers from considerable semantic ambiguity, it cannot capture accurate semantic information for words with multiple meanings [14]. The pre-trained model BERT (Bidirectional Encoder Representations from Transformers) [15], developed by the Google team in 2018, directly addresses this semantic ambiguity problem. Studies have shown that in natural language processing tasks such as text categorization and sentiment analysis, BERT outperforms Word2Vec [12,16,17,18].
Despite significant advancements in machine learning and deep learning, a single model frequently has drawbacks in text classification tasks, such as a lengthy training period, limited scope, a large number of parameters, and poor interpretability, and its results often fall short of expectations. On the one hand, the information provided by the word vectors obtained from a single model is limited; it is also important to take into account word relationships as well as the position and context of words within sentences or documents. On the other hand, researchers generally find that a hybrid neural network that organically fuses multiple neural network models [19,20] can exploit the advantages of each single model while overcoming its disadvantages, improving classification performance to a large extent. Given these circumstances, this study proposes a BERT-BiLSTM-CNN (BBLC) hybrid neural network model, which effectively improves public service request text categorization performance by integrating existing models.
The contributions of this article are as follows:
  • Single neural network models exhibit shortcomings in handling complex semantic and syntactic structures, difficulties in processing long-range dependencies, and poor generalization capabilities in text classification tasks. The hybrid neural network model BBLC effectively addresses these deficiencies compared to single neural network models.
  • The BERT-BiLSTM-CNN model combines multiple structures, enabling comprehensive consideration of language context, sequence dependencies, and local patterns. This integrated feature extraction helps reduce model misjudgments and biases in understanding complex texts, leading to superior classification performance in multi-class tasks compared to other hybrid neural network models.
  • BERT-BiLSTM-CNN provides richer contextual information than Word2Vec-BiLSTM-CNN. Word2Vec, being a static word-embedding method, fails to capture semantic variations of words across different contexts. Consequently, BERT-BiLSTM-CNN achieves higher classification performance than its Word2Vec-based counterpart, as illustrated in Section 4.
The remainder of this article is organized as follows: Section 2 reviews related work. Section 3 introduces the structure of the hybrid neural network model BBLC. Section 4 compares the classification performance of the BBLC model with other hybrid neural network models. Section 5 provides a summary and outlook.

2. Related Work

Researchers have paid increasing attention to hybrid neural network models in recent years. For example, Li et al. [21] used BERT combined with a fully connected layer to conduct sentiment analysis on stock comments; compared with CNN, RNN, BiLSTM, and other models, this combination yields better sentiment classification. Cai [22] and Li et al. [23] combined the BERT and BiLSTM models for sentiment analysis, and the fused model performs well in sentiment classification of irregular texts. Kaur et al. [24], by combining a CNN with the BERT model, found that the CNN could extract a greater amount of semantic information, improving classification accuracy. Xie [25] and Deng et al. [26] presented a feature-enhancement fusion model that uses an attention mechanism [27] to integrate deep learning models such as CNN and LSTM and strengthens the capture of keyword information. Li et al. [17] use a BiLSTM-CNN that integrates multiple features to perform community Q&A text matching; after multi-feature fusion, the model achieves more robust matching performance. Bao et al. [28] proposed a hybrid BERT-based short-text classification algorithm that blends CNN and BiGRU and significantly outperforms single models on short-text classification tasks. Jiang [18] and Kaur et al. [29] combined BiLSTM and CNN on top of BERT and achieved good results in both binary and multi-class classification tasks. Based on the above analysis, this paper aims to make the best use of BERT, BiLSTM, and CNN by integrating them and proposing a text classification approach for public service requests using a hybrid neural network model called BBLC.

3. Model Design

3.1. BERT

The BERT paradigm has two stages: pre-training and fine-tuning. Its core is a multi-layer bidirectional Transformer encoder stacked from 12 encoder blocks, each with 12 self-attention heads and 768 hidden units in its feedforward layer. The output representation of each word is obtained by summing three embedding vectors: Token Embedding, Position Embedding, and Segment Embedding. Figure 1 displays the BERT input feature diagram. Two tasks are used to pre-train the model: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). MLM randomly masks words in input sentences and tasks the model with predicting the original form of the masked words, thus learning contextual representations of words in the text. NSP trains the model to understand the order of and relationship between sentences by predicting whether two sentences are consecutive. Fine-tuning is straightforward: most parameters are frozen, and only the parameters that require adaptation are updated.
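To make the input-representation step concrete, the following is a minimal sketch of obtaining BERT token vectors for a request text with the Hugging Face transformers library. The bert-base-chinese checkpoint and the example sentence are assumptions for illustration; the paper does not name the pretrained weights it uses.

```python
# Minimal sketch: BERT token representations for one public service request.
# Assumption: the `transformers` library and the `bert-base-chinese` checkpoint.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

text = "市民来电咨询疫情防控措施"          # illustrative request text
inputs = tokenizer(text, return_tensors="pt", padding="max_length",
                   truncation=True, max_length=300)   # Max_length = 300, as in Table 3

with torch.no_grad():
    outputs = bert(**inputs)

token_vectors = outputs.last_hidden_state   # [1, 300, 768]: one 768-d vector per token
print(token_vectors.shape)
```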

3.2. BiLSTM

BiLSTM combines a forward LSTM and a backward LSTM: one processes the sequence in the forward direction and the other in reverse. By stacking the two LSTMs, the model is freed from the constraint of predicting the output at the next moment solely from the temporal information of the previous moment. In the BBLC model, the BiLSTM layer processes the output of BERT, extracting and encoding the contextual information and semantic characteristics of the text. Figure 2 displays the structural diagram of BiLSTM.
After processing is completed, the outputs of the two LSTMs are concatenated as the current output of the BiLSTM. The calculation formulas are (1)–(3).
$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}(x_t, \overrightarrow{h_{t-1}})$  (1)
$\overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}(x_t, \overleftarrow{h_{t-1}})$  (2)
$h_t = [\overrightarrow{h_t} \oplus \overleftarrow{h_t}]$  (3)
In Formulas (1)–(3), $x_t$ denotes the input vector at time $t$; $\overrightarrow{h_{t-1}}$ denotes the forward hidden state at time $t-1$, while $\overleftarrow{h_{t-1}}$ denotes the backward hidden state at time $t-1$. Formula (3) concatenates the two outputs, where $\oplus$ denotes the concatenation operator.
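A minimal PyTorch sketch of Formulas (1)–(3) follows: a single bidirectional nn.LSTM layer processes the BERT token vectors and concatenates the forward and backward hidden states at every time step. Layer sizes follow Table 3 (Hidden_size = 768, one layer); the random stand-in for the BERT output is an illustration only.

```python
# Sketch of the BiLSTM layer: forward and backward hidden states are produced
# jointly and concatenated along the feature dimension (Formula (3)).
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=768, hidden_size=768, num_layers=1,
                 batch_first=True, bidirectional=True)

bert_output = torch.randn(2, 300, 768)     # stand-in for BERT output: [batch, seq_len, 768]
h, _ = bilstm(bert_output)                 # h: [batch, seq_len, 2 * 768]

# The feature dimension is the concatenation [forward ; backward] of Formula (3).
forward_h, backward_h = h[..., :768], h[..., 768:]
print(h.shape, forward_h.shape, backward_h.shape)
```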

3.3. CNN

The CNN layer consists of a convolution layer and a pooling layer. The convolution kernel is applied to all possible subsequences in the text, allowing the effective capture of local patterns and features in the input text; this process generates feature maps corresponding to those patterns and features. The pooling layer reduces the dimensionality of the feature maps while retaining the most salient characteristics. A commonly used pooling method is max pooling, which outputs the highest value of each feature vector.
$Y_{i,j} = \max\limits_{m,n} X_{i \times k + m,\; j \times k + n}$  (4)
A window of size $k \times k$ slides over the input feature map with stride $s$, and the highest value in each window is assigned to the corresponding position in the output feature map. In Formula (4), $Y_{i,j}$ is the element in row $i$ and column $j$ of the output feature map, $X_{i \times k + m,\, j \times k + n}$ is the element of the input feature map that falls inside the pooling window, and $m$ and $n$ are the relative positions within each window, ranging from 0 to $k-1$.
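The convolution-plus-max-pooling step can be sketched as follows with 1-D convolutions over the token dimension. Kernel sizes follow Table 3 ([7, 8, 9]); the input width of 2 × 768 corresponds to the BiLSTM output, and the 128 filters per kernel size are an assumption, as the paper does not state the filter count.

```python
# Sketch of the CNN layer: multi-scale 1-D convolutions followed by max pooling
# over time, yielding one fixed-length feature vector per text.
import torch
import torch.nn as nn
import torch.nn.functional as F

convs = nn.ModuleList(
    nn.Conv1d(in_channels=2 * 768, out_channels=128, kernel_size=k)  # 128 filters: assumed
    for k in (7, 8, 9)
)

bilstm_output = torch.randn(2, 300, 2 * 768)   # [batch, seq_len, features]
x = bilstm_output.transpose(1, 2)              # Conv1d expects [batch, channels, seq_len]

pooled = []
for conv in convs:
    feature_map = F.relu(conv(x))                                            # [batch, 128, seq_len - k + 1]
    pooled.append(F.max_pool1d(feature_map, feature_map.size(2)).squeeze(2))  # max over time

features = torch.cat(pooled, dim=1)            # [batch, 3 * 128] fixed-length representation
print(features.shape)
```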

3.4. BERT-BiLSTM-CNN

In the BBLC model architecture, BERT is the upstream module and BiLSTM-CNN is the downstream module. Figure 3 shows the overall working process of the BBLC model (a code sketch of the full pipeline follows the step list):
(1)
First, the data set is processed. In Figure 3, $W_1, W_2, \ldots, W_n$ is the input text. The Transformer encoder converts the input text into word vectors $T_1, T_2, \ldots, T_n$, which can be trained as model parameters.
(2)
The BERT output is then fed into the bidirectional LSTM network. BiLSTM uses forward and backward LSTM units to model the input sequence in both directions. The information contained in the input sequence is captured by the hidden states at each time step, which constitute the BiLSTM outputs. The concatenated forward and backward outputs of the BiLSTM are passed to the CNN.
(3)
Multiple convolution kernels in the CNN convolution layer extract local features by sliding over the input data. The pooling layer takes the local feature maps produced by the convolutional layer as input and applies max pooling to each feature map.
(4)
The fully connected layer receives the feature representation from the BiLSTM-CNN module and obtains the output $y$ through a linear transformation with the weight matrix. Finally, the Softmax function exponentiates and normalizes the output of the fully connected layer to obtain a probability $p_i$ for each category. The calculation formulas are (5) and (6).
$y = f(Wx + b)$  (5)
$p_i = \dfrac{e^{y_i}}{\sum_{j=1}^{n} e^{y_j}}$  (6)
In Formula (5), $f$ is the activation function, $W$ is the weight matrix, $x$ is the input feature vector, and $b$ is the bias. In Formula (6), $y_i$ is the score of the $i$-th category, and $n$ is the total number of categories.
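Putting the four steps together, the following is a minimal PyTorch sketch of the BBLC pipeline, not the authors' released implementation. Hyperparameters follow Table 3 where stated; the bert-base-chinese checkpoint and 128 filters per kernel size are assumptions for illustration.

```python
# Sketch of the end-to-end BBLC model: BERT -> BiLSTM -> multi-kernel CNN ->
# fully connected layer; Softmax (Formula (6)) is applied at inference, while
# training normally uses CrossEntropyLoss on the raw logits.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel

class BBLC(nn.Module):
    def __init__(self, num_classes=4, hidden_size=768,
                 kernel_sizes=(7, 8, 9), num_filters=128, dropout=0.5):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")   # assumed checkpoint
        self.bilstm = nn.LSTM(768, hidden_size, num_layers=1,
                              batch_first=True, bidirectional=True)
        self.convs = nn.ModuleList(
            nn.Conv1d(2 * hidden_size, num_filters, k) for k in kernel_sizes)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        # (1) BERT: contextual token vectors, [batch, seq_len, 768]
        encoded = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state
        # (2) BiLSTM: bidirectional sequence modeling, [batch, seq_len, 2 * hidden]
        h, _ = self.bilstm(encoded)
        # (3) CNN: multi-scale local features, max-pooled over time
        x = h.transpose(1, 2)
        pooled = [F.max_pool1d(F.relu(conv(x)), x.size(2) - conv.kernel_size[0] + 1).squeeze(2)
                  for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        # (4) Fully connected layer (Formula (5)); apply softmax for probabilities
        return self.fc(features)

# probs = torch.softmax(model(input_ids, attention_mask), dim=-1)  # Formula (6)
```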

4. Experiments and Results

4.1. Data Set

The experimental data are based on the Open Data Innovation Application Competition of Jiangxi Province and consist of public service request texts submitted by citizens of Xinyu City in 2021. The public service request data set contains 72,300 records, of which 52,200 remain after annotation and cleaning. The data set was split into training, test, and validation sets at a ratio of 0.7:0.2:0.1. There are four categories: Consultation, Complaint, Help, and Advice. Table 1 displays the distribution of the training and test sets.
To facilitate training, the labels for Consultation, Complaint, Help, and Advice are 0, 1, 2, and 3, respectively. Table 2 shows part of the data set.
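A minimal sketch of this label mapping and the 0.7:0.2:0.1 split is given below. The column names and file name are hypothetical; the authors' actual preprocessing code is not shown in the paper.

```python
# Sketch: map the four categories to integer labels and split 70/20/10.
# Assumption: a cleaned CSV with "text" and "category" columns (hypothetical names).
import pandas as pd
from sklearn.model_selection import train_test_split

LABEL_MAP = {"Consultation": 0, "Complaint": 1, "Help": 2, "Advice": 3}

df = pd.read_csv("public_service_requests.csv")        # hypothetical cleaned data set
df["label"] = df["category"].map(LABEL_MAP)

# 70% train; the remaining 30% is split 2:1 into test (20%) and validation (10%).
train_df, rest_df = train_test_split(df, test_size=0.3, stratify=df["label"], random_state=42)
test_df, val_df = train_test_split(rest_df, test_size=1/3, stratify=rest_df["label"], random_state=42)
print(len(train_df), len(test_df), len(val_df))
```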

4.2. Experimental Environment and Parameters Design

All experiments were performed on a GeForce RTX 4060 GPU (NVIDIA, Santa Clara, CA, USA) and an Intel(R) Core(TM) i7-12650 CPU (Intel, Santa Clara, CA, USA). The experimental framework is built with PyTorch (2.1.2 + cu118) and runs with Python 3.8 in an Anaconda environment. After multiple adjustments of the number of training epochs, training for 15 epochs was found to yield the best results; for ease of comparison, all experiments in this paper were therefore trained for 15 epochs. With Dropout set to 0.5, turning off 50 percent of neurons during training not only effectively reduces overfitting but also reduces coupling between neurons, so the model converges more easily. The model's Learning_rate is 0.001 and the Batch_size is 128; an overly large Batch_size slows training and aggravates gradient oscillation, which is not conducive to convergence. In the BiLSTM network, Hidden_size refers to the dimensionality of the hidden state in each LSTM cell; after multiple rounds of tuning, a Hidden_size of 768 was found to keep the model reasonably simple while effectively reducing overfitting. Because public service request texts are of limited length, the maximum length (Max_length) is set to 300. Using three different convolutional kernel sizes helps the model capture semantic information at various scales, enhancing its expressive power and generalization performance. Table 3 shows the specific settings of the experimental parameters.
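A training-loop sketch using the Table 3 settings (Adam, learning rate 0.001, 15 epochs) is shown below; CrossEntropyLoss subsumes the Softmax of Formula (6). The BBLC module and a DataLoader yielding (input_ids, attention_mask, label) batches of size 128 are assumed from the earlier sketches, not taken from the authors' code.

```python
# Sketch of the training procedure with the hyperparameters of Table 3.
import torch
import torch.nn as nn

def train(model, train_loader, epochs=15, lr=1e-3):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()                 # applies log-softmax internally
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        model.train()
        total_loss = 0.0
        for input_ids, attention_mask, labels in train_loader:   # batches of 128 assumed
            input_ids, attention_mask, labels = (
                input_ids.to(device), attention_mask.to(device), labels.to(device))
            optimizer.zero_grad()
            loss = criterion(model(input_ids, attention_mask), labels)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"epoch {epoch + 1}: mean loss {total_loss / len(train_loader):.4f}")
    return model
```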

4.3. Experimental Comparison and Result Analysis

4.3.1. Model Comparison

To verify the robustness of the BBLC model, the accuracy, precision, recall, and F1 score on the test set were compared across various models using multiple sets of experiments. Table 4 compares the experimental results of the BBLC model with other models on the test set.
The experiment uses the F1 score as the primary evaluation metric, since it considers both the model's precision and recall. Table 4 shows that, compared to the other models, the BBLC model has the highest F1 score. A high F1 score indicates that the model performs well in classification tasks and can effectively distinguish and predict the various categories. Second, its recall is also the highest; a high recall usually indicates that the model finds more of the true positive samples. Third, even at the highest recall, the model also exhibits the highest precision, which means that when predicting positive samples it minimizes the misclassification of negative samples as positive. Fourth, the high accuracy of the model indicates that it predicts more results correctly across all samples and has better overall predictive ability. Leading on all four metrics, the BBLC model has the best overall performance. Figure 4 shows the accuracy and loss curves of the BBLC model on the training and validation sets.
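For reference, the four test-set metrics reported in Table 4 can be computed as sketched below with scikit-learn. The choice of macro averaging for precision, recall, and F1 is an assumption; the paper does not state which averaging it uses.

```python
# Sketch: accuracy, precision, recall and F1 on the test set (macro averaging assumed).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    accuracy = accuracy_score(y_true, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

print(evaluate([0, 1, 2, 3, 1], [0, 1, 2, 2, 1]))   # toy example, not the real test set
```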
The confusion matrices in Figure 5 illustrate the classification performance of different models on the test set. According to the confusion matrices, the BERT-CNN model correctly classified 10,466 data points, BERT-BiLSTM 10,242, and BERT-BiGRU-CNN 10,368, while this article's BBLC model correctly classified 10,497, the best performance among these four models. Although the BBLC model's most frequent confusion is classifying label-1 samples as label 2, its overall classification ability is the best, achieving the highest accuracy in public service request (ticket) text classification.
The attention mechanism in a hybrid neural network model usually allows the model to make greater use of the features retrieved by its different components. However, public service request texts vary widely in length (some are excessively long and some very short), often use punctuation irregularly (some texts lack punctuation or contain punctuation errors), and exhibit syntactical issues such as incoherent sentence structures, so the classification performance of attention mechanisms varies. To evaluate the effectiveness of the Attention mechanism in the public service request text classification task, it was integrated into the hybrid neural networks for comparison. As Table 5 shows, when the Attention mechanism is incorporated into models without a CNN, their F1 scores improve. Table 6 shows that when the Attention mechanism is incorporated into hybrid models that include a CNN, precision is somewhat enhanced at the expense of recall, but the overall F1 score decreases. This is because, when Attention is combined with a CNN, the Attention mechanism causes the model to pay excessive attention to local features and overlook other significant aspects, so some helpful information is ignored, which degrades the model's performance.
BERT and Word2Vec exhibit significant differences in language representation, directly influencing their performance in hybrid neural network models. Word2Vec relies on static embeddings where each word has a fixed vector representation, which remains constant at the lexical level and fails to capture semantic variations of words across different contexts. In contrast, BERT, based on the Transformer architecture, is a deep bidirectional model that uses context-aware dynamic embeddings. It dynamically generates representations for each word based on context, thereby better capturing semantic changes in words across different contexts and providing richer and more accurate language representations. Table 7 illustrates the comparison of different hybrid neural network models based on BERT and Word2Vec on the test set. The results show that hybrid neural networks based on BERT consistently outperform those based on Word2Vec. This further demonstrates that BERT outperforms Word2Vec for classification tasks in natural language processing.
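The static-versus-contextual contrast can be illustrated with a short sketch: BERT assigns the same character different vectors in different request contexts, whereas a Word2Vec-style lookup would return one fixed vector. The checkpoint and example sentences are assumptions for illustration only.

```python
# Sketch: the same character receives context-dependent vectors from BERT.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def contextual_vector(sentence, char):
    """Return the BERT vector of `char` as it appears inside `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]
    return hidden[tokens.index(char)]

v1 = contextual_vector("市民反映小区停水问题", "水")   # "water" in a utilities complaint
v2 = contextual_vector("河道水质受到污染", "水")       # "water" in an environmental report
print(torch.cosine_similarity(v1, v2, dim=0))          # < 1: the vectors differ by context
```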

4.3.2. Comparison of the Classification Effects of Different Parameters

In the fused BBLC model, hyperparameter settings are essential: different hyperparameter settings for each component lead to different classification results. For example, different Hidden_sizes in the BiLSTM change the complexity with which the model learns sequence features, affecting its ability to model sequence data, and different convolutional kernel sizes in the CNN affect the model's capacity to extract features from text segments of diverse lengths. To further enhance the performance of the BBLC model, experiments on the convolutional kernel sizes and the Hidden_size dimensions were conducted.
Table 8 presents the scenario where the convolutional kernel size varies while the other parameters remain constant. It shows that as the convolutional kernel size increases, the model's recall and F1 score also improve; however, when the convolutional kernel is too large, the model tends to overfit. Overfitting occurs when the model becomes overly complex and adapts excessively to the characteristics of the training data, losing generalization ability, so it is important to choose an appropriate convolution kernel size. Table 9 shows different hidden layer dimensions with the other parameters held constant: an inadequately small hidden layer leads to poor performance on the test set, while an excessively large one can lead to overfitting. Therefore, after a series of comparative experiments, Num_layers = 1, Kernel_sizes = [7,8,9], and Hidden_sizes = 768 were selected for training.

5. Conclusions

To enhance the efficiency of retrieving residents' public service request texts, this paper classifies public service request text and analyzes its content in depth. When residents enter their information into the wrong category, or select an incorrect category during submission, the classifier categorizes the information automatically upon submission. For this text classification task, a hybrid neural network classification model (BERT-BiLSTM-CNN) is established, significantly improving classification performance and thus improving retrieval efficiency and speed. First, BERT encodes the text to obtain rich semantic information and generate word vectors. BiLSTM then models the encoded text bidirectionally to capture sequence information. Next, the CNN extracts local features to enhance the model's perception of essential features in the text. After processing by BiLSTM and CNN, the fully connected layer merges their output features and passes the merged representation to the Softmax classifier for classification. The hybrid neural network BBLC combines BERT, BiLSTM, and CNN, comprehensively leveraging their respective advantages in semantic understanding, sequence modeling, and feature extraction and improving performance on text classification tasks. When unassisted individuals perform blind classification, the classification consistency (accuracy) reaches 95.3%. In subsequent studies, the BERT-BiLSTM-CNN model and the interactions between its components will be further explored to enhance its classification performance in future text classification and sentiment analysis tasks.

Author Contributions

Funding acquisition, G.C.; data curation, G.C.; validation, J.C.; writing coach, J.C.; reviewing, J.C.; writing and editing, Y.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the 2024 Hainan Province Higher Education Teaching Reform Research Project: Hnjg2024ZD-19 and Hainan Province key research and development plan: ZDYP2021GXJS203.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author. The data presented in this study are available in the public domain: https://github.com/XypXypHaHa/firstReposity.git, accessed on 7 July 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Guan, G.; Guo, J.; Wang, H. Varying Naïve Bayes models with applications to classification of chinese text documents. J. Bus. Econ. Stat. 2014, 32, 445–456. [Google Scholar] [CrossRef]
  2. Moraes, R.; Valiati, J.F.; Neto, W.P.G.O. Document-level sentiment classification: An empirical comparison between SVM and ANN. Expert Syst. Appl. 2013, 40, 621–633. [Google Scholar] [CrossRef]
  3. Jiang, S.; Pang, G.; Wu, M.; Kuang, L. An improved K-nearest-neighbor algorithm for text categorization. Expert Syst. Appl. 2012, 39, 1503–1509. [Google Scholar] [CrossRef]
  4. Bilal, M.; Israr, H.; Shahid, M.; Khan, A. Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, Decision Tree and KNN classification techniques. J. King Saud Univ.-Comput. Inf. Sci. 2016, 28, 330–344. [Google Scholar] [CrossRef]
  5. Soni, S.; Chouhan, S.S.; Rathore, S.S. TextConvoNet: A convolutional neural network based architecture for text classification. Appl. Intell. 2023, 53, 14249–14268. [Google Scholar] [CrossRef]
  6. Lai, S.; Xu, L.; Liu, K.; Zhao, J. Recurrent convolutional neural networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015. [Google Scholar]
  7. Yin, W.; Kann, K.; Yu, M.; Schütze, H. Comparative study of CNN and RNN for natural language processing. arXiv 2017, arXiv:1702.01923. [Google Scholar]
  8. Dirash, A.R.; Bargavi, S.K. LSTM Based Text Classification. IITM J. Manag. IT 2021, 12, 62–65. [Google Scholar]
  9. Liu, G.; Guo, J. Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 2019, 337, 325–338. [Google Scholar] [CrossRef]
  10. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
  11. Galassi, A.; Lippi, M.; Torroni, P. Attention in natural language processing. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4291–4308. [Google Scholar] [CrossRef]
  12. Sun, X.; Lu, W. Understanding attention for text classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Seattle, WA, USA, 5–10 July 2020; pp. 3418–3428. [Google Scholar]
  13. Zhang, D.; Xu, H.; Su, Z.; Xu, Y. Chinese comments sentiment classification based on word2vec and SVMperf. Expert Syst. Appl. 2015, 42, 1857–1863. [Google Scholar] [CrossRef]
  14. Shen, Y.; Liu, J. Comparison of text sentiment analysis based on bert and word2vec. In Proceedings of the 2021 IEEE 3rd International Conference on Frontiers Technology of Information and Computer, Greenville, SC, USA, 12–14 November 2021; pp. 144–147. [Google Scholar]
  15. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  16. Kale, A.S.; Pandya, V.; Di Troia, F.; Stamp, M. Malware classification with word2vec, hmm2vec, bert, and elmo. J. Comput. Virol. Hacking Tech. 2023, 19, 1–16. [Google Scholar] [CrossRef]
  17. Li, Z.; Yang, X.; Zhou, L.; Jia, H.; Li, W. Text matching in insurance question-answering community based on an integrated BiLSTM-TextCNN model fusing multi-feature. Entropy 2023, 25, 639. [Google Scholar] [CrossRef] [PubMed]
  18. Jiang, X.; Song, C.; Xu, Y.; Li, Y.; Peng, Y. Research on sentiment classification for netizens based on the BERT-BiLSTM-TextCNN model. PeerJ Comput. Sci. 2022, 8, e1005. [Google Scholar] [CrossRef]
  19. Li, X.; Cui, M.; Li, J.; Bai, R.; Lu, Z.; Aickelin, U. A hybrid medical text classification framework: Integrating attentive rule construction and neural network. Neurocomputing 2021, 443, 345–355. [Google Scholar] [CrossRef]
  20. Hernández, G.; Zamora, E.; Sossa, H.; Téllez, G.; Furlán, F. Hybrid neural networks for big data classification. Neurocomputing 2020, 390, 327–340. [Google Scholar] [CrossRef]
  21. Li, M.; Chen, L.; Zhao, J.; Li, Q. Sentiment analysis of Chinese stock reviews based on BERT model. Appl. Intell. 2021, 51, 5016–5024. [Google Scholar] [CrossRef]
  22. Cai, R.; Qin, B.; Chen, Y.; Zhang, L.; Yang, R.; Chen, S.; Wang, W. Sentiment Analysis About Investors and Consumers in Energy Market Based on BERT-BiLSTM. IEEE Access 2020, 8, 171408–171415. [Google Scholar] [CrossRef]
  23. Li, X.; Lei, Y.; Ji, S. BERT-and BiLSTM-based sentiment analysis of online Chinese buzzwords. Future Internet 2022, 14, 332. [Google Scholar] [CrossRef]
  24. Kaur, K.; Kaur, P. BERT-CNN: Improving BERT for requirements classification using CNN. Procedia Comput. Sci. 2023, 218, 2604–2611. [Google Scholar] [CrossRef]
  25. Xie, J.; Hou, Y.; Wang, Y.; Wang, Q.; Li, B.; Lv, S.; Vorotnitsky, Y.I. Chinese text classification based on attention mechanism and feature-enhanced fusion neural network. Computing 2020, 102, 683–700. [Google Scholar] [CrossRef]
  26. Deng, J.; Cheng, L.; Wang, Z. Attention-based BiLSTM fused CNN with gating mechanism model for Chinese long text classification. Comput. Speech Lang. 2021, 68, 101182. [Google Scholar] [CrossRef]
  27. Letarte, G.; Paradis, F.; Giguère, P.; Laviolette, F. Importance of self-attention for sentiment analysis. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium, 1 November 2018; pp. 267–275. [Google Scholar]
  28. Bao, T.; Ren, N.; Luo, R.; Wang, B.; Shen, G.; Guo, T. A BERT-based hybrid short text classification model incorporating CNN and attention-based BiGRU. J. Organ. End User Comput. 2021, 33, 1–21. [Google Scholar] [CrossRef]
  29. Kaur, K.; Kaur, P. Improving BERT model for requirements classification by bidirectional LSTM-CNN deep model. Comput. Electr. Eng. 2023, 108, 108699. [Google Scholar] [CrossRef]
Figure 1. BERT feature input diagram.
Figure 2. BiLSTM model structure diagram.
Figure 3. BERT-BiLSTM-CNN model overall frame diagram.
Figure 4. Training set accuracy and loss curves.
Figure 5. Confusion matrices for different models.
Table 1. Dataset distribution.

Label | Text Category | Training Set | Test Set
0 | Consultation | 14,260 | 3843
1 | Complaint | 16,246 | 4319
2 | Help | 9669 | 2549
3 | Advice | 1207 | 325
Table 2. Partial data of data set.

Label | Text Content
0 | A citizen called to inquire: What are the epidemic prevention and control measures in Huiyu District, Heyuan City, Guangdong Province?
1 | Xinyu No. 4 Middle School charges fees for extra classes without parents’ consent and does not allow parental objections. They insist that improving students’ education is solely the responsibility and obligation of teachers, emphasizing efficient classroom management to lighten students’ burdens instead of increasing pressure through fees and extra classes for students and parents.
2 | Citizen’s call: Seeking help with a high-speed tire blowout.
0 | Recently, the provincial education department issued a document stating that there will be reforms to the entrance examination for the 2021 graduating class of junior high school students. However, when I visited the provincial education department, they mentioned that local enrollment policies will prevail. Could you please clarify if there are indeed reforms to the entrance examination for this year’s junior high school students?
3 | Mr. Wu called to suggest reducing the volume of freight trucks on NanYuan Road.
1 | A citizen called to report that a BMW 4S dealership in Xiamen refunded a deposit and ceased selling BMW cars, which they find unreasonable.
2 | Citizen’s Call: Can teachers from Changqing Elementary School on Hushan Road (citizen declined to disclose grade) take students to other teaching locations for lessons after extended classes end, around 5 PM? (Teaching content pertains to classroom studies) Citizen mentioned this is a common practice.
Table 3. Experimental parameters.

Hyperparameter | Value | Hyperparameter | Value
Epoch | 15 | Optimizer | Adam
Dropout | 0.5 | Hidden_size | 768
Learning rate | 10⁻³ | Max_length | 300
Batch_size | 128 | Kernel_sizes | [7,8,9]
Table 4. Comparison of experimental results.

Model | Accuracy | Precision | Recall | F1-Score
BERT-CNN [24] | 0.9509 | 0.9354 | 0.9503 | 0.9395
BERT-BiGRU | 0.9345 | 0.9405 | 0.9126 | 0.9213
BERT-BiLSTM [23] | 0.9306 | 0.9080 | 0.9270 | 0.9129
BERT-BiGRU-CNN [28] | 0.9420 | 0.9385 | 0.9344 | 0.9322
BERT-BiLSTM-CNN | 0.9536 | 0.9541 | 0.9519 | 0.9509
The bold text is the model of the article.
Table 5. Attention but no CNN models’ comparison.

Model | Accuracy | Precision | Recall | F1-Score
BERT-BiGRU | 0.9345 | 0.9405 | 0.9136 | 0.9223
BERT-BiGRU-Attention | 0.9469 | 0.9386 | 0.9448 | 0.9387
BERT-BiLSTM | 0.9306 | 0.9080 | 0.9270 | 0.9129
BERT-BiLSTM-Attention | 0.9391 | 0.9382 | 0.9309 | 0.9314
The bold text indicates that this value is the highest among the comparisons. The following tables are the same.
Table 6. Attention and CNN models’ comparison.

Model | Accuracy | Precision | Recall | F1-Score
BERT-BiGRU-CNN | 0.9420 | 0.9385 | 0.9344 | 0.9322
BERT-BiGRU-Attention-CNN | 0.9299 | 0.9274 | 0.9062 | 0.9125
BERT-CNN | 0.9509 | 0.9354 | 0.9503 | 0.9395
BERT-Attention-CNN | 0.9446 | 0.9440 | 0.9287 | 0.9335
BERT-BiLSTM-CNN | 0.9536 | 0.9541 | 0.9519 | 0.9509
BERT-BiLSTM-Attention-CNN | 0.9505 | 0.9528 | 0.9457 | 0.9470
Table 7. Comparison of models based on BERT and Word2Vec.

Model | Accuracy | Precision | Recall | F1-Score
Word2Vec-CNN [5] | 0.9387 | 0.9410 | 0.9340 | 0.9353
BERT-CNN | 0.9509 | 0.9354 | 0.9503 | 0.9395
Word2Vec-Attention-CNN | 0.9335 | 0.9378 | 0.9237 | 0.9282
BERT-Attention-CNN | 0.9446 | 0.9440 | 0.9287 | 0.9335
Word2Vec-BiLSTM-Attention-CNN [25] | 0.9444 | 0.9356 | 0.9351 | 0.9325
BERT-BiLSTM-Attention-CNN | 0.9505 | 0.9528 | 0.9457 | 0.9470
Word2Vec-BiLSTM-CNN [17] | 0.9413 | 0.9407 | 0.9415 | 0.9389
BERT-BiLSTM-CNN | 0.9536 | 0.9541 | 0.9519 | 0.9509
Table 8. Experimental comparison of convolution kernel sizes.

Model: BERT-BiLSTM-CNN
Num_Layers | Kernel_Sizes | Hidden_Sizes | Accuracy | Precision | Recall | F1-Score
1 | [2,3,4] | 768 | 0.9461 | 0.9450 | 0.9454 | 0.9431
1 | [3,4,5] | 768 | 0.9491 | 0.9510 | 0.9414 | 0.9442
1 | [4,5,6] | 768 | 0.9467 | 0.9360 | 0.9461 | 0.9378
1 | [5,6,7] | 768 | 0.9377 | 0.9377 | 0.9485 | 0.9392
1 | [6,7,8] | 768 | 0.9529 | 0.9521 | 0.9495 | 0.9490
1 | [7,8,9] | 768 | 0.9536 | 0.9541 | 0.9519 | 0.9509
1 | [8,9,10] | 768 | 0.9464 | 0.9459 | 0.9474 | 0.9437
1 | [9,10,11] | 768 | 0.9527 | 0.9532 | 0.9495 | 0.9491
1 | [10,11,12] | 768 | 0.9519 | 0.9509 | 0.9563 | 0.9487
Table 9. Experimental comparison of Hidden sizes.

Model: BERT-BiLSTM-CNN
Num_Layers | Kernel_Sizes | Hidden_Sizes | Accuracy | Precision | Recall | F1-Score
1 | [7,8,9] | 128 | 0.9379 | 0.9440 | 0.9337 | 0.9353
1 | [7,8,9] | 256 | 0.9483 | 0.9530 | 0.9434 | 0.9459
1 | [7,8,9] | 512 | 0.9521 | 0.9528 | 0.9469 | 0.9480
1 | [7,8,9] | 768 | 0.9536 | 0.9541 | 0.9519 | 0.9509
