In recent years, deep learning models have proved effective in areas such as computer vision (CV) and natural language processing (NLP) [7,8], and in particular in toponymic entity recognition (TER) [9,10]. Compared with traditional entity recognition methods, deep learning can automatically learn critical features and higher-order abstract features from the raw data, avoiding the need for domain experts to define rules or carry out complex feature engineering manually. Because it can exploit a large number of parameters, it has clear advantages in applying deep semantic knowledge and alleviating data sparsity [11]. It maps raw text into a vector or matrix space and assigns words to their corresponding entity classes using different neural networks [12]. A deep-learning-based toponymic entity recognition model mainly comprises an embedding layer, a feature encoding layer, and a label decoding layer. The feature encoding layer is typically built from Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRUs), hybrid neural networks, or attention mechanisms.
CNNs process data in parallel and therefore offer high computational efficiency. Gritta et al. proposed a new approach that systematically encodes geographic metadata in conjunction with CNNs [13]: place names in natural-language text are converted into latitude and longitude coordinates and combined with map information, and a joint training scheme improves the robustness of the model. To cope with unreliable fields, grammatical errors, and non-standard abbreviations in place name information on Twitter, Kumar et al. proposed a CNN-based model that extracts geolocation information from tweets and achieves an F1 score of 96.0% [14]. However, CNNs can miss contextual information when processing long texts or sequence data.
RNNs outperform CNNs on sequence data and better capture the dependencies between sequence elements. By comparing ARIMA, LSTM, and BiLSTM models analytically, Siami-Namini et al. verified that additionally training on data in the reverse direction helps sequence modeling and can significantly improve time-series accuracy [15]. Chen et al. proposed a divide-and-conquer approach that first classifies sentences into three types with a BiLSTM-CRF and then applies 1D-CNNs to perform sentiment analysis on each type, effectively improving sentence-level sentiment analysis [16]. Shen et al. proposed an RNN-based Chinese character-level annotation model that combines RNNs with Chinese word features and significantly improves the F1 score on toponymic entities [17]. To address the large data volumes and conflicting viewpoints in document-level sentiment analysis, Rhanoui et al. built a CNN-BiLSTM model for long-text opinion analysis using Doc2vec word embeddings and obtained excellent results on French newspaper articles [18]. Peng et al. trained NER and disambiguation as a joint task using an LSTM-CRF model, improving the F1 score by almost 5% over previous studies [19]. Lu et al. combined CNNs, BiLSTM, and attention mechanisms to predict nonlinear time series such as stock prices and obtained good results [20]. Dong et al. built a BiLSTM-CRF model on character-level and radical-level feature representations for Chinese named entity recognition and achieved a best F1 score of 90.95% on the MSRA dataset [21].
With the development of pre-trained language models (PTLMs) such as BERT, most of the semantic information of Chinese characters can be captured better than in previous studies. Ning et al. introduced bi-directional attention routing and a sausage measure to project data onto complex surfaces via nonlinear mapping, which enables the approximation of arbitrary nonlinear functions with arbitrary accuracy while maintaining the local responsiveness of capsule entities, with excellent experimental results [22]. Ma et al. proposed a neural network model based on BERT-BiLSTM-CRF for Chinese place name entity recognition, which performs well on MSRA, GeoTR-20, and other datasets [23]. Ziniu W et al. proposed a BERT-based hybrid neural network model that combines BiLSTM and IDCNN for feature extraction, addressing the NER task's insufficient consideration of context and neglect of local features, and improved the F1 score by 4.79% over the baseline model on the CLUENER dataset [24].
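The models surveyed above share the three-layer pipeline of embedding, feature encoding, and label decoding; in the BiLSTM-CRF family, the label decoding layer is Viterbi decoding over per-token label scores and label-transition scores. The following is a minimal sketch of that decoding step only; all scores, tag names, and the example sequence are hypothetical, not taken from any of the cited models.

```python
# Minimal sketch of the label decoding layer: Viterbi decoding over
# per-token label (emission) scores plus label-transition scores, as
# used by the CRF layer of BiLSTM-CRF style taggers. All numbers and
# tag names here are hypothetical, for illustration only.

def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions:   one dict per token mapping label -> score
                 (in a real model, produced by the encoding layer)
    transitions: dict mapping (prev_label, label) -> score;
                 missing pairs default to 0.0
    labels:      list of all label names (e.g. BIO tags)
    """
    trans = lambda p, l: transitions.get((p, l), 0.0)
    # scores[l] = best score of any path ending at the current token with label l
    scores = {l: emissions[0][l] for l in labels}
    backptrs = []
    for emit in emissions[1:]:
        new_scores, ptrs = {}, {}
        for l in labels:
            best_prev = max(labels, key=lambda p: scores[p] + trans(p, l))
            new_scores[l] = scores[best_prev] + trans(best_prev, l) + emit[l]
            ptrs[l] = best_prev
        scores = new_scores
        backptrs.append(ptrs)
    # Trace back from the best-scoring final label.
    best = max(labels, key=scores.get)
    path = [best]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return list(reversed(path))


# Toy example: a two-character place name followed by a non-entity token.
labels = ["B-LOC", "I-LOC", "O"]
transitions = {("B-LOC", "I-LOC"): 1.0,   # I-LOC usually follows B-LOC
               ("I-LOC", "I-LOC"): 1.0,
               ("O", "I-LOC"): -10.0}     # effectively forbid O -> I-LOC
emissions = [{"B-LOC": 2.0, "I-LOC": 0.0, "O": 1.0},
             {"B-LOC": 0.0, "I-LOC": 2.0, "O": 1.0},
             {"B-LOC": 0.0, "I-LOC": 0.0, "O": 2.0}]
print(viterbi_decode(emissions, transitions, labels))  # ['B-LOC', 'I-LOC', 'O']
```

The transition scores are what distinguish a CRF decoder from per-token softmax classification: the heavily penalized (O, I-LOC) transition keeps the decoder from emitting an entity continuation tag without an entity beginning.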
Deep learning has made substantial progress in TER tasks. However, problems remain to be addressed, such as data scarcity, difficulty in contextual understanding, and place name ambiguity. These problems are also reflected in the field of Genglubu. First, Genglubu is highly scarce: as a kind of literature containing unique geographical features, its data samples are small and the data available for training are limited, which makes it difficult for a model to discover hidden features in the data. Second, the documentation style of Genglubu differs from modern usage, especially in its rich contextualized Chinese information. Finally, the same name can serve as both a place name and an orientation in Genglubu, which increases ambiguity in the corpus. Based on the above research results, this paper applies deep learning models to toponymic entity recognition in Genglubu.