Proceeding Paper

Research and Development of Police Address-Matching System for City A †

1
School of Information Engineering, Shanghai Zhongqiao Polytechnic University, Shanghai 201514, China
2
Department of Software Development, Shanghai Zhongheng Software Technology Co., Ltd., Shanghai 200333, China
3
College of Costume Art Design, Donghua University, Shanghai 200051, China
*
Author to whom correspondence should be addressed.
† Presented at the 2024 Cross Strait Conference on Social Sciences and Intelligence Management, Shanghai, China, 13–15 December 2024.
Eng. Proc. 2025, 98(1), 40; https://doi.org/10.3390/engproc2025098040
Published: 18 July 2025

Abstract

Addresses are a key element in the construction of smart cities. When receiving reports from citizens, public security officers need to quickly and accurately locate a crime scene based on the address provided by the reporter. This address may be a standard address, or it may be a point of interest, an abbreviation, or a common name. Converting such an address into a standard address requires parsing its address elements and then matching them against an address database. We developed an address element parsing method based on bidirectional encoder representations from transformers (BERT) and an address-matching algorithm. On this basis, a police address-matching system for City A was designed and implemented as a Web application built on the address database of City A. Together with the database maintenance module, the developed address parsing and matching method successfully matched reported addresses to standard ones.

1. Introduction

As one of the key elements in the construction of smart cities, address information is important in people’s daily lives and social work. For example, public security officers locate a crime scene based on the location provided by the person reporting it. The standard address of a location is the address that follows the naming conventions issued by the municipal planning bureau. A standard address has a complete hierarchy, and its expression is standardized and highly accurate. An urban address consists, in order, of an administrative division name, a road name, and a house number. A rural address consists, in order, of an administrative division name, a village name, and a group number. The general address format in Chinese cities is XX Province YY City ZZ District MM Street NN Road No. KK LL Building. Address formats also have local characteristics, such as “lane” in Shanghai and “hutong” in Beijing. Municipalities directly under the central government have no “province” element. Address forms differ in ethnic minority regions and mountainous areas, such as the “banner” in Inner Mongolia. The hierarchical elements are listed in descending order to form the address of a specific location.
In practice, the addresses provided by ordinary citizens are generally non-standard: they may have missing hierarchy levels or mixed expressions, or they may be points of interest, abbreviations, or former names. The task of address matching is to parse the various types of address data provided and convert them into standard addresses. This requires parsing each address element from the provided address and then matching the parsed elements to the corresponding ones in the address database. If the address is matched to a database entry that is not a standard address, the entry must then be converted into a standard address. Because Chinese addresses are complex, and the addresses that people provide are especially varied, matching and conversion are challenging.
As an international megacity, City A covers a vast territory and is developing rapidly. Its address data is large and growing, changes quickly, and has an exceptionally complex structure and naming method. Citizens also refer to standard addresses, points of interest, and place names in varied ways. Through multi-party collection and processing, we obtained 2,706,918 standard addresses and 1,356,182 points of interest in City A [1]. As long as the address provided by the user is valid, the system developed in this study is designed to match it to the correct standard address.
The key to address matching is the accuracy of address element parsing [2]. The syntax and semantics of Chinese addresses are complex and irregular and often do not follow standards, which makes them challenging to segment and parse. Transformer-based deep learning models parse address elements effectively. We used Bidirectional Encoder Representations from Transformers (BERT) [3] (bert-base-chinese) for preliminary address parsing, corrected the parsed address elements, and combined the result with a similarity calculation for address matching.
We built an address database that includes standard addresses and points of interest and developed an address-matching algorithm to provide efficient address information services for public security officers. The algorithm can also be applied to residents’ daily needs, such as sending and receiving express deliveries and travel navigation, and to government work, such as urban planning, housing and population management, and emergency response.

2. Related Works

Address matching is a key issue in address services and refers to matching unstructured addresses with structured addresses to find the standardized address [4]. Efficient address-matching methods satisfy the demand for high-precision address positioning in the context of smart cities.
There are three main approaches to address matching: similarity judgment, address feature parsing, and real-time text analysis. Languages such as English have simple segmentation and address hierarchy and use common address terms such as “Street” and “Road”. Most countries have their own national standard address libraries, and methods based on character similarity show positive results [5,6]. The traditional address-matching methods in China are also based on similarity judgment: they either directly compare the character similarity between two addresses or add semantic and spatial similarity based on address features to improve matching effectiveness [7,8]. Splitting the address string first, calculating the similarity of the substrings, and then combining them through a backpropagation (BP) neural network also produces positive results [9].
However, similarity-based methods struggle with problems such as blurred address boundaries and complex features. At present, the commonly used solutions for address feature parsing include rule-based methods, statistical methods, and deep learning-based methods.
  1. Rule-based method
This method relies on the address hierarchy and segments address elements with manually designed rules. However, the address data encountered in applications is not standardized, and the same Chinese character can have different meanings at different levels [10]. Relying solely on rules cannot cope with complex address expressions.
  2. Statistical method
This method is based on statistics and machine learning. It analyzes large-scale databases to form statistical models that combine the lexical and contextual information of place names to mitigate semantic ambiguity. The mainstream approach adopts a dictionary that includes information on administrative divisions and points of interest and combines conditional random fields and hidden Markov models for word segmentation [11]. This method accurately identifies address elements that appear in the dictionary, but it cannot solve the problem of unregistered (out-of-vocabulary) words.
  3. Deep learning method
This method mines latent regularities in address data and produces a semantically rich address representation, extracting semantic information between address elements even for irregular address expressions. For example, Li et al. [12] proposed a Chinese address element segmentation method based on a bidirectional gated recurrent unit neural network; Wang et al. [13] used a deep learning architecture model based on the fusion of character, word, and address features and utilized conditional random fields for sequence annotation to recognize and extract address information; and Zhang et al. [14] used the BERT language model to learn contextual information and address model features and conditional random fields to model the constraint relationships between labels to identify new address features.
Address information is constantly changing, and there is a lag in updating the address database. Therefore, an address-matching method was developed based on a real-time search and text analysis of relevant address information. For example, Shan et al. [15] obtained relevant information about addresses from search engines and conducted a text analysis, greatly enriching the semantic information of addresses, partially solving the problems of address redundancy and incompleteness, and alleviating the impact of rapid address updates. Not all required address information exists in the database. When there is no matching address in the database, Tian et al. [16] used a spatial coordinate interpolation method based on street numbers, Cura et al. [17] designed an algorithm to calculate the distance between street numbers, and Cheng et al. [18] proposed a fast spatial inference algorithm based on a global latitude–longitude network to find the nearest standard address.
In this study, we established a standard address database for City A including standard address information and point of interest information. The standard address data consists of the following four tables.
  • DISTRICT (district_id, district_name, and district_code)
  • STREET (street_id, street_name, street_code, and district_id)
  • QUARTER (quarter_id, quarter_name, quarter_code, and street_id)
  • SH_STD_TB (addr_id, district_id, street_id, road, door, longitude, and latitude)
The administrative divisions directly under the districts of City A are streets, towns, and townships, which are three types at the same level. According to the official website [19], as of 31 July 2024, City A had 16 administrative districts, which consisted of 108 streets, 106 towns, and two townships. The third-level administrative divisions of City A are communities and villages, numbering 5004 and 1553, respectively. The final number of collected and organized standard addresses for City A was 2,706,918. The structure of the point of interest table is as follows: POI (poi_id, poi_name, type_id, district_id, longitude, latitude, and address). When converting a point of interest into a standard address, the corresponding standard address was obtained directly if one existed for that point of interest. If not, the nearest standard address was obtained through a latitude and longitude calculation. In total, 1,356,182 points of interest were collected and organized for City A.
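The relations listed in this section can be written down as a small SQLite schema. This is only a sketch: the table and column names come from the relations in the text, while the SQL types, the primary keys, the foreign keys, and the helper name `create_address_db` are assumptions made for illustration.

```python
import sqlite3

# Table and column names follow the relations listed in the text.
# SQL types and key constraints are assumptions, not the authors' schema.
SCHEMA = """
CREATE TABLE DISTRICT (
    district_id INTEGER PRIMARY KEY,
    district_name TEXT,
    district_code TEXT
);
CREATE TABLE STREET (
    street_id INTEGER PRIMARY KEY,
    street_name TEXT,
    street_code TEXT,
    district_id INTEGER REFERENCES DISTRICT(district_id)
);
CREATE TABLE QUARTER (
    quarter_id INTEGER PRIMARY KEY,
    quarter_name TEXT,
    quarter_code TEXT,
    street_id INTEGER REFERENCES STREET(street_id)
);
CREATE TABLE SH_STD_TB (
    addr_id INTEGER PRIMARY KEY,
    district_id INTEGER REFERENCES DISTRICT(district_id),
    street_id INTEGER REFERENCES STREET(street_id),
    road TEXT,
    door TEXT,
    longitude REAL,
    latitude REAL
);
CREATE TABLE POI (
    poi_id INTEGER PRIMARY KEY,
    poi_name TEXT,
    type_id INTEGER,
    district_id INTEGER REFERENCES DISTRICT(district_id),
    longitude REAL,
    latitude REAL,
    address TEXT
);
"""


def create_address_db(path=":memory:"):
    """Create an empty address database (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Calling `create_address_db()` with no argument builds the schema in memory, which is convenient for experimenting with queries before loading real data.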

3. Address Matching and Conversion

The address recognition method developed in this study consisted of two main components. Firstly, the BERT (bert-base-chinese) model was employed to vectorize the data and learn the contextual information. Secondly, the tag revision module (TRM) was employed to revise the individual tags with the address characteristics, which enhanced the parsing accuracy of address elements. The developed method was named the BERT-TRM model, and the overall architecture of the model is illustrated in Figure 1 [1].

3.1. Sequence Labeling with BERT

Address recognition is a sequential annotation task, where the input address is vectorized using the BERT model. Specifically, each address string is treated as a separate sentence, with a start flag [CLS] at the beginning and an end marker [SEP] at the end. The BERT model relies on a dictionary for word separation, where each Chinese character corresponds to a code, and characters not found in the dictionary are represented by [UNK]. Although the original model splits numbers and English words into multiple parts, a character-by-character splitting scheme is adopted to facilitate decoding. That is, each letter or number corresponds to a single code. An example of address vectorization using the BERT model is depicted in Figure 2.
For an address text s with n characters, each character is input as a token into the BERT model to obtain contextual information. For each token i, the BERT output x_i scores each tag type, and the SoftMax layer converts these scores into likelihoods so that the tag with the highest likelihood is selected as the target tag.
$$x_i = \mathrm{BERT}(\mathrm{Token}_i) \tag{1}$$
$$\mathrm{Tag}_i = \mathrm{SoftMax}(x_i) \tag{2}$$
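The character-level tokenization and the SoftMax tag selection described in this subsection can be illustrated without loading the actual model. The sketch below is an assumption-laden toy: `vocab` stands in for the bert-base-chinese dictionary, the per-tag scores stand in for the BERT output x_i, and the function names are hypothetical.

```python
import math


def tokenize_address(address, vocab):
    """Split an address character by character, wrap it in [CLS]/[SEP],
    and map characters missing from the vocabulary to [UNK]."""
    tokens = ["[CLS]"]
    for ch in address:
        tokens.append(ch if ch in vocab else "[UNK]")
    tokens.append("[SEP]")
    return tokens


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_tag(scores, tag_names):
    """Return the tag with the highest softmax likelihood and that likelihood."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return tag_names[best], probs[best]
```

For example, `tokenize_address("人民大道200号", vocab)` yields one token per character, including one token per digit, which matches the character-by-character scheme adopted to simplify decoding.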

3.2. TRM

In this study, the labeling of address elements is based on the BIOES scheme, which consists of five tags: B (begin), I (inside), O (outside), E (end), and S (single). Each character in the input address string is assigned a Tag-Label structure, where Tag is one of B, I, O, E, and S, and Label corresponds to the address element category. For example, the address element “温州市” (Wenzhou City) is labeled as [B-city I-city E-city], where “city” is the Label for this municipal administrative division. The first character in the element is tagged as “B”, the last character is tagged as “E”, and any characters between the beginning and end of an address element are tagged as “I”.
To recognize an address element, a string of characters must satisfy two conditions: the Label within the element must be the same, and the Tag must comply with the BIOES labeling specification. Specifically, “I” only appears between the nearest “B” and “E”, and the address element containing two or more characters must start with “B” and end with “E”. In contrast, an address element with only one character is tagged as “S”.
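The two conditions for recognizing an address element (a single Label within the element and a Tag sequence that follows the BIOES specification) can be checked with a short scan over the tag sequence. The following is a minimal sketch; the function name `extract_elements` and the choice to silently skip malformed runs are assumptions, not the authors' implementation.

```python
def extract_elements(chars, tags):
    """Return (text, label) pairs for well-formed BIOES spans.

    chars: the address as a string or list of characters.
    tags:  one "Tag-Label" string per character, e.g. "B-city", or "O".
    Malformed runs (mixed labels, missing B or E) are skipped.
    """
    elements = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, lab = tag.partition("-")
        if prefix == "S":
            # Single-character element.
            elements.append((chars[i], lab))
            start, label = None, None
        elif prefix == "B":
            # Open a new multi-character element.
            start, label = i, lab
        elif prefix == "I":
            # "I" is valid only inside an open element with the same label.
            if start is None or lab != label:
                start, label = None, None
        elif prefix == "E":
            # Close the element only if it was opened with the same label.
            if start is not None and lab == label:
                elements.append(("".join(chars[start:i + 1]), lab))
            start, label = None, None
        else:
            # "O" or an unexpected tag ends any open element.
            start, label = None, None
    return elements
```

With the example from the text, `extract_elements("温州市", ["B-city", "I-city", "E-city"])` returns a single element labeled `city`, whereas a sequence whose middle character carries a different Label yields no element at all. This is the word-level failure that the tag revision module is meant to repair.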
Statistics of the labels predicted by BERT show that accuracy at the word level is lower than at the character level. Most incorrectly identified address elements result from a single mispredicted character. Therefore, correcting the wrong tags improves prediction accuracy at the word level.
The annotation revision rules in this study are based on two key address features: the address hierarchy and address common names. An address has a hierarchical structure that, from top to bottom, consists of administrative divisions (province → city → district → street), road names, road numbers, points of interest, sub-points of interest, unit numbers, building numbers, and room numbers, and it can also be accompanied by auxiliary descriptive information. An address expressed by a user may omit several elements of the hierarchy, but the elements that are present appear in the correct hierarchical order. In addition, each level of address element has corresponding address common names, such as “province”, “city”, and “district” in administrative divisions; “road” and “avenue” in road names; “number” in road numbers; “hospital”, “building”, “university”, and “company” in points of interest; and “unit” in unit numbers. Based on this, the annotation revision rules are formulated as follows [1]:
  1. Select a text segment s that does not comply with the labeling specification, where n is the length of the segment and m is the number of distinct label types in it.
  2. If the Tag part of each character in s conforms to the specification, that is, only the Label part differs, determine the Label for the segment according to the following rules:
    • Calculate the proportion p of each label type and select the one with the highest proportion as the label for every character in the segment.
    • If more than one label has the highest proportion, check whether the last one or two characters form an address common name. If so, revise the label of every character in the segment to the category of that address common name.
    • If they do not form an address common name, or if the category of the address common name is not in the existing label table, revise the label based on the hierarchical characteristics of the address. Let La and Lb be the labels of the address elements before and after s. Among the m labels, choose the one that lies between La and Lb in the hierarchy and is closest to either La or Lb as the label for s.
  3. If the Tag part of the characters in s does not comply with the specification, first determine the Label:
    • If m = 1, the label type is unique, and this label is the label of segment s.
    • If m > 1, score each label by its proportion, address common names, and hierarchy, and select the label with the highest score as the label for segment s.
    • For a label x, let t be the number of times x appears in s. Let j = 1 if the end of segment s contains an address common name that belongs to x, and j = 0 otherwise. Let r be the number of levels between the address element categories before and after segment s. If x lies within this interval, pos is its shortest distance to either end of the interval, counted from 1; pos = 0 when x is not within the interval. The weights of label proportion, address common name, and hierarchy are w1, w2, and w3, respectively. Equation (3) is used to calculate the score.
$$\mathrm{score} = w_1 \times \frac{t}{n} + w_2 \times j + w_3 \times \frac{1}{\lceil r/2 \rceil} \times \left(\lceil r/2 \rceil - pos + 1\right) \tag{3}$$
Then, the Tag part of segment s is revised together with the preceding and following elements:
    • If the Tag at the beginning of segment s is not B, but the segment has the same label as the preceding element, the segment is merged with the preceding element, and the Tags are revised so that the first character is B, the last character is E, and the remaining characters are I.
    • If the Tag at the end of segment s is not E, but the segment has the same label as the following text, the segment is merged with the following text, and the Tags are revised so that the first character is B, the last character is E, and the remaining characters are I.
    • If the label of segment s differs from the labels of the preceding and following segments, the Tags of segment s are revised on their own, with the first character being B, the last being E, and the rest being I. When the segment has only one character, its Tag is revised to S.
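The label-selection rules above can be sketched in a few helper functions. The code is an illustration under stated assumptions: the function names are hypothetical, the weights w1, w2, and w3 are passed in because their values are not given in the text, the hierarchy term is applied exactly as written in Equation (3) (including for pos = 0), and setting that term to zero when r = 0 is a guard added here, not something the text specifies.

```python
import math
from collections import Counter


def majority_labels(labels):
    """Return the label(s) with the highest proportion in a segment.
    More than one label is returned when there is a tie."""
    counts = Counter(labels)
    top = max(counts.values())
    return [lab for lab, c in counts.items() if c == top]


def label_score(t, n, j, r, pos, w1, w2, w3):
    """Score of one candidate label following Equation (3).

    t: occurrences of the label in the segment, n: segment length,
    j: 1 if the segment ends with a common name of this label, else 0,
    r: number of levels between the neighbouring element categories,
    pos: shortest distance to either end of that interval (0 if outside).
    """
    half = math.ceil(r / 2)
    # Assumption: with no interval (r == 0) the hierarchy term is dropped.
    hierarchy = 0.0 if half == 0 else (half - pos + 1) / half
    return w1 * t / n + w2 * j + w3 * hierarchy


def retag(label, length):
    """Rebuild a well-formed BIOES tag sequence for a segment."""
    if length == 1:
        return [f"S-{label}"]
    return [f"B-{label}"] + [f"I-{label}"] * (length - 2) + [f"E-{label}"]
```

A caller would first try `majority_labels`; if it returns a single label, that label is adopted, otherwise the common-name and hierarchy rules (or `label_score` for segments with invalid Tags) break the tie, and `retag` then writes the corrected Tag sequence.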

3.3. Revision Based on Administrative Divisions

The revision module based on administrative divisions corrects administrative division information to improve the accuracy of subsequent address standardization tasks. Based on the national standard for administrative division information, the provincial, prefectural, and county-level address elements in the output of the tag revision module are supplemented and revised. This module uses the national administrative division database constructed in this project; statistics of the database are shown in Table 1. The province, prefecture, and county elements are retrieved from the output of the tag revision module. If the output contains complete province, prefecture, and county information, the module checks that their hierarchical relationship is correct. For incomplete administrative division information, the missing part is determined from the two existing parts.
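Completing a missing administrative level from the known ones amounts to a lookup in the division table. The sketch below assumes the national administrative division database can be read as a list of (province, prefecture, county) triples; the function name, the dictionary keys, and the rule of returning nothing when the match is not unique are all illustrative assumptions.

```python
def complete_division(partial, divisions):
    """Fill in a missing province, prefecture, or county.

    partial:   dict with any of the keys "province", "prefecture", "county";
               missing or None values are treated as unknown.
    divisions: iterable of (province, prefecture, county) triples.
    Returns the completed dict, or None if zero or several triples match.
    """
    keys = ("province", "prefecture", "county")
    matches = [
        row for row in divisions
        if all(partial.get(k) is None or partial[k] == v
               for k, v in zip(keys, row))
    ]
    if len(matches) == 1:
        return dict(zip(keys, matches[0]))
    return None  # unknown or ambiguous: leave the address unchanged
```

The same lookup also serves as a consistency check: if all three levels are present but no triple matches, the hierarchy extracted from the address is not valid.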

4. Address-Matching Algorithm

The address-matching algorithm was developed on the basis of the standard address database established for City A. Based on the official data released by the Statistics Bureau of City A, and after revising the existing data, the community table, street table, administrative district table, and standard address table were built. The relational schema is as follows.
  • STREET (street_id, street_name, district_id, and street_code)
  • COMMITTEE (cmt_id, cmt_name, street_id, and cmt_code)
  • DISTRICT (district_id, district_name, and district_code)
  • SH_STD_TB (addr_id, road, street_id, longitude, and latitude)
After supplementation and correction by various means, the final standard address table contained a total of 2,706,918 records. The structure of the point of interest table is as follows: POI (POI_id, POI_name, type_id, district_id, GCJ02_LON, GCJ02_LAT, WGS84_LON, and WGS84_LAT). Here, GCJ02_LON and GCJ02_LAT are the longitude and latitude in the GCJ-02 coordinate system obtained from the map, while WGS84_LON and WGS84_LAT are the longitude and latitude in WGS 84, respectively. A total of 1,356,182 valid point of interest records were collected, covering 23 major categories such as company, address, and school. The quality of these data is not yet high enough and is still being improved.
The address matching in this study refers to searching the database for the standard addresses that correspond to non-standard address expressions. The input of the query module is the result of address parsing, that is, address segments and their corresponding categories. For example, the segmentation result of “人民大道200号”, which is a Chinese address in City A, is “[‘人民大道’, ‘200’], [‘road’, ‘roadno’]”, in which ‘人民大道’ is the road name in the address expression and ‘200’ is the house number on that road. The results obtained by querying the standard address table and the point of interest table share a unified format, and each result is stored in a dictionary. The contents of the dictionary are shown in Table 2.
In Table 2, “黄浦区” is a district name in City A, and “南京东路街道” is the name of a street in that district; the two are hierarchically related. “人民大道” is the name of a road located in that district. “号” is the Chinese character appended to a number to denote a house number. “上海市黄浦区南京东路街道人民大道200号” is a complete address expression formed by concatenating the address elements in hierarchical order. “上海市人民政府” means the Shanghai Municipal People’s Government.

4.1. Standard Address Table

If the address elements contain a road name and house number information, that is, the label list obtained by parsing the address element contains “road” and “roadno”, a fuzzy query is conducted from the standard address table. If the corresponding address information is identified, the query is successful, and the result is returned directly. If no information is identified, the point of interest table is used for querying.
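The decision described here (query the standard address table only when both “road” and “roadno” were parsed, otherwise fall through to the point of interest table) can be sketched with a parameterized LIKE query. The function name is hypothetical, and the use of separate `road` and `door` columns follows the SH_STD_TB relation given in Section 2; the actual fuzzy-matching conditions used by the system are not stated in the text.

```python
import sqlite3


def query_standard_address(conn, segments, labels):
    """Fuzzy lookup in SH_STD_TB for a parsed address.

    segments/labels are the two parallel lists produced by address parsing,
    e.g. ["人民大道", "200"] and ["road", "roadno"].
    Returns matching rows, or an empty list so that the caller can fall
    back to the point of interest table.
    """
    parsed = dict(zip(labels, segments))
    if "road" not in parsed or "roadno" not in parsed:
        return []
    sql = (
        "SELECT addr_id, road, door, longitude, latitude "
        "FROM SH_STD_TB WHERE road LIKE ? AND door LIKE ?"
    )
    # Assumption: the road may be embedded in a longer string, and the door
    # column may carry a suffix such as "号" after the number.
    params = (f"%{parsed['road']}%", f"{parsed['roadno'].strip()}%")
    return conn.execute(sql, params).fetchall()
```

Because the prefix pattern on `door` also matches longer numbers that begin with the same digits, the returned rows would still need the filtering and sorting step before one of them is chosen.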

4.2. Interest Point Table

The point of interest table is queried in two situations. In the first, the segmented address elements do not include a road and house number but contain point of interest information. In the second, no corresponding information was found in the standard address table.
In the first scenario, the POI_NAME column of the point of interest table is used for querying. If the address elements contain administrative divisions, the administrative division information is added to the search conditions. In the second scenario, it is necessary to query the address column in the administrative division table. If address information containing the road and house number is found, the address exists and is recorded. If it is not found and there is POI information, the query falls back to the approach of the first scenario. Depending on the specific situation, 10 query statements were used. Several queries use POI_NAME to directly obtain longitude, latitude, and administrative division information, with road names and house numbers included in the EXPRESS column. However, the content of this column contains redundant and mixed expressions, such as floors and points of interest. Such addresses must either be standardized or be mapped to a standard address through latitude and longitude calculations. One meter corresponds to approximately 0.00001141° of longitude and approximately 0.00000899° of latitude. When house numbers are assigned, adjacent house numbers are generally no more than 10 m apart. Therefore, if the longitude and latitude differences between two points are within 0.0001° and 0.00008°, respectively, the two points are treated as having the same house number. If the longitude and latitude of the queried address are X1 and Y1, the standard address table is searched for an address whose longitude and latitude X2 and Y2 satisfy X1 − 0.00005 < X2 < X1 + 0.00005 and Y1 − 0.00004 < Y2 < Y1 + 0.00004, and that address is taken as the standard address.
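The coordinate window can be expressed as a simple predicate. This sketch assumes a symmetric window of ±0.00005° in longitude and ±0.00004° in latitude, that is, half of the 0.0001° and 0.00008° tolerances stated above; the constant and function names, and the (addr_id, longitude, latitude) row layout, are illustrative.

```python
LON_TOL = 0.00005  # half of the 0.0001 degree longitude tolerance
LAT_TOL = 0.00004  # half of the 0.00008 degree latitude tolerance


def same_house_number(x1, y1, x2, y2):
    """True if (x2, y2) lies inside the window centred on (x1, y1).

    x values are longitudes and y values are latitudes, in degrees.
    """
    return abs(x2 - x1) < LON_TOL and abs(y2 - y1) < LAT_TOL


def candidates_in_window(x1, y1, rows):
    """Keep the standard-address rows whose coordinates fall in the window.

    rows: iterable of (addr_id, longitude, latitude) tuples.
    """
    return [row for row in rows if same_house_number(x1, y1, row[1], row[2])]
```

In a database-backed system the same condition would normally be pushed into the WHERE clause so that an index on the coordinate columns can be used; the in-memory filter here only illustrates the rule.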

4.3. Address Filtering and Sorting

When the point of interest table is queried, multiple results may be returned, so these results must be filtered and sorted. Following the algorithm used for address normalization, filtering and sorting are performed as follows:
  1. Similarity of place names and addresses
The edit distance is calculated between the original input address addr0 and each of the n queried addresses addr1, addr2, …, and addrn. The address with the smallest edit distance is selected and normalized as the standard address.
  2. Similarity in latitude and longitude
The longitude and latitude of the address addr1 to be matched are denoted as (X1, Y1), and the coordinates found in the standard address table within the error tolerance are denoted as (X2, Y2), …, and (Xn, Yn). The distance between each candidate coordinate and the original coordinate is calculated, and the address with the smallest distance is selected as the standard address for addr1. The distance between the i-th coordinate and (X1, Y1) is computed by using Equation (4).
$$\mathrm{distance}_i = \sqrt{(X_1 - X_i)^2 + (Y_1 - Y_i)^2} \tag{4}$$
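Both selection criteria reduce to taking a minimum over the candidates. The sketch below uses the standard dynamic-programming Levenshtein distance for the text criterion and the Euclidean distance of Equation (4) for the coordinate criterion; the function names and the (address, longitude, latitude) candidate layout are assumptions.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def closest_by_text(query, candidates):
    """Candidate address string with the smallest edit distance to the query."""
    return min(candidates, key=lambda c: edit_distance(query, c))


def closest_by_coordinates(x1, y1, candidates):
    """Candidate (address, lon, lat) nearest to (x1, y1) in degree space."""
    return min(candidates, key=lambda c: math.hypot(x1 - c[1], y1 - c[2]))
```

Note that Equation (4) measures distance directly in degrees, so longitude and latitude differences are weighted equally even though one degree of each spans a different ground distance; the sketch follows the equation as given.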

5. System Design and Development

The City A police’s address-matching system is mainly used for public security by quickly and accurately locating the crime scene. Its core functions are address matching and map search.

5.1. Address Parsing and Matching

When receiving a call, the reported address may be a standard one, but most of them are in a non-standard form with aliases, abbreviations, and points of interest. Regardless of the form, the standard address in the database can be found through the processing of this system. The address parsing and matching module has three sub-modules: preprocessing, address element parsing, and address-matching modules. The process is as follows: Start → Input an address to be queried → Preprocess address → Parse address elements → Address matching → Output result → End.
The preprocessing module verifies the address input by the user, restricting the length (at least two characters) and type (not all digits) of the input text. Then, punctuation marks are replaced or removed from the valid data. The address parsing module processes the preprocessed address as follows:
  • Add the [CLS] marker at the beginning and the [SEP] marker at the end;
  • Calculate the three embeddings for each token;
  • Calculate the sum of the three embeddings;
  • Obtain the annotation of each character through the BERT model;
  • Revise annotations based on address hierarchy and address common names;
  • Revise administrative division information.
The input description address is split into multiple segments of address elements, and the output is an array of address element segments and an array of element categories, namely “[Address Segment 1, Address Segment 2, Address Segment 3…], [Category 1, Category 2, Category 3…]”. The address-matching module selects different address-matching schemes based on different types of address elements, queries the matching standard addresses from the database, filters and sorts the results, and selects the output with the highest score.
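The input checks performed by the preprocessing module at the start of this pipeline can be sketched as follows. The two validation rules (at least two characters, not all digits) come from the text; the exact set of punctuation marks that the system strips is not given, so the `PUNCTUATION` set and the function name are assumptions.

```python
import string

# ASCII punctuation plus a few common full-width marks. The exact set the
# system removes is not stated in the text, so this list is an assumption.
PUNCTUATION = set(string.punctuation) | set(",。、;:!?()【】")


def preprocess(text):
    """Validate a reported address and strip punctuation from it."""
    text = text.strip()
    if len(text) < 2:
        raise ValueError("address must contain at least two characters")
    if text.isdigit():
        raise ValueError("address must not consist of digits only")
    return "".join(ch for ch in text if ch not in PUNCTUATION)
```

The cleaned string is what would then be passed to the address element parsing module, which returns the segment and category arrays described above.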

5.2. Map Search

The map search function includes map point selection, conversion between address points and latitude–longitude pairs, and address query. For map point selection, the system calculates the corresponding latitude and longitude, administrative divisions, and address names based on the location the user selects on the map. Conversion between address points and latitude–longitude pairs works in both directions: after the user inputs a latitude and longitude, the system displays the corresponding point on the map, and when the user selects a point on the map, the system returns its latitude and longitude. Address query displays on the map the address that the user has entered as text. This module is implemented using the application programming interface provided by Baidu Maps and presents the map on the front-end page using the ECharts plugin.
The user roles of this system include alarm personnel, database administrators, and system administrators. The developed system also needs to consider performance, usability, maintainability, and security.

6. Conclusions

In this study, a complete address-matching solution was developed as an effective address service system. It was implemented using Python 3.9 and the Django framework and designed with the MTV (model-template-view) architecture. In experimental verification, the matching success rate of the developed method reached 82.9%. The data in the database still needs to be supplemented and improved. The developed system has the following characteristics:
  • Performance: The usage scenarios of this system require very smooth operation. The front-end display time is kept within 3 s, and the time to standardize a single address is kept within 2 s. In addition, system throughput and concurrency are managed to ensure that the system supports concurrent use by multiple users without lag.
  • Usability: This system is designed for police officers working under pressure when receiving calls. It is easy to use, with simple and visually clean pages, a consistent style, and clear prompts.
  • Maintainability: To facilitate the expansion of system functions, the module granularity is kept as reasonable as possible, and the design style is unified and easy to analyze and reuse.
  • Security: The system has multiple types of users, so their permissions must be controlled appropriately.

Author Contributions

Conceptualization, X.D. and J.F.; methodology, X.D.; software, J.F.; validation, X.D., J.F. and M.D.; formal analysis, X.D.; investigation, M.D.; resources, M.D.; data curation, M.D.; writing—original draft preparation, J.F. and M.D.; writing—review and editing, X.D.; visualization, M.D.; supervision, X.D.; project administration, X.D.; funding acquisition, X.D. All authors have read and agreed to the published version of the manuscript.

Funding

Supported By Shanghai Information Technology Development Special Fund Project (XX-XXFZ-05-16-0139); Shanghai Science and Technology Action Plan Project (15511106900).

Institutional Review Board Statement

The City A police address-matching system is mainly used for public security by quickly and accurately locating the crime scene. Provided that the public security department of City A maintains a complete database of place names and addresses in City A, this system can be used to locate crime scenes quickly and accurately. The database will be used exclusively by the public security department.

Informed Consent Statement

This study does not target any individual or organization.

Data Availability Statement

The data used in this study comes from publicly available data on the website, and the database was created by our research team. According to the commercial agreement, it can be used after authorization.

Conflicts of Interest

This study was sponsored by the Shanghai Information Technology Development Special Fund Project (XX-XXFZ-05-16-0139) and the Shanghai Science and Technology Action Plan Project (15511106900). The authors report no disclosures.

References

  1. Zhao, R.N.; Ding, X.W. Enhancing BERT-Based Chinese Address Recognition Model with Tag Revision Module. In Proceedings of the International Conference on Computer Information and Big Data Applications Engineering, Dalian, China, 23–25 June 2023.
  2. Lin, Y.; Kang, M.J.; Wu, Y.Y.; Du, Q.; Liu, T. A deep learning architecture for semantic address-matching. Int. J. Geogr. Inf. Sci. 2020, 34, 559–576.
  3. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
  4. Xu, L.C.; Mao, R.C.; Zhang, C.K.; Wang, Y.; Zheng, X.; Xue, X.; Xia, F. Deep Transfer Learning Model for Semantic Address-matching. Appl. Sci. 2022, 12, 10110.
  5. Lee, K.; Claridades, A.R.C.; Lee, J. Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques. Appl. Sci. 2020, 10, 5628.
  6. Cebeci, S.; Ozyilmaz, M.; Ince, G. Automatic Standardization System for Free Text Addresses. In Proceedings of the 27th Signal Processing and Communications Applications Conference (SIU), Sivas, Turkey, 24–26 April 2019.
  7. Dumedah, G. Address points of landmarks and paratransit services as a credible reference database for geocoding. Trans. GIS 2021, 25, 1027–1048.
  8. Jin, P.; Yang, J.; Wang, Z.; Bu, X.; Wu, P. Power Customer Data Relational Algorithm Based on Magnanimity Fuzzy Address-matching. Front. Energy Res. 2021, 9, 674856.
  9. Liu, J.; Wang, J.; Zhang, C.; Yang, X.; Deng, J.; Zhu, R.; Nan, X.; Chen, Q. Chinese Address Similarity Calculation Based on Auto Geological Level Tagging. In 16th International Symposium on Neural Networks (ISNN); Springer International Publishing AG: Moscow, Russia, 2019; pp. 431–438.
  10. Qian, C.Y.; Yi, C.; Cheng, C.Q.; Pu, G.; Liu, J. A Coarse-to-Fine Model for Geolocating Chinese Addresses. ISPRS Int. J. Geo-Inf. 2020, 9, 698.
  11. Chen, J.; Chen, J.P.; She, X.R.; Mao, J.; Chen, G. Deep Contrast Learning Approach for Address Semantic Matching. Appl. Sci. 2021, 11, 7608.
  12. Li, P.; Luo, A.; Liu, J.; Wang, Y.; Zhu, J.; Deng, Y.; Zhang, J. Bidirectional Gated Recurrent Unit Neural Network for Chinese Address Element Segmentation. ISPRS Int. J. Geo-Inf. 2020, 9, 19.
  13. Wang, Y.S.; Wang, M.; Ding, C.L.; Yang, X.; Chen, J. Chinese Address Recognition Method Based on Multi-Feature Fusion. IEEE Access 2022, 10, 108905–108913.
  14. Zhang, H.; Ren, F.; Li, H.; Yang, R.; Zhang, S.; Du, Q. Recognition Method of New Address Elements in Chinese Address Matching Based on Deep Learning. ISPRS Int. J. Geo-Inf. 2020, 9, 745.
  15. Shan, S.; Li, Z.; Yang, Q.; Liu, A.; Zhao, L.; Liu, G.; Chen, Z. Geographical address representation learning for address-matching. World Wide Web-Internet Web Inf. Syst. 2020, 23, 2005–2022.
  16. Tian, Q.; Ren, F.; Hu, T.; Liu, J.; Li, R.; Du, Q. Using an Optimized Chinese Address-matching Method to Develop a Geocoding Service: A Case Study of Shenzhen, China. ISPRS Int. J. Geo-Inf. 2016, 5, 65.
  17. Cura, R.; Dumenieu, B.; Abadie, N.; Costes, B.; Perret, J.; Gribaudi, M. Historical Collaborative Geocoding. ISPRS Int. J. Geo-Inf. 2018, 7, 262.
  18. Cheng, R.Z.; Liao, J.X.; Chen, J. Quickly locating POIs in large datasets from descriptions based on improved address-matching and compact qualitative representations. Trans. GIS 2022, 26, 129–154.
  19. Which district do the 216 streets and towns belong to? Check out the Latest Shanghai Administrative Divisions. Available online: https://www.shanghai.gov.cn/nw17239/20240815/0979827191b5488f9f6165ba9434a6f2.html (accessed on 1 December 2024).
Figure 1. Architecture of developed model in [1].
Figure 2. Address vectorization representation using BERT. Note: all non-English terms in this paper are Chinese. The segment between the two [SEP] tokens in the first row is a Chinese address in City A, composed of a string of Chinese characters and a house number (10 号, i.e., No. 10). Each subscript character in the second row is the character at the corresponding position in that segment.
Table 1. Administrative division information statistics.

| Administrative Division Level | Quantity |
| --- | --- |
| Provincial-level | 34 |
| Prefecture-level | 333 |
| County-level | 2844 |
Table 2. Structure of dictionary.

| Key Name | Meaning | Example |
| --- | --- | --- |
| Type | Query strategy number | 1 |
| District | District name | 黄浦区 |
| Street | Street name | 南京东路街道 |
| Road | Road name | 人民大道 |
| Door | House number | 200号 |
| Longitude | Longitude | 121.4693601 |
| Latitude | Latitude | 31.23204939 |
| Addr_answer | Address result (concatenation of the address elements) | 上海市黄浦区南京东路街道人民大道200号 |
| Poi_name | Point-of-interest name (used when querying the point-of-interest table) | 上海市人民政府 |
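
The dictionary structure in Table 2 can be illustrated with a minimal Python sketch. The key names and example values are taken from Table 2. The `AddressResult` type, the `build_result` helper, and the `city` prefix parameter are hypothetical names introduced here for illustration; the assumption that `Addr_answer` is the city name followed by the district, street, road, and house number is inferred from the table's example row and is not a description of the authors' implementation.

```python
from typing import Optional, TypedDict


class AddressResult(TypedDict):
    """Result dictionary with the keys listed in Table 2 (types are assumed)."""
    Type: int                 # query strategy number
    District: str             # district name
    Street: str               # street name
    Road: str                 # road name
    Door: str                 # house number
    Longitude: float
    Latitude: float
    Addr_answer: str          # concatenated address elements
    Poi_name: Optional[str]   # set only when the point-of-interest table is queried


def build_result(
    query_type: int,
    district: str,
    street: str,
    road: str,
    door: str,
    longitude: float,
    latitude: float,
    poi_name: Optional[str] = None,
    city: str = "上海市",  # assumption: prefix taken from the Table 2 example
) -> AddressResult:
    """Hypothetical helper: assemble a result and concatenate its elements."""
    addr_answer = city + district + street + road + door
    return AddressResult(
        Type=query_type,
        District=district,
        Street=street,
        Road=road,
        Door=door,
        Longitude=longitude,
        Latitude=latitude,
        Addr_answer=addr_answer,
        Poi_name=poi_name,
    )


# Example row from Table 2.
example = build_result(
    1, "黄浦区", "南京东路街道", "人民大道", "200号",
    121.4693601, 31.23204939, poi_name="上海市人民政府",
)
print(example["Addr_answer"])
```

Keeping the individual elements alongside the concatenated string lets a caller display the full standard address while still filtering or grouping on a single level such as the district.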
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ding, X.; Feng, J.; Ding, M. Research and Development of Police Address-Matching System for City A. Eng. Proc. 2025, 98, 40. https://doi.org/10.3390/engproc2025098040
