Deep Transfer Learning Model for Semantic Address Matching

Abstract: Address matching, which aims to match an input descriptive address with a standard address in an address database, is a key technology for achieving data spatialization. The construction of today’s smart cities depends heavily on the precise matching of Chinese addresses. Existing methods that rely on rules or text similarity struggle when dealing with nonstandard address data. Deep-learning-based methods often require extracting address semantics for embedded representation, which not only complicates the matching process, but also affects the understanding of address semantics. Inspired by deep transfer learning, we introduce an address matching approach based on a pretraining fine-tuning model to identify semantic similarities between various addresses. We first pretrain the address corpus to enable the address semantic model (abbreviated as ASM) to learn address contexts unsupervised. We then build a labelled address matching dataset using an address-specific geographical feature, allowing the matching problem to be converted into a binary classification prediction problem. Finally, we fine-tune the ASM using the address matching dataset and compare the output with several popular address matching methods. The results demonstrate that our model achieves the best performance, with precision, recall, and an F1 score above 0.98.


Introduction
Addresses are used to describe a unique spatial location on Earth and are usually expressed in the form of an addressing system [1]. In recent years, with the rapid development of location services, massive amounts of industry data based on addresses as spatial information have started to emerge. Address matching is a crucial application in address services, which compares addresses with the same location in different address databases to obtain the best match with the search address and to determine position on a map [2]. Traditional address matching technology is challenged by the prevalence of high-precision address matching in urban industries, such as logistics and online taxi services. Therefore, an effective address matching method is required to facilitate the provision of accurate and efficient intelligent spatial location services and to promote the development of smart cities.
The pattern of arrangement of address elements varies from country to country. For instance, the US address pattern is "room number + street + state + country", and it performs well in creating a national geodatabase [3]. Japanese addresses, on the other hand, are coded based on location and geographic relativity, with the overall order being the opposite of the address pattern used in the US, and generally without "streets" [4]. In general, the address patterns of the above countries are nested and relatively standardized. However, Chinese addresses are relatively more difficult to match due to their complex context and rules, mainly for the following reasons: (1) Chinese addresses are written without separators; (2) Chinese addresses often contain landmarks or points of interest (POI) and topology (e.g., road intersections); (3) different government departments manage addresses, leading to confusion; and (4) address assignment and updating lags behind rapid urban renewal [5]. The data objects of this study are Chinese addresses.
Address matching methods are generally divided into rule-based or statistical methods and semantic similarity methods based on machine learning and deep learning. Character-based approaches match addresses by calculating their string similarity metrics and then manually establishing a threshold or training a particular classifier to identify a match [6,7]. String similarity metrics include edit distance and its variants [8][9][10], the Jaccard similarity metric [11], and Jaro distance and its variants [12,13]. Among them, Santos et al. compared 13 different string similarity metrics for place name matching and found that adjusting the similarity threshold was the key to achieving good performance [14]. In addition, the calculation of cosine similarity between embeddings based on N-grams is also a common method [15,16]. This method performs better than traditional metrics. Recently, Yong et al. proposed a normalization method based on the Euclidean distance between the address to be processed and the address in the standard library, but it is only applicable to some specific datasets [17]. Another type of method is address element based: it segments out address elements by rules or statistical methods and then compares the address elements and their hierarchy to determine whether they match [18][19][20]. Lin et al. point out that the degree of matching of address elements depends on whether they can be extracted correctly [21]. In general, dictionary queries [22], probability statistics such as CRF [23] and HMM [24], and creating matching rules [25,26] are the basic ways of retrieving address elements. Another common method is to construct a decision tree consisting of matching rules, each corresponding to a path in the tree. Kang et al. proposed an address matching tree model based on the analysis of the spatial constraint relationship between address elements; this requirement makes the address model more complex [27]. Focusing on the wrong word separation problem, Luo and Huang suggested a method based on a trie tree and finite-state machine [28]. The aforementioned techniques, however, frequently struggle when dealing with nonstandardized addresses (with missing address elements or represented by POI) and complexly structured addresses (such as the Chinese address features mentioned above).
In recent years, the area of artificial intelligence has seen tremendous progress in natural language processing (NLP), most of which is attributable to deep learning's enhanced performance. Word2vec [29], ELMo [30], GPT [31], BERT [32], XLNet [33], ERNIE [34], and ELECTRA [35] are a few of these classical language models. Since addresses are special textual descriptions, more and more studies in address matching have also introduced natural language models based on deep learning [36]. Cruz et al. analyzed 41 papers on address matching published between 2002 and 2021 and discovered that most of the relevant studies have used deep learning methods. Among them, consistent with the discussion above, Chinese address matching accounted for half of the studies due to the complexity of Chinese addresses [37].
Comber et al. used CRF and word2vec for address matching to extract the semantics of addresses without designing complex rules [38]; Zhang et al. proposed a convolutional neural network (W-TextCNN) for Chinese address pattern classification [39]. With the popularity of gating mechanism neural networks, address matching and normalizing based on LSTM and GRU have been carried out by an increasing number of researchers [40][41][42][43]. Santos et al. used a deep neural network based on bidirectional GRUs for place name matching [44]; Shan enriched the address context by collecting address data on the Internet and trained an address representation model with two LSTMs and attention mechanisms to extract address vectors [45]. While Li et al. incorporated the hierarchical relationship between address elements into a neural network and proposed a BiLSTM-based multitask learning method [46], Chen et al. proposed a contrastive learning address matching model based on attention-Bi-LSTM-CNN networks (ABLC) [47]. Subsequently, more and more researchers have used the attention mechanism in their address matching models [48][49][50]. With the popularity of pretrained language models, Lin et al. used the classical enhanced sequence inference model (ESIM) [51] for address record pair modelling [21], whereas Xu et al. and Qian et al. used the BERT model. Xu et al. proposed a BERT-based model for extracting address semantic representations to achieve the fusion of address semantics and geospatial information [36]; Qian et al. combined BERT and LSTM, and proposed a hierarchical region-based approach for geolocation of Chinese addresses [52]. However, all of the aforementioned methods require extracting address semantic features into embeddings, and this can affect the effectiveness of address semantic understanding, as has been demonstrated in the field of NLP [32].
In summary, when dealing with nonstandardized addresses with complicated structures, the aforementioned approaches still lack a level of comprehension of address semantics, which negatively impacts the accuracy of address matching.
To address the above problems, we use a deep transfer learning approach. First, we pretrain on an address corpus so that our address semantic model (abbreviated as ASM) can learn address contexts in an unsupervised manner and thus better understand address semantics. Then, we use the address-specific geospatial property to build a labelled address matching dataset, allowing the matching problem to be converted into a binary classification prediction problem. Finally, fine-tuning the ASM with the address matching dataset allows the model to improve its performance significantly.
The contributions of this paper are as follows: (1) A neural network based on a multihead self-attention mechanism and a permutation-based target task is used to train the ASM on a large-scale corpus in an unsupervised automated manner. The ASM can learn address semantics better. (2) A deep transfer learning approach is used to achieve semantic address matching by fine-tuning the ASM, which improves the matching accuracy. (3) A semantic address matching dataset construction method is proposed to convert address matching into a classification prediction task. The method constructs an address matching dataset with labels using location information as the inference condition. (4) Results demonstrate that, with the transfer learning approach, good performance on a downstream task such as address matching can be achieved even with microsupervision.
The remainder of this paper is organized as follows. Section 2 introduces the materials used in our study and the data processing procedures, and presents the methodology adopted, including the pretraining and fine-tuning based on XLNet. The results of our experiments are analyzed in Section 3. Section 4 presents our conclusions and the future work of this study.

Materials and Methods
In this section, we introduce a deep transfer learning approach in NLP and propose a semantic address matching framework. First, we tokenize all address data to be used as model input in the pretraining phase. Then, we use the XLNet model [33] to pretrain on the address corpus so that the model understands address semantics by learning contextual information. Finally, we construct a supervised dataset for semantic address matching, fine-tune the pretrained ASM for address matching, and compare it with multiple models to evaluate the accuracy of the ASM.

Dataset
Address records for the raw data were manually collected in 2019 from various government departments. The geographical area to which this data refers is Shangcheng District, Hangzhou, Zhejiang Province, China. The address dataset contains a variety of location description types, including standard addresses, nonstandard addresses, POIs, road intersections, place name abbreviations, and so on. The preprocessed address dataset contains 1,552,532 records, each consisting of three fields (address record, longitude, and latitude), and served as the address corpus for the pretraining phase.
To use the address data for semantic address matching, we created a dataset of address pairs with labels based on the address corpus. Based on the set of addresses filtered with the same coordinates, we performed manual matching using a standard address database. In addition, to give the model better prediction performance and generalization capabilities, we augmented the dataset with easy data augmentation [51] methods for text classification tasks, mainly using synonym replacement, address element deletion, and address element insertion. To improve the robustness of the model, we constructed mismatched address pairs from the set of address pairs with Jaccard similarity coefficients [11] greater than zero. We finally obtained a dataset of 64,358 address pairs and corresponding labels, a sample of which is shown in Table 1. The statistical features of the dataset used for semantic address matching are shown in Table 2, where we used the difference in the number of characters, Levenshtein distance [8], and Jaccard similarity coefficient [11] to show the similarity of address pairs in the dataset. As expected, unmatched address pairs have lower text similarity.
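The labelling idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `jaccard` and `build_pairs`, the toy records, and the exact-equality test on coordinates are all hypothetical, and the manual verification and data augmentation steps are omitted.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity coefficient over the character sets of two strings."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def build_pairs(records):
    """Build (address_a, address_b, label) triples from (address, lon, lat) records.

    Assumption: two different strings with identical coordinates are candidate
    matches (label 1); strings with different coordinates but some character
    overlap (Jaccard > 0) are kept as mismatched pairs (label 0).
    """
    pairs = []
    for (a, lon_a, lat_a), (b, lon_b, lat_b) in combinations(records, 2):
        if a == b:
            continue
        if (lon_a, lat_a) == (lon_b, lat_b):
            pairs.append((a, b, 1))
        elif jaccard(a, b) > 0:
            pairs.append((a, b, 0))
    return pairs


# Toy records with made-up coordinates, for illustration only.
records = [
    ("Hangzhou Underwater World", 120.1, 30.2),
    ("Underwater World Hangzhou", 120.1, 30.2),
    ("Hangzhou Railway Station", 120.2, 30.3),
    ("XYZ", 120.3, 30.4),
]
for pair in build_pairs(records):
    print(pair)
```

The pairwise loop is quadratic in the number of records, so a corpus of the size reported here would need grouping by coordinate first; the sketch only shows the labelling rule.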

Semantic Address Matching Definition
In this paper, we study address matching in the absence of a standard address database, referred to as the semantic address matching task. The following description defines semantic address matching: Given the address dataset:
It is important to note that no information other than the string itself and its corresponding geospatial information is utilized in this study to calculate the similarity of two addresses. Therefore, the task addressed in this study focuses on the problem of matching addresses with the same location instead of address disambiguation. In addition, because the same location can be represented in many different ways, we believe that it is not possible to achieve a correct match without processing from a natural language understanding perspective. Therefore, in our study, "address semantic understanding" refers to the textual understanding of the address corpus, while "address semantic reasoning" used for address matching is based on the spatial relationship reasoning of addresses.

Pretraining Phase Using the Address Corpus Based on XLNet
This section presents a transfer learning-based pretraining model for address semantics: address semantic model (ASM). The ASM is based on the characteristics of Chinese addresses, combined with the advantages of semantic understanding in deep learning natural language models. The model takes as input a single character of a Chinese address that has been tokenized, and uses a multihead self-attention-based semantic extraction module to help the model understand the semantics of the address with the objective of permutation unknown character prediction. For the practical training problem resulting from the prediction objective, a two-stream self-attention structure for target position representations is used. The overall structure of the ASM is shown in Figure 1.

Tokenization of Address Characters
The conversion of Chinese addresses into input that can be received by the ASM is the basis for training.Since Chinese addresses are not like alphabetic forms of languages, such as English, they do not have delimiters.Therefore, most Chinese address studies start with the segmentation of address elements.Due to the unique hierarchy of addresses, partitioning addresses into various address elements is already a problem worth studying.Our study, however, aims to convert the complex address matching into a classification problem that can be automated for computer computation.Although the commonly used SentencePiece method [53] in NLP can automate the segmentation of Chinese addresses by counting high-frequency co-occurring characters combined into subword units and constructing dictionaries, the subwords obtained by its segmentation are too long, and some of the segmented words do not conform to the common sense of Chinese addresses, which will affect the semantic understanding during pretraining.
We therefore use the Basic Tokenizer, which tokenizes a character as a unit. It separates words and symbols according to spaces. We first add blank characters before and after each character of the address. Then the characters are matrix-transformed according to the lookup table to become the one-hot encoded input, and the activated dimension in the one-hot encoding is the index number corresponding to the character in the dictionary key-value pair. In this study, two dictionaries (one with non-Chinese characters and the other with solely Chinese characters) are created once the individual characters from each address have been obtained. These dictionaries have 9425 and 3491 characters, respectively.
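A minimal sketch of this character-level tokenization is given below. It is not the tokenizer used in the study: the helper names `tokenize`, `build_vocab`, and `encode` and the special tokens `<pad>` and `<unk>` are assumptions for illustration, and a single dictionary is used instead of the two dictionaries described above.

```python
def tokenize(address: str) -> list[str]:
    """Split an address into single-character tokens.

    Blanks are inserted around every character and the result is split on
    whitespace, mirroring the "separate by spaces" behaviour described above.
    """
    spaced = " ".join(address)
    return spaced.split()


def build_vocab(addresses, specials=("<pad>", "<unk>")) -> dict[str, int]:
    """Assign an integer index to every distinct character in the corpus."""
    vocab = {token: index for index, token in enumerate(specials)}
    for address in addresses:
        for char in tokenize(address):
            if char not in vocab:
                vocab[char] = len(vocab)
    return vocab


def encode(address: str, vocab: dict[str, int]) -> list[int]:
    """Map an address to dictionary indices; unseen characters map to <unk>.

    Each index is the position of the single active dimension of the
    corresponding one-hot vector.
    """
    unknown = vocab["<unk>"]
    return [vocab.get(char, unknown) for char in tokenize(address)]


corpus = ["杭州市上城区", "杭州海底世界"]  # toy corpus
vocab = build_vocab(corpus)
print(tokenize("杭州海底世界"))
print(encode("杭州海底", vocab))
```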

Objective of Permutation Unknown Character Prediction and Two-Stream Self-Attention Structure
The objective of permutation language modeling is derived from the XLNet model [33]. Without altering the character order of the original text, the objective permutes the index order in which the characters of a text description are predicted. This training objective not only preserves the high-order and long-range dependencies present in the text context, but also overcomes the limitation of earlier autoregressive language modeling objectives, which could only exploit unidirectional contexts (forward or backward), enabling a pretrained model to utilize deep, bidirectional contextual information more effectively. Addresses, as special natural language texts incorporating geospatial information and hierarchy, need to fully utilize bidirectional contextual information, so we use a permutation language model objective for pretraining on the address corpus.
Specifically, we assume that given an address record X of length T, there are a total of T! sequences of permutations.If all permutations are traversed and the parameters of the model are shared, then the model must be able to learn the context of all positions.We take a simplified address record, for example, "Hangzhou Underwater World" ("Hang Zhou Hai Di Shi Jie" in Chinese pinyin), and predict the third character "Hai" in a different order, as shown in Figure 2. In Figure 2b, for instance, the address permutation is disordered as 3→2→4→1→6→5, so when predicting "Hai", there is no address context character, and the prediction can only be made based on the previous hidden state.For Figure 2f, the "Hai" (3) character learns all five context characters except itself.
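The effect of the permutation on the available context can be made concrete with a small sketch. The function name `visible_context` and the second and third orderings are illustrative assumptions; only the ordering 3→2→4→1→6→5 is taken from the text.

```python
from math import factorial

CHARS = {1: "Hang", 2: "Zhou", 3: "Hai", 4: "Di", 5: "Shi", 6: "Jie"}


def visible_context(order: list[int], target: int) -> list[int]:
    """Return the positions that precede `target` in the factorization order.

    Under the permutation objective, only these positions may be used as
    context when predicting the character at `target`.
    """
    return order[: order.index(target)]


print(factorial(len(CHARS)))  # number of possible orderings for T = 6

for order in ([3, 2, 4, 1, 6, 5], [2, 4, 3, 1, 6, 5], [1, 2, 4, 5, 6, 3]):
    context = visible_context(order, target=3)
    print(order, "->", [CHARS[position] for position in context])
```

When position 3 comes first in the ordering, the context is empty; when it comes last, all five other characters are visible, which is the bidirectional case.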
The objective function of XLNet is to maximize the log-likelihood function of the target subsequence conditional on the nontarget subsequence:

$$\max_{\theta}\; \mathbb{E}_{z \sim Z_T}\left[\sum_{t=1}^{T} \log p_{\theta}\left(x_{z_t} \mid x_{z_{<t}}\right)\right] \quad (1)$$

where $Z_T$ denotes the set of all permutations of the index of an address record of length $T$; $z \in Z_T$ is one of the sequences of indexed permutations, where $z_t$ denotes the $t$-th element of the sequence of indexed permutations, and $z_{<t}$ denotes the first $t-1$ elements of $z$; $\mathbb{E}_{z \sim Z_T}$ denotes the expectation to be maximized; and $p_{\theta}$ denotes the predicted probability. In addition, XLNet uses partial prediction optimization. It slices a permutation $z$ into two subsequences, $z_{\le c}$ and $z_{>c}$, where $c$ is the slice point that divides the permutation into a nontarget sequence and a target sequence, respectively.

While the above objective of permutation unknown character prediction works well for understanding address semantics by removing ambiguity from the target prediction, it creates the problem that the model does not know the position of the character to be predicted in the original address record. Therefore, XLNet [33] introduces a two-stream self-attention structure, which consists of two sets of hidden representations instead of one, to let the model know explicitly where the character to be predicted is located. The two streams of representations are updated with a shared set of parameters as follows:

$$g_{z_t}^{(m)} \leftarrow \mathrm{Attention}\left(Q = g_{z_t}^{(m-1)},\; KV = h_{z_{<t}}^{(m-1)};\; \theta\right) \quad (\text{query stream: uses } z_t \text{ but cannot see } x_{z_t}) \quad (2)$$

$$h_{z_t}^{(m)} \leftarrow \mathrm{Attention}\left(Q = h_{z_t}^{(m-1)},\; KV = h_{z_{\le t}}^{(m-1)};\; \theta\right) \quad (\text{content stream: uses both } z_t \text{ and } x_{z_t}) \quad (3)$$

where Q, K, and V denote the query, key, and value in an attention operation [54]; $h_{z_t}$ and $g_{z_t}$ are the content and query representations of position $z_t$; and the superscript $(m)$ indexes the self-attention layer.

In addition, we employ Transformer-XL with a multihead self-attention mechanism as an address semantic feature extractor [55]. Transformer-XL integrates two important techniques, namely, the relative positional encoding scheme and the segment recurrence mechanism. This allows for better adaptation to the two-stream attention permutation language model. As the number of semantic feature extraction structures affects the performance of the model in subsequent experiments, each layer of the Transformer-XL module is tentatively defined in this section as the address-transformer module.
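The difference between the two streams comes down to which positions each one is allowed to attend to. The sketch below builds the corresponding boolean attention masks for one factorization order with NumPy; the function name `two_stream_masks` is hypothetical, and the sketch shows only the visibility pattern, not the attention computation itself.

```python
import numpy as np


def two_stream_masks(order: list[int]):
    """Build attention masks for a factorization order over positions 1..T.

    mask[i, j] is True when position i + 1 may attend to position j + 1.
    Content stream: positions at or before i in the order (sees itself).
    Query stream: positions strictly before i in the order (cannot see itself).
    """
    length = len(order)
    rank = np.empty(length, dtype=int)
    for step, position in enumerate(order):
        rank[position - 1] = step  # step at which each position is predicted
    content_mask = rank[None, :] <= rank[:, None]
    query_mask = rank[None, :] < rank[:, None]
    return content_mask, query_mask


content_mask, query_mask = two_stream_masks([3, 2, 4, 1, 6, 5])
print(query_mask[2].astype(int))    # position 3 is predicted first: no context
print(content_mask[2].astype(int))  # its content stream sees only itself
print(query_mask[4].astype(int))    # position 5 is predicted last: sees all others
```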

Fine-Tuning for Semantic Address Matching
Fine-tuning is an implementation of deep transfer learning, which refers to adding task-relevant structures and parameters to an already-trained model, and then retraining on a task-relevant corpus [56]. We therefore used a newly constructed labelled address matching corpus for semantic address matching, adding a new neural network structure for a fine-tuned learning model and training framework based on the classification task. A fully connected feedforward layer is first stacked on top of the network for nonlinear transformation, with tanh as the activation function, which is mathematically formulated as follows:

$$y = \tanh\left(W_1 x + b_1\right) \quad (4)$$

where $x$ is the representation of the address pair output by the ASM, and $W_1$ and $b_1$ are the weight matrix and bias of this layer.

After obtaining the features from this fully connected layer, we then connected a second fully connected layer without an activation function for linear transformation. Since semantic address matching is a binary classification task of whether to match, the output of this layer is two-dimensional. Finally, we passed the output scores of this layer into the SoftMax normalization function to predict the probabilities that the address pair does or does not match, respectively.

We designed the deep semantic address matching model (abbreviated as DSAMM) with the following objective function. Here, given that the number of address string pairs per batch iteration is batch_size, the predicted probability output is prob(batch_size, 2), and the true label sequence is label(batch_size), the true label probability for each address pair $i$ is as follows:

$$p_i = \mathrm{prob}\left[i,\; \mathrm{label}_i\right] \quad (5)$$

The final objective function is obtained by taking the logarithm of these probabilities, summing them, averaging over the batch, and taking the negative, so that a smaller value indicates a better fit. The objective function is specified below:

$$\mathrm{Loss} = -\frac{1}{\mathrm{batch\_size}} \sum_{i=1}^{\mathrm{batch\_size}} \log p_i \quad (6)$$

The accuracy metrics used in this study include precision, recall, and F1 score [57]. Precision is the proportion of true positive samples among those predicted to be positive; recall is the proportion of positive samples that are correctly predicted and, in semantic address matching, refers to the percentage of correctly matched pairs out of all address pairs that should be matched; and the F1 score is the harmonic mean of precision and recall.
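The classification head, the batch objective, and the three metrics described in this section can be sketched with NumPy as follows. This is a minimal illustration under assumptions: the function names, the weight shapes, and the use of the negative mean log-probability as the loss are not taken from the original implementation, and the input `h` stands in for whatever representation the pretrained ASM produces for an address pair.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def classification_head(h, w1, b1, w2, b2) -> np.ndarray:
    """tanh dense layer, then a linear layer with 2 outputs, then softmax."""
    hidden = np.tanh(h @ w1 + b1)
    logits = hidden @ w2 + b2
    return softmax(logits)


def mean_negative_log_likelihood(prob: np.ndarray, label: np.ndarray) -> float:
    """Pick the probability of the true label per pair, then average -log."""
    true_prob = prob[np.arange(len(label)), label]
    return float(-np.mean(np.log(true_prob)))


def precision_recall_f1(pred: np.ndarray, label: np.ndarray):
    """Precision, recall, and F1 for the positive (matched) class."""
    tp = int(np.sum((pred == 1) & (label == 1)))
    fp = int(np.sum((pred == 1) & (label == 0)))
    fn = int(np.sum((pred == 0) & (label == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy usage with random weights; sizes are arbitrary.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))  # stand-in for ASM outputs of 4 address pairs
prob = classification_head(
    h, rng.normal(size=(8, 8)), np.zeros(8), rng.normal(size=(8, 2)), np.zeros(2)
)
label = np.array([1, 0, 1, 0])
print(mean_negative_log_likelihood(prob, label))
print(precision_recall_f1(prob.argmax(axis=1), label))
```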

Address Semantic Model Pretraining
We examine the semantic understanding effectiveness of the ASM by examining the prediction accuracy of address characters under the permutation language modeling objective. In the experimental design, we refer to the influencing factors of pretraining in a study by Xu et al. and use the number of address-transformer modules and whether the numbers in the address records are replaced with uniform identifiers as the independent variables for the analysis of the pretraining hyperparameters [36]. The purpose of the experiments in this section is to validate the effectiveness of the ASM without testing or predicting with it, so only the training set and the validation set are required for pretraining the model. Due to the large address corpus, we set the proportions of the training set and validation set to approximately 99% (1,537,532) and 1% (15,000), respectively.
The optimal values for the relevant hyperparameters were determined by drawing on previous studies of pretrained language models and previous experiments. They are shown in Table 3. In terms of the number of address-transformer modules, we set the number of modules to 6, 8, 10, and 12 to observe the performance of the target task in different situations. As shown in Figure 3, the training loss for different numbers of modules decreases rapidly until about 40k steps, then keeps decreasing slowly and gently in the following iterations, and basically levels off in the last 40k steps, indicating that the ASM instances have been adequately trained. Additionally, the comparison of the ASM instances reveals that the four curves largely overlap, which indicates that the number of address-transformer modules has little influence on prediction performance. As shown in Table 4, the validation accuracy of the model under the above four settings ranged from 90.5% to 91.5%, and the target accuracy increased slightly with the number of modules, indicating that more address-transformer modules give a better understanding of address semantics. However, due to the tiny increase in accuracy and the large increase in training time, building six layers of modules was the most cost-effective option. The findings of this study are consistent with those of Xu et al. [36].
Most of the numbers in the address records have only a geospatial property and no semantic information (e.g., for "No.116 Tianmushan Road" and "No.226 Tianmushan Road", there is no difference in their contexts other than the numbers), so the model cannot predict them accurately. Therefore, we replaced all Arabic numerals in the address corpus with a uniform identifier: "CODE". Since Xu et al. [36] used BERT [32] to construct the ALM for pretraining on the address corpus, we used their method for comparison. Figure 4 shows a comparison of the training loss curves of the corpus with and without the "CODE" replacement under the ASM. The training loss curve of the model with the "CODE" replacement is always below that of the original address corpus, demonstrating that the numbers confound the predictive target of the model and reduce its predictive power. As can be seen from Table 5, the prediction accuracy of the ASM increased by approximately 7 percentage points after the replacement with "CODE". In addition, when compared with the ALM, our model's prediction accuracies all improved, indicating that the ASM performs better in understanding address semantics.
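The numeral replacement can be sketched with a regular expression. The helper name `replace_numbers` is hypothetical, and two details are assumptions rather than facts from the study: each run of consecutive digits is collapsed into a single "CODE" token, and only ASCII digits are handled.

```python
import re

# Assumption: one identifier per run of ASCII digits.
_DIGIT_RUN = re.compile(r"[0-9]+")


def replace_numbers(address: str, token: str = "CODE") -> str:
    """Replace every run of Arabic numerals in an address with a uniform token."""
    return _DIGIT_RUN.sub(token, address)


print(replace_numbers("No.116 Tianmushan Road"))
print(replace_numbers("No.226 Tianmushan Road"))
```

After replacement the two example addresses become identical strings, which illustrates why the house number carries no contextual signal for character prediction.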

Fine-Tuning for Semantic Address Matching
To explore whether there is overfitting, we evaluated the model on the validation dataset every 500 iterations during fine-tuning, and determined the set of hyperparameters that worked best. In this study, 80% of the data were randomly taken as the training dataset and the remaining 20% as the validation dataset. The loss curves for training and validation are shown in Figure 5. Figure 5 shows that the training loss value of the DSAMM decreases rapidly during the first 1k training iterations, indicating the model's excellent learning ability and its ability to make a good assessment of whether address pairs match even before the dataset is entirely learned. This has inspired us to investigate whether better matching can be achieved even with microsupervision, that is, with a small amount of supervised training data.
Between 2k and 6k training iteration steps, the training loss value decreases more gently and steadily, indicating that the model is still learning the task objectives from the supervised data. Between 6k and 8k steps, the training loss values have largely flattened out, demonstrating that the DSAMM instance has been sufficiently trained by that number of iterations and that further training is not warranted. The final training loss values ranged from 0.01 to 0.05, indicating that the fine-tuning training was effective; the exact metric values are discussed in the subsequent comparative experimental analysis. In addition, the trend of the validation loss values is similar to that of the training loss values, following a pattern of rapid decline, slow decline, and gradual levelling off. The final loss values for the validation dataset range from 0.04 to 0.05, indicating that there is no overfitting. After training for 8 epochs, we selected the model at the optimal iteration step on the validation dataset as the DSAMM instance.
We set up comparative experiments with various proportions of the training set to examine whether the DSAMM needs only a limited amount of labelled training data to attain high matching precision. Figure 6 displays the matching prediction precision with 1%, 5%, 10%, 15%, and 20% of the original training dataset. When employing only 15% of the initial training set of labelled address pairs, the DSAMM achieves a matching precision above 0.80, and 0.84 when using 20%. This demonstrates that the DSAMM can perform well in weakly supervised learning, most likely due to the transfer learning employed in our study. Our model has learned the address semantics using self-supervised learning in the pretraining phase, and when it is then fine-tuned for task-based learning using partially supervised data, it is able to combine the advantages of the two phases, allowing the model to perform at a high level on the task with less supervised training data. This also coincides with the research that first proposed the idea of pretraining [58].
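A reduced training set for such an experiment could be drawn as in the sketch below. The function name `subsample` is hypothetical, and sampling each label separately (stratification) and the fixed seed are assumptions; the study does not state how the subsets were drawn.

```python
import random


def subsample(pairs, fraction: float, seed: int = 0):
    """Draw a random subset of labelled pairs, keeping the label ratio.

    pairs: list of (address_a, address_b, label) triples.
    fraction: share of each label group to keep, e.g. 0.05 for 5%.
    """
    rng = random.Random(seed)
    by_label = {}
    for pair in pairs:
        by_label.setdefault(pair[2], []).append(pair)
    sample = []
    for _, group in sorted(by_label.items()):
        keep = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, keep))
    rng.shuffle(sample)
    return sample


# Toy training set with balanced labels.
training_pairs = [(f"a{i}", f"b{i}", i % 2) for i in range(1000)]
for fraction in (0.01, 0.05, 0.10, 0.15, 0.20):
    subset = subsample(training_pairs, fraction)
    print(fraction, len(subset))
```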

Comparative Experiment Analysis of the Address Matching
To evaluate the semantic address matching performance of the DSAMM we proposed, several baseline models were selected for comparison. The methods compared include character-based matching methods, machine-learning-based matching methods, and deep-learning-based matching methods. The character-based matching methods in this paper use Levenshtein distance [8], Jaccard similarity coefficient [11], and Jaro similarity [12] to measure string correlation, followed by a random forest (RF) classifier [59] and a support vector machine (SVM) classifier [60] to determine whether the address pairs match. For the machine-learning-based matching methods, we compared with the method proposed by Comber et al. [38]. It uses CRF to label the address elements, then uses word2vec for embedding, and applies RF and SVM for classification prediction. For the deep-learning-based matching methods, we compared with the method proposed by Lin et al. [21]. It uses a two-stage model, with the first stage using word2vec to obtain embeddings of address pairs for use as input to the next stage of the deep neural network, and the second stage using a typical deep learning model for interaction-based text matching, ESIM [47], which is applied directly to address matching.
The results of the comparison of the three metrics for each method are shown in Table 6. As the F1 score is the most representative and important evaluation metric, we present the F1 scores for each method in the form of a bar chart for a more visual comparison. As shown in Figure 7, the Jaccard similarity coefficient stands out among the string-similarity-based approaches to semantic address matching. According to the performance comparisons, the RF classifier consistently outperforms the SVM classifier, and this is also true for the other two groups using machine learning techniques. Among the machine-learning-based matching methods, the CRF method performs worse, probably because it is more susceptible to the influence of the pretraining corpus. In addition, the machine-learning-based matching methods did not outperform the string-based matching methods, which may be because Comber et al. used relatively generic English addresses, whereas Chinese addresses have their own uniqueness and, therefore, lead to different conclusions [38].
The deep learning method of "word2vec + ESIM" outperformed the string-similarity-based and machine-learning-based methods, with all three evaluation metrics having values above 0.96. This shows how the deep learning framework can significantly increase the accuracy of semantic address matching. All the final metric values attained by training the DSAMM instances in our study were above 0.98, with the F1 value coming in at 0.984, which represents a notable improvement and the best metrics among the compared methods. This suggests that the transfer learning model used in this work can effectively increase the accuracy of semantic address matching. Additionally, we constructed the model without segmenting the address text into elements, which significantly increased the effectiveness of address matching.
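The character-based baselines rest on simple string measures. The sketch below implements two of them, Levenshtein distance and the Jaccard coefficient over character sets, together with a hypothetical `pair_features` helper that turns an address pair into a feature vector a classifier such as RF or SVM could consume. Jaro similarity and the classifiers themselves are left out for brevity, and the feature layout is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity coefficient over the character sets of two strings."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def pair_features(a: str, b: str) -> list[float]:
    """Length difference, Levenshtein distance, and Jaccard coefficient."""
    return [abs(len(a) - len(b)), levenshtein(a, b), jaccard(a, b)]


print(pair_features("Hangzhou Underwater World", "Underwater World Hangzhou"))
```

Reordering the parts of an address leaves the Jaccard coefficient at its maximum while the edit distance grows, which hints at why purely character-based measures struggle with nonstandard address forms.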

Conclusions
In this study, we built the ASM with strong address semantic understanding using a pretraining approach to semantic modelling of a vast and complicated address corpus. We introduced a fine-tuning approach in deep transfer learning to achieve a high accuracy of semantic address matching. The main conclusions of this study are as follows:

≠ , and B represent the comparison operator. The operation objects on either side of the comparison operator refer to the same real-world object with the same coordinates.
In Equations (2) and (3), $h_{z_t}$ denotes the content representation, which serves a similar role to the standard hidden states in Transformer, and $g_{z_t}$ denotes the query representation, which only has access to the contextual information $x_{z_{<t}}$ and the position $z_t$, but not the content $x_{z_t}$.

Figure 3. Comparison of training loss gradient curves for the ASM with a different number of address-transformer modules, where the curve smoothness is set to 0.9; the opaque curve is the smoothed curve and the translucent curve is the original loss gradient curve.

Figure 4. Comparison of training loss gradient curves for the ASM with Arabic numeric code replacement and without replacement, where the curve smoothness is set to 0.7; the opaque curve is the smoothed curve and the translucent curve is the original loss gradient curve.

Figure 5. Training loss gradient curve overlaid with validation loss gradient curve of the DSAMM, where the curve smoothness is set to 0.7; the opaque curve is the smoothed curve and the translucent curve is the original loss gradient curve.

Figure 6. Comparison of the precision on the validation set of the DSAMM instances trained with labelled training sets of different sizes.

Table 1. Examples of some data in the labelled address dataset.

Table 2. Statistical characteristics of the labelled address dataset.

Table 3. Setting values for each hyperparameter of the ASM.

Table 4. Values for the ASM indicator for a different number of address-transformer modules.

Table 5. The prediction accuracy of the ASM and the ALM in the validation datasets.

Table 6. Comparative evaluations of different address matching methods in precision, recall, and F1 score.
Figure 7. Comparative evaluations of different address matching methods in F1 score.