Word Sense Disambiguation Using Prior Probability Estimation Based on the Korean WordNet

Abstract: Supervised disambiguation using a large amount of corpus data delivers better performance than other word sense disambiguation methods. However, it is not easy to construct large-scale, sense-tagged corpora since this requires high cost and time. On the other hand, implementing unsupervised disambiguation is relatively easy, although most such efforts have not produced satisfactory performance. A primary reason for the performance degradation of unsupervised disambiguation is that the semantic occurrence probability of ambiguous words is not available; hence, a data deficiency problem occurs while determining the dependency between words. This paper proposes an unsupervised disambiguation method using prior probability estimation based on the Korean WordNet, which performs better than supervised disambiguation. In the Korean WordNet, all words have semantic characteristics similar to those of their related words. Thus, it is assumed that the dependency between words is the same as the dependency between their related words. This resolves the data deficiency problem: the dependency between words is determined by calculating the chi-square (χ²) statistic between their related words. Moreover, in order to obtain the same effect as using the semantic occurrence probability as a prior probability, as in supervised disambiguation, semantically related words of the ambiguous word are obtained and utilized as prior probability data. An experiment was conducted with Korean, English, and Chinese to evaluate the performance of the proposed lexical disambiguation method. We found that the proposed method performed better than supervised disambiguation methods even though it is based on unsupervised disambiguation (using a knowledge-based approach).


Introduction
The present paper addresses lexical disambiguation in the semantic analysis phase of natural language analysis, where ambiguity arises. In natural language processing, lexical disambiguation refers to determining the correct sense of a word that has multiple meanings (hereafter referred to as an ambiguous word) by evaluating the meaning in its context [1]. Lexical disambiguation, like morphological analysis and syntactic analysis, is essential in natural language processing and plays an important role in various application areas. In machine translation, lexical disambiguation is critical for selecting the correct translation of a given word. For example, the English verb 'build' can be translated into Korean as construct, build, produce, establish, or develop, and the most appropriate of these must be selected. In information retrieval systems, lexical disambiguation of a query word can provide the high-quality information a user needs. For example, if the query word entered by a user is 'court', the search engine should present the results categorized into courthouse-related and palace-related suggestions. In addition, resolving semantic ambiguity is important in text mining of documents in specialized fields such as medicine [2,3].
Lexical disambiguation has been a primary interest since the 1950s, when natural languages began to be processed by computers. Research has been conducted using the following two approaches. The first is based on knowledge bases such as machine-readable dictionaries. The second is based on statistical information extracted from large amounts of corpus data. In particular, since the 1990s, studies based on large corpora have been conducted actively. In this approach, the word sense ambiguity problem is cast as a statistical classification problem so that traditional machine learning techniques (for example, case-based learning, decision trees, and Bayesian classifiers) can be applied. Lexical disambiguation through machine learning is divided into supervised and unsupervised disambiguation, depending on whether a corpus of individually sense-tagged words (hereafter referred to as a sense-tagged corpus) is used for learning [4].
In lexical disambiguation, supervised disambiguation using a large sense-tagged corpus has shown better performance than other methods. However, constructing a large sense-tagged corpus is costly and time-consuming, which is a drawback. On the other hand, while unsupervised disambiguation is relatively easy to implement, its performance is usually not satisfactory. In particular, Korean lacks language resources such as machine-readable dictionaries and sense-tagged corpora compared with English. Therefore, to overcome these resource limitations in low-resource languages such as Korean and Vietnamese, it is urgent to study lexical disambiguation methods that do not depend on such resources [5].
In this paper, a novel unsupervised disambiguation method is proposed that shows better performance than existing knowledge-based and unsupervised lexical disambiguation methods, without the need for a large sense-tagged corpus. Generally, the low accuracy of knowledge-based and unsupervised lexical disambiguation stems from the unavailability of the semantic occurrence probability of ambiguous words and from the data deficiency problem that occurs while the dependency between words is being determined. The proposed method uses prior probability estimation based on the Korean WordNet [6] and exploits the characteristic that words share semantic properties with their related words (hypernyms, hyponyms, and coordinate terms). It is thus assumed that the dependency between two words is the same as the dependency between their related words, so the data deficiency problem is solved by determining the dependency between words through the chi-square statistic between their related words. Moreover, to obtain the same effect as using the semantic occurrence probability as a prior probability, as in supervised disambiguation, the semantically related words of the ambiguous word are collected and used as prior probability data.
The present paper is organized as follows: In Section 2, existing studies on lexical disambiguation are summarized. The lexical disambiguation method using related words in the Korean WordNet, which is proposed in this paper, is explained in Section 3. In particular, a solution to the data deficiency problem using the expansion of related words and a method of using semantically related words as the prior probability is explained in detail. In Section 4, the experimental method and results are described. Finally, in Section 5, conclusions and future research are discussed.

Related Studies
Lexical disambiguation has been a major concern since natural language began being processed by computers in the 1950s, but its achievements have been modest compared with those of morphological analysis. In morphological analysis, part-of-speech tagging accuracy is generally above 95% across all words in a corpus. In lexical disambiguation, by contrast, sense-tagging accuracy even for frequently used target words is only 80% to 90%.
In early research on lexical disambiguation, studies based on knowledge bases such as machine-readable dictionaries were conducted actively; the study of Lesk [7] is a representative example. Lesk identified the meaning of an ambiguous word according to the overlap between the words used in the dictionary definition of the ambiguous word and the words used in the dictionary definitions of its neighboring words. This method had the advantage of not requiring high-cost language resources, and its implementation was relatively simple. However, it suffered from a severe data deficiency problem because it required exact matches between words, showing a low accuracy of only 50% to 70%. To mitigate this problem, Luk [8] proposed extracting common words as definition concepts from the Longman dictionary and then extracting statistical information about those definition concepts from the Brown Corpus to remove ambiguity. However, this method did not provide a fundamental solution to the data deficiency problem.
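To make the overlap idea concrete, the following is a minimal sketch of Lesk-style gloss overlap. The glosses, the stopword list, and the whitespace tokenizer are illustrative stand-ins, not the dictionary data of the original studies.

```python
# Minimal sketch of Lesk-style gloss overlap (illustrative glosses, not the
# original dictionary data). Each sense is scored by how many content words
# its definition shares with the definitions of the context words.
STOPWORDS = {"a", "an", "the", "of", "or", "for", "and", "it", "into", "to"}

def tokenize(text):
    return {w for w in text.lower().split() if w not in STOPWORDS}

def lesk_overlap(sense_glosses, context_glosses):
    """Pick the sense whose gloss overlaps most with the context glosses."""
    context_words = set()
    for gloss in context_glosses:
        context_words |= tokenize(gloss)
    best_sense, best_score = None, -1
    for sense, gloss in sense_glosses.items():
        score = len(tokenize(gloss) & context_words)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Toy example with the two senses of Korean 'sagwa'.
senses = {
    "sagwa (apology)": "an expression of regret for a fault or offence",
    "sagwa (apple)": "the round edible fruit of the apple tree",
}
context = [
    "take food into the mouth and swallow it",     # gloss of 'eat'
    "a unit for counting round objects or fruit",  # gloss of a counter word
]
print(lesk_overlap(senses, context))  # -> "sagwa (apple)"
```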
Knowledge-based studies opened a new era by utilizing WordNet, a lexical semantic network that has been widely used for lexical disambiguation. Word sense disambiguation studies using a lexical-semantic network calculate the shortest path between senses [6,9,10]; the similarity or type of semantic relationship between senses is found using the distance from the topmost sense in the hierarchy. Resnik [10] proposed a method of measuring the semantic similarity of nouns in the IS-A hierarchy of WordNet for use in lexical disambiguation. Agirre et al. [11] defined a conceptual density that calculates a distance between words using the semantic relationships of WordNet, computing the conceptual density between co-occurrence words within the context containing the ambiguous word to determine its meaning. Mihalcea et al. [12,13] proposed a technique that obtains co-occurrence statistics between two words, measures the semantic density between them through WordNet, and removes word sense ambiguity based on the resulting rank. Beyond similarity calculation between senses or concepts based on WordNet, further studies on lexical disambiguation using WordNet include Pederson et al. [14] and Ganesh et al. [15]. Pederson et al. proposed a modified version of Lesk's dictionary-based algorithm applicable to WordNet. Ganesh et al. proposed a method that determines the word sense by choosing the synset (synonym set) whose gloss has the highest similarity, where the similarity between the words in the context containing the target word and the glosses in WordNet is calculated using cosine and Jaccard similarity. Such WordNet-based lexical disambiguation techniques have the advantage of mitigating the data deficiency problem by expanding an ambiguous word and the co-occurrence words used with it.
Graph-based word sense disambiguation using WordNet is also widely studied [13,16–18]. Such methods convert an input sentence into a graph whose basic units are WordNet synsets and calculate the semantic similarity of the global context, rather than the local context, using lexical chains. A lexical chain is a sequence of related words in a text, treated as a unit that represents a consistent meaning across a context or a whole paragraph. That is, rather than calculating semantic similarity between words in a local context, the semantic similarity between lexical chains, or between a lexical chain and a word, is calculated, so that more accurate information can be obtained for lexical disambiguation. In graph-based lexical disambiguation, well-known graph algorithms are applied to the constructed graph to determine the optimal lexical chain. The graph-based method showed the best performance among the lexical disambiguation methods utilizing WordNet, but it has the drawback of taking a long time to determine the optimal lexical chain when the graph structure is complicated.
In the case of the Korean language, a large-scale lexical semantic network such as WordNet did not exist in the early days of research on lexical disambiguation, so statistics-based lexical disambiguation studies were conducted instead. Since 2000, several lexical-semantic networks have been developed, and studies based on them have been conducted to resolve word sense ambiguity. Heo et al. [19] proposed a lexical disambiguation model utilizing mutual information extracted from the Korean Noun Concept Network (ETRINET), a sense-tagged compound noun dictionary, and a raw corpus.
As with other tasks in natural language processing, deep learning-based supervised models show good performance in word sense disambiguation [20–23]. However, building training data for these models is expensive because they require a large corpus annotated with word senses. Therefore, knowledge-based word sense disambiguation using external resources such as WordNet is a good alternative [24–26]. In this study, the relationship between ambiguous words and co-occurrence words within a context is determined using a Korean lexical-semantic network. Moreover, to obtain the effect of using prior information as in supervised disambiguation, the semantically related words of the ambiguous word are obtained and utilized as prior probability data.

Lexical Disambiguation Using the Korean Lexical Semantic Network
This section explains the unsupervised lexical disambiguation method using the Korean Lexical Semantic Network (KorLex) proposed in this paper. Generally, supervised disambiguation shows better performance than unsupervised disambiguation but requires a large-scale sense-tagged corpus. In this paper, a sense-tagged corpus, which involves a high development cost, is not used; in its place, a morph-tagged corpus of 5 M word phrases is used. Moreover, richer statistical information is exploited by expanding semantically related words through KorLex, and the prior probability is estimated from the semantically related words of ambiguous words.

Analysis of Relationship between Words Using the Korean Lexical Semantic Network (KorLex)
The Korean Lexical Semantic Network (KorLex) was developed using WordNet as a reference model and includes approximately 130,000 synsets and about 150,000 word senses. A synset is a set of synonyms that share the same word sense. In this paper, a word that belongs to two or more synsets in the Korean Lexical Semantic Network is considered an ambiguous word. For example, in Korean, sagwa is an ambiguous word with two synsets: sagwa1, meaning 'apology', and sagwa2, meaning 'fruit of the apple tree (apple)'. To distinguish such ambiguous words, semantic relation words are used. Semantic relation words are words in semantic relations within the hierarchy of the Korean Lexical Semantic Network; they are called hypernyms, hyponyms, and coordinate terms, depending on the relationship. Figure 1 shows the relation words of sagwa2 in the Korean Lexical Semantic Network.
Relation words in the hierarchy of the Korean Lexical Semantic Network share semantic characteristics. In particular, coordinate terms tend to have the same co-occurrence words. For example, sagwa (apple) and boksunga (peach) are hyponyms of gwail (fruit), and so both are related to mukda (eat) and masitda (delicious). Sagwa (apology) and gamsa (appreciation), however, are hyponyms of inji (recognition), which has no relationship to mukda (eat) or masitda (delicious). Thus, word sense ambiguity can be removed by identifying the relationship between the semantic relation words of the ambiguous word and the co-occurrence words in the local context. Figure 2 shows the relationship between the coordinate terms of sagwa and words in the local context in the Korean Lexical Semantic Network.

The most basic method for analyzing the relationship between two words is to measure their co-occurrence frequency; that is, how often the two words are used together in a local context. However, because some words occur frequently regardless of the meaning of the ambiguous word, co-occurrence frequency alone cannot determine the relationship between two words. To overcome this, various statistical approaches are used, such as information-theoretic measures, likelihood measures, statistical hypothesis tests, and coefficients of association strength. Among them, we use the chi-square independence test, which is known to be easy to interpret and effective in finding collocations [27–29]. Figure 3 shows the relationship analysis between two words using the semantic relation words of an ambiguous word. Assuming that a meaning of the ambiguous word is $s_i$ and a co-occurrence word is $w_j$, the chi-square statistic $\chi^2(s_i, w_j)$ of the two words is calculated according to the relation words of $s_i$, using the standard 2×2 contingency formula:

$\chi^2 = \frac{N(O_{11}O_{22} - O_{12}O_{21})^2}{(O_{11}+O_{12})(O_{11}+O_{21})(O_{12}+O_{22})(O_{21}+O_{22})}$

where $O_{11}$ is the frequency with which the two words co-occur, $O_{12}$ and $O_{21}$ are the frequencies with which each occurs without the other, $O_{22}$ is the frequency with which neither occurs, and $N$ is the total number of observations.
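As a concrete illustration, the following sketch computes this 2×2 chi-square statistic from co-occurrence counts and applies the critical value used later in this section. The counts and function names are illustrative assumptions; in the paper, the counts come from the morph-tagged corpus.

```python
# Sketch of the chi-square independence statistic for a word pair.
# The counts below are illustrative; in the paper they are extracted from
# the 5M-word-phrase Sejong morph-tagged corpus.

def chi_square(n_ab, n_a, n_b, n_total):
    """2x2 chi-square statistic for words a and b.

    n_ab:    co-occurrences of a and b within the context window
    n_a/n_b: total occurrences of a and b
    n_total: total number of observations
    """
    o11 = n_ab                        # a and b together
    o12 = n_a - n_ab                  # a without b
    o21 = n_b - n_ab                  # b without a
    o22 = n_total - n_a - n_b + n_ab  # neither a nor b
    num = n_total * (o11 * o22 - o12 * o21) ** 2
    den = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return num / den if den else 0.0

CRITICAL = 7.88  # df = 1, significance level 0.005 (see Table A1)

stat = chi_square(n_ab=30, n_a=120, n_b=400, n_total=100_000)
print(round(stat, 1), stat > CRITICAL)  # large statistic -> words are related
```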
Here, if a relation word is itself an ambiguous word, it causes a problem when calculating the chi-square statistic. One solution is to exclude ambiguous words from the related words. However, this would be risky since it could, in the worst case, remove all related words, and it would not help reduce the data deficiency problem. In this study, therefore, assuming that the frequencies of the meanings of an ambiguous word are equal, the frequency assigned to each meaning of an ambiguous word with $n$ meanings is calculated as $f/n$, where $f$ is the observed frequency of the word form.

Table 1 shows an analysis of the relationship between an ambiguous word and the co-occurrence words in a local context for the sentence 'Sagwa han gairul megeotda' ('I ate an apple'). Based on Table 1, various methods that can distinguish the meaning of the ambiguous word 'sagwa' can be devised. The simplest method is to select the meaning with the largest number of related words in the local context using the chi-square test of independence. A null hypothesis is set stating that the co-occurrence of the two words is unrelated, and the hypothesis is tested through the independence test. If the null hypothesis is rejected, the alternative hypothesis is accepted, concluding that the two words are related to each other.

Null hypothesis: the two words $(s_i, w_j)$ are not related to each other (independent).
Alternative hypothesis: the two words $(s_i, w_j)$ are related to each other (dependent).

If the chi-square statistic is above a critical value, the null hypothesis is rejected, concluding that the two words are related to each other. In the chi-square distribution table (Table A1), the critical value is 7.88 for one degree of freedom at a significance level of 0.005. Table 2 shows the number of semantically related words of the ambiguous word according to the chi-square test of independence. In Table 2, the number of words related to 'Sagwa1' is three, while the number of words related to 'Sagwa2' is one. Thus, in the sentence 'I ate an apple', the ambiguous word 'Sagwa' should be resolved to 'Sagwa1', which has more related words in the local context.

However, this method has several problems. First, when the numbers of related words are the same, there is no way to resolve the lexical ambiguity. Table 3 shows an analysis of the relationship between the ambiguous word and the co-occurrence words in a local context for the sentence 'Naneun sagwareul badatda' ('I received an apology'). As shown in Table 3, the two meanings have the same number of related words, namely one. Thus, there must be another way to distinguish the meaning of an ambiguous word beyond simply counting related words. Second, although each word's degree of semantic relatedness differs, all words above the critical chi-square value are treated alike. That is, some co-occurrence words in the local context should carry more weight in determining the meaning of the ambiguous word, but this method cannot reflect that. For example, in Table 3, both co-occurrence words, 'Na' and 'Batda', have chi-square statistics above 7.88, but 'Batda' has the closer relationship with 'Sagwa1'.
Generally, the larger the chi-square statistic, the stronger the relationship between the two words. Methods of applying the chi-square statistic therefore include using the sum, the average, and the multiplication of weights of the chi-square statistics. The multiplication of weights is a factor expressing the influence on a meaning as the ratio of the chi-square statistics, under the assumption that the total influence of all the co-occurrence words on the ambiguous word is one.
As shown in Table 4, the sum, multiplication, average, and multiplication of weights of the chi-square statistics all indicate the correct answer for this example. Among them, the multiplication of weights showed the best performance in the experimental results. Due to the nature of the chi-square statistic, if the frequency of a specific co-occurrence word is far above a certain threshold, its chi-square statistic also becomes very large. Consequently, using the sum, multiplication, or average of the chi-square statistics can yield an incorrect result in which a single word decides the outcome. Thus, normalization of the chi-square statistic to between 0 and 1 is required; the weight serves as this normalization in this study. The following formula defines word sense disambiguation using the co-occurrence words of semantically related words in the local context and the weights of the chi-square values.
$\hat{s} = \operatorname{argmax}_{s_i} \prod_{w_j \in C} \frac{\chi^2(s_i, w_j)}{\sum_{w_k \in C} \chi^2(s_i, w_k)}$

where $C$ is the set of co-occurrence words in the local context. To prevent the value of the above formula from becoming zero (or undefined) because an observed frequency was zero, the frequencies of unobserved data were estimated using Good-Turing frequency estimation. In addition, performance differs depending on which relation words and which relationships in the Korean Lexical Semantic Network are used. In this study, the relationships that can be used for word sense disambiguation are divided into five types, and the calculation varies their weights. The five relationships are: ① coordinate term (s), ② hyponym (c), ③ hypernym (p), ④ hyponym of coordinate term (sc), and ⑤ coordinate term of hypernym (ps). In Section 4, the weights $\lambda$ of the relation words are determined through experiments (see Equation (11)). Furthermore, the data deficiency problem is mitigated by normalizing words using their part-of-speech information. Table 5 shows the normalized expressions and examples of such words.
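A minimal sketch of this decision rule follows. The helper pooled_chi2 and the way the relation-type weights enter the pooling are modeling assumptions for illustration; the small additive floor stands in for the Good-Turing estimation used in the paper.

```python
# Sketch of the sense decision rule: multiply, over the co-occurrence words
# in the window, each sense's normalized share of chi-square evidence.
RELATION_WEIGHTS = {"s": 0.5, "c": 0.4, "p": 0.0, "sc": 0.1, "ps": 0.0}

def pooled_chi2(sense_relations, word, chi2):
    """Pool chi-square evidence for one sense over its relation words,
    scaled by the relation-type weights (Equation (11))."""
    return sum(
        RELATION_WEIGHTS[rel] * chi2(rel_word, word)
        for rel, rel_words in sense_relations.items()
        for rel_word in rel_words
    )

def disambiguate(sense_to_relations, context_words, chi2):
    """Return the sense maximizing the product of normalized weights."""
    scores = {}
    for sense, relations in sense_to_relations.items():
        raw = {w: pooled_chi2(relations, w, chi2) for w in context_words}
        total = sum(raw.values()) or 1.0
        score = 1.0
        for w in context_words:
            # normalized weight: this word's share of the chi-square evidence
            # for the sense, so one very frequent word cannot dominate
            score *= (raw[w] + 1e-9) / total  # floor replaces Good-Turing here
        scores[sense] = score
    return max(scores, key=scores.get)
```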

Expansion of Semantically Related Words to the Ambiguous Word
In Section 3.1, the semantic relation words of an ambiguous word were expanded within the hierarchical structure of the Korean Lexical Semantic Network. However, a lack of statistical information due to the remaining data deficiency problem can still prevent finding semantically related co-occurrence words. One reason is an insufficient number of relation words. For example, 'shinjang', in its sense of kidney, is a lowest-level hyponym in the Korean Lexical Semantic Network; thus, it has no hyponyms and only two coordinate terms, 'Kongpat (kidney bean)' and 'Bulggotsepo (flame cell)'. Even using all five relationships of Section 3.1, only 13 related words can be found. To solve this data deficiency problem, the words related to an ambiguous word must be expanded.
In this paper, a set of semantically related words of an ambiguous word is created using the chi-square statistic introduced in Section 3.1. Related words are words in a collocation, that is, in a semantic co-occurrence relationship, with the ambiguous word, and they are a significant clue for determining its correct meaning. First, the collocation words of the ambiguous word are found using the chi-square statistic computed over the Sejong morph-tagged corpus. Then, the set of semantically related words is created by using the chi-square test of independence to determine which meaning of the ambiguous word each collocation word is associated with. Table 6 shows part of the set of semantically related words of the ambiguous word 'Noon'. Using these semantically related words, word sense ambiguity can be removed as shown in Section 3.1: not only the relationship between the semantically related words and the co-occurrence words in the local context, but also the appearance of the semantically related words themselves in the local context, can be used. Word sense ambiguity is removed using the semantic determination formula in Section 3.1. Figure 4 shows the expression of the relationship analysis between words using related words.

Moreover, we attempted to further overcome the data deficiency problem by expanding the coordinate terms of the most highly related words among the related words of an ambiguous word. In Section 4, we discuss, through an experiment, how far the related words need to be expanded. Word sense ambiguity is then removed using the coordinate terms of the semantically related words of the ambiguous word in the same two ways as above. Figure 5 shows the expression of the relationship analysis between words using the coordinate terms of the semantically related words of an ambiguous word.
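The two-step construction of the related word set described above can be sketched as follows. The dictionary shapes and helper names are assumptions for illustration, building on the chi-square helper sketched in Section 3.1.

```python
# Sketch of building the semantically related word set of an ambiguous word:
# step 1 keeps only collocates of the word form, step 2 assigns each
# collocate to the sense(s) it is dependent on via the independence test.
CRITICAL = 7.88  # df = 1, significance level 0.005

def build_related_set(senses, collocate_stats, sense_stats):
    """collocate_stats: {word: chi2 with the ambiguous word form}
    sense_stats: {(sense, word): chi2 via the sense's relation words}"""
    related = {s: set() for s in senses}
    for word, stat in collocate_stats.items():
        if stat <= CRITICAL:
            continue  # not a collocation of the ambiguous word at all
        for s in senses:
            if sense_stats.get((s, word), 0.0) > CRITICAL:
                related[s].add(word)  # word is evidence for this sense
    return related
```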
In supervised disambiguation, a disambiguation corpus, in which every appearance of the ambiguous word is classified into its meaning in that specific context, is used as the learning data. The naïve Bayesian classifier is a statistical method widely applied in natural language processing and lexical disambiguation. It identifies the meaning using the words adjacent to the ambiguous word in a large-scale context; adjacent words provide useful information for identifying the meaning, so statistical inference can be applied using their co-occurrence frequencies. The naïve Bayesian classifier uses the Bayesian decision rule to minimize error probability when determining the class.

Assuming that $w_1, \cdots, w_n$ are the words used as contextual features in a context where the ambiguous word appears in a corpus, the decision rule of the naïve Bayesian classifier that resolves the word sense ambiguity based on the contextual features is as follows:

$\hat{s} = \operatorname{argmax}_{s_i} P(w_1, \cdots, w_n \mid s_i)\, P(s_i)$

In the above formula, $P(w_1, \cdots, w_n \mid s_i)$ and $P(s_i)$ are calculated by maximum-likelihood estimation from the sense-tagged learning corpus. Here, $P(w_1, \cdots, w_n \mid s_i)$ is the likelihood and $P(s_i)$ is the prior probability. Generally, the reason probability models such as the naïve Bayesian classifier perform well in supervised disambiguation is the large influence of the prior probability, that is, the semantic occurrence probability of words. Most ambiguous words have two or more meanings, but only one or two of them are actually used frequently in daily life. Thus, knowing a word's semantic prior probability in advance significantly increases lexical disambiguation performance.
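The following sketch shows this decision rule with illustrative probabilities; real values would be maximum-likelihood estimates from a sense-tagged corpus, and the log-space computation and unseen-word floor are standard practicalities, not details from the paper.

```python
# Sketch of the naive Bayes decision rule for supervised WSD:
# argmax over senses of log P(s) + sum of log P(w | s).
import math

def naive_bayes_sense(senses, context_words, prior, likelihood):
    best, best_log = None, float("-inf")
    for s in senses:
        log_p = math.log(prior[s])
        for w in context_words:
            # unseen (word, sense) pairs get a small floor probability
            log_p += math.log(likelihood.get((w, s), 1e-6))
        if log_p > best_log:
            best, best_log = s, log_p
    return best

# Illustrative numbers: the apology sense is more frequent overall (prior),
# but 'mukda' (eat) is far more likely next to the apple sense.
prior = {"sagwa (apology)": 0.7, "sagwa (apple)": 0.3}
likelihood = {("mukda", "sagwa (apple)"): 0.05,
              ("mukda", "sagwa (apology)"): 1e-4}
print(naive_bayes_sense(list(prior), ["mukda"], prior, likelihood))
# -> "sagwa (apple)": the likelihood outweighs the prior here
```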
Moreover, in this paper, in order to obtain the same effect as using prior information in supervised disambiguation, the semantically related words of the ambiguous word are obtained and utilized as prior information. Using this prior information, word sense ambiguity can be resolved even in cases where no words strongly related to a specific meaning are found in the local context, or where semantically related words cannot be found due to the lack of statistical information caused by data deficiency.
In this paper, the semantic prior probability of an ambiguous word is calculated using the weights of the semantically related words of the ambiguous word, as indicated in the following formula. The prior probability of meaning $s_i$ of an ambiguous word is assumed to be the ratio of the frequency of the related words of meaning $s_i$ to the total frequency of the related words of all meanings:

$P(s_i) = \frac{\sum_{r \in R(s_i)} freq(r)}{\sum_{k} \sum_{r \in R(s_k)} freq(r)}$ (9)

where $R(s_i)$ is the set of semantically related words of meaning $s_i$.
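A minimal sketch of Equation (9) follows, with illustrative related words and corpus frequencies; the romanized words and the numbers are assumptions, not values from the paper.

```python
# Sketch of the proposed prior estimation: the prior of a sense is its
# related words' share of the total corpus frequency mass.
def related_word_prior(related, freq):
    """related: {sense: [related words]}; freq: corpus frequency per word."""
    mass = {s: sum(freq.get(w, 0) for w in words)
            for s, words in related.items()}
    total = sum(mass.values()) or 1.0
    return {s: m / total for s, m in mass.items()}

related = {"sagwa (apology)": ["yongseo", "sajoe"],  # forgiveness, apology
           "sagwa (apple)": ["gwail", "boksunga"]}   # fruit, peach
freq = {"yongseo": 180, "sajoe": 90, "gwail": 400, "boksunga": 130}
print(related_word_prior(related, freq))
# -> {'sagwa (apology)': 0.3375, 'sagwa (apple)': 0.6625}
```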

Experiment Environment
In this paper, the Sejong morph-tagged corpus (approximately 5 M word phrases), a deliverable of the 21st Century Sejong Project, was used to extract statistical information. Nouns, adjectives, and verbs were extracted from the corpus, and the co-occurrence frequencies of all word pairs were compiled into a dictionary.
In order to compare the lexical disambiguation method proposed in this paper with other studies, experiments were conducted using the Korean learning data of SENSEVAL-2. SENSEVAL is a contest for word sense disambiguation technology held under the sponsorship of ACL SIGLEX and EURALEX; it has been held every three years since 1998. Two Korean teams participated in the second contest. The target words in the SENSEVAL-2 Korean learning data were 'mal', 'noon', 'son', 'baram', 'geori', 'jari', 'euisa', 'mok', 'jeom', and 'bam'. The detailed data composition can be found in Appendix A.
The evaluation measure for the lexical disambiguation methods in this paper was accuracy, obtained as follows:

Accuracy (%) = (number of ambiguous words whose meanings were correctly distinguished / total number of ambiguous words) × 100 (10)

Experiment Method
The context window size determines which co-occurrence words in the local context of the ambiguous word are used for lexical disambiguation. The window size refers to the number of words considered on each side of the ambiguous word. As the window size grew, accuracy initially increased rapidly and then stabilized. In this paper, considering the size of the statistical dictionary, five was selected as the basic window size.
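For illustration, a window of size five collects up to five words on each side of the ambiguous word; the romanized example sentence below is an assumption for the sketch.

```python
# Sketch of extracting co-occurrence words from a local context window.
def context_window(tokens, target_index, size=5):
    """Up to `size` words on each side of the ambiguous word."""
    left = tokens[max(0, target_index - size):target_index]
    right = tokens[target_index + 1:target_index + 1 + size]
    return left + right

tokens = "naneun eoje sijang eseo san sagwa han gairul masitge megeotda".split()
print(context_window(tokens, tokens.index("sagwa")))
# -> all nine surrounding tokens except 'sagwa' itself
```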
Performance also depends on which relation words of the Korean Lexical Semantic Network are used. In this study, the relationships usable for word sense disambiguation were divided into the five types shown in Section 3.1, and the calculation varied their weights. The weight of the coordinate terms was fixed at 1.0 while the weights of the other relation words were varied in the experiment.
As shown in Figure 6, with the weight of the coordinate term fixed at 1.0, the best accuracy was obtained when the weight of the hyponym was 0.8 and the weight of the hyponym of the coordinate term was 0.2. Furthermore, accuracy was higher when the hypernym and the coordinate term of the hypernym were not expanded. In this study, the weights of the relation words were therefore set as follows.
$(\lambda_s, \lambda_c, \lambda_p, \lambda_{sc}, \lambda_{ps}) = (0.5, 0.4, 0, 0.1, 0)$ (11)

(These are the ratios 1.0:0.8:0.2 of Figure 6, normalized to sum to one.) In addition, when the coordinate terms of the semantically related words of an ambiguous word were expanded, the range of related words to be expanded was varied in the experiment. Figure 7 shows the change in accuracy according to the expansion range of the coordinate terms of the semantically related words of an ambiguous word. As shown in Figure 7, it is more effective for word sense disambiguation to expand only the words that are highly related, rather than expanding the collocation coordinate terms of all the semantically related words. In this study, only the coordinate terms of the collocation words in the top 25% of the most semantically related words of an ambiguous word were expanded.
To evaluate the performance of the algorithm proposed in this paper, a method of determining the meaning by the most frequent class (MFC) was used as the performance baseline. In addition, the performance of the basic algorithm and of the newly improved algorithm was compared.
The basic algorithm is the one previously used in the lexical disambiguation system at Busan University, which performs the calculation using the semantic coordinate terms of the ambiguous word. The improved method for lexical disambiguation in this paper addresses the data deficiency problem as follows: ① a weight is adjusted according to the type of semantically related word of an ambiguous word, so that more relation-word information can be used than in existing methods; ② the semantically related words of an ambiguous word and the coordinate terms of the related words are expanded, so that more information can be used than in existing methods; and ③ words such as numerals and proper nouns are normalized using their part-of-speech information. Table 7 shows a comparison of the performance of the basic and improved algorithms. The average accuracy of MFC was 78.29%, while the accuracy of the proposed algorithm was 88.11%. Improvement ① was first applied to the basic algorithm; then improvements ① and ②; and finally improvements ①~③. Accuracy improved by 5.08%, 8.01%, and 9.90%, respectively. The proposed method showed better accuracy than MFC for most ambiguous words. However, MFC had notably high accuracy for words whose sense distribution in the evaluation corpus was heavily skewed toward one sense, such as 'baram' and 'mok'. In particular, the accuracy for 'mal' was the lowest. This was because 'mal' expressing 'grain' or 'unit of quantity of liquid' appeared more frequently than 'mal' meaning 'means to express people's thoughts and feelings', which is the sense widely used in general. Table 8 shows the ratios of the meanings of 'mal' in the SENSEVAL-2 Korean learning data.

Analysis of Effect of the Prior Probability Estimation
In previous studies, a method of using prior information was developed as follows: lexical disambiguation is performed on a raw corpus using a basic unsupervised model (hereafter referred to as the primary model), and prior knowledge extracted from that result is then applied to the primary model (hereafter referred to as the secondary model). Figure 8 shows this process. In order to compare with the proposed prior probability estimation, an experiment was conducted using the prior probability estimation method shown in Figure 8: ① using statistical information extracted from the learning corpus, a primary model was constructed (using related words and relation words); ② using the primary model, lexical disambiguation was conducted on the learning corpus; ③ prior knowledge was extracted from the lexical disambiguation result; ④ a secondary model was constructed using the primary model and the extracted prior knowledge; and ⑤ lexical disambiguation was conducted on the evaluation corpus using the secondary model. Table 9 shows the performance of lexical disambiguation when the prior probability estimated by the method in Figure 8 and the prior probability proposed in this study were used. As shown in Table 9, prior knowledge estimated from the semantically related words of the ambiguous word contributed more to lexical disambiguation than prior knowledge extracted from the learning corpus tagging results. This is because the accuracy of the primary model was only 83.49% on average, so the secondary model could not be constructed from accurate prior knowledge.
To determine whether the proposed method shows the same performance in other languages, we conducted an experiment with English. For English, the English WordNet was used instead of the Korean Lexical Semantic Network. We evaluated our method using SemCor [30], an English corpus with semantically annotated texts. The semantic analysis was done manually with WordNet 1.6 senses (SemCor version 1.6) and later automatically mapped to WordNet 3.0 senses (SemCor version 3.0). The SemCor corpus consists of 352 texts from the Brown Corpus. Table 10 shows the performance of our model and that of existing models on SemCor. The existing models compared are fine-tuned models for word sense disambiguation based on deep learning language models. All three models were fine-tuned on 80% of SemCor starting from the base model and then evaluated on the remaining 20%. As can be seen in Table 10, our proposed method showed almost the same or slightly lower performance than the existing models even though it does not use supervised learning. In particular, in the evaluation results for SE13, our model showed the best performance. This is because, in the case of SE13, the amount of training data is very small compared with the other tasks. It can be seen that deep learning-based models perform better when the training data for a target word are sufficient, but our proposed method performs better when the training data are insufficient.

Practical Experiment with the Proposed System
As explained earlier, lexical disambiguation can be utilized as a preprocessing step in various natural language processing applications such as information retrieval or machine translation. To increase its acceptance as a preprocessing system, it is necessary to improve disambiguation performance while reducing processing time and minimizing the required storage space. In particular, calculating the chi-square statistics and semantic prior probabilities takes significant time. In this study, a dictionary of chi-square statistics between words and a prior probability dictionary were constructed beforehand, so that chi-square statistics and related words are obtained through lookup, thereby minimizing the processing time of lexical disambiguation.
To search the chi-square statistics, block-unit indexes were created over the chi-square information. A target block is located using a word-pair key and loaded from file into memory, and the chi-square statistic of the target word pair is then fetched using a binary search within the block. The prior probability information is connected directly to the word index, so it is fetched as soon as a word-pair key is searched. Figure 9 shows this search method for the chi-square statistics and the prior probability information.
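The lookup can be sketched as follows; the in-memory layout and the zero default for unseen pairs are assumptions, since the paper only specifies block loading plus binary search over word-pair keys.

```python
# Sketch of the memory-based chi-square lookup: statistics are kept sorted
# by (word1, word2) key and fetched with binary search once a block is loaded.
import bisect

class ChiSquareIndex:
    def __init__(self, pairs):
        # pairs: ((word1, word2), statistic); sorted on disk in the real
        # system, sorted here for the sketch
        items = sorted(pairs)
        self.keys = [k for k, _ in items]
        self.values = [v for _, v in items]

    def lookup(self, w1, w2):
        """Binary search for the chi-square statistic of a word pair."""
        i = bisect.bisect_left(self.keys, (w1, w2))
        if i < len(self.keys) and self.keys[i] == (w1, w2):
            return self.values[i]
        return 0.0  # unseen pair

index = ChiSquareIndex([(("sagwa", "mukda"), 1825.4),
                        (("sagwa", "batda"), 412.9)])
print(index.lookup("sagwa", "mukda"))  # -> 1825.4
```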
For a practical experiment on the lexical disambiguation technology based on the large-scale chi-square statistics and prior probabilities, processing speed was measured. Based on the most frequently appearing words in the Sejong semantic-tagged corpus, 200 ambiguous words were extracted and tested over 10,000 sentences to analyze the processing speed and performance of lexical disambiguation. The average number of meanings of the ambiguous words was 5.7.
As shown in Figure 10, the execution time was 350 s without the memory-based search method but only 22 s with it; that is, an average of about 450 ambiguous words was resolved per second. The average accuracy of the lexical disambiguation was 86.3%, about 4% lower than with the SENSEVAL-2 data. This was because the average number of meanings was larger than that of the ambiguous words in the SENSEVAL-2 data. Figure 10 also shows the semantic analysis accuracy distribution over the 200 ambiguous words: accuracy above 90% was achieved for 67 ambiguous words (31% of the total), and accuracy of 85~90% for 89 ambiguous words.

Conclusions and Future Research
This paper proposed a novel unsupervised disambiguation method that showed better performance than existing knowledge-based and unsupervised lexical disambiguation methods without the need for a large sense-tagged corpus.
Since the related words in the Korean Lexical Semantic Network share semantic characteristics, the meaning of an ambiguous word could be distinguished by determining the relationship between the semantic relation words of the ambiguous word and the co-occurrence words in a local context. Moreover, the performance of the lexical disambiguation method was improved by using more relation-word information than existing methods: weights were adjusted according to the type of semantic relation word, and the semantically related words of the ambiguous word and the coordinate terms of those related words were expanded. Finally, numerals and proper nouns were normalized using part-of-speech information to mitigate the data deficiency problem, and the semantically related words of an ambiguous word were obtained and used as prior information in order to obtain the same effect as using prior information in supervised disambiguation.
The contributions of this study are as follows. First, lexical disambiguation was conducted using statistical information without a sense-tagged corpus by utilizing KorLex, a Korean lexical semantic network. Second, better performance than existing knowledge-based lexical disambiguation methods was achieved using only minimal information: the frequency of a single word, co-occurrence frequencies, and part-of-speech information.
Future research will first include evaluating additional ambiguous words using other evaluation data to further increase the reliability of the system. Second, a study on preprocessing, such as selectional constraints, will be conducted for analyses that cannot be solved by statistical information due to the data deficiency problem.

Conflicts of Interest:
The authors declare no conflict of interest.