A Survey on Portuguese Lexical Knowledge Bases: Contents, Comparison and Combination †
Abstract
1. Introduction
2. Related Work
3. Open Portuguese LKBs
- Two synset-based thesauri: TeP [18] and OpenThesaurus.PT (http://paginas.fe.up.pt/~arocha/AED1/0607/trabalhos/thesaurus.txt (January 2018)) (OT.PT);
- Three lexical-semantic networks extracted from Portuguese dictionaries: PAPEL [19], relations extracted from Dicionário Aberto (DA) [20], and relations extracted from Wiktionary.PT (http://pt.wiktionary.org (2015 dump));
- Semantic relations available in Port4Nooj [21], a set of linguistic resources;
- Semantic relations between Portuguese words in the ConceptNet [22] semantic network, which includes common-sense knowledge, lexical knowledge and others.
4. Redundancy in Portuguese LKBs
5. Comparing Portuguese LKBs Indirectly
5.1. Selecting the Most Similar Word from a Small Set
5.2. Computing the Similarity between Word Pairs
- PageRank vectors, inspired by Pilehvar et al. [30]. For each word of a pair, Personalized PageRank was first run on the target LKB for 30 iterations, using the word as context; a vector was then created holding, in each position, the resulting rank of every other word of the LKB. Finally, the similarity between the two vectors was computed with either the Jaccard coefficient between the sets of words in the vectors (PR-Jac) or the cosine of the vectors (PR-CosV). Given the large vector sizes, vectors were trimmed to the N top-ranked words, with different sizes N tested, from 50 to 3200.
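The PageRank-vector procedure can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the damping factor of 0.85, the treatment of the LKB as an undirected graph, the alphabetical tie-breaking, and the toy triples (including the `planta` edge) are assumptions made here for illustration, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def build_graph(triples):
    """Build undirected adjacency sets from (word1, relation, word2) triples."""
    graph = defaultdict(set)
    for w1, _relation, w2 in triples:
        graph[w1].add(w2)
        graph[w2].add(w1)
    return graph


def personalized_pagerank(graph, seed, iterations=30, damping=0.85):
    """Power iteration with all teleport mass on `seed` (damping is an assumption)."""
    if seed not in graph:
        return {}
    rank = {word: 0.0 for word in graph}
    rank[seed] = 1.0
    for _ in range(iterations):
        nxt = {word: 0.0 for word in graph}
        for word, score in rank.items():
            if score == 0.0:
                continue
            share = damping * score / len(graph[word])
            for neighbour in graph[word]:
                nxt[neighbour] += share
        nxt[seed] += 1.0 - damping  # random jumps always return to the context word
        rank = nxt
    return rank


def top_n_vector(rank, seed, n):
    """Keep the n best-ranked words other than the seed itself."""
    others = [(w, s) for w, s in rank.items() if w != seed and s > 0.0]
    others.sort(key=lambda item: (-item[1], item[0]))
    return dict(others[:n])


def jaccard(vec_a, vec_b):
    """PR-Jac: overlap between the sets of words kept in each vector."""
    a, b = set(vec_a), set(vec_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(vec_a, vec_b):
    """PR-CosV: cosine between the two sparse rank vectors."""
    dot = sum(score * vec_b.get(word, 0.0) for word, score in vec_a.items())
    norm_a = math.sqrt(sum(s * s for s in vec_a.values()))
    norm_b = math.sqrt(sum(s * s for s in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pr_similarity(graph, word_a, word_b, n=50, measure=cosine):
    """Similarity of two words from their trimmed Personalized PageRank vectors."""
    vec_a = top_n_vector(personalized_pagerank(graph, word_a), word_a, n)
    vec_b = top_n_vector(personalized_pagerank(graph, word_b), word_b, n)
    return measure(vec_a, vec_b)


# Toy LKB; the triples are illustrative only.
toy_lkb = build_graph([
    ("árvore", "hypernymOf", "carvalho"),
    ("árvore", "hypernymOf", "faia"),
    ("planta", "hypernymOf", "árvore"),
    ("punir", "synonymOf", "castigar"),
])

print(pr_similarity(toy_lkb, "carvalho", "faia", measure=jaccard))
print(pr_similarity(toy_lkb, "carvalho", "faia", measure=cosine))
print(pr_similarity(toy_lkb, "carvalho", "punir", measure=cosine))
```

In this toy graph, `carvalho` and `faia` share most of their neighbourhood and therefore obtain a positive score under both measures, whereas `carvalho` and `punir` lie in disconnected components and score zero. Trimming with a small `n` mainly affects PR-Jac, since it changes which words enter the compared sets.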
5.3. Answering Cloze Questions
5.4. Textual Similarity and Entailment
6. Conclusions
Conflicts of Interest
References
- Fellbaum, C. (Ed.) WordNet: An Electronic Lexical Database (Language, Speech, and Communication); The MIT Press: Cambridge, MA, USA, 1998.
- Marrafa, P. Portuguese WordNet: General architecture and internal semantic relations. DELTA 2002, 18, 131–146.
- Gonçalo Oliveira, H. Comparing and Combining Portuguese Lexical-Semantic Knowledge Bases. In Proceedings of 6th Symposium on Languages, Applications and Technologies (SLATE 2017); Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, OASICS: Kobe, Japan, 2017; Volume 56, pp. 16:1–16:15.
- De Paiva, V.; Real, L.; Gonçalo Oliveira, H.; Rademaker, A.; Freitas, C.; Simões, A. An overview of Portuguese Wordnets. In Proceedings of the 8th Global WordNet Conference (GWC’16), Bucharest, Romania, 27–30 January 2016; pp. 74–81.
- Magnini, B.; Cavaglià, G. Integrating Subject Field Codes into WordNet. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC 2000), Athens, Greece, 31 May–2 June 2000; ELRA: Paris, France, 2000; pp. 1413–1418.
- Shi, L.; Mihalcea, R. Putting Pieces Together: Combining FrameNet, VerbNet and WordNet for Robust Semantic Parsing. In Proceedings of Computational Linguistics and Intelligent Text Processing (CICLing’05); Lecture Notes in Computer Science; Springer: Berlin, Germany, 2005; Volume 3406, pp. 100–111.
- Gurevych, I.; Eckle-Kohler, J.; Hartmann, S.; Matuschek, M.; Meyer, C.M.; Wirth, C. UBY—A Large-Scale Unified Lexical-Semantic Resource. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon, France, 23–27 April 2012; ACL Press: Avignon, France, 2012; pp. 580–590.
- Vossen, P. EuroWordNet: A multilingual database for information retrieval. In Proceedings of the DELOS Workshop on Cross-Language Information Retrieval, Zurich, Switzerland, 5–7 March 1997.
- Pianta, E.; Bentivogli, L.; Girardi, C. MultiWordNet: Developing an aligned multilingual database. In Proceedings of the 1st International Conference on Global WordNet (GWC 2002), Mysore, India, 21–25 January 2002.
- Bond, F.; Foster, R. Linking and Extending an Open Multilingual Wordnet. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 4–9 August 2013; ACL Press: Sofia, Bulgaria, 2013; pp. 1352–1362.
- Gonzalez-Agirre, A.; Laparra, E.; Rigau, G. Multilingual Central Repository version 3.0. In Proceedings of the 8th International Conference on Language Resources and Evaluation (ELRA), Istanbul, Turkey, 21–27 May 2012; pp. 2525–2529.
- De Melo, G.; Weikum, G. Towards a Universal Wordnet by Learning from Combined Evidence. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM 2009), Hong Kong, China, 2–6 November 2009; ACM: New York, NY, USA, 2009; pp. 513–522.
- Navigli, R.; Ponzetto, S.P. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artif. Intell. 2012, 193, 217–250.
- Downey, D.; Etzioni, O.; Soderland, S. A Probabilistic Model of Redundancy in Information Extraction. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), Edinburgh, Scotland, 30 July–5 August 2005; pp. 1034–1041.
- Dias-da-Silva, B.C. Wordnet.Br: An exercise of human language technology research. In Proceedings of the 3rd International WordNet Conference (GWC), Jeju Island, Korea, 22–26 January 2006; pp. 301–303.
- De Paiva, V.; Rademaker, A.; de Melo, G. OpenWordNet-PT: An Open Brazilian WordNet for Reasoning. In Proceedings of the 24th International Conference on Computational Linguistics (COLING), Mumbai, India, 8–15 December 2012.
- Simões, A.; Guinovart, X.G. Bootstrapping a Portuguese WordNet from Galician, Spanish and English Wordnets. In Advances in Speech and Language Technologies for Iberian Languages, Proceedings of the 2nd International Conference on IberSPEECH 2014, Las Palmas de Gran Canaria, Spain, 19–22 November 2014; Lecture Notes in Computer Science; Springer: Berlin, Germany, 2014; Volume 8854, pp. 239–248.
- Maziero, E.G.; Pardo, T.A.S.; Felippo, A.D.; Dias-da-Silva, B.C. A Base de Dados Lexical e a Interface Web do TeP 2.0—Thesaurus Eletrônico para o Português do Brasil. In Proceedings of the Companion Proceedings of the XIV Brazilian Symposium on Multimedia and the Web, Vila Velha, Brazil, 26–29 October 2008; ACM: New York, NY, USA, 2008; pp. 390–392.
- Gonçalo Oliveira, H.; Santos, D.; Gomes, P.; Seco, N. PAPEL: A Dictionary-Based Lexical Ontology for Portuguese. In Proceedings of 8th International Conference on Computational Processing of the Portuguese Language (PROPOR 2008); Lecture Notes in Computer Science; Springer: Berlin, Germany, 2008; Volume 5190, pp. 31–40.
- Simões, A.; Sanromán, Á.I.; Almeida, J.J. Dicionário-Aberto: A Source of Resources for the Portuguese Language Processing. In Proceedings of 10th International Conference on Computational Processing of the Portuguese Language (PROPOR 2012); Lecture Notes in Computer Science; Springer: Berlin, Germany, 2012; Volume 7243, pp. 121–127.
- Barreiro, A. Port4NooJ: An open source, ontology-driven Portuguese linguistic system with applications in machine translation. In Proceedings of the 2008 International NooJ Conference (NooJ’08), Budapest, Hungary, 8–10 June 2008; Cambridge Scholars Publishing: Newcastle upon Tyne, UK, 2010.
- Speer, R.; Chin, J.; Havasi, C. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 4444–4451.
- Santos, D.; Bick, E. Providing Internet access to Portuguese corpora: The AC/DC project. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC 2000), Athens, Greece, 31 May–2 June 2000; pp. 205–210.
- Gonçalo Oliveira, H.; Pérez, L.A.; Costa, H.; Gomes, P. Uma rede léxico-semântica de grandes dimensões para o português, extraída a partir de dicionários electrónicos. Linguamática 2011, 3, 23–38.
- Wilkens, R.; Zilio, L.; Ferreira, E.; Villavicencio, A. B2SG: A TOEFL-like Task for Portuguese. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, 23–28 May 2016; ELRA: Paris, France, 2016.
- Freitag, D.; Blume, M.; Byrnes, J.; Chow, E.; Kapadia, S.; Rohwer, R.; Wang, Z. New Experiments in Distributional Representations of Synonymy. In Proceedings of the 9th Conference on Computational Natural Language Learning (CONLL ’05), Ann Arbor, MI, USA, 29–30 June 2005; ACL Press: Stroudsburg, PA, USA, 2005; pp. 25–32.
- Agirre, E.; Soroa, A. Personalizing PageRank for word sense disambiguation. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL’09), Athens, Greece, 30 March–3 April 2009; ACL Press: Stroudsburg, PA, USA, 2009; pp. 33–41.
- Hill, F.; Reichart, R.; Korhonen, A. Simlex-999: Evaluating Semantic Models with Genuine Similarity Estimation. Comput. Linguist. 2015, 41, 665–695.
- Querido, A.; Carvalho, R.; Rodrigues, J.; Garcia, M.; Silva, J.; Correia, C.; Rendeiro, N.; Pereira, R.; Campos, M.; Branco, A. LX-LR4DistSemEval: A collection of language resources for the evaluation of distributional semantic models of Portuguese. Rev. Assoc. Port. Linguíst. 2017, 265–283.
- Pilehvar, M.T.; Jurgens, D.; Navigli, R. Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, Bulgaria, 4–9 August 2013; Volume 1, pp. 1341–1351.
- Banjade, R.; Maharjan, N.; Niraula, N.B.; Rus, V.; Gautam, D. Lemon and Tea Are Not Similar: Measuring Word-to-Word Similarity by Combining Different Methods. In Proceedings of 16th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2015); Lecture Notes in Computer Science; Springer: Berlin, Germany, 2015; Volume 9041, Part I, pp. 335–346.
- Correia, R.; Baptista, J.; Eskenazi, M.; Mamede, N. Automatic generation of cloze question stems. In Proceedings of 10th International Conference on Computational Processing of the Portuguese Language (PROPOR 2012); Lecture Notes in Computer Science; Springer: Berlin, Germany, 2012; Volume 7243, pp. 168–178.
- Gonçalo Oliveira, H.; Coelho, I.; Gomes, P. Exploiting Portuguese Lexical Knowledge Bases for Answering Open Domain Cloze Questions Automatically. In Proceedings of the 9th Language Resources and Evaluation Conference (LREC 2014), Reykjavik, Iceland, 26–31 May 2014; ELRA: Paris, France, 2014.
- Fonseca, E.R.; dos Santos, L.B.; Criscuolo, M.; Aluísio, S.M. Visão Geral da Avaliação de Similaridade Semântica e Inferência Textual. Linguamática 2016, 8, 3–13.
- Gonçalo Oliveira, H.; Alves, A.O.; Rodrigues, R. Gradually Improving the Computation of Semantic Textual Similarity in Portuguese. In Progress in Artificial Intelligence, Proceedings of the 18th EPIA Conference on Artificial Intelligence, Porto, Portugal, 5–8 September 2017; Lecture Notes in Computer Science; Springer: Berlin, Germany, 2017; Volume 10423, pp. 841–854.
- Gonçalo Oliveira, H. Unsupervised Approaches for Computing Word Similarity in Portuguese. In Progress in Artificial Intelligence, Proceedings of the 18th Portuguese Conference on Artificial Intelligence (EPIA 2017), Porto, Portugal, 5–8 September 2017; Springer: Berlin, Germany, 2017.
- Gonçalo Oliveira, H. CONTO.PT: Groundwork for the Automatic Creation of a Fuzzy Portuguese Wordnet. In Proceedings of 12th International Conference on Computational Processing of the Portuguese Language (PROPOR 2016); Lecture Notes in Computer Science; Springer: Berlin, Germany, 2016; Volume 9727, pp. 283–295.
Relation | PAPEL, DA, Wikt.PT | TeP | OT.PT | OWN.PT | PULO | WN.Br | Port4Nooj | ConceptNet |
---|---|---|---|---|---|---|---|---|
Synonymy | SINONIMO_[N\|V\|ADJ\|ADV]_DE | same synset | same synset | same synset | same synset | same synset | É SINÓNIMO DE | Synonym |
Antonymy | ANTONIMO_[N\|V\|ADJ\|ADV]_DE | synset connections | – | antonymOf | near_antonym | – | – | Antonym |
DistinctFrom | ||||||||
Hypernymy | HIPERONIMO_DE | – | – | hypernymOf | has_hyponym | hypernymOf | É_HIPÓNIMO_DE | IsA |
DefinedAs | ||||||||
Part | PARTE_DE | – | – | partHolonymOf | has_holo_part | – | – | PartOf |
PARTE_DE_ALGO_COM_PROPRIEDADE | – | – | entails | – | – | – | ||
PROPRIEDADE_DE_ALGO_PARTE_DE | ||||||||
Member | MEMBRO_DE | – | – | memberHolonymOf | has_holo_member | – | ||
MEMBRO_DE_ALGO_COM_PROPRIEDADE | ||||||||
PROPRIEDADE_DE_ALGO_MEMBRO_DE | ||||||||
Material | MATERIAL_DE | – | – | substanceHolonymOf | has_holo_madeof | – | – | – |
Contains | CONTIDO_EM | – | – | – | – | – | – | |
CONTIDO_EM_ALGO_COM_PROPRIEDADE | – | – | – | – | ||||
Cause | CAUSADOR_DE | – | – | causes | causes | – | Causes | |
ACCAO_QUE_CAUSA | ||||||||
CAUSADOR_DA_ACCAO | É RESULTADO DE | |||||||
CAUSADOR_DE_ALGO_COM_PROPRIEDADE | É ACÇÃO DE | |||||||
PROPRIEDADE_DE_ALGO_QUE_CAUSA | ||||||||
Producer | PRODUTOR_DE | – | – | – | – | – | – | – |
PRODUTOR_DE_ALGO_COM_PROPRIEDADE | ||||||||
PROPRIEDADE_DE_ALGO_PRODUTOR_DE | ||||||||
Purpose | FINALIDADE_DE | – | – | – | – | – | – | UsedFor |
FAZ_SE_COM | ||||||||
FINALIDADE_DA_ACCAO | ||||||||
FAZ_SE_COM_ALGO_COM_PROPRIEDADE | ||||||||
FINALIDADE_DE_ALGO_COM_PROPRIEDADE | ||||||||
Property | DIZ_SE_SOBRE | – | – | similarTo | related_to | – | – | RelatedTo |
DIZ_SE_DO_QUE | attributeOf | |||||||
State | TEM_ESTADO | – | – | be_in_state | – | – | ||
DEVIDO_A_ESTADO | ||||||||
Quality | TEM_QUALIDADE | – | – | – | – | – | – | – |
DEVIDO_A_QUALIDADE | ||||||||
Manner | MANEIRA_POR_MEIO_DE | – | – | – | – | – | – | – |
MANEIRA_COM_PROPRIEDADE | ||||||||
Place | LOCAL_ORIGEM_DE | – | – | – | – | – | – | AtLocation |

Lexical Items

POS | PAPEL | DA | Wikt.PT | TeP | OT.PT | OWN.PT | PULO | WN.Br | Port4Nooj | ConceptNet |
---|---|---|---|---|---|---|---|---|---|---|
Nouns | 56,660 | 61,334 | 30,170 | 17,244 | 6110 | 32,509 | 7372 | 0 | 8109 | 9225 |
Verbs | 21,585 | 16,429 | 8918 | 8343 | 2856 | 3626 | 2721 | 5857 | 3161 | 12,718 |
Adjectives | 22,561 | 18,892 | 9536 | 14,979 | 3747 | 4401 | 2742 | 0 | 1055 | 214 |
Adverbs | 1376 | 3160 | 610 | 1138 | 143 | 1120 | 312 | 0 | 475 | 295 |
Distinct | 94,165 | 95,188 | 45,345 | 40,499 | 12,782 | 40,940 | 12,135 | 5857 | 12,641 | 40,778 * |

Relations

Type | PAPEL | DA | Wikt.PT | TeP | OT.PT | OWN.PT | PULO | WN.Br | Port4Nooj | ConceptNet |
---|---|---|---|---|---|---|---|---|---|---|
Synonymy | 83,432 | 52,278 | 35,330 | 388,698 | 51,410 | 35,597 | 69,618 | 88,488 | 559 | 30,834 |
Antonymy | 388 | 440 | 1263 | 92,234 | – | 5774 | 8816 | – | – | 1651 |
Hypernymy | 49,210 | 46,079 | 22,931 | – | – | 78,854 | 55,053 | 73,302 | 15,303 | 11,627 |
Part | 5491 | 4367 | 1574 | – | – | 14,275 | 2025 | – | – | 169 |
Member | 6585 | 1057 | 1578 | – | – | 5153 | 357 | – | – | – |
Material | 336 | 518 | 192 | – | – | 958 | 88 | – | – | – |
Contains | 391 | 263 | 120 | – | – | – | – | – | – | – |
Cause | 7700 | 7211 | 3278 | – | – | 295 | 847 | – | 3325 | 281 |
Producer | 1336 | 913 | 500 | – | – | – | – | – | – | – |
Purpose | 9144 | 5220 | 4227 | – | – | – | – | – | 303 | 16,021 |
Property | 23,354 | 15,732 | 7020 | – | – | 10,825 | 17,213 | – | – | 2672 |
State | 394 | 237 | 79 | – | – | – | 889 | – | – | – |
Quality | 1636 | 1221 | 381 | – | – | – | – | – | – | – |
Manner | 1268 | 3381 | 439 | – | – | – | – | – | 850 | – |
Place | 832 | 487 | 1159 | – | – | – | – | – | – | 17,246 |
Total | 191,497 | 139,404 | 80,071 | 480,932 | 51,410 | 151,731 | 154,906 | 161,790 | 20,340 | 132,862 * |
Avg. degree | 3.9 | 2.9 | 3.3 | 11.9 | 4.0 | 6.4 | 21.7 | 36.9 | 3.2 | 3.0 |
Relation | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Total |
---|---|---|---|---|---|---|---|---|---|---|
Synonymy | 276,113 | 68,983 | 20,068 | 8773 | 4194 | 2079 | 955 | 361 | 88 | 381,614 |
Antonymy | 51,179 | 1763 | 534 | 164 | 54 | 9 | 4 | – | – | 53,707 |
Hypernymy | 281,125 | 27,712 | 4339 | 584 | 89 | 2 | – | – | – | 313,851 |
Part | 23,431 | 1994 | 151 | 6 | 1 | – | – | – | – | 25,583 |
Member | 13,294 | 640 | 48 | 3 | – | – | – | – | – | 13,985 |
Material | 1756 | 159 | 6 | – | – | – | – | – | – | 1921 |
Contains | 635 | 65 | 3 | – | – | – | – | – | – | 703 |
Cause | 11,481 | 3127 | 1158 | 432 | – | – | – | – | – | 16,198 |
Producer | 2216 | 217 | 33 | – | – | – | – | – | – | 2466 |
Purpose | 31,771 | 1333 | 142 | 13 | – | – | – | – | – | 33,259 |
Property | 58,374 | 7569 | 870 | 146 | 22 | – | – | – | – | 66,981 |
State | 1424 | 77 | 7 | – | – | – | – | – | – | 1508 |
Quality | 1760 | 631 | 72 | – | – | – | – | – | – | 2463 |
Manner | 4274 | 683 | 98 | 1 | – | – | – | – | – | 5056 |
Place | 18,848 | 286 | 100 | 1 | – | – | – | – | – | 19,235 |
Total | 777,681 | 115,239 | 27,629 | 10,123 | 4360 | 2090 | 959 | 361 | 88 | 938,530 |
% of total | (82.9%) | (12.3%) | (2.9%) | (1.1%) | (0.5%) | (0.2%) | (0.1%) | (0.0%) | (0.0%) | |
LKB | Exclusive | +1 | +2 |
---|---|---|---|
PAPEL | 121,673 (63.5%) | 69,824 (36.5%) | 26,749 (14.0%) |
DA | 79,010 (56.7%) | 60,394 (43.3%) | 23,792 (17.1%) |
Wikt.PT | 50,881 (63.5%) | 29,190 (36.5%) | 15,418 (19.3%) |
TeP | 400,334 (83.0%) | 80,598 (16.7%) | 28,676 (6.0%) |
OT.PT | 36,019 (70.0%) | 15,391 (30.0%) | 10,718 (20.8%) |
OWN.PT | 129,377 (85.3%) | 22,354 (14.7%) | 7577 (5.0%) |
PULO | 136,223 (87.9%) | 18,683 (12.1%) | 6731 (4.3%) |
WN.Br | 114,616 (70.8%) | 47,174 (29.2%) | 12,320 (7.6%) |
Port4Nooj | 17,581 (86.4%) | 2759 (13.6%) | 1573 (7.7%) |
ConceptNet | 123,037 (92.6%) | 9826 (7.4%) | 6042 (4.5%) |
# | Examples of Relation Instances |
---|---|
9 | agarrar synonymOf pegar (grab, catch), apressar synonymOf acelerar (rush, hasten), punir synonymOf castigar (punish, discipline) |
8 | pedinte synonymOf mendigo (beggar, mendicant), vulgar synonymOf ordinário (vulgar, ordinary), porventura synonymOf talvez (perhaps, possibly) |
7 | fácil antonymOf difícil (easy, hard), legal antonymOf ilegal (legal, illegal) |
6 | árvore hypernymOf carvalho (tree, oak), árvore hypernymOf faia (tree, beech) |
5 | degrau partOf escada (step, stairs), mítico propertyOf mito (mythical, myth), tristeza antonymOf alegria (sadness, joy), somar antonymOf subtrair (add up, subtract) |
4 | alterar hypernymOf modificar (change, modify), investir causes investimento (invest, investment), feliz stateOf felicidade (happy, happiness), carta memberOf baralho (card, deck), fumar purposeOf charuto (smoke, cigar), habilmente mannerOf habilidade (ably, ability), dependente propertyOf depender (dependent, depend), Equador placeOf equatoriano (Ecuador, Ecuadorian) |
3 | impertinente qualityOf impertinência (impertinent, impertinence), vinho containedIn galheta (wine, cruet), coqueiro producerOf coco (coconut tree, coconut), fio materialOf meada (thread, hank), condução purposeOf cano (conduction, pipe), força partOf robusto (strength, robust) |
olorado synonymOf aromal (smelt, aromal?), economicamente synonymOf regradamente (economically, orderly), saltão synonymOf salta-paredes (locust, wall-jumper?), despropositado antonymOf razoável (inopportune, reasonable), em_definitivo antonymOf temporariamente (definitively, temporarily), crueza antonymOf clemência (crudeness, mercy), desgarrar antonymOf aprochegar (tear apart, approach?), despigmentado propertyOf perder_cor (depigmented?, lose_color), diluviano propertyOf aluvião (diluvial, alluvium), alfitomancia purposeOf farinha (alphitomancy, flour), cuidar_dos_pacientes purposeOf médico (take_care_of_the_patients, doctor), transformar hypernymOf descolorir (transform, decolor), atitude hypernymOf anticomunismo (attitude, anticommunism), coisa hasState clima (thing, climate), lugar-tenente hasQuality lugar-tenência (lieutenant, lieutenancy?), satanizar causes satanização (demonize, demonization), causar causes causa (to cause, cause), pressão causes depressão (pressure, depression), cobre containedIn hemocianina (copper, hemocyanin), Abissínia placeOf abissínio (Abyssinia, Abyssinian), parabolicamente mannerOf parábola (parabolically?, parable), imunoglobina materialOf plasma (immunoglobulin, plasma), pessoa memberOf lóbi (person, lobby), kibibyte partOf megabyte, caju producerOf castanha (cashew, chestnut) |
Redundancy | 1 (All) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | CARTÃO |
---|---|---|---|---|---|---|---|---|---|---|
Lexical items | 202,000 | 58,412 | 24,959 | 13,213 | 7495 | 4196 | 2042 | 761 | 168 | 149,818 |
Relation instances | 938,846 | 160,749 | 45,510 | 17,981 | 7858 | 3498 | 1408 | 449 | 88 | 327,405 |
| | Word Level | Sentence Level |
---|---|---|
Multiple choice | B2SG | Cloze questions |
Similarity score | SimLex-999 | ASSIN |
Relation | Target | Candidates | |||
---|---|---|---|---|---|
Synonym (noun) | concorrente | competidor * | cortina | amurada | carmesim |
Synonym (verb) | trancar | barrar | aviar | alienar | progredir |
Hypernym (noun) | matemática | ciência | célula | pulseira | libertação |
Hypernym (verb) | segar | ceifar | anexar | concentrar | desembrulhar |
Antonym (noun) | esquerda | direita | repressão | sétimo | diácono |
Antonym (verb) | trancar | abrir | praticar | dragar | empenhar |
POS | LKB | Synon (1171) In | Synon (1171) Guess | Hypern (758) In | Hypern (758) Guess | Anton (145) In | Anton (145) Guess |
---|---|---|---|---|---|---|---|
Nouns | PAPEL | 28.9% | 84.0% | 5.0% | 78.2% | 0.0% | 63.4% |
Nouns | DA | 16.5% | 71.7% | 4.6% | 66.1% | 0.0% | 59.3% |
Nouns | Wikt.PT | 16.6% | 66.2% | 5.0% | 67.9% | 8.3% | 74.5% |
Nouns | OWN-PT | 62.8% | 80.1% | 59.0% | 82.5% | 60.0% | 82.8% |
Nouns | PULO | 13.2% | 30.2% | 18.3% | 38.8% | 27.6% | 49.7% |
Nouns | TeP | 33.2% | 63.9% | 0.0% | 52.9% | 32.4% | 69.7% |
Nouns | OT.PT | 17.7% | 35.0% | 0.0% | 30.2% | 0.0% | 31.7% |
Nouns | Port4Nooj | 0.1% | 17.1% | 0.3% | 20.4% | 0.0% | 26.2% |
Nouns | WN.Br | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
Nouns | ConceptNet | 24.3% | 60.2% | 0.1% | 54.2% | 11.7% | 65.5% |
Nouns | CARTÃO | 36.8% | 89.0% | 10.4% | 86.0% | 8.3% | 79.3% |
Nouns | Redun3 | 33.2% | 70.2% | 5.3% | 61.6% | 20.0% | 75.2% |
Nouns | Redun2 | 50.4% | 89.3% | 20.2% | 85.5% | 41.4% | 86.9% |
Nouns | All | 81.5% | 99.0% | 64.9% | 95.6% | 71.0% | 97.2% |

POS | LKB | Synon (435) In | Synon (435) Guess | Hypern (198) In | Hypern (198) Guess | Anton (167) In | Anton (167) Guess |
---|---|---|---|---|---|---|---|
Verbs | PAPEL | 37.0% | 82.8% | 0.0% | 78.8% | 0.0% | 46.7% |
Verbs | DA | 24.8% | 74.0% | 0.0% | 71.7% | 0.0% | 37.7% |
Verbs | Wikt.PT | 18.9% | 60.9% | 0.0% | 55.1% | 3.6% | 52.7% |
Verbs | OWN-PT | 84.8% | 95.4% | 88.4% | 97.5% | 86.8% | 97.6% |
Verbs | PULO | 24.4% | 41.6% | 24.7% | 46.0% | 40.1% | 59.9% |
Verbs | TeP | 53.1% | 76.8% | 0.0% | 69.7% | 47.9% | 79.0% |
Verbs | OT.PT | 25.1% | 43.0% | 0.0% | 35.4% | 0.0% | 24.6% |
Verbs | Port4Nooj | 0.0% | 17.7% | 0.0% | 19.2% | 0.0% | 22.8% |
Verbs | WN.Br | 47.6% | 73.1% | 32.3% | 74.2% | 0.0% | 44.9% |
Verbs | ConceptNet | 32.0% | 62.6% | 5.1% | 54.0% | 18.6% | 70.1% |
Verbs | CARTÃO | 43.7% | 86.4% | 0.0% | 82.3% | 3.6% | 51.5% |
Verbs | Redun3 | 55.2% | 84.4% | 12.6% | 79.3% | 29.9% | 68.9% |
Verbs | Redun2 | 66.2% | 89.0% | 44.4% | 88.9% | 59.3% | 85.6% |
Verbs | All | 93.1% | 98.2% | 91.9% | 99.0% | 94.6% | 97.6% |
Word 1 | Word 2 | POS | Similarity |
---|---|---|---|
esperto (smart) | inteligente (intelligent) | A | 8.33 |
sujo (dirty) | estreito (narrow) | A | 0.00 |
esposa (wife) | marido (husband) | N | 5.00 |
livro (book) | texto (text) | N | 5.00 |
ir (go) | vir (come) | V | 3.33 |
levar (take) | roubar (steal) | V | 6.67 |
LKB | Relations | Algorithm | |
---|---|---|---|
PAPEL | All | PR-Jac | 0.49 |
DA | All | PR-Jac | 0.38 |
Wikt.PT | All | PR-Jac | 0.42 |
OWN-PT | Syn + Hyp | Adj-Cos | 0.44 |
PULO | Syn + Hyp | Adj-Cos | 0.29 |
TeP | Syn + Hyp | Adj-Jac | 0.36 |
OT.PT | Syn + Hyp | Adj-Cos | 0.34 |
Port4Nooj | All | Adj-Jac | 0.19 |
WN.Br | Syn + Hyper | Adj-Jac | 0.04 |
ConceptNet | Syn + Hyp | Adj-Jac | 0.43 |
CARTÃO | All | PR-CosV | 0.53 |
Redun3 | Syn + Hyper | Adj-Jac | 0.44 |
Redun2 | Syn + Hyper | PR-Jac | 0.49 |
All | Syn + Hyper | PR-CosV | 0.57 |
All | Syn + Hyper | PR-CosV | 0.59 |
All | Syn + Hyper | PR-CosV | 0.61 |
All | Syn + Hyper | PR-CosV | 0.61 |
All | Syn + Hyper | PR-CosV | 0.61 |
All | Syn + Hyper | PR-CosV | 0.60 |
All | Syn + Hyper | PR-CosV | 0.60 |
All | Syn + Hyper | Adj-Cos | 0.58 |
All | Syn + Hyper | Adj-Jac | 0.57 |
All | All | PR-CosV | 0.56 |
# | Sentence | Candidates |
---|---|---|
1 | A instalação de «superpostos» nas entradas e saídas dos grandes _________ urbanos levanta, por outro lado, algumas dúvidas à Anarec. (The installation of «overlays» at the entrances and exits of the major urban _________ raises some doubts to Anarec.) | centros (centers), mecanismos (mechanisms), inquéritos (surveys), indivíduos (individuals) |
2 | O artista _________ uma verdadeira obra de arte. (The artist _________ a real work of art.) | criou (created), emigrou (emigrated), requereu (required), atribuiu (attributed) |
LKB | Noun (1769) | Verb (1077) | Adj (809) | Adv (235) | Total (3890) |
---|---|---|---|---|---|
Baseline | 34.43% | 32.82% | 25.28% | 25.11% | 31.52% |
PAPEL | 44.19% | 36.63% | 33.47% | 22.13% | 38.53% |
DA | 39.49% | 32.87% | 30.01% | 24.36% | 34.77% |
Wikt.PT | 39.85% | 35.65% | 31.15% | 27.45% | 36.13% |
OpenWN-PT | 38.72% | 31.78% | 25.28% | 26.17% | 33.25% |
PULO | 40.77% | 31.43% | 22.16% | 23.19% | 33.25% |
TeP | 41.72% | 30.71% | 31.49% | 25.00% | 35.53% |
OpenThes.PT | 35.01% | 26.51% | 26.21% | 25.43% | 30.24% |
Port4Nooj | 37.11% | 26.86% | 27.97% | 29.89% | 31.93% |
WN.Br | 24.82% | 29.55% | 24.44% | 25.11% | 26.07% |
ConceptNet | 37.00% | 34.42% | 32.55% | 27.73% | 34.79% |
CARTÃO | 46.78% | 36.86% | 36.46% | 27.77% | 40.74% |
Redun3 | 40.54% | 32.61% | 28.83% | 27.70% | 35.13% |
Redun2 | 45.00% | 34.03% | 30.44% | 28.09% | 37.90% |
All | 49.90% | 33.05% | 34.98% | 26.81% | 40.72% |
Variant | Id | Pair | Sentence | Sim | Entailment |
---|---|---|---|---|---|
PTPT | 2675 | t | O Chelsea só conseguiu reagir no final da primeira parte. (Chelsea were only able to react at the end of the first half.) | 1.25 | None |
PTPT | 2675 | h | Não podemos aceitar outra primeira parte como essa. (We can not accept another first half like this.) | | |
PTBR | 319 | t | Cerca de 10% da Grande Muralha da China já desapareceu. (About 10% of the Great Wall of China has disappeared.) | 2.50 | None |
PTBR | 319 | h | Em 2006, a China estabeleceu regulamentos para a proteção da Grande Muralha. (In 2006, China established regulations for the protection of the Great Wall.) | | |
PTPT | 315 | t | Todos que ficaram feridos e os mortos foram levados ao hospital. (All the wounded and the dead were taken to the hospital.) | 3.00 | None |
PTPT | 315 | h | Além disso, mais de 180 pessoas ficaram feridas. (In addition, more than 180 people were injured.) | | |
PTBR | 2982 | t | Maldonado disse ainda que cerca de 125 casas foram afetadas pelo deslizamento. (Maldonado also said that about 125 homes were affected by the landslide.) | 4.00 | Entailment |
PTBR | 2982 | h | Segundo Maldonado, mais de 100 casas podem ter sido atingidas. (According to Maldonado, more than 100 houses may have been hit.) | | |
PTBR | 1282 | t | As multas previstas nos contratos podem atingir, juntas, 23 milhões de reais. (The penalties set in the contracts may amount to R$ 23 million.) | 5.00 | Paraphrase |
PTBR | 1282 | h | Somadas, as multas previstas nos contratos podem chegar a R$ 23 milhões. (All added up, the penalties set in the contracts may reach R$ 23 million.) | | |
Config | PTPT Entailment Acc | PTPT Entailment F1 | PTPT Similarity Pearson | PTPT Similarity MSE | PTBR Entailment Acc | PTBR Entailment F1 | PTBR Similarity Pearson | PTBR Similarity MSE |
---|---|---|---|---|---|---|---|---|
Baseline (cosine) | 74.10% | 0.43 | 0.66 | 0.66 | 78.60% | 0.43 | 0.65 | 0.445 |
Best PTPT | 83.85% | 0.70 | 0.73 | 0.61 | – | – | – | – |
Best sim PTBR | – | – | 0.70 | 0.66 | – | – | 0.70 | 0.38 |
Best entail PTBR | 77.60% | 0.61 | 0.64 | 0.72 | 81.65% | 0.52 | 0.64 | 0.45 |
PAPEL | 74.30% | 0.45 | 0.67 | 0.70 | 78.25% | 0.45 | 0.66 | 0.44 |
DA | 74.10% | 0.44 | 0.67 | 0.69 | 78.50% | 0.44 | 0.66 | 0.43 |
Wikt.PT | 74.00% | 0.44 | 0.67 | 0.68 | 77.55% | 0.43 | 0.66 | 0.43 |
OWN-PT | 73.80% | 0.45 | 0.67 | 0.71 | 77.30% | 0.43 | 0.66 | 0.43 |
PULO | 74.00% | 0.45 | 0.66 | 0.74 | 76.80% | 0.45 | 0.66 | 0.45 |
TeP | 74.55% | 0.47 | 0.67 | 0.71 | 77.90% | 0.47 | 0.67 | 0.45 |
OT.PT | 74.05% | 0.44 | 0.67 | 0.68 | 78.40% | 0.44 | 0.66 | 0.43 |
Port4Nooj | 73.85% | 0.43 | 0.66 | 0.68 | 78.10% | 0.43 | 0.66 | 0.44 |
WN.Br | 74.20% | 0.45 | 0.66 | 0.71 | 77.50% | 0.44 | 0.66 | 0.45 |
ConceptNet | 74.35% | 0.45 | 0.67 | 0.73 | 77.80% | 0.45 | 0.65 | 0.47 |
Redun3 | 74.80% | 0.47 | 0.67 | 0.73 | 78.00% | 0.46 | 0.67 | 0.46 |
Redun2 | 74.15% | 0.47 | 0.67 | 0.73 | 77.55% | 0.48 | 0.66 | 0.44 |
All | 72.95% | 0.47 | 0.66 | 0.86 | 76.00% | 0.48 | 0.65 | 0.46 |
© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Gonçalo Oliveira, H. A Survey on Portuguese Lexical Knowledge Bases: Contents, Comparison and Combination. Information 2018, 9, 34. https://doi.org/10.3390/info9020034