ULYSSES: Automated FreqUentLY ASked QueStions for KnowlEdge GraphS
Abstract
1. Introduction
- To select the relevant information from large Knowledge Graphs (KGs), ULYSSES does not process the KG itself; instead, it exploits query logs, which are available to KG curators through the SPARQL endpoints of the corresponding KGs.
- ULYSSES identifies the most frequent SPARQL queries in the logs and uses the corresponding SPARQL endpoints to retrieve their answers.
- Then, to transform both the queries and their answers into text, it uses transformer-based large language models (LLMs), namely ChatGPT and Gemini, with appropriate prompts.
- We evaluate our approach on the DBpedia KG and report the results achieved by ULYSSES.
- As a side effect, we also create and release to the community the first golden standard dataset for this task, constructed through a user study.
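The answer-retrieval step described above sends each selected query to the KG's SPARQL endpoint. The following is a minimal sketch using only Python's standard library. It assumes the endpoint follows the W3C SPARQL 1.1 Protocol (query passed in the `query` parameter of an HTTP GET) and returns the standard SPARQL JSON results format. The DBpedia endpoint URL and the helper names `build_request`, `parse_bindings`, and `run_query` are illustrative assumptions, not part of ULYSSES.

```python
import json
import urllib.parse
import urllib.request

# Illustrative endpoint (assumption); any SPARQL 1.1 Protocol endpoint should work.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build an HTTP GET request that carries the query in the `query` parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into a list of {variable: value} rows."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Unbound variables are absent from a binding, so they are skipped.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list[dict[str, str]]:
    """Send a SELECT query to the endpoint and return its answer rows."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return parse_bindings(json.load(resp))
```

Calling `run_query(DBPEDIA_ENDPOINT, "SELECT ...")` requires network access. Keeping the parsing in a separate function allows it to be checked offline against a saved JSON response.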
2. Related Work
2.1. Semantic Summaries
2.2. Question Answering over KGs
2.3. FAQ Generation
3. Methodology
3.1. Preliminaries
3.2. The Problem
3.3. The FAQGen Algorithm
Algorithm 1 FAQGen(Q, , n)
Input: Q – the query log; – the SPARQL endpoint of the corresponding KG; n – the number of question–answer pairs to include in the resulting FAQ
Output: – a set of question–answer pairs
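Only the signature of FAQGen survives here, so the body below is a hypothetical reconstruction based on the Introduction: count how often each query occurs in the log, keep the n most frequent, answer them through the endpoint, and verbalize each query–answer pair. The `answer` and `verbalize` callables are assumptions that stand in for the SPARQL endpoint call and the LLM prompt. The whitespace normalization is a simplification, not the paper's actual preprocessing.

```python
from collections import Counter
from typing import Callable

Rows = list[dict[str, str]]


def normalize(query: str) -> str:
    """Collapse whitespace so trivially different copies of a query count together."""
    return " ".join(query.split())


def faq_gen(
    log: list[str],
    answer: Callable[[str], Rows],
    verbalize: Callable[[str, Rows], tuple[str, str]],
    n: int,
) -> list[tuple[str, str]]:
    """Return up to n (question, answer) pairs built from the most frequent queries."""
    counts = Counter(normalize(q) for q in log)
    faq = []
    # most_common(n) orders by descending frequency; ties keep first-seen order.
    for query, _freq in counts.most_common(n):
        rows = answer(query)                # e.g. executed on the SPARQL endpoint
        faq.append(verbalize(query, rows))  # e.g. an LLM call with a prompt
    return faq
```

Passing the endpoint and the LLM as callables keeps the selection logic testable without network access and lets either ChatGPT or Gemini be plugged in as `verbalize`.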
4. Implementation
4.1. The Data Layer
4.2. The Service Layer
4.2.1. Query Preprocessing and Cleaning
4.2.2. Query Selection and Answering
4.2.3. Transformation to Text
- ChatGPT 3.5: ChatGPT is a language model developed by OpenAI that can understand and generate human-like text across a wide range of topics and styles. It is built on OpenAI's GPT-3.5 models, which use a deep transformer architecture; OpenAI has not officially disclosed their parameter count, which is often reported as 175 billion, the size of GPT-3.
- Gemini 1.5 Pro: Gemini is a more recent family of multimodal models from Google, trained jointly on text, images, audio, and video, which gives it broader capabilities than text-only language models [30].
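The prompts used with the two models are not reproduced in this section. The following hypothetical prompt builder shows the general shape of such a prompt: it serializes a query and up to `max_rows` answer rows into a single string that could be sent to either model. The instruction wording and the `max_rows` cap are assumptions, not the prompts used by ULYSSES.

```python
def build_prompt(query: str, rows: list[dict[str, str]], max_rows: int = 20) -> str:
    """Assemble a hypothetical prompt asking an LLM to verbalize a query and its answer."""
    lines = [
        "Rewrite the following SPARQL query as a natural-language question,",
        "then summarize its results as a short natural-language answer.",
        "",
        "SPARQL query:",
        query,
        "",
        "Results:",
    ]
    # Truncate long result sets so the prompt stays within the model's context window.
    for row in rows[:max_rows]:
        lines.append("; ".join(f"{var} = {val}" for var, val in row.items()))
    if len(rows) > max_rows:
        lines.append(f"... ({len(rows) - max_rows} more row(s) omitted)")
    return "\n".join(lines)
```

Because the prompt is plain text, the same string can be sent to either model, which keeps the comparison between ChatGPT and Gemini focused on the models rather than on differing inputs.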
4.3. The GUI Layer
5. Experimental Evaluation
5.1. Datasets
5.2. Baselines
5.3. Evaluation Task
5.4. Golden Standard Construction
5.5. Metrics
5.6. Results
5.7. Interesting Observations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Cebiric, S.; Goasdoué, F.; Kondylakis, H.; Kotzinos, D.; Manolescu, I.; Troullinou, G.; Zneika, M. Summarizing semantic graphs: A survey. VLDB J. 2019, 28, 295–327.
- de Oliveira, E.C.C.; da Silva, A.S.; de Moura, E.S.; Cavalcanti, J.M.B. Extracting and Searching Useful Information Available on Web FAQs. In Proceedings of the XXI Simpósio Brasileiro de Banco de Dados, Florianópolis, SC, Brasil, 16–20 October 2006; Anais/Proceedings. Nascimento, M.A., Ed.; UFSC: Florianópolis, Brasil, 2006; pp. 102–116.
- Trouli, G.E.; Papadakis, N.; Kondylakis, H. Constructing Semantic Summaries Using Embeddings. Information 2024, 15, 238.
- Vassiliou, G.; Papadakis, N.; Kondylakis, H. iSummary: Demonstrating Workload-based, Personalized Summaries for Knowledge Graphs. In Proceedings of the ISWC 2023 Posters and Demos: 22nd International Semantic Web Conference, Athens, Greece, 6–10 November 2023; Available online: https://ceur-ws.org/Vol-3632/ISWC2023_paper_435.pdf (accessed on 22 August 2024).
- Troullinou, G.; Kondylakis, H.; Stefanidis, K.; Plexousakis, D. Exploring RDFS KBs Using Summaries. In Proceedings of the Semantic Web—ISWC 2018—17th International Semantic Web Conference, Monterey, CA, USA, 8–12 October 2018; Proceedings, Part I. Vrandecic, D., Bontcheva, K., Suárez-Figueroa, M.C., Presutti, V., Celino, I., Sabou, M., Kaffee, L., Simperl, E., Eds.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2018; Volume 11136, pp. 268–284.
- Motta, E.; Mulholland, P.; Peroni, S.; d’Aquin, M.; Gómez-Pérez, J.M.; Mendez, V.; Zablith, F. A Novel Approach to Visualizing and Navigating Ontologies. In Proceedings of the Semantic Web—ISWC 2011—10th International Semantic Web Conference, Bonn, Germany, 23–27 October 2011; Proceedings, Part I. Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N.F., Blomqvist, E., Eds.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2011; Volume 7031, pp. 470–486.
- Zhang, X.; Cheng, G.; Ge, W.; Qu, Y. Summarizing Vocabularies in the Global Semantic Web. J. Comput. Sci. Technol. 2009, 24, 165–174.
- Vassiliou, G.; Alevizakis, F.; Papadakis, N.; Kondylakis, H. iSummary: Workload-Based, Personalized Summaries for Knowledge Graphs. In Proceedings of the Semantic Web—20th International Conference, ESWC 2023, Hersonissos, Greece, 28 May–1 June 2023; Proceedings. Pesquita, C., Jiménez-Ruiz, E., McCusker, J.P., Faria, D., Dragoni, M., Dimou, A., Troncy, R., Hertling, S., Eds.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2023; Volume 13870, pp. 192–208.
- Vassiliou, G.; Troullinou, G.; Papadakis, N.; Kondylakis, H. WBSum: Workload-based Summaries for RDF/S KBs. In Proceedings of the SSDBM 2021: 33rd International Conference on Scientific and Statistical Database Management, Tampa, FL, USA, 6–7 July 2021; Zhu, Q., Zhu, X., Tu, Y., Xu, Z., Kumar, A., Eds.; ACM: New York, NY, USA, 2021; pp. 248–252.
- Khan, A. Knowledge Graphs Querying. SIGMOD Rec. 2023, 52, 18–29.
- Diefenbach, D.; López, V.; Singh, K.D.; Maret, P. Core techniques of question answering systems over knowledge bases: A survey. Knowl. Inf. Syst. 2018, 55, 529–569.
- Formica, A.; Mele, I.; Taglino, F. A template-based approach for question answering over knowledge bases. Knowl. Inf. Syst. 2024, 66, 453–479.
- Lukovnikov, D.; Fischer, A.; Lehmann, J.; Auer, S. Neural Network-based Question Answering over Knowledge Graphs on Word and Character Level. In Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, 3–7 April 2017; Barrett, R., Cummings, R., Agichtein, E., Gabrilovich, E., Eds.; ACM: New York, NY, USA, 2017; pp. 1211–1220.
- Raazaghi, F. Auto-FAQ-Gen: Automatic Frequently Asked Questions Generation. In Proceedings of the Advances in Artificial Intelligence—28th Canadian Conference on Artificial Intelligence, Canadian AI 2015, Halifax, NS, Canada, 2–5 June 2015; Proceedings. Barbosa, D., Milios, E.E., Eds.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2015; Volume 9091, pp. 334–337.
- Du, X.; Cardie, C. Harvesting Paragraph-level Question-Answer Pairs from Wikipedia. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, 15–20 July 2018; Volume 1: Long Papers. Gurevych, I., Miyao, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 1907–1917.
- Willis, A.; Davis, G.M.; Ruan, S.; Manoharan, L.; Landay, J.A.; Brunskill, E. Key Phrase Extraction for Generating Educational Question-Answer Pairs. In Proceedings of the Sixth ACM Conference on Learning @ Scale, L@S 2019, Chicago, IL, USA, 24–25 June 2019; ACM: New York, NY, USA, 2019; pp. 20:1–20:10.
- Kumar, A.; Kharadi, A.; Singh, D.; Kumari, M. Automatic question-answer pair generation using Deep Learning. In Proceedings of the 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 2–4 September 2021; pp. 794–799.
- Shinoda, K.; Sugawara, S.; Aizawa, A. Improving the Robustness of QA Models to Challenge Sets with Variational Question-Answer Pair Generation. In Proceedings of the ACL-IJCNLP 2021 Student Research Workshop, ACL 2021, Online, 5–10 July 2021; Kabbara, J., Lin, H., Paullada, A., Vamvas, J., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 197–214.
- Hu, W.; Yu, D.; Jiau, H.C. A FAQ Finding Process in Open Source Project Forums. In Proceedings of the Fifth International Conference on Software Engineering Advances, ICSEA 2010, Nice, France, 22–27 August 2010; Hall, J.G., Kaindl, H., Lavazza, L., Buchgeher, G., Takaki, O., Eds.; IEEE Computer Society: Washington, DC, USA, 2010; pp. 259–264.
- Sindhgatta, R.; Marvaniya, S.; Dhamecha, T.I.; Sengupta, B. Inferring Frequently Asked Questions from Student Question Answering Forums. In Proceedings of the 10th International Conference on Educational Data Mining, EDM 2017, Wuhan, China, 25–28 June 2017; Hu, X., Barnes, T., Hershkovitz, A., Paquette, L., Eds.; International Educational Data Mining Society (IEDMS): Palermo, Italy, 2017.
- Bihani, A.; Ullman, J.D.; Paepcke, A. FAQtor: Automatic FAQ Generation Using Online Forums; Technical Report; Stanford InfoLab: Stanford, CA, USA, 2018.
- Zhao, H.; Liu, Y.; Hou, A.; Gu, J. Knowledge Graph based Question Pair Matching for Domain-Oriented FAQ System. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022, Prague, Czech Republic, 9–12 October 2022; IEEE: New York, NY, USA, 2022; pp. 2103–2108.
- Xie, R.; Lu, Y.; Lin, F.; Lin, L. FAQ-Based Question Answering via Knowledge Anchors. In Proceedings of the Natural Language Processing and Chinese Computing—9th CCF International Conference, NLPCC 2020, Zhengzhou, China, 14–18 October 2020; Proceedings, Part I. Zhu, X., Zhang, M., Hong, Y., He, R., Eds.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2020; Volume 12430, pp. 3–15.
- Liu, A.; Huang, Z.; Lu, H.; Wang, X.; Yuan, C. BB-KBQA: BERT-Based Knowledge Base Question Answering. In Proceedings of the Chinese Computational Linguistics—18th China National Conference, CCL 2019, Kunming, China, 18–20 October 2019; Proceedings. Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y., Eds.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2019; Volume 11856, pp. 81–92.
- Tseng, W.; Wu, C.; Hsu, Y.; Chen, B. FAQ Retrieval using Question-Aware Graph Convolutional Network and Contextualized Language Model. In Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021, Tokyo, Japan, 14–17 December 2021; IEEE: New York, NY, USA, 2021; pp. 2006–2012.
- W3C. Resource Description Framework. Available online: http://www.w3.org/RDF/ (accessed on 1 August 2024).
- W3C. Recommendation, SPARQL Query Language for RDF. Available online: https://www.w3.org/TR/rdf-sparql-query/ (accessed on 1 August 2024).
- Bonifati, A.; Martens, W.; Timm, T. An analytical study of large SPARQL query logs. VLDB J. 2020, 29, 655–679.
- Malyshev, S.; Krötzsch, M.; González, L.; Gonsior, J.; Bielefeldt, A. Getting the Most Out of Wikidata: Semantic Technology Usage in Wikipedia’s Knowledge Graph. In Proceedings of the Semantic Web—ISWC 2018—17th International Semantic Web Conference, Monterey, CA, USA, 8–12 October 2018; Proceedings, Part II. Vrandecic, D., Bontcheva, K., Suárez-Figueroa, M.C., Presutti, V., Celino, I., Sabou, M., Kaffee, L., Simperl, E., Eds.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2018; Volume 11137, pp. 376–394.
- Anil, R.; Borgeaud, S.; Wu, Y.; Alayrac, J.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; Millican, K.; et al. Gemini: A Family of Highly Capable Multimodal Models. arXiv 2023, arXiv:2312.11805.
- Etemad, A.G.; Abidi, A.I.; Chhabra, M. Fine-Tuned T5 for Abstractive Summarization. Int. J. Performability Eng. 2021, 17, 900–906.
- Venkataramana, A.; Srividya, K.; Cristin, R. Abstractive Text Summarization Using BART. In Proceedings of the 2022 IEEE 2nd Mysore Sub Section International Conference (MysuruCon), Mysuru, India, 16–17 October 2022; pp. 1–6.
- Lin, C.Y. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out; Association for Computational Linguistics: Barcelona, Spain, 2004; pp. 74–81.
- What is the ROUGE Score (Recall-Oriented Understudy for Gisting Evaluation)? Available online: https://klu.ai/glossary/rouge-score (accessed on 29 February 2024).
Relation label | Object label |
---|---|
Link from a Wikipage to another Wikipage | “Cahiers du Cinéma”@en |
Link from a Wikipage to another Wikipage | “Cape Cod”@en |
Link from a Wikipage to another Wikipage | “Cate Blanchett”@en |
Link from a Wikipage to another Wikipage | “Erotic mystery films”@en |
Link from a Wikipage to another Wikipage | “Baby Did a Bad, Bad Thing”@en |
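The object labels in the table above are RDF literals with a language tag (for example, “Cate Blanchett”@en). Before such answers are passed to a language model, it may help to strip the quotes and the tag. The following is a minimal sketch that assumes labels arrive as strings in this serialized form, with either straight or typographic quotes. The helper `split_literal` is hypothetical, and whether ULYSSES performs this step is not stated here.

```python
import re
from typing import Optional

# Quoted lexical form (straight or typographic quotes), then an optional @lang tag.
_LITERAL = re.compile(
    r'^["\u201c](?P<text>.*)["\u201d](?:@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*))?$'
)


def split_literal(label: str) -> tuple[str, Optional[str]]:
    """Return (lexical form, language tag or None); non-literals pass through unchanged."""
    match = _LITERAL.match(label.strip())
    if match is None:
        return label, None
    return match.group("text"), match.group("lang")
```

Relation labels such as "Link from a Wikipage to another Wikipage" are not quoted, so they are returned unchanged with no language tag.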
Query statistic | Count |
---|---|
Initial queries in the dataset file | 58,604 |
Unique queries | 42,870 |
DESCRIBE, CONSTRUCT, and ASK queries | 1371 |
Queries containing generic patterns | 34 |
Total queries excluded | 1405 |
Unique queries after the exclusion process | 41,465 |
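The counts in this table are consistent: 1371 + 34 = 1405 queries are excluded, and 42,870 − 1405 = 41,465 unique queries remain. The sketch below shows one way such statistics could be computed. It makes several assumptions. Uniqueness is taken as exact string identity. The query form is taken as the first keyword after any PREFIX/BASE declarations. A "generic pattern" is assumed to mean a triple pattern whose subject, predicate, and object are all variables (for example, `?s ?p ?o`). The section does not define that term, so the check is a hypothetical stand-in.

```python
import re

# Query forms that the cleaning step excludes (per the table above).
EXCLUDED_FORMS = {"DESCRIBE", "CONSTRUCT", "ASK"}

# Leading PREFIX/BASE declarations (the SPARQL prologue), case-insensitive.
_PROLOGUE = re.compile(
    r"^\s*(?:(?:PREFIX\s+\S*:\s*<[^>]*>|BASE\s+<[^>]*>)\s*)*", re.IGNORECASE
)
# Hypothetical "generic pattern": a triple pattern made only of variables,
# such as "?s ?p ?o ." or "?s ?p ?o }".
_GENERIC = re.compile(r"\?\w+\s+\?\w+\s+\?\w+\s*[.}]")


def query_form(query: str) -> str:
    """Return the first keyword after the prologue, upper-cased (e.g. SELECT)."""
    body = _PROLOGUE.sub("", query, count=1)
    match = re.match(r"[A-Za-z]+", body)
    return match.group(0).upper() if match else ""


def log_statistics(log: list[str]) -> dict[str, int]:
    """Compute the rows of the statistics table for a list of raw queries."""
    unique = set(log)  # exact string duplicates collapse here
    by_form = {q for q in unique if query_form(q) in EXCLUDED_FORMS}
    # Count generic patterns only among queries not already excluded by form,
    # so the two exclusion counts add up to the total, as in the table.
    generic = {q for q in unique - by_form if _GENERIC.search(q)}
    return {
        "initial": len(log),
        "unique": len(unique),
        "excluded_forms": len(by_form),
        "generic": len(generic),
        "remaining": len(unique) - len(by_form) - len(generic),
    }
```

Applied to the DBpedia log, this function would reproduce the table only if its definitions of uniqueness and generic patterns match those the authors used, which this section does not spell out.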
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Vassiliou, G.; Trouli, G.E.; Troullinou, G.; Spyridakis, N.; Bitzarakis, G.; Droumalia, F.; Karagiannakis, A.; Skouteli, G.; Oikonomou, N.; Deka, D.; et al. ULYSSES: Automated FreqUentLY ASked QueStions for KnowlEdge GraphS. Appl. Sci. 2024, 14, 7640. https://doi.org/10.3390/app14177640
Chicago/Turabian Style: Vassiliou, Giannis, Georgia Eirini Trouli, Georgia Troullinou, Nikolaos Spyridakis, George Bitzarakis, Fotini Droumalia, Antonis Karagiannakis, Georgia Skouteli, Nikolaos Oikonomou, Dimitra Deka, et al. 2024. "ULYSSES: Automated FreqUentLY ASked QueStions for KnowlEdge GraphS" Applied Sciences 14, no. 17: 7640. https://doi.org/10.3390/app14177640
APA Style: Vassiliou, G., Trouli, G. E., Troullinou, G., Spyridakis, N., Bitzarakis, G., Droumalia, F., Karagiannakis, A., Skouteli, G., Oikonomou, N., Deka, D., Makaronas, E., Pronoitis, G., Alexandris, K., Kostopoulos, S., Kazantzakis, Y., Vlassis, N., Sfinarolaki, E., Daskalakis, V., Giannakos, I., ... Kondylakis, H. (2024). ULYSSES: Automated FreqUentLY ASked QueStions for KnowlEdge GraphS. Applied Sciences, 14(17), 7640. https://doi.org/10.3390/app14177640