Context-Aware Tourism Recommendations Using Retrieval-Augmented Large Language Models and Semantic Re-Ranking
Abstract
1. Introduction
2. Background and Related Work
- Systems that integrate domain-specific textual resources into the retrieval process, such as a tourism recommendation engine, will provide more accurate and contextually relevant recommendations than systems relying solely on general-purpose knowledge.
- LLM-based recommendation systems that integrate external retrieval mechanisms and evaluation frameworks achieve higher accuracy and alignment with human evaluations compared to generative models based on static information.
- Different LLM architectures and adaptation strategies, such as prompt refinement and instruction tuning, achieve significantly different levels of recommendation quality, with tuned and domain-adapted models outperforming general-purpose LLMs.
3. Methodology
3.1. Data Preparation
- An embedded brochure knowledge base
- Representative user profiles
- Fixed weather context
3.2. Prompt Construction and Query Design
- Where can I have inventive vegan sweets for lunch on Lošinj?
- A summary of the user’s profile—e.g., “User is a vegan, loves plant-based desserts and enjoys outdoor activities such as hiking”
- A note on current weather conditions—e.g., “Weather: sunny, temperature 25 °C, wind 10 km/h”
- The relevant local information retrieved as knowledge snippets (chunks) from brochures
- “You are an intelligent tourism assistant. You recommend personalized activities to tourists based on (⋯)”
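As an illustration, the prompt components listed above can be assembled with a small helper. The function name, field contents, and brochure snippet below are illustrative stand-ins, not taken from the authors’ implementation:

```python
def build_prompt(query, profile, weather, snippets):
    """Assemble the final prompt from the user query, profile summary,
    weather context, and retrieved brochure snippets (illustrative)."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "You are an intelligent tourism assistant. "
        "You recommend personalized activities to tourists based on "
        "the user profile, weather, and local information below.\n\n"
        f"User profile: {profile}\n"
        f"Weather: {weather}\n"
        f"Local information:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_prompt(
    query="Where can I have inventive vegan sweets for lunch on Lošinj?",
    profile="Vegan, loves plant-based desserts, enjoys hiking",
    weather="sunny, 25 °C, wind 10 km/h",
    snippets=["A bakery in Mali Lošinj offering plant-based cakes (placeholder snippet)."],
)
```

Keeping each component on its own labeled line mirrors the paper’s design of clearly separating profile, weather, and knowledge-base content within the prompt.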
3.3. RAG Pipeline
- Semantic Retrieval: Given a user’s query and context, a semantic search is performed against the FAISS index of brochure embeddings, returning the top-k most relevant document chunks (e.g., descriptions of attractions, events, or services pertinent to the query). In the experiment, an initial set of 10 passages was retrieved with FAISS, and the top five were then selected by reranking with a secondary relevance model (in this case, the Cohere re-ranker) to balance relevance with prompt-length constraints. This reranking step reorders the retrieved candidates by contextual relevance, improving the precision of the passages that reach the prompt and filtering out less pertinent information before prompt construction.
- Contextual Prompt Assembly: The highest-ranked textual snippets from retrieval are concatenated with the user’s profile and weather context into a single prompt, as described above. Each snippet is typically cited or separated (e.g., with headings or quotes) to clearly outline knowledge base content. This gives the LLM explicit evidence to draw from, reducing the chance of hallucination and providing factual grounding in the tourism domain.
- LLM Generation: The assembled prompt is passed to the language model which generates a response in the form of recommended activities or itineraries. Because the model’s generative process is conditioned on the retrieved facts and context, the output is expected to remain relevant to the query, personalized to the user profile, and contextually appropriate (e.g., suggesting indoor activities on a rainy day). The system prompt is configured to instruct the model as follows:
- “Respond in a friendly and informative tone. Your output should be a short and clear activity recommendation, followed by a brief explanation if needed.”
This setup ensures that a conversational style is adopted, with each recommendation accompanied by an explanation and relevant context references. No additional fine-tuning was performed on the models; instead, retrieval augmentation was employed to inject domain-specific knowledge at query time, following the paradigm demonstrated in prior knowledge-intensive LLM applications [7].
- Post-processing: The generated outputs from each LLM are captured without additional fine-tuning, and only minimal post-processing is applied, limited to formatting corrections that preserve the authenticity of model behavior. This supports fair and transparent evaluation by retaining each model’s native response characteristics. Although stricter output constraints could be enforced in production settings, this study prioritizes the analysis of unaltered model capabilities. The RAG-based architecture ensures that all models receive the same context, combining accurate retrieval with consistent response generation. In other words, the system reduces hallucination risk and improves relevance in personalized tourism recommendations by grounding generation in brochure-derived knowledge.
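The two-stage retrieval described above can be sketched as follows. The toy inner-product search stands in for the FAISS index and the lexical-overlap score stands in for the Cohere re-ranker; all documents and embeddings are fabricated for illustration:

```python
def top_k(query_vec, doc_vecs, k):
    """First stage: dense retrieval by inner-product score, as FAISS
    performs over the brochure embeddings (toy stand-in)."""
    scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]

def rerank(query, docs, indices, top_n):
    """Second stage: reorder the retrieved candidates by contextual
    relevance (toy lexical overlap standing in for the Cohere re-ranker)."""
    q_words = set(query.lower().split())
    def score(i):
        return len(q_words & set(docs[i].lower().split()))
    return sorted(indices, key=score, reverse=True)[:top_n]

docs = [
    "Vegan bakery with inventive plant-based sweets in Mali Lošinj",
    "Boat rental and diving excursions along the coast",
    "Seaside café serving vegan cakes and lunch specials",
    "Evening concert programme at the town square",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.8, 0.2, 0.1], [0.0, 0.2, 0.9]]
query = "inventive vegan sweets for lunch"
query_vec = [1.0, 0.0, 0.1]

candidates = top_k(query_vec, doc_vecs, k=3)          # mirrors the FAISS top-10 step
shortlist = rerank(query, docs, candidates, top_n=2)  # mirrors the Cohere top-5 step
```

The shortlisted snippets are what reach the prompt-assembly stage, keeping the prompt short while prioritizing contextually relevant evidence.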
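The generation and post-processing steps can likewise be sketched. The request body below targets an Ollama-style /api/generate endpoint with a locally hosted model; the endpoint shape and the model tag are assumptions for illustration, since the paper does not publish its inference code, and the cleanup function reflects the formatting-only post-processing described above:

```python
import json
import re

SYSTEM_PROMPT = (
    "You are an intelligent tourism assistant. "
    "Respond in a friendly and informative tone. "
    "Your output should be a short and clear activity recommendation, "
    "followed by a brief explanation if needed."
)

def make_request(model, user_prompt):
    """Build a JSON request body for an Ollama-style /api/generate
    endpoint (illustrative; the paper's exact setup is not published)."""
    return json.dumps({
        "model": model,
        "system": SYSTEM_PROMPT,
        "prompt": user_prompt,
        "stream": False,
    })

def postprocess(text):
    """Minimal formatting-only cleanup: collapse runs of spaces/tabs
    and trim, leaving the model's wording untouched."""
    return re.sub(r"[ \t]+", " ", text).strip()

body = make_request("mistral:7b", "Where can I have vegan sweets on Lošinj?")
clean = postprocess("  Try the  seaside café.  \n")
```

Because post-processing never rewrites content, each model’s native response characteristics are preserved for evaluation.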
3.4. Model Inference Environment
3.5. Evaluation Metrics
- relevance (the degree to which the recommendation addressed the user’s query and contextual cues)
- personalization (the extent to which the output aligned with the simulated user’s preferences and constraints) and
- factual accuracy (the correctness of specific details included in the response).
3.6. Relevance Ratings
3.7. Cohen’s Kappa
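Cohen’s kappa quantifies inter-rater agreement beyond chance as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from the raters’ marginal label frequencies. A minimal implementation for two reviewers’ label sequences (the sample labels are illustrative):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in set(labels_a) | set(labels_b)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Two reviewers rating four responses as relevant (1) or not (0).
kappa = cohens_kappa([1, 1, 0, 1], [1, 0, 0, 1])
```

The same function extends to any label alphabet, so it also applies to pairwise agreement on the category labels in Section 3.9.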
3.8. Results and Discussion
3.9. Category Label Distribution per Model
- Across all models, Gastronomy, which includes food-related venues such as restaurants and cafés, was the most common recommendation category, with 420 labels per reviewer (1680 across all four reviewers). This aligns with the query set, which frequently involved food- or drink-based requests.
- In contrast, categories such as Cafeteria (25), Nature (21), Sport (15), and Culture (5) appeared only sporadically. The “Other” (16) label, which captures off-topic or hallucinated content, was used selectively but serves as a critical diagnostic signal.
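The per-category counts reported above can be tallied directly from the reviewers’ label lists; the data below is illustrative, not the study’s actual labels:

```python
from collections import Counter

# Each reviewer assigns one category label per model response (illustrative data).
reviewer_labels = [
    ["Gastronomy", "Gastronomy", "Nature", "Other"],
    ["Gastronomy", "Cafeteria", "Nature", "Gastronomy"],
]
# Pooled counts across all reviewers, as in the reported totals.
totals = Counter(label for labels in reviewer_labels for label in labels)
```

Comparing per-reviewer tallies against the pooled totals also surfaces labels such as “Other”, whose frequency serves as a diagnostic signal for off-topic or hallucinated content.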
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| FAISS | Facebook AI Similarity Search |
| LLM | Large Language Model |
| POI | Point of Interest |
| RAG | Retrieval-Augmented Generation |
References
- Nguyen, L.V. OurSCARA: Awareness-Based Recommendation Services for Sustainable Tourism. World 2024, 5, 471–482. [Google Scholar] [CrossRef]
- Jiang, S.; Song, H.; Lu, Y.; Zhang, Z. News Recommendation Method Based on Candidate-Aware Long- and Short-Term Preference Modeling. Appl. Sci. 2024, 15, 300. [Google Scholar] [CrossRef]
- Smajić, A.; Karlović, R.; Bobanović Dasko, M.; Lorencin, I. Large Language Models for Structured and Semi-Structured Data, Recommender Systems and Knowledge Base Engineering: A Survey of Recent Techniques and Architectures. Electronics 2025, 14, 3153. [Google Scholar] [CrossRef]
- Zheng, H.; Xu, Z.; Pan, Q.; Zhao, Z.; Kong, X. Plugging Small Models in Large Language Models for POI Recommendation in Smart Tourism. Algorithms 2025, 18, 376. [Google Scholar] [CrossRef]
- Arefieva, V.; Egger, R. TourBERT: A pretrained language model for the tourism industry. arXiv 2022, arXiv:2201.07449. [Google Scholar] [CrossRef]
- Liu, F.; Chen, J.; Yu, J.; Zhong, R. Next Point of Interest (POI) Recommendation System Driven by User Probabilistic Preferences and Temporal Regularities. Mathematics 2025, 13, 1232. [Google Scholar] [CrossRef]
- Wei, Q.; Yang, M.; Wang, J.; Mao, W.; Xu, J.; Ning, H. TourLLM: Enhancing LLMs with Tourism Knowledge. arXiv 2024, arXiv:2407.12791. [Google Scholar]
- Lee, Y.; Kim, S.; Rossi, R.A.; Yu, T.; Chen, X. Learning to Reduce: Towards Improving Performance of Large Language Models on Structured Data. arXiv 2024, arXiv:2407.02750. [Google Scholar] [CrossRef]
- Flórez, M.; Carrillo, E.; Mendes, F.; Carreño, J. A Context-Aware Tourism Recommender System Using a Hybrid Method Combining Deep Learning and Ontology-Based Knowledge. J. Theor. Appl. Electron. Commer. Res. 2025, 20, 194. [Google Scholar] [CrossRef]
- Song, S.; Yang, C.; Xu, L.; Shang, H.; Li, Z.; Chang, Y. TravelRAG: A Tourist Attraction Retrieval Framework Based on Multi-Layer Knowledge Graph. ISPRS Int. J. Geo-Inf. 2024, 13, 414. [Google Scholar] [CrossRef]
- Banerjee, A.; Satish, A.; Wörndl, W. Enhancing tourism recommender systems for sustainable city trips using retrieval-augmented generation. In Proceedings of the International Workshop on Recommender Systems for Sustainability and Social Good; Springer: Cham, Switzerland, 2024; pp. 19–34. [Google Scholar]
- Smajić, A.; Rovis, M.; Lorencin, I. Context-aware Product Recommendations Using Weather Data and AI Models. In Proceedings of the 7th International Conference on Human Systems Engineering and Design (IHSED 2025), Juraj Dobrila University of Pula, Pula, Croatia, 22–24 September 2025. [Google Scholar]
- Liu, D.; Yang, B.; Du, H.; Greene, D.; Hurley, N.; Lawlor, A.; Dong, R.; Li, I. RecPrompt: A Self-tuning Prompting Framework for News Recommendation Using Large Language Models. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; pp. 3902–3906. [Google Scholar]
- Wu, Z.; Jia, Q.; Wu, C.; Du, Z.; Wang, S.; Wang, Z.; Dong, Z. RecSys Arena: Pair-wise Recommender System Evaluation with Large Language Models. arXiv 2024, arXiv:2412.11068. [Google Scholar]
- Qi, J.; Yan, S.; Zhang, Y.; Zhang, W.; Jin, R.; Hu, Y.; Wang, K. RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization. In Proceedings of the 2024 7th International Conference on Artificial Intelligence and Pattern Recognition, Xiamen, China, 20–22 September 2024; pp. 1185–1192. [Google Scholar]
- Ahmed, B.S.; Baader, L.O.; Bayram, F.; Jagstedt, S.; Magnusson, P. Quality Assurance for LLM-RAG Systems: Empirical Insights from Tourism Application Testing. In Proceedings of the 2025 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), Naples, Italy, 31 March–4 April 2025; pp. 200–207. [Google Scholar]
- Meng, Z.; Yi, Z.; Ounis, I. KERAG_R: Knowledge-Enhanced Retrieval-Augmented Generation for Recommendation. arXiv 2025, arXiv:2507.05863. [Google Scholar]
- Xu, L.; Zhang, J.; Li, B.; Wang, J.; Chen, S.; Zhao, W.X.; Wen, J.R. Tapping the potential of large language models as recommender systems: A comprehensive framework and empirical analysis. ACM Trans. Knowl. Discov. Data 2025, 19, 105. [Google Scholar] [CrossRef]
- Zhang, J.; Xie, R.; Hou, Y.; Zhao, W.X.; Lin, L.; Wen, J.R. Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach. arXiv 2023, arXiv:2305.07001. [Google Scholar] [CrossRef]
- Ghiani, G.; Solazzo, G.; Elia, G. Integrating Large Language Models and Optimization in Semi-Structured Decision Making: Methodology and a Case Study. Algorithms 2024, 17, 582. [Google Scholar] [CrossRef]
- Kozhipuram, A.; Shailendra, S.; Kadel, R. Retrieval-Augmented Generation vs. Baseline LLMs: A Multi-Metric Evaluation for Knowledge-Intensive Content. Information 2025, 16, 766. [Google Scholar] [CrossRef]
- Liu, Q.; Zhu, J.; Fan, L.; Wang, K.; Hu, H.; Guo, W.; Liu, Y.; Wu, X.M. Can LLMs Outshine Conventional Recommenders? A Comparative Evaluation. In Proceedings of the Thirty-Ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, San Diego, CA, USA, 2–7 December 2025. [Google Scholar]
- Johnson, J.; Douze, M.; Jégou, H. Billion-Scale Similarity Search with GPUs. IEEE Trans. Big Data 2021, 7, 535–547. [Google Scholar] [CrossRef]
- Karlović, R.; Lorencin, I. Large language models as Retail Cart Assistants: A Prompt-Based Evaluation. In Human Systems Engineering and Design (IHSED2025): Future Trends and Applications; AHFE Open Access: Pula, Croatia, 2025. [Google Scholar]
- Carvalho, I.; Ivanov, S. ChatGPT for tourism: Applications, benefits, and risks. Tour. Rev. 2024, 79, 290–303. [Google Scholar] [CrossRef]
- Liu, Z.; Zhao, C.; Fedorov, I.; Soran, B.; Choudhary, D.; Krishnamoorthi, R.; Chandra, V.; Tian, Y.; Blankevoort, T. SpinQuant: LLM Quantization with Learned Rotations. arXiv 2024, arXiv:2405.16406. [Google Scholar] [CrossRef]
- Team, G.; Kamath, A.; Ferret, J.; Pathak, S.; Vieillard, N.; Merhej, R.; Perrin, S.; Matejovicova, T.; Ramé, A.; Rivière, M.; et al. Gemma 3 technical report. arXiv 2025, arXiv:2503.19786. [Google Scholar] [CrossRef]
- Jiang, A.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D.; de Las Casas, D.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; et al. Mistral 7b. arXiv 2023, arXiv:2310.06825. [Google Scholar] [CrossRef]
- Yang, A.; Li, A.; Yang, B.; Zhang, B.; Hui, B.; Zheng, B.; Yu, B.; Gao, C.; Huang, C.; Lv, C.; et al. Qwen3 technical report. arXiv 2025, arXiv:2505.09388. [Google Scholar] [CrossRef]
- Team, G.V.; Karlinsky, L.; Arbelle, A.; Daniels, A.; Nassar, A.; Alfassi, A.; Wu, B.; Schwartz, E.; Joshi, D.; Kondic, J.; et al. Granite Vision: A lightweight, open-source multimodal model for enterprise Intelligence. arXiv 2025, arXiv:2502.09927. [Google Scholar] [CrossRef]
- Guo, D.; Yang, D.; Zhang, H.; Song, J.; Zhang, R.; Xu, R.; Zhu, Q.; Ma, S.; Wang, P.; Bi, X.; et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv 2025, arXiv:2501.12948. [Google Scholar]
- Abdin, M.; Aneja, J.; Behl, H.; Bubeck, S.; Eldan, R.; Gunasekar, S.; Harrison, M.; Hewett, R.J.; Javaheripi, M.; Kauffmann, P.; et al. Phi-4 Technical Report. arXiv 2024, arXiv:2412.08905. [Google Scholar]
- Huang, D.; Wang, Z. LLMs at the Edge: Performance and Efficiency Evaluation with Ollama on Diverse Hardware. In Proceedings of the 2025 International Joint Conference on Neural Networks (IJCNN), Rome, Italy, 30 June–5 July 2025; pp. 1–8. [Google Scholar]
- Srivastava, G.; Hussain, A.; Bi, Z.; Roy, S.; Pitre, P.; Lu, M.; Ziyadi, M.; Wang, X. BeyondBench: Benchmark-Free Evaluation of Reasoning in Language Models. arXiv 2025, arXiv:2509.24210. [Google Scholar]

| Variable | Description |
|---|---|
| age | Numeric variable indicating how old the individual is |
| gender | A self-identified category such as male, female, non-binary |
| diet | The user’s chosen pattern of eating (e.g., vegan, vegetarian), specifying which foods they include or avoid |
| food preferences | Specific types of flavors of food the user prefers |
| likes/dislikes | Activities, items, or experiences the user likes or dislikes |
| weather preferences | Specific types of weather conditions the user enjoys (e.g., sunny, rainy, snowy) |
| lifestyle | Common activities or ways of spending free time that characterize the user’s daily life (e.g., hiking, outdoor festivals, day trips) |
| Model Name | Number of Parameters | References |
|---|---|---|
| Llama 3.2 | 3B | [26] |
| Gemma 3 | 4B | [27] |
| Mistral | 7B | [28] |
| Qwen 3 | 8B | [29] |
| Granite 3.3 | 8B | [30] |
| DeepSeek R1 | 8B | [31] |
| Phi-4 | 14B | [32] |
Relevance Score
| Model Name | 1 | 0 | |
|---|---|---|---|
| DeepSeek | 1328 (66.4%) | 94 (4.7%) | 578 (28.9%) |
| Granite | 873 (43.65%) | 46 (2.3%) | 1081 (54.05%) |
| Llama | 526 (26.3%) | 157 (7.85%) | 1317 (65.85%) |
| Gemma | 836 (41.8%) | 88 (4.4%) | 1076 (53.8%) |
| Mistral | 655 (32.75%) | 135 (6.75%) | 1210 (60.05%) |
| Phi | 973 (48.65%) | 533 (26.65%) | 494 (24.7%) |
| Qwen | 1054 (52.7%) | 502 (25.1%) | 444 (22.2%) |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Karlović, R.; Rovis, M.; Smajić, A.; Sever, L.; Lorencin, I. Context-Aware Tourism Recommendations Using Retrieval-Augmented Large Language Models and Semantic Re-Ranking. Electronics 2025, 14, 4448. https://doi.org/10.3390/electronics14224448