This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Web Search-Enhanced Small Language Models: A Case Study for a Kazakh-Centric Language Model

by Akylbek Maxutov 1,2,*, Nūrali Medeu 1 and Huseyin Atakan Varol 1,2

1 Institute of Smart Systems and Artificial Intelligence (ISSAI), Nazarbayev University, Astana 010000, Kazakhstan
2 Department of AI & Big Data, Faculty of Information Technologies and Artificial Intelligence, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan
* Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2026, 8(5), 128; https://doi.org/10.3390/make8050128
Submission received: 6 March 2026 / Revised: 24 April 2026 / Accepted: 8 May 2026 / Published: 12 May 2026

Abstract

Small language models (SLMs) are valued for their computational efficiency and suitability for edge deployment, but often underperform in localized linguistic and cultural contexts due to their limited parameter size. This study explores integrating real-time web search into Qolda, a 4B-parameter Kazakh-centric SLM, to close the performance gap with larger models. We assess two strategies: Naïve Retrieval-Augmented Generation (RAG), which uses raw benchmark questions as search queries, and Query-Refined RAG, which applies various refiner models, including a supervised distillation-tuned Qolda, to optimize queries. On the KazCulture and KazMMLU benchmarks, the Naïve RAG approach in reasoning-enabled mode achieved an average accuracy of 76.00%, improving on the Zero-Shot evaluation result of 60.37%, and, in this system-level comparison, exceeding the Zero-Shot accuracy of larger open-source models such as Qwen3-32B (64.72%) and Gemma-3-27b-it (60.24%), which were evaluated without retrieval augmentation. Query refinement improved the accuracy by about 3%, from 76.00% to 79.46%, but nearly doubled the computational cost. Inference time analysis shows that Naïve RAG adds approximately 1 s of retrieval latency per question. Query refiners introduce up to 4 s of additional overhead. However, the retrieved context reduces the time required for model reasoning in think mode. The most notable gains were observed in localized cultural knowledge, where web search integration correctly answered 32.9% of KazCulture questions that the Zero-Shot baseline failed on, while losing only 16.9% in return. These results suggest that retrieval-augmented SLMs can offer a cost-effective and competitive alternative to larger models for tasks in the domains of Kazakh language and Kazakh culture.
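The abstract contrasts two retrieval strategies: Naïve RAG, which sends the raw benchmark question to web search, and Query-Refined RAG, which first passes the question through a refiner model. A minimal sketch of that pipeline is below; every function here (the search backend, the refiner, the SLM call) is an illustrative placeholder, not the authors' actual implementation.

```python
from typing import Optional

# Hypothetical stand-ins for the pipeline components described in the abstract.

def web_search(query: str) -> str:
    """Placeholder for a real-time web search API; returns retrieved text."""
    return f"[retrieved context for: {query}]"

def refine_query(question: str) -> str:
    """Placeholder for a query-refiner model (e.g., a distillation-tuned SLM).

    A real refiner would rewrite the question into a concise, search-friendly
    query; here we only normalize it for illustration.
    """
    return question.rstrip("?").lower()

def slm_answer(question: str, context: Optional[str] = None) -> str:
    """Placeholder for the 4B-parameter SLM; a real model would condition
    its answer on the retrieved context."""
    prompt = question if context is None else f"{context}\n\n{question}"
    return f"[answer conditioned on {len(prompt)}-character prompt]"

def naive_rag(question: str) -> str:
    # Naive RAG: the raw benchmark question doubles as the search query,
    # adding roughly 1 s of retrieval latency per question.
    context = web_search(question)
    return slm_answer(question, context)

def query_refined_rag(question: str) -> str:
    # Query-Refined RAG: a refiner model first optimizes the search query,
    # improving accuracy (~3% in the paper) at roughly double the compute.
    context = web_search(refine_query(question))
    return slm_answer(question, context)
```

The trade-off the paper reports falls directly out of this structure: the refiner adds one extra model call per question (up to 4 s of overhead) in exchange for better-targeted retrieval.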
Keywords: small language models; retrieval-augmented generation; web search; benchmarking