This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Web Search-Enhanced Small Language Models: A Case Study for a Kazakh-Centric Language Model

by Akylbek Maxutov 1,2,*, Nūrali Medeu 1 and Huseyin Atakan Varol 1,2

1 Institute of Smart Systems and Artificial Intelligence (ISSAI), Nazarbayev University, Astana 010000, Kazakhstan
2 Department of AI & Big Data, Faculty of Information Technologies and Artificial Intelligence, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan
* Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2026, 8(5), 128; https://doi.org/10.3390/make8050128
Submission received: 6 March 2026 / Revised: 24 April 2026 / Accepted: 8 May 2026 / Published: 12 May 2026

Abstract

Small language models (SLMs) are valued for their computational efficiency and suitability for edge deployment, but often underperform in localized linguistic and cultural contexts due to their limited parameter size. This study explores integrating real-time web search into Qolda, a 4B-parameter Kazakh-centric SLM, to close the performance gap with larger models. We assess two strategies: Naïve Retrieval-Augmented Generation (RAG), which uses raw benchmark questions as search queries, and Query-Refined RAG, which applies various refiner models, including a supervised distillation-tuned Qolda, to optimize queries. On the KazCulture and KazMMLU benchmarks, the Naïve RAG approach in reasoning-enabled mode achieved an average accuracy of 76.00%, improving on the Zero-Shot evaluation result of 60.37%, and, in this system-level comparison, exceeding the Zero-Shot accuracy of larger open-source models such as Qwen3-32B (64.72%) and Gemma-3-27b-it (60.24%), which were evaluated without retrieval augmentation. Query refinement improved the accuracy by about 3%, from 76.00% to 79.46%, but nearly doubled the computational cost. Inference time analysis shows that Naïve RAG adds approximately 1 s of retrieval latency per question. Query refiners introduce up to 4 s of additional overhead. However, the retrieved context reduces the time required for model reasoning in think mode. The most notable gains were observed in localized cultural knowledge, where web search integration correctly answered 32.9% of KazCulture questions that the Zero-Shot baseline failed on, while losing only 16.9% in return. These results suggest that retrieval-augmented SLMs can offer a cost-effective and competitive alternative to larger models for tasks in the domains of Kazakh language and Kazakh culture.
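The abstract contrasts two retrieval strategies: Naïve RAG, which sends the raw benchmark question to web search, and Query-Refined RAG, which first passes the question through a refiner model. A minimal sketch of that pipeline is below; every function here (the search backend, the refiner, the SLM call) is an illustrative placeholder, not the authors' actual implementation.

```python
from typing import Optional

# Hypothetical stand-ins for the pipeline components described in the abstract.

def web_search(query: str) -> str:
    """Placeholder for a real-time web search API; returns retrieved text."""
    return f"[retrieved context for: {query}]"

def refine_query(question: str) -> str:
    """Placeholder for a query-refiner model (e.g., a distillation-tuned SLM).

    A real refiner would rewrite the question into a concise, search-friendly
    query; here we only normalize it for illustration.
    """
    return question.rstrip("?").lower()

def slm_answer(question: str, context: Optional[str] = None) -> str:
    """Placeholder for the 4B-parameter SLM; a real model would condition
    its answer on the retrieved context."""
    prompt = question if context is None else f"{context}\n\n{question}"
    return f"[answer conditioned on {len(prompt)}-character prompt]"

def naive_rag(question: str) -> str:
    # Naive RAG: the raw benchmark question doubles as the search query,
    # adding roughly 1 s of retrieval latency per question.
    context = web_search(question)
    return slm_answer(question, context)

def query_refined_rag(question: str) -> str:
    # Query-Refined RAG: a refiner model first optimizes the search query,
    # improving accuracy (~3% in the paper) at roughly double the compute.
    context = web_search(refine_query(question))
    return slm_answer(question, context)
```

The trade-off the paper reports falls directly out of this structure: the refiner adds one extra model call per question (up to 4 s of overhead) in exchange for better-targeted retrieval.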
Keywords: small language models; retrieval-augmented generation; web search; benchmarking