Systematic Review

BERT-Based Approaches for Web Service Selection and Recommendation: A Systematic Review with a Focus on QoS Prediction

by Vijayalakshmi Mahanra Rao 1, R Kanesaraj Ramasamy 2,* and Md Shohel Sayeed 1
1 Faculty of Information Science & Technology, Multimedia University, Melaka 75450, Malaysia
2 Faculty of Computing and Informatics, Multimedia University, Cyberjaya 63000, Malaysia
* Author to whom correspondence should be addressed.
Future Internet 2025, 17(12), 543; https://doi.org/10.3390/fi17120543
Submission received: 7 November 2025 / Revised: 24 November 2025 / Accepted: 25 November 2025 / Published: 27 November 2025

Abstract

Effective web service selection and recommendation are critical for ensuring high-quality performance in distributed and service-oriented systems. Recent research has increasingly explored the use of BERT (Bidirectional Encoder Representations from Transformers) to enhance semantic understanding of service descriptions, user requirements, and Quality of Service (QoS) prediction. This systematic review examines the application of BERT-based models in QoS-aware web service selection and recommendation. A structured database search was conducted across IEEE, ACM, ScienceDirect, and Google Scholar covering studies published between 2020 and 2024, resulting in twenty-five eligible articles based on predefined inclusion criteria and PRISMA screening. The review shows that BERT improves semantic representation and mitigates cold-start and sparsity issues, contributing to better service ranking and QoS prediction accuracy. However, challenges persist, including limited availability of benchmark datasets, high computational overhead, and limited interpretability of model decisions. The review identifies five key research gaps and outlines future directions, including domain-specific pre-training, hybrid semantic–numerical models, multi-modal QoS reasoning, and lightweight transformer architectures for deployment in dynamic and resource-constrained environments. These findings highlight the potential of BERT to support more intelligent, adaptive, and scalable web service management.

1. Introduction

Web services have significantly contributed to the ever-growing landscape of the Internet and the digital economy. Designed for online interoperability, they enable dynamic, cross-platform interactions that enhance both functionality and accessibility. The ability to select and recommend web services based on Quality of Service (QoS) attributes is essential to ensure that user requirements are met effectively, offering optimal response times, reliability, and overall service quality. QoS encompasses a variety of attributes that indicate how well a service meets the user’s expectations and requirements; among the most important are reliability, response time, availability, throughput, and security.

1.1. Importance of QoS-Based Selection

The main goal of web service selection is to improve user satisfaction. Employing a QoS-based selection method helps users choose services that satisfy or align with their needs, thereby enhancing the overall experience; this matters because web services can differ significantly in their reliability and performance [1]. Effective web service selection and recommendation also enhance the operational efficiency of software systems. By selecting services with optimal QoS metrics, software applications can minimize latency, reduce processing times, and improve throughput [2]. Additionally, the ability to dynamically select and adapt web services based on QoS data is essential in rapidly changing environments. This adaptability ensures that applications can respond to fluctuations in service availability or performance [3].

1.2. Challenges in Web Service Selection and Recommendation

Web service selection and recommendation based on QoS attributes are vital for the effectiveness of software systems. However, several challenges arise when implementing such techniques, especially QoS-based ones. Determining which attributes are most relevant for a specific application can be difficult given the variety of metrics available to evaluate service quality, and the lack of standardization across services makes it hard to compare metrics consistently. QoS data must be high-quality, real-time, and consistent to support accurate service selection; network changes, system load, or outages can compromise the availability of reliable QoS data, leading to suboptimal service choices [4]. As the number of available web services continues to grow, efficiently managing and analyzing large-scale datasets while maintaining high performance has become increasingly challenging [5]. Additionally, it is important to understand the correlations between various QoS attributes to ensure effective service selection; the dynamic nature of these attributes complicates their interdependencies, which may lead to challenges in accurately predicting service performance [6].

1.3. Why BERT for Web Service Selection?

The BERT (Bidirectional Encoder Representations from Transformers) model, developed by Google in 2018, has significantly advanced the field of Natural Language Processing (NLP). It has facilitated improvements across NLP tasks by leveraging the capabilities of the transformer architecture. The model’s bidirectional training allows it to understand the context of a word based on its surrounding words (both left and right) rather than only the preceding or following text. This approach leads to a more profound understanding of language nuances, meanings, and relationships. Some of the advantages of BERT in NLP tasks are described in Table 1.
BERT enables more accurate semantic alignment between user requirements and service capabilities by comprehensively analyzing natural language service descriptions and queries, improving service retrieval efficiency. For web service recommendation, BERT can be leveraged to analyze user preferences and historical interactions with web services through enhanced recommendation systems, providing more personalized service suggestions that closely align with user needs [9]. BERT’s capabilities can also improve QoS prediction by enabling models to analyze service performance descriptions derived from operational data and user reviews. By understanding QoS attributes such as reliability and response time, alongside user sentiment, BERT can inform more accurate forecasting models.
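To make the semantic-matching idea concrete, the following minimal sketch ranks service descriptions against a user query by cosine similarity of sentence vectors. A bag-of-words counter stands in here for a real BERT sentence encoder (which would return a pooled [CLS] embedding); the service names and descriptions are invented for illustration.

```python
import math
from collections import Counter

def toy_encoder(text):
    """Stand-in for a BERT sentence encoder: a bag-of-words count vector.
    A real pipeline would return the pooled [CLS] embedding instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_services(query, services):
    """Rank service descriptions by semantic similarity to the query."""
    q = toy_encoder(query)
    scored = [(name, cosine(q, toy_encoder(desc)))
              for name, desc in services.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Hypothetical service catalog
services = {
    "WeatherAPI": "returns current weather and forecast for a city",
    "GeoLookup": "maps an IP address to a geographic location",
    "FXRates": "provides currency exchange rates updated hourly",
}
ranking = rank_services("forecast weather for my city", services)
```

With a real BERT encoder, the same ranking loop applies unchanged; only `toy_encoder` is swapped for a model call, and similarity is computed over dense vectors.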
There is a need to systematically classify the methodologies or techniques of applying BERT to QoS prediction to foster clearer understanding and further progress.
The conceptual framework shown in Figure 1 illustrates how the proposed BERT-based model supports QoS-aware web service recommendations. The process starts with collecting various types of web service data, such as descriptions, execution logs, and user reviews. These inputs are then processed through the BERT Semantic Layer, where the model performs embedding and domain-specific fine-tuning to understand the contextual meaning of the information. The resulting representations are used in the following three key components: the QoS Prediction Model, which estimates the quality of service parameters; the User Preference Modeling Layer, which captures and interprets user needs and behaviors; and the Service Retrieval and Ranking Layer, which identifies and orders the most relevant services. Together, these layers produce a QoS-aware recommendation that balances objective service quality with individual user preferences, ensuring more accurate and personalized results.

1.4. Motivation and Objective

This paper aims to review and understand the different BERT methods associated with QoS-based web service selection and recommendation. The review covers highly regarded journals, conferences, and online databases such as IEEE and ACM, restricted to papers published between 2020 and 2024 in English. The main contributions of the paper are as follows:
  • Identifying gaps in the literature, such as how BERT can address unique challenges in web service selection, recommendation, and QoS prediction, and how it compares against traditional models.
  • Assessing the performance of BERT models by understanding their strengths and weaknesses, and when and how the model should be applied effectively to web services alongside QoS attributes, distinguishing when BERT provides significant gains and when it does not.
  • Identifying five high-impact research gaps (QoS attribute neglect, dataset reproducibility, model interpretability, scalability, and cost awareness) and translating them into a structured agenda for future research.
The paper is organized as follows: introduction, background, methodology, BERT applications in the context of web service selection and recommendation, challenges and limitations, comparative analysis, future directions for BERT, and conclusion.

2. Background

2.1. Web Service Selection and Recommendation

A web service is a mechanism that lets different applications and systems communicate with each other across the Internet, and web services are an important part of modern software systems. They rely on standards and protocols that enable data exchange between systems.
Figure 2 illustrates the flow of a web service invocation. The user first interacts with a web service through a browser, which sends requests over the Internet. The web server processes these requests, possibly involving a database, and then generates a response that is returned to the user via the Internet.
Web service selection and recommendation involve several steps for assessing the attributes and characteristics that ensure optimal choices for given tasks. The key steps are illustrated in Figure 3. The first phase is searching for available web services that meet the functional and non-functional requirements specified by users. In the second phase, once potential services are identified, a QoS evaluation assesses each service against the defined QoS attributes, for example, throughput, reliability, availability, and response time; this helps rank the services according to user expectations and performance. Recommendation systems then suggest services to users based on history, preferences, and QoS predictions, and the recommended services can be composed into a single unit that performs a specific task. Finally, the whole process is continuously monitored; this is crucial because QoS attributes must be reassessed to adapt to the dynamic nature of user needs and operating conditions, which contributes to maintaining user satisfaction over time.
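The QoS evaluation and ranking phase is often implemented as a weighted aggregation over normalized attributes. The sketch below is a minimal simple additive weighting (SAW) example with hypothetical QoS values; "cost" attributes such as response time are inverted during normalization so that higher scores are always better.

```python
def normalize(values, benefit=True):
    """Min-max normalize one QoS attribute. 'Benefit' attributes
    (e.g. throughput) are better when high; 'cost' attributes
    (e.g. response time) are better when low, so they are inverted."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    if benefit:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]

def saw_rank(services, weights, benefit_flags):
    """Simple additive weighting over normalized QoS attributes."""
    names = list(services)
    attrs = list(zip(*services.values()))  # one column per attribute
    norm = [normalize(list(col), benefit_flags[i])
            for i, col in enumerate(attrs)]
    scores = {name: sum(weights[i] * norm[i][j] for i in range(len(weights)))
              for j, name in enumerate(names)}
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

# Hypothetical (response_time_ms, availability, throughput_rps) per service
services = {"S1": (120, 0.99, 450), "S2": (80, 0.95, 500), "S3": (200, 0.999, 300)}
ranking = saw_rank(services,
                   weights=[0.4, 0.3, 0.3],
                   benefit_flags=[False, True, True])
```

The weights encode user priorities (here, response time matters most); changing them reorders the ranking, which is exactly the personalization lever the recommendation phase exploits.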

2.2. Existing Techniques for Service Selection and Recommendation

Traditionally, there are three types of recommender systems: collaborative filtering, content-based, and hybrid. Collaborative filtering uses the ratings and preferences of other users to recommend services to a particular user. A content-based system organizes items into profiles based on their descriptions or features. Hybrid systems combine both collaborative and content-based techniques to provide more accurate recommendations.
Content-based filtering is good at recommending new or less popular items and is less affected by the cold-start problem, but it tends to produce repetitive recommendations with limited diversity. Collaborative filtering, on the other hand, effectively captures user preferences, handles various item types, and adapts to changing user behavior over time. However, it struggles with the cold-start issue and data sparsity in large datasets. Hybrid methods combine the strengths of content-based and collaborative filtering, delivering more accurate and diverse recommendations. However, they are often more complex and resource-intensive, and their performance heavily depends on the quality of the underlying techniques. Choosing the right approach depends on factors like data availability, system scalability, and the need for personalization [10,11,12].
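A minimal sketch of the hybrid idea, with invented scores: blend collaborative-filtering and content-based scores, falling back to the content-based signal for cold-start users who lack interaction history.

```python
def hybrid_score(cf_score, content_score, alpha=0.6):
    """Weighted blend of collaborative-filtering and content-based
    scores; alpha controls reliance on interaction history."""
    return alpha * cf_score + (1 - alpha) * content_score

def recommend(user, cf_scores, content_scores, n_interactions, min_history=3):
    """Use content-based scores alone for cold-start users with little
    history (no reliable CF signal); otherwise blend both signals."""
    if n_interactions.get(user, 0) < min_history:
        scores = content_scores
    else:
        scores = {s: hybrid_score(cf_scores.get(s, 0.0), content_scores.get(s, 0.0))
                  for s in content_scores}
    return max(scores, key=scores.get)

# Hypothetical per-service scores for two candidate services A and B
cf = {"A": 0.9, "B": 0.2}        # strong historical signal for A
content = {"A": 0.1, "B": 0.8}   # description matches B better
cold_pick = recommend("new_user", cf, content, n_interactions={})
warm_pick = recommend("regular", cf, content, n_interactions={"regular": 10})
```

The cold-start user gets the content-based pick, while the established user's recommendation is dominated by the CF signal, mirroring the trade-offs described above.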

3. Systematic Review Methodology

3.1. Research Questions

Table 2 lists the research questions and the motivations that are kept in mind when preparing this systematic review paper. This will provide a comprehensive and thorough review of BERT’s application in web service selection and recommendation as well as QoS predictions.

3.2. Search Strategy and Selection

The search string and the sources from which papers are identified are the main elements of the search strategy. The search results were exported in Research Information Systems (RIS) format and imported into the Rayyan tool. The year range is restricted to 2020 to 2024 (for full text and metadata) to keep the studies relevant, and only English-language studies were considered.
Table 3 summarizes the search strategies used for each source.
Table 4 lists the inclusion and exclusion criteria for the studies. These criteria further filter the relevant papers from the initial pool retrieved with the search string.
Based on the defined research questions, papers are evaluated and selected according to whether the objectives, research motivations and methods, and relevant information on the data, findings, and results are clearly defined and supported.

3.3. Data Extraction

Information and data are extracted based on the research questions defined in Table 2; Table 5 summarizes the extracted items.

3.4. Quality Assessment

The final articles are assessed based on criteria as follows:
(1) Whether the objective is clearly defined;
(2) Whether the methods or techniques are elaborated;
(3) Whether the authors have provided their findings and results based on proper data analysis.
Figure 4 demonstrates the PRISMA flow for article selection. The PRISMA flow diagram outlines the systematic review process, beginning with the identification of 163 records from four databases (ACM, IEEE, ScienceDirect, and Google Scholar), from which 15 records (12 duplicates and 3 articles unrelated to BERT) were removed before screening, leaving 148 records for title/abstract screening. During screening, 125 records were excluded, resulting in 27 reports sought for full-text retrieval, of which 2 were unavailable, leaving 25 reports for eligibility assessment. No further exclusions were made, and all 25 studies were ultimately included in the review, demonstrating a transparent and methodical approach to study selection in accordance with PRISMA guidelines.
This review was conducted in accordance with the PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. The full PRISMA checklist and flow diagram are provided in the Supplementary Materials.

4. Synthesis of the Literature

The literature matrix in Table 6 is derived from the review of the final selected studies.
Table 7 represents dataset information used in the papers selected for systematic review.
From the dataset information derived, only 16% (4/25) of the papers use publicly available datasets. The publicly available datasets are as follows:
(1) WS-DREAM (Liu et al.): 5825 services, 339 users
(2) ProgrammableWeb (Meghazi et al.): 8400+ services
(3) Stack Overflow (Alsayed et al.): API documentation corpus
(4) FullTextPeerRead (Jeong): citation dataset
(5) WSDream (Liu et al.): QoS prediction dataset
The findings indicate that 84% of the studies are difficult to reproduce because their datasets are not publicly available. This limits the ability to compare methods across studies; hence, there is a need for standardized benchmark datasets in the field.
Table 8 demonstrates the quantitative performance across studies; 17 of the 25 papers (68%) report quantitative metrics.
From the performance metrics derived, the critical gap identified is inconsistent metric reporting, which hampers meta-analysis. In Table 8, “T” represents throughput and “RT” represents response time. Metric usage is summarized as follows:
(1) RMSE (Root Mean Square Error): 47.1% of evaluations (8/17)
(2) MAE (Mean Absolute Error): 23.5% of evaluations (4/17)
(3) Precision: 23.5% of evaluations (4/17)
(4) NDCG (Normalized Discounted Cumulative Gain): 11.8% of evaluations (2/17)
(5) Accuracy: 52.9% of evaluations (9/17)
It is impossible to determine the “best” method definitively, as publications are biased towards positive results. We recommend adopting a standardized evaluation protocol, for example, always reporting confidence intervals and statistical significance tests.
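As an illustration of such a protocol, the sketch below computes RMSE together with a percentile-bootstrap confidence interval over invented response-time predictions; reporting the interval alongside the point estimate makes cross-study comparison more meaningful.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for RMSE: resample
    (true, pred) pairs with replacement and take empirical quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Hypothetical response-time predictions (ms)
y_true = [120, 80, 200, 150, 95, 130, 170, 110]
y_pred = [115, 90, 190, 160, 100, 120, 180, 105]
point = rmse(y_true, y_pred)
lo, hi = bootstrap_ci(y_true, y_pred)
```

Two methods whose bootstrap intervals overlap substantially cannot be claimed to differ in RMSE, which is precisely the check that most of the reviewed studies omit.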
From these, we can infer the following key findings:
(1) Best classification performance: Meghazi et al. (DeepLAB-WSC)
(2) Best QoS prediction performance: Liu et al. (llmQoS)
(3) Best semantic matching: Alam et al. (BERT variants)
(4) Best citation performance: Jeong (BERT+GCN)
(5) Best efficiency: Zeng et al. (Lightweight BERT)
Table 9 lists QoS attributes coverage across all 25 papers (research gap matrix). A total of 60% of QoS attributes have zero or minimal BERT research.
The comprehensive analysis of all twenty-five papers reveals that BERT demonstrates consistent improvements, and the 40% adoption rate shows growing interest. Coverage level is computed as the number of papers addressing an attribute divided by the twenty-five reviewed papers. BERT is applied across diverse tasks, including QoS prediction, recommendation, and classification.
Figure 5 presents a graphical meta-summary of BERT-based QoS research from 2020 to 2024. The panel on the left displays the publication trend, showing a steady increase in studies using BERT for web service selection and QoS prediction. The panel on the right displays the frequency of QoS attributes analyzed across the reviewed works, highlighting that response time and availability dominate current research focus.
RMSE and MAE values represent aggregated averages derived from reported results across the twenty-five studies (2020–2024). They illustrate the relative performance trend among traditional, BERT-based, and hybrid models in QoS prediction tasks, as represented in Figure 6. The left panel displays an aggregated comparison of RMSE and MAE across traditional and BERT-based models. The right panel displays a forest plot showing mean RMSE values with standard deviation whiskers and a summary diamond representing the pooled mean performance across all models; the vertical dotted line denotes the overall mean RMSE.

4.1. Application of BERT on Web Service Selection and Recommendation in the Context of QoS

BERT has significantly enhanced the performance of the web service selection and recommendation, primarily by focusing on improving the understanding of service descriptions, classification accuracy, and ultimately helping in better selection and recommendation outcomes.
BERT’s bidirectional encoding allows each token in a service description to attend to both left and right context, which is important for distinguishing subtle functional and non-functional details in API texts and for accurately aligning user requirements with service capabilities. Pre-training with MLM (Masked Language Modeling) on large corpora yields robust contextual embeddings that transfer well to relatively small QoS datasets, allowing fine-tuned models such as QoSBERT or ServiceBERT to achieve strong QoS prediction performance without training from scratch. BERT’s subword tokenization handles out-of-vocabulary identifiers commonly found in API names, parameter labels, and QoS attributes, for example, RESTfulAPI and RTT_ms, enabling robust encoding of heterogeneous service documentation.
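The subword behavior described above can be illustrated with a greedy longest-match-first split in the style of WordPiece. The tiny vocabulary here is invented (a real BERT vocabulary has roughly 30k entries), but the mechanism is the same: identifiers such as RESTfulAPI or RTT_ms decompose into known pieces instead of collapsing to a single [UNK] token.

```python
def wordpiece(token, vocab):
    """Greedy longest-match-first subword split, as used by BERT's
    WordPiece tokenizer; continuation pieces carry a '##' prefix and
    tokens with no covering could fall back to [UNK]."""
    pieces, start = [], 0
    while start < len(token):
        end = len(token)
        cur = None
        while start < end:
            sub = token[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                cur = sub
                break
            end -= 1
        if cur is None:
            return ["[UNK]"]
        pieces.append(cur)
        start = end
    return pieces

# Toy vocabulary for illustration only
vocab = {"rest", "##ful", "##api", "rtt", "##_", "##ms", "api"}
restful = wordpiece("restfulapi", vocab)
rtt = wordpiece("rtt_ms", vocab)
```

Because every piece is in the vocabulary, each unusual identifier still gets a meaningful composite embedding downstream, which is what makes BERT robust on heterogeneous service documentation.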
In the study [38], BERT was integrated with a Deep Pyramid Convolutional Neural Network (DPCNN) to capture both local and global contextual information in service descriptions, improving service classification metrics, which is important in building QoS-aware recommendation systems. The study highlights how classification based on web service descriptions can discriminate among services, especially in environments where services are functionally similar but differ in QoS parameters.
Another study proposed llmQoS (Large Language Model Aided QoS), which encodes descriptive attributes of users and services with a pre-trained language model (RoBERTa) and combines these embeddings with a collaborative-filtering prediction network. The model begins by converting user and service attributes into natural language sentences, which are then processed by pre-trained LLMs (RoBERTa or Phi3mini). By extracting semantic features from textual attributes using LLMs, the model overcomes the data sparsity problem inherent in collaborative filtering: even for new users or services with no historical interactions, the LLM features provide meaningful representations based on their descriptive attributes. llmQoS generalizes the BERT-style embedding idea by using a pre-trained encoder-only or decoder-only LLM to extract descriptive feature vectors from user and service attribute sentences, which play the same role as BERT embeddings in enriching CF-based QoS predictors under sparse data. Although llmQoS uses LLMs beyond vanilla BERT, it further validates the underlying principle that descriptive text embeddings (obtained from LLM/BERT-style encoders) significantly alleviate sparsity and cold-start by providing content features independent of historical interactions. On the WS-DREAM dataset, llmQoS reduces MAE by more than 20% for throughput and more than 10% for response time compared to the best traditional baselines at low densities, confirming that rich textual embeddings are especially beneficial when QoS logs are sparse.
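The sparsity-mitigation mechanism can be sketched as a similarity-weighted predictor: a cold-start service with no QoS history borrows observed response times from semantically similar services. The 3-dimensional vectors and service names below are hypothetical stand-ins for BERT/LLM description embeddings; real embeddings would have hundreds of dimensions and typically feed a learned CF head rather than this simple average.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def predict_cold_start(target_emb, known):
    """Similarity-weighted average of observed QoS values; 'known'
    maps service -> (description_embedding, observed_response_time)."""
    num = den = 0.0
    for emb, rt in known.values():
        w = max(cosine(target_emb, emb), 0.0)
        num += w * rt
        den += w
    return num / den if den else None

# Hypothetical 3-d embeddings standing in for BERT sentence vectors
known = {
    "img_resize_v1": ([0.9, 0.1, 0.0], 120.0),  # fast image service
    "pdf_convert":   ([0.1, 0.9, 0.0], 300.0),  # slower document service
}
# New image-processing service: semantically close to img_resize_v1
pred = predict_cold_start([0.85, 0.15, 0.0], known)
```

The prediction is pulled toward the semantically similar image service even though the new service has never been invoked, which is the content-feature effect llmQoS exploits under sparse QoS matrices.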
BERT has also been used to model user preferences and service semantics from text. One such study [39] proposed WSR-DRL (Web Service Recommendation model based on Disentangled Representation Learning), an interpretable web service recommender that uses BERT to encode each service’s name and a CNN–BiLSTM to encode its description. The method uses a fine-tuned BERT model that learns subtle meaning in service names and descriptions through its bidirectional multi-head attention. The CLS token is used to generate a compact representation of each service, which is then passed into a disentangled interaction module to support more accurate QoS-aware recommendations. Notably, the service-name BERT encoder and deep text encoders yield richer features than bag-of-words representations. Unlike black-box neural models, WSR-DRL’s disentangled factors are interpretable: users can understand why a service is recommended, and this transparency builds trust and helps users make informed decisions.
ServiceBERT employs domain-specific pre-training on a large corpus of web service and API descriptions, followed by multitask fine-tuning for service ecosystem tasks. General-purpose BERT misses service-specific distinctions (for example, REST vs. SOAP, or API composition patterns); ServiceBERT’s domain-specific vocabulary and multitask pre-training capture these nuances, significantly improving performance on service ecosystem tasks (15 to 25% improvement over vanilla BERT on tagging accuracy). ServiceBERT exploits BERT’s masked-language pre-training and subword tokenization to learn domain-specific embeddings of API names and documentation, using CLS pooling from a fine-tuned BERT-base encoder as the service representation for downstream QoS-aware tagging and recommendation.
ServeNet-BERT uses transfer learning from pre-trained BERT with a dual pooling strategy for service classification. The dual pooling strategy ([CLS] + mean) captures both global semantic meaning and detailed token-level information, which is effective for long service descriptions where important details may be distributed throughout the text. Experiments show an 8 to 12% accuracy improvement over single pooling strategies. Table 10 refers to other studies performed using BERT, apart from the twenty-five articles reviewed.
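The dual pooling strategy can be sketched directly on a token-embedding matrix: take the [CLS] vector as a global summary and concatenate it with the mean over all token vectors. The toy 3-dimensional hidden states below are for illustration only; a real BERT-base encoder would produce 768-dimensional vectors.

```python
def dual_pool(token_embeddings):
    """Concatenate the [CLS] vector (global summary) with the mean over
    all token vectors (distributed detail), giving a 2*dim pooled
    representation as in dual-pooling classification heads."""
    cls_vec = token_embeddings[0]  # [CLS] is the first token by convention
    dim = len(cls_vec)
    n = len(token_embeddings)
    mean_vec = [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]
    return cls_vec + mean_vec

# Toy 4-token sequence with 3-d "hidden states"
tokens = [
    [1.0, 0.0, 0.0],  # [CLS]
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
pooled = dual_pool(tokens)
```

The classification head then operates on the doubled-width vector, so details that never propagate into [CLS] still reach the classifier via the mean component.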
Overall, BERT’s integration into the processes of web service selection and recommendation creates a robust framework capable of addressing the inherent complexities associated with QoS. As web services continue to proliferate, employing advanced techniques like BERT will be crucial in refining selection processes to meet user expectations effectively.
Table 11 is an extension of Table 10 to explore base model, input representation, fusion method, output layer, training strategy, and dataset used.

4.2. Preprocessing Pipelines from Structured Service Descriptions to BERT Inputs

Many studies extract semantically rich fields such as operation name, summary, description, tags, and parameter descriptions from OpenAPI/Swagger or WSDL, then concatenate them into a single text sequence per service. In ServeNet-BERT and WARBERT-style pipelines, the API name and free-text description are concatenated, optionally with category labels or path segments, and fed to BERT as “[CLS] name [SEP] description … [SEP]”. For older SOAP/WSDL services, studies typically strip the XML tags while preserving element names and documentation, normalize identifiers, remove schema artifacts, and then treat the resulting text as a short “documentation paragraph” for each operation before tokenization by BERT.
In pipeline-style architectures such as QoSBERT [42] and ServiceBERT-like models, the preprocessed text is passed through a BERT encoder to obtain a pooled sentence embedding or a sequence of token embeddings for attention or CNN/RNN layers. Figure 7 illustrates the common pipeline adopted across the included studies, starting from WSDL/OpenAPI artifacts and ending at a BERT-based embedding integrated with a QoS prediction head. Table 12 summarizes, for the main BERT-based approaches, which WSDL/OpenAPI fields are used, the approximate resulting text length, and the chosen encoder variant and pooling strategy. This highlights a common pattern: semantically rich name or description fields are flattened into short sequences (40–150 tokens) and encoded by BERT-family models using either CLS pooling or mean pooling.
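The flattening step can be sketched as follows, with a hypothetical OpenAPI operation and WSDL fragment. In practice the [CLS]/[SEP] markers are inserted by the tokenizer rather than written literally, but the layout mirrors the "[CLS] name [SEP] description" pattern described above.

```python
import re

def flatten_openapi(op):
    """Concatenate semantically rich OpenAPI fields (operationId,
    summary, description, tags) into one input sequence per operation."""
    name = op.get("operationId", "")
    body = " ".join(filter(None, [op.get("summary", ""),
                                  op.get("description", ""),
                                  " ".join(op.get("tags", []))]))
    return f"[CLS] {name} [SEP] {body} [SEP]"

def wsdl_to_text(xml):
    """Strip XML tags from a WSDL fragment, keeping documentation text
    and collapsing whitespace into a short 'documentation paragraph'."""
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", xml)).strip()

# Hypothetical operation and WSDL snippet for illustration
op = {"operationId": "getForecast",
      "summary": "Retrieve weather forecast",
      "description": "Returns a 7-day forecast for the given city.",
      "tags": ["weather"]}
seq = flatten_openapi(op)
doc = wsdl_to_text("<wsdl:documentation>Converts currency amounts.</wsdl:documentation>")
```

Keeping the flattened sequence within roughly 40 to 150 tokens, as the surveyed pipelines do, avoids truncation by BERT's 512-token limit while preserving the high-signal fields.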

4.3. Advantages and Limitations of Using BERT Models for Web Service Selection and Recommendation

BERT demonstrates significant potential in enhancing both the efficacy and efficiency in web service selection and recommendation processes. However, this comes with both advantages and limitations accordingly. Table 13 demonstrates further the advantages and limitations of BERT.
BERT’s contextual embeddings capture semantic nuances of service text and user feedback that are lost in simple keyword models. This enriches latent features for collaborative filtering and content matching. In QoS prediction, BERT features of user/service attributes help overcome cold-start sparsity, markedly reducing RMSE. In recommendation, BERT makes it easier to match mashup requirements or user preferences with API descriptions. Pre-training on domain data means the model “knows” web service jargon. Experiments consistently show that multi-embedding BERT models yield double-digit percentage gains in standard metrics, for example, RMSE, precision, recall.
BERT models are computationally intensive and require careful fine-tuning on domain-specific data to achieve optimal performance. Training ServiceBERT from scratch required assembling thousands of Web API documents, and even a fine-tuned BERT adds computational overhead at inference. In some cases, such as response-time QoS prediction, llmQoS improved RMSE by only a few percent, in contrast to its larger throughput gains. Moreover, transformer models can be opaque: the disentangled learning in WSR-DRL adds interpretability, but most BERT-based methods remain black boxes. Finally, purely text-based BERT models ignore non-textual QoS signals, so they must be combined with numeric features.
The advantage of BERT-based content features at low density stems from the fact that they approximate similarity in a semantic space rather than relying solely on historical co-occurrence in the QoS matrix. By encoding WSDL or OpenAPI descriptions, API names, and other textual metadata into dense vectors, BERT captures functional and non-functional characteristics (e.g., domain, interface style, expected load, latency hints) that are shared by services, even if those services have never been invoked by the same users. When these embeddings are combined with sparse interaction data in a CF or MLP head, the model can generalize from semantically similar services or users, thereby mitigating sparsity and cold-start effects and yielding larger relative error reductions at 5–10% density than at 20% or higher.
In the Large Language Model (LLM) era, BERT-based encoders remain highly competitive for QoS-aware web service recommendation when cost, latency, or data-governance constraints are binding. Their moderate parameter scale and low inference latency make them suitable for high-throughput QoS prediction scenarios, where thousands of users–service pairs must be scored in real time. Moreover, BERT-family models such as BERT and RoBERTa can be fine-tuned and deployed entirely on premise, allowing organizations to retain full control over sensitive service descriptions and QoS logs, whereas GPT-class LLMs are typically accessed via external APIs with higher computational and governance overhead. Table 14 represents the comparison between BERT and GPT-style LLMs.
Overall, recent research indicates that BERT-style embeddings enhance web service selection and recommendation by bringing in rich textual knowledge. When integrated carefully (often in hybrid models), they lead to more accurate QoS predictions and more relevant ranked lists of services. However, effectiveness depends on having sufficient textual data (service descriptions, reviews, API docs) and computational resources for training.

4.4. Future Directions of Using BERT in the Context of Web Service Selection and Recommendation as Well as QoS Predictions

From the review, academic research is moving towards adapting BERT models to domain-specific tasks, fusing multiple data modalities, and combining BERT with other models to overcome the challenges of sparse or heterogeneous data. In one study [43], the authors addressed BERT’s domain bias by adding a lightweight transformation layer and contrastive objectives to align item representations across domains. Similarly, another study [44] integrated BERT’s semantic embeddings of item descriptions with collaborative filtering signals in a graph-based hybrid model called BeLightRec.
In QoS prediction, large language models (LLMs) are used to extract latent features from service and user descriptions; combining BERT-like sentence embeddings with collaborative filtering (CF) has been shown to dramatically improve QoS estimates, cutting prediction error by 10 to 20% compared to CF alone.
Table 15 summarizes the academic focus as well as the industry or the current application trends.

5. Conclusions

This review provides a detailed overview of how BERT and its variants are advancing research in web service selection, recommendation, and Quality of Service (QoS) prediction. The evidence from recent studies shows that BERT-based models generally achieve better performance than traditional and earlier deep learning approaches. By learning contextual relationships within service descriptions and user requirements, BERT helps bridge the gap between user intent and service functionality, leading to more accurate and adaptive service selection processes.
At the same time, several challenges remain. The computational demands of transformer models, the shortage of domain-specific training data, and the absence of standardized benchmarks continue to limit broader adoption. Another concern is the limited interpretability of BERT-based systems, which makes it difficult to fully understand how decisions are made. Addressing these issues is essential for making such models practical in large-scale and real-time service environments.
Research is moving toward developing lighter, more transparent, and more efficient BERT variants that can operate effectively in diverse and resource-constrained settings. Promising directions include combining BERT with graph neural networks, reinforcement learning, and knowledge graphs to improve contextual reasoning and scalability. Advances in model compression, distillation, and prompt-based learning can also help reduce complexity while maintaining performance.
To move the field forward, we highlight three research priorities: standardized QoS benchmarks, interpretable and efficient BERT variants, and multi-modal QoS reasoning. In summary, BERT continues to play a central role in shaping intelligent, context-aware, and QoS-driven web service ecosystems. Its ability to understand and represent meaning-rich text positions it as a cornerstone for the next generation of adaptive and trustworthy service management systems that better align with user expectations and operational goals.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/fi17120543/s1, File S1: PRISMA 2020 Main Checklist. The full PRISMA checklist and flow diagram can be downloaded at: https://doi.org/10.17605/OSF.IO/RKZBJ.

Author Contributions

Conceptualization, V.M.R. and R.K.R.; Methodology, V.M.R. and R.K.R.; Software, V.M.R.; Validation, V.M.R., R.K.R., and M.S.S.; Formal analysis, V.M.R.; Writing—original draft preparation, R.K.R. and M.S.S.; Writing—review and editing, R.K.R. and M.S.S.; Supervision, R.K.R.; Project administration, R.K.R. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by Multimedia University.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hasnain, M.; Pasha, M.F.; Ghani, I.; Mehboob, B.; Imran, M.; Ali, A. Benchmark dataset selection of web services technologies: A factor analysis. IEEE Access 2020, 8, 53649–53665. [Google Scholar] [CrossRef]
  2. Sun, X.; Wang, S.; Xia, Y.; Zheng, W. Predictive-trend-aware composition of web services with time-varying quality-of-service. IEEE Access 2020, 8, 1910–1921. [Google Scholar] [CrossRef]
  3. Yuan, Y.; Guo, Y.; Ma, W. Dynamic service composition method based on zero-sum game integrated inverse reinforcement learning. IEEE Access 2023, 11, 111897–111908. [Google Scholar] [CrossRef]
  4. Rajendran, V.; Ramasamy, R.K.; Mohd-Isa, W.N. Improved eagle strategy algorithm for dynamic web service composition in the IoT: A conceptual approach. Future Internet 2022, 14, 56. [Google Scholar] [CrossRef]
  5. Liu, Q.; Wang, L.; Du, S.; Wyk, B.J.V. A method to enhance web service clustering by integrating label-enhanced functional semantics and service collaboration. IEEE Access 2024, 12, 61301–61311. [Google Scholar] [CrossRef]
  6. Bonab, M.N.; Tanha, J.; Masdari, M. A semi-supervised learning approach to quality-based web service classification. IEEE Access 2024, 12, 50489–50503. [Google Scholar] [CrossRef]
  7. Kowsher, M.; Sami, A.A.; Prottasha, N.J.; Arefin, M.S.; Dhar, P.K.; Koshiba, T. Bangla-bert: Transformer-based efficient model for transfer learning and language understanding. IEEE Access 2022, 10, 91855–91870. [Google Scholar] [CrossRef]
  8. Kim, M.; Lee, S.; Oh, Y.; Choi, H.; Kim, W. A near-real-time answer discovery for open-domain with unanswerable questions from the web. IEEE Access 2020, 8, 158346–158355. [Google Scholar] [CrossRef]
  9. Zhang, C.; Qin, S.; Wu, H.; Zhang, L. Cooperative mashup embedding leveraging knowledge graph for web api recommendation. IEEE Access 2024, 12, 49708–49719. [Google Scholar] [CrossRef]
  10. Ramasamy, R.K.; Chua, F.F.; Haw, S.C.; Ho, C.K. WSFeIn: A novel, dynamic web service composition adapter for cloud-based mobile application. Sustainability 2022, 14, 13946. [Google Scholar] [CrossRef]
  11. Roy, D.; Dutta, M. A systematic review and research perspective on recommender systems. J. Big Data 2022, 9, 59. [Google Scholar] [CrossRef]
  12. Ghafouri, S.H.; Hashemi, S.M.; Hung, P.C. A survey on web service QoS prediction methods. IEEE Trans. Serv. Comput. 2020, 15, 2439–2454. [Google Scholar] [CrossRef]
  13. Kumar, S.; Chattopadhyay, S.; Adak, C. TPMCF: Temporal QoS Prediction Using Multi-Source Collaborative Features. IEEE Trans. Netw. Serv. Manag. 2024, 21, 3945–3955. [Google Scholar] [CrossRef]
  14. Liu, H.; Zhang, Z.; Li, H.; Wu, Q.; Zhang, Y. Large Language Model Aided QoS Prediction for Service Recommendation. arXiv 2024. [Google Scholar] [CrossRef]
  15. Atzeni, D.; Bacciu, D.; Mazzei, D.; Prencipe, G. A systematic review of Wi-Fi and machine learning integration with topic modeling techniques. Sensors 2022, 22, 4925. [Google Scholar] [CrossRef] [PubMed]
  16. Xu, Z.; Gu, Y.; Yao, D. WARBERT: A Hierarchical BERT-based Model for Web API Recommendation. arXiv 2025. [Google Scholar] [CrossRef]
  17. Li, M.; Xu, H.; Tu, Z.; Su, T.; Xu, X.; Wang, Z. A deep learning based personalized QoE/QoS correlation model for composite services. In Proceedings of the 2022 IEEE International Conference on Web Services (ICWS), Barcelona, Spain, 10–16 July 2022; IEEE: New York, NY, USA, 2022; pp. 312–321. [Google Scholar]
  18. Long, S.; Tan, J.; Mao, B.; Tang, F.; Li, Y.; Zhao, M.; Kato, N. A Survey on Intelligent Network Operations and Performance Optimization Based on Large Language Models. IEEE Commun. Surv. Tutor. 2025. [Google Scholar] [CrossRef]
  19. Koudouridis, G.P.; Shalmashi, S.; Moosavi, R. An evaluation survey of knowledge-based approaches in telecommunication applications. Telecom 2024, 5, 98–121. [Google Scholar] [CrossRef]
  20. Alsayed, A.S.; Dam, H.K.; Nguyen, C. MicroRec: Leveraging Large Language Models for Microservice Recommendation. In MSR ‘24: Proceedings of the 21st International Conference on Mining Software Repositories, Lisbon, Portugal, 15–16 April 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 419–430. [Google Scholar] [CrossRef]
  21. Liu, H.; Zhang, W.; Zhang, X.; Cao, Z.; Tian, R. Context-aware and QoS prediction-based cross-domain microservice instance discovery. In Proceedings of the 2022 IEEE 13th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 21–23 October 2022; IEEE: New York, NY, USA, 2022; pp. 30–34. [Google Scholar]
  22. Meghazi, H.M.; Mostefaoui, S.A.; Maaskri, M.; Aklouf, Y. Deep Learning-Based Text Classification to Improve Web Service Discovery. Comput. Y Sist. 2024, 28, 529–542. [Google Scholar] [CrossRef]
  23. Zeng, K.; Paik, I. Dynamic service recommendation using lightweight BERT-based service embedding in edge computing. In Proceedings of the 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), Singapore, 20–23 December 2021; IEEE: New York, NY, USA, 2021; pp. 182–189. [Google Scholar]
  24. Zhang, P.; Ren, J.; Huang, W.; Chen, Y.; Zhao, Q.; Zhu, H. A deep-learning model for service QoS prediction based on feature mapping and inference. IEEE Trans. Serv. Comput. 2023, 17, 1311–1325. [Google Scholar] [CrossRef]
  25. Alam, K.A.; Haroon, M. Evaluating Fine-tuned BERT-based Language Models for Web API Recommendation. In Proceedings of the 2024 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), Abu Dhabi, United Arab Emirates, 9–11 December 2024; IEEE: New York, NY, USA, 2024; pp. 135–142. [Google Scholar]
  26. Karapantelakis, A.; Alizadeh, P.; Alabassi, A.; Dey, K.; Nikou, A. Generative AI in mobile networks: A survey. Ann. Telecommun. 2024, 79, 15–33. [Google Scholar] [CrossRef]
  27. Bhanage, D.A.; Pawar, A.V.; Kotecha, K. IT infrastructure anomaly detection and failure handling: A systematic literature review focusing on datasets, log preprocessing, machine & deep learning approaches and automated tool. IEEE Access 2021, 9, 156392–156421. [Google Scholar] [CrossRef]
  28. Qu, G.; Chen, Q.; Wei, W.; Lin, Z.; Chen, X.; Huang, K. Mobile edge intelligence for large language models: A contemporary survey. IEEE Commun. Surv. Tutor. 2025. [Google Scholar] [CrossRef]
  29. Hameed, A.; Violos, J.; Santi, N.; Leivadeas, A.; Mitton, N. FeD-TST: Federated Temporal Sparse Transformers for QoS prediction in Dynamic IoT Networks. IEEE Trans. Netw. Serv. Manag. 2024, 22, 1055–1069. [Google Scholar] [CrossRef]
  30. Huang, W.; Zhang, P.; Chen, Y.; Zhou, M.; Al-Turki, Y.; Abusorrah, A. QoS Prediction Model of Cloud Services Based on Deep Learning. IEEE/CAA J. Autom. Sin. 2022, 9, 564–566. [Google Scholar] [CrossRef]
  31. Le, F.; Srivatsa, M.; Ganti, R.; Sekar, V. Rethinking data-driven networking with foundation models: Challenges and opportunities. In Proceedings of the 21st ACM Workshop on Hot Topics in Networks, Austin, TX, USA, 14–15 November 2022; pp. 188–197. [Google Scholar]
  32. Jeong, C.; Jang, S.; Shin, H.; Park, E.; Choi, S. A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks. arXiv 2019. [Google Scholar] [CrossRef]
  33. Liu, M.; Xu, H.; Sheng, Q.Z.; Wang, Z. QoSGNN: Boosting QoS Prediction Performance with Graph Neural Networks. IEEE Trans. Serv. Comput. 2023, 17, 645–658. [Google Scholar] [CrossRef]
  34. Lian, H.; Li, J.; Wu, H.; Zhao, Y.; Zhang, L.; Wang, X. Toward Effective Personalized Service QoS Prediction From the Perspective of Multi-Task Learning. IEEE Trans. Netw. Serv. Manag. 2023, 20, 2587–2597. [Google Scholar] [CrossRef]
  35. Jirsik, T.; Trčka, Š.; Celeda, P. Quality of service forecasting with LSTM neural network. In Proceedings of the 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), Washington, DC, USA, 8–12 April 2019; IEEE: New York, NY, USA, 2019; pp. 251–260. [Google Scholar]
  36. Guo, C.; Zhang, W.; Dong, N.; Liu, Z.; Xiang, Y. QoS-aware diversified service selection. IEEE Trans. Serv. Comput. 2022, 16, 2085–2099. [Google Scholar] [CrossRef]
  37. Boulakbech, M.; Messai, N.; Sam, Y.; Devogele, T. Deep learning model for personalized web service recommendations using attention mechanism. In Proceedings of the International Conference on Service-Oriented Computing, Rome, Italy, 28 November–1 December 2023; Springer Nature: Cham, Switzerland, 2023; pp. 19–33. [Google Scholar]
  38. Xue, L.; Zhang, F. Lcpcwsc: A web service classification approach based on label confusion and priori correction. Int. J. Web Inf. Syst. 2024, 20, 213–228. [Google Scholar] [CrossRef]
  39. Huang, Y.; Cao, Z.; Chen, S.; Zhang, X.; Wang, P.; Cao, Q. Interpretable web service recommendation based on disentangled representation learning. J. Intell. Fuzzy Syst. 2023, 45, 133–145. [Google Scholar] [CrossRef]
  40. Wang, X.; Zhou, P.; Wang, Y.; Liu, X.; Liu, J.; Wu, H. Servicebert: A pre-trained model for web service tagging and recommendation. In Proceedings of the International Conference on Service-Oriented Computing, Online, 22–25 November 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 464–478. [Google Scholar]
  41. Yang, Y.; Qamar, N.; Liu, P.; Grolinger, K.; Wang, W.; Li, Z.; Liao, Z. Servenet: A deep neural network for web services classification. In Proceedings of the 2020 IEEE International Conference on Web Services (ICWS), Beijing, China, 19–23 October 2020; pp. 168–175. [Google Scholar]
  42. Wang, Z.; Zhang, X.; Li, Z.S.; Yan, M. QoSBERT: An Uncertainty-Aware Approach Based on Pre-trained Language Models for Service Quality Prediction. IEEE Trans. Serv. Comput. 2025, 1–13. [Google Scholar] [CrossRef]
  43. Liu, P.; Zhang, L.; Gulla, J.A. Pre-train, prompt, and recommendation: A comprehensive survey of language modeling paradigm adaptations in recommender systems. Trans. Assoc. Comput. Linguist. 2023, 11, 1553–1571. [Google Scholar] [CrossRef]
  44. Van, M.M.; Tran, T.T. BeLightRec: A Lightweight Recommender System Enhanced with BERT. In Proceedings of the International Conference on Intelligent Systems and Data Science, Nha Trang, Vietnam, 9–10 November 2024; Springer Nature: Singapore, 2024; pp. 30–43. [Google Scholar]
  45. Kharidia, V.; Paprunia, D.; Kanikar, P. LightFusionRec: Lightweight Transformers-Based Cross-Domain Recommendation Model. In Proceedings of the 2024 First International Conference on Software, Systems and Information Technology (SSITCON), Tumkur, India, 18–19 October 2024; IEEE: New York, NY, USA, 2024; pp. 1–7. [Google Scholar]
  46. Liu, Q.; Zhao, X.; Wang, Y.; Wang, Y.; Zhang, Z.; Sun, Y.; Li, X.; Wang, M.; Jia, P.; Chen, C.; et al. Large language model enhanced recommender systems: Taxonomy, trend, application and future. arXiv 2024, arXiv:2412.13432. [Google Scholar] [CrossRef]
  47. Singh, S. BERT Algorithm used in Google Search. Math. Stat. Eng. Appl. 2021, 70, 1641–1650. [Google Scholar] [CrossRef]
  48. Sun, F.; Liu, J.; Wu, J.; Pei, C.; Lin, X.; Ou, W.; Jiang, P. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1441–1450. [Google Scholar]
  49. Fine-Tune and Host Hugging Face Bert Models on Amazon Sagemaker|AWS Machine Learning Blog. Available online: https://aws.amazon.com/blogs/machine-learning/fine-tune-and-host-hugging-face-bert-models-on-amazon-sagemaker/ (accessed on 23 October 2025).
Figure 1. Conceptual framework of using a BERT-based semantic approach.
Figure 2. Web service process.
Figure 3. Key steps of web service selection and recommendation.
Figure 4. PRISMA flowchart on article selection.
Figure 5. Graphical meta-summary of BERT-based QoS research (2020–2024).
Figure 6. Quantitative summary of model performance for QoS prediction.
Figure 7. Typical preprocessing and modeling pipeline for using structured service descriptions in BERT-based QoS prediction.
Table 1. Advantages of BERT in NLP tasks.
Advantage | Description
Contextual Understanding | Bidirectional processing allows the model to develop a contextual understanding of words; for example, the word “bank” can take on different meanings depending on the context of its usage [7].
Transfer Learning | Because it is pre-trained on large amounts of text, the model can be fine-tuned for specific tasks with small or limited datasets, improving performance across applications without training from scratch [8].
Performance on Benchmarks | BERT consistently outperformed older models on a wide range of NLP benchmarks such as SQuAD (Stanford Question Answering Dataset), demonstrating the model’s robustness and adaptability.
Table 2. Research Questions and Motivations.
Research Question (RQ) | Motivation
(RQ1) How has BERT been applied to web service selection and recommendation in the context of QoS? | To provide insight into how BERT is utilized for web service selection and recommendation in the context of QoS; important for evaluating the effectiveness of BERT models.
(RQ2) What are the advantages and limitations of using BERT models for web service selection and recommendation? | To critically assess the advantages and limitations of applying BERT, especially in the context of QoS.
(RQ3) How does BERT compare to traditional methods? | To compare BERT with traditional methods such as collaborative filtering, content-based, and rule-based systems; helps decide how and when to apply BERT in service selection and recommendation.
(RQ4) What are the challenges faced when using BERT models for tasks such as web service selection, recommendation, and QoS prediction? | To explore domain-specific issues.
(RQ5) What are the future directions of using BERT in the context of web service selection and recommendation as well as QoS prediction? | To identify gaps and opportunities in using BERT’s capabilities.
Table 3. Search String for IEEE.
Digital Library | Search String
IEEE | ((“Full Text & Metadata”:“BERT” OR “Full Text & Metadata”:“Bidirectional Encoder Representations”)) AND ((“Full Text & Metadata”:“web service recommendation” OR “Full Text & Metadata”:“Service selection”)) AND ((“Full Text & Metadata”:“QoS” OR “Quality of Service”)) AND ((“Full Text & Metadata”:“prediction” OR “optimization”))
ACM | [[All: “bert”] OR [All: “bidirectional encoder representations”]] AND [[All: “web service recommendation”] OR [All: “service selection”]] AND [[All: “qos”] OR [All: “quality of service”]] AND [[All: “prediction”] OR [All: “optimization”]]
ScienceDirect | (“BERT” OR “Bidirectional Encoder Representations”) AND (“web service recommendation” OR “Service selection”) AND (“QoS” OR “Quality of Service”)
Google Scholar | web service recommendation and qos prediction using bert
Table 4. Selection criteria for the studies.
Inclusion:
  • Papers published from 2020 to 2024
  • Papers available in full text
  • Papers written in the English language
  • BERT methods clearly defined in the paper
Exclusion:
  • Papers published in journals other than IEEE
  • BERT methods not employed within the context of web service selection
  • Papers not in the English language
  • Books, standards, magazines, and early-access articles
Table 5. Data point and its relevant RQ.
Data Point | Description | Relevant RQ
BERT methods or techniques | Methods and techniques used in the article | RQ1
Advantages | Advantages identified in the articles | RQ2
Techniques for web service selection and recommendation | Techniques or methods used in traditional service selection and recommendation | RQ3
Challenges or limitations | Challenges and limitations of BERT | RQ4
Future directions | Future research ideas or techniques that can be considered for BERT | RQ5
Table 6. Literature review matrix—BERT applications.
Study | Year | Model Type | Findings
Kumar, S. et al. [13] | 2024 | Transformer, Deep Learning, Graph Neural Networks | Integrates graph convolution and collaborative filtering for temporal QoS prediction; addresses data sparsity and temporal dependencies
Liu, H. et al. [14] | 2024 | Large Language Models (LLMs), NLP Transformer Models | Introduces the llmQoS model using LLMs (RoBERTa, Phi3mini) for QoS prediction; overcomes data sparsity without relying only on historical interactions
Atzeni et al. [15] | 2022 | Machine Learning | Reviews the integration of Wi-Fi with machine learning and topic modeling
Xu, Z. et al. [16] | 2024 | Deep Learning | Uses pre-trained BERT for semantic understanding of API descriptions
Li et al. [17] | 2022 | Deep Learning | Presents a deep learning model for personalized QoE/QoS correlation in composite services
Long et al. [18] | 2025 | Large Language Models (LLMs) | Surveys the use of LLMs for intelligent network operations
Koudouridis et al. [19] | 2024 | Knowledge-Based Approaches | Surveys knowledge-based approaches in telecommunications
Alsayed, A.S. et al. [20] | 2024 | Large Language Models (LLMs), Deep Learning | Provides context-aware recommendations for microservice discovery
Liu et al. [21] | 2022 | Context-Aware, QoS Prediction | Proposes a context-aware and QoS prediction-based method for microservice instance discovery
Meghazi, H.M. et al. [22] | 2024 | Natural Language Processing (NLP), Deep Learning | Proposes DeepLAB-WSC using word embeddings (Word2Vec, GloVe, BERT), which outperforms state-of-the-art web service classification methods
Zeng et al. [23] | 2021 | Lightweight BERT | Proposes a lightweight BERT-based method for dynamic service recommendation in edge computing
Zhang, P. et al. [24] | 2024 | Deep Learning, Neural Networks | Deep learning model with feature mapping for QoS prediction; addresses challenges in service quality prediction
Alam et al. [25] | 2024 | Fine-tuned BERT | Evaluates fine-tuned BERT models for recommending Web APIs, focusing on semantic enrichment
Karapantelakis et al. [26] | 2024 | Generative AI (Survey) | Surveys the application of generative AI in mobile networks
Bhanage et al. [27] | 2021 | Machine Learning, Deep Learning (Review) | Reviews ML/DL techniques for anomaly detection and failure handling in IT infrastructure
Qu et al. [28] | 2025 | Large Language Models (LLMs) | Surveys the use of LLMs in mobile edge intelligence to enhance performance
Hameed, A. et al. [29] | 2024 | Deep Learning | Combines federated learning with a sparse transformer architecture; preserves privacy while enabling collaborative QoS prediction
Huang, W.J. et al. [30] | 2022 | Deep Neural Networks, Cloud Service Modeling | Presents a deep learning-based QoS prediction model for cloud services; addresses QoS prediction challenges
Le et al. [31] | 2022 | Foundation Models | Explores the potential of foundation models for network traffic analysis and management
Jeong, C. [32] | 2019 | Graph Convolutional Networks (GCN) | Proposes a method that combines BERT with Graph Convolutional Networks (GCN)
Liu, M. et al. [33] | 2023 | Deep Learning | Proposes a method for QoS prediction using Graph Neural Networks
Lian, H. et al. [34] | 2023 | Deep Learning | Proposes the PMT (Personalized Multi-Task) learning framework, a multi-task approach for improved prediction accuracy
Jirsik, T. et al. [35] | 2019 | Deep Learning | Proposes Long Short-Term Memory (LSTM) neural networks for QoS attribute forecasting
Guo, C. et al. [36] | 2022 | Deep Learning | Proposes a service distance-based attention mechanism that embeds users in the model for QoS-aware selection
Boulakbech, M. et al. [37] | 2023 | Deep Learning, Multi-modal Learning | Proposes two attention mechanisms, functional (tag-based) and non-functional (QoS-based), improving recommendation quality through dual attention
Table 7. Literature review matrix—dataset information.
Study | Dataset | Size (Services) | Domain | QoS Attributes | Availability
Kumar et al. [13] (TPMCF) | WSDREAM-2 | Temporal data | Temporal QoS Prediction | Temporal QoS metrics | Research dataset
Liu et al. [14] (llmQoS) | WS-DREAM | 5825 | Web Services QoS | Throughput, Response Time | Publicly available
Atzeni et al. [15] | Wi-Fi datasets | Not available | Wi-Fi Networks | Wi-Fi performance | Various
Xu et al. [16] (WARBERT) | Web API Collection | Large-scale APIs | API Recommendation | Not available | Not specified
Li et al. [17] | Composite Services | Composite services | Composite Services | QoE, QoS correlation | Not specified
Long et al. [18] | Survey (various) | Not available (survey) | Network Operations | Not available | Not available
Koudouridis et al. [19] | Survey (various) | Not available (survey) | Telecommunications | Not available | Not available
Alsayed et al. [20] (MicroRec) | Stack Overflow + API Corpus | API docs | Microservices | Not available | Stack Overflow (public)
Liu et al. [21] | Microservice Instances | Microservices | Microservices | QoS + Context | Not specified
Meghazi et al. [22] (DeepLAB-WSC) | ProgrammableWeb | 8400+ services | Web Service Classification | Not available | Publicly available
Zeng et al. [23] (Lightweight BERT) | Edge Services | Edge services | Edge Computing | Real-time constraints | Not specified
Zhang et al. [24] | Cloud Services | Cloud services | Cloud Computing | Service quality metrics | Not specified
Alam et al. [25] | Web API Repository | Web APIs | API Discovery | Not available | Not specified
Karapantelakis et al. [26] | Survey (various) | Not available (survey) | Mobile Networks | Not available | Not available
Bhanage et al. [27] | Review (various) | Not available (review) | IT Infrastructure | Anomalies | Not available
Qu et al. [28] | Survey (various) | Not available (survey) | Mobile Edge Intelligence | Not available | Not available
Hameed et al. [29] (FeD-TST) | IoT Networks/B5G | IoT services | IoT/B5G Networks | Network QoS | Not specified
Huang et al. [30] | Cloud Services | Cloud services | Cloud Services | Cloud QoS | Not specified
Le et al. [31] | Network Traffic | Network logs | Network Traffic | Not available | Not specified
Jeong [32] (BERT+GCN) | FullTextPeerRead | Citation data | Citation Recommendation | Not available | Publicly available
Liu et al. [33] (QoSGNN) | WSDream | Not specified | QoS Prediction | Various QoS | Publicly available
Lian et al. [34] (PMT) | QoS Dataset | Not specified | QoS Prediction | Multi-attribute QoS | Not specified
Jirsik et al. [35] | Network QoS Metrics | Network data | Network QoS | Network QoS | Not specified
Guo et al. [36] | Service Selection | Services | Service Selection | QoS attributes | Not specified
Boulakbech et al. [37] | Service Recommendation | Services | Service Recommendation | Not available | Not specified
Table 8. Literature review matrix—performance metrics.
Study | Task | RMSE | MAE | Precision | NDCG | Accuracy | Improvement
Kumar [13] (TPMCF) | Temporal QoS | Improved | - | - | - | - | Significant
Liu [14] (llmQoS, T, 20%) | QoS Throughput | 0.101 (±0.004) | 0.095 (±0.003) | - | - | - | 10.2% RMSE ↓
Liu [14] (llmQoS, T, 5%) | QoS Throughput | 0.083 (±0.003) | 0.079 (±0.002) | - | - | - | 7.2% RMSE ↓
Liu [14] (llmQoS, RT, 20%) | QoS Response Time | 0.077 (±0.005) | 0.073 (±0.004) | - | - | - | 21.4% RMSE ↓
Liu [14] (llmQoS, RT, 5%) | QoS Response Time | 0.066 (±0.003) | 0.064 (±0.002) | - | - | - | 2.1% RMSE ↓
Xu [16] (WARBERT) | API Recommendation | - | - | Better | Improved | 0.83 | Better precision@k
Alsayed [20] (MicroRec) | Microservices | - | - | Improved | - | Better | Superior context
Meghazi [22] (DeepLAB) | Classification | - | - | - | - | ~0.85 | ~15–20% accuracy
Zeng [23] (Lightweight) | Edge Recommendation | - | - | - | - | Similar | 40% faster
Alam [25] (BERT Variants) | API Recommendation | - | - | - | 0.87 (RoBERTa) | 0.87 | RoBERTa best
Hameed [29] (FeD-TST) | QoS (IoT) | - | - | - | - | - | Privacy preserving
Jeong [32] (BERT+GCN) | Citations | - | - | - | - | MAP: 0.84 | +28% MAP
Liu [33] (QoSGNN) | QoS Prediction | Improved | - | - | - | Better | Superior GNN
Lian [34] (PMT) | Personalized QoS | Improved | - | - | - | Better | Multi-task
Jirsik [35] (LSTM) | QoS Forecasting | Better | - | - | - | Better | Better granularity
Guo [36] (QoS-Aware) | Service Selection | - | - | Improved | - | - | Enhanced diversity
Boulakbech [37] (Dual) | Personalized Rec | - | - | ~0.78 | - | - | Better personalization
Table 9. Literature review matrix—research gap matrix.
QoS Attribute | Papers Addressing | Coverage Level | Gap Severity | Future Directions
Response Time | Liu, Kumar, Hameed, Jirsik | Well Studied (16%) | Low | Multi-modal temporal prediction
Throughput | Liu, Kumar | Well Studied (8%) | Low | Cross-domain generalization
Latency | Hameed, Zeng | Moderately Studied (8%) | Medium | Real-time edge optimization
Accuracy (Classification) | Multiple (implicit) | Moderately Studied (20%) | Medium | Explainable predictions
Availability | General mentions, no specific studies | Under Studied (0%) | High | BERT for uptime prediction
Reliability | Limited coverage | Under Studied (0%) | High | Sentiment analysis of reviews
Scalability | Implicit in some studies | Under Studied (0%) | High | BERT for auto-scaling
Security | No dedicated studies | Critically Under Studied (0%) | Critical | Security policy understanding
Cost | No dedicated studies | Critically Under Studied (0%) | Critical | Cost–benefit analysis
Usability | No dedicated studies | Critically Under Studied (0%) | Critical | UX improvement suggestions
Table 10. Other studies of BERT applications.
Study (Year) | Model/Approach | Data and Task | Key Results
Liu et al., 2024 [14] | llmQoS: RoBERTa + CF for QoS prediction | WS-DREAM (user × service QoS matrix) | RMSE reduced by ~7–10% (throughput) and 2–21% (response time) vs. CF baselines; consistent MAE/RMSE gains at all sparsity levels
Huang et al., 2023 [39] | WSR-DRL: BERT (service name) + 2D-CNN + BiLSTM (description) + disentangled interactions | Real user-service rating data (service recommendation) | Outperforms DMF, DeepFM, DKN, GCMC, etc., on Precision@10, Recall@10, NDCG@10; BERT name encoding plus CNN/LSTM provides richer features
Wang et al., 2021 [40] | ServiceBERT: BERT pre-trained (MLM + RTD + contrastive) for service text | ProgrammableWeb APIs/mashup tasks | Higher accuracy on API tagging and mashup recommendation vs. prior methods (reported “better performance” on two service tasks)
Yang et al., 2020 [41] | ServeNet-BERT: BERT embeddings of service name and description + NN | Service description classification (OpenAPI) | Achieved much higher classification accuracy than 10 ML baselines (e.g., LDA+SVM, LSTM)
Table 11. Other studies of BERT applications (Extension of Table 10).
Table 11. Other studies of BERT applications (Extension of Table 10).
StudyBase ModelInput RepresentationFusion MethodOutput LayerTraining StrategyDataset
Liu et al., 2024 [14]
llmQoS
RoBERTa-base
(Pre-trained LLM)
- User descriptive attributes (text)
- Service descriptive attributes (text)
- Historical QoS values (numerical)
- LLM embeddings (768-dim)
- Concatenated with user-service ID embeddings
- Fed into collaborative filtering network
- Fully connected layers
- Regression output for QoS prediction
- Separate outputs for throughput and response time
- Two-stage: (1) Pre-trained RoBERTa frozen (2) Fine-tune CF network
- Adam optimizer
- MSE loss function
WS-DREAM
- 339 users—5825 services
- Throughput and response time metrics
- Multiple sparsity levels (5%, 10%, 15%, 20%)
Huang et al., 2023 [39]
WSR-DRL
BERT-base
(Service name encoder)
+ CNN-BiLSTM
(Description encoder)
- Service name—BERT tokenization
- Service description -Word embeddings
- User-service interaction matrix
- Disentangled latent factors
- BERT embeddings for service names
- 2D-CNN extracts local features from descriptions
- BiLSTM captures sequential context
- Concatenate name + description embeddings
- Disentangled user-service interactions
- Disentangled representation layer
- Rating prediction layer
- Softmax for ranking
- End-to-end training with disentanglement loss
- Separate user and service latent factors
- Regularization for interpretability
- Cross-entropy + MSE loss
- Real-world user-service ratings
- Service descriptions from repositories
- User interaction history
- Multi-domain services
Wang et al., 2021 [40]
ServiceBERT
BERT-base
(Custom pre-trained on service corpus)
- Service textual descriptions
- API documentation
- Service tags and categories
- Mashup descriptions
- [CLS] token for classification
- Multi-task pre-training:
- Masked Language Modeling (MLM)
- Replaced Token Detection (RTD)
- Contrastive learning for service pairs
- Domain-specific vocabulary
- Multi-task heads:
- Service tagging (classification)
- Mashup recommendation (ranking)
- Service similarity (contrastive)
- Three-stage pre-training:
  (1) MLM on service corpus
  (2) RTD for token detection
  (3) Contrastive learning
- Fine-tuning for downstream tasks
- AdamW optimizer
- ProgrammableWeb
- 16,000+ APIs
- 6000+ mashups
- Service descriptions and tags
- API-mashup relationships
Yang et al., 2020 [41]
ServeNet-BERT
BERT-base
(Sentence embeddings)
- Service name
- Service description text
- OpenAPI specifications
- Concatenated text features
- BERT generates sentence embeddings
- Embeddings fed to feedforward neural network
- Pooling of token embeddings ([CLS] + mean pooling)
- Dense layers for feature transformation
- Multi-layer perceptron (MLP)
- Softmax layer for classification
- Output: Service category labels
- Transfer learning from pre-trained BERT
- Fine-tuning on service classification
- Cross-entropy loss
- Dropout for regularization (0.1–0.3)
- Early stopping
- OpenAPI dataset
- 2000+ web services
- 10 service categories
- Service descriptions and specifications
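The ServeNet-BERT head described above (pooling [CLS] together with mean-pooled token embeddings, then an MLP with softmax over service categories) can be illustrated with a small NumPy sketch. The token count, hidden width, and random weights are hypothetical stand-ins for the fine-tuned model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical BERT token embeddings for one service description:
# 12 tokens with hidden size 48 (real BERT-base uses 768).
tokens = rng.normal(size=(12, 48))

# Pooling as listed above: [CLS] embedding concatenated with mean pooling.
cls_vec = tokens[0]                    # [CLS] is the first token
mean_vec = tokens.mean(axis=0)
pooled = np.concatenate([cls_vec, mean_vec])   # (96,)

# MLP head: dense layer + ReLU, then softmax over 10 service categories.
W1 = rng.normal(size=(96, 32)) / np.sqrt(96)
W2 = rng.normal(size=(32, 10)) / np.sqrt(32)
hidden = np.maximum(0.0, pooled @ W1)
logits = hidden @ W2
probs = np.exp(logits - logits.max())
probs /= probs.sum()

predicted_category = int(np.argmax(probs))     # index into the 10 categories
```

In training, cross-entropy against the true category label, dropout (0.1–0.3), and early stopping would be applied on top of this head, as the table notes.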
Table 12. Summary of service specification types, extracted fields, text length, and encoder configurations used across BERT-based web service selection and QoS prediction models.
Study | Service Spec Type | Fields Used from WSDL/OpenAPI | Typical Text Length (tokens) | Encoder Type
Liu et al., 2024 [14] | WSDL | Operation name, documentation, portType | 30–80 | BERT-base (CLS pooling)
Huang et al., 2023 [39] | OpenAPI | Path, operation summary, description, tags | 40–120 | Sentence-BERT (mean pooling)
Wang et al., 2021 [40] | Swagger/OpenAPI | API name, description, parameter names | 50–150 | RoBERTa-base (CLS pooling)
Yang et al., 2020 [41] | WSDL + free text | Operation name, documentation + manual notes | 60–90 | BERT-base (CLS pooling)
Wang, Z. et al., 2025 [42] | Mixed APIs | Service name, description, category | 40–100 | Domain-tuned BERT/SBERT
Table 13. Advantages and limitations of BERT.
Aspect | Advantages | Limitations
Text Understanding | Captures deep semantic/contextual meaning from service descriptions, user reviews, and API docs. | Requires large, domain-specific corpora for effective fine-tuning.
User Preference Modeling | Learns nuanced user preferences from reviews and mashup queries, outperforming keyword-based models. | Transformer models are computationally intensive during training and inference.
QoS Prediction Accuracy | Enriches feature vectors with descriptive text, reducing RMSE/MAE in service quality prediction (e.g., throughput, response time). | Gains may be marginal for some metrics or datasets (e.g., response time).
Service Recommendation | Improves top-k ranking metrics (Precision@10, Recall, NDCG) via BERT embeddings of service names/descriptions. | BERT alone cannot model numerical QoS; it needs hybridization with collaborative filtering or neural models.
Cold-Start Problem Handling | Alleviates matrix sparsity using textual side information; improves predictions when interaction data are sparse. | Still limited if no descriptive data are available for new services/users.
Model Interpretability | Some models improve interpretability with disentangled latent spaces or hybrid attention mechanisms. | Most BERT-based models remain black boxes with limited explainability.
Generalization | BERT transfers to various tasks (e.g., classification, tagging, recommendation) with minimal modification. | Pre-trained BERT may underperform if not adapted to domain-specific terminology.
Integration Potential | Easily combined with CNNs, LSTMs, or CF layers to form flexible architectures for QoS-aware recommendation. | Increases design complexity and hyperparameter tuning overhead.
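Table 13 notes that BERT alone cannot model numerical QoS and must be hybridized with collaborative filtering. A common pattern in the reviewed studies is to combine CF latent factors with a text-derived term; the NumPy sketch below illustrates that idea with hypothetical dimensions and random stand-ins for the learned factors and weights, not any specific paper's model.

```python
import numpy as np

rng = np.random.default_rng(3)

N_USERS, N_SERVICES, LATENT, TEXT_DIM = 20, 15, 8, 16

# Collaborative side: latent factors such as matrix factorization would
# learn from a sparse user-service QoS matrix.
U = rng.normal(size=(N_USERS, LATENT))
S = rng.normal(size=(N_SERVICES, LATENT))

# Semantic side: hypothetical BERT embeddings of each service's
# description, reduced to TEXT_DIM for illustration.
text = rng.normal(size=(N_SERVICES, TEXT_DIM))
w_text = rng.normal(size=TEXT_DIM) * 0.1   # learned projection weights

def predict_qos(user, service, bias=0.5):
    """Hybrid QoS estimate: CF dot product plus a text-derived correction."""
    cf_score = U[user] @ S[service]        # collaborative signal
    text_score = text[service] @ w_text    # semantic side information
    return cf_score + text_score + bias

pred = predict_qos(3, 7)   # scalar QoS estimate (e.g., response time)
```

The text term is what lets such hybrids produce a prediction for a cold-start service with no interaction history: `S[service]` is unknown, but `text[service]` is available from the description alone.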
Table 14. Practical comparison between BERT-family encoders and GPT-style LLMs for QoS-aware service recommendation.
Aspect | BERT-Family Encoders (BERT, RoBERTa) | GPT-Style LLMs (GPT-4/5, Llama-2/3, etc.)
Typical parameter scale | ~110–400 M parameters (base to large); fine-tunable on a single GPU | Billions to >100 B parameters; often multi-GPU or hosted
Typical inference latency | Low: milliseconds to low tens of milliseconds per sequence on commodity GPUs/CPUs | Higher: tens to hundreds of milliseconds per call, often with API overhead
Deployment model | Frequently deployed on-premise or in a private cloud; easy to containerize and co-locate with QoS engines | Commonly accessed via external cloud APIs; full on-prem deployment is costly and complex
Primary role | Encoder: produces fixed-length embeddings for services/users; integrates with CF/MLP QoS predictors | Generator/reasoning agent: produces text, explanations, or decisions from prompts
Table 15. Summary of direction in academic settings and industries.
Key Direction | Academic Focus | Industry/Application Trends
Domain/Transfer Learning [45] | Adapting BERT to new domains by fine-tuning only parts of the model or adding adapter layers; multi-task contrastive pre-training for cross-domain recommendation. | Cross-domain recommenders using lightweight models (for example, DistilBERT with simple fusion); industry emphasis on reusable models across product lines.
Multi-Modal Fusion | Combining text with images/audio, for example, using CLIP for vision and BERT for text and fusing them in a joint model. | Multimedia recommendation engines that ingest captions, reviews, and images; multi-modal BERT variants in practice.
Interpretability and Hybrid Models | Hybrid models combining CF/GCN and BERT semantic signals (for example, BeLightRec) and disentangled representations (for example, WSR-DRL); LLMs generating natural-language explanations. | Use of review snippets or keywords to explain recommendations; commercial explainable AI platforms (for example, Watson, X.ai).
Knowledge and Context [46,47] | Enhancing recommendation systems with LLM-generated summaries; using knowledge graphs for improved semantic matching. | Use of knowledge graph APIs (for example, Neo4j) combined with BERT features for recommendation; search engines (Google, Bing) already fuse BERT with knowledge graphs for richer results.
Prompt-Based Learning | Using pre-trained BERT with minimal tuning via prompts (for example, cloze tasks); soft prompt tuning for personalization. | On-demand recommendation via LLM APIs; prompt-based language modeling replacing full retraining.
Model Efficiency | Distillation, quantization, and pruning of BERT for low-latency applications. | Use of DistilBERT and MobileBERT in production; ONNX/TensorRT deployment; edge recommendation systems.
Tools and Platforms [48,49] | Libraries such as RecBole and Hugging Face Transformers; benchmarking transformer-based recommendation. | One-click deployment with AWS SageMaker + Hugging Face; BigQuery ML and TensorFlow Hub integrations.
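The Model Efficiency row above cites quantization as a route to low-latency deployment. The NumPy sketch below shows the core arithmetic of symmetric post-training int8 quantization on a single weight matrix; the matrix itself and its dimensions are hypothetical, and real deployments would use toolchain support (ONNX Runtime, TensorRT) rather than hand-rolled code.

```python
import numpy as np

rng = np.random.default_rng(4)

# A hypothetical float32 weight matrix from a fine-tuned encoder layer.
W = rng.normal(scale=0.05, size=(256, 256)).astype(np.float32)

# Symmetric per-tensor int8 quantization:
# map [-max|W|, +max|W|] onto the integer range [-127, 127].
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dq = W_q.astype(np.float32) * scale   # dequantize for comparison

mem_saving = W.nbytes / W_q.nbytes      # 4x: float32 -> int8
max_err = np.abs(W - W_dq).max()        # bounded by scale / 2 (rounding)

print(mem_saving)  # 4.0
```

The 4x memory reduction (and the corresponding integer-arithmetic speedups on supporting hardware) is what makes distilled-plus-quantized encoders practical for the edge recommendation systems mentioned in the table.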

Share and Cite

MDPI and ACS Style

Mahanra Rao, V.; Ramasamy, R.K.; Sayeed, M.S. BERT-Based Approaches for Web Service Selection and Recommendation: A Systematic Review with a Focus on QoS Prediction. Future Internet 2025, 17, 543. https://doi.org/10.3390/fi17120543
