Future Internet
  • Systematic Review
  • Open Access

27 November 2025

BERT-Based Approaches for Web Service Selection and Recommendation: A Systematic Review with a Focus on QoS Prediction

1 Faculty of Information Science & Technology, Multimedia University, Melaka 75450, Malaysia
2 Faculty of Computing and Informatics, Multimedia University, Cyberjaya 63000, Malaysia
* Author to whom correspondence should be addressed.

Abstract

Effective web service selection and recommendation are critical for ensuring high-quality performance in distributed and service-oriented systems. Recent research has increasingly explored the use of BERT (Bidirectional Encoder Representations from Transformers) to enhance semantic understanding of service descriptions, user requirements, and Quality of Service (QoS) prediction. This systematic review examines the application of BERT-based models in QoS-aware web service selection and recommendation. A structured database search was conducted across IEEE, ACM, ScienceDirect, and Google Scholar covering studies published between 2020 and 2024, resulting in twenty-five eligible articles based on predefined inclusion criteria and PRISMA screening. The review shows that BERT improves semantic representation and mitigates cold-start and sparsity issues, contributing to better service ranking and QoS prediction accuracy. However, challenges persist, including limited availability of benchmark datasets, high computational overhead, and limited interpretability of model decisions. The review identifies five key research gaps and outlines future directions, including domain-specific pre-training, hybrid semantic–numerical models, multi-modal QoS reasoning, and lightweight transformer architectures for deployment in dynamic and resource-constrained environments. These findings highlight the potential of BERT to support more intelligent, adaptive, and scalable web service management.

1. Introduction

Web services have significantly contributed to the ever-growing landscape of the Internet and the digital economy. Designed for online interoperability, they enable dynamic, cross-platform interactions that enhance both functionality and accessibility. The ability to select and recommend web services based on Quality of Service (QoS) attributes is essential to ensure that user requirements are met effectively, with optimal response times, reliability, and overall service quality. QoS encompasses a variety of attributes that indicate how well a service meets the user’s expectations and requirements. Key QoS attributes include reliability, response time, availability, throughput, and security.

1.1. Importance of QoS-Based Selection

The main goal of web service selection is to improve user satisfaction. A QoS-based selection method helps users choose services that satisfy or align with their needs, thereby enhancing the overall experience, because web services can differ significantly in their reliability and performance []. Effective web service selection and recommendation also enhance the operational efficiency of software systems: by selecting services with optimal QoS metrics, applications can minimize latency, reduce processing times, and improve throughput []. Additionally, the ability to dynamically select and adapt web services based on QoS data is essential in rapidly changing environments. This adaptability ensures that applications can respond to fluctuations in service availability or performance [].

1.2. Challenges in Web Service Selection and Recommendation

Web service selection and recommendation based on QoS attributes are vital for the effectiveness of software systems. However, several challenges arise when implementing such techniques, especially QoS-based ones. Determining which attributes are most relevant for a specific application can be difficult given the variety of metrics available to evaluate service quality, and the lack of standardization across services makes it hard to compare these metrics consistently and effectively. QoS data must be of high quality, available in real time, and consistent to support accurate service selection; network changes, system load, or outages can compromise the availability of reliable QoS data and lead to suboptimal service choices []. As the number of available web services continues to grow, efficiently managing and analyzing large-scale datasets while maintaining high performance has become increasingly challenging []. Additionally, it is important to understand the correlations between various QoS attributes to ensure effective service selection; the dynamic nature of these attributes complicates their interdependencies, which may lead to challenges in accurately predicting service performance [].

1.3. Why BERT for Web Service Selection?

The BERT (Bidirectional Encoder Representations from Transformers) model, developed by Google in 2018, has significantly advanced the field of Natural Language Processing (NLP). It has improved performance on a wide range of NLP tasks by leveraging the capabilities of the transformer architecture. The model’s bidirectional training allows it to understand the context of a word based on the surrounding words to both its left and right, rather than only the preceding or following text. This leads to a more profound understanding of language nuances, meanings, and relationships. Some of the advantages of BERT in NLP tasks are described in Table 1.
Table 1. Advantages of BERT in NLP tasks.
BERT enables more accurate semantic alignment between user requirements and service capabilities by comprehensively analyzing natural language service descriptions and queries, thereby improving service retrieval efficiency. For web service recommendation, BERT can be leveraged to analyze user preferences and historical interactions with web services through enhanced recommendation systems, providing more personalized service suggestions that closely align with user needs []. BERT can also improve the effectiveness of QoS prediction by enabling models to analyze service performance descriptions derived from operational data and user reviews. By understanding QoS-related signals such as reliability, response time, and user sentiment, BERT can inform more accurate forecasting models.
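As a simple illustration of this semantic alignment (a minimal sketch rather than any reviewed system’s implementation, assuming the Hugging Face transformers library and the generic bert-base-uncased checkpoint), the following Python snippet embeds a user requirement and candidate service descriptions with BERT and ranks services by cosine similarity:

```python
# Minimal sketch: rank candidate services against a user requirement using
# mean-pooled BERT embeddings and cosine similarity. Model choice, pooling,
# and the example texts are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(texts):
    # Mean-pool the last hidden states into one vector per text.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (batch, seq, 768)
    mask = batch["attention_mask"].unsqueeze(-1)          # (batch, seq, 1)
    return (hidden * mask).sum(1) / mask.sum(1)           # (batch, 768)

query = "weather forecast API with low response time"
services = [
    "Provides 7-day weather forecasts with 99.9% availability.",
    "Currency conversion service with daily exchange rates.",
]
q_vec, s_vecs = embed([query]), embed(services)
scores = torch.nn.functional.cosine_similarity(q_vec, s_vecs)
ranked = sorted(zip(services, scores.tolist()), key=lambda x: -x[1])
print(ranked)
```

In practice, the reviewed approaches replace the generic checkpoint with domain-adapted or fine-tuned encoders and feed the resulting embeddings into downstream ranking or QoS prediction layers.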
There is a need to systematically classify the methodologies or techniques of applying BERT to QoS prediction to foster clearer understanding and further progress.
The conceptual framework shown in Figure 1 illustrates how the proposed BERT-based model supports QoS-aware web service recommendations. The process starts with collecting various types of web service data, such as descriptions, execution logs, and user reviews. These inputs are then processed through the BERT Semantic Layer, where the model performs embedding and domain-specific fine-tuning to understand the contextual meaning of the information. The resulting representations are used in the following three key components: the QoS Prediction Model, which estimates the quality of service parameters; the User Preference Modeling Layer, which captures and interprets user needs and behaviors; and the Service Retrieval and Ranking Layer, which identifies and orders the most relevant services. Together, these layers produce a QoS-aware recommendation that balances objective service quality with individual user preferences, ensuring more accurate and personalized results.
Figure 1. Conceptual framework of using a BERT-based semantic approach.

1.4. Motivation and Objective

This paper aims to review and understand the different BERT-based methods associated with QoS-based web service selection and recommendation. The review covers peer-reviewed journals, conferences, and online databases such as IEEE and ACM, considering papers published between 2020 and 2024 and written in English. The main contributions of the paper are as follows:
  • Identifying gaps in the literature, such as how BERT can address unique challenges in web service selection, recommendation, and QoS prediction, and how it compares against traditional models.
  • Assessing the performance of BERT models by understanding their strengths and weaknesses, and when and how the model should be applied effectively to web services alongside QoS attributes, distinguishing when BERT provides significant gains and when it does not.
  • Identifying five high-impact research gaps, namely QoS attribute neglect, dataset reproducibility, model interpretability, scalability, and cost awareness, and translating them into a structured agenda for future research.
The paper is organized into the following sections: introduction, background, methodology, BERT applications in the context of web service selection and recommendation, challenges and limitations, comparative analysis, future directions for BERT, and conclusion.

2. Background

2.1. Web Service Selection and Recommendation

A web service is a means for different applications and systems to communicate with each other across the Internet and is an important part of modern software systems. Web services rely on standards and protocols that enable data exchange between systems.
Figure 2 illustrates the flow of a web service invocation. A user first interacts with a web service through a browser, which sends requests over the Internet. The web server processes these requests, possibly involving a database, and then returns a response to the user via the Internet.
Figure 2. Web service process.
Web service selection and recommendation involve several steps for assessing the attributes and characteristics that ensure optimal choices for a given task. The key steps are illustrated in Figure 3. The first phase is searching for available web services that meet the functional and non-functional requirements specified by users. In the second phase, once potential services are identified, a QoS evaluation is performed to assess each service against the defined QoS attributes, for example, throughput, reliability, availability, and response time; this helps to rank the services based on user expectations, requirements, and performance. Recommendation systems are then used to suggest services to users based on history, preferences, and QoS predictions, and the recommended services can be composed into a single unit that performs a specific task. Finally, continuous monitoring is crucial, as the QoS attributes must be reassessed over time to adapt to the dynamic nature of user needs and operating conditions. This contributes to maintaining user satisfaction over time.
Figure 3. Key steps of web service selection and recommendation.
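To make the QoS evaluation and ranking phase concrete, the following minimal sketch (illustrative only; the candidate attribute values and user weights are hypothetical) normalizes each QoS attribute and ranks candidate services by a user-weighted score, inverting cost-type attributes such as response time:

```python
# Minimal sketch of QoS-based ranking: min-max normalize each attribute
# (inverting cost-type attributes where lower is better) and combine with
# user-supplied weights. All values below are invented for illustration.
candidates = {
    "ServiceA": {"response_time": 120.0, "availability": 0.99, "throughput": 850.0},
    "ServiceB": {"response_time": 300.0, "availability": 0.97, "throughput": 1200.0},
    "ServiceC": {"response_time": 90.0,  "availability": 0.95, "throughput": 600.0},
}
weights = {"response_time": 0.5, "availability": 0.3, "throughput": 0.2}
cost_attributes = {"response_time"}  # lower is better

def normalize(attr, value):
    values = [c[attr] for c in candidates.values()]
    lo, hi = min(values), max(values)
    if hi == lo:
        return 1.0
    score = (value - lo) / (hi - lo)
    return 1.0 - score if attr in cost_attributes else score

def qos_score(service):
    return sum(weights[a] * normalize(a, v) for a, v in candidates[service].items())

ranking = sorted(candidates, key=qos_score, reverse=True)
print([(s, round(qos_score(s), 3)) for s in ranking])
```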

2.2. Existing Techniques for Service Selection and Recommendation

Traditionally, there are three types of recommender systems: collaborative filtering, content-based, and hybrid. Collaborative filtering uses the ratings and preferences of other users to recommend services to a particular user. Content-based systems organize items into item profiles based on their descriptions or features. Hybrid systems combine both collaborative and content-based techniques to provide more accurate recommendations.
Content-based filtering is good at recommending new or less popular items and is less affected by the cold-start problem, but it tends to produce repetitive recommendations with limited diversity. Collaborative filtering, on the other hand, effectively captures user preferences, handles various item types, and adapts to changing user behavior over time. However, it struggles with the cold-start issue and data sparsity in large datasets. Hybrid methods combine the strengths of content-based and collaborative filtering, delivering more accurate and diverse recommendations. However, they are often more complex and resource-intensive, and their performance heavily depends on the quality of the underlying techniques. Choosing the right approach depends on factors like data availability, system scalability, and the need for personalization [,,].

3. Systematic Review Methodology

3.1. Research Questions

Table 2 lists the research questions and the motivations that guided the preparation of this systematic review, providing a comprehensive and thorough basis for reviewing BERT’s application in web service selection and recommendation as well as QoS prediction.
Table 2. Research Questions and Motivations.

3.2. Search Strategy and Selection

The search string and the sources from which the papers are identified are the main elements of the search strategy. The search results were exported in Research Information Systems (RIS) format and imported into the Rayyan tool. Only studies published between 2020 and 2024, with full text and metadata available and written in English, were considered, to keep the review focused on recent work.
Table 3 presents the search strings used, shown for IEEE as an example.
Table 3. Search String for IEEE.
Table 4 presents the inclusion and exclusion criteria for the studies. These criteria are used to further select the relevant papers from the initial pool retrieved with the search string.
Table 4. Selection criteria for the studies.
Based on the defined research questions, the papers are evaluated and selected according to whether the objectives, research motivations and methods, and the relevant information on data, findings, and results are clearly defined and supported.

3.3. Data Extraction

Information and data are extracted according to the research questions defined in Table 2; Table 5 summarizes the extracted data points and their corresponding research questions.
Table 5. Data point and its relevant RQ.

3.4. Quality Assessment

The final articles are assessed based on criteria as follows:
(1)
Whether the objective is clearly defined;
(2)
Whether the methods or techniques are elaborated;
(3)
Whether the authors have provided their findings and results based on proper data analysis.
Figure 4 demonstrates the PRISMA flow on the article selections. The PRISMA flow diagram outlines the systematic review process, beginning with the identification of 163 records from four databases (ACM, IEEE, Science Direct, and Google Scholar), from which 15 records (12 duplicates and 3 others—due to articles not related to BERT) were removed before screening, leaving 148 records for title or abstract screening. During screening, 125 records were excluded, resulting in 27 reports sought for full-text retrieval, of which 2 were unavailable, leaving 25 reports for eligibility assessment. No further exclusions were made, and all 25 studies were ultimately included in the review, demonstrating a transparent and methodical approach to study selection in accordance with PRISMA guidelines.
Figure 4. PRISMA flowchart on article selection.
This review was conducted in accordance with the PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. The full PRISMA checklist and flow diagram are provided in the Supplementary Materials.

4. Synthesis of the Literature

The literature matrix in Table 6 is derived from the review of the final selected studies.
Table 6. Literature review matrix—BERT applications.
Table 7 summarizes the dataset information used in the papers selected for this systematic review.
Table 7. Literature review matrix—dataset information.
From the dataset information, only 16% (4/25) of the papers use publicly available datasets. The publicly available datasets are as follows:
(1)
WS-DREAM (Liu et al.)—5825 services, 339 users
(2)
ProgrammableWeb (Meghazi et al.)—8400+ services
(3)
Stack Overflow (Alsayed et al.)—API documentation corpus
(4)
FullTextPeerRead (Jeong)—Citation dataset
(5)
WSDream (Liu et al.)—QoS prediction dataset
The findings indicate that 84% of the studies are difficult to reproduce because their datasets are not publicly available. This limits the ability to compare methods across studies; hence, there is a need for standardized benchmark datasets in the field.
Table 8 summarizes the quantitative performance across studies; quantitative metrics are reported in 17 of the 25 papers (68%).
Table 8. Literature review matrix—performance metrics.
From the performance metrics, the critical gap identified is inconsistent metric reporting, which hampers meta-analysis. In the table, “T” represents throughput and “RT” represents response time. The reported metrics break down as follows:
(1)
RMSE (Root Mean Square Error): 47.1% of evaluations (8/17)
(2)
MAE (Mean Absolute Error): 23.5% of evaluations (4/17)
(3)
Precision: 23.5% of evaluations (4/17)
(4)
NDCG (Normalized Discounted Cumulative Gain): 11.8% of evaluations (2/17)
(5)
Accuracy: 52.9% of evaluations (9/17)
It is impossible to determine the “best” method definitively, as publications are biased towards positive results. As a recommendation, a standardized evaluation protocol could be adopted, for example, always including confidence intervals and statistical significance tests.
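A minimal example of such a protocol (illustrative only, using synthetic predictions) computes RMSE and MAE together with bootstrap 95% confidence intervals:

```python
# Sketch of a standardized QoS-prediction evaluation report: RMSE and MAE with
# bootstrap 95% confidence intervals. The observed/predicted values are synthetic.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.uniform(0.1, 5.0, size=500)          # e.g., response times (s)
y_pred = y_true + rng.normal(0.0, 0.3, size=500)  # hypothetical predictions

def rmse(t, p):
    return float(np.sqrt(np.mean((t - p) ** 2)))

def mae(t, p):
    return float(np.mean(np.abs(t - p)))

def bootstrap_ci(metric, t, p, n_boot=2000, alpha=0.05):
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(t), len(t))      # resample with replacement
        stats.append(metric(t[idx], p[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(t, p), float(lo), float(hi)

for name, fn in [("RMSE", rmse), ("MAE", mae)]:
    point, lo, hi = bootstrap_ci(fn, y_true, y_pred)
    print(f"{name}: {point:.3f} (95% CI [{lo:.3f}, {hi:.3f}])")
```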
From these, we can infer the following key findings:
(1)
Best Classification Performance: Meghazi et al.—DeepLAB-WSC
(2)
Best QoS Prediction Performance: Liu et al.—llmQoS
(3)
Best Semantic Matching: Alam et al.—BERT Variants
(4)
Best Citation Performance: Jeong—BERT+GCN
(5)
Best Efficiency: Zeng et al.—Lightweight BERT
Table 9 lists the QoS attribute coverage across all 25 papers (research gap matrix). A total of 60% of QoS attributes have zero or minimal BERT research.
Table 9. Literature review matrix—research gap matrix.
The comprehensive analysis of all twenty-five papers reveals that BERT demonstrates consistent improvements, and a 40% adoption rate shows growing interest. The coverage level is computed by dividing the number of papers addressing an attribute by the twenty-five reviewed papers. BERT is applied across diverse tasks, including QoS prediction, recommendation, and classification.
Figure 5 presents a graphical meta-summary of BERT-based QoS research from 2020 to 2024. The panel on the left shows the publication trend, indicating a steady increase in studies using BERT for web service selection and QoS prediction. The panel on the right shows the frequency of QoS attributes analyzed across the reviewed works, highlighting that response time and availability dominate the current research focus.
Figure 5. Graphical meta-summary of BERT-based QoS research (2020–2024).
The RMSE and MAE values represent aggregated averages derived from the results reported across the twenty-five studies (2020–2024). They illustrate the relative performance trend among traditional, BERT-based, and hybrid models in QoS prediction tasks, as shown in Figure 6. The left panel shows the aggregated comparison of RMSE and MAE across traditional and BERT-based models. The right panel shows a forest plot with mean RMSE values, standard deviation whiskers, and a summary diamond representing the pooled mean performance across all models; the vertical dotted line denotes the overall mean RMSE.
Figure 6. Quantitative summary of model performance for QoS prediction.

4.1. Application of BERT on Web Service Selection and Recommendation in the Context of QoS

BERT has significantly enhanced the performance of web service selection and recommendation, primarily by improving the understanding of service descriptions and classification accuracy, ultimately leading to better selection and recommendation outcomes.
BERT’s bidirectional encoding allows each token in a service description to attend to both left and right context, which is important for distinguishing subtle functional and non-functional details in API texts and for accurately aligning user requirements with service capabilities. Pre-training with Masked Language Modeling (MLM) on large corpora yields robust contextual embeddings that transfer well to relatively small QoS datasets, allowing fine-tuned models such as QoSBERT or ServiceBERT to achieve strong QoS prediction performance without training from scratch. BERT’s subword tokenization handles out-of-vocabulary identifiers commonly found in API names, parameter labels, and QoS attributes, for example, RESTfulAPI and RTT_ms, enabling robust encoding of heterogeneous service documentation.
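The subword behavior can be verified directly with the standard WordPiece tokenizer (assuming the Hugging Face transformers library and the bert-base-uncased vocabulary):

```python
# Small illustration of how WordPiece decomposes out-of-vocabulary service
# identifiers into known subword units instead of a single unknown token.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
for identifier in ["RESTfulAPI", "RTT_ms", "getWeatherForecast"]:
    print(identifier, "->", tokenizer.tokenize(identifier))
# Typical output: each identifier is split into subword pieces
# (e.g., "rest", "##ful", "##api"), so no information is lost to [UNK].
```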
In one study [], BERT was integrated with a Deep Pyramid Convolutional Neural Network (DPCNN) to capture both local and global contextual information in service descriptions. By doing so, it improves service classification metrics, which is important for building QoS-aware recommendation systems. The study highlights how classification based on web service descriptions can discriminate among services, especially in environments where services are functionally similar but differ in QoS parameters.
Another study proposed llmQoS (Large Language Model Aided QoS), which encodes descriptive attributes of users and services with a pre-trained language model (RoBERTa) and combines these embeddings with a collaborative-filtering prediction network. The model first converts user and service attributes into natural language sentences, which are then processed by pre-trained LLMs (RoBERTa or Phi-3-mini). By extracting semantic features from textual attributes, the model overcomes the data sparsity problem inherent in collaborative filtering: even for new users or services with no historical interactions, the LLM features provide meaningful representations based on their descriptive attributes. llmQoS generalizes the BERT-style embedding idea by using a pre-trained encoder-only or decoder-only LLM to extract descriptive feature vectors from user and service attribute sentences, which play the same role as BERT embeddings in enriching CF-based QoS predictors under sparse data. Although llmQoS uses LLMs beyond vanilla BERT, it further validates the underlying principle that descriptive text embeddings (obtained from LLM/BERT-style encoders) significantly alleviate sparsity and cold-start issues by providing content features independent of historical interactions. On the WS-DREAM dataset, llmQoS reduces MAE by more than 20% for throughput and more than 10% for response time compared to the best traditional baselines at low densities, confirming that rich textual embeddings are especially beneficial when QoS logs are sparse.
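The general idea, text embeddings concatenated with collaborative signals and fed to a prediction head, can be sketched as follows; this is a simplified PyTorch illustration, not the authors’ llmQoS implementation, and the embedding dimensions and layer sizes are assumptions:

```python
# Simplified sketch of combining pre-computed text embeddings (e.g., from BERT
# or RoBERTa) with learned user/service ID embeddings in an MLP that predicts a
# QoS value such as response time. Dimensions are illustrative.
import torch
import torch.nn as nn

class TextAidedQoSPredictor(nn.Module):
    def __init__(self, n_users, n_services, id_dim=32, text_dim=768):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, id_dim)   # collaborative signal
        self.srv_emb = nn.Embedding(n_services, id_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * id_dim + 2 * text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),                          # predicted QoS value
        )

    def forward(self, user_ids, srv_ids, user_text_vec, srv_text_vec):
        x = torch.cat(
            [self.user_emb(user_ids), self.srv_emb(srv_ids),
             user_text_vec, srv_text_vec],
            dim=-1,
        )
        return self.mlp(x).squeeze(-1)

# Usage with random stand-ins for the text embeddings:
model = TextAidedQoSPredictor(n_users=339, n_services=5825)
pred = model(
    torch.tensor([0, 1]), torch.tensor([10, 20]),
    torch.randn(2, 768), torch.randn(2, 768),
)
print(pred.shape)  # torch.Size([2])
```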
BERT has also been used to model user preferences and service semantics from text. One such study [] proposed WSR-DRL (a Web Service Recommendation model based on Disentangled Representation Learning), an interpretable web service recommender that uses BERT to encode each service’s name and a CNN–BiLSTM to encode its description. The method uses a fine-tuned BERT model that learns subtle meaning in service names and descriptions through its bidirectional multi-head attention. The [CLS] token is used to generate a compact representation of each service, which is then passed into a disentangled interaction module to support more accurate QoS-aware recommendations. Notably, the service-name BERT and deep text encoders yield richer features than bag-of-words representations. Unlike black-box neural models, WSR-DRL’s disentangled factors are interpretable, so users can understand why a service is recommended; this transparency builds trust and helps users make informed decisions.
ServiceBERT employs domain-specific pre-training on a large corpus of web service and API descriptions, followed by multitask fine-tuning for service ecosystem tasks. General-purpose BERT misses service-specific distinctions (for example, REST vs. SOAP, or API composition patterns); ServiceBERT’s domain-specific vocabulary and multitask pre-training capture these nuances, significantly improving performance on service ecosystem tasks (a 15 to 25% improvement over vanilla BERT on tagging accuracy). ServiceBERT exploits BERT’s masked-language pre-training and subword tokenization to learn domain-specific embeddings of API names and documentation, using [CLS] pooling from a fine-tuned BERT-base encoder as the service representation for downstream QoS-aware tagging and recommendation.
ServeNet-BERT uses transfer learning from pre-trained BERT with a dual pooling strategy for service classification. The dual pooling strategy ([CLS] + mean) captures both global semantic meaning and detailed token-level information, which is effective for long service descriptions where important details may be distributed throughout the text. Experiments show an 8 to 12% accuracy improvement over single pooling strategies. Table 10 lists other studies performed using BERT, apart from the twenty-five articles reviewed.
Table 10. Other studies of BERT applications.
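The dual pooling strategy described for ServeNet-BERT can be illustrated with a short sketch (assuming a bert-base encoder via the Hugging Face transformers library; this is not the authors’ code) that concatenates the [CLS] vector with a masked mean over token embeddings:

```python
# Sketch of dual pooling: [CLS] pooling concatenated with masked mean pooling.
# Encoder choice and maximum length are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

def dual_pool(texts):
    batch = tokenizer(texts, padding=True, truncation=True, max_length=256,
                      return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state     # (batch, seq, 768)
    cls_vec = hidden[:, 0, :]                            # [CLS] token
    mask = batch["attention_mask"].unsqueeze(-1)
    mean_vec = (hidden * mask).sum(1) / mask.sum(1)      # masked mean pooling
    return torch.cat([cls_vec, mean_vec], dim=-1)        # (batch, 1536)

features = dual_pool(["A REST API that returns 7-day weather forecasts as JSON."])
print(features.shape)  # torch.Size([1, 1536])
```

The concatenated vector can then be passed to a classification or QoS prediction head in place of a single pooled representation.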
Overall, BERT’s integration into the processes of web service selection and recommendation creates a robust framework capable of addressing the inherent complexities associated with QoS. As web services continue to proliferate, employing advanced techniques like BERT will be crucial in refining selection processes to meet user expectations effectively.
Table 11 extends Table 10 by detailing the base model, input representation, fusion method, output layer, training strategy, and dataset used.
Table 11. Other studies of BERT applications (Extension of Table 10).

4.2. Preprocessing Pipelines from Structured Service Descriptions to BERT Inputs

Many studies extract semantically rich fields such as the operation name, summary, description, tags, and parameter descriptions from OpenAPI/Swagger or WSDL, then concatenate them into a single text sequence per service. For ServeNet-BERT and WARBERT-style pipelines, the API name and free-text description are concatenated, optionally with category labels or path segments, and fed to BERT as “[CLS] name [SEP] description … [SEP]”. For older SOAP/WSDL services, studies typically strip the XML tags while preserving element names and documentation, normalize identifiers, remove schema artifacts, and then treat the resulting text as a short “documentation paragraph” for each operation before tokenization by BERT.
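A minimal sketch of this flattening step is shown below; the OpenAPI fragment and the chosen fields are invented for illustration, and the operation name and the remaining documentation are encoded as a BERT sentence pair:

```python
# Sketch: pull semantically rich fields from an OpenAPI-style operation and
# flatten them into a BERT input. The fragment below is a made-up example.
from transformers import AutoTokenizer

openapi_operation = {
    "operationId": "getForecast",
    "summary": "Retrieve a 7-day weather forecast",
    "description": "Returns daily forecasts including temperature and humidity.",
    "tags": ["weather", "forecast"],
    "parameters": [{"name": "city", "description": "Target city name"}],
}

def to_bert_input(op):
    parts = [
        op.get("operationId", ""),
        op.get("summary", ""),
        op.get("description", ""),
        " ".join(op.get("tags", [])),
        " ".join(p.get("description", "") for p in op.get("parameters", [])),
    ]
    # The tokenizer adds [CLS]/[SEP] automatically; the name and the remaining
    # documentation are passed as a sentence pair.
    return parts[0], " ".join(p for p in parts[1:] if p)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
name, doc = to_bert_input(openapi_operation)
encoded = tokenizer(name, doc, truncation=True, max_length=128)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
```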
In pipeline-style architectures such as QoSBERT [] and ServiceBERT-like models, the preprocessed text is passed through a BERT encoder to obtain a pooled sentence embedding or a sequence of token embeddings for attention or CNN/RNN layers. Figure 7 illustrates the common pipeline adopted across the included studies, starting from WSDL/OpenAPI artifacts and ending at a BERT-based embedding integrated with a QoS prediction head. Table 12 summarizes, for the main BERT-based approaches, which WSDL/OpenAPI fields are used, the approximate resulting text length, and the chosen encoder variant and pooling strategy. This highlights a common pattern: semantically rich name or description fields are flattened into short sequences (40–150 tokens) and encoded by BERT-family models using either CLS pooling or mean pooling.
Figure 7. Typical preprocessing and modeling pipeline for using structured service descriptions in BERT-based QoS prediction.
Table 12. Summary of service specification types, extracted fields, text length, and encoder configurations used across BERT-based web service selection and QoS prediction models.

4.3. Advantages and Limitations of Using BERT Models for Web Service Selection and Recommendation

BERT demonstrates significant potential in enhancing both the efficacy and efficiency of web service selection and recommendation processes; however, this comes with both advantages and limitations. Table 13 summarizes the advantages and limitations of BERT.
Table 13. Advantages and limitations of BERT.
BERT’s contextual embeddings capture semantic nuances of service text and user feedback that are lost in simple keyword models. This enriches latent features for collaborative filtering and content matching. In QoS prediction, BERT features of user/service attributes help overcome cold-start sparsity, markedly reducing RMSE. In recommendation, BERT makes it easier to match mashup requirements or user preferences with API descriptions. Pre-training on domain data means the model “knows” web service jargon. Experiments consistently show that multi-embedding BERT models yield double-digit percentage gains in standard metrics, for example, RMSE, precision, recall.
BERT models are computationally intensive and require careful fine-tuning on domain-specific data to achieve optimal performance. Training ServiceBERT from scratch required assembling thousands of Web API documents, and even fine-tuning BERT adds computational overhead at inference. In some cases, such as response-time QoS prediction, llmQoS improved RMSE by only a few percent, in contrast to the larger throughput gains. Moreover, transformer models can be opaque: the disentangled learning in WSR-DRL adds interpretability, but most BERT-based methods remain black boxes. Finally, purely text-based BERT models ignore non-textual QoS signals, so they must be combined with numeric features.
The advantage of BERT-based content features at low density stems from the fact that they approximate similarity in a semantic space rather than relying solely on historical co-occurrence in the QoS matrix. By encoding WSDL or OpenAPI descriptions, API names, and other textual metadata into dense vectors, BERT captures functional and non-functional characteristics (e.g., domain, interface style, expected load, latency hints) that are shared by services, even if those services have never been invoked by the same users. When these embeddings are combined with sparse interaction data in a CF or MLP head, the model can generalize from semantically similar services or users, thereby mitigating sparsity and cold-start effects and yielding larger relative error reductions at 5–10% density than at 20% or higher.
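This intuition can be illustrated with a small, self-contained sketch (synthetic embeddings and QoS values standing in for BERT description embeddings and logged response times), where the QoS of a cold-start service is estimated from its most semantically similar neighbors:

```python
# Sketch of semantic neighbor-based QoS estimation for a cold-start service.
# Embeddings and QoS values are synthetic stand-ins for BERT description
# embeddings and observed response times.
import numpy as np

rng = np.random.default_rng(1)
known_embeddings = rng.normal(size=(50, 768))     # services with QoS history
known_qos = rng.uniform(0.1, 2.0, size=50)        # e.g., mean response time (s)
new_embedding = known_embeddings[3] + 0.05 * rng.normal(size=768)  # cold-start

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

sims = np.array([cosine(new_embedding, e) for e in known_embeddings])
top_k = np.argsort(sims)[-5:]                      # 5 most similar services
weights = sims[top_k] / sims[top_k].sum()
estimate = float(weights @ known_qos[top_k])
print(f"Estimated response time for cold-start service: {estimate:.2f} s")
```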
In the Large Language Model (LLM) era, BERT-based encoders remain highly competitive for QoS-aware web service recommendation when cost, latency, or data-governance constraints are binding. Their moderate parameter scale and low inference latency make them suitable for high-throughput QoS prediction scenarios, where thousands of user–service pairs must be scored in real time. Moreover, BERT-family models such as BERT and RoBERTa can be fine-tuned and deployed entirely on premise, allowing organizations to retain full control over sensitive service descriptions and QoS logs, whereas GPT-class LLMs are typically accessed via external APIs with higher computational and governance overhead. Table 14 presents the comparison between BERT and GPT-style LLMs.
Table 14. Practical comparison between BERT-family encoders and GPT-style LLMs for QoS-aware service recommendation.
Overall, recent research indicates that BERT-style embeddings enhance web service selection and recommendation by bringing in rich textual knowledge. When integrated carefully (often in hybrid models), they lead to more accurate QoS predictions and more relevant ranked lists of services. However, effectiveness depends on having sufficient textual data (service descriptions, reviews, API docs) and computational resources for training.

4.4. Future Directions of Using BERT in the Context of Web Service Selection and Recommendation as Well as QoS Predictions

From the review, academic research is moving towards adapting BERT models to domain-specific tasks, fusing multiple data modalities, and combining BERT with other models to overcome the challenges of sparse or heterogeneous data. In one study [], the authors addressed BERT’s domain bias by adding a lightweight transformation layer and contrastive objectives to align item representations across domains. Similarly, another study [] integrated BERT’s semantic embeddings of item descriptions with collaborative filtering signals in a graph-based hybrid model called BeLightRec.
In QoS prediction, large language models (LLMs) are used to extract latent features from service and user descriptions; the authors showed that combining BERT-like sentence embeddings with collaborative filtering (CF) dramatically improves QoS estimates, cutting prediction error by over 10 to 20% compared to CF alone.
Table 15 summarizes the academic focus as well as the industry or the current application trends.
Table 15. Summary of direction in academic settings and industries.

5. Conclusions

This review provides a detailed overview of how BERT and its variants are advancing research in web service selection, recommendation, and Quality of Service (QoS) prediction. The evidence from recent studies shows that BERT-based models generally achieve better performance than traditional and earlier deep learning approaches. By learning contextual relationships within service descriptions and user requirements, BERT helps bridge the gap between user intent and service functionality, leading to more accurate and adaptive service selection processes.
At the same time, several challenges remain. The computational demands of transformer models, the shortage of domain specific training data, and the absence of standardized benchmarks continue to limit broader adoption. Another concern is the limited interpretability of BERT-based systems, which makes it difficult to fully understand how decisions are made. Addressing these issues is essential for making such models practical in large-scale and real-time service environments.
Research is moving toward developing lighter, more transparent, and more efficient BERT variants that can operate effectively in diverse and resource-constrained settings. Promising directions include combining BERT with graph neural networks, reinforcement learning, and knowledge graphs to improve contextual reasoning and scalability. Advances in model compression, distillation, and prompt-based learning can also help reduce complexity while maintaining performance.
To move the field forward, we highlight three research priorities: standardized QoS benchmarks, interpretable and efficient BERT variants, and multi-modal QoS reasoning. In summary, BERT continues to play a central role in shaping intelligent, context-aware, and QoS-driven web service ecosystems. Its ability to understand and represent meaning-rich text positions it as a cornerstone for the next generation of adaptive and trustworthy service management systems that better align with user expectations and operational goals.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/fi17120543/s1, File S1: PRISMA 2020 Main Checklist. The full PRISMA checklist and flow diagram can be downloaded at: https://doi.org/10.17605/OSF.IO/RKZBJ.

Author Contributions

Conceptualization, V.M.R. and R.K.R.; Methodology, V.M.R. and R.K.R.; Software, V.M.R.; Validation, V.M.R., R.K.R., and M.S.S.; Formal analysis, V.M.R.; Writing—original draft preparation, R.K.R. and M.S.S.; Writing—review and editing, R.K.R. and M.S.S.; Supervision, R.K.R.; Project administration, R.K.R. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by Multimedia University.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hasnain, M.; Pasha, M.F.; Ghani, I.; Mehboob, B.; Imran, M.; Ali, A. Benchmark dataset selection of web services technologies: A factor analysis. IEEE Access 2020, 8, 53649–53665.
  2. Sun, X.; Wang, S.; Xia, Y.; Zheng, W. Predictive-trend-aware composition of web services with time-varying quality-of-service. IEEE Access 2020, 8, 1910–1921.
  3. Yuan, Y.; Guo, Y.; Ma, W. Dynamic service composition method based on zero-sum game integrated inverse reinforcement learning. IEEE Access 2023, 11, 111897–111908.
  4. Rajendran, V.; Ramasamy, R.K.; Mohd-Isa, W.N. Improved eagle strategy algorithm for dynamic web service composition in the IoT: A conceptual approach. Future Internet 2022, 14, 56.
  5. Liu, Q.; Wang, L.; Du, S.; Wyk, B.J.V. A method to enhance web service clustering by integrating label-enhanced functional semantics and service collaboration. IEEE Access 2024, 12, 61301–61311.
  6. Bonab, M.N.; Tanha, J.; Masdari, M. A semi-supervised learning approach to quality-based web service classification. IEEE Access 2024, 12, 50489–50503.
  7. Kowsher, M.; Sami, A.A.; Prottasha, N.J.; Arefin, M.S.; Dhar, P.K.; Koshiba, T. Bangla-bert: Transformer-based efficient model for transfer learning and language understanding. IEEE Access 2022, 10, 91855–91870.
  8. Kim, M.; Lee, S.; Oh, Y.; Choi, H.; Kim, W. A near-real-time answer discovery for open-domain with unanswerable questions from the web. IEEE Access 2020, 8, 158346–158355.
  9. Zhang, C.; Qin, S.; Wu, H.; Zhang, L. Cooperative mashup embedding leveraging knowledge graph for web api recommendation. IEEE Access 2024, 12, 49708–49719.
  10. Ramasamy, R.K.; Chua, F.F.; Haw, S.C.; Ho, C.K. WSFeIn: A novel, dynamic web service composition adapter for cloud-based mobile application. Sustainability 2022, 14, 13946.
  11. Roy, D.; Dutta, M. A systematic review and research perspective on recommender systems. J. Big Data 2022, 9, 59.
  12. Ghafouri, S.H.; Hashemi, S.M.; Hung, P.C. A survey on web service QoS prediction methods. IEEE Trans. Serv. Comput. 2020, 15, 2439–2454.
  13. Kumar, S.; Chattopadhyay, S.; Adak, C. TPMCF: Temporal QoS Prediction Using Multi-Source Collaborative Features. IEEE Trans. Netw. Serv. Manag. 2024, 21, 3945–3955.
  14. Liu, H.; Zhang, Z.; Li, H.; Wu, Q.; Zhang, Y. Large Language Model Aided QoS Prediction for Service Recommendation. arXiv 2024.
  15. Atzeni, D.; Bacciu, D.; Mazzei, D.; Prencipe, G. A systematic review of Wi-Fi and machine learning integration with topic modeling techniques. Sensors 2022, 22, 4925.
  16. Xu, Z.; Gu, Y.; Yao, D. WARBERT: A Hierarchical BERT-based Model for Web API Recommendation. arXiv 2025.
  17. Li, M.; Xu, H.; Tu, Z.; Su, T.; Xu, X.; Wang, Z. A deep learning based personalized QoE/QoS correlation model for composite services. In Proceedings of the 2022 IEEE International Conference on Web Services (ICWS), Barcelona, Spain, 10–16 July 2022; IEEE: New York, NY, USA, 2022; pp. 312–321.
  18. Long, S.; Tan, J.; Mao, B.; Tang, F.; Li, Y.; Zhao, M.; Kato, N. A Survey on Intelligent Network Operations and Performance Optimization Based on Large Language Models. IEEE Commun. Surv. Tutor. 2025.
  19. Koudouridis, G.P.; Shalmashi, S.; Moosavi, R. An evaluation survey of knowledge-based approaches in telecommunication applications. Telecom 2024, 5, 98–121.
  20. Alsayed, A.S.; Dam, H.K.; Nguyen, C. MicroRec: Leveraging Large Language Models for Microservice Recommendation. In MSR ‘24: Proceedings of the 21st International Conference on Mining Software Repositories, Lisbon, Portugal, 15–16 April 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 419–430.
  21. Liu, H.; Zhang, W.; Zhang, X.; Cao, Z.; Tian, R. Context-aware and QoS prediction-based cross-domain microservice instance discovery. In Proceedings of the 2022 IEEE 13th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 21–23 October 2022; IEEE: New York, NY, USA, 2022; pp. 30–34.
  22. Meghazi, H.M.; Mostefaoui, S.A.; Maaskri, M.; Aklouf, Y. Deep Learning-Based Text Classification to Improve Web Service Discovery. Comput. Y Sist. 2024, 28, 529–542.
  23. Zeng, K.; Paik, I. Dynamic service recommendation using lightweight BERT-based service embedding in edge computing. In Proceedings of the 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), Singapore, 20–23 December 2021; IEEE: New York, NY, USA, 2021; pp. 182–189.
  24. Zhang, P.; Ren, J.; Huang, W.; Chen, Y.; Zhao, Q.; Zhu, H. A deep-learning model for service QoS prediction based on feature mapping and inference. IEEE Trans. Serv. Comput. 2023, 17, 1311–1325.
  25. Alam, K.A.; Haroon, M. Evaluating Fine-tuned BERT-based Language Models for Web API Recommendation. In Proceedings of the 2024 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), Abu Dhabi, United Arab Emirates, 9–11 December 2024; IEEE: New York, NY, USA, 2024; pp. 135–142.
  26. Karapantelakis, A.; Alizadeh, P.; Alabassi, A.; Dey, K.; Nikou, A. Generative AI in mobile networks: A survey. Ann. Telecommun. 2024, 79, 15–33.
  27. Bhanage, D.A.; Pawar, A.V.; Kotecha, K. IT infrastructure anomaly detection and failure handling: A systematic literature review focusing on datasets, log preprocessing, machine & deep learning approaches and automated tool. IEEE Access 2021, 9, 156392–156421.
  28. Qu, G.; Chen, Q.; Wei, W.; Lin, Z.; Chen, X.; Huang, K. Mobile edge intelligence for large language models: A contemporary survey. IEEE Commun. Surv. Tutor. 2025.
  29. Hameed, A.; Violos, J.; Santi, N.; Leivadeas, A.; Mitton, N. FeD-TST: Federated Temporal Sparse Transformers for QoS prediction in Dynamic IoT Networks. IEEE Trans. Netw. Serv. Manag. 2024, 22, 1055–1069.
  30. Huang, W.; Zhang, P.; Chen, Y.; Zhou, M.; Al-Turki, Y.; Abusorrah, A. QoS Prediction Model of Cloud Services Based on Deep Learning. IEEE/CAA J. Autom. Sin. 2022, 9, 564–566.
  31. Le, F.; Srivatsa, M.; Ganti, R.; Sekar, V. Rethinking data-driven networking with foundation models: Challenges and opportunities. In Proceedings of the 21st ACM Workshop on Hot Topics in Networks, Austin, TX, USA, 14–15 November 2022; pp. 188–197.
  32. Jeong, C.; Jang, S.; Shin, H.; Park, E.; Choi, S. A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks. arXiv 2019.
  33. Liu, M.; Xu, H.; Sheng, Q.Z.; Wang, Z. QoSGNN: Boosting QoS Prediction Performance with Graph Neural Networks. IEEE Trans. Serv. Comput. 2023, 17, 645–658.
  34. Lian, H.; Li, J.; Wu, H.; Zhao, Y.; Zhang, L.; Wang, X. Toward Effective Personalized Service QoS Prediction From the Perspective of Multi-Task Learning. IEEE Trans. Netw. Serv. Manag. 2023, 20, 2587–2597.
  35. Jirsik, T.; Trčka, Š.; Celeda, P. Quality of service forecasting with LSTM neural network. In Proceedings of the 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), Washington, DC, USA, 8–12 April 2019; IEEE: New York, NY, USA, 2019; pp. 251–260.
  36. Guo, C.; Zhang, W.; Dong, N.; Liu, Z.; Xiang, Y. QoS-aware diversified service selection. IEEE Trans. Serv. Comput. 2022, 16, 2085–2099.
  37. Boulakbech, M.; Messai, N.; Sam, Y.; Devogele, T. Deep learning model for personalized web service recommendations using attention mechanism. In Proceedings of the International Conference on Service-Oriented Computing, Rome, Italy, 28 November–1 December 2023; Springer Nature: Cham, Switzerland, 2023; pp. 19–33.
  38. Xue, L.; Zhang, F. Lcpcwsc: A web service classification approach based on label confusion and priori correction. Int. J. Web Inf. Syst. 2024, 20, 213–228.
  39. Huang, Y.; Cao, Z.; Chen, S.; Zhang, X.; Wang, P.; Cao, Q. Interpretable web service recommendation based on disentangled representation learning. J. Intell. Fuzzy Syst. 2023, 45, 133–145.
  40. Wang, X.; Zhou, P.; Wang, Y.; Liu, X.; Liu, J.; Wu, H. Servicebert: A pre-trained model for web service tagging and recommendation. In Proceedings of the International Conference on Service-Oriented Computing, Online, 22–25 November 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 464–478.
  41. Yang, Y.; Qamar, N.; Liu, P.; Grolinger, K.; Wang, W.; Li, Z.; Liao, Z. Servenet: A deep neural network for web services classification. In Proceedings of the 2020 IEEE International Conference on Web Services (ICWS), Beijing, China, 19–23 October 2020; pp. 168–175.
  42. Wang, Z.; Zhang, X.; Li, Z.S.; Yan, M. QoSBERT: An Uncertainty-Aware Approach Based on Pre-trained Language Models for Service Quality Prediction. IEEE Trans. Serv. Comput. 2025, 1–13.
  43. Liu, P.; Zhang, L.; Gulla, J.A. Pre-train, prompt, and recommendation: A comprehensive survey of language modeling paradigm adaptations in recommender systems. Trans. Assoc. Comput. Linguist. 2023, 11, 1553–1571.
  44. Van, M.M.; Tran, T.T. BeLightRec: A Lightweight Recommender System Enhanced with BERT. In Proceedings of the International Conference on Intelligent Systems and Data Science, Nha Trang, Vietnam, 9–10 November 2024; Springer Nature: Singapore, 2024; pp. 30–43.
  45. Kharidia, V.; Paprunia, D.; Kanikar, P. LightFusionRec: Lightweight Transformers-Based Cross-Domain Recommendation Model. In Proceedings of the 2024 First International Conference on Software, Systems and Information Technology (SSITCON), Tumkur, India, 18–19 October 2024; IEEE: New York, NY, USA, 2024; pp. 1–7.
  46. Liu, Q.; Zhao, X.; Wang, Y.; Wang, Y.; Zhang, Z.; Sun, Y.; Li, X.; Wang, M.; Jia, P.; Chen, C.; et al. Large language model enhanced recommender systems: Taxonomy, trend, application and future. arXiv 2024, arXiv:2412.13432.
  47. Singh, S. BERT Algorithm used in Google Search. Math. Stat. Eng. Appl. 2021, 70, 1641–1650.
  48. Sun, F.; Liu, J.; Wu, J.; Pei, C.; Lin, X.; Ou, W.; Jiang, P. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1441–1450.
  49. Fine-Tune and Host Hugging Face Bert Models on Amazon Sagemaker|AWS Machine Learning Blog. Available online: https://aws.amazon.com/blogs/machine-learning/fine-tune-and-host-hugging-face-bert-models-on-amazon-sagemaker/ (accessed on 23 October 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
