Future Internet
  • Systematic Review
  • Open Access

27 November 2025

BERT-Based Approaches for Web Service Selection and Recommendation: A Systematic Review with a Focus on QoS Prediction

1 Faculty of Information Science & Technology, Multimedia University, Melaka 75450, Malaysia
2 Faculty of Computing and Informatics, Multimedia University, Cyberjaya 63000, Malaysia
* Author to whom correspondence should be addressed.

Abstract

Effective web service selection and recommendation are critical for ensuring high-quality performance in distributed and service-oriented systems. Recent research has increasingly explored the use of BERT (Bidirectional Encoder Representations from Transformers) to enhance semantic understanding of service descriptions, user requirements, and Quality of Service (QoS) prediction. This systematic review examines the application of BERT-based models in QoS-aware web service selection and recommendation. A structured database search was conducted across IEEE, ACM, ScienceDirect, and Google Scholar covering studies published between 2020 and 2024, resulting in twenty-five eligible articles based on predefined inclusion criteria and PRISMA screening. The review shows that BERT improves semantic representation and mitigates cold-start and sparsity issues, contributing to better service ranking and QoS prediction accuracy. However, challenges persist, including limited availability of benchmark datasets, high computational overhead, and limited interpretability of model decisions. The review identifies five key research gaps and outlines future directions, including domain-specific pre-training, hybrid semantic–numerical models, multi-modal QoS reasoning, and lightweight transformer architectures for deployment in dynamic and resource-constrained environments. These findings highlight the potential of BERT to support more intelligent, adaptive, and scalable web service management.

1. Introduction

Web services have significantly contributed to the ever-growing landscape of the Internet and the digital economy. Designed for online interoperability, they enable dynamic, cross-platform interactions that enhance both functionality and accessibility. The ability to select and recommend web services based on Quality of Service (QoS) attributes is essential to ensure that user requirements are met effectively, with optimal response times, reliability, and overall service quality. QoS encompasses a variety of attributes that indicate how well a service meets the user’s expectations and requirements. Key QoS attributes include reliability, response time, availability, throughput, and security.

1.1. Importance of QoS-Based Selection

The main goal of web service selection is to improve user satisfaction. A QoS-based selection method helps users choose services that satisfy or align with their needs, thereby enhancing the overall experience, because web services can differ significantly in their reliability and performance []. Effective web service selection and recommendation also enhance the operational efficiency of software systems: by selecting services with optimal QoS metrics, applications can minimize latency, reduce processing times, and improve throughput []. Additionally, the ability to dynamically select and adapt web services based on QoS data is essential in rapidly changing environments. This adaptability ensures that applications can respond to fluctuations in service availability or performance [].

1.2. Challenges in Web Service Selection and Recommendation

Web service selection and recommendation based on QoS attributes are vital for the effectiveness of software systems. However, several challenges arise when implementing such techniques, especially QoS-based ones. Determining which attributes are most relevant for a specific application can be difficult given the variety of metrics available to evaluate service quality, and the lack of standardization across services makes it hard to compare these metrics consistently and effectively. QoS data must be of high quality, available in real time, and consistent to support accurate service selection; network changes, system load, or outages can compromise the availability of reliable QoS data and lead to suboptimal service choices []. As the number of available web services continues to grow, efficiently managing and analyzing large-scale datasets while maintaining high performance has become increasingly challenging []. Additionally, it is important to understand the correlations between various QoS attributes to ensure effective service selection; the dynamic nature of these attributes complicates their interdependencies, which may lead to challenges in accurately predicting service performance [].

1.3. Why BERT for Web Service Selection?

The BERT (Bidirectional Encoder Representations from Transformers) model, developed by Google in 2018, has significantly advanced the field of Natural Language Processing (NLP). It has improved performance on a wide range of NLP tasks by leveraging the capabilities of the transformer architecture. The model’s bidirectional training allows it to understand the context of a word based on the surrounding words to both its left and right, rather than only the preceding or following text. This leads to a more profound understanding of language nuances, meanings, and relationships. Some of the advantages of BERT in NLP tasks are described in Table 1.
Table 1. Advantages of BERT in NLP tasks.
BERT enables more accurate semantic alignment between user requirements and service capabilities by comprehensively analyzing natural language service descriptions and queries, thereby improving service retrieval efficiency. For web service recommendation, BERT can be leveraged to analyze user preferences and historical interactions with web services through enhanced recommendation systems, providing more personalized service suggestions that closely align with user needs []. BERT can also improve the effectiveness of QoS prediction by enabling models to analyze service performance descriptions derived from operational data and user reviews. By understanding QoS-related signals such as reliability, response time, and user sentiment, BERT can inform more accurate forecasting models.
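As a simple illustration of this semantic alignment (a minimal sketch rather than any reviewed system’s implementation, assuming the Hugging Face transformers library and the generic bert-base-uncased checkpoint), the following Python snippet embeds a user requirement and candidate service descriptions with BERT and ranks services by cosine similarity:

```python
# Minimal sketch: rank candidate services against a user requirement using
# mean-pooled BERT embeddings and cosine similarity. Model choice, pooling,
# and the example texts are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(texts):
    # Mean-pool the last hidden states into one vector per text.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (batch, seq, 768)
    mask = batch["attention_mask"].unsqueeze(-1)          # (batch, seq, 1)
    return (hidden * mask).sum(1) / mask.sum(1)           # (batch, 768)

query = "weather forecast API with low response time"
services = [
    "Provides 7-day weather forecasts with 99.9% availability.",
    "Currency conversion service with daily exchange rates.",
]
q_vec, s_vecs = embed([query]), embed(services)
scores = torch.nn.functional.cosine_similarity(q_vec, s_vecs)
ranked = sorted(zip(services, scores.tolist()), key=lambda x: -x[1])
print(ranked)
```

In practice, the reviewed approaches replace the generic checkpoint with domain-adapted or fine-tuned encoders and feed the resulting embeddings into downstream ranking or QoS prediction layers.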
There is a need to systematically classify the methodologies or techniques of applying BERT to QoS prediction to foster clearer understanding and further progress.
The conceptual framework shown in Figure 1 illustrates how the proposed BERT-based model supports QoS-aware web service recommendations. The process starts with collecting various types of web service data, such as descriptions, execution logs, and user reviews. These inputs are then processed through the BERT Semantic Layer, where the model performs embedding and domain-specific fine-tuning to understand the contextual meaning of the information. The resulting representations are used in the following three key components: the QoS Prediction Model, which estimates the quality of service parameters; the User Preference Modeling Layer, which captures and interprets user needs and behaviors; and the Service Retrieval and Ranking Layer, which identifies and orders the most relevant services. Together, these layers produce a QoS-aware recommendation that balances objective service quality with individual user preferences, ensuring more accurate and personalized results.
Figure 1. Conceptual framework of using a BERT-based semantic approach.

1.4. Motivation and Objective

This paper aims to review and understand the different BERT-based methods associated with QoS-based web service selection and recommendation. The review covers peer-reviewed journals, conferences, and online databases such as IEEE and ACM, considering papers published between 2020 and 2024 and written in English. The main contributions of the paper are as follows:
  • Identifying gaps in the literature, such as how BERT can address unique challenges in web service selection, recommendation, and QoS prediction, and how it compares against traditional models.
  • Assessing the performance of BERT models by understanding their strengths and weaknesses, and when and how the model should be applied effectively to web services alongside QoS attributes, distinguishing when BERT provides significant gains and when it does not.
  • Identifying five high-impact research gaps, namely QoS attribute neglect, dataset reproducibility, model interpretability, scalability, and cost awareness, and translating them into a structured agenda for future research.
The paper is organized into the following sections: introduction, background, methodology, BERT applications in the context of web service selection and recommendation, challenges and limitations, comparative analysis, future directions for BERT, and conclusion.

2. Background

2.1. Web Service Selection and Recommendation

A web service is a means for different applications and systems to communicate with each other across the Internet and is an important part of modern software systems. Web services rely on standards and protocols that enable data exchange between systems.
Figure 2 illustrates the flow of a web service invocation. A user first interacts with a web service through a browser, which sends requests over the Internet. The web server processes these requests, possibly involving a database, and then returns a response to the user via the Internet.
Figure 2. Web service process.
Web service selection and recommendation involve several steps for assessing the attributes and characteristics that ensure optimal choices for a given task. The key steps are illustrated in Figure 3. The first phase is searching for available web services that meet the functional and non-functional requirements specified by users. In the second phase, once potential services are identified, a QoS evaluation is performed to assess each service against the defined QoS attributes, for example, throughput, reliability, availability, and response time; this helps to rank the services based on user expectations, requirements, and performance. Recommendation systems are then used to suggest services to users based on history, preferences, and QoS predictions, and the recommended services can be composed into a single unit that performs a specific task. Finally, continuous monitoring is crucial, as the QoS attributes must be reassessed over time to adapt to the dynamic nature of user needs and operating conditions. This contributes to maintaining user satisfaction over time.
Figure 3. Key steps of web service selection and recommendation.
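To make the QoS evaluation and ranking phase concrete, the following minimal sketch (illustrative only; the candidate attribute values and user weights are hypothetical) normalizes each QoS attribute and ranks candidate services by a user-weighted score, inverting cost-type attributes such as response time:

```python
# Minimal sketch of QoS-based ranking: min-max normalize each attribute
# (inverting cost-type attributes where lower is better) and combine with
# user-supplied weights. All values below are invented for illustration.
candidates = {
    "ServiceA": {"response_time": 120.0, "availability": 0.99, "throughput": 850.0},
    "ServiceB": {"response_time": 300.0, "availability": 0.97, "throughput": 1200.0},
    "ServiceC": {"response_time": 90.0,  "availability": 0.95, "throughput": 600.0},
}
weights = {"response_time": 0.5, "availability": 0.3, "throughput": 0.2}
cost_attributes = {"response_time"}  # lower is better

def normalize(attr, value):
    values = [c[attr] for c in candidates.values()]
    lo, hi = min(values), max(values)
    if hi == lo:
        return 1.0
    score = (value - lo) / (hi - lo)
    return 1.0 - score if attr in cost_attributes else score

def qos_score(service):
    return sum(weights[a] * normalize(a, v) for a, v in candidates[service].items())

ranking = sorted(candidates, key=qos_score, reverse=True)
print([(s, round(qos_score(s), 3)) for s in ranking])
```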

2.2. Existing Techniques for Service Selection and Recommendation

Traditionally, there are three types of recommender systems: collaborative filtering, content-based, and hybrid. Collaborative filtering uses the ratings and preferences of other users to recommend services to a particular user. Content-based systems organize items into item profiles based on their descriptions or features. Hybrid systems combine both collaborative and content-based techniques to provide more accurate recommendations.
Content-based filtering is good at recommending new or less popular items and is less affected by the cold-start problem, but it tends to produce repetitive recommendations with limited diversity. Collaborative filtering, on the other hand, effectively captures user preferences, handles various item types, and adapts to changing user behavior over time. However, it struggles with the cold-start issue and data sparsity in large datasets. Hybrid methods combine the strengths of content-based and collaborative filtering, delivering more accurate and diverse recommendations. However, they are often more complex and resource-intensive, and their performance heavily depends on the quality of the underlying techniques. Choosing the right approach depends on factors like data availability, system scalability, and the need for personalization [,,].

3. Systematic Review Methodology

3.1. Research Questions

Table 2 lists the research questions and the motivations that guided the preparation of this systematic review, providing a comprehensive and thorough basis for reviewing BERT’s application in web service selection and recommendation as well as QoS prediction.
Table 2. Research Questions and Motivations.

3.2. Search Strategy and Selection

The search string and the sources from which the papers are identified are the main elements of the search strategy. The search results were exported in Research Information Systems (RIS) format and imported into the Rayyan tool. Only studies published between 2020 and 2024, with full text and metadata available and written in English, were considered, to keep the review focused on recent work.
Table 3 presents the search strings used, shown for IEEE as an example.
Table 3. Search String for IEEE.
Table 4 presents the inclusion and exclusion criteria for the studies. These criteria are used to further select the relevant papers from the initial pool retrieved with the search string.
Table 4. Selection criteria for the studies.
Based on the defined research questions, the papers are evaluated and selected according to whether the objectives, research motivations and methods, and the relevant information on data, findings, and results are clearly defined and supported.

3.3. Data Extraction

Information and data are extracted according to the research questions defined in Table 2; Table 5 summarizes the extracted data points and their corresponding research questions.
Table 5. Data point and its relevant RQ.

3.4. Quality Assessment

The final articles are assessed based on criteria as follows:
(1)
Whether the objective is clearly defined;
(2)
Whether the methods or techniques are elaborated;
(3)
Whether the authors have provided their findings and results based on proper data analysis.
Figure 4 demonstrates the PRISMA flow on the article selections. The PRISMA flow diagram outlines the systematic review process, beginning with the identification of 163 records from four databases (ACM, IEEE, Science Direct, and Google Scholar), from which 15 records (12 duplicates and 3 others—due to articles not related to BERT) were removed before screening, leaving 148 records for title or abstract screening. During screening, 125 records were excluded, resulting in 27 reports sought for full-text retrieval, of which 2 were unavailable, leaving 25 reports for eligibility assessment. No further exclusions were made, and all 25 studies were ultimately included in the review, demonstrating a transparent and methodical approach to study selection in accordance with PRISMA guidelines.
Figure 4. PRISMA flowchart on article selection.
This review was conducted in accordance with the PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. The full PRISMA checklist and flow diagram are provided in the Supplementary Materials.

4. Synthesis of the Literature

The literature matrix in Table 6 is derived from the review of the final selected studies.
Table 6. Literature review matrix—BERT applications.
Table 7 summarizes the dataset information used in the papers selected for this systematic review.
Table 7. Literature review matrix—dataset information.
From the dataset information, only 16% (4/25) of the papers use publicly available datasets. The publicly available datasets are as follows:
(1)
WS-DREAM (Liu et al.)—5825 services, 339 users
(2)
ProgrammableWeb (Meghazi et al.)—8400+ services
(3)
Stack Overflow (Alsayed et al.)—API documentation corpus
(4)
FullTextPeerRead (Jeong)—Citation dataset
(5)
WSDream (Liu et al.)—QoS prediction dataset
The findings indicate that 84% of the studies are difficult to reproduce because their datasets are not publicly available. This limits the ability to compare methods across studies; hence, there is a need for standardized benchmark datasets in the field.
Table 8 summarizes the quantitative performance across studies; quantitative metrics are reported in 17 of the 25 papers (68%).
Table 8. Literature review matrix—performance metrics.
From the performance metrics, the critical gap identified is inconsistent metric reporting, which hampers meta-analysis. In the table, “T” represents throughput and “RT” represents response time. The reported metrics break down as follows:
(1)
RMSE (Root Mean Square Error): 47.1% of evaluations (8/17)
(2)
MAE (Mean Absolute Error): 23.5% of evaluations (4/17)
(3)
Precision: 23.5% of evaluations (4/17)
(4)
NDCG (Normalized Discounted Cumulative Gain): 11.8% of evaluations (2/17)
(5)
Accuracy: 52.9% of evaluations (9/17)
It is impossible to determine the “best” method definitively, as publications are biased towards positive results. As a recommendation, a standardized evaluation protocol could be adopted, for example, always including confidence intervals and statistical significance tests.
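A minimal example of such a protocol (illustrative only, using synthetic predictions) computes RMSE and MAE together with bootstrap 95% confidence intervals:

```python
# Sketch of a standardized QoS-prediction evaluation report: RMSE and MAE with
# bootstrap 95% confidence intervals. The observed/predicted values are synthetic.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.uniform(0.1, 5.0, size=500)          # e.g., response times (s)
y_pred = y_true + rng.normal(0.0, 0.3, size=500)  # hypothetical predictions

def rmse(t, p):
    return float(np.sqrt(np.mean((t - p) ** 2)))

def mae(t, p):
    return float(np.mean(np.abs(t - p)))

def bootstrap_ci(metric, t, p, n_boot=2000, alpha=0.05):
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(t), len(t))      # resample with replacement
        stats.append(metric(t[idx], p[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(t, p), float(lo), float(hi)

for name, fn in [("RMSE", rmse), ("MAE", mae)]:
    point, lo, hi = bootstrap_ci(fn, y_true, y_pred)
    print(f"{name}: {point:.3f} (95% CI [{lo:.3f}, {hi:.3f}])")
```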
From these, we can infer the following key findings:
(1)
Best Classification Performance: Meghazi et al.—DeepLAB-WSC
(2)
Best QoS Prediction Performance: Liu et al.—llmQoS
(3)
Best Semantic Matching: Alam et al.—BERT Variants
(4)
Best Citation Performance: Jeong—BERT+GCN
(5)
Best Efficiency: Zeng et al.—Lightweight BERT
Table 9 lists the QoS attribute coverage across all 25 papers (research gap matrix). A total of 60% of QoS attributes have zero or minimal BERT research.
Table 9. Literature review matrix—research gap matrix.
The comprehensive analysis of all twenty-five papers reveals that BERT demonstrates consistent improvements, and a 40% adoption rate shows growing interest. The coverage level is computed by dividing the number of papers addressing an attribute by the twenty-five reviewed papers. BERT is applied across diverse tasks, including QoS prediction, recommendation, and classification.
Figure 5 presents a graphical meta-summary of BERT-based QoS research from 2020 to 2024. The panel on the left shows the publication trend, indicating a steady increase in studies using BERT for web service selection and QoS prediction. The panel on the right shows the frequency of QoS attributes analyzed across the reviewed works, highlighting that response time and availability dominate the current research focus.
Figure 5. Graphical meta-summary of BERT-based QoS research (2020–2024).
The RMSE and MAE values represent aggregated averages derived from the results reported across the twenty-five studies (2020–2024). They illustrate the relative performance trend among traditional, BERT-based, and hybrid models in QoS prediction tasks, as shown in Figure 6. The left panel shows the aggregated comparison of RMSE and MAE across traditional and BERT-based models. The right panel shows a forest plot with mean RMSE values, standard deviation whiskers, and a summary diamond representing the pooled mean performance across all models; the vertical dotted line denotes the overall mean RMSE.
Figure 6. Quantitative summary of model performance for QoS prediction.

4.1. Application of BERT on Web Service Selection and Recommendation in the Context of QoS

BERT has significantly enhanced the performance of web service selection and recommendation, primarily by improving the understanding of service descriptions and classification accuracy, ultimately leading to better selection and recommendation outcomes.
BERT’s bidirectional encoding allows each token in a service description to attend to both left and right context, which is important for distinguishing subtle functional and non-functional details in API texts and for accurately aligning user requirements with service capabilities. Pre-training with Masked Language Modeling (MLM) on large corpora yields robust contextual embeddings that transfer well to relatively small QoS datasets, allowing fine-tuned models such as QoSBERT or ServiceBERT to achieve strong QoS prediction performance without training from scratch. BERT’s subword tokenization handles out-of-vocabulary identifiers commonly found in API names, parameter labels, and QoS attributes, for example, RESTfulAPI and RTT_ms, enabling robust encoding of heterogeneous service documentation.
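The subword behavior can be verified directly with the standard WordPiece tokenizer (assuming the Hugging Face transformers library and the bert-base-uncased vocabulary):

```python
# Small illustration of how WordPiece decomposes out-of-vocabulary service
# identifiers into known subword units instead of a single unknown token.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
for identifier in ["RESTfulAPI", "RTT_ms", "getWeatherForecast"]:
    print(identifier, "->", tokenizer.tokenize(identifier))
# Typical output: each identifier is split into subword pieces
# (e.g., "rest", "##ful", "##api"), so no information is lost to [UNK].
```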
In one study [], BERT was integrated with a Deep Pyramid Convolutional Neural Network (DPCNN) to capture both local and global contextual information in service descriptions. By doing so, it improves service classification metrics, which is important for building QoS-aware recommendation systems. The study highlights how classification based on web service descriptions can discriminate among services, especially in environments where services are functionally similar but differ in QoS parameters.
Another study proposed llmQoS (Large Language Model Aided QoS), which encodes descriptive attributes of users and services with a pre-trained language model (RoBERTa) and combines these embeddings with a collaborative-filtering prediction network. The model first converts user and service attributes into natural language sentences, which are then processed by pre-trained LLMs (RoBERTa or Phi-3-mini). By extracting semantic features from textual attributes, the model overcomes the data sparsity problem inherent in collaborative filtering: even for new users or services with no historical interactions, the LLM features provide meaningful representations based on their descriptive attributes. llmQoS generalizes the BERT-style embedding idea by using a pre-trained encoder-only or decoder-only LLM to extract descriptive feature vectors from user and service attribute sentences, which play the same role as BERT embeddings in enriching CF-based QoS predictors under sparse data. Although llmQoS uses LLMs beyond vanilla BERT, it further validates the underlying principle that descriptive text embeddings (obtained from LLM/BERT-style encoders) significantly alleviate sparsity and cold-start issues by providing content features independent of historical interactions. On the WS-DREAM dataset, llmQoS reduces MAE by more than 20% for throughput and more than 10% for response time compared to the best traditional baselines at low densities, confirming that rich textual embeddings are especially beneficial when QoS logs are sparse.
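The general idea, text embeddings concatenated with collaborative signals and fed to a prediction head, can be sketched as follows; this is a simplified PyTorch illustration, not the authors’ llmQoS implementation, and the embedding dimensions and layer sizes are assumptions:

```python
# Simplified sketch of combining pre-computed text embeddings (e.g., from BERT
# or RoBERTa) with learned user/service ID embeddings in an MLP that predicts a
# QoS value such as response time. Dimensions are illustrative.
import torch
import torch.nn as nn

class TextAidedQoSPredictor(nn.Module):
    def __init__(self, n_users, n_services, id_dim=32, text_dim=768):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, id_dim)   # collaborative signal
        self.srv_emb = nn.Embedding(n_services, id_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * id_dim + 2 * text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),                          # predicted QoS value
        )

    def forward(self, user_ids, srv_ids, user_text_vec, srv_text_vec):
        x = torch.cat(
            [self.user_emb(user_ids), self.srv_emb(srv_ids),
             user_text_vec, srv_text_vec],
            dim=-1,
        )
        return self.mlp(x).squeeze(-1)

# Usage with random stand-ins for the text embeddings:
model = TextAidedQoSPredictor(n_users=339, n_services=5825)
pred = model(
    torch.tensor([0, 1]), torch.tensor([10, 20]),
    torch.randn(2, 768), torch.randn(2, 768),
)
print(pred.shape)  # torch.Size([2])
```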
BERT has also been used to model user preferences and service semantics from text. One such study [] proposed WSR-DRL (a Web Service Recommendation model based on Disentangled Representation Learning), an interpretable web service recommender that uses BERT to encode each service’s name and a CNN–BiLSTM to encode its description. The method uses a fine-tuned BERT model that learns subtle meaning in service names and descriptions through its bidirectional multi-head attention. The [CLS] token is used to generate a compact representation of each service, which is then passed into a disentangled interaction module to support more accurate QoS-aware recommendations. Notably, the service-name BERT and deep text encoders yield richer features than bag-of-words representations. Unlike black-box neural models, WSR-DRL’s disentangled factors are interpretable, so users can understand why a service is recommended; this transparency builds trust and helps users make informed decisions.
ServiceBERT employs domain-specific pre-training on a large corpus of web service and API descriptions, followed by multitask fine-tuning for service ecosystem tasks. General-purpose BERT misses service-specific distinctions (for example, REST vs. SOAP, or API composition patterns); ServiceBERT’s domain-specific vocabulary and multitask pre-training capture these nuances, significantly improving performance on service ecosystem tasks (a 15 to 25% improvement over vanilla BERT on tagging accuracy). ServiceBERT exploits BERT’s masked-language pre-training and subword tokenization to learn domain-specific embeddings of API names and documentation, using [CLS] pooling from a fine-tuned BERT-base encoder as the service representation for downstream QoS-aware tagging and recommendation.
ServeNet-BERT uses transfer learning from pre-trained BERT with a dual pooling strategy for service classification. The dual pooling strategy ([CLS] + mean) captures both global semantic meaning and detailed token-level information, which is effective for long service descriptions where important details may be distributed throughout the text. Experiments show an 8 to 12% accuracy improvement over single pooling strategies. Table 10 lists other studies performed using BERT, apart from the twenty-five articles reviewed.
Table 10. Other studies of BERT applications.
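The dual pooling strategy described for ServeNet-BERT can be illustrated with a short sketch (assuming a bert-base encoder via the Hugging Face transformers library; this is not the authors’ code) that concatenates the [CLS] vector with a masked mean over token embeddings:

```python
# Sketch of dual pooling: [CLS] pooling concatenated with masked mean pooling.
# Encoder choice and maximum length are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

def dual_pool(texts):
    batch = tokenizer(texts, padding=True, truncation=True, max_length=256,
                      return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state     # (batch, seq, 768)
    cls_vec = hidden[:, 0, :]                            # [CLS] token
    mask = batch["attention_mask"].unsqueeze(-1)
    mean_vec = (hidden * mask).sum(1) / mask.sum(1)      # masked mean pooling
    return torch.cat([cls_vec, mean_vec], dim=-1)        # (batch, 1536)

features = dual_pool(["A REST API that returns 7-day weather forecasts as JSON."])
print(features.shape)  # torch.Size([1, 1536])
```

The concatenated vector can then be passed to a classification or QoS prediction head in place of a single pooled representation.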
Overall, BERT’s integration into the processes of web service selection and recommendation creates a robust framework capable of addressing the inherent complexities associated with QoS. As web services continue to proliferate, employing advanced techniques like BERT will be crucial in refining selection processes to meet user expectations effectively.
Table 11 extends Table 10 by detailing the base model, input representation, fusion method, output layer, training strategy, and dataset used.
Table 11. Other studies of BERT applications (Extension of Table 10).

4.2. Preprocessing Pipelines from Structured Service Descriptions to BERT Inputs

Many studies extract semantically rich fields such as the operation name, summary, description, tags, and parameter descriptions from OpenAPI/Swagger or WSDL, then concatenate them into a single text sequence per service. For ServeNet-BERT and WARBERT-style pipelines, the API name and free-text description are concatenated, optionally with category labels or path segments, and fed to BERT as “[CLS] name [SEP] description … [SEP]”. For older SOAP/WSDL services, studies typically strip the XML tags while preserving element names and documentation, normalize identifiers, remove schema artifacts, and then treat the resulting text as a short “documentation paragraph” for each operation before tokenization by BERT.
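A minimal sketch of this flattening step is shown below; the OpenAPI fragment and the chosen fields are invented for illustration, and the operation name and the remaining documentation are encoded as a BERT sentence pair:

```python
# Sketch: pull semantically rich fields from an OpenAPI-style operation and
# flatten them into a BERT input. The fragment below is a made-up example.
from transformers import AutoTokenizer

openapi_operation = {
    "operationId": "getForecast",
    "summary": "Retrieve a 7-day weather forecast",
    "description": "Returns daily forecasts including temperature and humidity.",
    "tags": ["weather", "forecast"],
    "parameters": [{"name": "city", "description": "Target city name"}],
}

def to_bert_input(op):
    parts = [
        op.get("operationId", ""),
        op.get("summary", ""),
        op.get("description", ""),
        " ".join(op.get("tags", [])),
        " ".join(p.get("description", "") for p in op.get("parameters", [])),
    ]
    # The tokenizer adds [CLS]/[SEP] automatically; the name and the remaining
    # documentation are passed as a sentence pair.
    return parts[0], " ".join(p for p in parts[1:] if p)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
name, doc = to_bert_input(openapi_operation)
encoded = tokenizer(name, doc, truncation=True, max_length=128)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
```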
In pipeline-style architectures such as QoSBERT [] and ServiceBERT-like models, the preprocessed text is passed through a BERT encoder to obtain a pooled sentence embedding or a sequence of token embeddings for attention or CNN/RNN layers. Figure 7 illustrates the common pipeline adopted across the included studies, starting from WSDL/OpenAPI artifacts and ending at a BERT-based embedding integrated with a QoS prediction head. Table 12 summarizes, for the main BERT-based approaches, which WSDL/OpenAPI fields are used, the approximate resulting text length, and the chosen encoder variant and pooling strategy. This highlights a common pattern: semantically rich name or description fields are flattened into short sequences (40–150 tokens) and encoded by BERT-family models using either CLS pooling or mean pooling.
Figure 7. Typical preprocessing and modeling pipeline for using structured service descriptions in BERT-based QoS prediction.
Table 12. Summary of service specification types, extracted fields, text length, and encoder configurations used across BERT-based web service selection and QoS prediction models.

4.3. Advantages and Limitations of Using BERT Models for Web Service Selection and Recommendation

BERT demonstrates significant potential in enhancing both the efficacy and efficiency of web service selection and recommendation processes; however, this comes with both advantages and limitations. Table 13 summarizes the advantages and limitations of BERT.
Table 13. Advantages and limitations of BERT.
BERT’s contextual embeddings capture semantic nuances of service text and user feedback that are lost in simple keyword models. This enriches latent features for collaborative filtering and content matching. In QoS prediction, BERT features of user/service attributes help overcome cold-start sparsity, markedly reducing RMSE. In recommendation, BERT makes it easier to match mashup requirements or user preferences with API descriptions. Pre-training on domain data means the model “knows” web service jargon. Experiments consistently show that multi-embedding BERT models yield double-digit percentage gains in standard metrics, for example, RMSE, precision, recall.
BERT models are computationally intensive and require careful fine-tuning on domain-specific data to achieve optimal performance. Training ServiceBERT from scratch required assembling thousands of Web API documents, and even fine-tuning BERT adds computational overhead at inference. In some cases, such as response-time QoS prediction, llmQoS improved RMSE by only a few percent, in contrast to the larger throughput gains. Moreover, transformer models can be opaque: the disentangled learning in WSR-DRL adds interpretability, but most BERT-based methods remain black boxes. Finally, purely text-based BERT models ignore non-textual QoS signals, so they must be combined with numeric features.
The advantage of BERT-based content features at low density stems from the fact that they approximate similarity in a semantic space rather than relying solely on historical co-occurrence in the QoS matrix. By encoding WSDL or OpenAPI descriptions, API names, and other textual metadata into dense vectors, BERT captures functional and non-functional characteristics (e.g., domain, interface style, expected load, latency hints) that are shared by services, even if those services have never been invoked by the same users. When these embeddings are combined with sparse interaction data in a CF or MLP head, the model can generalize from semantically similar services or users, thereby mitigating sparsity and cold-start effects and yielding larger relative error reductions at 5–10% density than at 20% or higher.
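This intuition can be illustrated with a small, self-contained sketch (synthetic embeddings and QoS values standing in for BERT description embeddings and logged response times), where the QoS of a cold-start service is estimated from its most semantically similar neighbors:

```python
# Sketch of semantic neighbor-based QoS estimation for a cold-start service.
# Embeddings and QoS values are synthetic stand-ins for BERT description
# embeddings and observed response times.
import numpy as np

rng = np.random.default_rng(1)
known_embeddings = rng.normal(size=(50, 768))     # services with QoS history
known_qos = rng.uniform(0.1, 2.0, size=50)        # e.g., mean response time (s)
new_embedding = known_embeddings[3] + 0.05 * rng.normal(size=768)  # cold-start

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

sims = np.array([cosine(new_embedding, e) for e in known_embeddings])
top_k = np.argsort(sims)[-5:]                      # 5 most similar services
weights = sims[top_k] / sims[top_k].sum()
estimate = float(weights @ known_qos[top_k])
print(f"Estimated response time for cold-start service: {estimate:.2f} s")
```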
In the Large Language Model (LLM) era, BERT-based encoders remain highly competitive for QoS-aware web service recommendation when cost, latency, or data-governance constraints are binding. Their moderate parameter scale and low inference latency make them suitable for high-throughput QoS prediction scenarios, where thousands of user–service pairs must be scored in real time. Moreover, BERT-family models such as BERT and RoBERTa can be fine-tuned and deployed entirely on premise, allowing organizations to retain full control over sensitive service descriptions and QoS logs, whereas GPT-class LLMs are typically accessed via external APIs with higher computational and governance overhead. Table 14 presents the comparison between BERT and GPT-style LLMs.
Table 14. Practical comparison between BERT-family encoders and GPT-style LLMs for QoS-aware service recommendation.
Overall, recent research indicates that BERT-style embeddings enhance web service selection and recommendation by bringing in rich textual knowledge. When integrated carefully (often in hybrid models), they lead to more accurate QoS predictions and more relevant ranked lists of services. However, effectiveness depends on having sufficient textual data (service descriptions, reviews, API docs) and computational resources for training.

4.4. Future Directions of Using BERT in the Context of Web Service Selection and Recommendation as Well as QoS Predictions

From the review, academic research is moving towards adapting BERT models to domain-specific tasks, fusing multiple data modalities, and combining BERT with other models to overcome the challenges of sparse or heterogeneous data. In one study [], the authors addressed BERT’s domain bias by adding a lightweight transformation layer and contrastive objectives to align item representations across domains. Similarly, another study [] integrated BERT’s semantic embeddings of item descriptions with collaborative filtering signals in a graph-based hybrid model called BeLightRec.
In QoS prediction, large language models (LLMs) are used to extract latent features from service and user descriptions; the authors showed that combining BERT-like sentence embeddings with collaborative filtering (CF) dramatically improves QoS estimates, cutting prediction error by over 10 to 20% compared to CF alone.
Table 15 summarizes the academic focus as well as the industry or the current application trends.
Table 15. Summary of direction in academic settings and industries.

5. Conclusions

This review provides a detailed overview of how BERT and its variants are advancing research in web service selection, recommendation, and Quality of Service (QoS) prediction. The evidence from recent studies shows that BERT-based models generally achieve better performance than traditional and earlier deep learning approaches. By learning contextual relationships within service descriptions and user requirements, BERT helps bridge the gap between user intent and service functionality, leading to more accurate and adaptive service selection processes.
At the same time, several challenges remain. The computational demands of transformer models, the shortage of domain specific training data, and the absence of standardized benchmarks continue to limit broader adoption. Another concern is the limited interpretability of BERT-based systems, which makes it difficult to fully understand how decisions are made. Addressing these issues is essential for making such models practical in large-scale and real-time service environments.
Research is moving toward developing lighter, more transparent, and more efficient BERT variants that can operate effectively in diverse and resource-constrained settings. Promising directions include combining BERT with graph neural networks, reinforcement learning, and knowledge graphs to improve contextual reasoning and scalability. Advances in model compression, distillation, and prompt-based learning can also help reduce complexity while maintaining performance.
To move the field forward, we highlight three research priorities: standardized QoS benchmarks, interpretable and efficient BERT variants, and multi-modal QoS reasoning. In summary, BERT continues to play a central role in shaping intelligent, context-aware, and QoS-driven web service ecosystems. Its ability to understand and represent meaning-rich text positions it as a cornerstone for the next generation of adaptive and trustworthy service management systems that better align with user expectations and operational goals.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/fi17120543/s1, File S1: PRISMA 2020 Main Checklist. The full PRISMA checklist and flow diagram can be downloaded at: https://doi.org/10.17605/OSF.IO/RKZBJ.

Author Contributions

Conceptualization, V.M.R. and R.K.R.; Methodology, V.M.R. and R.K.R.; Software, V.M.R.; Validation, V.M.R., R.K.R., and M.S.S.; Formal analysis, V.M.R.; Writing—original draft preparation, R.K.R. and M.S.S.; Writing—review and editing, R.K.R. and M.S.S.; Supervision, R.K.R.; Project administration, R.K.R. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by Multimedia University.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hasnain, M.; Pasha, M.F.; Ghani, I.; Mehboob, B.; Imran, M.; Ali, A. Benchmark dataset selection of web services technologies: A factor analysis. IEEE Access 2020, 8, 53649–53665.
  2. Sun, X.; Wang, S.; Xia, Y.; Zheng, W. Predictive-trend-aware composition of web services with time-varying quality-of-service. IEEE Access 2020, 8, 1910–1921.
  3. Yuan, Y.; Guo, Y.; Ma, W. Dynamic service composition method based on zero-sum game integrated inverse reinforcement learning. IEEE Access 2023, 11, 111897–111908.
  4. Rajendran, V.; Ramasamy, R.K.; Mohd-Isa, W.N. Improved eagle strategy algorithm for dynamic web service composition in the IoT: A conceptual approach. Future Internet 2022, 14, 56.
  5. Liu, Q.; Wang, L.; Du, S.; Wyk, B.J.V. A method to enhance web service clustering by integrating label-enhanced functional semantics and service collaboration. IEEE Access 2024, 12, 61301–61311.
  6. Bonab, M.N.; Tanha, J.; Masdari, M. A semi-supervised learning approach to quality-based web service classification. IEEE Access 2024, 12, 50489–50503.
  7. Kowsher, M.; Sami, A.A.; Prottasha, N.J.; Arefin, M.S.; Dhar, P.K.; Koshiba, T. Bangla-bert: Transformer-based efficient model for transfer learning and language understanding. IEEE Access 2022, 10, 91855–91870.
  8. Kim, M.; Lee, S.; Oh, Y.; Choi, H.; Kim, W. A near-real-time answer discovery for open-domain with unanswerable questions from the web. IEEE Access 2020, 8, 158346–158355.
  9. Zhang, C.; Qin, S.; Wu, H.; Zhang, L. Cooperative mashup embedding leveraging knowledge graph for web api recommendation. IEEE Access 2024, 12, 49708–49719.
  10. Ramasamy, R.K.; Chua, F.F.; Haw, S.C.; Ho, C.K. WSFeIn: A novel, dynamic web service composition adapter for cloud-based mobile application. Sustainability 2022, 14, 13946.
  11. Roy, D.; Dutta, M. A systematic review and research perspective on recommender systems. J. Big Data 2022, 9, 59.
  12. Ghafouri, S.H.; Hashemi, S.M.; Hung, P.C. A survey on web service QoS prediction methods. IEEE Trans. Serv. Comput. 2020, 15, 2439–2454.
  13. Kumar, S.; Chattopadhyay, S.; Adak, C. TPMCF: Temporal QoS Prediction Using Multi-Source Collaborative Features. IEEE Trans. Netw. Serv. Manag. 2024, 21, 3945–3955.
  14. Liu, H.; Zhang, Z.; Li, H.; Wu, Q.; Zhang, Y. Large Language Model Aided QoS Prediction for Service Recommendation. arXiv 2024.
  15. Atzeni, D.; Bacciu, D.; Mazzei, D.; Prencipe, G. A systematic review of Wi-Fi and machine learning integration with topic modeling techniques. Sensors 2022, 22, 4925.
  16. Xu, Z.; Gu, Y.; Yao, D. WARBERT: A Hierarchical BERT-based Model for Web API Recommendation. arXiv 2025.
  17. Li, M.; Xu, H.; Tu, Z.; Su, T.; Xu, X.; Wang, Z. A deep learning based personalized QoE/QoS correlation model for composite services. In Proceedings of the 2022 IEEE International Conference on Web Services (ICWS), Barcelona, Spain, 10–16 July 2022; IEEE: New York, NY, USA, 2022; pp. 312–321.
  18. Long, S.; Tan, J.; Mao, B.; Tang, F.; Li, Y.; Zhao, M.; Kato, N. A Survey on Intelligent Network Operations and Performance Optimization Based on Large Language Models. IEEE Commun. Surv. Tutor. 2025.
  19. Koudouridis, G.P.; Shalmashi, S.; Moosavi, R. An evaluation survey of knowledge-based approaches in telecommunication applications. Telecom 2024, 5, 98–121.
  20. Alsayed, A.S.; Dam, H.K.; Nguyen, C. MicroRec: Leveraging Large Language Models for Microservice Recommendation. In MSR ‘24: Proceedings of the 21st International Conference on Mining Software Repositories, Lisbon, Portugal, 15–16 April 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 419–430.
  21. Liu, H.; Zhang, W.; Zhang, X.; Cao, Z.; Tian, R. Context-aware and QoS prediction-based cross-domain microservice instance discovery. In Proceedings of the 2022 IEEE 13th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 21–23 October 2022; IEEE: New York, NY, USA, 2022; pp. 30–34.
  22. Meghazi, H.M.; Mostefaoui, S.A.; Maaskri, M.; Aklouf, Y. Deep Learning-Based Text Classification to Improve Web Service Discovery. Comput. Y Sist. 2024, 28, 529–542.
  23. Zeng, K.; Paik, I. Dynamic service recommendation using lightweight BERT-based service embedding in edge computing. In Proceedings of the 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), Singapore, 20–23 December 2021; IEEE: New York, NY, USA, 2021; pp. 182–189.
  24. Zhang, P.; Ren, J.; Huang, W.; Chen, Y.; Zhao, Q.; Zhu, H. A deep-learning model for service QoS prediction based on feature mapping and inference. IEEE Trans. Serv. Comput. 2023, 17, 1311–1325.
  25. Alam, K.A.; Haroon, M. Evaluating Fine-tuned BERT-based Language Models for Web API Recommendation. In Proceedings of the 2024 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), Abu Dhabi, United Arab Emirates, 9–11 December 2024; IEEE: New York, NY, USA, 2024; pp. 135–142.
  26. Karapantelakis, A.; Alizadeh, P.; Alabassi, A.; Dey, K.; Nikou, A. Generative AI in mobile networks: A survey. Ann. Telecommun. 2024, 79, 15–33.
  27. Bhanage, D.A.; Pawar, A.V.; Kotecha, K. IT infrastructure anomaly detection and failure handling: A systematic literature review focusing on datasets, log preprocessing, machine & deep learning approaches and automated tool. IEEE Access 2021, 9, 156392–156421.
  28. Qu, G.; Chen, Q.; Wei, W.; Lin, Z.; Chen, X.; Huang, K. Mobile edge intelligence for large language models: A contemporary survey. IEEE Commun. Surv. Tutor. 2025.
  29. Hameed, A.; Violos, J.; Santi, N.; Leivadeas, A.; Mitton, N. FeD-TST: Federated Temporal Sparse Transformers for QoS prediction in Dynamic IoT Networks. IEEE Trans. Netw. Serv. Manag. 2024, 22, 1055–1069.
  30. Huang, W.; Zhang, P.; Chen, Y.; Zhou, M.; Al-Turki, Y.; Abusorrah, A. QoS Prediction Model of Cloud Services Based on Deep Learning. IEEE/CAA J. Autom. Sin. 2022, 9, 564–566.
  31. Le, F.; Srivatsa, M.; Ganti, R.; Sekar, V. Rethinking data-driven networking with foundation models: Challenges and opportunities. In Proceedings of the 21st ACM Workshop on Hot Topics in Networks, Austin, TX, USA, 14–15 November 2022; pp. 188–197.
  32. Jeong, C.; Jang, S.; Shin, H.; Park, E.; Choi, S. A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks. arXiv 2019.
  33. Liu, M.; Xu, H.; Sheng, Q.Z.; Wang, Z. QoSGNN: Boosting QoS Prediction Performance with Graph Neural Networks. IEEE Trans. Serv. Comput. 2023, 17, 645–658.
  34. Lian, H.; Li, J.; Wu, H.; Zhao, Y.; Zhang, L.; Wang, X. Toward Effective Personalized Service QoS Prediction From the Perspective of Multi-Task Learning. IEEE Trans. Netw. Serv. Manag. 2023, 20, 2587–2597.
  35. Jirsik, T.; Trčka, Š.; Celeda, P. Quality of service forecasting with LSTM neural network. In Proceedings of the 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), Washington, DC, USA, 8–12 April 2019; IEEE: New York, NY, USA, 2019; pp. 251–260.
  36. Guo, C.; Zhang, W.; Dong, N.; Liu, Z.; Xiang, Y. QoS-aware diversified service selection. IEEE Trans. Serv. Comput. 2022, 16, 2085–2099.
  37. Boulakbech, M.; Messai, N.; Sam, Y.; Devogele, T. Deep learning model for personalized web service recommendations using attention mechanism. In Proceedings of the International Conference on Service-Oriented Computing, Rome, Italy, 28 November–1 December 2023; Springer Nature: Cham, Switzerland, 2023; pp. 19–33.
  38. Xue, L.; Zhang, F. Lcpcwsc: A web service classification approach based on label confusion and priori correction. Int. J. Web Inf. Syst. 2024, 20, 213–228.
  39. Huang, Y.; Cao, Z.; Chen, S.; Zhang, X.; Wang, P.; Cao, Q. Interpretable web service recommendation based on disentangled representation learning. J. Intell. Fuzzy Syst. 2023, 45, 133–145.
  40. Wang, X.; Zhou, P.; Wang, Y.; Liu, X.; Liu, J.; Wu, H. Servicebert: A pre-trained model for web service tagging and recommendation. In Proceedings of the International Conference on Service-Oriented Computing, Online, 22–25 November 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 464–478.
  41. Yang, Y.; Qamar, N.; Liu, P.; Grolinger, K.; Wang, W.; Li, Z.; Liao, Z. Servenet: A deep neural network for web services classification. In Proceedings of the 2020 IEEE International Conference on Web Services (ICWS), Beijing, China, 19–23 October 2020; pp. 168–175.
  42. Wang, Z.; Zhang, X.; Li, Z.S.; Yan, M. QoSBERT: An Uncertainty-Aware Approach Based on Pre-trained Language Models for Service Quality Prediction. IEEE Trans. Serv. Comput. 2025, 1–13.
  43. Liu, P.; Zhang, L.; Gulla, J.A. Pre-train, prompt, and recommendation: A comprehensive survey of language modeling paradigm adaptations in recommender systems. Trans. Assoc. Comput. Linguist. 2023, 11, 1553–1571.
  44. Van, M.M.; Tran, T.T. BeLightRec: A Lightweight Recommender System Enhanced with BERT. In Proceedings of the International Conference on Intelligent Systems and Data Science, Nha Trang, Vietnam, 9–10 November 2024; Springer Nature: Singapore, 2024; pp. 30–43.
  45. Kharidia, V.; Paprunia, D.; Kanikar, P. LightFusionRec: Lightweight Transformers-Based Cross-Domain Recommendation Model. In Proceedings of the 2024 First International Conference on Software, Systems and Information Technology (SSITCON), Tumkur, India, 18–19 October 2024; IEEE: New York, NY, USA, 2024; pp. 1–7.
  46. Liu, Q.; Zhao, X.; Wang, Y.; Wang, Y.; Zhang, Z.; Sun, Y.; Li, X.; Wang, M.; Jia, P.; Chen, C.; et al. Large language model enhanced recommender systems: Taxonomy, trend, application and future. arXiv 2024, arXiv:2412.13432.
  47. Singh, S. BERT Algorithm used in Google Search. Math. Stat. Eng. Appl. 2021, 70, 1641–1650.
  48. Sun, F.; Liu, J.; Wu, J.; Pei, C.; Lin, X.; Ou, W.; Jiang, P. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1441–1450.
  49. Fine-Tune and Host Hugging Face Bert Models on Amazon Sagemaker|AWS Machine Learning Blog. Available online: https://aws.amazon.com/blogs/machine-learning/fine-tune-and-host-hugging-face-bert-models-on-amazon-sagemaker/ (accessed on 23 October 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
