Service Discovery Method Based on Knowledge Graph and Word2vec

Zhou, Junkai; Jiang, Bo; Yang, Jie; Yang, Junchen; Li, Hang; Wang, Ning; Wang, Jiale

doi:10.3390/electronics11162500


by Junkai Zhou ¹,†, Bo Jiang ¹, Jie Yang ²,*, Junchen Yang ¹,†, Hang Li ¹,†, Ning Wang ¹ and Jiale Wang ¹

¹ School of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China

² School of Information, Zhejiang University of Finance & Economics Dongfang College, Haining 314408, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Electronics 2022, 11(16), 2500; https://doi.org/10.3390/electronics11162500

Submission received: 11 July 2022 / Revised: 4 August 2022 / Accepted: 9 August 2022 / Published: 10 August 2022

(This article belongs to the Section Computer Science & Engineering)


Abstract:

A mashup is a type of application that integrates multiple Web APIs. For mashup development, the quality of the selected APIs is particularly important. However, with the rapid development of Internet technology, the number of Web APIs is increasing rapidly, and it is unrealistic for mashup developers to manually select appropriate APIs from such a large number of services. Existing methods suffer from data sparsity, because each mashup is related to only a few APIs, and from over-reliance on semantic information. To solve these problems, we propose a service discovery approach based on a knowledge graph (SDKG). We embed service-related information into the knowledge graph, which alleviates the impact of data sparsity and mines deep relationships between services, thereby improving the accuracy of service discovery. Experimental results show that our approach achieves clearly higher accuracy than existing mainstream service discovery approaches.

Keywords:

knowledge graph; service discovery; word2vec

1. Introduction

With the rapid increase in the number of Web services, finding appropriate Web services has become a difficult problem for developers. Service recommendation methods can provide developers with personalized Web service recommendations [1,2,3,4,5,6,7,8]. However, the actively recommended Web services are often popular services that developers have used before, so service recommendation is of limited use when developers want to try new services [9]. In contrast, a service discovery [10] method provides Web services in response to the user’s request, so the services it returns in a specific situation are more in line with the user’s needs.

Early service discovery was mainly based on keyword matching [11]; the similarity between the keywords in the user request and the keywords in the service description is judged, and the service with high similarity is provided to the user. However, keyword-based service discovery methods have poor accuracy and performance. Therefore, many scholars have proposed service discovery methods based on information retrieval model matching. However, both the information retrieval model based on the vector space model (VSM) [12] and the information retrieval model based on the term frequency–inverse document frequency (TF–IDF) [13] are affected by the quality of service description. When the service description is incomplete or inaccurate, the quality of the service obtained by this method is not high. To obtain semantically related services, some scholars have proposed ontology-based [14] service discovery methods. This method can effectively identify the function of the service and calculate the semantic similarity [15] between the request and the service. However, the construction of the ontology is time-consuming and labor-intensive, and it is also prone to errors. In addition, clustering-based service discovery methods [16] reduce the scope of service indexes through service clustering algorithms and generally improve performance.

Most existing service discovery methods match services using only the service description documents and ignore other service information in the requirement text, which leaves the conditions required of the target service vague. The Word2vec model makes use of the context of each word, so it captures richer semantic information when calculating the text similarity between the requirement statement and the service description, thus improving the accuracy of service discovery. The knowledge graph mines the deep relationships between services and reduces the dependence on text, so user needs are no longer inferred from the service description information alone. In this paper, a service discovery method based on a knowledge graph (SDKG) [17] is proposed to address these problems. Suppose the requirement question is “I have used the services Google maps and Travel manager, what other Web services may I need?”. Using our method, we first identify the service entities “Google maps” and “Travel manager” and the question word “what other Web services”. Then, the service matching template is determined according to the entity type and the question word type; the service knowledge graph is queried to find other available services related to the services already used; the text similarity between the requirement statement and each candidate service description is calculated; finally, the service discovery result is obtained. In addition, previous service discovery models can only obtain services through information such as function descriptions. The SDKG method can not only obtain services through descriptions of their functions, application fields, tags, and categories, but can also obtain the functions, application fields, tags, and categories of a service according to the service name. At the same time, developers can discover Web services that they may need based on the Web services already used in the project.

Our main contributions are as follows:

We propose a service discovery method based on the knowledge graph and word2vec [18]. This method effectively improves the accuracy of service discovery and can mine deeper relationships between services.
We obtain the service entities using Cypher [19], the query language of Neo4j [20], and determine the relationship between the condition entity and the target entity according to the entity type and the question word type.
We obtained all the real services from ProgrammableWeb. Then, the ten categories of services with the largest number of services were selected from the constructed service set as the test set, and a total of 5565 requirement statements were constructed. The results show that our method outperforms existing related methods. To facilitate the reproducibility of our work, we provide an online reproduction package, which is publicly available via https://github.com/zhoujunkai/Service_Axioms (accessed on 11 June 2022).

This paper is structured as follows: Section 2 discusses some representative related work. Section 3 details the SDKG method. Section 4 presents the experimental design and provides an in-depth analysis of the experimental results. In Section 5, we give a summary of our work.

2. Related Work

Service discovery matches appropriate services to the developer’s request, which is a passive way of obtaining services. With the rapid increase in the number of Web services, discovering services accurately and efficiently has become an active research topic in the field of service computing.

At present, the research in the direction of service discovery has made great progress. Many scholars have proposed effective service discovery methods. These methods mainly include service discovery methods based on information retrieval model matching, semantic-based service discovery methods, and clustering-based service discovery methods. Although these service discovery methods have achieved good results, they also have some shortcomings.

2.1. Service Discovery Method Based on Information Retrieval Model Matching

Early service discovery obtained services through keyword matching. For example, Palathingal et al. [11] proposed a service discovery method based on keyword matching. The method extracts the keywords in the request and selects other expert agents from the system repository of the expert domain of the service agent. Subsequently, the selected expert agents pass the service parameters belonging to their domain of expertise to the combined agent, which discovers candidate services according to the user’s selection. Jordy et al. [21] extracted a keyword set “TextDescription” from the service description to evaluate the relevance between the service and the request. However, building such a keyword collection is time-consuming and inaccurate.

Although the service discovery method based on keyword matching is simple, its query accuracy is low. For this reason, many researchers have proposed service discovery methods based on information retrieval model matching, which change the goal from finding relevant services to finding relevant documents. Among them, Lee et al. [12] proposed representing Web service descriptions as vectors based on the VSM. The request statement used to query services is also represented as a vector, and the service vector closest to the query vector in the vector space is then retrieved. Since the VSM relies on words, the accuracy of this method is affected by how developers name and describe the services in the documents.

In addition to the VSM model, many scholars have tried to use the TF–IDF model in the information retrieval model. Zhong et al. [13] proposed a service discovery model based on TF–IDF. The model uses the TF–IDF algorithm to evaluate the importance of words in the document, thereby calculating the textual similarity between the service description and the request sentence. The candidate services are then ranked by similarity score and returned to the user. Experiments show that the service discovery method based on TF–IDF has a better effect than the service discovery method based on the VSM. However, since TF–IDF needs to evaluate the occurrence of each word in the Web service in the vector space, the entire vector space must be recomputed every time a new service is published. Recomputing the vector space for a large number of services is not advisable.
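The TF–IDF matching idea described above can be illustrated with a small, self-contained sketch. This is not the implementation of Zhong et al. [13]; the smoothed IDF formula, the whitespace tokenizer, the function names, and the two toy service descriptions are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (assumption; real systems also strip punctuation).
    return text.lower().split()

def build_idf(tokenized_docs):
    # Smoothed inverse document frequency computed over all service descriptions.
    n_docs = len(tokenized_docs)
    df = Counter(term for tokens in tokenized_docs for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + count)) + 1.0 for term, count in df.items()}

def vectorize(tokens, idf):
    # Sparse TF-IDF vector; terms unknown to the corpus are ignored.
    tf = Counter(t for t in tokens if t in idf)
    total = sum(tf.values())
    return {t: (c / total) * idf[t] for t, c in tf.items()} if total else {}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_services(request, services):
    # services: {service name: description text}; returns (name, score), best first.
    names = list(services)
    tokenized = [tokenize(services[name]) for name in names]
    idf = build_idf(tokenized)  # must be rebuilt whenever a service is added
    query = vectorize(tokenize(request), idf)
    scored = [(name, cosine(query, vectorize(tokens, idf)))
              for name, tokens in zip(names, tokenized)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical service names and descriptions).
SERVICES = {
    "geo": "find the latitude and longitude for an address",
    "sms": "send a text message to a phone",
}
ranking = rank_services("send a message", SERVICES)
```

Note that `build_idf` depends on the whole corpus, which is why publishing a new service forces the weights of every existing vector to be recomputed.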

2.2. Semantic-Based Service Discovery Approach

The semantic-based service discovery method identifies the function of a service from its description text in order to judge the correlation between the request and the service. Compared with traditional syntax-based service discovery methods, semantic-based methods can automatically discover Web services and more accurately discover appropriate services based on requests.

At present, the most popular method for measuring the semantic similarity of services is the ontology method, so many scholars apply the ontology method to the semantic-based service discovery method to identify the semantic similarity between the request and the available services. Paolucci et al. [22] proposed a service discovery method based on semantic matching of the DARPA Agent Markup Language (DAML) ontology. The method achieves matching of the input and output by semantically referencing the description concept of the DAML in published services and queries. A published service will match a request when all query outputs correspond to published service outputs and all published service inputs correspond to query inputs. There are four levels of matching: exact matching, plug-in matching, inclusive matching, and failed matching. In effect, the output is matched first, and the service with the highest rating is selected when matching. Input matching is only performed if the output matching degree is exact. The matching algorithm compares the output and input one by one and stores the best matching output and input. Bener et al. [23] use an ontology-based context model to describe and obtain context information and then perform service matching based on the input and output. Finally, the degree of service matching is divided into exact matching, plug-in matching, and inclusive matching.

Paliwal et al. [24] proposed a service discovery algorithm combining the ontology method and Latent Semantic Indexing (LSI), which extends the indexing process from simple syntactic information to the semantic level. In the experiments, the authors compared discovering services from a set of classified services with discovering services from a set of unclassified services, and the results showed that applying LSI with extended service requests to a set of classified Web services is more effective. Here, the extended service request includes additional clauses relevant to the initial service request. The results indicate that classifying the service collection localizes the search and provides a more appropriate service match for the requested functionality.

Amorim et al. [25] proposed a Web service discovery method based on ontology and constraint specification. The method has both a functional semantic algorithm and a descriptive semantic algorithm; the descriptive semantic algorithm is only invoked if the functional semantic algorithm fails to match a service. Their analysis showed that this approach can handle situations that purely semantic approaches or other techniques, such as syntactic textual analysis of concepts, could not handle before. This means that service discovery based on semantic filters and ontology structure analysis can improve retrieval accuracy and performance. Although ontology-based service discovery methods can accurately match functions semantically, constructing the ontology is very complicated and greatly increases the workload. Building an ontology is also a tedious and error-prone task. Furthermore, the current lack of standards for integrated or reusable service ontologies also hinders the widespread use of this approach.

2.3. Cluster-Based Service Discovery Method

Service clustering groups services according to their functional similarity and domain similarity. Studies have shown that, as the number of services grows, service clustering can effectively improve the accuracy of service discovery and greatly reduce the time needed to discover services. On the one hand, service clustering can help developers narrow down the scope of the search, thereby improving the efficiency of service discovery. On the other hand, service clustering improves the accuracy of service discovery by matching services with similar functions and domains.

Cristina et al. [26] proposed a service discovery method based on ant colony algorithm clustering. The method first calculates the semantic similarity of the service description texts through the ant colony algorithm and uses the semantic similarity as the standard of service clustering. To measure the similarity between services more accurately, this method judges the similarity between services by evaluating the degree of matching between two service ontology concepts. After the Web services are clustered according to the similarity, the scope of the query is located into a specific service category through service requests, and the appropriate service is matched and selected in the specific service category. Finally, the experiments evaluate the performance and accuracy of the method using the Dunn index and within-cluster variance measures. The experimental results show that, compared with other service discovery methods, the service discovery method based on ant colony algorithm clustering can greatly improve both performance and accuracy.

Liu et al. [27] extracted the four features of content, context, hostname, and service name from the Web Services Description Language (WSDL) documents and then clustered them by computing the similarity of features between services. They innovatively used the clustering process as a preprocessor for service discovery and obtained services from the corresponding categories. Meanwhile, Elgazzar et al. [28] also clustered and discovered services based on the similarity of the WSDL documents. However, the difference is that they extracted the five features of content, type, message, port, and service name in the WSDL document and clustered them according to the similarity of this information. The above researchers treated terms in the WSDL texts as isolated words, while ignoring the semantic associations between texts. Liu et al. [29] calculated the semantic distance between terms of service with external knowledge and made full use of the terms in the WSDL to reflect the underlying semantics of Web services. Experiments showed that this method has good results in both service clustering and service discovery.

3. SDKG Method

An overview of our proposed SDKG approach is shown in Figure 1.

First, we analyzed the user’s service demand question and extracted the entities and interrogative words from it, then used the prebuilt entity dictionary and interrogative word dictionary to determine the entity type and the interrogative word type. From these two types, we determined the service matching template. Next, we constructed the Cypher query statement according to the matching template and used it to obtain the candidate service entity set from the service knowledge graph. Finally, we calculated the text similarity according to the situation and ranked the services in the candidate set by similarity to form the final service set.

3.1. The Construction of the Service Knowledge Graph

To mine deep relationships among Web services in service discovery, we constructed a service knowledge graph that differs from the one used in service recommendation. Compared with the knowledge graph in service recommendation, the graph used in the SDKG method adds the important entity type “function” and removes the entity type “mashup”. The relationship between an API and a function is “used_to”; the relationship between an API and a category is “belong_to”; the relationship between an API and a tag is “tag”.

The entity relationship diagram of the service knowledge graph is shown in Figure 2. From Figure 2, we can see that API1 and API2 are linked because they both have the same relationship to Tag1. Likewise, API1 and API3 are linked because they both have the same relationship to Function1. API3 does not have tags, but it is still connected to the other services: (i) it is associated with API1 because they share Function1, and (ii) it is associated with API2 because they share Category1. This shows that the knowledge graph can mine potential relationships between services well. Of course, Figure 2 is only a very simple entity relationship diagram. In the knowledge graph we created, two service entities may be connected through multiple relationships, which general semantic-based service discovery methods cannot capture.
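The linking behavior described for Figure 2 can be sketched with a tiny in-memory triple list. The triples below are reconstructed only from the prose description of Figure 2 (the figure itself may contain more edges), and the function names are illustrative assumptions.

```python
from collections import defaultdict

# (API, relationship, target entity) triples as described in the text for Figure 2.
TRIPLES = [
    ("API1", "tag", "Tag1"),
    ("API2", "tag", "Tag1"),
    ("API1", "used_to", "Function1"),
    ("API3", "used_to", "Function1"),
    ("API2", "belong_to", "Category1"),
    ("API3", "belong_to", "Category1"),
]

def build_index(triples):
    # Map each (relationship, target) pair to the set of APIs attached to it.
    index = defaultdict(set)
    for api, relation, target in triples:
        index[(relation, target)].add(api)
    return index

def related_apis(api, triples):
    # Return {other API: set of (relationship, target) pairs shared with `api`}.
    index = build_index(triples)
    links = defaultdict(set)
    for subject, relation, target in triples:
        if subject != api:
            continue
        for other in index[(relation, target)] - {api}:
            links[other].add((relation, target))
    return dict(links)

api3_links = related_apis("API3", TRIPLES)
```

Here API3 has no tag edge at all, yet it is still reachable from API1 through Function1 and from API2 through Category1.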

To better utilize the service knowledge graph to query related service information in the service discovery method SDKG, we embedded entities and relationships into the graph database Neo4j through Cypher statements. The constructed partial service knowledge graph is shown in Figure 3. Figure 3 mainly shows the relationship between API entities and service tags, through which we can discover services according to the service tags.
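As a rough illustration of how entities and relationships could be written to Neo4j, the sketch below only builds parameterized Cypher `MERGE` statements as strings; it does not connect to a database. The relationship names come from Section 3.1 and the `Api` and `Function` labels appear in the query quoted in Section 3.4, while the `Category` and `Tag` labels, the `name` property on target nodes, and the function name are assumptions.

```python
# Relationship type -> label of the target entity (Category and Tag labels are assumed).
RELATIONS = {"used_to": "Function", "belong_to": "Category", "tag": "Tag"}

def merge_statement(api_name, relation, target_name):
    # Relationship types and labels cannot be passed as Cypher parameters,
    # so they are taken from a fixed whitelist; values are passed as parameters.
    if relation not in RELATIONS:
        raise ValueError(f"unknown relationship: {relation}")
    label = RELATIONS[relation]
    query = (
        "MERGE (a:Api {name: $api}) "
        f"MERGE (t:{label} {{name: $target}}) "
        f"MERGE (a)-[:{relation}]->(t)"
    )
    return query, {"api": api_name, "target": target_name}

query, params = merge_statement("Google Maps", "tag", "mapping")
```

Using `MERGE` rather than `CREATE` keeps the import idempotent, so the same API, tag, or edge is not duplicated when the statement is executed twice.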

3.2. Dictionary Building

In the SDKG method, we first need to identify the entity and question word types in the user demand question. To easily and quickly identify the types of entities and interrogative words, we built a Chinese dictionary of interrogative words and a Chinese–English dictionary of entities. The specific dictionary information is shown in Table 1.

As can be seen from Table 1, we constructed a total of four entity types: service name, service label, service category, and service function, as well as four question word types: ask for the service name, ask for the service category, ask for the service tags, ask for the service function. Among them, the name, label, and category of the service were directly crawled from ProgrammableWeb to first build an English dictionary, and then, we translated the elements into Chinese to build a corresponding Chinese dictionary.

The function elements of Web services cannot be obtained directly, so we first obtained the description information of all Web services from ProgrammableWeb. The natural language processing tool Stanford Parser was then used to identify the parts of speech of the words in each Web service description and to generate the corresponding set of Stanford Dependencies (SD) [24] by analyzing the grammatical relationship between pairs of words. Each dependency is denoted as type (gov, dep), where gov is the governing word (verbs, prepositions, etc.) and dep is the dependent word (nouns, noun phrases, etc.). For example, the Web service named “yahoo geocoding” is a service published by Yahoo to determine the latitude and longitude of a location, and its description information is “The Geocoding Web Service allows you to find the specific latitude and longitude for an address”. After processing by Stanford Parser, the set type (find, latitude and longitude) is obtained, which is the extracted function. This function, “find latitude and longitude”, is then translated into Chinese.
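The extraction step can be sketched on top of already-parsed dependencies. The sketch does not call Stanford Parser; the dependency tuples are hand-written approximations of what a parser might return for the example sentence, and the choice to keep only direct-object relations and merge `conj_and` conjuncts is an assumption about how (gov, dep) pairs become functions.

```python
def extract_functions(dependencies):
    # dependencies: list of (relation, governor, dependent) tuples.
    # Collect conjuncts so "latitude and longitude" stays a single object.
    conjuncts = {}
    for relation, gov, dep in dependencies:
        if relation == "conj_and":
            conjuncts.setdefault(gov, []).append(dep)
    functions = []
    for relation, gov, dep in dependencies:
        if relation == "dobj":  # verb -> direct object
            objects = [dep] + conjuncts.get(dep, [])
            functions.append((gov, " and ".join(objects)))
    return functions

# Hand-written, simplified dependencies for:
# "The Geocoding Web Service allows you to find the specific latitude
#  and longitude for an address"
DEPENDENCIES = [
    ("nsubj", "allows", "Service"),
    ("dobj", "find", "latitude"),
    ("conj_and", "latitude", "longitude"),
    ("prep_for", "latitude", "address"),
]
functions = extract_functions(DEPENDENCIES)
```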

The service request sentences we processed were in Chinese. To obtain the results more efficiently, we translated the English elements in the four English dictionaries of service name, label, category, and function into Chinese through the translation interface and mapped them to the corresponding Chinese dictionary. Taking the service category dictionary as an example, the mapping process is shown in Figure 4.

3.3. Service Inference Matching

After we preprocessed the user demand questions and obtained the entity set and question words, how to infer the structure of the query sentence according to the entities and question words is very important. To this end, we designed seven basic matching templates based on the combination of different entity types and question word types, as shown in Table 2.

It can be seen from Table 2 that the SDKG method restricts the form of the user’s demand question. The target of a query can only be one of four elements: service name, service category, service label, or service function. After analyzing a large number of user service demand statements, we found that these four elements are what users most often ask for. In addition, the conditions for obtaining service information can only be these four elements. Because the service category dictionary and the label dictionary share some elements, we stipulated that, when a service category or label is involved, the condition in the question must be preceded by an identifier stating whether it is a service category or a service label. Below, we use an example to illustrate the process of service inference matching. When the user’s service requirement question is “I have used the services of Google maps and travel manager, what other Web services may I need?”, the entities “Google maps” and “travel manager” are first identified by traversing the dictionary, and the type of the entity set is determined to be service name. Then, according to “what other Web services” in the question, the question word type is determined to be a query for the service name. Using the entity type and the question word type, it can be inferred that matching Model 1 is used, which determines the structure of the Cypher query to be constructed.
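The inference step in this example can be sketched as two dictionary lookups followed by a table lookup. The dictionaries below are tiny English stand-ins for the Chinese dictionaries of Table 1, and the template table contains only the combination named in the text (service name condition, service name target, Model 1) plus one combination mentioned in Section 3.4 whose template number is not given; all identifiers are illustrative assumptions.

```python
# Hypothetical miniature dictionaries (the real ones are built from ProgrammableWeb).
ENTITY_DICT = {
    "google maps": "service_name",
    "travel manager": "service_name",
    "mapping": "service_tag",
}
QUESTION_DICT = {
    "what other web services": "ask_service_name",
    "what function": "ask_service_function",
}
# (entity type, question word type) -> matching template.
TEMPLATES = {
    ("service_name", "ask_service_name"): "Model 1",
    ("service_name", "ask_service_function"): "name-to-function (number not given in the text)",
}

def find_entries(question, dictionary):
    # Dictionary traversal: keep every phrase that occurs in the question.
    text = question.lower()
    return [(phrase, kind) for phrase, kind in dictionary.items() if phrase in text]

def select_template(question):
    entities = find_entries(question, ENTITY_DICT)
    question_words = find_entries(question, QUESTION_DICT)
    if not entities or not question_words:
        return None
    entity_types = {kind for _, kind in entities}
    if len(entity_types) != 1:
        return None  # this sketch handles a single condition type only
    key = (entity_types.pop(), question_words[0][1])
    if key not in TEMPLATES:
        return None
    return TEMPLATES[key], [phrase for phrase, _ in entities]

result = select_template(
    "I have used the services of Google maps and travel manager, "
    "what other Web services may I need?"
)
```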

3.4. Query Based on Knowledge Graph

After determining the query statement structure through service inference, we need to construct the Cypher query statement. Cypher is the query language used by the graph database Neo4j; a Cypher statement retrieves the information that meets the requirements by matching entities and the relationships between them. It can obtain the entities that meet the conditions by querying the relationship between two different entity types. Therefore, before constructing a query statement, it is necessary to determine the relationship between entities according to the matching template. In the service knowledge graph we built, there are a total of four entity relationships, and the relationship between the conditional entity and the entity to be queried can be determined according to the entity type and the question word type.

In the SDKG method, each matching template has a corresponding Cypher query format. We generate a specific Cypher statement according to the number of entities in the entity set and use it to find the service information we need in Neo4j. For example, when word segmentation yields an API entity and the question contains a question word asking about the function, we query the function according to the API, and the corresponding query statement is used for matching and output. The corresponding query statement here should be “MATCH (n:Api)-[r:has]-(m:Function) where n.name=name return n.name As name, m.name As mname”. When the target of the query is service information such as service functions, the SDKG method returns all qualified results. Here, we increase the types of query targets and make the query fields more complete, which improves the quality of the returned results and thus the corresponding evaluation indicators. When the developer asks to discover suitable services, we reprocess the service entities returned by the knowledge graph to improve the quality of the discovered services. To explore the correlation between requirement sentences and services, we calculated the textual similarity between the requirement sentence and the description information of each candidate service in the knowledge graph. Finally, the Web services most relevant to the requirement statement are provided to the developer.
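One way to generate such statements from a matching template is sketched below. The first query follows the shape of the statement quoted above, but binds the API names through a `$names` parameter and `IN`, which is an assumption made to support entity sets of any size. The second query, for discovering other services that share any neighbor with the services already used, is entirely hypothetical and is not taken from the paper.

```python
def function_query(api_names):
    # Look up the functions of the given APIs (shape follows the quoted statement).
    if not api_names:
        raise ValueError("at least one API name is required")
    query = (
        "MATCH (n:Api)-[r:has]-(m:Function) "
        "WHERE n.name IN $names "
        "RETURN n.name AS name, m.name AS mname"
    )
    return query, {"names": list(api_names)}

def candidate_service_query(api_names):
    # Hypothetical query: other APIs connected to the used ones through
    # a shared function, category, or tag node.
    if not api_names:
        raise ValueError("at least one API name is required")
    query = (
        "MATCH (n:Api)--(shared)--(m:Api) "
        "WHERE n.name IN $names AND NOT m.name IN $names "
        "RETURN DISTINCT m.name AS name"
    )
    return query, {"names": list(api_names)}

query, params = function_query(["Google Maps", "Travel Manager"])
```

Passing the names as a parameter list rather than formatting them into the string keeps the statement the same for one entity or many, and avoids quoting problems in service names.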

3.5. Text Similarity Matching Based on Word2vec Model

When the developer asks to discover services, the SDKG method first obtains a set of candidate service entities from the knowledge graph through reasoning. To improve the quality of the discovered services, we translated the requirement statement from Chinese to English. Then, we used Word2vec to convert the words of the requirement statement and of each candidate service description into word vectors and averaged the word vectors of each text to obtain its text vector. Finally, the similarity between the requirement statement and each candidate service is calculated, and the top-N most similar services are selected as the final service discovery set.

Word2Vec [30] is a language model that converts words into vectors. It has two main variants, the Continuous Bag-of-Words (CBOW) model and the Skip-gram model. Our proposed SDKG method uses the Skip-gram [31] model, whose training objective is to predict the context words from the input word.

Figure 5 shows the neural network model, where $w_i$ is the input word. The model maps $w_i$ to a word vector $V_{w_i}$ and uses $V_{w_i}$ to predict the word vectors of the k neighboring words on each side of $w_i$. The value of k in the figure is 2. Given a sequence of words $w_1, w_2, \dots, w_N$, the embedding matrix M and the weights of the neural network are optimized by training to maximize the value of the following objective function:

$$\frac{1}{N} \sum_{i=1}^{N} \sum_{-k \leq j \leq k,\, j \neq 0} \log P(w_{i+j} \mid w_i) \tag{1}$$

The probability $P(w_{i+j} \mid w_i)$ is calculated as shown in Equation (2):

$$P(w_{i+j} \mid w_i) = \frac{\exp(V_{w_{i+j}} \cdot V_{w_i})}{\sum_{w} \exp(V_w \cdot V_{w_i})} \tag{2}$$

where w ranges over all the words in the vocabulary.
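The objective and the softmax probability above can be checked numerically on a toy vocabulary. The sketch below follows the formulas as written, with a single vector per word; practical Word2vec implementations typically keep separate input and output vectors and avoid the full softmax. The two-dimensional vectors and the function names are assumptions made for illustration.

```python
import math

def context_pairs(words, k=2):
    # (center, context) pairs for all neighbors within k positions on each side.
    pairs = []
    for i, center in enumerate(words):
        for j in range(-k, k + 1):
            if j != 0 and 0 <= i + j < len(words):
                pairs.append((center, words[i + j]))
    return pairs

def probability(context, center, vectors):
    # Softmax over the inner products of every vocabulary word with the center word.
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    scores = {w: dot(vec, vectors[center]) for w, vec in vectors.items()}
    shift = max(scores.values())  # subtract the maximum for numerical stability
    denominator = sum(math.exp(s - shift) for s in scores.values())
    return math.exp(scores[context] - shift) / denominator

def objective(words, vectors, k=2):
    # Average log-probability; positions near the boundary have fewer neighbors.
    total = sum(math.log(probability(context, center, vectors))
                for center, context in context_pairs(words, k))
    return total / len(words)

# Toy two-dimensional vectors (hypothetical values).
VECTORS = {"find": [0.9, 0.1], "latitude": [0.8, 0.3], "send": [0.1, 0.9]}
value = objective(["find", "latitude", "send"], VECTORS, k=2)
```

Training adjusts the vectors so that `value` increases, which happens when words that co-occur within the window receive larger inner products than words that do not.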

Through the Skip-gram model, we can obtain the word vector of each word in the text, and we can treat the text as a collection of all the words involved. To represent the whole text as a vector, we used average pooling to derive the text vector from the word vectors. The text vector $d_i$ is computed as shown in Formula (3):

$$d_i = \frac{1}{J_i} \sum_{j=1}^{J_i} V_{i,j} \tag{3}$$

where $V_{i,j}$ represents the vector of the $j$-th word of the $i$-th text and $J_i$ is the number of words in the text.

With the text vector, we can calculate the similarity between the requirement text and the service description text. We used the cosine similarity to calculate the similarity between text vectors, and the calculation formula is shown in Formula (4):

$$\mathrm{Sim}(d_m, d_n) = \frac{\sum_{i=1}^{D} d_{m,i}\, d_{n,i}}{\sqrt{\sum_{i=1}^{D} d_{m,i}^{2}} \times \sqrt{\sum_{i=1}^{D} d_{n,i}^{2}}} \tag{4}$$

where $d_{m,i}$ is the $i$-th component of the text vector $d_m$ and D is the dimension of the vectors, which we set to 100.

After calculating the similarity between the requirement text and the service description texts, we provide the developer with the top-N services whose semantic functions are most similar to the requirement text.
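The pooling, similarity, and top-N steps of this section can be put together in a short sketch. It assumes the word vectors are already trained and available as a plain dictionary; the two-dimensional toy vectors (instead of D = 100), the rule of skipping out-of-vocabulary words, and all names are assumptions.

```python
import math

def text_vector(tokens, word_vectors):
    # Average pooling over the vectors of the words found in the vocabulary.
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_similarity(dm, dn):
    dot = sum(a * b for a, b in zip(dm, dn))
    norm = math.sqrt(sum(a * a for a in dm)) * math.sqrt(sum(b * b for b in dn))
    return dot / norm if norm else 0.0

def top_n_services(request_tokens, candidates, word_vectors, n):
    # candidates: {service name: tokens of its description}.
    request_vec = text_vector(request_tokens, word_vectors)
    if request_vec is None:
        return []
    scored = []
    for name, tokens in candidates.items():
        vec = text_vector(tokens, word_vectors)
        if vec is not None:
            scored.append((name, cosine_similarity(request_vec, vec)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Toy word vectors and candidate services (hypothetical values).
WORD_VECTORS = {
    "send": [1.0, 0.0], "message": [1.0, 0.2],
    "find": [0.0, 1.0], "location": [0.1, 1.0],
}
CANDIDATES = {"sms": ["send", "text", "message"], "geo": ["find", "location"]}
top = top_n_services(["send", "message"], CANDIDATES, WORD_VECTORS, 1)
```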

4. Empirical Evaluation

4.1. Dataset

The current mainstream service discovery methods discover suitable Web services based on the similarity of semantic functions. Therefore, in the experimental comparison, we only considered the part of the SDKG method that discovers services based on functions. For the authenticity and validity of the experiment, we crawled all real Web services from ProgrammableWeb. Then, we used the Stanford Parser tool to extract the predicates and objects in the description text of each Web service to build a feature set and used the feature set to construct requirement statements that mimic those written by developers. For each requirement statement, we composed the set of services that conform to its functional description and used this set as the label of the requirement statement. When a Web service discovered by a service discovery method for a requirement statement is included in the labeled service set, the match is considered successful. We selected the ten categories of services with the largest number of services from all Web services as the test set and constructed a total of 5565 requirement statements.

The requirement sentences input to our proposed SDKG method are in Chinese, while those input to the current mainstream service discovery methods are all in English. For this purpose, we prepared two test sets of the same size. One was composed of English demand statements, while the other was composed of the Chinese demand statements corresponding to each English demand statement. For example, when the constructed English requirement statement is “I want to send a message.”, the other set contains its corresponding Chinese demand statement. In the comparative experiments, the SDKG method uses the Chinese test set, while the rest of the comparison methods use the English test set.

4.2. Baseline Approaches

In the experiment, to better verify the effectiveness of the SDKG method, we compared it with the current popular service discovery methods. The methods we used for the comparative study are as follows:

(1) WSD-VSM:

The Web Service Discovery method based on the VSM (WSD-VSM) is a service discovery method based on the VSM [12]. The method uses the VSM to represent the service requirement sentences and service description texts as vectors and then calculates the similarity between the vectors. Finally, the Web services most similar to the requirement statement are selected and provided to the user.

(2) FBWSD:

The Web Service Discovery method Based on Functional semantics (FBWSD) is a function-based [32,33] service discovery method. The method first uses the Stanford Parser tool to tokenize the requirement statements and service description texts and then constructs a feature set based on the extracted predicates and objects. Finally, it calculates the functional similarity between the requirement statement and the service description text and provides the Web services that are functionally similar to the requirement statement to the user.

(3) SRMWSD-LDA:

The Service Discovery Method based on the LDA model and Semantic information Retrieval model (SRMWSD-LDA) is a service discovery method based on a semantic information retrieval model. This approach uses the Latent Dirichlet Allocation (LDA) model [34] to model requirement statements and Web services and then computes the similarity between requirement statements and Web services. Finally, the K-Nearest Neighbor (KNN) algorithm is used to select Web services related to the requirement statements and provide them to users.
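As a rough illustration of how the first two baselines might score a candidate service, the sketch below computes a cosine similarity over raw term-frequency vectors (in the spirit of WSD-VSM) and an overlap of (predicate, object) feature sets (in the spirit of FBWSD). The term weighting, the Jaccard overlap, the helper names, and the toy data are all assumptions made for illustration; the cited methods may weight terms and compare features differently.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of raw term-frequency vectors (VSM-style)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def feature_overlap(req_features, svc_features):
    """Jaccard overlap of (predicate, object) feature sets (assumed measure)."""
    req, svc = set(req_features), set(svc_features)
    union = req | svc
    return len(req & svc) / len(union) if union else 0.0


def top_n(requirement, services, score, n=5):
    """Rank services (name -> representation) by score, best first."""
    ranked = sorted(
        services, key=lambda name: score(requirement, services[name]), reverse=True
    )
    return ranked[:n]


# Hypothetical toy data, not taken from the ProgrammableWeb dataset.
descriptions = {
    "MessageHub": "send a text message to any phone",
    "MapFinder": "search for locations on a map",
}
features = {
    "MessageHub": {("send", "message")},
    "MapFinder": {("search", "locations")},
}

print(top_n("I want to send a message", descriptions, cosine_similarity))
print(top_n({("send", "message")}, features, feature_overlap))
```

Passing the scoring function as a parameter keeps the ranking step identical for both baselines, so only the representation of the requirement and of the services changes.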

4.3. Evaluation Metrics

To verify the effectiveness of our method in comparative experiments, we used three metrics, Precision, Recall, and F1-score [35,36,37], to evaluate the SDKG method and the other service discovery methods.

Precision@N is the precision of top-N service discovery. It is calculated as follows:

$$\mathrm{Precision@}N = \frac{|\mathrm{RealAPIs} \cap \mathrm{DiscoveredAPIs}|}{N} \tag{5}$$

Here, RealAPIs is the set of labeled Web services corresponding to the requirement statement, and DiscoveredAPIs is the set of top-N services returned by service discovery.

Recall@N is the recall of top-N service discovery. It is calculated as follows:

$$\mathrm{Recall@}N = \frac{|\mathrm{RealAPIs} \cap \mathrm{DiscoveredAPIs}|}{|\mathrm{RealAPIs}|} \tag{6}$$

To judge the effect of each service discovery method more accurately, we also compared the F1 score of each method, which takes both precision and recall into account. It is calculated as follows:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{7}$$
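The three metrics can be sketched in a few lines of Python. The function names and the toy service lists below are illustrative assumptions; only the formulas follow Equations (5) to (7).

```python
def precision_at_n(real_apis, discovered, n):
    """Equation (5): hits among the top-n discovered services, divided by n."""
    return len(set(real_apis) & set(discovered[:n])) / n


def recall_at_n(real_apis, discovered, n):
    """Equation (6): hits among the top-n, divided by the number of real APIs."""
    real = set(real_apis)
    return len(real & set(discovered[:n])) / len(real) if real else 0.0


def f1_score(precision, recall):
    """Equation (7): harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical example: three labeled services, five ranked results.
real = {"A", "B", "C"}
ranked = ["A", "X", "B", "Y", "Z"]

p = precision_at_n(real, ranked, 5)   # 2 hits out of 5 returned
r = recall_at_n(real, ranked, 5)      # 2 hits out of 3 labeled
print(p, r, f1_score(p, r))
```

As a consistency check, applying `f1_score` to the top-5 Precision and Recall reported for SDKG in Table 3 (51.33% and 64.16%) gives approximately 57.03%, which agrees with the F1 value listed there.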

4.4. Results and Analysis

Table 3 shows the Precision, Recall, and F1-score values of the four service discovery methods from the top-5 to the top-20. Overall, our method is significantly better than the other three methods in terms of Precision, Recall, and F1 values.

The two methods based on information retrieval model matching do not perform well. The recall rates of the WSD-VSM and SRMWSD-LDA methods are 4.31% and 13.12% in the top-5, 7.19% and 17% in the top-10, 8.81% and 19.78% in the top-15, and only 10.6% and 23.16% in the top-20. This indicates that it is difficult to determine the text similarity between the requirement statement and the service description when the service description and the requirement statement are incomplete. FBWSD, the service discovery method based on feature extraction, performs better in the experiment, with recall rates of 22.46%, 28.75%, 35.94%, and 42.23% for the top-5, top-10, top-15, and top-20, respectively. FBWSD performs well because the service description text usually describes the function of the Web service, so matching Web services by comparing the function expressed in the description text with that of the requirement statement is more accurate.

As can be seen from Table 3, compared with the other service discovery methods, our method has large advantages on all metrics. In particular, its advantage in recall and precision is most pronounced in the top-5, which shows that our method not only hits the target services accurately but also ranks them highly.

To better show that our method is superior to the three popular service discovery methods, we computed the average rankings of the Friedman test [36]. Table 4 shows the average rankings of our method and the baseline methods. A Friedman statistic of 12 with three degrees of freedom corresponds to a p-value of $7.3832 \times 10^{-3}$ (≪0.05), so the null hypothesis (that the methods do not differ) is rejected, and the methods differ in effect. From Table 4, the average rankings of the four methods from smallest to largest are SDKG, FBWSD, SRMWSD-LDA, and WSD-VSM. When ranking according to the Recall metric, the larger the ranking value, the worse the method performs. Therefore, the SDKG method is the best, and the WSD-VSM method is the worst. To further study the differences between these four methods, we used two non-parametric post hoc procedures, Holm and Shaffer [36]. Here, the null hypothesis is that there is no difference in performance between the two methods in each pairwise comparison. Table 5 shows the results of comparing our method with the other three methods at α = 0.05 and at α = 0.10, from which we conclude that (i) our SDKG method is superior to the other three and (ii) the SDKG method is significantly better than two of the three other methods (i.e., SRMWSD-LDA and WSD-VSM), the exception being FBWSD.
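The reported statistics can be reproduced from the average rankings alone. The sketch below assumes that the four top-N settings (N = 5, 10, 15, 20) act as the four blocks of the Friedman test; the text does not state this explicitly, but the assumption yields the reported statistic of 12. It takes the rank order given in the text (SDKG 1, FBWSD 2, SRMWSD-LDA 3, WSD-VSM 4), and the function names are illustrative.

```python
import math


def friedman_statistic(avg_ranks, n_blocks):
    """Friedman chi-square statistic computed from average ranks."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    return 12 * n_blocks / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)


def chi2_sf_df3(x):
    """Survival function of the chi-square distribution with 3 degrees of
    freedom (closed form, valid only for df = 3)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def z_statistic(rank_i, rank_control, k, n_blocks):
    """z = (R_i - R_0) / SE with SE = sqrt(k (k + 1) / (6 n))."""
    return (rank_i - rank_control) / math.sqrt(k * (k + 1) / (6 * n_blocks))


def two_sided_p(z):
    """Two-sided p-value of a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


ranks = {"SDKG": 1.0, "FBWSD": 2.0, "SRMWSD-LDA": 3.0, "WSD-VSM": 4.0}
n_blocks = 4  # assumption: one block per top-N setting

chi2 = friedman_statistic(list(ranks.values()), n_blocks)
print(chi2, chi2_sf_df3(chi2))
for name in ("WSD-VSM", "SRMWSD-LDA", "FBWSD"):
    z = z_statistic(ranks[name], ranks["SDKG"], len(ranks), n_blocks)
    print(name, round(z, 4), round(two_sided_p(z), 4))
```

Under these assumptions, the statistic, its p-value, and the pairwise z and p values agree with the figures reported in the text and in Table 5 up to rounding.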

5. Conclusions and Future Work

To address the problem that current service discovery methods rely too heavily on the description text and cannot mine the potential relationships between services, this work proposed a service discovery method based on a knowledge graph. First, we analyzed the service requirement question and determined the entity type and question word type; second, we inferred the matching template according to the entity type and question word type; then, we generated Cypher statements according to the different matching templates and queried the appropriate entities in the knowledge graph; finally, when the query result was a service entity, we calculated the similarity between the requirement statement and the candidate service description text and sorted the candidates by similarity to obtain the final collection of services. Experimental results showed that our method is more accurate than existing service discovery methods. In addition, we built a service question answering system based on this service discovery method.
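The template inference and query generation steps summarized above can be sketched as a lookup followed by string selection. The template numbers follow Table 2; the node labels, relationship types, property names, and function name in the Cypher patterns are hypothetical, since the graph schema is not reproduced in this section.

```python
# (entity type, question word type) -> template number, following Table 2.
TEMPLATES = {
    ("service name", "service name"): 1,
    ("service type", "service name"): 2,
    ("service tag", "service name"): 3,
    ("service function", "service name"): 4,
    ("service name", "service type"): 5,
    ("service name", "service tag"): 6,
    ("service name", "service function"): 7,
}

# Hypothetical Cypher patterns for three of the templates. The labels
# (Service, Type, Function) and relationship types are assumptions.
CYPHER = {
    2: "MATCH (s:Service)-[:HAS_TYPE]->(:Type {name: $value}) RETURN s.name",
    4: "MATCH (s:Service)-[:HAS_FUNCTION]->(:Function {name: $value}) RETURN s.name",
    5: "MATCH (:Service {name: $value})-[:HAS_TYPE]->(t:Type) RETURN t.name",
}


def build_query(entity_type, asked_for, value):
    """Return (template number, Cypher text, parameters), or None when no
    template or no pattern is defined for the given combination."""
    template = TEMPLATES.get((entity_type, asked_for))
    if template is None or template not in CYPHER:
        return None
    return template, CYPHER[template], {"value": value}


print(build_query("service function", "service name", "send mail"))
```

Keeping the entity value as a query parameter (`$value`) rather than interpolating it into the Cypher text avoids quoting problems when service or function names contain special characters.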

Author Contributions

Conceptualization, J.W. and B.J.; methodology, J.Y. (Junchen Yang) and J.W.; software, J.Z., J.Y. (Junchen Yang) and H.L.; supervision, J.Y. (Jie Yang) and B.J.; writing—original draft, J.Z. and J.Y. (Junchen Yang); writing—review editing, J.Z. and N.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of Zhejiang Province (Grant No. LY21F020002) and the R&D Program of Zhejiang Province (Grant Nos. 2021C01162, 2019C01004, and 2019C03123).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors gratefully acknowledge all the Reviewers for their positive and valuable comments and suggestions regarding our manuscript.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

1. Pan, W.; Chai, C. Structure-aware Mashup service Clustering for cloud-based Internet of Things using genetic algorithm based clustering algorithm. Future Gener. Comput. Syst. 2018, 87, 267–277.
2. Ko, H.; Lee, S.; Park, Y.; Choi, A. A survey of recommendation systems: Recommendation models, techniques, and application fields. Electronics 2022, 11, 141.
3. Wang, X.; Liu, X.; Liu, J.; Chen, X.; Wu, H. A novel knowledge graph embedding based API recommendation method for Mashup development. World Wide Web 2021, 24, 869–894.
4. Botangen, K.A.; Yu, J.; Sheng, Q.Z.; Han, Y.; Yongchareon, S. Geographic-aware collaborative filtering for Web service recommendation. Expert Syst. Appl. 2020, 151, 113347.
5. Zhang, Y.; Yin, C.; Wu, Q.; He, Q.; Zhu, H. Location-aware deep collaborative filtering for service recommendation. IEEE Trans. Syst. Man Cybern. Syst. 2019, 51, 3796–3807.
6. Chen, C.; Peng, X.; Xing, Z.; Sun, J.; Wang, X.; Zhao, Y.; Zhao, W. Holistic combination of structural and textual code information for context based api recommendation. In IEEE Transactions on Software Engineering; IEEE: Piscataway, NJ, USA, 2021.
7. Almarimi, N.; Ouni, A.; Bouktif, S.; Mkaouer, M.W.; Kula, R.G.; Saied, M.A. Web service API recommendation for automated mashup creation using multi-objective evolutionary search. Appl. Soft Comput. 2019, 85, 105830.
8. Duan, L.; Tian, H.; Liu, K. A novel approach for Web service recommendation based on advanced trust relationships. Information 2019, 10, 233.
9. Pan, W.; Dong, J.; Liu, K.; Wang, J. Topology and topic-aware service clustering. Int. J. Web Serv. Res. 2018, 15, 18–37.
10. Czerwinski, S.E.; Zhao, B.Y.; Hodes, T.D.; Joseph, A.D.; Katz, R.H. An architecture for a secure service discovery service. In Proceedings of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking, Seattle, WA, USA, 15–19 August 1999; pp. 24–35.
11. Palathingal, P.; Chandra, S. Agent approach for service discovery and utilization. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, Big Island, HI, USA, 5–8 January 2004; IEEE: Piscataway, NJ, USA, 2004; p. 9.
12. Lee, K.H.; Lee, M.y.; Hwang, Y.Y.; Lee, K.C. A framework for xml Web services retrieval with ranking. In Proceedings of the 2007 International Conference on Multimedia and Ubiquitous Engineering (MUE’07), Seoul, Korea, 26–28 April 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 773–778.
13. Li, C.; Zhang, R.; Huai, J.; Guo, X.; Sun, H. A probabilistic approach for Web service discovery. In Proceedings of the 2013 IEEE International Conference on Services Computing, Santa Clara, CA, USA, 28 June–3 July 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 49–56.
14. Bianchini, D.; De Antonellis, V.; Pernici, B.; Plebani, P. Ontology-based methodology for e-service discovery. Inf. Syst. 2006, 31, 361–380.
15. Corley, C.D.; Mihalcea, R. Measuring the semantic similarity of texts. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, Ann Arbor, MI, USA, 13–18 June 2005; pp. 13–18.
16. Ram, S.; Hwang, Y.; Zhao, H. A clustering based approach for facilitating semantic Web service discovery. In Proceedings of the 15th Annual Workshop on Information Technologies & Systems (WITS) Paper, Dallas, TX, USA, 9–10 December 2006.
17. Wang, X.; Wu, H.; Hsu, C.H. Mashup-oriented API recommendation via random walk on knowledge graph. IEEE Access 2018, 7, 7651–7662.
18. Rong, X. word2vec parameter learning explained. arXiv 2014, arXiv:1411.2738.
19. Francis, N.; Green, A.; Guagliardo, P.; Libkin, L.; Lindaaker, T.; Marsault, V.; Plantikow, S.; Rydberg, M.; Selmer, P.; Taylor, A. Cypher: An evolving query language for property graphs. In Proceedings of the 2018 International Conference on Management of Data, Houston, TX, USA, 10–15 June 2018; pp. 1433–1445.
20. Webber, J. A programmatic introduction to neo4j. In Proceedings of the 3rd Annual Conference on Systems, Programming, and Applications: Software for Humanity, Tucson, AZ, USA, 19–26 October 2012; pp. 217–218.
21. Sangers, J.; Frasincar, F.; Hogenboom, F.; Chepegin, V. Semantic Web service discovery using natural language processing techniques. Expert Syst. Appl. 2013, 40, 4660–4671.
22. Paolucci, M.; Kawamura, T.; Payne, T.R.; Sycara, K. Semantic matching of Web services capabilities. In Proceedings of the International Semantic Web Conference, Sardinia, Italy, 9–12 June 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 333–347.
23. Bener, A.B.; Ozadali, V.; Ilhan, E.S. Semantic matchmaker with precondition and effect matching using SWRL. Expert Syst. Appl. 2009, 36, 9371–9377.
24. Paliwal, A.V.; Bornhovd, C.; Adam, N.R. Web Service Discovery: Adding Semantics through Service Request Expansion and Latent Semantic Indexing. In Proceedings of the 2007 IEEE International Conference on Services Computing, Salt Lake City, UT, USA, 9–13 July 2007; IEEE Computer Society: Los Alamitos, CA, USA, 2007; pp. 106–113.
25. Amorim, R.; Claro, D.B.; Lopes, D.; Albers, P.; Andrade, A. Improving Web service discovery by a functional and structural approach. In Proceedings of the 2011 IEEE International Conference on Web Services, Washington, DC, USA, 4–9 July 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 411–418.
26. Pop, C.B.; Chifu, V.R.; Salomie, I.; Dinsoreanu, M.; David, T.; Acretoaie, V. Semantic Web service clustering for efficient discovery using an ant-based method. In Intelligent Distributed Computing IV; Springer: Berlin/Heidelberg, Germany, 2010; pp. 23–33.
27. Liu, W.; Wong, W. Discovering homogenous service communities through Web service clustering. In Proceedings of the International Workshop on Service-Oriented Computing: Agents, Semantics, and Engineering, Estoril, Portugal, 12 May 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 69–82.
28. Elgazzar, K.; Hassan, A.E.; Martin, P. Clustering wsdl documents to bootstrap the discovery of Web services. In Proceedings of the 2010 IEEE International Conference on Web Services, Miami, FL, USA, 5–10 July 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 147–154.
29. Liu, F.; Shi, Y.; Yu, J.; Wang, T.; Wu, J. Measuring similarity of Web services based on wsdl. In Proceedings of the 2010 IEEE International Conference on Web Services, Miami, FL, USA, 5–10 July 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 155–162.
30. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
31. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 2013, 26, 1421.
32. Ji, X. Research on Web service discovery based on domain ontology. In Proceedings of the 2009 2nd IEEE International Conference on Computer Science and Information Technology, Beijing, China, 8–11 August 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 65–68.
33. Pakari, S.; Kheirkhah, E.; Jalali, M. Web service discovery methods and techniques: A review. Int. J. Comput. Sci. Eng. Inf. Technol. 2014, 4, 1–14.
34. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
35. Du, X.; Wang, T.; Wang, L.; Pan, W.; Chai, C.; Xu, X.; Jiang, B.; Wang, J. CoreBug: Improving Effort-Aware Bug Prediction in Software Systems Using Generalized k-Core Decomposition in Class Dependency Networks. Axioms 2022, 11, 205.
36. Pan, W.; Ming, H.; Kim, D.K.; Yang, Z. PRIDE: Prioritizing documentation effort based on a PageRank-like algorithm and simple filtering rules. In IEEE Transactions on Software Engineering; IEEE: Piscataway, NJ, USA, 2022.
37. Pan, W.; Ming, H.; Yang, Z.; Wang, T. Comments on “Using k-core Decomposition on Class Dependency Networks to Improve Bug Prediction Model’s Practical Performance”. In IEEE Transactions on Software Engineering; IEEE: Piscataway, NJ, USA, 2022; p. 1.

Figure 1. Overview of the SDKG method.

Figure 2. Service knowledge graph entity relationship diagram.

Figure 3. Service knowledge graph in the SDKG method.

Figure 4. Chinese–English dictionary mapping process of service category.

Figure 5. Skip-gram model structure.

Table 1. Service dictionary.

Element Name	Element Type	Number of Elements	Element Example
Service name	entity	5580	Google Map, Twitter, Google Earth, GitHub, etc.
Service type	entity	1197	Travel, E-commerce, government, technology, music, video, etc.
Service tag	entity	1389	America, shopping, software, electronics, women, etc.
Service function	entity	17,851	Compare prices, search for locations, send mail, and more
Ask for the service name	question words	26	“What are the services”, etc.
Ask for the service category	question words	24	“What is the category”, “Which category does it belong to”, etc.
Ask for service tag	question words	15	“What is the label”, “Which field”, etc.
Ask for service function	question words	42	“What’s the function”, “What can it be used for”, etc.

Table 2. Service matching model.

Template Number	Entity Type	Question Word Type	Query Statement Structure
Template 1	service name	ask for the service name	Discover services based on already used services
Template 2	service type	ask for the service name	Discover services based on a service type
Template 3	service tag	ask for the service name	Discover services based on service tags
Template 4	service function	ask for the service name	Discover services based on service functions
Template 5	service name	ask for service type	Obtain a service type based on a service name
Template 6	service name	ask for service tag	Obtain a service tag based on a service name
Template 7	service name	ask for service function	Obtain a service function based on a service name

Table 3. Indicator results of four service discovery methods.

	N = 5	N = 10	N = 15	N = 20
Precision
WSD-VSM	0.86%	0.72%	0.59%	0.53%
FBWSD	4.49%	2.88%	2.4%	2.11%
SRMWSD-LDA	2.62%	1.7%	1.32%	1.16%
SDKG	51.33%	28.92%	20.41%	15.90%
Recall
WSD-VSM	4.31%	7.19%	8.81%	10.6%
FBWSD	22.46%	28.75%	35.94%	42.23%
SRMWSD-LDA	13.12%	17%	19.78%	23.16%
SDKG	64.16%	72.3%	76.54%	79.52%
F1
WSD-VSM	1.43%	1.31%	1.11%	1.01%
FBWSD	7.48%	5.24%	4.5%	4.02%
SRMWSD-LDA	4.37%	3.09%	2.47%	2.21%
SDKG	57.03%	41.31%	32.23%	26.50%

Table 4. Average rankings of the algorithms.

Algorithm	Ranking
WSD-VSM	4.0
FBWSD	2.0
SRMWSD-LDA	3.0
SDKG	1.0

Table 5. Holm/Shaffer table for α = 0.05/α = 0.10.

	z = (R₀ − R_i)/SE	p	Holm	Shaffer
$α$ = 0.05
WSD-VSM vs. SDKG	3.2863	0.0010	0.0083	0.0083
SRMWSD-LDA vs. SDKG	2.1908	0.02845	0.0125	0.0166
FBWSD vs. SDKG	1.0954	0.2733	0.0500	0.0500
$α$ = 0.10
WSD-VSM vs. SDKG	3.2863	0.0010	0.0166	0.0166
SRMWSD-LDA vs. SDKG	2.1908	0.02845	0.0250	0.0333
FBWSD vs. SDKG	1.0954	0.2733	0.1000	0.1000

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
