Applied Sciences

Research

Jump to: Review

12 pages, 1962 KB

Open AccessArticle

Ext-LOUDS: A Space Efficient Extended LOUDS Index for Superset Query

by Lianyin Jia, Yuna Zhang, Jiaman Ding, Jinguo You, Yinong Chen and Runxin Li

Appl. Sci. 2020, 10(23), 8530; https://doi.org/10.3390/app10238530 - 28 Nov 2020

Cited by 1 | Viewed by 2526

Abstract

Superset query is widely used in object-oriented databases, data mining, and many other fields. Trie is an efficient index for superset query, whereas most existing trie index aim at improving query performance while ignoring storage overheads. To solve this problem, in this paper, [...] Read more.

Superset query is widely used in object-oriented databases, data mining, and many other fields. Trie is an efficient index for superset query, whereas most existing trie index aim at improving query performance while ignoring storage overheads. To solve this problem, in this paper, we propose an efficient extended Level-Ordered Unary Degree Sequence (LOUDS) index: Ext-LOUDS. Ext-LOUDS expresses a trie by 1 integer vector and 3 bit vectors directly map each NodeID to its corresponding position, thus accelerating some key operations needed for superset query. Based on Ext-LOUDS, an efficient superset query algorithm, ELOUDS-Super, is designed. Experimental results on both real and synthetic datasets show that Ext-LOUDS can decrease 50%–60% space overheads compared with trie while maintaining a relative good query performance. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

17 pages, 2317 KB

Open AccessArticle

DAEOM: A Deep Attentional Embedding Approach for Biomedical Ontology Matching

by Jifang Wu, Jianghua Lv, Haoming Guo and Shilong Ma

Appl. Sci. 2020, 10(21), 7909; https://doi.org/10.3390/app10217909 - 8 Nov 2020

Cited by 25 | Viewed by 3925

Abstract

Ontology Matching (OM) is performed to find semantic correspondences between the entity elements of different ontologies to enable semantic integration, reuse, and interoperability. Representation learning techniques have been introduced to the field of OM with the development of deep learning. However, there still [...] Read more.

Ontology Matching (OM) is performed to find semantic correspondences between the entity elements of different ontologies to enable semantic integration, reuse, and interoperability. Representation learning techniques have been introduced to the field of OM with the development of deep learning. However, there still exist two limitations. Firstly, these methods only focus on the terminological-based features to learn word vectors for discovering mappings, ignoring the network structure of ontology. Secondly, the final alignment threshold is usually determined manually within these methods. It is difficult for an expert to adjust the threshold value and even more so for a non-expert user. To address these issues, we propose an alternative ontology matching framework called Deep Attentional Embedded Ontology Matching (DAEOM), which models the matching process by embedding techniques with jointly encoding ontology terminological description and network structure. We propose a novel inter-intra negative sampling skill tailored for the structural relations asserted in ontologies, and further improve our iterative final alignment method by introducing an automatic adjustment of the final alignment threshold. The preliminary result on real-world biomedical ontologies indicates that DAEOM is competitive with several OAEI top-ranked systems in terms of F-measure. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

21 pages, 3825 KB

Open AccessArticle

Intelligent Design for Simulation Models of Weapon Systems Using a Mathematical Structure and Case-Based Reasoning

by Dohyun Kim, Dongsu Jeong and Yoonho Seo

Appl. Sci. 2020, 10(21), 7642; https://doi.org/10.3390/app10217642 - 29 Oct 2020

Cited by 9 | Viewed by 5060

Abstract

The armed forces of major nations have utilized modeling and simulation technologies to develop weapon systems corresponding to changing modern battlefields and reducing the development cycle. However, model design is complex owing to the characteristics of current weapons, which require multiple functions. Therefore, [...] Read more.

The armed forces of major nations have utilized modeling and simulation technologies to develop weapon systems corresponding to changing modern battlefields and reducing the development cycle. However, model design is complex owing to the characteristics of current weapons, which require multiple functions. Therefore, this study proposes a method to support the automated design of weapon system models for simulation. We apply module-based modeling and an intelligent modeling process to our devised method. The former formalizes constituents and constraints regarding an element combination to design the required model, while the latter applies case-based reasoning (CBR) to intelligentize the modeling process based on the results of the former. Using a case study, our proposed method demonstrates that models that respond to operational circumstances can be designed based on simulation results. Consequently, when weapon systems can be represented in formalized structures and constituents, the weapon models can be reusable based on the addition, modification, and replacement of modules in the common structure. The CBR process can provide the models that satisfy the requirements by retrieving similar models and modifying the models. The proposed method is applicable to the process of weapon system design or improvement for changing battlefields. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

10 pages, 2299 KB

Open AccessArticle

Document Re-Ranking Model for Machine-Reading and Comprehension

by Youngjin Jang and Harksoo Kim

Appl. Sci. 2020, 10(21), 7547; https://doi.org/10.3390/app10217547 - 27 Oct 2020

Cited by 1 | Viewed by 3381

Abstract

Recently, the performance of machine-reading and comprehension (MRC) systems has been significantly enhanced. However, MRC systems require high-performance text retrieval models because text passages containing answer phrases should be prepared in advance. To improve the performance of text retrieval models underlying MRC systems, [...] Read more.

Recently, the performance of machine-reading and comprehension (MRC) systems has been significantly enhanced. However, MRC systems require high-performance text retrieval models because text passages containing answer phrases should be prepared in advance. To improve the performance of text retrieval models underlying MRC systems, we propose a re-ranking model, based on artificial neural networks, that is composed of a query encoder, a passage encoder, a phrase modeling layer, an attention layer, and a similarity network. The proposed model learns degrees of associations between queries and text passages through dot products between phrases that constitute questions and passages. In experiments with the MS-MARCO dataset, the proposed model demonstrated higher mean reciprocal ranks (MRRs), 0.8%p–13.2%p, than most of the previous models, except for the models based on BERT (a pre-trained language model). Although the proposed model demonstrated lower MRRs than the BERT-based models, it was approximately 8 times lighter and 3.7 times faster than the BERT-based models. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

16 pages, 1402 KB

Open AccessArticle

Analyzing the Performance of the Multiple-Searching Genetic Algorithm to Generate Test Cases

by Wanida Khamprapai, Cheng-Fa Tsai and Paohsi Wang

Appl. Sci. 2020, 10(20), 7264; https://doi.org/10.3390/app10207264 - 17 Oct 2020

Cited by 5 | Viewed by 4105

Abstract

Software testing using traditional genetic algorithms (GAs) minimizes the required number of test cases and reduces the execution time. Currently, GAs are adapted to enhance performance when finding optimal solutions. The multiple-searching genetic algorithm (MSGA) has improved upon current GAs and is used [...] Read more.

Software testing using traditional genetic algorithms (GAs) minimizes the required number of test cases and reduces the execution time. Currently, GAs are adapted to enhance performance when finding optimal solutions. The multiple-searching genetic algorithm (MSGA) has improved upon current GAs and is used to find the optimal multicast routing in network systems. This paper presents an analysis of the optimization of test case generations using the MSGA by defining suitable values of MSGA parameters, including population size, crossover operator, and mutation operator. Moreover, in this study, we compare the performance of the MSGA with a traditional GA and hybrid GA (HGA). The experimental results demonstrate that MSGA reaches the maximum executed branch statements in the lowest execution time and the smallest number of test cases compared to the GA and HGA. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

19 pages, 776 KB

Open AccessArticle

Enriching Knowledge Base by Parse Tree Pattern and Semantic Filter

by Hee-Geun Yoon, Seyoung Park and Seong-Bae Park

Appl. Sci. 2020, 10(18), 6209; https://doi.org/10.3390/app10186209 - 7 Sep 2020

Viewed by 3537

Abstract

This paper proposes a simple knowledge base enrichment based on parse tree patterns with a semantic filter. Parse tree patterns are superior to lexical patterns used commonly in many previous studies in that they can manage long distance dependencies among words. In addition, [...] Read more.

This paper proposes a simple knowledge base enrichment based on parse tree patterns with a semantic filter. Parse tree patterns are superior to lexical patterns used commonly in many previous studies in that they can manage long distance dependencies among words. In addition, the proposed semantic filter, which is a combination of WordNet-based similarity and word embedding similarity, removes parse tree patterns that are semantically irrelevant to the meaning of a target relation. According to our experiments using the DBpedia ontology and Wikipedia corpus, the average accuracy of the top 100 parse tree patterns for ten relations is 68%, which is 16% higher than that of lexical patterns, and the average accuracy of the newly extracted triples is 60.1%. These results prove that the proposed method produces more relevant patterns for the relations of seed knowledge, and thus more accurate triples are generated by the patterns. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

13 pages, 1535 KB

Open AccessArticle

Machine Learning Evaluation of the Requirement Engineering Process Models for Cloud Computing and Security Issues

by Muhammad Asgher Nadeem and Scott Uk-Jin Lee

Appl. Sci. 2020, 10(17), 5851; https://doi.org/10.3390/app10175851 - 24 Aug 2020

Cited by 3 | Viewed by 3569

Abstract

In the requirement engineering phase, the team members work to get the user requirements, comprehend them and specify them for the next process. There are many models for the requirement engineering phase. There is a need to select the best Requirement Engineering model, [...] Read more.

In the requirement engineering phase, the team members work to get the user requirements, comprehend them and specify them for the next process. There are many models for the requirement engineering phase. There is a need to select the best Requirement Engineering model, and integrate it with cloud computing, that can give the best response to the users and software developers and avoid mistakes in the requirement engineering phase. In this study, these models are integrated with the cloud computing domain, and we report on the security considerations of all the selected models. Four requirement engineering process models are selected for this study: the Linear approach, the Macaulay Linear approach, and the Iterative and Spiral models. The focus of this study is to check the security aspects being introduced by the cloud platform and assess the feasibility of these models for the popular cloud environment SaaS. For the classification of the security aspects that affect the performance of these model, a framework is proposed, and we check the results regarding selected security parameters and RE models. By classifying the selected RE models for security aspects based on deep learning techniques, we determine that the Loucopoulos and Karakostas iterative requirements engineering process model performs better than all the other models. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

21 pages, 4888 KB

Open AccessArticle

Optimization of Associative Knowledge Graph using TF-IDF based Ranking Score

by Hyun-Jin Kim, Ji-Won Baek and Kyungyong Chung

Appl. Sci. 2020, 10(13), 4590; https://doi.org/10.3390/app10134590 - 2 Jul 2020

Cited by 26 | Viewed by 6864

Abstract

This study proposes the optimization method of the associative knowledge graph using TF-IDF based ranking scores. The proposed method calculates TF-IDF weights in all documents and generates term ranking. Based on the terms with high scores from TF-IDF based ranking, optimized transactions are [...] Read more.

This study proposes the optimization method of the associative knowledge graph using TF-IDF based ranking scores. The proposed method calculates TF-IDF weights in all documents and generates term ranking. Based on the terms with high scores from TF-IDF based ranking, optimized transactions are generated. News data are first collected through crawling and then are converted into a corpus through preprocessing. Unnecessary data are removed through preprocessing including lowercase conversion, removal of punctuation marks and stop words. In the document term matrix, words are extracted and then transactions are generated. In the data cleaning process, the Apriori algorithm is applied to generate association rules and make a knowledge graph. To optimize the generated knowledge graph, the proposed method utilizes TF-IDF based ranking scores to remove terms with low scores and recreate transactions. Based on the result, the association rule algorithm is applied to create an optimized knowledge model. The performance is evaluated in rule generation speed and usefulness of association rules. The association rule generation speed of the proposed method is about 22 seconds faster. And the lift value of the proposed method for usefulness is about 0.43 to 2.51 higher than that of each one of conventional association rule algorithms. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

21 pages, 4797 KB

Open AccessArticle

tribAIn—Towards an Explicit Specification of Shared Tribological Understanding

by Patricia Kügler, Max Marian, Benjamin Schleich, Stephan Tremmel and Sandro Wartzack

Appl. Sci. 2020, 10(13), 4421; https://doi.org/10.3390/app10134421 - 27 Jun 2020

Cited by 23 | Viewed by 5888

Abstract

Within the domain of tribology, the science and technology for understanding and controlling friction, lubrication, and wear of relatively moving interacting surfaces, countless experiments are carried out and their results are published worldwide. Due to the variety of test procedures and a lack [...] Read more.

Within the domain of tribology, the science and technology for understanding and controlling friction, lubrication, and wear of relatively moving interacting surfaces, countless experiments are carried out and their results are published worldwide. Due to the variety of test procedures and a lack of consistency in the terminology as well as the practice of publishing results in the natural language, accessing and reusing tribological knowledge is time-consuming and experiments are hardly comparable. However, for the selection of potential tribological pairings according to given requirements and to enable comparative evaluations of the behavior of different tribological systems or testing conditions, a shared understanding is essential. Therefore, we present a novel ontology tribAIn (derived from the ancient Greek word “tribein” (= rubbing) and the acronym “AI” (= artificial intelligence)), designed to provide a formal and explicit specification of knowledge in the domain of tribology to enable semantic annotation and the search of experimental setups and results. For generalization, tribAIn is linked to the intermediate-level ontology EXPO (ontology of scientific experiments), supplemented with subject-specific concepts meeting the needs of the domain of tribology. The formalization of tribAIn is expressed in the W3C standard OWL DL. Demonstrating the ability of tribAIn covering tribological experience from experiments, it is applied to a use case with heterogeneous data sources containing natural language texts and tabular data. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Graphical abstract

11 pages, 356 KB

Open AccessArticle

Learning Translation-Based Knowledge Graph Embeddings by N-Pair Translation Loss

by Hyun-Je Song, A-Yeong Kim and Seong-Bae Park

Appl. Sci. 2020, 10(11), 3964; https://doi.org/10.3390/app10113964 - 7 Jun 2020

Cited by 10 | Viewed by 4715

Abstract

Translation-based knowledge graph embeddings learn vector representations of entities and relations by treating relations as translation operators over the entities in an embedding space. Since the translation is represented through a score function, translation-based embeddings are trained in general by minimizing a margin-based [...] Read more.

Translation-based knowledge graph embeddings learn vector representations of entities and relations by treating relations as translation operators over the entities in an embedding space. Since the translation is represented through a score function, translation-based embeddings are trained in general by minimizing a margin-based ranking loss, which assigns a low score to positive triples and a high score to negative triples. However, this type of embedding suffers from slow convergence and poor local optima because the loss adopts only one pair of a positive and a negative triple at a single update of learning parameters. Therefore, this paper proposes the N-pair translation loss that considers multiple negative triples at one update. The N-pair translation loss employs multiple negative triples as well as one positive triple and allows the positive triple to be compared against the multiple negative triples at each parameter update. As a result, it becomes possible to obtain better vector representations rapidly. The experimental results on link prediction prove that the proposed loss helps to quickly converge toward good optima at the early stage of training. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

12 pages, 385 KB

Open AccessArticle

An Approach to Knowledge Base Completion by a Committee-Based Knowledge Graph Embedding

by Su Jeong Choi, Hyun-Je Song and Seong-Bae Park

Appl. Sci. 2020, 10(8), 2651; https://doi.org/10.3390/app10082651 - 11 Apr 2020

Cited by 9 | Viewed by 4149

Abstract

Knowledge bases such as Freebase, YAGO, DBPedia, and Nell contain a number of facts with various entities and relations. Since they store many facts, they are regarded as core resources for many natural language processing tasks. Nevertheless, they are not normally complete and [...] Read more.

Knowledge bases such as Freebase, YAGO, DBPedia, and Nell contain a number of facts with various entities and relations. Since they store many facts, they are regarded as core resources for many natural language processing tasks. Nevertheless, they are not normally complete and have many missing facts. Such missing facts keep them from being used in diverse applications in spite of their usefulness. Therefore, it is significant to complete knowledge bases. Knowledge graph embedding is one of the promising approaches to completing a knowledge base and thus many variants of knowledge graph embedding have been proposed. It maps all entities and relations in knowledge base onto a low dimensional vector space. Then, candidate facts that are plausible in the space are determined as missing facts. However, any single knowledge graph embedding is insufficient to complete a knowledge base. As a solution to this problem, this paper defines knowledge base completion as a ranking task and proposes a committee-based knowledge graph embedding model for improving the performance of knowledge base completion. Since each knowledge graph embedding has its own idiosyncrasy, we make up a committee of various knowledge graph embeddings to reflect various perspectives. After ranking all candidate facts according to their plausibility computed by the committee, the top-k facts are chosen as missing facts. Our experimental results on two data sets show that the proposed model achieves higher performance than any single knowledge graph embedding and shows robust performances regardless of k. These results prove that the proposed model considers various perspectives in measuring the plausibility of candidate facts. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

24 pages, 3107 KB

Open AccessArticle

Automatic Classification of Web Images as UML Static Diagrams Using Machine Learning Techniques

by Valentín Moreno, Gonzalo Génova, Manuela Alejandres and Anabel Fraga

Appl. Sci. 2020, 10(7), 2406; https://doi.org/10.3390/app10072406 - 1 Apr 2020

Cited by 14 | Viewed by 5529

Abstract

Our purpose in this research is to develop a method to automatically and efficiently classify web images as Unified Modeling Language (UML) static diagrams, and to produce a computer tool that implements this function. The tool receives a bitmap file (in different formats) [...] Read more.

Our purpose in this research is to develop a method to automatically and efficiently classify web images as Unified Modeling Language (UML) static diagrams, and to produce a computer tool that implements this function. The tool receives a bitmap file (in different formats) as an input and communicates whether the image corresponds to a diagram. For pragmatic reasons, we restricted ourselves to the simplest kinds of diagrams that are more useful for automated software reuse: computer-edited 2D representations of static diagrams. The tool does not require that the images are explicitly or implicitly tagged as UML diagrams. The tool extracts graphical characteristics from each image (such as grayscale histogram, color histogram and elementary geometric forms) and uses a combination of rules to classify it. The rules are obtained with machine learning techniques (rule induction) from a sample of 19,000 web images manually classified by experts. In this work, we do not consider the textual contents of the images. Our tool reaches nearly 95% of agreement with manually classified instances, improving the effectiveness of related research works. Moreover, using a training dataset 15 times bigger, the time required to process each image and extract its graphical features (0.680 s) is seven times lower. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

14 pages, 475 KB

Open AccessArticle

A Noval Weighted Meta Graph Method for Classification in Heterogeneous Information Networks

by Jinli Zhang, Tong Li, Zongli Jiang, Xiaohua Hu and Ali Jazayeri

Appl. Sci. 2020, 10(5), 1603; https://doi.org/10.3390/app10051603 - 28 Feb 2020

Cited by 3 | Viewed by 3587

Abstract

There has been increasing interest in the analysis and mining of Heterogeneous Information Networks (HINs) and the classification of their components in recent years. However, there are multiple challenges associated with distinguishing different types of objects in HINs in real-world applications. In this [...] Read more.

There has been increasing interest in the analysis and mining of Heterogeneous Information Networks (HINs) and the classification of their components in recent years. However, there are multiple challenges associated with distinguishing different types of objects in HINs in real-world applications. In this paper, a novel framework is proposed for the weighted Meta graph-based Classification of Heterogeneous Information Networks (MCHIN) to address these challenges. The proposed framework has several appealing properties. In contrast to other proposed approaches, MCHIN can fully compute the weights of different meta graphs and mine the latent structural features of different nodes by using these weighted meta graphs. Moreover, MCHIN significantly enlarges the training sets by introducing the concept of Extension Meta Graphs in HINs. The extension meta graphs are used to augment the semantic relationship among the source objects. Finally, based on the ranking distribution of objects, MCHIN groups the objects into pre-specified classes. We verify the performance of MCHIN on three real-world datasets. As is shown and discussed in the results section, the proposed framework can effectively outperform the baselines algorithms. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

22 pages, 1535 KB

Open AccessArticle

A Heterogeneous Battlefield Situation Information Sharing Method Based on Content

by Cuntao Liu, Wendong Zhao, Aijing Li and Junsheng Zhang

Appl. Sci. 2020, 10(1), 184; https://doi.org/10.3390/app10010184 - 25 Dec 2019

Cited by 1 | Viewed by 3946

Abstract

Various information systems adopt different information category standards and description methods even for the same battlefield space. This leads to the heterogeneity of information distributed in different systems, which hinders the information sharing among different systems. In this paper, we adopt the idea [...] Read more.

Various information systems adopt different information category standards and description methods even for the same battlefield space. This leads to the heterogeneity of information distributed in different systems, which hinders the information sharing among different systems. In this paper, we adopt the idea of schema mapping and design a framework of realizing heterogeneous information sharing based on content. We design a concept-logic tree model to organize the battlefield situation information and realize the mapping from local concept models in different systems to a global unified concept model. By constructing a unified information space, we realize the centralized organization, storage, and management of entity description information. Then, the information broadcasting mechanism and content based information query mechanism are designed to realize information sharing. Theoretical analysis and experiment results verify the effectiveness of the proposed framework. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

19 pages, 1898 KB

Open AccessArticle

A Weighted PageRank-Based Bug Report Summarization Method Using Bug Report Relationships

by Beomjun Kim, Sungwon Kang and Seonah Lee

Appl. Sci. 2019, 9(24), 5427; https://doi.org/10.3390/app9245427 - 11 Dec 2019

Cited by 8 | Viewed by 3725

Abstract

For software maintenance, bug reports provide useful information to developers because they can be used for various tasks such as debugging and understanding previous changes. However, as they are typically written in the form of conversations among developers, bug reports tend to be [...] Read more.

For software maintenance, bug reports provide useful information to developers because they can be used for various tasks such as debugging and understanding previous changes. However, as they are typically written in the form of conversations among developers, bug reports tend to be unnecessarily long and verbose, with the consequence that developers often have difficulties reading or understanding bug reports. To mitigate this problem, methods that automatically generate a summary of bug reports have been proposed, and various related studies have been conducted. However, existing bug report summarization methods have not fully exploited the inherent characteristics of bug reports. In this paper, we propose a bug report summarization method that uses the weighted-PageRank algorithm and exploits the 'duplicates’, ‘blocks’, and ‘depends-on’ relationships between bug reports. The experimental results show that our method outperforms the state-of-the-art method in terms of both the quality of the summary and the number of applicable bug reports. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

12 pages, 1579 KB

Open AccessArticle

An Approach to Constructing a Knowledge Graph Based on Korean Open-Government Data

by Jae-won Lee and Jaehui Park

Appl. Sci. 2019, 9(19), 4095; https://doi.org/10.3390/app9194095 - 30 Sep 2019

Cited by 9 | Viewed by 6390

Abstract

A data platform collecting the whole metadata held by government agencies and a knowledge graph showing the relationship between the collected open-government data are proposed in this paper. By practically applying the data platform and the knowledge graph to the public sector in [...] Read more.

A data platform collecting the whole metadata held by government agencies and a knowledge graph showing the relationship between the collected open-government data are proposed in this paper. By practically applying the data platform and the knowledge graph to the public sector in Korea, three improvements were expected: (1) enhancing user accessibility across open-government data; (2) allowing users to acquire relevant data as well as desired data with a single query; and (3) enabling data-driven decision-making. In particular, the barriers for citizens to acquire the necessary data have been greatly reduced by using the proposed knowledge graph, which is considered to be important for data-driven decision-making. The reliability and feasibility of constructing a metadata-based open-data platform and a knowledge graph are estimated to be considerably high as the proposed approach is applied to a real service of the public sector in Korea. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

Review

Jump to: Research

29 pages, 1808 KB

Open AccessReview

The Impact of Controlled Vocabularies on Requirements Engineering Activities: A Systematic Mapping Study

by Arshad Ahmad, José Luis Barros Justo, Chong Feng and Arif Ali Khan

Appl. Sci. 2020, 10(21), 7749; https://doi.org/10.3390/app10217749 - 2 Nov 2020

Cited by 13 | Viewed by 4925

Abstract

Context: The use of controlled vocabularies (CVs) aims to increase the quality of the specifications of the software requirements, by producing well-written documentation to reduce both ambiguities and complexity. Many studies suggest that defects introduced at the requirements engineering (RE) phase have a [...] Read more.

Context: The use of controlled vocabularies (CVs) aims to increase the quality of the specifications of the software requirements, by producing well-written documentation to reduce both ambiguities and complexity. Many studies suggest that defects introduced at the requirements engineering (RE) phase have a negative impact, significantly higher than defects in the later stages of the software development lifecycle. However, the knowledge we have about the impact of using CVs, in specific RE activities, is very scarce. Objective: To identify and classify the type of CVs, and the impact they have on the requirements engineering phase of software development. Method: A systematic mapping study, collecting empirical evidence that is published up to July 2019. Results: This work identified 2348 papers published pertinent to CVs and RE, but only 90 primary published papers were chosen as relevant. The process of data extraction revealed that 79 studies reported the use of ontologies, whereas the remaining 11 were focused on taxonomies. The activities of RE with greater empirical support were those of specification (29 studies) and elicitation (28 studies). Seventeen different impacts of the CVs on the RE activities were classified and ranked, being the two most cited: guidance and understanding (38%), and automation and tool support (22%). Conclusions: The evolution of the last 10 years in the number of published papers shows that interest in the use of CVs remains high. The research community has a broad representation, distributed across the five continents. Most of the research focuses on the application of ontologies and taxonomies, whereas the use of thesauri and folksonomies is less reported. The evidence demonstrates the usefulness of the CVs in all RE activities, especially during elicitation and specification, helping developers understand, facilitating the automation process and identifying defects, conflicts and ambiguities in the requirements. Collaboration in research between academic and industrial contexts is low and should be promoted. Full article

(This article belongs to the Special Issue Knowledge Retrieval and Reuse)

► Show Figures

Figure 1

Journal Menu

Journal Browser

Knowledge Retrieval and Reuse

Share This Special Issue

Special Issue Editors

Special Issue Information

Keywords

Benefits of Publishing in a Special Issue

Published Papers (17 papers)

Research

Review

Further Information

Guidelines

MDPI Initiatives

Follow MDPI