An Ontology-Driven Personalized Faceted Search for Exploring Knowledge Bases of Capsicum

Abstract: Capsicum is a genus of flowering plants in the Solanaceae family whose members are well known for their high economic value. Capsicum fruits, popularly known as peppers or chili, are widely used by people worldwide. They serve as a spice and as raw material for many products such as sauce, food coloring, and medicine. For many years, scientists have studied this plant to optimize its production. A tremendous amount of knowledge has been obtained and shared, as reflected in multiple knowledge-based systems, databases, and information systems. One approach to knowledge-sharing is the adoption of a common ontology to eliminate discrepancies in understanding. Unfortunately, most knowledge-sharing solutions are intended for scientists who are familiar with the subject. On the other hand, there are groups of potential users who could benefit from such systems but have minimal knowledge of the subject. For these non-expert users, finding relevant information in an unfamiliar knowledge base would be daunting. Moreover, users have varying degrees of understanding of the content available in the knowledge base. This discrepancy in understanding raises a personalization problem. In this paper, we introduce a solution to overcome this challenge. First, we developed an ontology to facilitate knowledge-sharing about Capsicum with non-expert users. Second, we developed a personalized faceted search algorithm that provides multiple structured ways to explore the knowledge base. The algorithm addresses the personalization problem by identifying each user's degree of understanding of the subject. In this way, non-expert users can explore a knowledge base of Capsicum efficiently. Our solution characterizes users into four groups. Accordingly, our faceted search algorithm defines four types of matching mechanisms, together with three ranking mechanisms, as the core of our solution. To evaluate the proposed method, we measured the degree of predictability of the produced list of facets. Our findings indicate that the proposed matching mechanisms can tolerate various query types and that a high degree of predictability can be achieved by combining multiple ranking mechanisms. Furthermore, the results demonstrate that our approach has high potential to contribute to biodiversity science in general, where many knowledge-based systems have been developed with limited access for users outside the domain.


Introduction
Along with the advancement of Information and Communication Technology (ICT), scientists have generated a tremendous amount of data, including biodiversity data. The era of Biodiversity Big Data has already emerged [1]. Biodiversity data cover a wide range of life forms on Earth within its many regions, ecosystems, and habitats. Multiple new challenges have been introduced including data collection and processing, mobilization, imputation, sharing, and integration [2]. Therefore, more advanced technologies are needed to manage it. Data-intensive science [3], which is also recognized as the fourth paradigm of scientific discovery, serves as a scientific methodology to analyze the large volume of biodiversity data.
One type of biodiversity data is the characteristics of living organisms at the morphological level. Specific morphological characteristics provide basic information for understanding the structure of an organism, the relationship between structure and function, as well as plant classification. The morphological characteristics also provide plant biologists and taxonomists with a framework to assess the differences or similarities between species. Therefore, the considerable data of characteristics serve as a favorable parameter for accurate identification and description of the plant species. As an example, biologists have used big data and genetic approaches to understand the evolution of the plant form and physiology [4].
Capsicum is a genus of flowering plants in the Solanaceae family whose members are well known for their high economic value. Capsicum fruits, popularly known as peppers or chili, are widely used as spices in many cuisines worldwide. Chili can also be used as a raw material for many products such as sauces, food coloring, and medicine. The genus Capsicum consists of nearly 30 species and approximately 50,000 varieties [5]. There is considerable variation between and within the pepper species.
The morphological characteristics of Capsicum provide essential information for identifying the species of a plant. This information has been collected and shared through multiple knowledge-based systems, online databases, and information systems. With regard to data-intensive science, the challenge lies in the data-sharing approach. Most of the data are shared among scientists or experts who know what the data mean. Hence, the data are difficult for non-expert users or the public to consume. On the other hand, the latter group of users has been recognized as an essential constituent in the scientific discovery process, especially in collecting substantial amounts of data and engaging with the public [6].
In this paper, we introduce a method to overcome this challenge. Our method utilizes a faceted search to organize information such that data in a repository can be explored more systematically. This method would help non-expert users to start and focus on finding relevant information from a large number of knowledge bases that contain characteristics of Capsicum.

Motivation
A solution to overcome the knowledge understanding discrepancy of a domain of interest is to use an ontology-based approach. An ontology, a shared, explicit, and formal conceptualization of a domain [7], describes a knowledge-based program through the definition of a set of representational terms. As a set of objects and relationships among them, a common ontology is highly suitable for a variety of knowledge-sharing activities to guarantee consistency.
To share knowledge about plant anatomy, morphology, and the stages of plant development, the Plant Ontology (PO) has been developed and continuously expanded [8]. PO adopts the data model of the Gene Ontology (GO) [9] to annotate gene expression and phenotype data of plant structures and stages of plant development. As a common reference ontology for plant structures and development stages, the PO solves terminology disparities used by scientists from different projects and groups. The PO has been used to integrate multiple online plant genomics portals and databases, such as the Arabidopsis Information Resource (TAIR) (https://arabidopsis.org (accessed on 28 June 2021)), the Sol Genomics Network (SGN) (https://solgenomics.net (accessed on 28 June 2021)), the Maize Genetics and Genomics Database (MaizeGDB) (https://maizegdb.org (accessed on 28 June 2021)), the Oryzabase Database (http://shigen.nig.ac.jp (accessed on 28 June 2021)), and the Gramene Database (http://gramene.org (accessed on 28 June 2021)) to name a few.
The PO has been used successfully to share knowledge among scientists, for example, to accurately describe the plant development stages across species [10]. Unfortunately, the ontology was designed to share knowledge among scientists who have knowledge about the subject. It is not necessarily consumable by non-expert users, those who are not familiar with the subject. These non-expert users include students, young scientists, or even citizen scientists concerned with the knowledge. This group of users has been recognized as an essential research tool due to its capability in providing data at an extensive scale and fine-grained resolution [11]. Besides being powerful in providing large amounts of data, citizen scientists could convey information to the public more conveniently [6].
Based on the situations described above, the following observations motivated our work:

1.
The PO can be used to describe plant characteristics, from anatomy and morphology to the stages of plant development. It is suitable to share knowledge among scientists but not necessarily with non-expert users.

2.
For non-expert users, when describing a less familiar object (for example, a flower of a plant), they tend to describe it based on generic properties or attributes. For example, to describe the petal of a flower, they would describe it based on familiar properties such as color, size, texture, etc.

Challenges
For non-expert users, finding relevant information in an unfamiliar knowledge base is a daunting task from the outset. How to formulate search keywords, refine them, and filter the results are examples of the challenges facing this group of users. In this work, we formulated these challenges as follows:

1.
How to start to explore a knowledge base of Capsicum by describing a generic morphological character. Searching should start from a point, for example, by defining at least one plant character. The start point could be any point in the knowledge base, regardless of its generality or specificity.

2.
How to refine the search results by selecting the most relevant criteria/group. Finding the most relevant criteria/group is the main challenge.

3.
How to sort multiple results to be presented to the users. When multiple criteria/groups are identified as relevant, they need to be sorted to provide users with the most relevant first. Finding the way to sort the results is the next challenge.
Our main goal is to enable non-expert users with minimal knowledge of plant characteristics to consume the collected knowledge of Capsicum. Our approach relies on an information searching technique, a so-called faceted search. Faceted search is a search technique organizing the search outputs into groups with different topics that enable users to filter the results and to find the desired information quickly [12]. However, in contrast with existing works that have utilized faceted search as a solution to information overload, we use the technique to help non-expert users explore the less familiar database. The method suggests the most relevant criteria/group that can be used to narrow down the search and to focus the data exploration process.
The rest of the paper is organized as follows. Related works are discussed in Section 2, where we also outline our contributions. In Section 3, we describe our approach, especially the proposed algorithm, which consists of two parts: matching and ranking procedures. The implementation of the algorithm is described in Section 4. We evaluate the algorithm and discuss our findings in Section 5 before summarizing the paper with conclusions and future work in Section 6.

Related Work
We align our work with two broad research areas: the development of ontology as a bridge to unify diverse terminologies in plant science and ontology-driven faceted search. This section describes several related works from each area and outlines our contribution at the end.
Ontology has been recognized as a vital component for interoperability across knowledge-based systems [7]. Ontologies are fundamental for unifying diverse terminologies and are increasingly used by scientists in many fields, including online web search engines [13]. An ontology-underpinned emergency response system for water pollution accidents has been proposed to meet the demand of the government and public users for sustainable monitoring and real-time emergency response [14]. An ontological model can also represent event-driven context in academic domains by integrating five modular contexts, namely person, temporal (time), physical space (location), network, and academic events [15]. Furthermore, an ontology can also be used to integrate a variety of quality assessment methods into a unified model for assessing the quality of a website [16].
Over the past years, many structured vocabularies, databases, and information systems have been developed to allow scientists to exchange knowledge about plant traits [17]. Ontology-based solutions are widely used to represent knowledge in this domain, which includes a set of terms to describe the classes in the domain and the relationships among terms [18]. For example, a schema to represent data from multiple biodiversity information systems that are available on the Web was constructed to enable Linked Biodiversity Data [19]. The International Rice Information System (IRIS) was developed to handle rice functional genomics data diversity, including genomic sequence data, molecular genetic data, expression data, and proteomic information [20]. The BRENDA enzyme information system (https://brenda-enzymes.org/ (accessed on 28 June 2021)) was developed from a database into a competence center for enzyme-related information, which combines manually curated enzyme data with proteomic and genomic information [21]. The Pepper Expressed Sequence Tags (EST) database was constructed and consists of 122,582 sequenced ESTs and 116,412 refined ESTs that are available from 21 pepper EST libraries [22]. Much work has also been conducted utilizing local resources, for example, the development of an ontology for an Indonesian medicinal plant [23], an ontology for Thai Zingiberaceae [24], and an ontology for plant genetic resources in the gene bank of the Institute of Plant Genetic Resources in the town of Sadovo [25]. The Plant Ontology (PO), specifically, can be used to describe not only plant anatomy and morphology but also the stages of plant development [8]. Detailed knowledge and information about plant traits, genotype, and phenotype are usually used as basic information in PO. It should facilitate a formal description of phenotypes and standardized annotation of plant traits that accurately describe plant anatomy and morphology.
In the previous release of PO, the so-called Plant Structure Ontology (PSO), there were three types of parent-child relationships in PSO that can be used to associate two terms, namely is_a, part_of, and develops_from [26]. PO in combination with another ontology can also be used to extract entity-quality relationships from digitized taxon descriptions [27].
On the other hand, faceted search as a human-computer interaction technique has been identified as a practical approach to handle a vast amount of data [28]. It provides interactivity and an easy-to-use data visualization solution to guide the decision-making process through the classification of information into so-called "facets". It has been used in many domains, for example, to provide dynamic faceted search solutions over enterprise databases for a domain-independent system [29] and to ensure a high-quality recommendation of scientific articles based on a query paper [30]. A visual recommendation system can also be built based on a faceted browsing technique that provides interactive navigation of automatically generated visualizations [31]. In the bioenergy domain, a faceted search system can be used to resolve the sense ambiguity of search terms [32]. A faceted search visualization technique can also support categorized access to heterogeneous and unstructured biomedical data sources [33]. This technique can also support a metasearch system to search and gather data resources from multiple Open Linked Data Projects [34]. Furthermore, it facilitates coarse-grained and fine-grained exploration of geographic maps through interactive widgets [35].
In most faceted-search solutions, ontology plays a significant role. An ontology can be integrated with faceted navigation to improve information retrieval results through a query expansion mechanism [36]. By representing entities of an ontology as the facets and ontological instances as facet values, a personalized search interface can be constructed by matching the facets and the user profile [37]. An unsupervised ontology was extracted from heterogeneous and unstructured biomedical content and aligned with multiple existing biomedical ontologies to enable data exploration from heterogeneous biomedical resources [33]. For ontology-driven solutions, multiple search techniques and algorithms have been used to explore relatively large search spaces, for example, using the Hill-Climbing algorithm to learn the Bayesian network of a mushroom dataset to identify the most relevant content [28]. A random walk-based framework has been used to induce a sub-network consisting of related nodes of the scientific article citation or content similarity network [30]. Relevant facets can also be predicted by using a Random Forest (RF) model [38], based on the most frequent queries of the most similar users [34], based on the ability of the user to provide desired values for each facet [29], as well as based on statistical and perceptual measures [31].
As mentioned in Section 1, several ontologies have been developed for plant research. The most prominent one is the Plant Ontology (PO), which adopts a data model of Gene Ontology (GO) that covers flowering plants in general. GO, for example, has been used to cluster and assemble the Pepper EST database to contribute to the analysis of gene synteny as part of a chili pepper sequencing project [22]. Using PO as the basis for an ontology-driven faceted search is an ideal solution. However, as mentioned in Section 1, the ontology is too complex and hardly understandable by non-expert users.
In this work, we attempt to construct a search-based method that can be used to explore a knowledge base of Capsicum. The goal is to provide a faceted search that enables non-expert users to explore knowledge bases with less prior knowledge about characteristics of Capsicum. Our work contributes in at least two aspects:

1.
Development of an ontology intended to communicate knowledge to non-expert users. Instead of using existing ontologies, we developed a small yet powerful ontology to describe the characteristics of Capsicum. The ontology was not intended to be complete but to be easily consumed by non-expert users.

2.
Utilization of the faceted search technique to drive the search process. This technique has been widely used to overcome information overload when searching a large amount of data. In contrast, our faceted search is intended for searching an unfamiliar database where the amount of data is not necessarily large.

Method
In this section, we discuss our research methodology. First, we introduce our solution, followed by our research procedures. After that, we explain each research step especially for knowledge modeling, matching, and ranking procedures. At the end, we describe our method to evaluate the results.

Capsicum Search: A Personalized Faceted Search
We propose a personalized faceted search solution for exploring the knowledge bases of Capsicum. The proposed solution consists of two parts: an ontology as a shared common understanding of the subject and a search algorithm. The ontology is used as the basis for producing a list of facets, while the search algorithm filters and orders the list further according to specific criteria. The constructed list of facets is expected to fit the needs of users closely. The search algorithm can be explained as follows:

1.
A domain ontology is represented as a Directed Acyclic Graph (DAG) that consists of vertices and edges starting from the most generic part of a plant. The leaves of the graph are the most specific parts of the plant.

2.
A list of queries is represented as path traversal procedures, where the encoded entities and properties can be located correctly in the graph.

3.
Based on the graph and path traversal procedures, relevant entities are identified based on entities' relationships in the graph, for example, based on relationships of siblings, sub-graphs, etc.

4.
All relevant entities become the list of facets to be presented to the users, who can use them to refine their search results further.

Figure 1 shows our research procedures, which consist of several steps. Before explaining each step, we present some basic definitions borrowed from graph theory [39]. To avoid confusion, we use entity and node interchangeably, where an entity is represented as a node in the graph.

Research Procedures
Definition 1 (Knowledge Base as Graph). Knowledge Base is a collection of knowledge obtained from experts. The knowledge is represented as a graph, where nodes of the graph are related to each other using specific relationships. A tree is a special kind of graph that contains no cycles.

Definition 2 (Level).
A level of a node in a tree is how far the node is from the root, where the root has level 0. The higher the level, the further the node from the root.

Definition 3 (Path).
A path between two nodes is the possible way to reach one node from another.

Definition 4 (Branch).
A branch of a tree is a sub-tree consisting of connected nodes. Two nodes are in the same branch if there is a path between them that does not go through the root.
We explain several most important research steps in the following sub-sections, including knowledge modeling, matching, and ranking procedures.
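Definitions 2 to 4 can be made concrete with a small sketch. The part-of tree below is hypothetical: only "Fruit", "Leaf", and "Stem" are named in this paper as level-1 entities, while "Seed", "Pedicel", and "Petiole" are placeholders. The helper names are illustrative and are not taken from the implementation.

```python
# Hypothetical part-of tree stored as child -> parent; the root has no parent.
PARENT = {
    "Plant": None,
    "Stem": "Plant",
    "Leaf": "Plant",
    "Fruit": "Plant",
    "Seed": "Fruit",      # placeholder entity
    "Pedicel": "Fruit",   # placeholder entity
    "Petiole": "Leaf",    # placeholder entity
}


def ancestors(node):
    """Return the chain from `node` up to the root, `node` first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def level(node):
    """Definition 2: distance from the root, where the root has level 0."""
    return len(ancestors(node)) - 1


def branch_root(node):
    """Level-1 ancestor of a non-root node; it identifies the node's branch."""
    chain = ancestors(node)
    if len(chain) < 2:
        raise ValueError("the root does not belong to a single branch")
    return chain[-2]


def same_branch(a, b):
    """Definition 4: a path between a and b exists that avoids the root."""
    return branch_root(a) == branch_root(b)
```

In a tree, two non-root nodes can reach each other without passing through the root exactly when they share the same level-1 ancestor, which is why `same_branch` only compares branch roots.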

Knowledge Modeling and Query Formulation
In this step, knowledge from domain experts was acquired through a series of intensive discussions and became the basis of our ontology development. Two groups of ontology development models are widely used, namely the waterfall and iterative-incremental life-cycle models [40,41]. Multiple prominent ontology engineering methodologies fall under the latter group, including Methontology [42] and NeOn [40]. For example, Methontology has been successfully used to develop ontologies in the legal domain [43], while the NeOn methodology has been used in the human resources domain [44] and the healthcare domain [41]. A combination of both methodologies is also possible, for example, for integrating methods and models of assessing the quality of Internet services [45].
To build our ontology, we adopted the ontology engineering methodology presented by Uschold and Gruninger [46]. It belongs to the waterfall model group, where the stages of the ontology engineering process were performed sequentially [41]. The methodology was adopted because it provides flexibility to the ontology developers in our team to lead and drive discussions with domain experts. We performed the following stages:

1.
Identification of the purpose and scope of the ontology. As mentioned in Section 1, we share knowledge about the characteristics of Capsicum with non-expert users. Therefore, we expected that the ontology should cover characteristics of Capsicum identifiable by this group of users.

2.
Building the ontology, which covers ontology capture, coding, and integration with existing ontologies. We identified entities, properties, and data types, including how entities are related to each other. After that, we represented the identified objects using the Resource Description Framework (RDF) [47] and the Web Ontology Language (OWL) [48]. For coding the ontology, we used the Protégé ontology editor [49]. For integration with existing ontologies, we adopted terms from the Plant Ontology [8].

3.
Evaluation. We evaluated the ontology by using competency questions [46] to carry out reasoning with different characteristics of Capsicum. This evaluation ensures that a list of correct entities can be obtained when a common characteristic is provided.
To order the obtained entities as facets, we use a ranking mechanism explained in Section 3.5.

4.
Documentation. We generated the documentation of our ontology by using the WIDOCO tool [50]. It generated human-readable descriptions of terms and summaries with integration with other external information.
In the query formulation step, a list of anticipated questions was formulated. We consider entities and properties from each question. To be precise, from each obtained question, we extracted entities, properties, and values from every property. As a result, a query list, represented as tuples (entity, property, and values), is obtained.
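A query list of this form might be represented as follows. This is a minimal sketch: the `Query` type, the example question, and the property values are assumptions for illustration, and the step that extracts tuples from a natural-language question is not shown because the paper does not specify it.

```python
from typing import List, NamedTuple, Tuple


class Query(NamedTuple):
    """One (entity, property, values) tuple extracted from a question."""
    entity: str
    prop: str
    values: Tuple[str, ...]


# Hypothetical question: "Which pepper has a red, elongated fruit?"
queries = [
    Query("Fruit", "Color", ("red",)),
    Query("Fruit", "Shape", ("elongated",)),
]


def entities_in(query_list: List[Query]) -> List[str]:
    """Distinct entities mentioned in the queries, in order of appearance."""
    seen: List[str] = []
    for q in query_list:
        if q.entity not in seen:
            seen.append(q.entity)
    return seen


def properties_in(query_list: List[Query]) -> List[str]:
    """Distinct properties mentioned in the queries, in order of appearance."""
    seen: List[str] = []
    for q in query_list:
        if q.prop not in seen:
            seen.append(q.prop)
    return seen
```

The number of distinct entities returned by `entities_in` is what the matching step needs to decide which matching mechanism applies.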

Matching
In this step, the constructed ontology and the obtained list of queries are aligned to identify the matching pairs. Because users' queries reflect their degree of understanding, they vary widely: each user has a different level of background knowledge about the subject. Therefore, the matching procedure should adapt to these varying degrees. Based on how users define their queries, we characterize them into four groups as follows:

1.
Familiar with only one part of the plant. Users in this type only provide the description of a specific part and ignore other generic or more specific parts.

2.
Familiar with the generic parts of the plant. Users in this type provide a relatively generic description of the whole plant without focusing on a specific part.

3.
Focused on small parts of the plant. Users in this type provide more specific descriptions that are related to each other.

4.
Combination of generic and focused. Users in this type provide random descriptions of the plant.

Figure 2 depicts the four types of cases that can occur when users provide descriptions. The knowledge base is represented as a tree, where the root (black) is the plant itself and the other nodes are its parts. A relationship (an arrow from one node to another) indicates that the lower node is part of the higher node. The green nodes are the defined nodes (based on the user description), and the gray nodes are the related nodes.

1.
Matching #1, users describe a plant using only one entity. In this case, we selected the entities at the same level as well as entities in the same branch, as shown in Figure 2a.
We called this matching mechanism a single-entity personalization method.

2.
Matching #2, users describe a plant using two entities located at the same level in the graph. In this case, we selected entities that are located at the same level, as shown in Figure 2b. We called this matching mechanism a level-based personalization method.

3.
Matching #3, users describe a plant using two entities not located at the same level but the same branch. In this case, we selected entities that are located at the same branch, as shown in Figure 2c. We called this matching mechanism a branch-based personalization method.

4.
Matching #4, users describe a plant using two entities not located at the same level or the same branch. In this case, we selected entities at the same level from both entities as well as entities from the branches, as shown in Figure 2d. We called this matching mechanism a level-and-branch-based personalization method.
All selected entities were then used as the list of facets to be presented to users. At this point, the order of the facets is random and therefore unpredictable. A ranking mechanism is thus essential to ensure that an identical query produces an identical output. This mechanism is explained further in the following sub-section.

Figure 2. Four matching mechanisms for identifying relevant entities based on the matching of one or more defined entities. The black entity is the root, the green entities are the selected nodes, and the gray entities are the identified nodes.
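The four matching mechanisms can be sketched as set operations on a part-of tree. The tree below is hypothetical (only "Fruit", "Leaf", and "Stem" are entity names from this paper), and two details are assumptions rather than statements from the text: the root and the entities named in the query are excluded from the returned facets, and queries name only non-root entities.

```python
# Hypothetical part-of tree (child -> parent); non-paper names are placeholders.
PARENT = {"Plant": None, "Stem": "Plant", "Leaf": "Plant", "Fruit": "Plant",
          "Seed": "Fruit", "Pedicel": "Fruit", "Petiole": "Leaf"}


def chain(node):
    """Nodes from `node` up to the root."""
    out = []
    while node is not None:
        out.append(node)
        node = PARENT[node]
    return out


def level(node):
    return len(chain(node)) - 1


def branch(node):
    """Level-1 ancestor of a non-root node (identifies its branch)."""
    return chain(node)[-2]


PARTS = [n for n in PARENT if PARENT[n] is not None]  # every node but the root


def at_level_of(e):
    return {n for n in PARTS if level(n) == level(e)}


def in_branch_of(e):
    return {n for n in PARTS if branch(n) == branch(e)}


def match(entities):
    """Return (mechanism number, unordered candidate facets)."""
    es = list(dict.fromkeys(entities))  # drop duplicates, keep order
    if len(es) == 1:                    # Matching #1: single entity
        mech, found = 1, at_level_of(es[0]) | in_branch_of(es[0])
    elif len(es) == 2:
        a, b = es
        if level(a) == level(b):        # Matching #2: same level
            mech, found = 2, at_level_of(a)
        elif branch(a) == branch(b):    # Matching #3: same branch
            mech, found = 3, in_branch_of(a)
        else:                           # Matching #4: neither
            mech = 4
            found = (at_level_of(a) | at_level_of(b)
                     | in_branch_of(a) | in_branch_of(b))
    else:
        raise ValueError("this sketch handles one or two entities only")
    return mech, found - set(es)
```

The result is a set on purpose: at this stage the facets have no defined order, which is the gap the ranking step fills.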

Ranking
A ranking mechanism is required to determine how multiple results are delivered to users. In this step, all obtained entities are ranked to determine their order according to a specific criterion. As the final result, we obtain a list of ordered entities to be used as facets in our faceted search engine. Returning to the example in Figure 2, we obtained 4, 3, 4, and 9 facets to be presented to users, as shown in Figure 2a-d, respectively. In a graph with complex relationships, the number of facets can be enormous; therefore, a mechanism to order them is necessary.
Our ranking mechanism utilizes the relationships between entities as well as the properties of each entity. We defined three ranking mechanisms to order the obtained facets as follows:

1.
Ranking #1: Select the matched entities with similar properties to the provided question. For example, if the query contains the property "Color", then all entities with "Color" are ordered first.

2.
Ranking #2: Select the matched entities with a higher number of properties. An entity with a more detailed description (based on the number of available properties) is ordered first.

3.
Ranking #3: Select the more generic entity first. Since the generality of entities can be obtained through their levels, a lower-level entity is ordered first.
We expected that the list of facets to be presented to users should be complete and predictable.
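One way to combine the three ranking mechanisms is a lexicographic sort key. The sketch below rests on assumptions: the facet data are hypothetical, the mechanisms are applied in the order #1, #2, #3, and Ranking #1 is read as "more properties shared with the query first", which reduces to "has the queried property" for a single-property query.

```python
# Hypothetical facets: name -> (level in the tree, available properties).
FACETS = {
    "Stem": (1, {"Color", "Length"}),
    "Leaf": (1, {"Color", "Shape", "Length"}),
    "Seed": (2, {"Color"}),
    "Pedicel": (2, {"Length", "Position"}),
}


def sort_key(name, facets, query_props):
    """Composite key; smaller tuples are ordered first."""
    lvl, props = facets[name]
    shared = len(props & query_props)
    return (
        -shared,      # Ranking #1: properties shared with the query
        -len(props),  # Ranking #2: more properties first
        lvl,          # Ranking #3: more generic (lower level) first
    )


def rank(facets, query_props):
    """Order facet names by the combined ranking mechanisms."""
    return sorted(facets, key=lambda n: sort_key(n, facets, query_props))
```

Python's `sorted` is stable, so facets with identical keys keep their input order. The order of such ties therefore depends on how the facets were collected, which is exactly the unpredictability that combining several ranking mechanisms is meant to reduce.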

Evaluation
For evaluation, we used a randomized comparator under the assumption that a robust system should produce an identical list of facets for an identical query regardless of when it is executed. We therefore computed the degree of predictability of the produced results. If two facets x and y have an equal chance of being ordered first, the degree of predictability is low, meaning that the produced results can be ordered randomly. Given F as the list of produced facets and R ⊆ F as the list of facets that are ordered randomly under the employed ranking mechanism m, the degree of predictability is computed as follows:

$$P_m = 1 - \frac{|R|}{|F|} \qquad (1)$$

As shown in Equation (1), the degree of predictability P is computed for the ranking mechanism m from the fraction of randomly ordered items R among the total items F. The value of P ranges from 0, representing a fully random order (zero predictability), to 1 (high predictability).
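The degree of predictability can be illustrated with a short sketch. It reads the description above as P = 1 - |R| / |F| and assumes that a facet counts as randomly ordered when at least one other facet has the same sort key under the ranking mechanism; both the reading and the function name are assumptions for illustration.

```python
from collections import Counter


def predictability(keys):
    """Degree of predictability for one ranking mechanism.

    `keys` holds one sort key per produced facet. Facets whose key is
    shared with another facet are tied, so their relative order is
    arbitrary and they are counted as randomly ordered.
    """
    if not keys:
        raise ValueError("at least one facet is required")
    counts = Counter(keys)
    tied = sum(c for c in counts.values() if c > 1)  # |R|
    return 1 - tied / len(keys)                      # 1 - |R| / |F|


# Ranking by level only: every facet shares its level with another one.
by_level = [1, 1, 2, 2]

# Hypothetical composite keys (shared properties, property count, level).
combined = [(-1, -3, 1), (-1, -2, 1), (-1, -1, 2), (0, -2, 2)]

print(predictability(by_level))  # all four facets are tied
print(predictability(combined))  # no ties
```

With these hypothetical keys, ranking by level alone leaves every facet tied, while the composite key separates all of them, mirroring the claim that combining ranking mechanisms raises predictability.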

Result
In this section, we describe the results of our work. We start by discussing the developed ontology, followed by a prototype of our faceted search implementation. Figure 3 shows the ontology developed to deliver knowledge of Capsicum to non-expert users. In its initial version, the ontology consists of 21 entities, 2 object properties, 11 data type properties, and 4 species as individuals (Capsicum annuum, Capsicum frutescens, Capsicum chinense, and Capsicum pubescens). It simplifies the Plant Ontology by focusing only on the fundamental characteristics that most non-expert users can recognize.

Table 1 shows the list of main entities available in our ontology. They can be arranged into five levels, where level 0 belongs to the root of the plant. The object properties in our ontology are partOf and its inverse hasPart, which represent whether an entity is part of another entity.

Table 2 displays the list of data type properties in our ontology. It consists of a few general characteristics that can be used to describe a Capsicum plant. The table also shows the number of entities associated with each property. The property "Color" can be used to describe 13 entities, followed by "Shape" and "Length" with 9 and 7 entities, respectively.

Table 3 shows the defined values for several properties in our ontology. We update the list regularly, and the latest version of the ontology is available online (https://ricover.hpc.lipi.go.id/ontocapsicum/ (accessed on 28 June 2021)).

Search Algorithm
Algorithm 1 displays the algorithm used to produce an ordered list of facets to be presented to the users to refine their search further. Given G as the graph representation of the ontology, Q as the query represented as a list of tuples (entity, property, and value), and R as the ranking mechanisms, its primary goal is to find a list of relevant entities F to be used as facets. The algorithm first seeks the matching entities, given the graph and the list of queries (line no. 1). The matched entities determine which matching mechanism is used. If only one entity is matched, matching mechanism #1 is executed (line no. 18). Otherwise, a further determination is executed (line no. 2). Based on the determination of level and branch, the algorithm executes matching mechanism #4, #3, or #2 (line no. 11, 13, and 15, respectively). Finally, after all relevant entities have been collected, the ranking mechanism is executed to ensure the predictability of the results (line no. 23).

Evaluation
The evaluation was conducted by generating a list of queries that a person might plausibly perform and by measuring their effects on the relevant facets produced. First, the relevant facets are identified by matching the queries against the ontology. After that, the facets are ordered according to specific criteria. When the same query is executed multiple times on a faceted search engine application, the list of produced facets should be the same and ordered consistently. This evaluation calculates how consistent the produced facets are when multiple types of queries are applied.
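The consistency requirement can be made concrete with a small Python sketch. The measure below is only one possible way to quantify it and is an assumption of this example, not the degree of predictability defined in this paper: it reports the fraction of repeated runs that return the most frequent ordered facet list. The functions consistency, unranked_search, and ranked_search, as well as the three facet names used as stand-in output, are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def consistency(search: Callable[[str], Sequence[str]], query: str, runs: int = 20) -> float:
    """Fraction of runs that return the most frequent ordered facet list.

    Assumed illustrative measure; a value of 1.0 means every run produced
    the same facets in the same order.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    outputs = Counter(tuple(search(query)) for _ in range(runs))
    return outputs.most_common(1)[0][1] / runs


def unranked_search(query: str) -> List[str]:
    """Stand-in for a search whose matched facets come back in random order."""
    facets = ["Color", "Length", "Shape"]  # hypothetical facet candidates
    random.shuffle(facets)
    return facets


def ranked_search(query: str) -> List[str]:
    """Same candidates, but with a fixed ordering applied afterwards."""
    return sorted(unranked_search(query))
```

Here consistency(ranked_search, "fruit shape") evaluates to 1.0, whereas the value for unranked_search varies from run to run because the candidate order is random.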
We consulted domain experts and collected a list of search queries that non-expert users might plausibly provide. This consultation is required to ensure that a query can be answered by our matching procedure (in-scope query). We did not consider out-of-scope queries because the matching procedure does not produce any results for this kind of query. Using our ontology, we identified the entities and properties in each query and found that most of the queries consist of one entity and up to three properties. Figure 4 shows the composition of entities and properties in the collected queries. Among the entities, 76% of the queries were about "Fruit", followed by 19% about "Leaf" and 5% about "Stem", as shown in Figure 4b. The composition of properties is shown in Figure 4a, where the most used property was "Shape", followed by "Color", "Length", "Position", etc.
For evaluation, we executed the algorithm to measure the degree of predictability of the results produced by the two main steps in our algorithm, namely matching and ranking, as explained in Section 3.2. Based on the four defined types of matching mechanisms, we investigated a few queries that fit each mechanism as follows:
1. Case 1. The query contains only one node (fits matching mechanism #1). Testing case 1 uses permutations of three entities, namely "Fruit", "Leaf", and "Stem". All of them belong to level 1, and they are suitable for case 1.
2. Case 2. The query contains two nodes, where both nodes are at the same level (fits matching mechanism #2). Testing case 2 is conducted by using a combination of "Petals" and "Seed".
3. Case 3. The query contains two nodes, where both nodes are on the same branch (fits matching mechanism #3). Testing case 3 uses a combination of nodes on the same branch at multiple levels from three entities, such as "Fruit", "Stamen", and "Flower".
4. Case 4. The query contains two nodes, where both nodes are neither at the same level nor on the same branch (fits matching mechanism #4). Testing case 4 is conducted with a combination of entities as nodes.
Based on the composition of entities and properties extracted from the queries, as shown in Figure 4, we selected a list of queries representing all four cases described above. As a result, 19 queries were selected that fairly represent all four cases and the four individuals available in our ontology, as shown in Table 4. For each query, we ran the algorithm to obtain the list of facet candidates. We collected the candidates after the matching and before a ranking mechanism was applied. Our intention was to identify the combination of ranking mechanisms that provides the highest degree of predictability. These intermediate results, obtained after applying the relevant matching mechanism for every query, are shown in Table 5. The produced results are randomly ordered, and therefore, it is necessary to apply a ranking mechanism to ensure their predictability.
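To illustrate why ranking restores a stable order, the following Python sketch sorts facet candidates with a composite key built from the three criteria named in Section 3.5 (property similarity, number of properties, and level). Everything specific in it is an assumption made for illustration: the Candidate record, its field names, the example values, the sort directions, and the final tie-break on the entity name, which is an addition of this sketch and not part of the described ranking mechanisms.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical facet candidate; all fields are illustrative assumptions."""
    name: str
    shared_properties: int  # properties shared with the query (similarity)
    property_count: int     # total number of properties of the entity
    level: int              # depth in the partOf hierarchy


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates by a composite key.

    Assumed directions: more shared properties first, then more properties,
    then shallower level. The entity name is used as a last tie-break so
    that the order is fully determined.
    """
    return sorted(
        candidates,
        key=lambda c: (-c.shared_properties, -c.property_count, c.level, c.name),
    )


# Made-up example values, not taken from Table 5.
EXAMPLE = [
    Candidate("Seed", 1, 2, 2),
    Candidate("Fruit", 2, 5, 1),
    Candidate("Leaf", 2, 4, 1),
    Candidate("Petals", 1, 2, 2),
]

shuffled = EXAMPLE[:]
random.shuffle(shuffled)
# The ranked order does not depend on the order in which candidates arrive.
assert rank(shuffled) == rank(EXAMPLE)
```

Dropping one component from the key leaves more candidates tied, and tied candidates keep their arbitrary input order, which is one way to see why combining the criteria yields a more predictable list.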
We applied the three ranking mechanisms (based on property similarity, number of properties, and levels) described in Section 3.5 and measured their degrees of predictability. The results are shown in Figure 5. Figure 5a shows the distribution of the degree of predictability for each combination of the three ranking mechanisms. We obtained the highest degree of predictability when combining all three ranking mechanisms (mean = 0.49). The second-highest degree was obtained by combining ranking mechanisms #2 (number of properties) and #3 (levels), with a mean of 0.44. A combination of ranking mechanisms #1 (property similarity) and #2 produced a lower degree of predictability (mean = 0.29), followed by a combination of ranking mechanisms #1 and #3 with a mean of 0.19. Furthermore, a detailed comparison for every query using every possible combination of ranking mechanisms is shown in Figure 5b. For most of the queries, the combination of all three ranking mechanisms and the combination of ranking mechanisms #2 and #3 are superior to the others. Surprisingly, the combination of ranking mechanisms #1 and #2 performed well when the number of matched entities was low, as shown in the results for queries no. 1 to 10.
Finally, we summarize a few points from the results explained above. First, simplifying the complex ontology is critical to ensure that the ontology fits the targeted users. We used a simplified version of the Plant Ontology to allow for a knowledge-sharing process about morphological characteristics of Capsicum species with non-expert users. Second, the alignment between the constructed ontology and the queries is possible through multiple matching mechanisms. All queries can be aligned with one of the four available matching mechanisms. Third, a combination of multiple ranking mechanisms is necessary to increase the degree of predictability of the produced results. No single ranking mechanism is more important than the others.
Overall, we demonstrated that the proposed solution can be implemented effectively with promising results.

Conclusions
In general, a Knowledge Base (KB) can be seen as a technology that captures and stores knowledge from human experts. A Knowledge-Based System (KBS) uses a KB to make intelligent decisions, for example, to support decision-making processes, learning, and other activities. In most cases, a KB is highly domain-oriented, based on the expertise of experts in a specific domain. It is necessary to have sufficient knowledge about the field to be able to consume a KB properly.
A challenge arises when sharing knowledge from a KB with people who are not familiar with the domain, for example, citizen scientists. The involvement of citizen scientists has been recognized as an essential factor in science explorations. They are involved both in collecting data and in consuming data, for example, to identify species. One approach to knowledge-sharing is through an ontology, which can be seen as a shared common understanding of a domain. Using a common ontology for a variety of knowledge-sharing activities helps to ensure consistency. In the domain of plant science, several ontologies have been developed to support knowledge-sharing among scientists. They were intended to be consumed by scientists who have knowledge about the domain but not by users who have little or even no knowledge about it.
In this paper, we tackled the challenge using an ontology-driven knowledge-sharing approach that is consumable by non-expert users. Our work started by eliciting knowledge from experts and formulating it in an ontology. The ontology was developed by focusing on generic characteristics and leaving out detailed information. Furthermore, we developed a faceted search application to consume the ontology and to provide relevant facets that users can use to refine their queries further. In this way, non-expert users can explore a KB about Capsicum with minimal background knowledge.
We characterized users into four groups based on how they describe objects in a KB, in our case, parts of a plant. As a result, we introduced four matching mechanisms to identify relevant entities in our ontology and three ranking mechanisms to order the identified entities. To evaluate our search system, we measured the degree of predictability of the produced facets. Given an identical query, the system should provide the same output. We found that different types of queries can be mapped to one of the available matching mechanisms. Furthermore, by combining all three ranking mechanisms, we obtained the highest degree of predictability of the list of facets.
In the future, we plan to expand our current faceted search application vertically by using a more sophisticated ontology and horizontally by using ontologies from different areas. Specifically, we plan to use a color index to avoid ambiguity in defining the value of the property "Color". The current implementation uses pre-defined values such as "Dark Green", which every person might define differently. Additionally, we plan to standardize the values of specific properties. Values of the property "Shape", such as "rounded" and "bell shape", should be standardized to avoid multiple interpretations. We are also considering transforming our ontology into a fuzzy ontology because fuzzy logic has suitable formalisms to handle imprecise and uncertain knowledge [51]. We envision that our approach can have an essential impact on biodiversity science in general, where many KBs have been developed but are difficult for users outside the domain to consume. Further, non-expert users such as citizen scientists contribute significantly to knowledge-sharing and the dissemination of science to the public.