A Comprehensive Survey of Facet Ranking Approaches Used in Faceted Search Systems

Ali, Esraa; Caputo, Annalina; Jones, Gareth J. F.

doi:10.3390/info14070387

by Esraa Ali *, Annalina Caputo and Gareth J. F. Jones

ADAPT Centre, School of Computing, Dublin City University, D09 E432 Dublin, Ireland

* Author to whom correspondence should be addressed.

Information 2023, 14(7), 387; https://doi.org/10.3390/info14070387

Submission received: 31 May 2023 / Revised: 3 July 2023 / Accepted: 3 July 2023 / Published: 7 July 2023

(This article belongs to the Special Issue Advances in Recommender Systems, Information Retrieval and Adaptive Systems)

Abstract:

Faceted Search Systems (FSSs) have gained prominence as one of the dominant search approaches in vertical search systems. They provide facets to educate users about the information space and allow them to refine their search query and navigate back and forth between resources on a single results page. Despite the importance of facet ranking, studies dedicated solely to facet ranking methods, or to how this step affects the user’s search experience independently of other aspects of faceted search, are rare. The objective of this survey paper is to review the state of the art in research related to faceted search systems, with a focus on existing facet ranking approaches and the key challenges posed by this problem. In addition, this survey investigates state-of-the-art FSS evaluation frameworks and the most commonly used techniques and metrics to evaluate facet ranking approaches. It also lays out criteria for judging whether a dataset is appropriate, and how it should be structured, for evaluating facet ranking methods separately from other FSS aspects. This paper concludes by highlighting gaps in the current research and future research directions related to this area.

Keywords:

facet ranking; facets; faceted search; faceted search systems

1. Introduction

Faceted search is one of the mainstream search paradigms in vertical search engines. In addition to the familiar look-up search systems, faceted search systems (FSSs), also known as faceted browsing or faceted navigation systems, provide an alternative way for the user to navigate through the search space. In this context, facets are attributes or metadata that describe the underlying content collection. Faceted search is the de facto search approach for many domain-specific search engines, such as those in e-commerce and e-tourism. The primary distinction between faceted search and other forms of web search is that users can explore the information space through facets. To do this, the searcher uses the facets as filters to navigate the data and learn more about the area being searched in general.

As the magnitude of data in a collection increases, the number of facets and their values become impractical to display in a single page. Providing users with too many facets has been shown to overwhelm and distract them [1,2,3]. Existing faceted browsers overcome this problem by either displaying a small number of facets and making the rest accessible through a ‘more’ button or by displaying only the facet titles without the values, and if the user is interested in a facet, they can click on the title to view its values. In either case, ranking the top facets is required as it guides the searcher in understanding the main aspects of the information space being explored.

Although faceted search is heavily studied in the literature, studies dedicated solely to facet ranking methods, or to how this step affects the user’s search experience independently of other aspects of faceted search, are rare. Facet ranking is usually treated as a complementary step to the facet generation process. This paper aims to cover existing FSS literature with a focus on the methods used in facet ranking, how they can be evaluated, and the impact of this step on the search process.

To conduct the survey, a systematic approach was employed to collect relevant publications from reputable academic search engines such as Google Scholar, ACM, Springer, IEEE Access, Semantic Scholar, and Microsoft Academic. To ensure comprehensive coverage, several keywords were used in the search process, including ‘Facets’, ‘Faceted Search’, ‘Faceted Browsing’, ‘Faceted Systems’, and ‘Facet Ranking’. The search window spanned from 2000 to 2023, so that the survey includes the most recent publications.

To narrow down the search results and ensure relevance, various filtering criteria were utilized. These criteria may have included limiting the search to peer-reviewed journal articles, conference proceedings, and scholarly publications. Additionally, filters such as language, publication type, and relevance to the topic were applied to refine the results.

To enhance the comprehensiveness of the survey, the search strategy was not solely reliant on the research engines. Additional efforts were made to identify influential authors in the field and track their contributions. This was achieved by following the publications of these authors and investigating their works for relevant papers that may not have been captured in the initial search results.

Moreover, citation analysis was conducted on the highly influential papers identified during the search process. By examining the citations of these papers, additional relevant publications were identified, allowing for more comprehensive coverage of the most recent work in the research domain.

Overall, this methodology employed a combination of systematic literature search techniques, filtering criteria, author tracking, and citation analysis to ensure a thorough and up-to-date collection of publications for the survey.

This survey begins with a brief overview of faceted search systems in general: what they are, how they operate, and examples of existing systems. In addition, key FSS aspects relevant to the facet ranking problem are covered in Section 2, including the search process, user interaction, information needs, and underlying data structures. Other FSS aspects, such as query understanding, data indexing, and visualization, are important for the user experience, but they are outside the scope of this survey as they have less impact on the facet ranking process. Figure 1 illustrates the key FSS aspects which impact the facet ranking process and are therefore addressed in this survey.

This is followed by Section 3, which focuses on introducing facets, their definition, and how they are classified in the literature. It also draws attention to the facet generation phase in more detail due to its impact on the facet ranking process. Section 4 is dedicated to reviewing existing approaches to facet ranking found in the literature. These can broadly be classified into personalized and non-personalized ranking methods. The section provides a discussion of the benefits and drawbacks of each approach. The next section in this article, Section 5, covers the most commonly adopted evaluation strategies and metrics used in FSS and focuses on how facet ranking can be evaluated aside from other FSS components. It also discusses how the domain and search task choices affect the evaluation process. Finally, the survey is concluded with a summary of the FSSs literature in Section 6. It also highlights the current and open research directions in this field.

2. Faceted Search

Faceted search refers to a family of look-up systems which enable users to explore, digest, analyze, and navigate through complex multidimensional information spaces [4]. It provides an easy-to-use interaction paradigm for users in which they use common metadata or attributes (facets) to browse the information objects being searched. Several studies have found that users like faceted search systems and find them intuitive and easy to use [2,5].

The browsing paradigm for user interaction in information-seeking systems emerged in the literature as early as the 1930s. It was originally based on facet analysis theory introduced by mathematicians in information sciences. This theory was further developed and widely adopted later in the field of information retrieval, where it gained popularity [6]. Currently, faceted search is the dominant approach used in vertical search domains such as e-commerce websites, e-tourism websites, and digital libraries. Terminology-wise, faceted search is also mentioned in the literature as faceted browsing, faceted navigation, multifaceted search, or guided navigation [4,6].

Formally, Tzitzikas et al. [7] define faceted search as:

“A session-based interactive method for query formulation (commonly over a multi-dimensional information space) through simple clicks that offers an overview of the result set (groups and count information), never leading to empty results sets.”

Figure 2 shows an example of a faceted search system for Wikipedia that utilizes DBpedia relationships [8]. The system is called Faceted Wikipedia Search. The interface displays the facets and their values on the left side of the screen. Facets based on the type of the searched objects are displayed first in a separate box (Item Type Selection), followed by other groups of facets.

In Faceted Wikipedia Search, facet values are ordered by count. The interface presents the three facet values with the highest count for display and makes the remaining values for each facet available through a ‘more’ button. The top panel in the middle helps to keep track of the selected facets, whilst the middle of the page shows a ranked list of resources that satisfy the currently applied facet filters or conditions.
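The count-based ordering described above can be sketched in a few lines of Python. This is only an illustration, not the code of Faceted Wikipedia Search: the function name `split_facet_values`, the `visible` parameter, and the sample country values are assumptions introduced here.

```python
from collections import Counter

def split_facet_values(values, visible=3):
    """Order facet values by descending count and split them into
    the values shown directly and those kept behind a 'more' button."""
    counts = Counter(values)
    # most_common() sorts by count, highest first; ties keep first-seen order.
    ordered = counts.most_common()
    return ordered[:visible], ordered[visible:]

# Hypothetical values of a 'country' facet collected from a result set.
countries = ["Ireland", "Italy", "Ireland", "Spain", "Italy", "Ireland", "France"]
shown, hidden = split_facet_values(countries)
print(shown)   # values displayed directly, with their counts
print(hidden)  # values reachable through the 'more' button
```

Ordering by count is the simplest ranking signal available to an FSS, since it needs nothing beyond the current result set.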

2.1. The Search Process

The search process in a typical FSS involves a number of iterative steps [4], in which the searchers can:

Type or refine a search query, or
Navigate through multiple, independent facet hierarchies that describe the data by drill-down (refinement) or roll-up (generalization) operations.

In an FSS which supports search queries (e.g., [9]), a typical usage scenario starts with the user entering a search query. The system processes this query to find a list of relevant resources. These resources can be RDF entities, documents, or multimedia objects, depending on the underlying information representation. If the FSS does not support keyword search queries, the whole information space is initially considered relevant, and the first page is populated with the same starting documents each time.

Regardless of the nature of the data, FSSs assume that the data have attributes in common. The next step is to generate a set of common facets and their values. The facet values are then collected, grouped, counted, and can be organized into hierarchies. For example, if the facet values are dates, the system can try to find the best grouping by year, month, or day.

This organization gives a better understanding of the data distribution, and it is useful for minimizing the drill down time. After that, an appropriate label for each facet and its values are produced and used in the display. At this stage, the initial page is populated, and from this point forward, the user starts navigating and exploring the data [1,5,7,10,11,12,13,14,15,16,17].
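As a rough sketch of the collect, group, and count steps described above, the following Python fragment counts the values of every attribute in a small result set and then picks a grouping level for date values. The helper names, the sample resources, and the heuristic for choosing between year, month, and day (take the coarsest level that produces more than one group) are assumptions made for illustration; real systems may choose the grouping differently.

```python
from collections import Counter, defaultdict
from datetime import date

def count_facets(resources):
    """Collect, group, and count the values of every attribute of the resources."""
    facets = defaultdict(Counter)
    for resource in resources:
        for facet, value in resource.items():
            facets[facet][value] += 1
    return facets

def group_dates(dates):
    """Pick the coarsest of year/month/day that splits the dates into
    more than one group (an illustrative heuristic, not a published method)."""
    formats = {"year": "%Y", "month": "%Y-%m", "day": "%Y-%m-%d"}
    for level, fmt in formats.items():
        buckets = Counter(d.strftime(fmt) for d in dates)
        if len(buckets) > 1:
            return level, buckets
    return "day", buckets

# Hypothetical resources sharing the attributes 'type', 'city', and 'opened'.
resources = [
    {"type": "hotel", "city": "Dublin", "opened": date(2021, 3, 1)},
    {"type": "hotel", "city": "Cork", "opened": date(2021, 7, 15)},
    {"type": "museum", "city": "Dublin", "opened": date(2021, 7, 30)},
]
facets = count_facets(resources)
level, buckets = group_dates([r["opened"] for r in resources])
print(dict(facets["city"]))
print(level, dict(buckets))
```

Here all dates fall in the same year, so grouping by year would give a single, uninformative group and the sketch falls back to months.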

2.2. User Interaction Model

A user begins an interaction with the FSS after the first page is populated with data. A typical FSS user interface allows the user to select and deselect facets and to filter the search results according to the currently selected set of facets. Such interfaces allow the user to add more than one facet in order to narrow down or restrict the search results. Users can also remove one or more facets from the current selection set in order to broaden or expand the search results. Providing users with the means to select and deselect multiple facets enables them to build complex filtering conditions to satisfy their search needs [4,7].

To prevent the user from feeling lost in the system, the current set of applied filters is displayed at all times. This is important in allowing the user to identify the current search state. At the same time, it also supports the user in deciding which action should be performed next. As soon as the searcher performs any action on the facets or search results, the FSS reflects this in the results in an interactive and responsive manner regardless of the data size or scale.

Moreover, at any stage of the search session, since the search result set will never be empty, the FSS continuously aggregates, groups, and organizes the searched objects in a meaningful and concise way. This improves the usability of the system and gives the searcher the ability to learn and understand the information space being searched (e.g., [2,5,18,19,20]).
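The interaction model in this section can be illustrated with a minimal Python sketch. It assumes a simplified setting that is not taken from any surveyed system: each facet holds at most one selected value, selected facets are combined conjunctively, and resources are plain dictionaries. The class and method names are hypothetical.

```python
from collections import Counter

class FacetSession:
    """Minimal model of one faceted search session (illustrative only)."""

    def __init__(self, resources):
        self.resources = resources
        self.selected = {}  # facet -> chosen value; this is the visible filter state

    def select(self, facet, value):
        """Add a filter, narrowing the results."""
        self.selected[facet] = value

    def deselect(self, facet):
        """Remove a filter, broadening the results."""
        self.selected.pop(facet, None)

    def results(self):
        """Resources that satisfy every currently selected facet value."""
        return [r for r in self.resources
                if all(r.get(f) == v for f, v in self.selected.items())]

    def available(self, facet):
        """Value counts for a facet within the current results. Only values
        with a non-zero count appear, so offering just these values means a
        click cannot lead to an empty result set."""
        return Counter(r[facet] for r in self.results() if facet in r)

items = [
    {"brand": "A", "color": "red"},
    {"brand": "A", "color": "blue"},
    {"brand": "B", "color": "red"},
]
session = FacetSession(items)
session.select("brand", "A")
print(session.selected, session.available("color"))
session.select("color", "red")
print(session.results())
session.deselect("brand")
print(len(session.results()))
```

Recomputing `available` from the current results after every action is what keeps the displayed counts consistent with the applied filters.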

2.3. Information Needs

From the perspective of user information needs, FSSs can be divided into two classes: precision-oriented systems and recall-oriented systems [7].

2.3.1. Precision-Oriented Systems

In precision-oriented systems, users look for one target resource. For example, in e-commerce systems, the user’s intention is to locate a specific product to buy. The user intent might be meta information about a single or specific resource, such as the location of a specific store. In these systems, the FSS supports the user by presenting only relevant results and helps them to narrow down the information space quickly without becoming lost in the system. User Interface (UI) in this category should guide the user by presenting the relevant search results in a concise and focused manner.

The search task, in this case, is usually well-defined; users know what they are looking for and can recognize the sought resources and their associated facets as soon as they see them. An FSS in this category aims at finding methods to minimize user effort and time spent in locating the desired resource.

As mentioned earlier, precision-oriented systems are widely adopted in the e-commerce domain; examples include [21,22]. Technical support is another domain where FSSs have been developed to help support personnel in locating a specific troubleshooting document through facets, e.g., in order to help a customer resolve a complaint [23,24].

2.3.2. Recall-Oriented Systems

On the other hand, recall-oriented FSSs are exploratory in nature. They aim at educating users about the information space being searched, where users are typically interested in locating a group or set of resources related to their information needs rather than a single resource (like in precision-oriented systems). Users of recall-oriented FSSs carry out educational, investigative, or exploratory search tasks.

An FSS which belongs to this category implements advanced visualization, aggregation, and summarization techniques to support complex user interaction needs. Aggregation and summarization techniques help the searcher to gain insights about the content and its organization. These tools are also needed to support user navigation by going back and forth to explore sub-spaces of the data being searched in a responsive and flexible manner.

Search tasks in recall-oriented FSSs are open-ended, iterative, and incremental; they target multiple items or attributes and involve uncertainty, since users do not know what they are looking for. The user gains knowledge and is able to focus their search intentions as they browse the data.

Examples of recall-oriented systems include [20,25]. Another system called Hippalus [16] investigated a search task in the politics domain, in which users were asked to educate themselves about the political parties participating in the elections. This task had no defined final target, and each user could take a different path to achieve it.

Digital libraries are another domain where most search tasks are recall-oriented. In this setting, a user engages with an FSS to learn about a topic of interest [26,27,28].

Identifying the type of information need in a search task has a major impact on the type of FSS that should be used. The information need governs which parts of the information space should be presented to the user and in which order. It also determines whether the FSS should aggregate and summarize all data and facilitate exploration for recall-oriented search or whether it should locate and focus only on relevant portions of the information space to present it to the user in a precision-oriented search.

This decision also affects how the search system should be evaluated. In recall-oriented systems, a user spending more time and interacting more with the system can be a good sign. By contrast, in precision-oriented systems, this can be a sign of a poorly designed FSS, as these systems aim to help users finish their information-seeking task as quickly as possible with minimum effort.

2.4. Underlying Data Structure

The data structures used by an FSS can be categorized into three main categories:

Structured data: The underlying dataset is derived from well-structured knowledge graphs or linked data. Resources in this case are entities, with facets and their values collected from the entities’ types, ontologies, attributes, or properties. Faceted browsing is the de facto standard for navigating structured datasets [7].
However, faceted browsers based on knowledge bases still struggle when dealing with large volumes of triples. These methods require extensive querying of the triple stores to collect data about the facets and their values in order to support dynamic and interactive interfaces. Therefore, most of the systems adopting this approach are evaluated on small, domain-specific ontologies [1,7,16,20].
Several software engineering and architectural considerations are involved in deciding how the data should be stored and retrieved in RDF stores in an interactive and responsive manner. In some cases, tools such as Facetize have been developed to prepare and transform structured data for faceted search [29]. Examples of an FSS operating on this kind of dataset can be found in [1,8,9,16,20].
Unstructured data: These datasets contain unstructured data, e.g., audio, images, or text such as web pages or user tweets. Such data often has characteristics in common with structured data that can be treated as facets. Special techniques, chosen according to the data type and search task, are deployed to extract facet values from the data. The design of these extraction methods often needs to take into account aspects such as processing time and algorithm complexity. Example systems include [14,30,31,32].
Semi-structured data: Semi-structured resources are objects that have some structured attributes or metadata but are also associated with unstructured data, e.g., long textual research papers, images, or audio files. The majority of facets in these datasets are obtained from the structured part of the data (i.e., attributes and their values). However, in some cases, they can also be extracted or generated from the unstructured part, e.g., top keywords in a research paper for an academic search engine. Example systems include [3,19,33,34,35,36].

The underlying information structure in an FSS determines how the facets are extracted. In structured datasets, it is a straightforward process since the facets and their values are directly collected from the properties of the resources. The FSS is then responsible for filtering, aggregating, organizing, and ranking the facets before presenting them to the searcher. An additional step is added in FSS functioning on semi-structured and unstructured datasets to generate facets or their values. Several algorithms can be employed for this purpose depending on the search domain and search task. Considerations related to how the extracted filters will be applied to navigate the data are also important when designing these systems. They include the risk of propagating errors from the facet generation phase to the following phases of FSS. On the other hand, facet ranking methods can also utilize semi-structured datasets without having to generate the facets. In this case, facets and their values can be extracted from the structured section of the data. However, the unstructured part is utilized to generate useful features for facet ranking [37].

3. Facets

3.1. What Is a Facet?

The word facet means ‘little face’ and is used to describe the different attributes of the object. The term facet originated from Facet Theory [6] and has been extended to be used in information science. Recently, the term has been interpreted to refer to the aspects or dimensions which describe an item or information object in IR literature. Multiple independent facets, in the faceted search context, provide alternative ways of referring to the same item [6].

Facets can be used as conditional filters that facilitate browsing the information space with respect to different attributes of objects. When using such a browser, the user selects facets to “zoom into” or narrow the search results, or deselects facets to “zoom out”, thereby widening the search scope. They can also move from one set of resources to another using multiple navigation routes.

A recent user study aimed at understanding faceted search from the human perspective [2]. This study noted that users interact with facets from the beginning to the end of their search session. In these experiments, the authors found that searchers employ facets distinctively at different stages of the search process and that they also use the facets implicitly without applying them to the search results. In this case, facets support the searchers in learning and understanding the information space they explore. It was also observed in their findings that although most participants liked faceted search, some of them were concerned about the choice overload introduced by facets. This potential for confusion or overload illustrates the importance of carefully selecting and ranking the most relevant facets to the users.

In the IR literature, the term ‘facet’ is used to denote the criteria or the field in the resource to which the filtering will be applied. In relation to this, the term ‘facet value’ refers to the specific literal or entity value used when deciding if the resource should be included in the result set or not based on this facet. For example, in the library domain, a book’s author, title, and publication year are considered facets. The facet values, in this case, might be, for example, William Shakespeare, Romeo and Juliet, and 1595.

Niu et al. [2] argue that facets should not be confused with traditional search filters. Although the authors acknowledge that they share some common characteristics, since they are both used to exclude items that do not satisfy certain criteria, they are different. Facets cover several dimensions of the data, whereas search filters are simply applied to a single dimension.

In addition to this, facets extend the concept of filtering by covering complex data structures and hierarchies. Furthermore, they aid the user in learning and understanding the information space being searched. They also educate the user about what is available and provide a means to reach and explore the data.

3.2. Different Facet Types

Since facets are associated with a complex variety of data structures, several categorizations can be used to classify facets and their values. From a UI design perspective, Vandic et al. [21] classified facets based on the data type of the facet values they contain. According to their classification, facets can be either Qualitative or Numeric. Examples of numeric facets are age and price, which contain only numerical values. Qualitative facets are further classified into Nominal facets and Boolean facets. Boolean facets can have the values True, False, or unspecified, whereas nominal facets contain any number of literal values, e.g., product display type or movie director.
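To make the data-type classification concrete, the following Python sketch infers a facet's class from its values. It is an assumption-laden illustration rather than the procedure of Vandic et al.: the function name `facet_type` is hypothetical, `None` stands for an unspecified value, and any facet that is neither Boolean nor numeric is treated as nominal.

```python
from numbers import Number

def facet_type(values):
    """Classify a facet as 'boolean', 'numeric', or 'nominal' from its values."""
    present = [v for v in values if v is not None]  # None marks 'unspecified'
    if present and all(isinstance(v, bool) for v in present):
        return "boolean"
    # bool is a subclass of int in Python, so it is excluded explicitly here.
    if present and all(isinstance(v, Number) and not isinstance(v, bool)
                       for v in present):
        return "numeric"
    return "nominal"

print(facet_type([9.99, 20, None]))        # price-like facet
print(facet_type([True, None, False]))     # yes/no facet with an unspecified value
print(facet_type(["OLED", "LCD", "LED"]))  # display-type facet
```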

From another perspective, facets can also be categorized according to the structure of the facet values belonging to this facet [38]. Facet values can be flat, e.g., author names or colors of t-shirts, or hierarchical, e.g., the facet country with a value equal to ‘Ireland’, which belongs to ‘Europe’ in the countries taxonomy. Facet values can also be grouped into ranges, such as product price range or event dates grouped by year.
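The difference between hierarchical and range-grouped facet values can be sketched as follows. The tiny country taxonomy, the fixed bucket width of 50, and both function names are assumptions chosen for illustration; the taxonomy is also assumed to contain no cycles.

```python
# Hypothetical child -> parent taxonomy for the 'country' facet.
PARENT = {"Ireland": "Europe", "France": "Europe", "Japan": "Asia"}

def ancestors(value, parent=PARENT):
    """Return the path from a facet value up to the root of its taxonomy."""
    path = [value]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def price_range(price, width=50):
    """Map a numeric value to a fixed-width range label such as '50-100'."""
    low = int(price // width) * width
    return f"{low}-{low + width}"

print(ancestors("Ireland"))   # hierarchical value: Ireland belongs to Europe
print(price_range(79.99))     # range grouping for a numeric facet
```

A flat facet is the degenerate case in which no value has a parent, so `ancestors` returns a single-element path.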

Tzitzikas et al. [7] categorized facets extracted from structured or semantic data into two main groups. In the first group, facets are extracted from isA or isSubClassOf relationships and are called Type-based Facets (t-facets). They identify types of resources in the information space. The values, in this case, can be flat, but most commonly, they belong to a multilevel taxonomy. In this case, they are also called Hierarchical Facet Categories [39].

In the second group, facets that are collected from other entity attributes or relationships with other entities are called Property-based Facets (p-facets). In contrast to the previous group, p-facets often have flat values, but they can also belong to a hierarchical taxonomy, although this is less frequently the case.
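The split into t-facets and p-facets can be illustrated on a handful of subject-predicate-object triples. In this sketch, the RDF predicates `rdf:type` and `rdfs:subClassOf` stand in for the isA and isSubClassOf relationships mentioned above; the `ex:` resources and the function name are hypothetical, and the triples are plain Python tuples rather than the output of a real triple store.

```python
from collections import defaultdict

# Predicates treated as type relationships (an assumption for this sketch).
TYPE_PREDICATES = {"rdf:type", "rdfs:subClassOf"}

def split_facets(triples):
    """Group object values by predicate, separating type-based facets
    (t-facets) from property-based facets (p-facets)."""
    t_facets, p_facets = defaultdict(set), defaultdict(set)
    for _subject, predicate, obj in triples:
        target = t_facets if predicate in TYPE_PREDICATES else p_facets
        target[predicate].add(obj)
    return t_facets, p_facets

triples = [
    ("ex:Dublin", "rdf:type", "ex:City"),
    ("ex:City", "rdfs:subClassOf", "ex:Place"),
    ("ex:Dublin", "ex:country", "ex:Ireland"),
]
t_facets, p_facets = split_facets(triples)
print(dict(t_facets))
print(dict(p_facets))
```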

This categorization is applicable to the majority of FSSs (see Table 1) and can be adopted regardless of the underlying data structure used, i.e., beyond the semantic data representation. Structured data which involve resources with several classes can have t-facets derived from the types taxonomy. Some faceted browsers utilize only t-facets, especially when they operate on resources with rich hierarchical taxonomies.

Hearst introduced an FSS that uses several t-facets to navigate food recipes [39]. The author suggested using multiple categorical t-facets rather than employing one large taxonomy. Other examples of systems using only t-facets are [40,41,42].

One key challenge for systems adopting t-facets is that the hierarchical taxonomy from which the t-facets are derived needs to be predefined. In the case of structured data, this taxonomy can be generated from the class ontology. In other cases, the taxonomies are manually defined by the owners of the FSS.

In the absence of an existing taxonomy, general ones such as WordNet are adopted in the literature; this involves a mapping step from resources to their corresponding WordNet types [5,38,39,43]. FacetX also attempts to overcome the taxonomy limitation by automatically constructing a taxonomy from retrieved top results [42].

Other FSSs use the two facet types but handle them separately by showing the t-facets first, so the searcher can determine the type of resources first before looking into other p-facets [8,10]. DFS [38] ranks the top t-facets first and then selects the top p-facets for each t-facet to be presented to the end user. The idea of grouping and presenting t-facets hierarchically before p-facets is widely followed by e-commerce and shopping websites, where customers choose the department they are interested in first and then use other attributes to filter the search results.
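The two-stage presentation described above (t-facets first, then p-facets within each t-facet) can be sketched as below. This is not the scoring used by DFS or any other surveyed system: plain frequency is used as a placeholder score, and the `type` key, the function name, and the sample products are assumptions.

```python
from collections import Counter

def rank_two_stage(resources, top_t=2, top_p=2):
    """Rank type values first, then rank the other attributes within each type.
    Frequency is a stand-in for whatever scoring a real system would use."""
    type_counts = Counter(r["type"] for r in resources)
    ranking = []
    for t_value, _count in type_counts.most_common(top_t):
        p_counts = Counter()
        for r in resources:
            if r["type"] == t_value:
                # Count how many resources of this type carry each attribute.
                p_counts.update(k for k in r if k != "type")
        ranking.append((t_value, [p for p, _ in p_counts.most_common(top_p)]))
    return ranking

products = [
    {"type": "laptop", "brand": "A", "ram": "16GB"},
    {"type": "laptop", "brand": "B"},
    {"type": "phone", "brand": "A", "color": "black"},
]
ranking = rank_two_stage(products)
print(ranking)
```

Swapping the frequency counts for a relevance or personalization score changes the ranking method without changing this two-level structure.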

Other FSSs which are based on a single resource type usually use only p-facets. The majority of the remaining FSSs mix the two facet types and handle them in the same way [30,32,36,44]; many others are listed in Table 1.

Understanding the characteristics of the facets in the system is crucial to the development of the FSS. On the back end, it affects how facets are retrieved, grouped, and ranked, and it dictates how conditional filtering is applied to the data. On the front end, different organization and visualization techniques can be chosen to present the facets according to their values, types, and structure.

Table 1. Summary of Faceted Search Systems in the Literature.

Method Name	Year	Information Need	Data Structure	Domain	Evaluation	Facet Types	Facet Ranking	Facet Types Handling	Facet Generation
Flamenco [5]	2003	R	Semi	Images	T	F	A	Same	-
OntoViews [11]	2004	R	Yes	Museums	-	F	M	Diff.	M
Dakka et al. [43]	2005	-	Semi	Images + TV + Web	O	TF	IS	-	NLP
Faceted Categories [39]	2006	R	No	Food	T	TF	A	-	-
MSpace [13,45]	2006	R	Yes	Music	S	F	M + U	Same	Attr.
BrowseRDF [1]	2006	P	Yes	Digital Libraries + Criminal Records	T	F	IS	Same	Attr.
Koren et al. [3]	2008	P	Semi	Movies	S	F	CF + U	Same	Attr.
AFGF [40]	2010	P	No	Medical + Digital Libraries	O	TF	IS + C + Q	-	NLP
Facetedpedia [10]	2010	R	Yes	Wikipedia	T	F	IS + Q	Diff.	Attr.
Zowl et al. [46]	2010	P	No	Images	S	F	L	Same	NLP + EL
Faceted Wikipedia [8]	2010	R	Yes	Wikipedia	-	F	IS	Same	Attr.
Factic [15,47]	2010	P	Yes	Digital Libraries + Jobs + Images	T	F	U + L	Same	Attr.
FACeTOR [48]	2010	P	Semi	Cars + Movies	T	F	IS	Same	Attr.
AdaptiveTwitter [30]	2011	P	No	Social Media	S	F	U	Same	NLP
Let et al. [18]	2012	R	Yes	General Web	-	F	U	Same	Attr.
IOS [49]	2012	R	No	Digital Libraries + Fishery	T	TF	IS	-	EL
Faccy [22]	2013	P	Semi	E-commerce	S	F	IS + Q	Same	Attr.
Sah and Wade [25]	2013	R	Yes	Tourism	-	TF	U + C	-	Attr.
Liberman and Lempel [50]	2014	P	Semi	General Web	S	TF	L	-	M
FWS [14]	2014	-	No	General Web	S	TF	C + Q	-	NLP
FacetTree [51]	2014	R	Semi	Digital Libraries	T	F	-	Same	M + NLP
FeRoSA [26]	2016	R	Semi	Digital Libraries	T	F	M	Same	-
Hippalus [16]	2016	R	Yes	Politics + Sports + Marine	T	TF	M + U	Same	M
Vandic et al. [21]	2017	P	Semi	E-commerce	S	F	Q + IS	Same	Attr.
Facet Embeddings [27]	2017	R	No	Digital Libraries	O	TF	C	-	NLP
SemFacet [20,52]	2017	R	Yes	E-commerce	O	F	IS	Same	Attr.
SemanticScholar [19,28]	2018	R	Yes	Digital Libraries	-	F	IS	Same	Attr.
Bivens et al. [23]	2019	P	No	Technical Support	-	F	Q + U	Same	-
Feddoul et al. [53]	2019	R	Yes	Wikidata	T	F	IS	Same	Attr.
NaLa-Search [44]	2020	P	Yes	Medical	T	F	-	Same	Attr.
Chantamunee et al. [36]	2020	P	Semi	Movies	S	F	CF	Same	Attr.
DFS [24,38]	2020	P	No	Technical Support	S	F	Q	Diff.	NLP + EL
FacetX [42]	2020	P	No	Jobs + Food + Movies	-	TF	-	-	NLP
Ali et al. [37,54,55]	2021	P	Semi	Tourism	S	TF	U	-	Attr.
He et al. [56]	2021	R	No	Email Search	T	F	U + IS	Diff.	NLP
HSEarch [32]	2021	P	No	Medical	T	F	IS	Same	NLP + EL
PreFace [41,57]	2021	R	No	Digital Libraries	T	TF	Q + IS		NLP + EL
Glass et al. [58]	2021	P	No	Technical Quality Assurance	S	F	Q	Same	NLP
RelFacet [59]	2021	R	Yes	DBpedia	T	PF	IS	-	Attr.
Knowledge Explorer [60]	2022	R	Yes	Geospatial	-	F	M + A	Diff.	Attr.
Schoegje et al. [61]	2022	R	No	Government	T	F	A	Same	M
Gollub et al. [62]	2023	R	Yes	DL	T	F	IS	Same	Attr.
Relatedly [63]	2023	R	No	DL	T	F	M + Q	Diff.	M + NLP
Sampo UI [64,65]	2023	R	Yes	DL + Art + Law + War	T	F	A + IS	Same	Attr.

IN: Information Need
- R: Recall-oriented.
- P: Precision-oriented.

Eval: Evaluation
- S: Simulation-based.
- T: Task-based.
- O: Other.

Facet Types Handling
- Same: Same.
- Diff.: Different.

Facet Types
- TF: T-facets.
- PF: P-facets.
- F: T-facets and P-facets.

Domain
- DL: Digital libraries.
- Web: General web.
- CR: Criminal records.

Facet Ranking
- A: Alphabetical.
- M: Manual.
- CF: Collaborative Filtering.
- Q: Query relevance.
- C: Content based.
- IS: Information Structure.
- U: User based (Personalized).
- L: Usage Logs.

DS: Data Structure
- Yes: Structured.
- No: Unstructured.
- Semi: Semi-structured.

Facet Generation
- M: Manual.
- Attr.: From object attributes.
- NLP: Using NLP Techniques.
- EL: Entity Linking.

3.3. Facet Generation Methods

Facet generation is defined as the task of automatically discovering and extracting facets of information objects from their textual content [40]. The methods used in generating the facets and their values rely heavily on both the underlying data structure and the domain of the FSS. Several approaches have been adopted by faceted browsers in generating facets for text-based systems. In some domains, such as e-commerce websites, the facets and their values are predefined by administrators or domain experts.

Other FSSs built for scientific publications also have static predefined facets (e.g., author, year, publication type, keywords, etc.), but their values are automatically extracted from the articles using Natural Language Processing (NLP) techniques [19,26,27].

Faceted browsers can also generate facets automatically from a selected group of documents [3,40] or Wikipedia text [8,10,32]. These approaches employ NLP algorithms, entity recognition, and sentence parsing to identify the facets and their possible values from the unstructured text. These approaches require expensive processing and indexing of the document set and are usually domain-specific and hard to scale for large document collections.

Kong and Allan [14] attempted to extend the same facet generation concept to the general web. This was achieved by querying traditional search engines first to retrieve relevant documents; then, the same NLP steps were applied to generate the facets. Similar ideas allowing faceted exploration of search results were also explored by Fafalios et al. [49] and Kitsos et al. [66]. Modern pre-trained deep learning models have also been adopted to generate facet values. In Relatedly [63], a research engine for digital libraries, facet values representing paragraph titles of the publications were generated from unstructured text using pre-trained models. This was designed to enable searchers to find the parts of the papers that interest them more quickly.

NLP-based facet generation approaches are scalable to the general web. They also allow the user to digest and review large amounts of information on one page rather than requiring them to open many links on different pages, which can be a tedious and time-consuming task. However, approaches based on document collections still lack an understanding of the relationships between entities and, as the facet values are only those extracted from text, these approaches might lack coverage of all possible facet values.

A third mainstream approach in FSS relies on knowledge bases. In this case, the data source for facets is represented by entities that are linked/connected through properties or attributes. Most of the FSSs in this category are based on ontologies and RDF data [7,16,17,19].

Facets generated from knowledge bases provide a better understanding of relationships between entities and their types than those generated from unstructured data. With the increased adoption of Linked Open Data (LOD) initiatives, knowledge bases can also compete with the wider data coverage of facets and their values. Moreover, the extraction of facets based upon well-structured ontologies facilitates the process of organizing, grouping, and aggregating the facet values by utilizing the subClassOf and subPropertyOf relationships.

Abel et al. suggested an FSS for Twitter data [30]. The system semantically enriches tweets with DBpedia using entity linking. The generated facets are then the types of entities tagged in the tweet, and the facet values are the extracted entities.

The idea of linking unstructured text to knowledge bases to enable faceted browsing of textual items was introduced by Inan et al. [32]. They developed a system to extract entities, which enabled browsing of construction health reports using faceted search methods.

In order to avoid error propagation from the facet generation phase to the facet ranking one, Ali et al. proposed a ranking approach which uses t-facets directly derived from the dataset ontology [37,54,55]. Since no generation step is needed, the evaluation solely reflects the ranking step performance.

4. Facet Ranking

Despite the importance of facet ranking, it is rare to find studies dedicated solely to the investigation of this problem or to how this step, aside from other aspects of faceted search, affects the user’s search experience. It is usually considered a complementary step to the facet generation process. We have identified the following ranking methods in the literature related to FSSs.

These are summarized in Figure 3, which shows the identified strategies followed for ranking facets. The figure is an extended version of the one proposed by Tzitzikas et al. [7]. In our version, personalization strategies are included. The FSS literature uses single or combined strategies to select the top facets according to their use case or search task. The following sub-sections examine examples of each approach.

4.1. Manual Systems

Generally speaking, many of the established faceted search applications use a manual facet selection process, which provides users with a static list of pre-selected facets. This list is usually determined by domain experts [11,26]. In addition, they might provide ranking methods for the facet values. Other systems start with a manually defined list but give users the option to re-arrange the facets according to their preferences to support their exploration needs [16,45].

This manual selection process is subject to human bias and becomes impractical as the magnitude and dimension of the data increase. Moreover, the importance and order of facets can change during a search session and from one person to another. Predefined facets may, in any case, not be helpful to the information seekers [2].

4.2. Non-Personalized Methods

4.2.1. Information Structure-Based Ranking

Facet ranking based on the information structure of the underlying collection is a very popular family of methods in faceted search literature. In order to induce facet importance, these methods leverage the structural characteristics of the facets, their values, or the items they cover.

One of the most well-known FSSs, Flamenco, sorts the facets and facet values alphabetically [5,39]. This approach is still followed by many current FSSs, e.g., [60,61,62]. Arranging facets and facet values by frequencies in the dataset is adopted by several systems, including VisiNav, faceted Wikipedia, and many others [8,12]. A recent FSS called Sampo UI gives the user the choice of ranking the facets and their values according to their frequencies in the dataset or alphabetically [64].

FSSs that operate using predefined facets also use predefined facet ordering. They rank the facet values according to their frequencies in the dataset [19,27]. Such ranking methods neither reflect the general importance of the facets nor their relevance to the user’s interests.

Oren et al. [1] ordered the facets alphabetically; however, they used automated ranking to highlight important facets. Facets are highlighted by changing the scale of their font size in the UI. The proposed ranking method is based on the information structure of data. It computes a navigation quality score using weighted multiplication of several facet metrics. The metrics include predicate frequency, objects’ cardinality, which indicates how many items belong to this predicate, and finally, predicate balance, which is a score that reflects to what extent this predicate balances the navigation tree.
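As a rough illustration of combining several facet metrics by weighted multiplication, the sketch below raises each metric to a weight and multiplies the results. This is only one possible reading of "weighted multiplication"; the function name, the assumption that each metric is already normalized to [0, 1], and the example values are hypothetical and are not taken from [1].

```python
import math

def navigation_quality(metrics, weights):
    """Combine normalized facet metrics into one score by weighted multiplication.

    metrics: metric name -> value, assumed to be normalized to [0, 1]
    weights: metric name -> exponent (0 ignores a metric, larger values stress it)
    """
    return math.prod(metrics[name] ** weights[name] for name in weights)

# Hypothetical metric values for a single predicate (facet).
example_metrics = {"frequency": 0.9, "cardinality": 0.5, "balance": 0.8}
equal_weights = {"frequency": 1, "cardinality": 1, "balance": 1}
score = navigation_quality(example_metrics, equal_weights)
```

Because the metrics are multiplied, a facet that scores close to zero on any single metric receives a low overall score regardless of the others.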

SemFacet [17,20,52] introduced a heuristic score for facet ranking. The score combines three characteristics: (1) diversity, (2) selectivity, and (3) nesting depth. The first characteristic favors facets which cover new items that are not covered by other facets. The selectivity aspect prefers facets that narrow the search space more rapidly than other facets. The third characteristic, nesting depth, prefers facets with higher depth values in the facet hierarchy. Finally, the three characteristics are combined using multiplication. The approach is based on information structure and does not take the query or the user into account.

The idea of minimizing navigation cost was also followed by FACeTOR [48], which ranks facets using a heuristic that approximates a weighted version of the SetCover method. The SetCover heuristic was originally introduced in [43] and counts how many unique items are covered by a given t-facet value. In order to obtain this count, a greedy approach is followed: it selects the facet with the highest count, marks all of that facet’s associated items as covered, then selects the next facet with the highest count of uncovered items, and so on.
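The greedy selection described above can be sketched as follows. This is a minimal, unweighted version of greedy set cover; the weighting used by FACeTOR is not reproduced here, and the function name, the alphabetical tie-breaking rule, and the example data are assumptions made for illustration.

```python
def greedy_facet_cover(facet_items, k):
    """Greedily pick up to k facet values by the number of still-uncovered items.

    facet_items: facet value -> set of item ids it covers
    Returns the selected facet values (in selection order) and the covered items.
    """
    covered = set()
    selected = []
    remaining = dict(facet_items)
    while remaining and len(selected) < k:
        # Facet value covering the most uncovered items; ties go to the
        # alphabetically first name because max() keeps the first maximum.
        best = max(sorted(remaining), key=lambda f: len(remaining[f] - covered))
        if not remaining[best] - covered:
            break  # nothing new can be covered any more
        selected.append(best)
        covered |= remaining.pop(best)
    return selected, covered

# Hypothetical car listings identified by integers.
facets = {
    "brand:Audi": {1, 2, 3},
    "color:red": {3, 4},
    "year:2010": {1, 2},
    "fuel:diesel": {5},
}
order, covered = greedy_facet_cover(facets, k=3)
```

In this example "year:2010" is never selected, because both of its items are already covered once "brand:Audi" has been chosen.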

Another information structure-based ranking method was proposed by Feddoul et al. [53]. This method takes into consideration intra-facet and inter-facet metrics. Intra-facet metrics assign scores to each facet independent of other facets. The score favors popular facets, those with fewer facet values, and facets with a similar number of facet values. On the other hand, inter-facet metrics focus on the relationships between different facets. For this, they calculate a score that favors facets that are not similar to each other, which helps to diversify the final facet list. The similarity between two facets is derived from the shortest path and the depth between them in the knowledge graph. The method starts by calculating the intra-facet metrics; it then sorts the top N facets using these metrics and follows a greedy approach to select the facets with the best inter-facet score.

Gollub et al. followed the inter-facet relationship concept in facet values ranking [62,67]. In their system, users select facets from a predefined list of facets and add them to a ‘facet pipe’. The facet values of a selected facet are then ranked according to their relevance to the values of adjacent facets in that pipe. A facet value is ranked higher if it is related to a larger number of facet values from the left or the right side of the facet pipe. This helps users to gain a better understanding of the relationships between the values of different facets.

Commercial FSSs began with manually predefined ordering of facets. However, with the increasing amount of information available, they moved toward automatic selection and ranking methods. Faceted product shopping systems are usually precision-oriented and aim to minimize the effort needed by customers to locate the desired product.

Faccy [22] employed a selection algorithm that aims at partitioning the search space in the most effective manner. The algorithm enables the user to finish the search task with the least amount of drill-down. This method calculates facet utility given the top-m products retrieved by the search engine after a query is submitted by the customer. Several models have been introduced to calculate this utility score. These combine the probability that the facet contains the target product and the expected drill-down effort needed to reach that product. Finally, facets and their values are sorted according to their best utility score.

Another system for product search suggested ordering facet hierarchies in product navigation so as to minimize the drill-down needed to reach a product at any stage [21]. This approach ranks facet values based on the entropy of the products they cover and how they best split the navigation tree.

The calculated entropy is then weighted by the product count. This approach follows the steps of an older system called Facetedpedia. It also optimizes a navigational cost model to enhance the facet hierarchy ordering [10].
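One way to read the entropy-based idea is sketched below: a facet whose values split the current products evenly has a high Shannon entropy, and the entropy is then multiplied by the number of products the facet covers. This is an interpretation for illustration only; the exact formulation in [21] may differ, and the function names and example counts are hypothetical.

```python
import math

def facet_entropy(value_counts):
    """Shannon entropy (in bits) of how products are distributed over facet values."""
    total = sum(value_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in value_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy

def weighted_entropy_score(value_counts):
    """Entropy weighted by the number of products counted under the facet."""
    return facet_entropy(value_counts) * sum(value_counts.values())

# Hypothetical product counts per facet value.
balanced = {"red": 5, "blue": 5}   # even split of ten products
skewed = {"petrol": 9, "diesel": 1}  # one value dominates
```

Under this reading, the balanced facet scores higher than the skewed one, since selecting any of its values removes about half of the remaining products.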

4.2.2. Query Relevance Based

Ranking methods in this category consider the submitted query and/or the resources retrieved in response to this query. Facetedpedia [10] introduced an algorithm to select and rank p-facet hierarchies given the input query and the set of target articles retrieved by a keyword-search engine (document-level search engine). The algorithm employs several navigation cost metrics based on the values of the p-facets and the number of links in the target articles. The authors also developed a separate model to generate and order the Relevant Category Hierarchy (RCH), which utilizes the input query, the retrieved articles, and the cost calculated in the previous step. They refer to t-facets as RCH-Induced Facets. In order to find the top-N t-facets, they employ a hierarchical t-facet taxonomy and the calculated p-facet costs to choose the sub-graph of t-facets with minimum cost. At the end, the t-facets are presented to the information seeker as a plain list of categories.

In a similar way, query relevance is also considered by a system called IOS [49]. The authors proposed a ranking method using frequencies of t-facets weighted by their rank in the top results. Therefore, a facet appearing in lower-ranked results receives a lower score than one appearing in the top results. The concept of deriving facet relevance from the relevance of the documents it covers is followed by Glass et al. [58]. After a traditional IR system ranks the set of relevant documents according to the input query, this approach calculates the facet score using a greedy method which maximizes the facet Discounted Cumulative Gain (DCG) value. The facet DCG score is obtained by aggregating the inverse of the ranks of the documents associated with this facet.
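The idea of scoring a facet from the ranks of the documents it covers can be sketched as a sum of inverse ranks. The snippet below is a simplified illustration only: it omits the greedy selection step of Glass et al. [58] and does not reproduce the exact weighting used by IOS [49]; the function name and the example data are assumptions.

```python
from collections import defaultdict

def rank_weighted_facet_scores(ranked_docs, doc_facets):
    """Score each facet by summing 1/rank over the ranked documents that carry it.

    ranked_docs: document ids ordered by the search engine (best first)
    doc_facets:  document id -> iterable of facets attached to that document
    Returns (facet, score) pairs sorted by descending score, then by name.
    """
    scores = defaultdict(float)
    for rank, doc in enumerate(ranked_docs, start=1):
        for facet in doc_facets.get(doc, ()):
            scores[facet] += 1.0 / rank
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical result list for a technical support query.
ranked = ["d1", "d2", "d3"]
facets_of = {"d1": {"linux"}, "d2": {"linux", "network"}, "d3": {"network"}}
ranking = rank_weighted_facet_scores(ranked, facets_of)
```

Here "linux" (1 + 1/2) outranks "network" (1/2 + 1/3) because it is attached to the top-ranked document.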

Liberman and Lempel [50] modeled a faceted search session as a query followed by a single drill down. They combined information structure with query relevance to rank facet values. The ranking finds the top facet values which promote the target document to the first page of results. Query relevance is achieved in two ways: firstly, only facet values in the returned search results will be considered in ranking; second, the score of the document returned by the search engine is utilized as an indicator of whether this facet value covers more relevant results or not.

The AFGF system links a cluster of facet terms to their corresponding WordNet taxonomy using the Jaccard distance between the terms and the input query; the similarity score is then used to prune the WordNet tree and select a facet list to be presented as main topics [40].

DFS [24,38] introduced an approach which produces categorical facet ranks based on the facet’s relevance to the input query. Following this, the top categorical facets (t-facets), as well as their five most common p-facets, are collected and presented to the user. This approach calculates the relevance between the t-facet and the query based on their similarity in vector space using a K-Nearest Neighbor (KNN) algorithm. The vector space representation is generated using the entity embeddings of the searched-for objects.

Relatedly [63], on the other hand, followed a different approach in ranking facet values. Facet values in this system are paragraph topics extracted from publications. It ranks facet values higher if they cover more sub-topics related to the user query.

PreFace [41,57] is an FSS that retrieves concepts’ prerequisites in the form of t-facets. First, it calculates a score that reflects facet and query relevance. The score considers two aspects: (1) the similarity between the query language model and the facet language model, and (2) the quality of the facet, measured by how many entities belong to the facet and the similarity between those entities; it favors t-facets with a higher count of entities and stronger inter-entity similarity.

This approach balances the trade-off between query relevance and concept diversity. To achieve this, PreFace follows a greedy approach which first selects the best facets according to their relevance score and then iterates over the remaining t-facets to pick the ones with higher diversity from the already selected facets.
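A greedy trade-off between relevance and diversity can be sketched in the style of maximal marginal relevance (MMR). MMR is used here as a stand-in technique for illustration and is not PreFace’s actual scoring function; the function names, the `trade_off` parameter, the word-overlap similarity, and the example scores are all assumptions.

```python
def word_jaccard(a, b):
    """Toy similarity between two facet labels: Jaccard overlap of their words."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def select_diverse_facets(relevance, similarity, k, trade_off=0.5):
    """Greedily pick k facets, rewarding relevance and penalizing redundancy.

    relevance:  facet -> query relevance score
    similarity: function (facet, facet) -> similarity in [0, 1]
    trade_off:  1.0 uses relevance only; lower values stress diversity
    """
    candidates = set(relevance)
    selected = []
    while candidates and len(selected) < k:
        def gain(facet):
            # Redundancy is the similarity to the closest already-selected facet.
            redundancy = max((similarity(facet, s) for s in selected), default=0.0)
            return trade_off * relevance[facet] - (1 - trade_off) * redundancy
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical relevance scores for three t-facets.
scores = {"algebra": 0.9, "linear algebra": 0.85, "probability": 0.6}
picked = select_diverse_facets(scores, word_jaccard, k=2)
```

With these values the second pick is "probability" rather than the more relevant "linear algebra", because the latter overlaps with the already selected "algebra".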

4.2.3. Usage Logs Based Ranking

Ranking the facets according to statistical analysis of search query logs is adopted by Zwol et al. [68]. They use machine learning to combine signals from several information sources to train their model. In addition to this, they use user clicks as feedback to build a ground truth for the proposed learning model [46,69].

Other shopping systems have started to pre-compute facet ranks from user query and click logs [7]. This approach might reflect the importance of facets from the customer’s perspective. However, it assumes the availability of a large number of historical logs to derive these kinds of statistics.

The facet ranking methods discussed so far either rely on the structure of the underlying dataset, assume that facet relevance is associated only with query relevance, or assume that facet importance can be inferred from users’ collective usage logs. Non-personalized facet ranking methods neglect the individual user preferences. These methods may reflect a degree of relevance, but the relevance of the facets can be both user and situation dependent, which is not addressed by any of these approaches.

4.3. Personalized Methods

Although state-of-the-art approaches for faceted browsing present a wealth of research in this area, few approaches have analyzed the role of personalization in faceted search and how adapting facet ranking according to user preference can impact their search experience. Personalization is based on individual user profiles collected from the user’s explicit statement of interests or inferred from their current or previous interactions with the system. Personalization usually happens at the facet ranking phase and/or the document retrieval phase. This section reviews the limited existing work on FSS personalization methods from a facet ranking perspective.

4.3.1. Session Based Ranking

Factic [15,47] is an FSS that personalizes by building models from semantic usage logs. Semantic usage logs store details of the resources visited by the user using triples with a timestamp. Several layers of user adaptation are implemented and integrated with different weights to enhance the facet relevance model. The ranking model contains in-session user behavior (highest weight), combined with the aggregated behavior of the same user collected from their previous sessions. The model also includes information from similar user profiles and global user statistics (lowest weight).
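The layered weighting can be pictured as a weighted sum of per-layer relevance estimates. The sketch below only illustrates that idea; the layer names, the numeric weights, and the linear combination are hypothetical and are not taken from Factic.

```python
# Hypothetical layer weights: in-session behavior highest, global statistics lowest.
LAYER_WEIGHTS = {
    "in_session": 0.50,
    "user_history": 0.30,
    "similar_users": 0.15,
    "global": 0.05,
}

def facet_relevance(layer_scores, weights=LAYER_WEIGHTS):
    """Weighted sum of per-layer facet relevance estimates, each assumed in [0, 1].

    Layers with no evidence (missing keys) contribute nothing to the score.
    """
    return sum(weights[layer] * layer_scores.get(layer, 0.0) for layer in weights)

# A brand-new user has no session or history evidence, only global statistics.
new_user_score = facet_relevance({"global": 1.0})
returning_user_score = facet_relevance({"in_session": 0.8, "user_history": 0.6, "global": 1.0})
```

In this toy setting a new user’s facet scores are driven entirely by the lowest-weighted layer, which mirrors the cold start limitation of such models.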

However, this model suffers from the cold start problem. It requires a considerable number of users to provide a new user with a meaningful ordering of facets. Although the ordering might reflect the general importance of the facets and their values, it still misses the opportunity to reflect individual user interests. The model takes time to implicitly infer user preferences, but users might become bored and abandon the system by then.

Interestingly, during their evaluation, the authors found that suggesting suitable ontological concepts from the user’s history improved the total task completion time and decreased the number of user clicks. This is due to the fact that these hierarchical concepts created effective navigation shortcuts.

Sah and Wade [25] also employ session-based user interaction to personalize search concepts (t-facets). As soon as the user selects a t-facet, the system re-organizes the others according to their similarity to this concept. It also personalizes (re-ranks) the retrieved documents according to the selected concepts and query expansion. The proposed system did not address the hierarchical nature of the categories.

4.3.2. History Based Ranking

Koren et al. [3] suggested a collaborative, personalized ranking by leveraging explicit user feedback about the facets. They take user ratings into consideration to build a relevance model for individuals. In conjunction with this, they also use aggregated ratings to build a collaborative model for new users in order to address the cold start problem and provide good initial facets in the absence of a user profile.

An approach to personalize facets by matching them to a user profile collected from social media was proposed by Le et al. [18]. It collects objects with which the user has interacted and uses them to infer their interests; it then maps these interests to Wikipedia articles to build the user profile. This is achieved by extracting the top TF-IDF terms from the user-favored social media objects as well as from the articles associated with the facet. When displaying the facets, the algorithm highlights them according to the number of terms matched with the user profile. The same idea was followed by Nguyen et al. [31]. He et al. [56] adopted TF-IDF scores to rank the top categorical facets for the user; the categorical facets were extracted from previous user emails to provide interactive visual facets.
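The term-matching idea can be sketched in two steps: extract the top TF-IDF terms of a text, then order facets by how many of their terms also appear in the user profile. This is a simplified illustration, not the implementation of Le et al. [18]; the function names, the smoothed IDF variant, and the toy data are assumptions.

```python
import math
from collections import Counter

def top_tfidf_terms(doc_tokens, corpus, n):
    """Return the n terms of doc_tokens with the highest TF-IDF score.

    corpus is a list of token lists used for document frequencies; the IDF is a
    smoothed variant, log((1 + N) / (1 + df)) + 1, chosen here for illustration.
    """
    tf = Counter(doc_tokens)

    def idf(term):
        df = sum(1 for doc in corpus if term in doc)
        return math.log((1 + len(corpus)) / (1 + df)) + 1

    scored = {t: (c / len(doc_tokens)) * idf(t) for t, c in tf.items()}
    return set(sorted(scored, key=lambda t: (-scored[t], t))[:n])

def rank_facets_by_profile(profile_terms, facet_terms):
    """Order facets by the number of terms shared with the user profile."""
    overlap = {f: len(profile_terms & terms) for f, terms in facet_terms.items()}
    return sorted(overlap, key=lambda f: (-overlap[f], f))

# Hypothetical objects the user interacted with on social media.
corpus = [
    ["jazz", "music", "live"],
    ["rock", "music", "live"],
    ["jazz", "jazz", "festival", "music"],
]
profile = top_tfidf_terms(corpus[2], corpus, n=2)
facet_terms = {"Music festivals": {"festival", "jazz"}, "Sports": {"football", "live"}}
ordered = rank_facets_by_profile(profile, facet_terms)
```

The common term "music" is pushed out of the profile by its low IDF, so the facet that shares the more distinctive terms is ranked first.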

The Adaptive Twitter search system implements four strategies to rank facet values [30]. The first strategy is based on facet value frequency in tweets. The second strategy personalizes the ranking based on a user model. The user model contains entities extracted from the user’s old tweets. The facet values are weighted higher if they exist in the user profile. The third strategy favors facet values from more recent tweets. The fourth and final strategy uses diversification to order the facet values. To achieve this, it ranks the facet list using the first strategy; it then randomly picks facet values from the end of the list and assigns higher ranks to them.

Bivens et al. [23] proposed a personalized facet ranking method using query relevance and user profiles. User facet profiles are collected from previous search logs, where the query relevance is established using topic models. The query is used first to retrieve a set of relevant document results; then facets are collected from these results. The facets are ordered based on users’ facet profiles collected from their recorded usage logs.

Ali et al. [37,54,55] introduced an approach which addresses the FSS personalization problem by using user profiles collected from their historical ratings in the system. This approach is concerned with type-based facets and how they can be ranked while preserving their multilevel hierarchical structure. They introduced three different t-facet ranking methods; the first follows a probabilistic approach in ranking by promoting t-facets that the user liked before and are relevant to the current query [54]. The second approach uses vector-based BERT models to represent the topic of the t-facets and uses this with a Rocchio relevance model to decide the rank of the t-facets [55]. The final approach utilizes a deep learning neural network to rank the t-facets. It extracts features covering CF signals, personalization signals as well as query relevance signals and uses them to decide the rank of the t-facet [37].

5. Evaluating Facet Ranking and Faceted Search Systems

It is crucial to understand how existing work evaluated their FSSs. The overall goal of the evaluation of a search system depends on the search tasks to be supported and the information needs that it is intended to address. However, even for the same search task, different investigations of FSSs vary in the strategies they follow to measure the success of their proposed system.

FSS evaluation strategies can broadly be divided into two types: task-based evaluation using real users and simulated user evaluation. We survey both strategies in the following sections.

5.1. Task-Based Evaluation

In task-based approaches, researchers recruit a number of participants to experiment with the developed FSS. Participants are given a search task to fulfill using the system. Researchers employ a range of methods and metrics to assess the effectiveness of their systems. For example, Robert et al. [51] evaluate their FSS using pre- and post-task questionnaires to measure the participant’s knowledge level and satisfaction with the system. A similar approach was followed by Hippalus [16], in which the system developers prepared a user study with 38 people and asked them to watch a video tutorial, then complete a search task and answer a questionnaire. Both studies measured user satisfaction based on the questionnaire answers and used this as a metric to assess system success.

This type of evaluation can also utilize exploratory search evaluation metrics when evaluating recall-oriented systems. In this scenario, the user interaction is observed to compare different facet ordering or facet generation techniques. Measurements are collected from search logs, click behavior, and eye-tracking tools. In addition, metrics such as time to complete the task, counts of query reformulations, and task success can be used. The values of these metrics are then analyzed to study user interaction behavior with the system [1,5,7,16,21,47,56,59,62].

Although task-based evaluation is popular among recall-oriented systems, it is also adopted by precision-oriented systems. In this case, the same metrics can have different objectives; for example, precision-oriented systems aim at minimizing user clicks and task completion time, while recall-oriented systems might favor more interactions from the user and longer session times.

Factic [15], which is a precision-oriented system, followed a task-based evaluation strategy. Users were asked to engage in specific scenarios (e.g., to find jobs with specific properties), and the authors recorded the number of clicks and the task completion time.

5.2. Simulation-Based Evaluation

Koren et al. [3] argue that user-based studies, while undoubtedly useful, are very limited because they are expensive to conduct, hard to repeat, and the number of users is usually limited, which makes their results inconclusive and not reproducible, especially in personalized search systems. They instead suggest an approach that simulates the clicking behavior of users in the context of an FSS.

They propose an evaluation approach that attempts to measure how well user information need is satisfied compared to the amount of effort required by the user. User information need is assumed to be fulfilled by locating the target resource using the FSS. Based on this idea, the proposed evaluation rewards users’ actions taken toward finding this intended target. The goal of the evaluation is to minimize the effort needed by the user to fulfill his/her search needs.
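A very simplified version of such a simulation is sketched below: the simulated user reads the ranked facet values from the top, clicks the first one that applies to the target resource, and then scans the filtered results until the target is found. The cost model (one unit per facet value read and per result scanned), the function name, and the example data are assumptions and do not reproduce the exact model of Koren et al. [3].

```python
def simulated_effort(ranked_facet_values, target_facets, filtered_results, target):
    """Count the facet values read and results scanned until the target is found.

    ranked_facet_values: facet values in the order shown to the user
    target_facets:       facet values that apply to the target resource
    filtered_results:    facet value -> ranked result list after selecting it
                         (assumed to contain the target for matching values)
    Returns the total effort, or None if no shown facet value leads to the target.
    """
    effort = 0
    for value in ranked_facet_values:
        effort += 1  # cost of reading one facet value
        if value in target_facets:
            results = filtered_results[value]
            # cost of scanning the filtered list down to the target's position
            return effort + results.index(target) + 1
    return None

# Hypothetical movie search: the target "m3" is a sci-fi film from 1999.
target_facets = {"year:1999", "genre:sci-fi"}
results_after = {"year:1999": ["m7", "m3", "m1"], "genre:sci-fi": ["m3"]}
effort_a = simulated_effort(["genre:comedy", "year:1999", "genre:sci-fi"], target_facets, results_after, "m3")
effort_b = simulated_effort(["genre:sci-fi", "year:1999", "genre:comedy"], target_facets, results_after, "m3")
```

Comparing `effort_a` and `effort_b` shows how two orderings of the same facet values can be compared without recruiting participants: the ordering with the lower effort is preferred.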

This is the most widely adopted simulation model for precision-oriented faceted search systems in the literature [21,22,30,37,54,55,70]. For example, the Adaptive Twitter search system [30] followed this evaluation approach to simulate users finding tweets. They use this evaluation method to compare non-personalized and personalized facet ranking techniques.

Vandic et al. [21] also adopted a simulated user evaluation method. Different models of user clicking behavior are used, and metrics related to the user effort needed to scan facets and their values are computed, that is, how many times a facet or facet value is (de)selected. They also calculated performance measures such as the computation time needed to order the retrieved facets, and session success, which is defined as the number of clicks needed to find the target resource.

The INEX 2011 Data Centric-Faceted Search task [70] also focused on facet ranking evaluation. The organizers adopted the same user simulation model. Moreover, they developed a number of evaluation metrics to measure the user effort needed to find the first relevant target. They also introduced other metrics to measure how many relevant results were covered by a given ranked facet list. The simulation strategy and metrics were adopted by other researchers in the field [37,54,55].

5.3. Other Methods

FSS researchers also report a variety of other metrics when evaluating their systems. For instance, Facet Embeddings [27] studies the quality of the generated facet topics in comparison to standard NLP topic-generation methods. Dakka et al. [43] also focus on evaluating the quality of the generated facets and their ranking; they defined two metrics: (1) coverage, which measures the extent to which the generated facet hierarchy covers all data (i.e., all results are reachable using the generated hierarchy), the higher the coverage, the better the system, and (2) cost, which is the average path length to reach any object from the root of the tree, the lower the cost, the better the system.
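The two metrics of Dakka et al. [43] can be illustrated on a small facet hierarchy. The sketch below treats coverage as the fraction of items reachable from the root and cost as the average path length to the reachable items; the data layout, the function name, and the choice to count the final step from a facet node to an item as one edge are assumptions.

```python
from collections import deque

def coverage_and_cost(children, items_at, root, all_items):
    """Compute (coverage, cost) for a facet hierarchy.

    children:  node -> list of child nodes
    items_at:  node -> list of items attached directly to that node
    coverage:  fraction of all_items reachable from the root
    cost:      average number of edges from the root to each reachable item
    """
    depth_of = {}
    visited = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for item in items_at.get(node, ()):
            # Breadth-first order means the first time an item is seen is via
            # its shortest path from the root.
            depth_of.setdefault(item, depth + 1)
        for child in children.get(node, ()):
            if child not in visited:
                visited.add(child)
                queue.append((child, depth + 1))
    reached = depth_of.keys() & set(all_items)
    coverage = len(reached) / len(all_items)
    cost = sum(depth_of[i] for i in reached) / len(reached) if reached else float("inf")
    return coverage, cost

# Hypothetical hierarchy: item "d" is not attached to any facet node.
children = {"root": ["location", "time"], "location": ["europe"]}
items_at = {"europe": ["a", "b"], "time": ["c"]}
coverage, cost = coverage_and_cost(children, items_at, "root", ["a", "b", "c", "d"])
```

In this example three of the four items are reachable, and the reachable items sit at depths 3, 3, and 2, so a flatter hierarchy that still reaches them would lower the cost.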

Faceted browsers also include additional performance evaluation measures, especially when the system operates on a large scale or complex data or when the system needs to carry out complex NLP to extract facets or their values and aggregate large volumes of data [4,30,52].

Performance metrics include run-time and responsiveness of the UI from the query submission until the results page is populated, query execution time, and time to update the interface after a facet is selected or deselected. In addition to this, it is common practice to report the specifications of machines used in running the experiments. These metrics are compared against other systems on different datasets with different scales (for example, number of triples).

5.4. Evaluation Domains and Collections

The literature on experimentation with faceted browsing reports work on a wide variety of domains and search tasks. Some systems have been developed for specific search verticals. For example, faceted search is the dominant approach used in e-commerce websites; therefore, several authors use product search as a domain to evaluate their faceted search systems [21]. Flamenco and Zwol et al. [5,68] both used digital image search as the domain for their experiments. He et al. [56] evaluated their system in the email search domain, whereas Glass et al. [58] experimented with two datasets in technical question and answer systems.

Digital libraries and scientific publications are another domain that has received a great deal of attention in the faceted search literature [19,27,47,51,62,63]. This is because search tasks in this domain are exploratory in nature; they involve learning and investigating and are open-ended. The data are also semi-structured and have clear common facets that describe research papers.

Other faceted browsers have been developed based on specific data formats (e.g., semantic data), and they have been customized and evaluated using several domains. Factic [47] was evaluated on multiple domains: scientific publications, job offerings, and digital images. Hippalus [16] was also evaluated in multiple domains, including politics, sports, and marine species. Sampo-UI is a recent FSS that operates over semantic data, with experimentation covering several domains, including Digital Libraries, Art, War, and Law [64,65].

Chantamunee et al. [34] suggested a personalized facet ranking based on Collaborative Filtering (CF), and used the MovieLens dataset to evaluate their work. The average rating given by the user to the facet is used as groundtruth, and RMSE values are reported to measure the effectiveness of the ranking method. This experimental setup might be useful in prediction tasks, but it does not assess how the final facet list will assist the user in reaching their target.
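The limitation noted above can be made concrete with a small sketch. The RMSE formula is standard; the rating values and the `ranking` helper are invented for illustration and are not taken from Chantamunee et al. [34].

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def ranking(scores):
    """Indices of the facets ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical groundtruth ratings for three facets, and two sets of predictions.
actual = [5.0, 4.0, 1.0]
pred_a = [4.4, 4.6, 1.0]  # small errors, but the top two facets are swapped
pred_b = [3.5, 2.5, 0.0]  # larger errors, but the order matches the groundtruth

lower_error_is_a = rmse(pred_a, actual) < rmse(pred_b, actual)
a_order_matches = ranking(pred_a) == ranking(actual)
b_order_matches = ranking(pred_b) == ranking(actual)
```

Here `pred_a` has the lower RMSE yet places the wrong facet first, while `pred_b` has the higher RMSE and reproduces the groundtruth order. A low prediction error therefore does not by itself guarantee a facet list that helps the user.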

Due to the novelty of the personalized facet ranking task, this area lacks groundtruth datasets for use in the evaluation process. Most of the reviewed literature in the facet ranking area uses its own custom-created datasets. The area thus lacks a standardized process to create benchmarked datasets that can be used to compare different ranking methods. Ali et al. [71] address this by defining a framework to guide the selection of an appropriate domain, search task, and dataset. They also propose a groundtruth creation framework that customizes and transforms existing datasets to fit the purpose of evaluating facet ranking methods. The framework is demonstrated on two different datasets in the tourism domain.

6. Summary of FSS Classification

Table 1 shows a summary of the categories used in the existing literature related to FSS classification. The first column contains the system name if available; otherwise, it contains the authors' names. The publication year of the cited paper is presented in the second column. Table entries are sorted using the year column. The third column contains the information need (IN) categorization and shows whether the system is recall-oriented (R) or precision-oriented (P).

The classification of FSS based on the underlying data structure is outlined in the fourth column (DS), with (Yes) meaning the data are fully structured, (No) meaning the data are unstructured, and (Semi) denoting semi-structured data. The domain(s) of evaluation of each system are listed in the fifth column, followed by the evaluation (Eval) column, which states the strategy adopted to evaluate the FSS. The values for this column are: (T) denoting task-based evaluation, (S) denoting simulation-based evaluation, (O) for other methods, and (-) for unknown (not mentioned in the paper).

The rest of the columns focus on the facets in each FSS. Types of facets used by the FSS are shown in the seventh column, with values: (TF) meaning that the FSS operates only on type-based facets, (PF) meaning that it operates only on property-based facets, and (F) meaning that it uses both types of facets. The next column summarizes the facet ranking approach (see Figure 3 for notations).

The ninth column (Handling) indicates whether the FSS uses the same ranking or display method for both facet types (p-facets and t-facets). The value is only provided for systems that operate using both types of facets. The last column summarizes the facet generation method used: in structured systems, facets are usually derived from objects’ attributes (Attr.), whereas in other data types, they might be generated manually (M), using NLP, Entity linking (EL), or a combination of them (denoted by the plus sign). The last row of the table contains a summary of the notations.

Table 1 shows the classification of the surveyed systems using the aspects discussed in this paper. From the classification summary, we can observe that the majority of FSSs operate on knowledge graphs or structured data [1,7,8,11,16,18,19,20,25,44,45,47,53,59], or the structured part of semi-structured data [3,5,26,35,43,48,50,51]. In this case, the facets are derived from the existing attributes. Faceted browsers which operate on unstructured textual data are less common and generally employ NLP or Entity linking techniques to generate facets [14,23,24,30,32,39,40,41,42,46,56,58]. Other unstructured data types, such as audio or video, have not been explored in the FSS literature.

Recall-oriented FSSs generally favor task-based evaluation. Because their search tasks are usually open-ended and involve uncertainty, it is hard to assess these systems using simulated users, which do not reflect complex human cognitive behavior [5,10,16,26,27,39,49,51,53,56,59]. In this case, the evaluation domains are also exploratory in nature to support such search scenarios.

On the other hand, precision-oriented systems are suitable for some specific search scenarios and domains. Since they generally target quick task completion time, they mostly adopt simulated evaluations [3,22,24,30,46,50,58]. Personalized faceted search systems also favor simulation-based approaches, because demonstrating that personalization is useful requires large-scale experiments [3,30,36]. This is especially true in situations where personalization is based on historical interactions rather than current session interactions.

It is also noticeable that most FSSs developed to date operate on a single domain [3,5,11,13,20,21,22,23,24,25,26,27,28,30,36,39,41,44,46,51], although some experiment with multiple domains [1,40,43,47,48], and a few FSSs attempt open-domain search, e.g., for the general web [14,18,50].

6.1. Key Research Challenges

In this section, we summarize the key challenges faced while developing a new facet ranking approach:

6.1.1. Establishing Facet Relevance

With the increasing size and complexity of the collections being searched, the task of deciding which facets should be presented to the user and in which order becomes more difficult. The relevance of a facet is subjective. It depends on several factors, including the user’s interests, the facet’s relevance to the input query and the current search context, as well as its general importance in the collection. All these factors contribute to the facet relevance and should be considered by the ranking approach.

Moreover, from the personalization perspective, the relevance of the facets varies not only from one person to another but also, for the same person, from one situation to another. Users’ knowledge, interests, and, therefore, search needs evolve with time. Keeping the user profile up to date and reflecting it in the search results adds difficulty to the problem.
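One simple way to picture how several relevance factors might be combined is a weighted linear sum. The sketch below is purely illustrative: the signal names, the assumption that every signal is already normalized to [0, 1], and the weights are hypothetical and do not correspond to any specific method in the surveyed literature.

```python
def facet_relevance(user_interest, query_relevance, context_fit,
                    collection_importance, weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted sum of four hypothetical relevance signals, each in [0, 1]."""
    signals = (user_interest, query_relevance, context_fit, collection_importance)
    return sum(w * s for w, s in zip(weights, signals))


def rank_facets(facet_signals, weights=(0.4, 0.3, 0.2, 0.1)):
    """Order facet names by descending combined relevance."""
    return sorted(
        facet_signals,
        key=lambda name: facet_relevance(*facet_signals[name], weights=weights),
        reverse=True,
    )


# Hypothetical signals: (user interest, query relevance, context fit, importance).
facet_signals = {
    "cuisine": (0.9, 0.2, 0.5, 0.3),
    "price": (0.1, 0.9, 0.5, 0.8),
}
personalized_order = rank_facets(facet_signals)
query_driven_order = rank_facets(facet_signals, weights=(0.1, 0.6, 0.2, 0.1))
```

With the default weights the user-interest signal dominates and "cuisine" is ranked first; shifting weight towards query relevance puts "price" first. The same facets can thus be ordered differently depending on how the factors are balanced, which is why the balance has to be chosen with the user and the situation in mind.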

6.1.2. Maintaining the Multilevel Taxonomy Structure

This challenge arises mainly with t-facets, and in some cases where p-facets have a hierarchical structure. When the facets originate from a large multilevel taxonomy, the difficulty of the ranking process increases: it goes beyond computing a score for each facet and involves other decisions, such as which levels of the taxonomy need to be displayed to the user. Additionally, the ranking needs to consider how to order the facets while preserving the original taxonomy, without confusing the user.
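A minimal sketch of one possible way to respect the taxonomy while ranking is given below. It assumes that scores are available for some facets, that a parent inherits the maximum score found in its subtree, and that only a fixed number of levels is displayed; these choices, and all names in the code, are assumptions for illustration rather than a method proposed in the surveyed work.

```python
def rank_taxonomy(children, scores, root, max_depth=2):
    """Return (depth, facet) pairs in display order.

    Siblings are sorted by the best score in their subtree, and each facet
    is listed directly under its parent, so the original taxonomy is kept.
    """
    def subtree_score(node):
        # A node is as relevant as the best-scoring facet below it (assumption).
        own = scores.get(node, 0.0)
        return max([own] + [subtree_score(k) for k in children.get(node, [])])

    ordered = []

    def visit(node, depth):
        if depth > max_depth:
            return  # deeper taxonomy levels are not displayed
        kids = sorted(children.get(node, []), key=subtree_score, reverse=True)
        for kid in kids:
            ordered.append((depth, kid))
            visit(kid, depth + 1)

    visit(root, 1)
    return ordered


children = {
    "Place": ["Food", "Arts"],
    "Food": ["Cafe", "Bar"],
    "Arts": ["Museum"],
    "Museum": ["History Museum"],
}
scores = {"Cafe": 0.2, "Bar": 0.6, "History Museum": 0.9}
display = rank_taxonomy(children, scores, "Place", max_depth=2)
```

In this toy example "Arts" is listed before "Food" because its subtree contains the highest-scoring facet, even though that facet sits on a level that is not displayed. Whether a hidden descendant should promote its ancestors in this way is exactly the kind of decision that makes hierarchical ranking harder than flat scoring.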

6.1.3. Avoid Adding Complexity to the FSS

The facet ranking phase is usually triggered after the search engine retrieves a set of relevant documents as a response to an initial query submitted by the user. Retrieval is a document-level score generation phase that often requires heavy computation, depending on the underlying IR method used. Adding more complex processing and computation for the facet ranking layer is not desirable. Instead, a light and effective method is crucial, as it impacts users’ experience and their perception of the system. At the same time, the ranking method should effectively aid the searcher in narrowing and focusing the information space on retrieving the most relevant results according to the users’ interests and desires.

6.1.4. Deciding on a Search Task and Its Objectives

Defining the search task is also pivotal. It affects how the ranking happens, i.e., whether it should favor covering as much information as possible by giving the user a broad idea about the topic, or favor minimizing the navigation time to enable the searcher to find the target resource as quickly as possible. In addition to these two, there are many other scenarios where the ranking objective will vary according to the domain and the search task. Focusing on a well-defined search task and objective is fundamental for the development of a proper facet ranking method.

6.1.5. Evaluating Facet Ranking

The most challenging part of this research area is finding an appropriate, well-established evaluation methodology that gives confidence in the interpretation of results. Previous studies of facet ranking have not distinguished different facet types in their evaluation. For example, they did not handle the special case of type-based facets. This creates a set of challenges when evaluating a new approach: (1) The lack of t-facet ranking baseline methods. (2) As a result, the area also lacks existing benchmarked datasets that fit this purpose. (3) Existing evaluation metrics were developed for generic facet ranking; they do not consider the hierarchical nature of the t-facets. This is also true for personalized facet ranking approaches.

7. Conclusions and Future Directions

This article surveyed key publications and research trends related to faceted search systems. It presented literature on faceted browsers from different perspectives. The chosen perspectives are those that are closely related to, or affect, how facet ranking happens. Other components, while no doubt crucial to the FSS, do not affect the facet ranking process and, therefore, are not included in this survey.

The current FSS literature rarely examines the effect that the facet ranking process has on the evaluation metrics, separate from the facet generation step and other aspects of FSS. Questions related to how errors propagate from the facet generation phase to the facet ranking phase also remain unanswered. This is a direction that needs further investigation.

The faceted search systems reviewed vary in how they use and rank the facets. A number of systems provide only t-facets to the searchers [14,16,25,27,39,40,42,43,50,57]. However, these systems do not handle the hierarchical nature of the t-facets during the ranking. The hierarchical nature of the facets is also neglected in other FSSs, which mix both t-facets and p-facets. New ranking methods which take the hierarchical nature of the facets into consideration are needed.

Systems that mix facet types generally rank them in the same way [1,3,5,13,21,22,26,30,32,47,51]. Some differentiate the t-facets only in the UI by displaying them separately from other facet types. It is less common for systems that rank both types of facets to provide a separate ranking for each facet type [10,11,24]. Type-based facets carry useful categorical characteristics which should be leveraged by the ranking algorithm.

Ranking mixed facet types usually employs techniques based on information structure, search logs, or query relevance. This ranking might indicate the general importance and the query relevance of the facets, but it does not take into account the individual user interests.

Moreover, the majority of the personalized FSSs employ the same ranking for both t-facets and p-facets [3,16,18,23,30,45,47]. Providing a personalized method for t-facets based on historical user feedback is a new area that has not been explored in the existing literature.

Although FSSs have proven to be useful in vertical search systems, adapting and developing methods to extend them to the general web remains an evolving research direction with potential benefits.

Author Contributions

Conceptualization, E.A. and A.C.; methodology, E.A.; software, E.A.; validation, E.A. and A.C.; investigation, E.A.; writing—original draft preparation, E.A.; writing—review and editing, A.C. and G.J.F.J.; supervision, A.C. and G.J.F.J.; funding acquisition, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the ADAPT Centre, funded by Science Foundation Ireland Research Centres Programme (Grant 13/RC/2106; 13/RC/2106_P2) and co-funded by the European Regional Development Fund.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

FSS	Faceted Search Systems
CF	Collaborative Filtering
MF	Matrix Factorization
NLP	Natural Language Processing
IR	Information Retrieval
UI	User Interface

References

Oren, E.; Delbru, R.; Decker, S. Extending faceted navigation for RDF data. In International Semantic Web Conference; Springer: Berlin/Heidelberg, Germany, 2006; pp. 559–572.
Niu, X.; Fan, X.; Zhang, T. Understanding Faceted Search from Data Science and Human Factor Perspectives. ACM Trans. Inf. Syst. 2019, 37.
Koren, J.; Zhang, Y.; Liu, X. Personalized interactive faceted search. In Proceedings of the 17th International Conference on World Wide Web, Beijing, China, 21–25 April 2008; pp. 477–486.
Ben-Yitzhak, O.; Golbandi, N.; Har’El, N.; Lempel, R.; Neumann, A.; Ofek-Koifman, S.; Sheinwald, D.; Shekita, E.; Sznajder, B.; Yogev, S. Beyond Basic Faceted Search. In Proceedings of the 2008 International Conference on Web Search and Data Mining, Palo Alto, CA, USA, 11–12 February 2008; Association for Computing Machinery: New York, NY, USA, 2008; pp. 33–44.
Yee, K.P.; Swearingen, K.; Li, K.; Hearst, M. Faceted metadata for image search and browsing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Ft. Lauderdale, FL, USA, 5–10 April 2003; pp. 401–408.
Dumais, S. Faceted Search. In Encyclopedia of Database Systems; Springer: Boston, MA, USA, 2009; pp. 1103–1109.
Tzitzikas, Y.; Manolis, N.; Papadakos, P. Faceted exploration of RDF/S datasets: A survey. J. Intell. Inf. Syst. 2017, 48, 329–364.
Hahn, R.; Bizer, C.; Sahnwaldt, C.; Herta, C.; Robinson, S.; Bürgle, M.; Düwiger, H.; Scheel, U. Faceted Wikipedia Search. In Proceedings of the Business Information Systems: 13th International Conference, BIS 2010, Berlin, Germany, 3–5 May 2010; Abramowicz, W., Tolksdorf, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 1–11.
Papadaki, M.E.; Spyratos, N.; Tzitzikas, Y. Towards Interactive Analytics over RDF Graphs. Algorithms 2021, 14, 34.
Li, C.; Yan, N.; Roy, S.B.; Lisham, L.; Das, G. Facetedpedia: Dynamic Generation of Query-dependent Faceted Interfaces for Wikipedia. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; ACM: New York, NY, USA, 2010; pp. 651–660.
Mäkelä, E.; Hyvönen, E.; Saarela, S.; Viljanen, K. OntoViews—A Tool for Creating Semantic Web Portals. In Proceedings of the Semantic Web—ISWC 2004: Third International Semantic Web Conference, Hiroshima, Japan, 7–11 November 2004; McIlraith, S.A., Plexousakis, D., van Harmelen, F., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; pp. 797–811.
Harth, A. VisiNav: A system for visual search and navigation on web data. Web Semant. Sci. Serv. Agents World Wide Web 2010, 8, 348–354.
Wilson, M.L.; White, R.W.; White, R.W. Evaluating advanced search interfaces using established information-seeking models. J. Am. Soc. Inf. Sci. Technol. 2009, 60, 1407–1422.
Kong, W.; Allan, J. Extending faceted search to the general web. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, Shanghai, China, 3–7 November 2014; pp. 839–848.
Tvarozek, M.; Bieliková, M. Personalized faceted navigation in semantically enriched information spaces. Adv. Semant. Media Adapt. Pers. 2009, 2, 181–201.
Tzitzikas, Y.; Dimitrakis, E. Preference-enriched Faceted Search for Voting Aid Applications. IEEE Trans. Emerg. Top. Comput. 2016, 7, 218–229.
Arenas, M.; Grau, B.C.; Kharlamov, E.; Marciuška, Š.; Zheleznyakov, D. Faceted search over RDF-based knowledge graphs. Web Semant. Sci. Serv. Agents World Wide Web 2016, 37, 55–74.
Le, T.; Vo, B.; Duong, T.H. Personalized Facets for Semantic Search Using Linked Open Data with Social Networks. In Proceedings of the 2012 Third International Conference on Innovations in Bio-Inspired Computing and Applications, Kaohsiung, Taiwan, 26–28 September 2012; pp. 312–317.
Xiong, C.; Power, R.; Callan, J. Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding. In Proceedings of the International World Wide Web Conference Committee (IW3C2), Perth, Australia, 3–7 April 2017.
Kharlamov, E.; Giacomelli, L.; Sherkhonov, E.; Grau, B.C.; Kostylev, E.V.; Horrocks, I. Semfacet: Making hard faceted search easier. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 2475–2478.
Vandic, D.; Aanen, S.; Frasincar, F.; Kaymak, U. Dynamic Facet Ordering for Faceted Product Search Engines. IEEE Trans. Knowl. Data Eng. 2017, 29, 1004–1016.
Vandic, D.; Frasincar, F.; Kaymak, U. Facet selection algorithms for web product search. In Proceedings of the 22nd ACM International Conference on Conference on Information & Knowledge Management, San Francisco, CA, USA, 27 October–1 November 2013; pp. 2327–2332.
Bivens, J.A.; Deng, Y.; El Maghraoui, K.; Mahindru, R.; Ramasamy, H.V.; Sarkar, S.; Wang, L. Dynamic Faceted Search. U.S. Patent 10,242,103, 26 March 2019.
Mihindukulasooriya, N.; Mahindru, R.; Chowdhury, M.F.M.; Deng, Y.; Fauceglia, N.R.; Rossiello, G.; Dash, S.; Gliozzo, A.; Tao, S. Dynamic Faceted Search for Technical Support Exploiting Induced Knowledge. In Proceedings of the Semantic Web—ISWC 2020; Pan, J.Z., Tamma, V., d’Amato, C., Janowicz, K., Fu, B., Polleres, A., Seneviratne, O., Kagal, L., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 683–699.
Sah, M.; Wade, V. Personalized Concept-Based Search and Exploration on the Web of Data Using Results Categorization. In Proceedings of the Semantic Web: Semantics and Big Data; Cimiano, P., Corcho, O., Presutti, V., Hollink, L., Rudolph, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 532–547.
Chakraborty, T.; Krishna, A.; Singh, M.; Ganguly, N.; Goyal, P.; Mukherjee, A. FeRoSA: A Faceted Recommendation System for Scientific Articles. In Proceedings of the Advances in Knowledge Discovery and Data Mining: 20th Pacific-Asia Conference, PAKDD 2016, Auckland, New Zealand, 19–22 April 2016; pp. 528–541.
Houben, G.J. Facet Embeddings for Explorative Analytics in Digital Libraries. In Proceedings of the Research and Advanced Technology for Digital Libraries: 21st International Conference on Theory and Practice of Digital Libraries, TPDL 2017, Thessaloniki, Greece, 18–21 September 2017; p. 86.
Ammar, W.; Groeneveld, D.; Bhagavatula, C.; Beltagy, I.; Crawford, M.; Downey, D.; Dunkelberger, J.; Elgohary, A.; Feldman, S.; Ha, V.; et al. Construction of the Literature Graph in Semantic Scholar. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers); Association for Computational Linguistics: New Orleans, LA, USA, 2018; pp. 84–91.
Kokolaki, A.; Tzitzikas, Y. Facetize: An Interactive Tool for Cleaning and Transforming Datasets for Facilitating Exploratory Search. arXiv 2018, arXiv:1812.10734.
Abel, F.; Celik, I.; Houben, G.J.; Siehndel, P. Leveraging the semantics of tweets for adaptive faceted search on twitter. In The Semantic Web–ISWC 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 1–17.
Nguyen, H.S.; Pham, H.P.; Duong, T.H.; Nguyen, T.P.T.; Le, H.M.T. Personalized Facets for Faceted Search Using Wikipedia Disambiguation and Social Network. In Advanced Computational Methods for Knowledge Engineering; Springer: Berlin/Heidelberg, Germany, 2016; pp. 229–241.
Inan, E.; Thompson, P.; Yates, T.; Ananiadou, S. HSEarch: Semantic Search System for Workplace Accident Reports. In Proceedings of the ECIR 2021: Advances in Information Retrieval; Hiemstra, D., Moens, M.F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 514–519.
Mahdi, M.N.; Ahmad, A.R.; Ismail, R. Improving Faceted Search Results for Web-based Information Exploration. Int. J. Adv. Sci. Eng. Inf. Technol. 2020, 10, 1143–1152.
Chantamunee, S.; Wong, K.W.; Fung, C.C. Collaborative Filtering for Personalised Facet Selection. In Proceedings of the 10th International Conference on Advances in Information Technology; Association for Computing Machinery: New York, NY, USA, 2018.
Chantamunee, S.; Wong, K.W.; Fung, C.C. Deep Autoencoder on Personalized Facet Selection. In Proceedings of the Neural Information Processing; Gedeon, T., Wong, K.W., Lee, M., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 314–322.
Chantamunee, S.; Wong, K.W.; Fung, C.C. An exploration of user–facet interaction in collaborative-based personalized multiple facet selection. Knowl.-Based Syst. 2020, 209, 106444.
Ali, E.; Caputo, A.; Lawless, S.; Conlan, O. Where should I go? A deep learning approach to personalize type-based facet ranking for POI suggestion. In Proceedings of the Web Information Systems Engineering–WISE 2021: 22nd International Conference on Web Information Systems Engineering, WISE 2021, Melbourne, VIC, Australia, 26–29 October 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 207–215.
Kong, B.; Rajshree, N.; Gliozzo, A.M.; Fauceglia, N.R.; Farrell, R.G.; Chowdhury, M.F.M.; Mathur, A. Dynamic Faceted Search on a Document Corpus. US Patent App. 16/399,180, 5 November 2020.
Hearst, M.A. Clustering versus Faceted Categories for Information Exploration. Commun. ACM 2006, 49, 59–61.
Latha, K.; Veni, K.R.; Rajaram, R. Afgf: An automatic facet generation framework for document retrieval. In Proceedings of the International Conference on Advances in Computer Engineering (ACE), Bangalore, India, 20–21 June 2010; pp. 110–114.
Upadhyay, P.; Ramanath, M. PreFace++: Faceted Retrieval of Prerequisites and Technical Data. In Proceedings of the Advances in Information Retrieval; Hiemstra, D., Moens, M.F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 554–558.
Affolter, R.; Weiler, A. FacetX: Dynamic Facet Generation for Advanced Information Filtering of Search Results. In Proceedings of the EDBT/ICDT Workshops, Copenhagen, Denmark, 30 March 2020.
Dakka, W.; Ipeirotis, P.G.; Wood, K.R. Automatic Construction of Multifaceted Browsing Interfaces. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2005; pp. 768–775.
Sánchez-Cervantes, J.L.; Alor-Hernández, G.; Paredes-Valverde, M.A.; Rodríguez-Mazahua, L.; Valencia-García, R. NaLa-Search: A multimodal, interaction-based architecture for faceted search on linked open data. J. Inf. Sci. 2020, 47, 0165551520930918.
schraefel, m.; Wilson, M.; Russell, A.; Smith, D.A. MSpace: Improving Information Access to Multimedia Domains with Multimodal Exploratory Search. Commun. ACM 2006, 49, 47–49.
van Zwol, R.; Pueyo, L.G.; Muralidharan, M.; Sigurbjornsson, B. Ranking entity facets based on user click feedback. In Proceedings of the 2010 IEEE Fourth International Conference on Semantic Computing, Pittsburgh, PA, USA, 22–24 September 2010; pp. 192–199.
Tvarožek, M.; Bieliková, M. Factic: Personalized exploratory search in the semantic web. In Proceedings of the International Conference on Web Engineering; Springer: Berlin/Heidelberg, Germany, 2010; pp. 527–530.
Kashyap, A.; Hristidis, V.; Petropoulos, M. FACeTOR: Cost-Driven Exploration of Faceted Query Results. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2010; pp. 719–728.
Fafalios, P.; Kitsos, I.; Marketakis, Y.; Baldassarre, C.; Salampasis, M.; Tzitzikas, Y. Web searching with entity mining at query time. In Proceedings of the Information Retrieval Facility Conference; Springer: Berlin/Heidelberg, Germany, 2012; pp. 73–88.
Liberman, S.; Lempel, R. Approximately optimal facet value selection. Sci. Comput. Program. 2014, 94, 18–31.
Móro, R.; Bieliková, M.; Burger, R. Facet Tree for Personalized Web Documents Organization. In Proceedings of the Web Information Systems Engineering—WISE 2014: 15th International Conference, Thessaloniki, Greece, 12–14 October 2014; Proceedings, Part I; Springer International Publishing: Cham, Switzerland, 2014; pp. 372–387.
Kharlamov, E.; Giacomelli, L.; Sherkhonov, E.; Grau, B.C.; Kostylev, E.V.; Horrocks, I. Ranking, aggregation, and reachability in faceted search with semfacet. In Proceedings of the ISWC 2017 Posters & Demonstrations and Industry Tracks co-located with 16th International Semantic Web Conference (ISWC 2017), Vienna, Austria, 23–25 October 2017.
Feddoul, L.; Schindler, S.; Löffler, F. Automatic Facet Generation and Selection over Knowledge Graphs. In Proceedings of the Semantic Systems. The Power of AI and Knowledge Graphs; Acosta, M., Cudré-Mauroux, P., Maleshkova, M., Pellegrini, T., Sack, H., Sure-Vetter, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 310–325.
Ali, E.; Caputo, A.; Lawless, S.; Conlan, O. A Probabilistic Approach to Personalize Type-based Facet Ranking for POI Suggestion. In Proceedings of the Web Engineering. ICWE 2021; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2021; Volume 12706, pp. 175–182.
Ali, E.; Caputo, A.; Lawless, S.; Conlan, O. Personalizing type-based facet ranking using BERT embeddings. In Further with Knowledge Graphs. Studies on the Semantic Web 53; IOS Press Ebooks: Amsterdam, The Netherlands, 2021.
He, C.; Micallef, L.; Serim, B.; Vuong, T.; Ruotsalo, T.; Jacucci, G. Interactive Visual Facets to Support Fluid Exploratory Search. arXiv 2021, arXiv:2108.00920.
Upadhyay, P.; Ramanath, M. PreFace: Faceted Retrieval of Prerequisites Using Domain-Specific Knowledge Bases. In Proceedings of The Semantic Web—ISWC 2020; Pan, J.Z., Tamma, V., d’Amato, C., Janowicz, K., Fu, B., Polleres, A., Seneviratne, O., Kagal, L., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 601–618.
Glass, M.; Chowdhury, M.F.M.; Deng, Y.; Mahindru, R.; Fauceglia, N.R.; Gliozzo, A.; Mihindukulasooriya, N. Dynamic Facet Selection by Maximizing Graded Relevance. In InterNLP 2021; Association for Computational Linguistics: Toronto, Canada, 2021; p. 32.
Aso, T.; Amagasa, T.; Kitagawa, H. A system for relation-oriented faceted search over knowledge bases. Int. J. Web Inf. Syst. 2021, 17, 698–713.
Liu, Z.; Gu, Z.; Thelen, T.; Estrecha, S.G.; Zhu, R.; Fisher, C.K.; D’Onofrio, A.; Shimizu, C.; Janowicz, K.; Schildhauer, M.; et al. Knowledge explorer: Exploring the 12-billion-statement KnowWhereGraph using faceted search (demo paper). In Proceedings of the 30th International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 1–4 November 2022; pp. 1–4.
Schoegje, T.; de Vries, A.; Pieters, T. Adapting a Faceted Search Task Model for the Development of a Domain-Specific Council Information Search Engine. In Proceedings of the Electronic Government; Janssen, M., Csáki, C., Lindgren, I., Loukis, E., Melin, U., Viale Pereira, G., Rodríguez Bolívar, M.P., Tambouris, E., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 402–418.
Gollub, T.; Brockmeyer, J.; Stein, B.; Potthast, M. Dynamic Exploratory Search for the Information Retrieval Anthology. In Proceedings of the Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, 2–6 April 2023; pp. 242–247.
Palani, S.; Naik, A.; Downey, D.; Zhang, A.X.; Bragg, J.; Chang, J.C. Relatedly: Scaffolding Literature Reviews with Existing Related Work Sections. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany, 23–28 April 2023; pp. 1–20.
Hyvönen, E. Digital Humanities on the Semantic Web: Sampo Model and Portal Series. Semant. Web 2023, 14, 729–744.
Ikkala, E.; Hyvönen, E.; Rantala, H.; Koho, M. Sampo-UI: A full stack JavaScript framework for developing semantic portal user interfaces. Semant. Web 2022, 13, 69–84.
Kitsos, I.; Magoutis, K.; Tzitzikas, Y. Scalable entity-based summarization of web search results using MapReduce. Distrib. Parallel Databases 2014, 32, 405–446.
Gollub, T.; Hutans, L.; Al Jami, T.; Stein, B. Exploratory Search Pipes with Scoped Facets. In Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval, Santa Clara, CA, USA, 2–5 October 2019; pp. 245–248.
Van Zwol, R.; Sigurbjornsson, B.; Adapala, R.; Garcia Pueyo, L.; Katiyar, A.; Kurapati, K.; Muralidharan, M.; Muthu, S.; Murdock, V.; Ng, P.; et al. Faceted exploration of image search results. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 961–970.
van Zwol, R.; Garcia Pueyo, L.; Muralidharan, M.; Sigurbjörnsson, B. Machine Learned Ranking of Entity Facets. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval; Association for Computing Machinery: New York, NY, USA, 2010; pp. 879–880.
Wang, Q.; Ramírez, G.; Marx, M.; Theobald, M.; Kamps, J. Overview of the INEX 2011 Data-Centric Track. In Proceedings of the Focused Retrieval of Content and Structure; Geva, S., Kamps, J., Schenkel, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 118–137.
Ali, E.; Caputo, A.; Lawless, S.; Conlan, O. Dataset creation framework for personalized type-based facet ranking tasks evaluation. In Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction: 12th International Conference of the CLEF Association, CLEF 2021, Virtual Event, 21–24 September 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 27–39.

Figure 1. Taxonomy of Faceted Search Aspects Relevant to Facet Ranking.

Figure 2. Faceted Wikipedia Search.

Figure 3. Summary of Facet Ranking Strategies.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
