

Active Faceted Search

Sean Weber

Helsinki, November 16, 2020
MSc thesis

UNIVERSITY OF HELSINKI
Department of Computer Science


Faculty of Science, Department of Computer Science
Sean Weber

Active Faceted Search
Computer Science

MSc thesis, November 16, 2020, 35 pages + 4 appendix pages

Keywords: faceted search, bandit, exploratory search

We present Active Faceted Search, a technique which allows a user to iteratively refine search results by selecting from a set of facets that is dynamically refined with each iteration. The presented facets are selected using a contextual multi-armed bandit model. We first describe the computational model of a system which implements Active Faceted Search. We also create a web application to demonstrate an example of a system that can use an active faceted search component along with more traditional search elements such as a typed query and a sidebar component. We perform simulations to compare the performance of the system under different parameters. Finally, we present a user experiment in which users are instructed to perform tasks in order to compare Active Faceted Search to traditional search techniques.

ACM Computing Classification System (CCS):

Information systems → Web searching and information discovery
Information systems → Web search engines
Information systems → Search interfaces
Software and its engineering → Search-based software engineering



Contents

1 Introduction
2 Background
2.1 Faceted Classification
2.2 Faceted Search
2.3 Industry Applications
2.4 Adaptive Approach
2.5 Dynamic Approach
2.6 Recent Work
3 Active Faceted Search
3.1 Computational Model
3.1.1 Weighing Facets
3.1.2 Ranking Facets
3.1.3 Ranking Search Results
3.2 User Interface and Interaction Design
4 User Experiment
4.1 Participants
4.2 Procedure
4.3 Data and Search Tasks
4.3.1 Traditional Facet Components
4.3.2 LDA Calculated Facet Sidebar
4.3.3 Manually Curated Facet Sidebar
4.4 Measures
4.4.1 Task Completion Time
4.4.2 Mean Reciprocal Rank
4.4.3 Rank Increase
4.4.4 User Satisfaction
5 Simulation of facet interaction
5.1 c value simulation
5.2 Retrievability Simulation
6 Results
6.1 Task Completion Time
6.2 Temporal performance analysis
6.3 Rank Analysis
6.4 User Survey
7 Discussion
8 Conclusion
References

Appendices
1 System Configurations
2 User Workflow


1 Introduction

Traditional classification systems classify objects in a tree-based hierarchy. Users are most familiar with this type of system as it is used in everyday applications such as file systems (/home/user/documents/form.pdf), postal addresses (country/region/zip_code/street/address/apartment), vehicle selection (make/model/year/chassis), etc. However, not all data can easily be classified in a tree structure. Often a child node may fit under several parent nodes. For example, when classifying movies, a romantic comedy can fall under /movies/romance/romantic_comedy or /movies/comedy/romantic_comedy. This can lead to item duplication or premature dead-ends while traversing the tree. Faceted classification systems extend this by assigning multiple facets to each object. Facets can be thought of as the attributes that make up an item.

A faceted search system builds on faceted classification by allowing users to search for information and refine results by applying facet-based filters to the underlying data collection. For example, an implementation for searching for smart phones can include facets such as color, brand, screen size, or supported features. When a user selects a facet, search results are updated with matching items. Multiple facets can be selected to further refine results. A faceted search system has two main advantages. The first is that it allows users to easily specify constraints in an intuitive manner. The second is that it allows the user to explore features that they may not have known about in advance [1]. For these reasons, faceted search is considered a form of exploratory search.

Faceted search systems have several advantages over the typed query systems most commonly used in popular search engines today. One downside of typed query systems, known as the vocabulary problem, is that they rely on users knowing the correct terminology to search for ahead of time. If a term has multiple synonyms, a user might search for one synonym while the classification system uses another [2]. By presenting facets to the user, faceted search avoids this issue by ensuring that only terms used in the system can be selected. The other main downside of typed query systems is that they do not allow users to iteratively refine results; instead, users are forced to restart the search process with a new search term every time [3]. Faceted search allows users to select and remove facets whenever they wish to further refine search results. Users also perceive faceted search systems to be more organized than non-faceted search systems [4].

There are several unsolved problems in the field of faceted search. One such problem is determining how to present facets to the user. This is trivial when there are few enough facets that they all fit on a single page, but the problem quickly becomes apparent as the number of facets grows. There has been much research into determining which facets to suggest, such as ranking based on how many total items match a facet, how much the facet will filter the current items, and how similar the facet is to previously selected facets.

In order to determine which facets to recommend, we need a way to measure the similarity between facets. Much previous research focuses on creating a predetermined facet hierarchy to link facets together [5][6][7]. These hierarchies can be created either manually by a human expert or, more recently, generated automatically using an algorithm. In such a system, the user does not have much involvement in facet suggestion.

We introduce Active Faceted Search, which involves the user in the facet prediction process. The system learns user preferences from interactions and recommends facets from a set that can be arbitrarily large. This creates an active learning loop between the system and the user: as the user interacts with the system, suggested facets and search results are refined according to the user's needs.

With active faceted search, we model facet suggestion as a contextual multi-armed bandit problem [8]. The facets we suggest aim to be related to the facets already selected, but to also allow a degree of exploration. This provides a diverse set of relevant suggestions and ensures that users are not just recommended synonyms for facets already selected. On the other hand, we do not want to add any exploration effect for search results, instead opting to find the most accurate results. Because of this, search results are simply the items that are the closest match for the selected facets.

We aim to analyze the effectiveness of various tools in assisting in search tasks. These include an active facet component, a manually curated facet component, and an algorithmically curated facet component. We compare these to a baseline of searching using a typed query only. We also compare several combinations of these components to determine whether there is any synergy between them. We want to determine which configuration gives users the best experience, and especially the benefits and drawbacks of the active faceted component. To demonstrate the effectiveness of our system when the number of facets is large, we use a dataset with over 3000 facets.

We report an experiment which simulates retrievability of items using various search configurations. The simulation is meant to measure the theoretical performance of our system assuming the user is attempting to find a specific item. We measure the success rate of finding an item, the average number of facets that need to be selected before an item is found, and the average number of facets that are suggested, but skipped due to being irrelevant. We use this simulation to test various parameters of our model.

We also report the results of a user study where users were asked to complete several tasks, with each task using a separate configuration of components. We measure several aspects, including task completion time per task, rank performance of search results, interaction analysis, and user satisfaction with the system. We also report results of a post-study survey where users could give feedback on the various components.


2 Background

2.1 Faceted Classification

The concept of faceted classification was first proposed in 1933 in a system known as Colon Classification [9]. Prior to this, documents were most commonly stored in the Dewey Decimal system, where each document is assigned a single topic, allowing only for document lookup in a tree-style hierarchy [10]. With the introduction of the colon classification system, documents were assigned five different facets, each with a range of possible values: personality, matter, energy, space, and time. This allowed users to look up documents based on a combination of facets.

2.2 Faceted Search

Faceted search is a technique which utilizes a faceted classification system to augment a traditional query-based search. It allows users to narrow down their search results by selecting the facets they are most interested in. Users explore significantly more in a faceted search system [4][11].

The first system to utilize faceted search was Flamenco in 2003 [5]. Flamenco was a system designed to retrieve images, providing hierarchical faceted metadata of the images in order to refine results. The images used were from the Thinker collection provided by the Fine Arts Museum of San Francisco. Facets were extracted from natural language descriptions of each work by matching words to categories in WordNet [12] and saving the categories with a high occurrence in the dataset.

Figure 1 demonstrates the system’s UI.

2.3 Industry Applications

In more recent years, faceted search has become a popular topic of research, particularly in the subject of online commerce. It has found use on popular online stores such as Amazon, Ebay, Newegg, and Wal-Mart [13]. It has also been implemented in open source projects such as Apache Solr [7]. Faceted search is especially powerful in this domain, as a standard search can return tens of thousands of products and customers often do not know exactly what they want ahead of time. Figure 2 shows an example of a search for smart phones on Amazon.com. The sidebar on the left provides several facets that the user can use to help narrow down search suggestions.

2.4 Adaptive Approach

An important problem of faceted search is determining which facets to show to the user, particularly when the number of facets is large. This work has traditionally been done by a human expert.

Figure 1: Flamenco was the first system to utilize faceted search in 2003. Users are able to type in a query or select facets from the sidebar. Users are also able to remove facets that they have previously selected. The facets selected in this example are "Asia" and "fabrics". [5]

Adaptive faceted search systems are those which can automatically modify facet rankings based on context. Much research has been done on offline ranking of facets. Stolz et al. presented a form of adaptive faceted search which could automatically generate a facet structure for a given dataset by modeling the facet space as a decision tree and determining the best split to reduce the result space [14]. Abel et al. created a faceted search system for Twitter posts which personalized facet rankings for each user based on the context of the user's post history. They demonstrated an improvement in search rankings four times higher than faceted search without personalization [15].

In 2018, IBM applied for a patent for dynamic faceted search, which aims to automatically create facets offline and rank the presentation of facets differently for each user based on personalized user data [16].


Figure 2: An example of faceted search on Amazon.com. Here we can see the available facets: International Shipping, Amazon Prime, Eligible for Free Shipping, Cell Phone Display Size, Average Customer Review, Phone Color, and Cell Phone Features. Selecting a value for a facet filters out all products that do not match that value. Some facets, such as International Shipping, are boolean facets with only two possible values.

2.5 Dynamic Approach

While offline adaptation of facet rankings is beneficial, we can go a step further and take a more active role, modifying facet rankings as the user progresses through the search process. In 2007, Vandic et al. proposed a system which recalculated facet rankings at each search iteration [17]. In this system, facets were ranked using an algorithm which took into account how the facets were distributed among documents in the result space. As facets are selected, results are filtered, and facets are recalculated using only the filtered results.

2.6 Recent Work

Semfacet was developed in 2015 as a system for semantic faceted search [6]. It was designed to query Resource Description Framework (RDF) datasets represented with Web Ontology Language (OWL) ontologies. Facets were represented as pairs of predicates and values. The system was able to utilize the faceted structure to limit the scope of queries in order to reduce the execution time of searches. The creators were able to show that evaluation of faceted queries in their system was feasible in polynomial time. Facet suggestions are updated with each iteration.

FacetSearch was introduced in 2018 as a system to explore and analyze bibliographic data [18]. FacetSearch was extended from Paperlens, a system for visualizing trends in scientific literature [19].

Research has also been done on ways to improve the speed of faceted search systems. Klungre introduced a technique to reduce the computing time of calculating facet suggestions by indexing parts of queries. This increases speed, but with the tradeoff of a slight reduction in accuracy [20]. Mukhopadhyay et al. devised a technique for Apache Spark to filter on cached results of previous queries when facets are added in order to reduce query execution time [21].

In 2019, Niu et al. utilized various machine learning models to predict facet interactions and performed a user study to determine user behavior and preferences when interacting with a faceted search system [22]. Users tended to use facets throughout the entire search process. Most users in the experiment claimed that they liked the idea of faceted search but had problems with the specific implementation used in the study. They also mentioned that the facets added an element of choice overload. The authors concluded that future faceted search implementations should be more responsive to user input.


3 Active Faceted Search

Unlike many other approaches which attempt to group facets into an ontological hierarchy, our approach simply leaves them as a flat set of boolean facets where each facet has two possible values in the user interface: selected or unselected.

Users interact with the active faceting system in the following way: at each iteration, the system takes the list of facets seen and the list of facets selected and uses them to predict a ranking of facets not yet seen. The top ranking facets are presented to the user in groups of five. The user responds by selecting a facet of interest. If the facets presented are not sufficient, they may request the next group of facets indefinitely until one is found.

Search results are then ranked according to how well they fit the combination of all selected facets. A result can be ranked highly even if it does not match a specific facet, so long as it matches well with others. That is to say that in our system, successive iterations never reduce the result space, they merely change the rankings of results. This contrasts with conventional faceted search where facet selection acts as an orthogonal filter. This allows our system to learn from feedback that is suboptimal or even conflicting.
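The interaction loop described above can be sketched as follows. This is an illustrative sketch, not the thesis implementation: `rank_unseen_facets`, `rank_results`, and `present_to_user` are hypothetical placeholders standing in for the model of Section 3.1 and the UI layer.

```python
GROUP_SIZE = 5  # facets are presented to the user in groups of five

def interaction_step(seen, selected, rank_unseen_facets, rank_results, present_to_user):
    """One iteration: offer facets in groups of five until the user selects
    one, then re-rank the results (the result space is never reduced)."""
    ranking = [f for f in rank_unseen_facets(seen, selected) if f not in seen]
    offset = 0
    while offset < len(ranking):
        group = ranking[offset:offset + GROUP_SIZE]
        seen.extend(group)                  # these facets have now been shown
        choice = present_to_user(group)     # None means "show me the next group"
        if choice is not None:
            selected.append(choice)
            return rank_results(selected)   # rankings change; nothing is filtered out
        offset += GROUP_SIZE
    return rank_results(selected)
```

Because selection only re-weights the ranking rather than filtering, a suboptimal or conflicting choice degrades the ordering gracefully instead of emptying the result list.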

3.1 Computational Model

We model the facet recommendation process as a contextual multi-armed bandit problem [8]. Each facet in the data is an arm of the bandit model and we use Term Frequency-Inverse Document Frequency (TF-IDF) [23] statistics to estimate the expectation of each arm. Rather than always offering the most optimal facet, this system balances the exploration/exploitation tradeoff by offering facets that are near optimal, but have a high uncertainty. The user is able to reduce this uncertainty by rewarding arms that they have found useful for the given task.

3.1.1 Weighing Facets

We begin with a set of documents D, a set of facets F, and a set of tuples, each of which associates a document with a facet. We assign a weight to each facet-document pair by calculating its term frequency-inverse document frequency (TF-IDF) score. A standard way to measure term frequency is to simply count the occurrences of each pair, but this gives undue weight to documents with larger numbers of terms, without regard to the specificity of each term. Instead, we calculate the augmented term frequency by counting the frequency of the document-facet pair and dividing by the count of the most frequently occurring term in that document:

\mathrm{tf}(f, d) = \frac{\mathrm{count}(f, d)}{\max\{\mathrm{count}(f', d) : f' \in F\}}

The inverse document frequency measures the rarity of a term. We calculate the inverse document frequency of a term by dividing the total number of documents by the number of documents associated with that term, and taking the logarithm:

\mathrm{idf}(f) = \log\frac{|D|}{|\{d \in D : f \in d\}|}

Multiplying the augmented term frequency by the inverse document frequency gives us the TF-IDF score:

\mathrm{tfidf}(f, d) = \mathrm{tf}(f, d) \cdot \mathrm{idf}(f)

We create a weight matrix W which associates the TF-IDF score of each facet to each document.
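As an illustrative sketch (not the thesis code), the weighting scheme above can be implemented as follows. The data layout, a dict of (facet, document) occurrence counts, is our own assumption.

```python
import math

def tfidf_weights(pairs, documents):
    """Build the weight matrix W described above.

    pairs:     dict mapping (facet, document) -> occurrence count.
    documents: the full document collection D.
    Returns a dict W[(facet, document)] -> TF-IDF weight.
    """
    max_count = {}  # per-document maximum term count, for the augmented tf
    doc_freq = {}   # number of documents each facet appears in, for the idf
    for (f, d), n in pairs.items():
        max_count[d] = max(max_count.get(d, 0), n)
        doc_freq[f] = doc_freq.get(f, 0) + 1

    W = {}
    for (f, d), n in pairs.items():
        tf = n / max_count[d]                         # count(f,d) / max count in d
        idf = math.log(len(documents) / doc_freq[f])  # log(|D| / df(f))
        W[(f, d)] = tf * idf
    return W
```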

3.1.2 Ranking Facets

The system needs to balance exploration and exploitation while taking into account facets that were selected in previous steps. For these reasons, we chose a contextual bandit approach: the system ranks facets according to the regularized LinRel algorithm [24].

The user selects one or more facets. At each step t, we define the matrix X_t to be the subset of W that contains only the selected facets. We then obtain regularized least squares estimates for each facet f:

a_f = x_f \cdot (X_t^T X_t + \mu I)^{-1} \cdot X_t^T

where \mu is a regularization parameter and I is an identity matrix of size |F|.

We do not want to rank facets simply by relevance. Instead, we rank them by their upper confidence bound:

\mathrm{ucb}_f = a_f \cdot y_t + \frac{c}{2}\,\|a_f\|

where c is a parameter used to adjust the confidence estimate. In later sections, we demonstrate the effect of using different values of c.

We apply a penalty p to the UCB of facets that have already been presented to the user, multiplied by the number of times they have been presented, s:

\mathrm{ucb}_f = a_f \cdot y_t + \frac{c}{2}\,\|a_f\| - p \cdot s
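A minimal sketch of this ranking step, assuming W is stored as a NumPy array with one row per facet. Variable and function names are our own, and this is not the thesis implementation; in particular, we regularize in the document dimension, which follows from treating facet rows of W as feature vectors, whereas the thesis states the identity has size |F|, so the exact orientation of X_t is our assumption.

```python
import numpy as np

def rank_facets(W, selected, rewards, c=1.0, p=0.1, mu=1.0, shown_counts=None):
    """Rank facets by the penalized upper confidence bound described above.

    W:            (n_facets, n_docs) TF-IDF weight matrix.
    selected:     indices of facets the user has selected so far.
    rewards:      feedback vector y_t for the selected facets.
    shown_counts: times each facet has already been presented (s in the text).
    """
    n_facets, n_docs = W.shape
    if shown_counts is None:
        shown_counts = np.zeros(n_facets)
    X = W[selected]                                   # X_t: rows for selected facets
    G = np.linalg.inv(X.T @ X + mu * np.eye(n_docs))  # (X_t^T X_t + mu I)^-1
    A = W @ G @ X.T                                   # row f holds a_f
    relevance = A @ np.asarray(rewards, dtype=float)  # a_f . y_t
    width = np.linalg.norm(A, axis=1)                 # ||a_f||
    ucb = relevance + (c / 2.0) * width - p * np.asarray(shown_counts, dtype=float)
    ucb[selected] = -np.inf                           # never re-suggest chosen facets
    return np.argsort(-ucb)                           # highest UCB first
```

Setting c = 0 recovers pure exploitation, which is how the c value simulation in Section 5.1 isolates the effect of exploration.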


3.1.3 Ranking Search Results

In contrast to facet ranking, we do not apply any exploration to search results. Thus, we determine the ranking of search results by how relevant they are to the selected facets, which we can achieve using standard regularized linear regression:

a = (X_t^T X_t + \mu I)^{-1} \cdot X_t^T
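A corresponding sketch for result ranking, under the same assumptions as the facet-ranking sketch. Here we assume the estimate is applied to the feedback vector y_t to obtain per-document scores, which the formula above leaves implicit.

```python
import numpy as np

def rank_results(W, selected, rewards, mu=1.0):
    """Rank documents purely by relevance to the selected facets (no exploration).

    W:        (n_facets, n_docs) TF-IDF weight matrix.
    selected: indices of the selected facets.
    rewards:  feedback vector y_t for the selected facets.
    """
    X = W[selected]                                          # X_t: selected facets' rows
    n_docs = W.shape[1]
    a = np.linalg.inv(X.T @ X + mu * np.eye(n_docs)) @ X.T   # the formula above
    scores = a @ np.asarray(rewards, dtype=float)            # apply the feedback y_t
    return np.argsort(-scores)                               # most relevant documents first
```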

3.2 User Interface and Interaction Design

Figure 3: A screenshot of the user interface of the system. It is composed of four elements: A) the typed query element; B) the list of selected facets; C) the list of facets actively recommended to the user; D) the ranked search results, with the matching and non-matching tags shown under each result.

We have created an interactive web application to demonstrate active faceted search. In the following example, we populate our model using data from the MovieLens 20M dataset [25]. This dataset contains a set of movies as well as user-submitted tags associated with each movie. Here we use the user-submitted tags as facets and the movies as search results.

The interface of the system is composed of four main elements, which are shown in Figure 3. They consist of: a typed query element (area A), a list of selected facets (area B), a list of active facets recommended to the user (area C), and a list of search results (area D). The user can add a facet through the typed query element or by selecting an actively suggested facet (area C). When a user begins to type in the typed query box, a dropdown appears which lists valid tags matching the current input. An example of the dropdown can be seen in Figure 4. An error message is given if the user submits an input that is not a valid tag. The user can remove a facet by clicking it in the list of selected facets. Whenever a user adds or removes a facet, the active facets and search results are recalculated using the context of the selected facets.

An important problem to consider is a perceptual phenomenon known as change blindness, in which users fail to recognize when objects change [26]. It may not be easy for the user to tell when the facets have been updated, as they appear visually similar in the interface. In order to combat this, we add an animation minimizing the facet element and search results element after the user selects a facet. The elements may remain minimized for a second or two while results are calculated, depending on the power of the computer running the system. A second animation expands the newly populated elements. These explicit transitions point out to the user that the elements have been updated.

Figure 4: A step-by-step illustration of a user running through the system. 1) The user searches for "dystopia". 2) Active facets are updated. The user clicks on "cyberpunk". 3) Active facets are updated again. 4) Results are shown.

Figure 4 illustrates an interaction sequence when using the system. In step 1, the user decides that they want to watch a movie involving "dystopia". The user uses the typed query element to search for this facet. Query auto-completion helps the user select the intended facet. In step 2, the active facet element suggests a list of related facets. The user decides to refine their search by selecting an element from the list; in this case, they add "cyberpunk". In step 3, the active facet element is updated and new facets are recommended based on the previously selected facets. Search results are also updated in response to interactions with the typed query or active facet elements. Steps 2 and 3 can be repeated until the user is satisfied with the search results. In step 4, the user selects a movie that fits their criteria.


4 User Experiment

A user experiment was conducted to compare the performance of various system configurations under real-world conditions. We wish to see what effect, if any, the addition of an active faceting component has on traditional search configurations.

4.1 Participants

36 participants were selected for our study. Participants were mostly university students. The median age of participants was 24. There were 16 males and 20 females. We asked users to select their highest level of education: 3 participants selected primary school, 13 secondary school, 15 a bachelor's degree, 4 a master's degree, and 1 a PhD.

To attract participants, we advertised compensation for participation: each participant was compensated with a movie ticket valued at about 10 euros.

4.2 Procedure

The evaluation was conducted as an online experiment; participants accessed the system over the Internet. Participants were first directed to a page explaining the components of the system. They were then led to a page where they could submit their personal information: name, email address, age, sex, and level of education. Next, a pop-up modal dialog box gave instructions for the first task. Participants were given unlimited time to read the instructions, after which they clicked a button to dismiss the dialog. Upon dismissal, the system was revealed and a countdown timer began. Participants were given 10 minutes to complete each task. The system had a total of six tasks and six configurations, each configuration consisting of a different combination of components.

The different configurations were:

• Typed Query

• Typed Query, Active Facet

• Typed Query, Manual Sidebar

• Typed Query, Active Facet, Manual Sidebar

• Typed Query, LDA Sidebar

• Typed Query, Active Facet, LDA Sidebar

The configuration of the system was counterbalanced across tasks so that every participant completed a task with each system configuration, in varying order.


The user was asked to select five movies that fit certain criteria unique to each task. When the user completed a task by selecting the fifth movie, or when the countdown timer ran out, the user was presented with another pop-up modal containing a form asking them to rate their satisfaction with each movie selection on a scale of one to five. After completing the form and dismissing the dialog, the experiment incremented to the next task and the pop-up instructions dialog for the next task was presented to the user. This process repeated for all tasks. Upon completion of all tasks, the participant was redirected to a survey where they could give feedback on the system.

4.3 Data and Search Tasks

The system was indexed with the MovieLens 20M dataset, which includes 465,564 user-submitted tags for 27,278 movies collected between 1995 and 2015 [25]. Low-frequency tags were removed, resulting in over 3000 possible facets.

A feature of MovieLens allowed users to assign arbitrary tags to movies. A database of movie and tag pairs was imported. All tags were converted to lower case so as not to count tags with different capitalization as unique tags. Tags which occurred fewer than 20 times were removed, as they were unlikely to provide value and greatly increased the size of the matrices in our calculations. TF-IDF scores were calculated for every combination of movie and tag. We utilized augmented frequency in order to avoid a bias toward movies with larger numbers of tags. We determined related tags by measuring the cosine similarity of tag vectors.
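The preprocessing described above might be sketched as follows. The column layout of the MovieLens tags file and the dict-based tag-vector representation for cosine similarity are our assumptions, not details taken from the thesis.

```python
import csv
import math
from collections import Counter

MIN_TAG_COUNT = 20  # threshold used in the preprocessing described above

def load_tag_pairs(path):
    """Load (movieId, tag) pairs from a MovieLens-style tags CSV, lower-case
    all tags, and drop tags occurring fewer than MIN_TAG_COUNT times."""
    pairs = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            pairs.append((row["movieId"], row["tag"].lower()))
    counts = Counter(tag for _, tag in pairs)
    return [(m, t) for m, t in pairs if counts[t] >= MIN_TAG_COUNT]

def cosine_similarity(u, v):
    """Cosine similarity of two tag vectors, stored as dicts of doc -> weight."""
    dot = sum(u[d] * v[d] for d in u if d in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```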

Search tasks consisted of finding five movies that fulfill a set of qualifications as specified by the task description. The qualifications were selected in a way that they would not have a direct match to tags and would require the user to engage with the system. The search tasks were:

• Select 5 movies that you would want to watch to keep you awake on a long flight

• Select 5 movies that contain romance and sci-fi, but avoid gun violence

• Select 5 movies that are funny and insightful

• Select 5 movies that you would watch with a grandparent or older relative

• Select 5 movies that you would watch if you wanted to avoid Hollywood movies

• Select 5 movies that are scary and futuristic

The participants were free to use any available system functions as they wished. The time for completing each task was restricted to 10 minutes. All participants completed all six tasks.


4.3.1 Traditional Facet Components

We would like to compare the effectiveness of the active faceting component to other facet components that are commonly utilized in search solutions, and to see how well these different components work together. Specifically, we observe two sidebar facet components: one using manually curated facets, and the other using facets automatically generated with Latent Dirichlet Allocation (LDA).

4.3.2 LDA Calculated Facet Sidebar

Latent Dirichlet Allocation is a generative model which assumes that a collection of documents is generated from a mixture of a set of latent topics, each in turn associated with a different distribution of facets [27]. We create a sidebar by fitting an LDA model to our dataset to estimate 10 topics. We then list the top 10 facets associated with each topic. Some facets were repeated across topics; the total number of unique facets is 53. The resulting topics can be seen in Table 1.

Topic 1: dystopia, space, surreal, time travel, action, thought-provoking, aliens, atmospheric, artificial intelligence, alternate reality
Topic 2: twist ending, atmospheric, fantasy, psychology, social commentary, dystopia, classic, dark comedy, surreal, magic
Topic 3: twist ending, surreal, sci-fi, psychology, stylized, dark, dystopia, violence, disturbing, fantasy
Topic 4: funny, quirky, dark comedy, romance, cult film, hilarious, surreal, black comedy, classic, quentin tarantino
Topic 5: sci-fi, quentin tarantino, atmospheric, surreal, superhero, alternate reality, violence, adventure, dystopia, stylized
Topic 6: atmospheric, comedy, sci-fi, dystopia, stylized, surreal, dark comedy, action, social commentary, violence
Topic 7: thought-provoking, sci-fi, psychology, atmospheric, twist ending, alternate reality, dreamlike, philosophy, quirky, visually appealing
Topic 8: psychology, atmospheric, brad pitt, psychological, nonlinear, surreal, great ending, edward norton, complicated, dark comedy
Topic 9: criterion, classic, atmospheric, based on a book, oscar (best picture), black and white, imdb top 250, clv, erlend's dvds, national film registry
Topic 10: comedy, quirky, dark comedy, pixar, animation, quentin tarantino, satire, action, romance, stylized

Table 1: Facets generated by Latent Dirichlet Allocation
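For illustration, the sidebar construction could be reproduced roughly as below using scikit-learn's LatentDirichletAllocation. The thesis does not specify which LDA implementation was used, and the count-matrix layout is our assumption.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def lda_sidebar_facets(counts, facet_names, n_topics=10, top_k=10, seed=0):
    """Fit LDA on a (n_docs, n_facets) tag-count matrix and return the
    top facets per topic, as in Table 1 above."""
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    lda.fit(counts)
    topics = []
    for weights in lda.components_:       # per-topic distribution over facets
        top = np.argsort(-weights)[:top_k]
        topics.append([facet_names[i] for i in top])
    return topics
```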


4.3.3 Manually Curated Facet Sidebar

Manually curated facets are simply facets that were hand-picked by an expert. Since we use a dataset of movies in our experiment, we have chosen to copy a movie facet structure from a top online retailer, giving us a comparison to a facet structure that is actually used in a real-world setting. Table 2 shows sidebar movie facets taken from Amazon's online shop, with facet names modified to match facets in our dataset. The total number of unique facets is 25.

Group 1: action, adventure, comedy, documentary, drama, educational, foreign, horror, family, thriller, romance
Group 2: bleak, exciting, feel good, funny, scary, suspenseful, touching
Group 3: friendship, love, music, obsession, relationships, revenge, romance

Table 2: Manually curated facets taken from Amazon [28]

4.4 Measures

We use a mixture of measures in order to determine the fitness of our system under various configurations. In order to ensure a proper balance between search performance and user experience, several metrics are taken.

4.4.1 Task Completion Time

We measure task completion time, which is the duration from when the user dismisses the modal to begin each task to the point when the user selects the final movie, completing that task. We do not count the time taken to read the task instructions or the time needed to complete the survey.

4.4.2 Mean Reciprocal Rank

We measure the performance of interface component configurations by measuring the mean reciprocal rank (MRR) of selected results throughout the experiment [29].

The MRR is defined as:

\[ \mathrm{MRR} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\mathrm{rank}_i} \]

The mean reciprocal rank is a useful measure of search engine performance when the target results are known. In our case, the target documents are the documents selected by the participant in response to the task descriptions. The more effective the search system is, the higher it should rank the target documents, and the greater the mean reciprocal rank should be.
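As a concrete illustration (not the thesis implementation), MRR can be computed directly from the 1-based ranks at which each target document appeared:

```python
def mean_reciprocal_rank(ranks):
    """MRR over a list of 1-based ranks of the target documents,
    one rank per query/task."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Targets ranked 1st, 2nd, and 4th in their respective result lists:
print(mean_reciprocal_rank([1, 2, 4]))  # → 0.5833...
```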


4.4.3 Rank Increase

We wish to compare how selections of different components increase the rank of target selections. We compare the mean rank increase between the typed query component, active facet component, and sidebar facet component in configurations that contain all three components. A higher rank increase signifies that the component is better at finding the target documents.

4.4.4 User Satisfaction

We give users a survey after completing all tasks, asking them a series of questions about the different search setups provided. We ask them to rate each component on a scale from 1-5 on how easily it let them express their preferences.

We also ask 9 multiple-choice questions about which component was best at a certain aspect. The full list of questions can be seen in Table 6 in the next section.


5 Simulation of facet interaction

We found it helpful to simulate how users would interact with our system in order to tune parameters prior to running any study with real users. For this simulation, we use the MovieLens 20M dataset. The documents we wish to find are movies, and the facets are user-submitted tags with TFIDF scores calculated as specified in previous sections.
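The exact weighting is defined in the earlier sections; as a rough sketch, tag TFIDF scores can be derived from raw tag assignments using a standard formulation (the `tag_lists` structure and `tag_tfidf` helper here are assumptions for illustration, not the thesis code):

```python
import math
from collections import Counter

def tag_tfidf(tag_lists):
    """Illustrative TFIDF weighting for user-submitted tags.
    `tag_lists` maps movie id -> list of tags applied by users.
    Returns a dict mapping (movie, tag) -> score."""
    n_docs = len(tag_lists)
    # Document frequency: in how many movies' tag lists each tag appears.
    df = Counter(t for tags in tag_lists.values() for t in set(tags))
    scores = {}
    for movie, tags in tag_lists.items():
        counts = Counter(tags)
        total = len(tags)
        for t, c in counts.items():
            # Term frequency within the movie, scaled by inverse document frequency.
            scores[(movie, t)] = (c / total) * math.log(n_docs / df[t])
    return scores
```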

5.1 c value simulation

A simulation was set up to measure the effects of different weights between exploration and exploitation. We can achieve this by adjusting the c value of the active facet component. A higher value of c leads to a higher amount of exploration, and a lower value of c leads to a higher amount of exploitation; a c value of 0 means no exploration at all. For this simulation, we took a random sample of 10,000 documents. We iterate through the list, applying the following algorithm to each document:

1. The active facet component recommends a facet

2. Select that facet if it is relevant to the document. Otherwise, skip that facet.

3. Continue steps 1 and 2 until either:

• The selected document is in the top 10 search results

• The document has no more relevant facets

We determine relevance by whether the TFIDF score for the facet-document combination surpasses a certain threshold; in this simulation the relevance threshold was 0.5. One measure that we wanted to find was how many facets needed to be selected before the document was found. Obviously, fewer facet selections mean fewer interactions a user would have to make. We can see in Figure 5 that adding a small amount of exploration led to a reduction in the number of selected facets. Larger values performed slightly worse, but still outperformed a c value of 0.
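To make the role of c concrete, here is a deliberately simplified, non-contextual UCB1-style sketch; the facet names and the `stats`/`recommend`/`update` helpers are illustrative assumptions, not the thesis implementation (which uses a contextual bandit):

```python
import math

# Hypothetical per-facet statistics: facet -> [sum of rewards, times shown].
stats = {f: [0.0, 0] for f in ["dark comedy", "sci-fi", "romance", "pixar"]}
rounds = 0  # total number of recommendations made so far

def ucb_score(facet, c):
    """Estimated relevance plus a c-weighted exploration bonus.
    c = 0 reduces to pure exploitation; larger c explores more."""
    reward_sum, shown = stats[facet]
    if shown == 0:
        return float("inf")  # try every facet at least once
    return reward_sum / shown + c * math.sqrt(math.log(rounds) / shown)

def recommend(c):
    """Recommend the facet with the highest UCB score."""
    return max(stats, key=lambda f: ucb_score(f, c))

def update(facet, reward):
    """Record the user's response (1 = selected, 0 = skipped)."""
    global rounds
    rounds += 1
    stats[facet][0] += reward
    stats[facet][1] += 1
```

With c = 0 the recommender keeps returning the facet with the best observed mean; with a large c it cycles through less-shown facets, which mirrors the exploration/exploitation trade-off measured in this simulation.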

Another element to measure is how many facets were skipped throughout the experiment. Having to skip facets also adds to the number of interactions a user must make, which could be detrimental to the overall experience. We can see in Figure 6 that a small c value of 0.5 performed the best, but as the c value increased, the system started to perform worse, with values above 1 being worse than no exploration at all. This demonstrates that a bandit approach to facet recommendation is more effective than plain regression, provided the amount of exploration is kept limited.


Figure 5: Mean number of facets that needed to be selected before the document is found. We can see that adding a slight amount of exploration reduces the number of facets required.

Figure 6: Mean number of facets that were skipped per document during the sim- ulation. A small amount of exploration was beneficial, but a high amount led to worse results than no exploration at all.


5.2 Retrievability Simulation

We performed retrievability simulations to determine how much of the document space is reachable with different configurations. We would like to see how the different sidebar configurations compare and how well the active facet assistant augments them. The algorithm for this simulation is as follows:

1. A list of sidebar facets is generated

2. The active facet component (if applicable) recommends a facet

3. Select the most relevant facet from either the sidebar or active facet component. If none are relevant, skip to the next facet in the active facet component.

4. Continue steps 2 and 3 until either:

• The selected document is in the top 10 search results

• The document has no more relevant facets
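Step 3's selection policy can be sketched as follows; the function and argument names are hypothetical, and relevance uses the same TFIDF threshold as before:

```python
def pick_facet(doc_tfidf, sidebar, active_suggestion, selected, threshold=0.5):
    """Choose the most relevant not-yet-selected facet offered by either
    component; return None if no candidate passes the relevance threshold
    (the active suggestion is then skipped).
    `doc_tfidf` maps facet -> TFIDF score for the target document."""
    candidates = [f for f in sidebar + [active_suggestion] if f not in selected]
    relevant = [f for f in candidates if doc_tfidf.get(f, 0.0) > threshold]
    if not relevant:
        return None
    return max(relevant, key=lambda f: doc_tfidf.get(f, 0.0))
```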

Figure 7: Percent of documents that were retrieved by sidebar configuration. We can see that the addition of the active facet assistant increased the retrievability of documents.

As we can see from the results in Figure 7, the manually curated sidebar performed much worse than the LDA-generated sidebar. This is most likely because the manually curated sidebar had a much smaller number of facets to choose from: only 25 facets, compared to 53 unique facets in the LDA sidebar. Interestingly enough, when the active facet assistant was introduced, the combination of active facet and manually curated sidebar slightly outperformed the combination of active facet and LDA-generated sidebar. This may be due to the simulation overfitting when selecting from the sidebar.

We also wanted to measure the position of the document in the search results as more facets are selected. Figure 8 shows the results. Again, the manually curated sidebar does not perform well on its own, but works very nicely when combined with the active facet assistant. In all configurations other than manual only, the majority of the position improvement occurs within the first few iterations; after about 6 iterations, no further improvement occurs.

Figure 8: Position of the document in the search results. A lower position is more likely to be seen by the user. As more facets are selected, the document position improves under all configurations except manual curation.

The active facet component shows a finite number of recommended facets at a time (in this simulation, just one) and allows the user to skip to the next set of facets if none are relevant. For configurations that include the active facet component, we would like to see how many tags are skipped. Figure 9 shows the mean number of tags skipped at each iteration by configuration. We can see that a much higher mean number of tags is skipped when combined with the manually curated facet sidebar. This reflects the fact that the greater number of facets in the LDA sidebar allows the user to add more context to start with.

As the number of iterations of the simulation increases, the number of skipped tags in both configurations trends upwards. This is to be expected: each document has a finite set of relevant facets, so as more facets are found, the number of undiscovered relevant facets decreases. That decreases the probability that the next suggested facet will match an undiscovered relevant facet.


Figure 9: The number of tags skipped in the active faceting component when paired with different sidebar components. More tags are skipped when paired with the sidebar containing manually curated facets.


6 Results

6.1 Task Completion Time

One of the most straightforward measures of a search engine’s success is the time required to find satisfactory results. Although it may not be the most important measure, users generally become frustrated when a task takes too much time.

Task completion times did not differ significantly between configurations. Table 3 and Figure 10 show the time needed to complete each task by system configuration. We can see that adding the active facet component seems to have slightly increased the time required to complete each task over the search-only baseline.

Configuration     Mean    Std     Min    25%     50%     75%     Max
Active Only       201.27  176.17  25.65  74.16   133.41  289.54  714.64
Search Only       181.74  132.05  17.53  83.01   136.57  257.31  586.79
LDA + Active      160.37  120.58  26.54  85.14   119.87  191.81  586.09
Manual            176.53  121.20  15.95  89.21   161.37  226.55  600.48
Manual + Active   178.79  107.63  31.25  102.18  134.76  247.97  454.66
LDA               161.17  106.84  43.62  76.33   107.00  242.18  421.92

Table 3: Time needed to complete each task based on system configuration

Figure 10: Time needed to complete each task

In order to determine whether the distributions differ significantly, we perform the Wilcoxon rank-sum test for each pair of system configurations. As we can see in Table 4, the distributions of completion times do not differ significantly between system configurations. The largest difference was between 'Manual + Active Faceting' and 'LDA + Active Faceting', with a p-value of 0.102.

                  Active  Manual  Manual + Active  LDA    LDA + Active
Search Only       0.453   0.469   0.414            0.270  0.271
Active                    0.451   0.355            0.402  0.371
Manual                            0.417            0.340  0.205
Manual + Active                                    0.144  0.102
LDA                                                       0.473

Table 4: P-value results of Wilcoxon-Mann-Whitney test comparing task completion time among various configurations

We can see that changing the search configuration has no significant effect on the task completion time. While these results may be disappointing, they show that an active facet element, sidebar, or any combination can be added to a search system without negatively affecting its performance.
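For reference, the rank-sum test used above can be sketched with its normal approximation; in practice one would call a statistics library, and this illustrative version omits tie handling:

```python
import math

def rank_sum_p(xs, ys):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) p-value via the
    normal approximation, without tie or continuity correction --
    an illustrative stand-in for a library routine."""
    combined = sorted(xs + ys)
    # 1-based ranks; ties would need midranks, omitted here for brevity.
    ranks = {v: i + 1 for i, v in enumerate(combined)}
    r1 = sum(ranks[v] for v in xs)
    n1, n2 = len(xs), len(ys)
    u = r1 - n1 * (n1 + 1) / 2          # Mann-Whitney U for sample xs
    mu = n1 * n2 / 2                    # mean of U under the null
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```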

6.2 Temporal performance analysis

In order to gauge how user behavior changes over time while using the system, we analyzed user interactions with all components over time under various system configurations. Figure 11 demonstrates these results.

In every system configuration users begin by interacting with the typed query component. This is expected, as it is the most familiar component to users. It is also the only component to appear in all configurations and the only one that can be used to quickly select any facet. It should also be noted that the typed query component and sidebar component both appear upon starting a task, whereas the active facet assistant requires the user to select an initial facet through some other component before it can perform its calculations. We can also observe that in almost all configurations, the user abandons the typed query component early on in favor of the other two components.

When given configurations which include all three components, users began with the typed query component, then switched to the other two components, not giving specific precedence to either one.

It is interesting to note the configuration that includes only the active facet and typed query components. In this configuration users interacted with the active facet component for much longer than when the sidebar element was also present. Users also spent more time on each task in this configuration than in any other, even more than in the search-only configuration.


Figure 11: Elements selected over time for each system configuration

6.3 Rank Analysis

We would like to know which element leads to the greatest rank increase per interaction. To do this, we look at configurations that contained all three elements. We also note the documents that the user ultimately selected.

For every user interaction, we measure how the rank of selected documents changed.

Figure 12 shows the mean rank increase of each element. The results show that the active facet element contributed the highest increase in rank by a large margin, while the sidebar component was roughly equal to the search component.

In order to determine which configuration resulted in the best overall refinement of results, we analyze the rank of each movie that the user selected throughout the task. We calculate the Mean Reciprocal Rank (MRR) by taking the rank of the movie after each interaction, averaging those ranks, then taking the reciprocal of that average [29].
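The per-task score described here (the reciprocal of the averaged rank, a variant of the MRR defined in Section 4.4.2) can be sketched as:

```python
def reciprocal_of_mean_rank(ranks):
    """Rank of the selected movie after each interaction, averaged
    over the task, then inverted."""
    return len(ranks) / sum(ranks)

# Ranks 4, 2, 2 after three interactions -> mean rank 8/3 -> score 0.375
print(reciprocal_of_mean_rank([4, 2, 2]))  # → 0.375
```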


Figure 12: Rank increase by component

We calculate an MRR for each system configuration. We also performed a Wilcoxon-Mann-Whitney test to determine whether the differences in results were significant. As we can see in Table 5, the MRR results for most configurations differed significantly.

The exception was between the Manual + Active configuration and LDA + Active configuration. This implies that the difference between sidebar structures does not matter much when the active facet component is introduced.

                  Active Faceting  Manual  Manual + Active  LDA     LDA + Active
Search Only       <0.001           <0.001  <0.001           <0.001  <0.001
Active Faceting                    0.011   <0.001           0.026   0.0001
Manual                                     0.030            <0.001  0.052
Manual + Active                                             <0.001  0.430
LDA                                                                 <0.001

Table 5: P-value results of Wilcoxon-Mann-Whitney test comparing mean reciprocal rank among various configurations

Figure 13 shows the MRR of all configurations. The configuration that includes only the search box component performs the worst. Interestingly enough, the MRR of configurations with either sidebar component decreased with the addition of the active facet component. This is counter to what one may expect, since the active facet component leads to a greater rank increase per iteration. This discrepancy could be explained by the greater number of suggested facets overloading the user and causing them to change course throughout the search task.


Figure 13: Mean Reciprocal Rank (MRR) by system configuration

6.4 User Survey

Perhaps one of the most important measures of the system, and one of the hardest to measure objectively, is user satisfaction. We surveyed users in order to gauge their opinion of search results.

Figure 14: User rated satisfaction of results by system configuration


At the end of each task we’ve asked users to rate how satisfied they were with their findings on a scale of 1-5. We then grouped these ratings by system configuration.

Results are shown in Figure 14. We can see that the system configuration leading to the lowest user satisfaction is the one which only includes the typed query element.

Adding either a sidebar element or active facet element both increased median user satisfaction and reduced variance.

In addition to measuring user satisfaction with the search results, it is important to measure user satisfaction with the search system itself. Users were directed to a final questionnaire after completing all tasks. We asked users to rate on a scale from 1-5 how easy each component made it to express their preferences. The results can be seen in Figure 15. They show that users found the active facet and sidebar components equally effective in expressing their preferences; both were more effective than the typed query component.

Figure 15: User submitted response of which component made it easy to express preferences

We also asked several questions asking the user to choose which component was the most effective in a certain aspect. The full list of questions can be seen in Table 6.

The results of the survey can be seen in Figure 16. The results show that the typed query component was by far the most picked when asked which component was the quickest to learn and get familiar with. In contrast, the active facet component was the least picked for this category. This should not be surprising as this is the first time users are experiencing this component. The typed query component was also the best at giving the user a sense of control in telling the system what they want.

However, the typed query component was the least chosen component in regards to recommending search criteria that matches the user's interests. In addition, the typed query component was not as effective in helping the user understand why the search results were recommended to them; the active facet component was most effective in this regard. For most questions in the survey, the active facet component and sidebar component were chosen roughly equally, with the sidebar facet typically being chosen slightly more.

Survey questions

Which component helped you the most to complete the task?

Which component recommended me search criteria that best matched my interests?

Which component helped you to best discover new search criteria?

Which component allowed you to best express what you like/dislike?

Which component best explained why the results were recommended to me?

Which component best provided me information that was sufficient for me to make a decision?

Which component was the quickest to learn and to get familiar with?

With which component you were most convinced that you found the correct results?

With which component you felt most in control of telling the system what you want?

Table 6: Survey questions

It is apparent that users are divided in their preferences for each component. There is no single component that is preferred in all contexts, and there is no component that is universally neglected. We can determine that each component is better suited to a different problem.


Figure 16: Results of the user survey when asked which component helped the most at a certain aspect. Full questions are listed in table 6


7 Discussion

The results of the retrievability simulations show that suggesting facets using a contextual multi-armed bandit strategy was beneficial to achieving good search results. We have seen that adding a small amount of exploration was successful in reducing the number of facet selections needed to retrieve each document. A large amount of exploration was less effective, yet still more effective than none at all. Adding a small amount of exploration also reduced the number of facets that needed to be skipped. However, by this metric, a large amount of exploration actually performs worse than no exploration at all. From this we can conclude that favoring exploration too highly leads to seeing facets that are too irrelevant, while having no exploration at all leads to seeing facets that are too similar to previously selected facets.

We could see in the simulations of the sidebar facet interactions that a manually curated facet sidebar with no active facet assistant greatly underperforms in document retrieval compared to other configurations. This can mostly be explained by the fact that the manually curated list of facets is much smaller than the list of facets we could automatically generate with Latent Dirichlet Allocation. Further research will need to be performed on sidebar performance based on the number of facets provided. However, when the active facet component is added, the configuration with manually curated facets competes equally with the configuration with LDA-generated facets, and both configurations outperform the LDA sidebar alone.

One reason why online retailers and other organizations would prefer a small sidebar is to save space and to avoid overwhelming users with too many choices, as users have reported choice overload in faceted search systems in the past [22].

The tradeoff is that offering a limited selection makes it more difficult for the user to find relevant facets, and thus relevant search results. Adding the active facet component allows the system to maintain a small number of facets on screen at any given time, while the dynamic updating of facets allows for a much larger selection, and thus a greater performance increase. This makes the addition of an active facet component an attractive option for modern online search engines that wish to limit the number of facets in their user interface. If screen space is not of great concern, then the large set of LDA-generated facets achieves similar results. In terms of development effort, it may be faster and easier to use a larger sidebar than to implement the active facet component, as the latter is a more complex solution.

From the user experiments, we have seen that the addition of the active facet component does not compromise any measures of the search task. There were no significant differences in our measurements of task completion time or user satisfaction when comparing different system configurations. There was some improvement in the MRR of the active facet and sidebar configurations over the search-only configuration. We did notice that whenever a component is in place, it is actively used by participants, which conforms to previous research [30, 31]. Users were also divided on which component they preferred, leading to the conclusion


that none of the components were universally liked or disliked. Preferences for each component in the user survey varied based on how they helped in certain aspects of the task. This implies that different components fill different niches, and that there is no universal configuration that would be optimal in all contexts. If the optimal metric for a search engine is speed, then the best configuration would most likely contain only a typed query element. But if the goal of a search engine is to recommend new elements that match the user's interests, then the addition of a sidebar or active facet element would be largely beneficial.

The typed query component is still the most important component in the system.

Users will almost always use it before any other component, but will later switch to other components to refine their search. Users find it the quickest component to learn and get familiar with, likely due to it being ubiquitous throughout the web. There is no doubt that it holds an extreme advantage in allowing the user to quickly select any facet in the dataset, provided they know its name ahead of time. The autocomplete feature implemented in our system may have also given the search component a slight edge in facet exploration over traditional search components without such a feature, since it doesn't require the user to have perfect recall of facet names.

Users were more divided on whether the sidebar or active facet component was the most helpful. There was little difference in behavior when looking at the temporal analysis, with users interacting with the sidebar and active facet component in similar patterns. Survey results also showed users split on whether they found the sidebar or active facet component a better fit for each of the 9 aspects. The main exception was when asked which component was the quickest to learn and get familiar with, in which very few users selected the active facet component. This is completely expected, as this is a newly introduced component and users are almost guaranteed to have come in contact with a sidebar in the past.

The active facet component was by far the most effective in increasing the rank of results per interaction. Comparatively, the typed query component and the sidebar component were about equal to each other. This has a few interesting implications.

This would imply that the active facet component offers facets that are more descriptive of the user's intent. This makes sense when comparing to the sidebar component, since the active facet component offers a much more personalized selection of facets. Facets selected from the typed query component are mostly limited to terms the user can think of (in addition to typing suggestions), which would likely be limited to general terms. This makes the active facet component a particularly appealing choice in contexts where one would want to minimize the number of interactions.

Those who wish to create future search systems may wish to implement both sidebar components and active facet components to augment the typed query component that is already standard in most search solutions. We have seen from the results that their inclusion does not negatively affect the performance of the system, and can result in improved performance in some situations. It may come down to user preference whether they would prefer to be shown one component over the other, or perhaps both at the same time.

It may also be valuable to opt for the active facet component in situations where a sidebar is not feasible due to other constraints. For example, the active facet component takes up less space than a sidebar, allowing it to be worked more easily into an interface where space is tightly constrained. This would be ideal for an interface designed for mobile phones. The limited number of choices offered per iteration also makes it ideal for an interface that uses a number pad, such as a TV remote control: a facet could be selected by pressing its associated number.

This would certainly be less cumbersome than trying to navigate through a sidebar with such an input device. It could also be particularly appealing as a solution for various Internet of Things devices that are increasing in popularity, but have limited interface options.


8 Conclusion

We have introduced Active Faceted Search, a tool to assist users in the facet discovery and selection process. We use an online contextual bandit approach to dynamically suggest facets and refine suggestions based on the user's selections. This gives users great control over the search process, while not overwhelming them with too many choices at once.

We have created a proof of concept web application to demonstrate active faceted search as well as compare it to other existing techniques. Our system uses a large dataset consisting of tens of thousands of movies and over 3000 unique tags to act as facets. The system is able to perform calculations for active facet suggestions in real time with minimal latency to the user. We have built the system to be modular, allowing it to be run with different combinations of components.

We have also analyzed the performance of this system in a user study. We have shown that the addition of an active facet component does not compromise the quality of results, while potentially adding subjective advantages for users. We have shown that users find the addition of an active facet component about as helpful as the addition of a sidebar component in completing a search task. We have seen that when an active facet element is provided, users tend to interact with it.

It is uncertain how much of that interaction is due to the novelty of the element, or genuine user preference. Further studies with larger user groups would be helpful.

We have offered a new tool in the toolbox of elements that can be used to assist in exploratory search. Future search engines may choose to implement an active facet component to augment search, possibly with the addition of a sidebar element as well. An active facet component enjoys similar user preference to a sidebar component, yet may be easier to fit into a space-constrained setup, such as a mobile display or an IoT device.


References

1 G. Marchionini, “Exploratory search: from finding to understanding,” Communications of the ACM, vol. 49, no. 4, pp. 41–46, 2006.

2 G. W. Furnas, T. K. Landauer, L. M. Gomez, and S. T. Dumais, “The vocabulary problem in human-system communication,” Communications of the ACM, vol. 30, no. 11, pp. 964–971, 1987.

3 B. Zheng, W. Zhang, and X. F. B. Feng, “A survey of faceted search,” Journal of Web engineering, vol. 12, no. 1&2, pp. 041–064, 2013.

4 B. Kules and B. Shneiderman, “Users can change their web search tactics: Design guidelines for categorized overviews,” Information Processing & Management, vol. 44, no. 2, pp. 463–484, 2008.

5 K.-P. Yee, K. Swearingen, K. Li, and M. Hearst, “Faceted metadata for image search and browsing,” in Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 401–408, ACM, 2003.

6 M. Arenas, B. C. Grau, E. Kharlamov, Š. Marciuška, and D. Zheleznyakov, “Faceted search over rdf-based knowledge graphs,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 37, pp. 55–74, 2016.

7 C. Hostetter, “Faceted searching with apache solr,” ApacheCon US, vol. 2006, 2006.

8 L. Li, W. Chu, J. Langford, and R. E. Schapire, “A contextual-bandit approach to personalized news article recommendation,” in Proceedings of the 19th international conference on World wide web, pp. 661–670, ACM, 2010.

9 S. R. Ranganathan, Colon classification. Madras Library Association, Madras, 1939.

10 M. Dewey,Abridged decimal classification and relative index for libraries. Forest Press, 1915.

11 M. Käki, “Findex: search result categories help users when document ranking fails,” in Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 131–140, 2005.

12 G. A. Miller, “Wordnet: a lexical database for english,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995.

13 D. Tunkelang, “Faceted search,” Synthesis lectures on information concepts, retrieval, and services, vol. 1, no. 1, pp. 1–80, 2009.

14 A. Stolz and M. Hepp, “Adaptive faceted search for product comparison on the web of data,” in International Conference on Web Engineering, pp. 420–429, Springer, 2015.


15 F. Abel, I. Celik, G.-J. Houben, and P. Siehndel, “Leveraging the semantics of tweets for adaptive faceted search on twitter,” in International Semantic Web Conference, pp. 1–17, Springer, 2011.

16 J. A. Bivens, Y. Deng, K. El Maghraoui, R. Mahindru, H. V. Ramasamy, S. Sarkar, and L. Wang, “Dynamic faceted search,” Aug. 16 2018. US Patent App. 15/898,777.

17 D. Vandic, S. Aanen, F. Frasincar, and U. Kaymak, “Dynamic facet ordering for faceted product search engines,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 5, pp. 1004–1016, 2017.

18 C. K. Kreutz, P. Boesten, A. Witry, and R. Schenkel, “Facetsearch: A faceted information search and exploration prototype,”

19 B. Lee, M. Czerwinski, G. Robertson, and B. B. Bederson, “Understanding research trends in conferences using paperlens,” in CHI’05 extended abstracts on Human factors in computing systems, pp. 1969–1972, ACM, 2005.

20 V. Klungre and M. Giese, “Evaluating a faceted search index for graph data,” in OTM Confederated International Conferences "On the Move to Meaningful Internet Systems", pp. 573–583, Springer, 2018.

21 A. Mukhopadhyay, H. Kim, and K. Anyanwu, “Scalable exploratory search on knowledge graphs using apache spark,” in 2018 IEEE 27th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), pp. 154–159, IEEE, 2018.

22 X. Niu, X. Fan, and T. Zhang, “Understanding faceted search from data science and human factor perspectives,” ACM Transactions on Information Systems (TOIS), vol. 37, no. 2, pp. 1–27, 2019.

23 G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information processing & management, vol. 24, no. 5, pp. 513–523, 1988.

24 P. Auer, “Using upper confidence bounds for online learning,” in Proceedings 41st Annual Symposium on Foundations of Computer Science, pp. 270–279, IEEE, 2000.

25 F. M. Harper and J. A. Konstan, “The movielens datasets: History and context,” ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 5, no. 4, pp. 1–19, 2015.

26 P. J. Durlach, “Change blindness and its implications for complex monitoring and control systems design and operator training,” Human–Computer Interaction, vol. 19, no. 4, pp. 423–451, 2004.


27 D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” Journal of machine Learning research, vol. 3, no. Jan, pp. 993–1022, 2003.

28 Amazon, “Amazon.com: Online shopping for electronics, apparel, computers, books, dvds and more,” Jan. 2018.

29 N. Craswell, “Mean reciprocal rank.,” Encyclopedia of database systems, vol. 1703, 2009.

30 T. Bogaard, L. Hollink, J. Wielemaker, J. van Ossenbruggen, and L. Hardman,

“Metadata categorization for identifying search patterns in a digital library,”

Journal of Documentation, 2019.

31 B. Kules, R. Capra, M. Banta, and T. Sierra, “What do exploratory searchers look at in a faceted search interface?,” in Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, pp. 313–322, 2009.

32 J. Koren, Y. Zhang, and X. Liu, “Personalized interactive faceted search,” in Proceedings of the 17th international conference on World Wide Web, pp. 477–

486, ACM, 2008.

33 M. Tvarožek et al., “Exploratory search in the adaptive social semantic web,”

Information Sciences and Technologies Bulletin of the ACM Slovakia, vol. 3, no. 1, pp. 42–51, 2011.

34 M. A. Hearst and J. O. Pedersen, “Reexamining the cluster hypothesis: scat- ter/gather on retrieval results,” in Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 76–84, ACM, 1996.

(40)

Figure 17: Typed query and active faceting only

Figure 18: Typed query and sidebar only


Figure 20: Instructions page that the user sees upon opening the system


Figure 23: Upon completing a task, users are asked to rate their selections
