Haystack US 2022

Talks from the Search & Relevance Community at the Haystack Conference!

The conference sessions were held at the Violet Crown movie theater in central Charlottesville.

This was our Event Safety and Code of Conduct.

Day 1, Wednesday, April 27th

Time Track 1 Track 2
8:00-9:00am EDT Registration

Location: Entrance of the Violet Crown

9:00-9:45am EDT Opening Keynote

Charlie Hull
Location: Theater 5

10:00-10:45am EDT Learning a Joint Embedding Representation for Image Search using Self Supervised Means

Image search interfaces either prompt the searcher to provide a search image (image-to-image search) or a text description of the image (text-to-image search). Image to Image search is generally implemented as a nearest neighbor search in a dense image embedding space, where the embedding is derived from Neural Networks pre-trained on a large image corpus such as ImageNet. Text to image search can be implemented via traditional (TF/IDF or BM25 based) text search against image captions or image tags.

In this presentation, we describe how we fine-tuned the OpenAI CLIP model (available from Hugging Face) to learn a joint image/text embedding representation from naturally occurring image-caption pairs in literature, using contrastive learning. We then show this model in action against a dataset of medical image-caption pairs, using the Vespa search engine to support text based (BM25), vector based (ANN) and hybrid text-to-image and image-to-image search.
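As a minimal illustration of the nearest neighbor search in an embedding space mentioned above, the sketch below ranks a few toy vectors by cosine similarity to a query vector. It is an exact, brute-force search written for clarity, not the approximate (ANN) search or the Vespa setup used in the talk. The document ids and three-dimensional vectors are made up for the example; real CLIP embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, index, k=2):
    """Return the ids of the k vectors most similar to the query (exact search)."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy index: image id -> embedding vector.
index = {
    "xray_chest": [0.9, 0.1, 0.0],
    "xray_hand": [0.7, 0.3, 0.1],
    "mri_brain": [0.0, 0.2, 0.9],
}

# The query vector could come from an image encoder or a text encoder;
# in a joint embedding space both are searched the same way.
query = [1.0, 0.0, 0.0]
top = nearest_neighbors(query, index, k=2)
print(top)
```

Because text and images share one embedding space in a joint model, the same function serves both text-to-image and image-to-image search; only the encoder that produces the query vector changes.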

Sujit Pal
Location: Theater 5

An approach to modelling implicit user feedback for optimising e-commerce search

More than other domains, e-commerce search depends on implicit user feedback to optimise search result ranking as buying decision criteria such as ‘an attractive price’ and ‘brand sympathy’ are very hard to make explicit. On the other hand, this decision making can be observed implicitly in web tracking.

E-commerce search cannot just use more generally known approaches to click modelling. For example, the common assumption that users would view search results sequentially doesn't hold for grid layouts, and our model will have to deal with further contextual biases such as the device type or even time of the day.

In this talk, I shall introduce an approach to using implicit user feedback based on Bayesian hierarchical modelling. It will provide a solution for dealing with contextual biases more generally. The model will cope with varying quantities of observations, and it can incorporate different types of events, such as clicks and checkouts.

René Kriegler
Location: Theater 7

11:00-11:45am EDT Beyond precision and recall – ensuring 'aboutness' in topical classification using confidence scores

Taxonomies play an important role in many LexisNexis products, allowing our customers to run searches using predefined topics either as pre or post-filters. Because our topical classification is automated, there can be a wide array of relevance in results, from the very relevant to the more marginal. We need to ensure that documents containing a heavy breadth and depth of discussion of a particular topic surface at the top of the results list. This presentation will demonstrate how stamping Confidence Scores into documents, in addition to a topic code, is crucial to achieving this goal of ‘aboutness’. It will cover experiments that ‘boost’ or ‘re-rank’ using the confidence scores and the internal tools used to measure resulting improvements in relevance. The presentation will outline the methods, both machine learning and rule-based, used to develop a confidence score with a consistent meaning across content types and underlying classification technologies.

Mark Shewhart & Sophie Lagace & Kimberly Hoffbauer
Location: Theater 5

Scalable Semantic Search for Online Learning Applications

Semantic search is one of Course Hero's key products, where a student can type her course's question and get an answer from hundreds of millions of answered questions. The sentence embedding model sits at the core of the semantic search, and we use it to generate a vector index for questions. One challenge is to pick the best embedding model in terms of accuracy and running time. We have designed an evaluation framework where we use Quora duplicate questions and Faiss similarity search. Using this framework, we showed that a few of the pre-trained Sentence-BERT models outperform the Universal Sentence Encoder. This led us to run an A/B experiment where we showed that Sentence-BERT could improve search coverage rate by 35%. Next, to improve the semantic search performance further, we have started fine-tuning the Sentence-BERT models with our search engagement data. We will present some of our findings and the challenges that we have encountered while working on the semantic search problem.

Kazem Jahanbakhsh
Location: Theater 7

11:45am-1:00pm EDT Lunch on your own

Find lunch at one of the many options available on Charlottesville’s Downtown Mall

Location: Your choice!

1:15-2:00pm EDT Bayesian Optimization of Relevance at Shopify

Recently, Bayesian Optimization of a simple Elasticsearch query has been shown to deliver the best non-neural relevance on the MSMarco Ranking Task (https://www.elastic.co/blog/improving-search-relevance-with-data-driven-query-optimization). For this reason, at Shopify, we’ve adopted Bayesian Optimization as a core part of our relevance experimentation workflow.

Bayesian Optimization allows machine learning optimization without needing to deploy complex model infrastructure. It optimizes any component of a query or index by finding the ideal values for boosts and other parameters. We feel it’s an important first step before introducing a complete LTR workflow.

At Shopify we want to share lessons learned from building our own search relevance Bayesian optimizer from scratch. In this talk, we’ll share how it works, how it’s used with every relevance experiment, and why it should be part of every relevance engineer’s toolset.

Doug Turnbull & Andy Toulis
Location: Theater 5

Big Vector Search - The Billion-Scale Approximate Nearest Neighbor Challenge

Despite the broad range of algorithms for Approximate Nearest Neighbor vector search, most empirical evaluations of algorithms have focused on smaller datasets, typically of 1 million points. However, deploying recent advances in embedding-based techniques for search, recommendation and ranking at scale requires ANNS indices at billion, trillion or larger scale. Barring a few recent papers, there is limited consensus on which algorithms are effective at this scale vis-à-vis their hardware cost.

We recently completed the first Billion-Scale Approximate Nearest Neighbor Challenge (sponsored by NeurIPS 2021), which compared ANNS algorithms at billion scale by hardware cost, accuracy, and performance on six billion-scale datasets, most of them recently introduced to the community. We set up an open source evaluation for both standardized and specialized hardware.

In this talk, we will discuss the new datasets and how we compared relative performance of the algorithms.

George Williams
Location: Theater 7

2:15-3:00pm EDT Search Radar Brainstorm

We will brainstorm updates to our Search Radar created at last year’s Haystack.

Location: Theater 5

3:15-4:45pm EDT Lightning Talks

Quick discussions about anything around search relevance!

Location: Theater 5

5:30-6:30pm EDT Haystack Reception (included with registration)

All attendees are welcome. The location is Kardinal Hall. It is about a 10 minute walk from the conference venue.

Location: Kardinal Hall

6:30-8:00pm EDT Dinner (included with registration)

All attendees are welcome. The location is Kardinal Hall. It is about a 10 minute walk from the conference venue.

Location: Kardinal Hall

Day 2, Thursday, April 28th

Time Track 1 Track 2
8:00-9:00am EDT Coffee

Location: Entrance of the Violet Crown

9:00-9:45am EDT Personalized Search - Building a prototype to infer the user's interest

In the world of Search, understanding the intent of the user is often seen as the holy grail. When a user performs multiple search and click actions while having a conversation with the search engine, this behavior reveals a piece of her/his interest. A search engine that is aware of the user's interest is able to add a personal layer to its responses, and this could add a new dimension of accuracy and value to a search implementation.

But what technology does it take to build it? What data is needed? How well does it really work?

This presentation describes the journey to find a practical implementation of a recommendation engine. It answers all the questions above and more. We'll guide you through the lessons learned while creating an engine that generates potentially interesting items for the user based on collaborative filtering and anomaly detection. We'll demonstrate a prototype where even a minimal set of user actions could lead to a personalized search experience.

Tom Burgmans
Location: Theater 5

Searching through large graphs using Elasticsearch

The National Audiovisual Institute (INA) is a repository of all French audiovisual archives, being responsible for archiving over 180 radio and television services, 24/7, since 1995. The generated metadata describing this content represents the equivalent of over 50 million documents (images, audio and video fragments, text excerpts). Due to the heterogeneity of the content, the data model is directly inspired from the conceptual models of cultural heritage, represented by a large graph with complex relations between generic entities. The challenge for building a global search engine for this particular use case is twofold: indexing speed and the implementation of complex full text search capabilities with high performance. Our talk describes the key choices for the graph representation, facilitating the indexing process of the documents, as well as the technical framework set up around Elasticsearch, implementing dedicated search APIs required by different functional areas.

Radu Pop
Location: Theater 7

10:00-10:45am EDT AI Driven Search

Nextdoor is a diverse multi-sided marketplace where, on one side, we have neighbors with the most diverse sets of intents. Their interests are explicitly expressed in the search bar and form quite a unique demand vector. People are searching for local neighbors to connect with, or trying to find classifieds or recommendations for their next pet projects, just to name a few.

At the same time, with the latest advances in the area of Machine Learning and AI, the modern search needs to be intelligent, domain-aware, contextual, and personalized.

Using these guiding principles, the Search Team at Nextdoor has built a modern, AI-powered Search. In this talk, we will present the advances we have made in the areas of Query Understanding, Recall, and Ranking using the principles listed above.

Bojan Babic
Location: Theater 5

Engagement DCG vs Subject Matter Expert DCG - Evaluating the Wisdom of the Crowd

Evaluating the relevance of a search engine result using Discounted Cumulative Gain (DCG) is a common way of quantifying query-document relevance precision. DCG may be computed using a customer engagement method or a subject matter expert (SME) evaluation method. It is a frequent but untested assumption that the results from these two methods of DCG computation are similar in size and correlate well. The difficulties involved with performing a comparison study have prevented rigorous testing of this assumption.

Using an identical set of 375 frequent Natural Language queries, the same highly optimized search engine algorithm stack, and well-defined, rigorous test methodologies, both engagement and SME DCG results are computed and compared. Results show that engagement and SME DCG results are not similar in magnitude or trend correlation across the entire set of queries. Reasons for the discrepancies, including assumptions and various biases underlying the methods are discussed.
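For readers unfamiliar with the metric, the sketch below shows one common way to compute DCG for a single query from a ranked list of graded relevance labels. It assumes the linear-gain form, where the grade at rank i is divided by log2(i + 1); the abstract does not say which DCG variant or grading scale the study used, and the two grade lists are invented purely to show how engagement-derived and SME-derived grades for the same ranking can yield different scores.

```python
import math

def dcg(relevances, k=None):
    """Discounted Cumulative Gain over a ranked list of relevance grades.

    relevances[0] is the grade of the top-ranked result. Uses the
    linear-gain variant: sum(rel_i / log2(i + 1)) with ranks starting at 1.
    """
    if k is not None:
        relevances = relevances[:k]
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

# Hypothetical grades for the same four results of one query.
sme_grades = [3, 2, 0, 1]         # assigned by a subject matter expert
engagement_grades = [1, 3, 0, 2]  # derived from user engagement signals

sme_dcg = dcg(sme_grades)
engagement_dcg = dcg(engagement_grades)
print(round(sme_dcg, 4), round(engagement_dcg, 4))
```

Comparing the two values per query, and then across the whole query set, is the kind of magnitude and correlation comparison the abstract describes.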

Doug Rosenoff
Location: Theater 7

11:00-11:45am EDT Search Radar Apply

We will apply our brainstorming from yesterday’s session to update the Search Radar. Join us for a hands-on event where we want your input!

Location: Theater 5

11:45am-1:15pm EDT Lunch on your own

Find lunch at one of the many options available on Charlottesville’s Downtown Mall

Location: Your choice!

1:15-2:15pm EDT Women in Search Panel

Women in tech are noticeably underrepresented, and women in the search space are even rarer. Join us for a multi-perspective panel discussion featuring women working in the Search field. We will talk about career development, breaking into Search, gendered experiences in the workplace, and more! The goal of this panel is to empower women, encourage their allies, and show that Search is a welcoming field that needs diverse perspectives to thrive.

Led by Audrey Lorberfeld with panelists - Chen Karako, Jess Peck, Ellen Voorhees, Julie Tibshirani
Location: Theater 5

2:30-3:15pm EDT Building Retrieval Test Collections

Information retrieval test collections---benchmark search tasks consisting of a corpus, a query set, relevance judgments, and associated evaluation metrics---are foundational infrastructure for off-line evaluation of search systems. High-quality test collections accelerate development of effective search algorithms and facilitate technology transfer, but building large-scale, representative test collections is challenging. The Text REtrieval Conference (TREC, trec.nist.gov) has built test collections for a variety of search tasks in the past thirty years using different techniques as task and budget required. Recent examination of some TREC collections shows that they have withstood the test of time, but others have weaknesses that are hard to detect. This talk will recap lessons learned from building dozens of test collections that suggest best practices for building your own collection for your own problem.

Ellen Voorhees
Location: Theater 5

3:30-4:15pm EDT AI based approaches to improve data quality before indexing

It is well known that good data makes great search experiences. Or, put the other, less positive, way around: garbage in, garbage out. AI-powered searches usually focus on the search itself and on improving relevance on top of an already existing index. In this talk we will focus on data ingestion: optimizations and improvements that can be made by AI and machine learning algorithms to improve data quality prior to indexing. Some examples are: enriching data with automatic categorization, improving OCR translations, improving media file transcriptions, and improving crawling and web page parsing. All of these are considered in the context of data for search engines: a use case that induces or allows some specific optimization.

Lucian Precup
Location: Theater 5

OpenSearch - Ecommerce Search & Discovery Platform - Powered by Querqy

Create a personalization platform for e-commerce Search & Discovery experiences that your customers and developers will love. It is powered by Querqy, an umbrella for open source tools and libraries that help you create a powerful e-commerce search platform quickly. The focus is on optimizing search relevance from day one, beyond the out-of-the-box capabilities of the OpenSearch engine. This also includes a powerful UI tool for managing onsite search keywords and queries. It provides an OpenSearch Dashboards interface for maintaining and deploying Querqy Rules. The OpenSearch - Ecommerce platform helps you increase conversions, enable typo tolerance and synonyms, and add advanced, dynamic filters to the shopping experience.

Anirudha Jadhav & Pratik Shenoy & Dr. Johannes Peter
Location: Theater 7

4:15-4:30pm EDT Closing

Location: Theater 5