Haystack 2020 Schedule

Talks from the Search & Relevance Community at the Haystack Conference!

Day 1, Tuesday, April 28th

Time Track 1 Track 2
5:30-7:00pm Pre-Haystack Reception (included with registration)

-Any attendee who is in town is welcome
-Optional, quick, facilitated networking activity

Location: Draft Taproom Charlottesville

Day 2, Wednesday, April 29th

Time Track 1 Track 2
8:00-9:00am Registration and continental breakfast

Location: Lobby and Lounge

9:00-9:15am Welcome, Announcements and Keynote Intro

Location: Theater 5

9:15-10:00am Opening Keynote - Beyond Relevant Search

We developed an item-item collaborative recommendation algorithm derived from markov chains and co-occurrences in order to take advantage of anonymous sessions data as implicit feedback.

Doug Turnbull - Location: Theater 5

10:00-10:10am Break
Question Answering as Search - the Anserini Pipeline and Other Stories

In the last couple of years, we have seen enormous breakthroughs in automated Open Domain Restricted Context Question Answering, also known as Reading Comprehension, where the task is to find an answer to a question from a single document or paragraph. A potentially more useful task is to find an answer for a question from a corpus representing an entire body of knowledge...

Sujit Pal - Location: Theater 4

10:50-11:00am Break
Solr - Beyond the Core

Open source search technologies are used around the world in a broad tapestry of different uses and by companies of very different sizes. Industry giants like Box use Solr for document search, Airbnb uses Solr for metadata search, and Salesforce for a mix of both. It is used by Target and Sears to help customers find the right product as fast as possible, and by WhiteHouse.gov for site search.

Adam Walz - Location: Theater 4

11:45am-1:15pm Find lunch at one of the many options available on Charlottesville's Downtown Mall
1:15-1:55pm Using Knowledge Graph to improve Enterprise Search experience

FINRA has many millions of documents and database records that staff need to search through to find information relevant to regulatory activities. Searching across the large set of documents and structured database records using relevance-ranked text search does not present together the items that users know are related. Relevance ranking discriminates using TF/IDF and related techniques, but does not bring together items that are not related by relevance.

Dmitriy Shvadskiy & Dmytro Dolgoplolov - Location: Theater 5

1:55-2:05pm Break
2:05-2:45pm Taxonomic Search - a powerful lever for boosting Relevance and understanding query intent

A carefully built, well applied and intuitively surfaced taxonomy can really help us reach the holy grail of search, namely increased recall and precision in results. This talk looks at how we construct taxonomies at LexisNexis that match content volume and user need to provide topics that are granular enough to be actionable, insightful and helpful in decision-making. It covers both rule-based and machine learning classification approaches to content enrichment and demonstrates how we combine these to achieve compelling and differentiating accuracy.

Mark Fea - Location: Theater 5

2:45-2:55pm Break
2:55-3:35pm TBA

TBA - Location: Theater 5

3:35-4:00pm Break

Snacks served
Optional Facilitated Networking Activity

4:15-4:45pm Lightning Talks

Location: Theater 5

4:45-5:00pm Closing Remarks

Location: Theater 5

5:00-6:00pm Break

6:00-9:00pm Haystack Party (included with registration)

-Cocktail Hour, 6-6:30pm
-Dinner service at 6:30pm
-After-dinner Entertainment at 7:30pm

Location: The Space Downtown

Day 3, Thursday, April 30th

Time Track 1 Track 2
8:00-9:30am Continental Breakfast service
9:30-10:10am Deep Learning for Search in e-commerce

NLP-based Deep Learning models for finding the intent of a query in a particular taxonomy/categories. Description and Jupyter notebook demonstration: Multi-Label/Multi-Class Classification Model from scratch in Keras; Feature Engineering in Spark Scala and pandas...

Sonu Sharma & Atul Argarwal - Location: Theater 5

10:10-10:20am Break
10:20-11:00am E-Commerce Search - how JCPenney powers Site Search with Machine Learning

Search plays a critical role in E-Commerce. At JCPenney, we receive millions of customer queries of varying specificity. The task of Search is to match each one as well as possible to the extensive catalog of an online department store. To do that, JCPenney re-platformed Search in 2018, moving from a licensed search engine to Solr and taking full in-house ownership of cloud hosting and relevancy ranking.

Martin Baumgartel & Swaminathan Krishnamurthy - Location: Theater 5

Comparing Embeddings Based Search Methods and BM25 Results

Improvements to keyword searching are providing utility, but at increasing complexity of development and cost of deployment. In other words, the industry is advancing to the far right of the curve of the economic law of diminishing returns. At LexisNexis we have started to explore Embeddings Based Search using bag of words and bi-directional methods. During this session we will share our approach and compare the value and cost of vectorization-based search and plain-Jane BM25.

Bert Staub - Location: Theater 4

11:00-11:10am Break
Improving relational queries search results with bag of entities and graph search

Relational queries such as “IT jobs in Virginia” or “Authors of Information Retrieval books” are common. Standard query parsers that analyze queries as a Bag of Words (BoW) retrieve quality results, but they fail to incorporate the context and the relationships among the query terms. For example, a query such as “Authors of IR books” can return writers of books on search engines, data mining, ML, and related topics. In this implementation, a custom query parser extracts entities from the user’s search query and deduces the relationships among them, then incorporates those relationships into the resulting search query.

Rajani Maski - Location: Theater 4

11:50am-1:15pm Find lunch at one of the many options available on Charlottesville's Downtown Mall
1:15-1:55pm Web-scale considerations for using machine learned models in search

With all the recent advances in, for instance, neural information retrieval (such as transformer models), it is tempting to use such models as signals in your relevancy computation. However, these models are costly to evaluate, particularly over the entire corpus, so to achieve web-scale performance one must usually introduce some sort of approximation. In this talk we will take a look at how to build a search engine with traditional text search features such as BM25...

Lester Solbakken - Location: Theater 5

1:55-2:05pm Break
2:05-2:45pm Rewriting queries - a discussion based on the Querqy framework

In many search applications, query rewriting seems to be a forgotten or only loosely defined concept. Often, it is implemented late in the development process in order to overcome specific search relevance issues. This means that the implications of integrating several query rewriting components (e.g. synonyms and boosting queries), their impact on ranking functions, such as BM25, and their interaction with field weights are often unclear and sometimes surprising.

René Kriegler - Location: Theater 5

Understanding Scoring Through Examples

The built-in scoring mechanism in Elasticsearch and Solr can seem mysterious to beginners and experienced practitioners alike. Instead of delving into the mathematical definitions of TFxIDF and BM25, this talk will help you develop an intuitive understanding of these metrics by walking you through a series of simple examples. Each example consists of a query and list of several indexed documents.

Rudi Seitz - Location: Theater 4

2:45-2:55pm Break
Relevance through Machine Learning-based Data Enrichment and Enhanced Visualization

Relevance through Machine Learning-based data enrichment and enhanced visualization, i.e., helping users better understand and navigate a body of content. The presentation will provide an overview of a broad spectrum of techniques...

Christopher Ball - Location: Theater 4

3:35-3:45pm Break
3:45-4:25pm Search relevance pipelines at Shipt - machine learning, query understanding and ranking

The search relevance pipeline consists of machine learning, query understanding processes, and customized ranking with historical engagement info. Query understanding is the process of inferring the intent of the search keyword by extracting semantic meaning from the search query.

Dipak Parmar & Bart Masters - Location: Theater 5

4:25-4:45pm Thanks for coming! Invitation to wander around downtown Charlottesville...