Haystack 2020 Schedule

Talks from the Search & Relevance Community at the Haystack Conference!

Day 1, Tuesday, April 28th

Time Track 1 Track 2
5:30-7:00pm Pre-Haystack Reception (included with registration)

-Any attendee who is in town is welcome
-Optional, quick, facilitated networking activity

Location: Draft Taproom Charlottesville

Day 2, Wednesday, April 29th

Time Track 1 Track 2
8:00-9:00am Registration and continental breakfast

Location: Lobby and Lounge

9:00-9:15am Welcome, Announcements and Keynote Intro

Location: Theater 5

9:15-10:00am Opening Keynote - Beyond Relevant Search

We developed an item-item collaborative recommendation algorithm derived from markov chains and co-occurrences in order to take advantage of anonymous sessions data as implicit feedback.

Doug Turnbull - Location: Theater 5

10:00-10:10am Break
10:10-10:50am How to start climbing the Relevance Mountain - and make sure you can keep climbing!

Search relevance can be daunting to a newbie, especially when there is pressure from your stakeholders to improve it. The technical jargon and high entry barrier of relevance engineering can leave a newbie feeling overwhelmed with no idea where to start. DCG, nDCG, precision@k, MAP, MRR, ERR, LTR. What does all of this mean?! Why can't you just fix the relevance algorithm and move on?

Anthony Groves - Location: Theater 5

Question Answering as Search - the Anserini Pipeline and Other Stories

In the last couple of years, we have seen enormous breakthroughs in automated Open Domain Restricted Context Question Answering, also known as Reading Comprehension, where the task is to find an answer to a question from a single document or paragraph. A potentially more useful task is to find an answer for a question from a corpus representing an entire body of knowledge...

Sujit Pal - Location: Theater 4

10:50-11:00am Break
11:00-11:40am An Exploration of Search Visualization Strategies

This talk will cover various visualization strategies to better understand your search index and the search results themselves.

Dan Worley - Location: Theater 5

Solr - Beyond the Core

Open source search technologies are used around the world for a broad tapestry of uses and by companies of very different sizes. Industry giants like Box use Solr for document search, Airbnb uses Solr for metadata search, and Salesforce uses it for a mix of both. It is used by Target and Sears to help customers find the right product as fast as possible, and by WhiteHouse.gov for site search.

Adam Walz - Location: Theater 4

11:45am-1:15pm Find lunch at one of the many options available on Charlottesville's Downtown Mall
1:15-1:55pm Using Knowledge Graph to improve Enterprise Search experience

FINRA has many millions of documents and database records that staff need to search through to find information relevant to regulatory activities. Searching across the large set of documents and structured database records using relevance ranked text search does not present items together that the users know are related. Relevance ranking discriminates using TF/IDF, and related techniques, but does not bring together items that are not related by relevance.

Dmitriy Shvadskiy & Dmytro Dolgoplolov - Location: Theater 5

Context sensitive autocomplete suggestions using LSTM and Pair-wise learning

Autocomplete is a predominant feature in e-commerce search. By being relevant, Autocomplete should help users quickly find the query they intended to type with minimal keystrokes. This talk presents an approach that achieves this by using the user's context as a signal for re-ranking the query suggestions. A user context is based on a diverse sequence of events: searches, product interactions, category browsing, etc. It is generated using an LSTM model that is optimized by using pairwise ranking of queries.

Dileep Kumar Patchigolla & Minohar Sripada - Location: Theater 4

1:55-2:05pm Break
2:05-2:45pm Taxonomic Search - a powerful lever for boosting Relevance and understanding query intent

A carefully built, well applied and intuitively surfaced taxonomy can really help us reach the holy grail of search, namely increased recall and precision in results. This talk looks at how we construct taxonomies at LexisNexis that match content volume and user need to provide topics that are granular enough to be actionable, insightful and helpful in decision-making. It covers both rule-based and machine learning classification approaches to content enrichment and demonstrates how we combine these to achieve compelling and differentiating accuracy.

Mark Fea - Location: Theater 5

Evolving Relevance

Tim Allison - Location: Theater 4

2:45-2:55pm Break
2:55-3:35pm TBA

TBA - Location: Theater 5

How to Build your Training Set for a Learning to Rank Project

Learning to rank (LTR from now on) is the application of machine learning techniques, typically supervised, in the formulation of ranking models for information retrieval systems. With LTR becoming more and more popular (Apache Solr has supported it since January 2017), organisations struggle with the problem of how to collect and structure the relevance signals necessary to train their ranking models. This talk is a technical guide to exploring and mastering various techniques for generating your training set(s) correctly and efficiently.

Alessandro Benedetti - Location: Theater 4

3:35-4:00pm Break

Snacks served
Optional Facilitated Networking Activity

4:15-4:45pm Lightning Talks

Location: Theater 5

4:45-5:00pm Closing Remarks

Location: Theater 5

5:00-6:00pm Break
6:00-9:00pm Haystack Party (included with registration)

-Cocktail Hour, 6-6:30pm
-Dinner service at 6:30pm
-After-dinner Entertainment at 7:30pm

Location: The Space Downtown

Day 3, Thursday, April 30th

Time Track 1 Track 2
8:00-9:30am Continental Breakfast service
9:30-10:10am Deep Learning for Search in e-commerce

NLP-based Deep Learning Models for finding the intent of a Query in a particular taxonomy/categories: Description and Jupyter-notebook demonstration: Multi-Label/Multi-Class Classification Model from scratch in Keras; Feature Engineering in Spark Scala and pandas...

Sonu Sharma & Atul Argarwal - Location: Theater 5

What Do They Want? Optimizing Search When Users Enter Broad Terms

This talk covers strategies used to optimize broad, generic search terms, which typically account for the majority of searches on a site: understanding refinement data, deriving intent, and then applying optimizations to get customers down the path to what they are actually looking for. It also discusses the struggles that arise when your site has a large assortment with numerous potential options.

Lisa Kowalkowski - Location: Theater 4

10:10-10:20am Break
10:20-11:00am E-Commerce Search - how JCPenney powers Site Search with Machine Learning

Search plays a critical role in E-Commerce. At JCPenney, we receive millions of customer queries with varying degrees of specificity. The task of Search is to match them as well as possible to the extensive catalog of an online department store. To do that, JCPenney re-platformed Search in 2018, moving from a licensed search engine to Solr and taking full in-house ownership of cloud hosting and relevancy ranking.

Martin Baumgartel & Swaminathan Krishnamurthy - Location: Theater 5

Comparing Embeddings Based Search Methods and BM25 Results

Improvements to keyword searching are providing utility, but at increasing complexity of development and cost of deployment. In other words, the industry is advancing to the far right of the curve of the economic law of diminishing returns. At LexisNexis we have started to explore Embeddings Based Search using bag of words and bi-directional methods. During this session we will share our approach and compare the value and cost of vectorization-based search and plain Jane BM25.

Bert Staub - Location: Theater 4

11:00-11:10am Break
11:10-11:50am Click logs and insights - Putting the search experts in your audience to work

(User, Query, Document) - This simple tuple helps give shape and depth to the often flat information retrieval landscape. With a few quick transformations, this dataset can help you suggest query completions, display related queries, propose synonym candidates, generate taste profiles, and forecast demand. This talk will help search engineers and product owners to get a feel for the data relationships created by this useful feedback loop. (Note that while many of these product features can be further optimized using ML, this talk requires no previous knowledge of machine learning applications in search.)

Peter Dixon-Moses - Location: Theater 5

Improving relational queries search results with bag of entities and graph search

Relational queries like “IT jobs in Virginia” or “Authors of Information Retrieval books” are common. Standard query parsers that analyze queries as a Bag of Words (BoW) retrieve quality results, but they fail to incorporate the context and the relationships among the terms in the query. An example query such as “Authors of IR books” can return writers on Search Engines, Data Mining, ML, and related topics. In this implementation, a custom query parser is developed that extracts entities from the user’s search query and deduces the relationships among the extracted entities, so that those relationships can be incorporated into the resulting search query.

Rajani Maski - Location: Theater 4

11:50am-1:15pm Find lunch at one of the many options available on Charlottesville's Downtown Mall
1:15-1:55pm Web-scale considerations for using machine learned models in search

With all the recent advances in, for instance, neural information retrieval (such as transformer models), it is tempting to use such models as signals in your relevancy computation. However, these models are costly to evaluate, particularly over the entire corpus, so to achieve web-scale performance one must usually introduce some sort of approximation. In this talk we will take a look at how to build a search engine with traditional text search features such as BM25...

Lester Solbakken - Location: Theater 5

Top 10 Lessons learned in search projects the past 10 years

One of the things I like about working as a consultant is working on different challenges, finding parallels between the various solutions, and learning a lot along the way. I have worked for some of the most prominent customers in the e-commerce, travel, and food industries in the Netherlands. In this talk, I want to share the top 10 lessons learned from doing search projects at these customers for the past ten years.

Jettro Coenradie - Location: Theater 4

1:55-2:05pm Break
2:05-2:45pm Rewriting queries - a discussion based on the Querqy framework

In many search applications, query rewriting seems to be a forgotten or only loosely defined concept. Often, it is implemented late in the development process in order to overcome specific search relevance issues. This means that the implications of integrating several query rewriting components (e.g. synonyms and boosting queries), their impact on ranking functions, such as BM25, and their interaction with field weights are often unclear and sometimes surprising.

René Kriegler - Location: Theater 5

Understanding Scoring Through Examples

The built-in scoring mechanism in Elasticsearch and Solr can seem mysterious to beginners and experienced practitioners alike. Instead of delving into the mathematical definitions of TFxIDF and BM25, this talk will help you develop an intuitive understanding of these metrics by walking you through a series of simple examples. Each example consists of a query and a list of several indexed documents.

Rudi Seitz - Location: Theater 4

2:45-2:55pm Break
2:55-3:35pm Thought Vectors, Knowledge Graphs, and Curious Death(?) of Keyword Search

The world of information retrieval is changing. BERT, Elmo, and the Sesame Street gang are moving in, shouting the gospel of 'thought vectors' as a replacement for traditional keyword search. Meanwhile many search teams are now automatically extracting graph representations of the world, trying their best to also provide more structured answers in the search experience.

Trey Grainger - Location: Theater 5

Relevance through Machine Learning-based Data Enrichment and Enhanced Visualization

Relevance through Machine Learning-based data enrichment and enhanced visualization, i.e., helping users better understand and navigate a body of content. This presentation will provide an overview of a broad spectrum of techniques...

Christopher Ball - Location: Theater 4

3:35-3:45pm Break
3:45-4:25pm Search relevance pipelines at Shipt - machine learning, query understanding and ranking

A search relevance pipeline consists of machine learning, query understanding processes, and customized ranking with historical engagement info. Query understanding is the process of inferring the intent of the search keyword by extracting semantic meaning from the search query.

Dipak Parmar & Bart Masters - Location: Theater 5

Not all those who browse are lost - few-shot and zero-shot personalization for digital commerce using deep architectures.

Personalization in IR is one of the hottest topics in the AI-takes-all economy: we should not aim to be 'just' semantically relevant, but also tailor results to users' preferences and intent. However, personalization in digital commerce is easier said than done: most shoppers visit a given store no more than twice a year, and bounce rates across verticals show that it is important to personalize as early as possible.

Jacopo Tagliabue - Location: Theater 4

4:25-4:45pm Thanks for coming! Invitation to wander around downtown Charlottesville...