Haystack 2021 Schedule
Talks from the Search & Relevance Community at the Haystack Conference!
Wednesday, September 29th, 2021
All times listed are EDT.
Location: Conference Room
Peter Morville - Location: Conference Room and Online
Script Scores and back again - A tale of merchandising algorithms in Elasticsearch
This talk is about the flexibility and performance of Painless scripting in Elasticsearch. That flexibility brought big wins to the business at SimpleTire, and it continues to do so by being easily adaptable as new merchandising insights are discovered. This story has 4 chapters: 1) Inherited a multi-objective ranking model that was built via an ETL job and stored in SQL for index-time inclusion. 2) Tasked with migrating the algorithm into Elasticsearch for real-time scoring. 3) Solved using Painless scripting for maximum flexibility and extensibility in the future. 4) Long-term wins: easier A/B testing, weighting adjustments, bottom-line value, and rescoring for personalization. This talk is generally valuable to any e-commerce team because it walks through the process of converting business logic into search logic, all with an eye on improving the bottom line.
Nate Day - Location: Conference Room and Online
The web search bootstrapping problem
In this talk, we will review how recent breakthroughs in AI are exploited to create search engines based entirely on AI-generated data, thus eliminating the need to collect users’ data and solving the cold-start problem. In particular, we will focus on: 1. How can we generate search queries that are almost identical to those of real users? 2. How can we exploit the generated queries to predict user intent?
Roi Krakovski - Location: Conference Room and Online
Semantic Product Search – Vector Search for E-Commerce
Information retrieval today is undergoing a paradigm shift, away from the prevailing techniques of the past few decades. Increasingly the focus is moving away from keyword and entity driven search and the inverted index that supports those approaches, in favor of more complex models supported by dense instead of sparse index structures. Neural IR models for retrieval and ranking are becoming increasingly popular but building and scaling these systems presents many challenges. In this talk we present an overview of the current state of Neural IR from the perspective of a large e-commerce company. Among the topics covered will be extracting signals from the clickstream, transformer models and ‘do you need one?’, augmenting the inverted index by predicting keywords, and hard negative mining. We will also cover an exciting new research area, framing semantic search as an Extreme Multi-Label Classification problem, and why the future of semantic search may lie in machine-learned indexes.
Simon Hughes - Location: Conference Room and Online
Find lunch at one of the many options available on Charlottesville's Downtown Mall
The Text REtrieval Conference (TREC)
Since 1992, the TREC project at the National Institute of Standards and Technology has created standard test sets and evaluation methodology to support the development of methods for content-based access to material structured for human consumption. Starting with a (massive-for-the-time) two gigabytes of newswire text and progressing to web-scale data collections, TREC has examined a variety of tasks including question answering, retrieving digital video, web search, legal discovery, secondary use of electronic health records, and sentiment analysis in blogs and tweets. TREC's "coopetition" paradigm emphasizes individual experiments evaluated on a benchmark task. This has had three major impacts: improved effectiveness of information access algorithms; cross-fertilization of ideas across research groups with the eventual transfer of technology into products; and the formation of new research areas enabled by the construction of critical infrastructure.
Ellen Voorhees - Location: Conference Room and Online
Applying User Signals like a Relevance Engineering Ninja
User signals (clicks, purchases, etc.) are among the most useful inputs for improving search relevance. They can be used to directly optimize your head queries (signals boosting), to personalize search results, to learn domain-specific terminology (misspellings, synonyms, etc.), or to build click models as training data for automated Learning to Rank. Most organizations struggle to properly store their signals, let alone best utilize them to optimize relevance. In this talk, you’ll learn best practices for collecting, processing, and applying signals to enhance relevance. We’ll cover live code examples of index- and query-time signals boosting, fighting signal spam and bias, and applying quality- and time-based weights to your models. We'll show the various kinds of personalization and click models you can train from signals to improve ranking. You'll come away from this talk with some new tools in your relevance engineering toolbox, and some open-source code examples to get started!
Trey Grainger - Location: Conference Room and Online
Learning to Boost - Logistic Regression to Optimize Elasticsearch Boosts
Choosing field boost values can make or break your Elasticsearch query. One popular data-driven approach to identify the relative importance of fields is Learning to Rank. However, LTR typically requires fitting a complex machine learning model and incorporating a separate plugin or service to implement it in production. Beyond manual tuning or grid search, is there a middle ground that’s data-driven but easier to implement? In this talk, we introduce an approach where we create a regression model to directly determine optimal Elasticsearch boost values. We will cover parsing search explanations for historical queries to create the features, assigning pairwise labels based on a judgment list, and evaluating the boosts the model produces. While not a replacement for Learning to Rank, this automatic approach led to a 1.2% increase in MAP@5 over the guess-and-check version, which took 6 months to develop, and it enables quick iteration for future query changes.
Nina Xu & Jenna Bellassai - Location: Conference Room and Online
OLX's Journey to a Relevant Search
In this presentation, we’ll talk about OLX’s journey to a relevant search that fits well with our classifieds business model. We went from a simple search engine with really poor results to a new search model that returns highly relevant results and also solves many of the problems that come with applying traditional methods. We’ll explain some of the problems associated with our use case and show what we did to solve each of them. Our solutions range from simple to more complex: default BM25, Bayesian optimization, and finally a new method we like to call “Term Podium”! We’ll also talk about how we measured our success with metrics such as NDCG, diversity, and novelty, together with business metrics.
Leonardo Wajnsztok - Location: Conference Room and Online
Marcus Eagan - Location: Conference Room and Online
Haystack Reception (included with registration)
Any attendee who is in town is welcome.