Haystack EU 2023
Talks from the Search & Relevance Community at the Haystack Conference!
This was our Event Safety, Code of Conduct and COVID-19 Policy.
Our venue was TUECHTIG, Oudenarder Straße 16, 13347 Berlin.
Wednesday, September 20th, 2023
|Introduction and Welcome
We at OpenSource Connections started ‘Haystack - The Search Relevance Conference’ back in 2018. In his keynote, “The keystone”, our then CTO, Doug Turnbull, laid out the importance of a search relevance community based on the principles of openness and knowledge sharing. Five years on, the world of search has changed considerably: the community has grown more than we could ever imagine, and search relevance has become foundational for many search teams. On the other hand, we realize that good search means a lot more than good search relevance, and AI provides us with new opportunities that we as a community are only beginning to understand. In his keynote talk, René will reflect on the role of our community in understanding and implementing good search today.
René Kriegler, OpenSource Connections
|How my team moved Search from the #1 to #7 challenge to solve, without changing relevance
The talk aims to illustrate how relevance and ranking are not always the best way to improve search. Drawing from my experience at Front, where search was the number 1 problem to solve (2 years in the top position), I’ll illustrate how reducing complexity and friction was the most pragmatic approach with a team of 4 engineers operating in a high-scale environment (millions of queries a week). Based on my experience working as a search product manager at Front, a customer operations platform, and at Algolia, a search API solution, I will share my learnings on how to identify the critical problems to solve by mapping the user journey and its frictions, prioritizing, defining a success metric in a non-transactional context, convincing frustrated stakeholders, and reducing the time to impact through experimentation. This presentation will include the methodology used and the solutions we put in place.
Stéphane Renault, Front
|Reciprocal Rank Fusion (RRF) or How to Stop Worrying about Boosting
Reciprocal rank fusion (RRF) combines multiple result sets with different relevance indicators — like lexical and dense or sparse vector search — into a single result set. One of the great attributes of RRF is that it requires no tuning, and the different relevance indicators do not have to be related to each other to achieve high-quality results. If you have been using boosting to combine different types of searches, this is the talk for you. We'll dive into the algorithm, how to use it, and how it might surprise you when coming from BM25 search.
Philipp Krenn, Elastic
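The fusion step the abstract describes can be sketched in a few lines. This is a minimal, self-contained illustration of the standard RRF formula (score = sum of 1/(k + rank) across result lists, with the conventional k=60); the document IDs and rankings are made up for the example:

```python
def rrf_fuse(result_lists, k=60):
    """Fuse ranked result lists via Reciprocal Rank Fusion.

    Each list is ordered best-first; a document's fused score is the
    sum of 1 / (k + rank) over every list it appears in (rank is 1-based).
    No score normalization or boost tuning is needed.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document IDs by fused score, best first
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d1", "d2", "d3"]    # hypothetical lexical ranking
dense = ["d3", "d1", "d4"]   # hypothetical vector ranking
fused = rrf_fuse([bm25, dense])  # -> ["d1", "d3", "d2", "d4"]
```

Note how `d1` wins despite topping only one list: it ranks highly in both, which is exactly the behavior that makes RRF robust when the underlying scores are not comparable.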
|Break, Learn, Refine – The Art of Hypothesis-Driven Development of ML-Powered Search
Uzum Market is a rapidly growing e-commerce marketplace in Uzbekistan with more than 500k items available. It’s impossible to navigate such a multi-categorical catalog without a large-scale search system, making it a vital technology for the marketplace. The system includes many parts: sparse retrieval, spelling correction, typing suggestions, and linear and gradient-boosting ML models for ranking. Neural retrieval is about to join the gang. We started with a much simpler pipeline and we’ve been improving the search continuously ever since. We’d like to share with the community the mistakes we’ve made along the way, the principles we deduced from them, and the rails we built to streamline hypothesis testing. We’ll dive into:
- Metrics and evaluation. Why our initially chosen offline metrics for search relevance turned out to be completely wrong, and how we determined the right ones
- Targets for LTR models. What kind of clickstream data we are using, and why attribution modeling is crucial
- A/B testing
Andrey Kulagin, Uzum Market
|Shedding Light on Positional Bias: Strategies for Mitigation
People often click on the first things they see, not just because they're relevant, but because of their position, and because interfaces like Google's are designed so that we scroll less and click on the first items. While working on ML-based ranking, we often algorithmically promote things that are already quite popular, making them even more popular and creating a self-reinforcing loop that degrades search. In this talk, we will discuss how to deal with these biases and how to make them improve search quality rather than ruin it. How can we make the ranking model itself less biased? Is it possible to remove biases from the training data? While working on a search platform for a global food delivery service, we performed many experiments in this area and hit all the pitfalls, and we can share real-world test results for comparison.
Burak Isikli, Delivery Hero
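One common way to debias training data of the kind the talk discusses is inverse propensity weighting: a click observed at a low-visibility position counts for more, because it survived a lower chance of being seen at all. A minimal sketch, assuming hypothetical per-position examination propensities (in practice these would be estimated, e.g. via position-randomization experiments; the talk does not prescribe this specific method):

```python
def ipw_labels(clicks, propensities):
    """Reweight click labels by inverse examination propensity.

    A click at position p is divided by P(examined | position = p),
    so clicks at rarely-seen positions receive a larger training weight.
    """
    return [c / propensities[pos] for pos, c in enumerate(clicks)]

# Hypothetical examination propensities: top slots are seen far more often.
propensities = [1.0, 0.6, 0.3]
clicks = [1, 0, 1]  # observed clicks for one query's top-3 results
weights = ipw_labels(clicks, propensities)  # -> [1.0, 0.0, 3.333...]
```

The click at position 3 ends up weighted more than three times as heavily as the click at position 1, counteracting the exposure advantage of the top slot.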
|Beyond the known: exploratory and diversity search with vector embeddings
In a world inundated with information, not every search journey begins with a specific destination in mind. We have perfected known-item search and personalized recommendations, but there exists a vast, often underappreciated realm where users embark on quests without a precise goal, driven by curiosity, uncertainty, or the desire for exploration. This talk will dive deep into exploratory and diverse search paradigms. We'll unravel the complexities behind these search modes, differentiate them from our traditional understanding of search and recommendation, and discuss innovative strategies to serve users better in these scenarios.
Kacper Łukawski, Qdrant
|Multilingual Search Support in European E-commerce: A Journey with Apache Lucene
This presentation delves into the intricacies of implementing a multilingual search engine that efficiently supports over twenty European languages using platforms like Elasticsearch, OpenSearch, and Solr. We share the unique challenges posed by each language, and the strategic decisions made to ensure optimal functionality. Apache Lucene, renowned for its robust suite of Analyzers, Tokenizers, and Token Filters, also offers extensibility through plugins and integrations. Capitalizing on this flexibility, our team embarked on a mission to design and realize a comprehensive E-commerce search engine tailored for the diverse linguistic landscape of Europe. This talk will unravel the technical journey - the hurdles faced, insights gained, and the pivotal role of Apache Lucene in catering to multilingual user bases.
Lucian Precup, Adelean
|Navigating Neural Search: Avoiding Common Pitfalls
Neural search, often known by various names, including semantic search, has reached a stage where it is primarily associated with learned vector representations of queries and documents. This dense representational method reduces the scoring process to a vector similarity function. In this talk, we take a holistic approach and demystify the neural networks behind these vector representations - the text embedding models. We explore various open-source text embedding models, discussing how to choose one by considering factors like language capabilities, task alignment, accuracy, and cost-effectiveness. Finally, we look at embedding retrieval, or vector search, and how introducing approximate vector search can degrade accuracy so much that significantly cheaper retrieval methods become preferable.
Jo Kristian Bergum, Vespa
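The "vector similarity function" the abstract says dense retrieval reduces scoring to is typically cosine similarity (or dot product over normalized embeddings). A minimal sketch with toy two-dimensional vectors standing in for real embedding model outputs:

```python
import math

def cosine_similarity(q, d):
    """Score a query-document pair as the cosine of the angle between
    their embedding vectors: dot(q, d) / (|q| * |d|)."""
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    return dot / (norm_q * norm_d)

score = cosine_similarity([1.0, 0.0], [1.0, 1.0])  # -> ~0.7071
```

In production this computation runs over millions of vectors, which is why approximate nearest-neighbor search (with the accuracy trade-offs the talk examines) is used instead of exhaustive scoring.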
|Closing day 1
Haystack Europe Social (included with registration)
All attendees are welcome. The location is Vagabund Brauerei Kesselhaus just behind TUECHTIG - there will be drinks and pizza available plus perhaps some games!
Thursday, September 21st, 2023
|Introduction and Welcome Back
|Using Vector Databases to Scale Multimodal Embeddings, Retrieval and Generation
Many real-world problems are inherently multimodal, from the communicative modalities humans use, such as spoken language and gestures, to the force, proprioception, and visual sensors ubiquitous in robotics. In order for machine learning models to address these problems, interact more naturally and holistically with the world around them, and ultimately become more general and powerful reasoning engines, we need them to understand data across all of its corresponding image, video, text, audio, and tactile representations. In this talk, I will discuss how we can use multimodal models, which can see, hear, read, and feel data(!), to perform cross-modal retrieval/search at the billion-object scale with the help of vector databases. I will also demonstrate, with live code demos and large-scale datasets, how being able to perform this cross-modal retrieval in real time can help us guide the generative capabilities of large language models by grounding them in the relevant source material.
Zain Hasan, Weaviate
|Women of Search
Last year's Women of Search talk emphasized that despite the remarkable achievements of women in the world of search and information retrieval, companies in the sector still face challenges in attracting and retaining female talent. Our discussion delved into these challenges, presenting insights from a brief survey and celebrating the accomplishments of women in the field. Building on last year's insights, this year we will talk about a Women of Search case study. It's essential to acknowledge the hurdles we faced: gathering contributions for the study proved to be a formidable challenge, leading to moments of disappointment. However, our persistence eventually led us to a company in the search space that sets a shining example of nurturing the unique qualities of women. In this session, we not only reflect on the positive findings from the case study but also explore the transformative power of women leaders in the workplace. We shed light on the distinctive contributions they bring to the table and envision an ideal workplace where women are not just equally valued but also celebrated. Through this journey, we hope to inspire more organizations to follow in the footsteps of those who recognize the true value of gender diversity and empowerment.
Atita Arora, OpenSource Connections & Istvan Simon & Paige Tyrrell
|Strategies for using alternative queries to mitigate zero results and their application to online marketplaces
We make great efforts to avoid returning an empty search result list to our users. Badly handled zero-results queries not only provide a bad user experience, they can also impact our business negatively. On classified ads platforms like Wallapop, this problem is even more salient, as the content is user-generated, the range of tradable items is almost unbounded, and the users come from very diverse backgrounds. Recent developments in vector search and the availability of powerful language models for semantic search seem to solve the problem: we could always return some results. On the other hand, it can often be unclear to the user why a certain result coming from vector search shows up. In addition, it would be hard to justify the permanent computational effort of vectorizing millions of documents for classified ads that have a very short lifespan. Following up on the strategies for query relaxation presented in a talk at Haystack US 2019, we will discuss approaches to providing the user with alternative queries - such as query relaxation and query term substitution - that are based on Large Language Models. We will also share our experiments with simpler approaches to query relaxation based on query term statistics. These approaches can provide the user with a better experience and can be used in combination with, or as a replacement for, vector search - and not just in the context of a classified ads platform.
René Kriegler, OpenSource Connections & Jean Silva
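One simple statistics-based relaxation strategy of the kind the abstract mentions is to drop the query term most likely responsible for the empty result set, for instance the term with the lowest document frequency, and retry. A minimal sketch with hypothetical terms and frequency counts (the talk's actual heuristics may differ):

```python
def relax_query(terms, doc_freq, min_terms=1):
    """Query relaxation by term statistics: if a query returns zero
    results, drop the rarest term (lowest document frequency) and retry
    the remaining query, never going below min_terms."""
    if len(terms) <= min_terms:
        return terms
    rarest = min(terms, key=lambda t: doc_freq.get(t, 0))
    return [t for t in terms if t != rarest]

# Hypothetical document frequencies from the index
doc_freq = {"iphone": 5000, "13": 3000, "sapphire": 12}
relaxed = relax_query(["iphone", "13", "sapphire"], doc_freq)
# -> ["iphone", "13"]
```

Unlike silently falling back to vector search, the relaxed query can be shown to the user ("showing results for: iphone 13"), keeping the result list explainable.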
|Mastering Hybrid Search: Blending Classic Ranking Functions with Vector Search for Superior Search Relevance
Building on the bedrock of classic search and ranking systems based on established algorithms like BM25, the advent of vector search has opened doors to building hybrid search systems that deliver highly relevant results. Though the ranking algorithms in Lucene-based products are powerful, they sometimes come up short, leading to low recall, zero results, or poor relevancy. In this session, we'll explore the use of unstructured data such as images and text for fuzzy searching and enhanced search relevancy. We'll discuss best practices for pre/post metadata filtering, as well as defining a hybrid ranking that combines BM25 scores with scores from vector-based algorithms such as HNSW. Additionally, we will delve into the complexity of moving from research to production, achieving real-time performance at scale while managing the trade-off between performance and accuracy. We invite participants to join us in exploring the combined might of traditional and vector search paradigms.
Ohad Levi, Hyperspace
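Because BM25 scores and vector similarities live on incompatible scales, a common way to define the hybrid ranking the abstract mentions is min-max normalization per result set followed by a weighted sum. A minimal sketch; the document IDs, scores, and the 0.7 lexical weight are illustrative, not the speaker's recipe:

```python
def hybrid_score(bm25_scores, vector_scores, alpha=0.5):
    """Combine per-document BM25 and vector-similarity scores.

    Each score set is min-max normalized into [0, 1], then blended with
    weight alpha on the lexical side and (1 - alpha) on the vector side.
    Documents missing from one set contribute 0 for that component.
    """
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero on flat scores
        return {d: (s - lo) / span for d, s in scores.items()}

    b, v = normalize(bm25_scores), normalize(vector_scores)
    docs = set(b) | set(v)
    return {d: alpha * b.get(d, 0.0) + (1 - alpha) * v.get(d, 0.0)
            for d in docs}

scores = hybrid_score({"d1": 12.0, "d2": 7.0},   # raw BM25 scores
                      {"d1": 0.2, "d2": 0.9},    # raw cosine similarities
                      alpha=0.7)
# -> {"d1": 0.7, "d2": 0.3}
```

The weight alpha is the main tuning knob; an alternative that avoids tuning entirely is reciprocal rank fusion, covered in a separate talk at this event.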
|Search Engines: Combining Inverted and ANN Indexes for Scale
Search engines have traditionally employed inverted indexes to quickly filter documents. With the rise of vector embeddings and large language models, search engines are now adding ANN indexes. Combining inverted indexes and ANN indexes in the same system introduces a number of implementation challenges, including:
* How to handle the large amount of RAM required to hold vector data and index structures
* How to distribute an ANN graph across multiple shards and avoid expensive reindexing
* How to update vector embeddings or metadata quickly
* How to avoid contention between heavy indexing and vector search
We will discuss these challenges and how to elegantly design a system that can efficiently leverage multiple indexes in parallel for hybrid search. We’ll also discuss how combining traditional and new approaches to search can yield an even better result than using two different database solutions.
Anubhav Bindlish, Rockset
|A Practical Approach for Few Shot Learning with SetFit for Scaling Up Search and Relevance Ranking on a Large Text Database
SetFit (Sentence Transformer Fine-tuning) is a recently proposed few-shot learning technique that has achieved state-of-the-art results for multiple classification problems in label-scarce settings, even outperforming GPT-3 in many cases. For learning to rank, SetFit may be especially valuable when few training samples are available. In our case study, we collected a small dataset in the legal research domain, consisting of real-world search queries along with relevant and irrelevant results, manually annotated by lawyers and law students. We then trained a model using the SetFit technique and generated embeddings for a larger dataset for semantic searching. We also trained a ranking model using SetFit and compared the results with other approaches for the same language and the legal domain. In this talk, we present SetFit and its application to ranking, and we discuss the results of our experiments.
Fernando Vieira da Silva, N2VEC
|Evaluating embedding based retrieval beyond historical search results
Embedding-based retrieval (EBR; a.k.a. vector search) provides an efficient implementation of semantic search and has seen wide adoption in e-commerce. While EBR models are often trained on historical user-engagement data that signifies query-item relevance, to select A/B test candidates we need independent metrics that predict the relevance of the recall with which EBR expands null and low-result searches in production. This is because for null and low-result queries, an item's low engagement history might reflect not its low relevance, but the failure of the existing search engine. This talk presents a number of ways to leverage organizational knowledge to evaluate the quality of query and item embeddings, and how well EBR built on top of them can improve relevance while expanding the recall.
Yu Cao, eBay
Short talks on a variety of subjects
Max Nigri, Artem Lukanin, Daniel Wrigley, Maximilian Werk, Marcos Rebelo, Roman Grebennikov, Lucian Precup
|Closing day 2