Haystack EU 2024

Talks from the Search & Relevance Community at the Haystack Conference!

Below is our draft schedule, which is subject to change.

Please read our Event Safety, Code of Conduct and COVID-19 Policy.

Our venue is TUECHTIG, Oudenarder Straße 16, 13347 Berlin.

Monday, September 30th, 2024

Time (CET) Track 1
8:00am-9:00am Registration
9:00am-9:15am Introduction and Welcome
9:15am-10:00am Keynote - AI-Powered Search: Navigating the Evolving Lexicon of Information Retrieval

The field of information retrieval is evolving at a rapid pace, becoming a cornerstone of the generative AI space and absorbing different, yet familiar, techniques and terminology from previously tangential areas of the AI world. Search engines are becoming vector databases, and vector databases (and even traditional SQL and NoSQL databases) are becoming search engines. In the process, we’re embracing new terminology like RAG, bi-encoders, cross-encoders, multimodal search, hybrid search, dense/sparse vectors and representations, contextualized late interaction, and quantization. But are these all new concepts, and if not, how do we mentally compare them to previous tried-and-true techniques like learning to rank (vs cross-encoders), lexical search (vs sparse vector search), semantic search (vs search over embeddings), collaborative filtering (vs latent behavior embeddings), approximate nearest neighbors (vs quantization), knowledge graphs (vs foundation models), and dense vector search (vs bi-encoders)? Pulling from his experience writing (and frequently updating!) the newly-released book AI-Powered Search, Trey Grainger will provide a survey of the evolving landscape of search and relevance, highlighting how our traditional search toolbox and terminology are expanding in exciting ways and discussing what’s coming on the frontier of search and AI.

Trey Grainger, Searchkernel

10:00am-10:45am Leveraging User Behavior Insights to Enhance Search Relevance

Even with all the advancements in search engine technology, from vector DBs to LLMs, measuring search holistically is still a problem with no standard solution for search practitioners. To test different search strategies, you must capture user clickstream data to measure the impact on end-user experience and inform decisions on the optimal search strategy. By understanding factors like search queries, result clicks, and subsequent searches, the search system can be continuously optimized to better match user intent and deliver more relevant results. Existing clickstream solutions are generic and not tailor-made for search use cases. User Behavior Insights (UBI) enables the collection and analysis of rich user search interactions, from queries to result engagements. This talk will demonstrate how organizations can leverage UBI dashboards and integrate insights into their search process to continuously optimize search relevance for their end users.

Aswath N Srinivasan, AWS @ OpenSearch
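
For a flavor of what such instrumentation looks like, here is a minimal sketch of UBI-style logging. The field names and the `log_query`/`log_click` helpers are illustrative stand-ins, not the exact OpenSearch UBI schema; consult the UBI spec for the real one.

```python
import json
import time
import uuid

# Hypothetical sketch: log each query once, then log every user interaction
# tied back to it via query_id so downstream dashboards can join them.
def log_query(user_query: str) -> str:
    query_id = str(uuid.uuid4())
    record = {"query_id": query_id, "user_query": user_query,
              "timestamp": time.time()}
    print(json.dumps(record))   # in practice: index into a UBI queries store
    return query_id

def log_click(query_id: str, object_id: str, position: int) -> None:
    event = {"action_name": "click",
             "query_id": query_id,  # joins the event to the originating query
             "event_attributes": {"object": {"object_id": object_id},
                                  "position": {"ordinal": position}}}
    print(json.dumps(event))    # in practice: index into a UBI events store

qid = log_query("wireless headphones")
log_click(qid, "sku-1234", position=2)
```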

10:45am-11:00am Break
11:00am-11:45am Re-Thinking Re-Ranking

Re-ranking systems often rely on a 'cascading' approach, where an initial set of documents is re-sorted to create a final result list. However, this method has a critical flaw: it can miss the most relevant documents by filtering them out too early, which reduces recall and hampers overall performance. In this talk, I will share an alternative to the cascading approach that brings in additional relevant documents during the re-ranking process. This technique improves the efficiency and effectiveness of both vector search and heavyweight re-rankers. I'll also discuss ongoing efforts to bring this technique from academia to industry. Cascading is dead, long live re-ranking!

Sean MacAvaney, University of Glasgow
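
As a rough illustration of the idea (a sketch, not the speaker's exact algorithm), the following alternates between scoring documents from the initial ranking and pulling in unscored neighbours of the best documents found so far. It assumes a precomputed corpus graph of document neighbours and a hypothetical `score(query, doc_id)` re-ranker.

```python
import heapq

def adaptive_rerank(query, initial_ranking, neighbours, score,
                    budget=100, batch=10):
    frontier = list(initial_ranking)   # candidates from first-stage retrieval
    scored = {}                        # doc_id -> re-ranker score
    while frontier and len(scored) < budget:
        current, frontier = frontier[:batch], frontier[batch:]
        for doc_id in current:
            if doc_id not in scored:
                scored[doc_id] = score(query, doc_id)
        # pull in neighbours of the best documents scored so far, giving
        # documents the first stage missed a chance to enter the pool
        best = heapq.nlargest(batch, scored, key=scored.get)
        for doc_id in best:
            frontier.extend(d for d in neighbours.get(doc_id, [])
                            if d not in scored)
    return sorted(scored, key=scored.get, reverse=True)
```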

11:45am-12:30pm Learning-To-Rank Framework - How to Train an Army of Models?

At OLX, we faced a significant challenge: the need to experiment with rankings across many countries and categories, totaling over 100 combinations. This became a bottleneck, hindering our Data Scientists from progressing in their work. Given our limited human resources, it was imperative to seek more automated solutions. We implemented a Learning-to-Rank (LTR) framework that allows our Data Scientists to adjust configurations and select the precise features and targets to optimize. The resulting models are stored in a Model Store, from which they can be deployed for A/B ranking tests.

Marcin Gumkowski, OLX Group & Catarina Gonçalves, OLX Group

12:30pm-1:30pm Lunch
1:30pm-2:15pm What You See Is What You Search: Vision Language Models for PDF Retrieval

Extracting information from complex document formats like PDFs usually involves a multi-step process, including text extraction, OCR, layout analysis, chunking, and embedding. This extraction process is resource-intensive, and the quality can vary, resulting in poor retrieval quality (garbage-in, garbage-out). ColPali, a newly proposed retrieval model, presents a more efficient alternative using Vision Language Models (VLMs) to embed entire PDF pages, including text, figures, and charts. The resulting contextualized multi-vector representations of the PDF page improve retrieval quality while simplifying the extraction and indexing process. This talk introduces ColPali, how to represent ColPali in Vespa, and ColPali's superior performance on the Visual Document Retrieval (ViDoRe) Benchmark.

Jo Kristian Bergum, Vespa.ai
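
To make the "contextualized multi-vector" idea concrete, here is a toy numpy sketch of the late-interaction (MaxSim) scoring such representations use: each query token embedding is matched against its best page-patch embedding, and the per-token maxima are summed. Shapes and data below are made up; real ColPali embeddings come from the VLM.

```python
import numpy as np

def maxsim(query_vecs: np.ndarray, page_vecs: np.ndarray) -> float:
    # query_vecs: (num_query_tokens, dim); page_vecs: (num_patches, dim)
    sims = query_vecs @ page_vecs.T        # token-to-patch dot products
    return float(sims.max(axis=1).sum())   # best patch per query token, summed

query = np.random.randn(8, 128)                         # 8 query token embeddings
pages = [np.random.randn(1030, 128) for _ in range(3)]  # 3 embedded PDF pages
ranking = sorted(range(len(pages)),
                 key=lambda i: maxsim(query, pages[i]), reverse=True)
```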

2:15pm-3:00pm Building a Multimodal LLM-Based Search Assistant Chatbot to Enhance Housing Search

QuintoAndar Group is the largest housing platform in Latin America, leveraging cutting-edge AI technologies to streamline the housing search process, reducing paperwork and increasing accessibility. To elevate our users' experience even further, we developed a groundbreaking search experience by adopting contrastive vision-language models and Large Language Models (LLMs) to build a multimodal chat-based search assistant. We adopted contrastive models as embedding generators to enable multimodal search capabilities, and utilized LLMs to interpret and extract user preferences within a conversational interface. In this presentation, we share our insights and experiences from this development process. We discuss the adopted architecture and how we reconcile good engineering practices while working with LLMs. We also discuss the integration of our traditional search mechanism with multimodal capabilities and explain how we use a chatbot to guide this enhanced search assistant.

Tetiana Torovets, QuintoAndar & Giulio Santo, QuintoAndar & Lucas Cardozo, QuintoAndar
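
As a flavor of the contrastive approach (a sketch only; the model choice and data are stand-ins, not necessarily what the team used), an off-the-shelf CLIP model can embed listing photos and LLM-extracted user preferences into one space:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")   # maps images and text into one space

# blank images as stand-ins for real listing photos
listing_images = [Image.new("RGB", (224, 224), color=c) for c in ("white", "gray")]
image_embs = model.encode(listing_images)

# an LLM-extracted preference can be embedded as text and matched
# directly against the listing photos
query_emb = model.encode("bright kitchen with an island")
scores = util.cos_sim(query_emb, image_embs)
```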

3:00pm-3:15pm Break
3:15pm-4:00pm Evaluating E-commerce and Marketplace Search: User Perception vs. Business Metrics

Precision and recall have long been recognized as a fundamental trade-off in search. Especially in e-commerce and marketplaces, search optimization often involves finding the right balance between these two concepts. This talk will explore how user perception of search quality (UX metrics) and business performance metrics (e.g., conversions) correlate with precision and recall, making it challenging for search practitioners to optimally tune search relevance. In particular, we will illustrate this problem using the concept of query specificity and examine how different modern techniques, such as vector search and sparse expansions, compare to traditional lexical search. Finally, we will explore possible solutions to tackle this issue, such as simple UI changes, demonstrating that search is not solely about having the best retrieval or ranking algorithms.

Julien Meynet, Wallapop
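
For reference, the two metrics the talk builds on, computed for a single query over illustrative ID sets:

```python
# precision: fraction of retrieved results that are relevant
# recall: fraction of relevant results that were retrieved
retrieved = {"p1", "p2", "p3", "p4"}
relevant = {"p2", "p4", "p9"}
precision = len(retrieved & relevant) / len(retrieved)   # 2/4 = 0.50
recall = len(retrieved & relevant) / len(relevant)       # 2/3 ≈ 0.67
```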

4:00pm-4:45pm Unlock NextGen Product Search with ML and LLM Innovations

In the realm of search, Machine Learning plays a pivotal role in enhancing the user experience throughout the entire lifecycle, from ingesting documents to delivering highly relevant results for user queries. This session will showcase various ML integrations tailored to optimize outcomes for user queries in a retail scenario, accompanied by a live demonstration. We will explore cutting-edge techniques such as query understanding and rewriting using Large Language Models (LLMs), document enrichment, sparse, dense, and hybrid retrievers, as well as contextual re-ranking of results. Discover how to harness the power of LLM agents to dynamically select the most suitable retriever for each user query, with an LLM acting as a proxy evaluator, providing feedback on the results at every iteration. This innovative approach aims to significantly improve overall retrieval quality without hurting search latency, by adopting semantic caching to reduce the number of LLM calls.

Hajer Bouafif, Amazon & Praveen Mohan Prasad, Amazon

4:45pm-5:00pm Closing day 1
5:30pm Haystack Europe Social (included with registration)

All attendees are welcome. The location is Vagabund Brauerei Kesselhaus just behind TUECHTIG - there will be drinks and pizza available plus perhaps some games!

Tuesday, October 1st, 2024

Time (CET) Track 1
9:00am-9:15am Introduction and Welcome Back
9:15am-10:00am Nixiesearch: running Lucene over S3, and why we are building our own serverless search engine

Is your search cluster stuck in 'status: red' due to over-complicated maintenance? Are modern vector databases still falling short in solving your real-world search problems? You’re not alone. Companies like Uber, Doordash, Amazon, and Yelp have turned to running their search backends on Lucene over S3 for its simplicity and reliability. We're introducing Nixiesearch, an open source Lucene-based search engine designed for operational simplicity—where nodes are stateless and all index data is stored on S3. Nixiesearch offers the full range of familiar Lucene features (such as filters, facets and suggestions), while also having “AI batteries included”, such as RAG and text+image embeddings handled by the engine itself. In this talk, we'll dive into the design trade-offs we made between simplicity and complexity, and explore why Nixiesearch might (or might not) be a good fit for your search needs.

Roman Grebennikov

10:00am-11:00am Women of Search Present: The Life of a Search System

In a tech landscape that would have been deemed science fiction just a few years ago, terms like 'vector search' and 'LLMs' have become as commonplace as 'coffee' and 'good morning' in our daily conversations. Being part of this rapidly evolving community is exhilarating, yet it often feels like everyone but you is already leveraging the latest technology. If your aim is to build the most intelligent, efficient, and cost-effective search system, where do you begin? Do you dive into the cutting edge, start from the basics, or find a balance somewhere in between? To explore this question, Women of Search will present The Life of a Search System. This talk will be anchored in real-world examples from our community, showcasing how today's search systems have evolved—and should continue to evolve—to meet modern expectations. Together, we will cut through the AI jargon and focus on practical steps to start building.

Elzbieta Jakubowska, Independent

11:00am-11:15am Break
11:15am-12:00pm Exploring Vector Search at Scale

Milvus is an open-source vector database built to power Gen AI solutions. 80% of the world's data is unstructured, and vector databases are what help you extract valuable insights from it. With this in mind, we built Milvus as a distributed system on top of other open-source solutions, including MinIO and Kafka, to support vector collections that exceed billion scale. This session will explore the architecture decisions that make vector search possible at billion scale. We will talk about the different indexes that are needed, why being distributed is important, and what tweaks are needed to achieve such a scale. The talk will also include a live demo to showcase the capability of vector search at scale.

Stephen Batifol, Milvus / Zilliz
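
For context, the developer-facing side of such a system is deliberately small. A quick pymilvus sketch (collection name, dimension, and data below are placeholders):

```python
import numpy as np
from pymilvus import MilvusClient

# Milvus Lite: a local, file-backed instance for experimentation;
# a production deployment would point at the distributed cluster instead
client = MilvusClient("milvus_demo.db")
client.create_collection(collection_name="docs", dimension=384)

vectors = np.random.rand(1000, 384).tolist()
client.insert("docs", [{"id": i, "vector": v} for i, v in enumerate(vectors)])

hits = client.search("docs", data=[vectors[0]], limit=5)
```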

12:00pm-12:45pm Scaling vector search into production without breaking the bank: Vector Quantization and Adaptive Retrieval

Everybody loves vector search, but... the problem is that prod-level deployment requires boatloads of CPU and GPU compute. The bottom line is that, if deployed incorrectly, vector search can be prohibitively expensive compared to classical alternatives. I’ll talk about optimizations that'll allow you to perform real-time billion-scale vector search on your laptop! The solution: quantizing vectors and performing adaptive retrieval. These techniques allow you to balance and tune memory costs, latency, and retrieval recall very reliably.

Zain Hasan, Weaviate
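
A minimal numpy sketch of the two ideas, on random data: binary quantization shrinks each vector 32x, and adaptive retrieval over-fetches candidates with the cheap binary codes before rescoring a small set with the full-precision vectors.

```python
import numpy as np

docs = np.random.randn(100_000, 768).astype(np.float32)
docs_bin = np.packbits(docs > 0, axis=1)   # 768 floats (3 KB) -> 96 bytes

def search(query, k=10, rescore_factor=4):
    q_bin = np.packbits(query > 0)
    # Hamming distance via XOR + popcount over the packed binary codes
    dists = np.unpackbits(docs_bin ^ q_bin, axis=1).sum(axis=1)
    candidates = np.argpartition(dists, k * rescore_factor)[: k * rescore_factor]
    # rescore only the candidates with exact full-precision dot products
    exact = docs[candidates] @ query
    return candidates[np.argsort(-exact)][:k]

results = search(np.random.randn(768).astype(np.float32))
```

The `rescore_factor` knob is exactly the memory/latency/recall trade-off the abstract describes: fetch more binary candidates for higher recall, fewer for lower latency.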

12:45pm-1:45pm Lunch
1:45pm-2:30pm Relevance Proof in Yelp Search: LLM-Powered Annotations

In Yelp Search, we heavily rely on user reviews when choosing relevant search results, and we've also incorporated other relevant business information into our search system. Search result annotations are a key accompaniment to the results, as 'review highlights' annotations can explain to the user why a business is relevant to their intent. We use LLM expansions to power these annotation use cases, while also leveraging our existing search index and highlighting functionality. In this talk, we’ll discuss the challenges we faced in building these annotations, including incorporating LLM outputs in our retrieval system, scaling up to 100% of traffic, and other difficulties in dealing with large amounts of textual data. Lastly, we’ll explain how we re-purposed our intelligent annotation system to create a new endpoint for internal RAG applications.

Pallavi Patil, Yelp

2:30pm-3:15pm Boosting LLM accuracy with Entity Resolution-based RAG

Enterprises are increasingly looking to run Large Language Models (LLMs) on private, internal data that LLMs have never seen and which must remain confidential. Retrieval Augmented Generation (RAG) enhances LLMs with data from specific, controlled sources. Typically, RAG uses vector databases, which excel at retrieving information from unstructured data. However, for structured data like customer records, a different RAG approach may be better. In this talk, we introduce Entity RAG, which uses real-time entity resolution to provide LLMs with accurate, unified data about real-world entities such as customers, companies, and products.

Steven Renwick, Tilores
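
In outline, the flow is: resolve the entity mentioned in the question first, then ground the LLM on the unified record. The sketch below is hand-wavy; `resolve_entity` and `llm` are hypothetical stand-ins, not the Tilores API.

```python
import json

def resolve_entity(name: str) -> dict:
    # real-time entity resolution would merge duplicate records across
    # source systems; here we fake a unified customer record
    return {"name": name, "accounts": ["ACC-1", "ACC-7"], "status": "active"}

def entity_rag(question: str, entity_name: str, llm) -> str:
    record = resolve_entity(entity_name)
    prompt = ("Answer using only this unified customer record:\n"
              f"{json.dumps(record, indent=2)}\n\nQuestion: {question}")
    return llm(prompt)
```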

3:15pm-3:30pm Break
3:30pm-4:00pm You are only as good as your embeddings - how to train high-quality models for production vector search

This talk covers the practicalities of training embedding models for production (multi-modal) vector search, spanning data, training, and evaluation. More specifically, the talk will cover:

- How to optimize model training for business objectives by leveraging historic search-result interactions.
- The importance of data quality, including how to think about query-document mixes, duplicates, and query coverage.
- Leveraging existing search results for behavior retention and model regularization.
- Adapting strategies from recommendation systems, such as bias terms, linear re-ranking, and query-result interaction matrices.
- Loss functions, base models for fine-tuning, and key hyperparameter considerations.
- Production-aware training techniques, including optimization for vector databases, vector fusing, and binary/truncation-aware training.
- Efficient updating without re-indexing.
- Transitioning from offline to online A/B testing, with a focus on novelty-based splits.

Robertson Taylor, Marqo
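
As one concrete starting point for the first bullet above (a sketch with sentence-transformers; the base model and the query-click pairs are placeholders), historic query-result interactions can drive an in-batch-negatives fine-tune:

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")   # base model to adapt

# (query, clicked_result) pairs mined from search logs; placeholders here
pairs = [("red running shoes", "Nike Pegasus 40 road running shoe"),
         ("waterproof hiking boots", "Salomon Quest 4 GTX boot")]
train_data = [InputExample(texts=[q, d]) for q, d in pairs]
loader = DataLoader(train_data, shuffle=True, batch_size=32)

# MultipleNegativesRankingLoss treats other in-batch documents as negatives,
# a common starting point before mining harder negatives
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
```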

4:00pm-4:45pm Lightning Talks

Short talks on a variety of subjects; sign up on the day.

Various speakers, Various companies

4:45pm-5:00pm Closing day 2