Haystack EU 2025
Talks from the Search & Relevance Community at the Haystack Conference!
Below is our draft schedule, which is subject to change.
Please read our Event Safety, Code of Conduct and COVID-19 Policy.
Our venue is TUECHTIG, Oudenarder Straße 16, 13347 Berlin.
Tuesday, September 23rd, 2025
Time (CET) | Track 1 |
---|---|
8:00am-9:00am | Registration |
9:00am-9:15am | Introduction and Welcome |
9:15am-10:00am |
Keynote
The new age of AI-fueled automation and experiences has reframed the search value story, and the value of your work. Not long ago, search and automation tools were commodities, and improvement budgets were limited to only the top use cases. The rapid improvement of LLMs, though, has radically changed the ROI model for search relevance. Today, the long tail of use cases is within reach, and the addition of agentic users is pushing old value propositions to new highs. Through this, the core value of search has been unlocked and multiplied. David Louis Hollembaek, Veeva Systems |
10:00am-10:45am |
From BM25 to Mixture-of-Encoders: Evaluations for Next-Gen Search and Retrieval Systems
Modern user queries require a mix of structured and unstructured data to achieve satisfactory retrieval performance, and this is where traditional search methods fall short. In this talk, we dive into retrieval evaluation, comparing keyword, vector, hybrid, and late-interaction models with Superlinked’s mixture-of-encoders approach. We examine how each approach fares in real-world scenarios (e.g. a query for “5 guests under $200 with 4.8+ rating”). Using benchmark datasets and real production use cases, we share metrics, evaluation methodology, and common pitfalls. We introduce Superlinked’s mixture-of-encoders approach, where dedicated encoders for different data types (text, numbers, and categories), combined with LLM-driven query understanding, enable more accurate and scalable retrieval. Finally, we discuss how to productionize this system and share use cases from travel to e-commerce, pointing toward the future of multi-attribute, metadata-aware embedding search. Filip Makraduli, Superlinked |
10:45am-11:00am | Break |
11:00am-11:45am |
How we scaled an internal GenAI platform at Bayer to over 60,000 users supporting hundreds of use cases
At Bayer, we started building an internal GenAI platform called myGenAssist a little over two years ago. What started out as a small prototype has now grown into a global platform used by tens of thousands of employees every month across Bayer and its subsidiaries. In this presentation, I will explain the advanced features of the platform (research agents connected to many internal and external data sources, a low-code workflow automation engine, and more), how we were able to scale it, and the challenges we faced along the way. I will offer insights into the technical architecture, design decisions, and considerations, as well as the non-technical factors influencing the success or failure of a product like this. Hendrik Hogertz, Bayer |
11:45am-12:30pm |
Billion-scale hybrid retrieval in a single query
Vector databases have become the de facto solution for embedding-based retrieval, but they reveal their limits as users realize that similarity is not relevance. As a workaround, current solutions offer "hybrid retrieval" implemented as separate queries on disjoint indexes with late fusion of partial results based on ranks or scores. In this talk, we present a fundamentally different model for billion‑scale hybrid retrieval, built from the ground up to address these challenges. Our storage format and query engine were designed to unify dense and sparse vectors, keywords, filters, and user‑defined scoring functions into a single distributed query, without relying on separate indexes and late fusion. This approach gives us a flexible query language that enables search practitioners to optimize relevance in their respective domains without having to manage and sync multiple data stores. Marek Galovic, TopK, Inc. |
12:30pm-1:30pm | Lunch |
1:30pm-2:15pm |
Evolving DeepSearch Agent: Iterative enhancements for scalable, high-recall enterprise search
Search in enterprise settings is uniquely challenging: documents are siloed, context is implicit, and user queries are often ambiguous. At Box, we built the DeepSearch Agent, a multi-step, agentic retrieval system powered by LangGraph, to address these complexities. Unlike traditional search systems, DeepSearch orchestrates embedding- and keyword-based retrieval, LLM-based re-ranking, semantic filtering, and metadata extraction in a dynamic, modular flow. Each component was iteratively improved and deployed in production, enabling rapid experimentation and measurable improvements in answer recall. This talk walks through our journey evolving DeepSearch from static rerankers to a fully agentic system, including how we built evaluation loops, debug tools, and fallback strategies to make it reliable at scale. Attendees will leave with architecture patterns and practical lessons for building composable, AI-powered retrieval systems tailored to real-world enterprise use cases. Roy Shubhro, Box |
2:15pm-3:00pm |
Agentic search tuning: faster and better
Getting good results from a search engine is hard. Too hard. We all know the virtuous circle: search algorithms, then search measurement and analysis, then search tuning, and back to search algorithms. New algorithms are available from search vendors, and there are more and more tools for measurement and analysis (OpenSearch UBI, OpenSource Connections Quepid, OpenSearch Search Relevance Workbench). But experimentation is slow and tuning is still manual. Now, though, by taking advantage of LLM-based agents combined with interleaved A/B testing, we can automate the process, making it faster and more accurate. Building on an agentic infrastructure, we create a collection of agents, each specialized in a particular business problem and incorporating a variety of search strategies. The agents not only create tests and evaluate them, but also orchestrate their deployment. Stavros Macrakis, OpenSearch @ AWS |
3:00pm-3:15pm | Break | |
3:15pm-4:00pm |
Improving Relevance in RAG - Lessons learned from the LiveRAG challenge
Any successful Retrieval-Augmented Generation (RAG) system hinges on its ability to retrieve relevant context. This talk presents key lessons learned from our participation in this year's SIGIR LiveRAG challenge, where we were tasked with building a RAG system to generate accurate answers under time constraints. We will deconstruct the RAG pipeline we developed, focusing on our journey to improve the retrieval component. Specifically, we'll highlight how synthetic data generation was crucial in the evaluation process. Beyond the theoretical, we'll also share practical insights into the engineering challenges of running a large-scale RAG system locally and the solutions we implemented to overcome them. This talk is for anyone interested in building more robust and reliable RAG applications. Matthias Krüger, OpenSource Connections |
4:00pm-4:45pm |
Harnessing AI to strengthen trustworthy information
Misinformation travels faster than ever, but AI can help us fight back. In this session, we share how we built a platform combining search, an intelligent assistant, and a Retrieval-Augmented Generation (RAG) system to support fact-checking and news classification. By empowering journalists and editors with actionable insights, we explore how AI can promote transparency, strengthen critical thinking, and rebuild trust in information. Lucian Precup, Adelean |
4:45pm-5:30pm |
Lightning Talks
Quick discussions about anything around search relevance! Various Speakers, Various Companies |
5:30pm | Closing day 1 |
5:30pm |
Haystack Europe Social (included with registration). All attendees are welcome. The location is Vagabund Brauerei Kesselhaus, just behind TUECHTIG; there will be drinks and snacks available, plus perhaps some games! |
Wednesday, September 24th, 2025
Time (CET) | Track 1 |
---|---|
9:00am-9:15am | Introduction and Welcome Back |
9:15am-10:00am |
Hybrid Image Search at Scale: Lessons in Accuracy, Latency, and Cost
When users can't describe what they want, they show it. Image search has emerged as a powerful way to capture user intent in e-commerce. But building a system that is accurate, fast, merch-oriented, and cost-effective at industrial scale is no easy feat. In this talk, we'll share how the Search & Publication team at Adeo (Leroy Merlin group) built a scalable hybrid image search engine serving over 10 million products, combining visual embeddings, textual signals, and knowledge-graph-enhanced metadata. We will share how we built a hybrid architecture combining image embeddings and LLM-based lexical search, backed by our Knowledge Graph, and our journey into vector quantization techniques (bf16, int8, int4, int2, BBQ in Elasticsearch) and how they impacted latency, precision, and cost trade-offs. François Gaillard, Adeo Services & Guilherme De Freitas Guitte, Adeo Services |
10:00am-10:45am |
Women of Search Present
tba, tba |
10:45am-11:00am | Break |
11:00am-11:45am |
From LLM-as-a-Judge to Human-in-the-Loop: Rethinking Evaluation in RAG and Search
Everyone’s using LLMs as judges. In this talk, we’ll explore techniques for LLM-as-a-judge evaluation in Retrieval-Augmented Generation (RAG) systems, where prompts, filters, and retrieval strategies create endless variations. But this raises the question: how do you evaluate the judges? Elo ratings in chess calculate the relative skill levels of players based on their game results, with higher ratings indicating stronger players. We introduce RAGElo, an Elo-style ranking framework that uses LLMs to compare outputs without needing gold answers, bringing structure to subjective judgments at scale. Then we showcase the integration of RAGElo into the Search Relevance Workbench, released in OpenSearch 3: a human-in-the-loop toolkit that lets you dig deep into search results, compare configurations, and spot issues that metrics miss. Together, these tools balance automation and intuition, helping you build better retrieval and generation systems with confidence. Fernando Rejon Barrera, Zeta Alpha & Daniel Wrigley, OpenSource Connections |
11:45am-12:30pm |
Beyond Keywords: Measuring Multimodal Search Quality
Modern search systems return both text and images, requiring evaluation beyond traditional keyword matching. Our approach leverages embedding models to assess search quality across all modalities simultaneously. We present an open-source tool that quantifies semantic relevance between queries and multimodal results, enabling developers to measure and improve search experiences. This framework captures nuanced relationships between text and visual content, providing actionable metrics for optimizing multimodal search systems. Philippe Bouzaglou, Vectra |
12:30pm-1:30pm | Lunch |
1:30pm-2:15pm |
Future-Proofing E-commerce Search Architecture for Conversational Commerce and Beyond
Tired of slow, irrelevant search results costing you conversions? We were too. This session pulls back the curtain on how we transformed our e-commerce search platform, improving hybrid search latency by triple digits and cutting infrastructure costs by 20% while simultaneously delivering improved customer experience. Discover our federated and distributed architecture, a powerful blend of precise keyword search and advanced semantic models, engineered to deliver highly relevant results even for complex or niche queries. We'll discuss how this architecture not only boosted conversions for challenging zero- and low-result sets but also created a flexible foundation for future innovations like RAG for conversational commerce. Learn the practical strategies and architectural patterns that allowed us to achieve these significant gains, enabling faster feature delivery and a measurable impact on our bottom line. Jens Kürsten, OTTO GmbH & Co. KGaA |
2:15pm-3:00pm |
Smart Recall: Enhancing Local LLM Conversations with Embedding-Aware Context Retrieval
How can you make your local LLM feel less forgetful? This session will introduce a practical service architecture for improving contextual continuity in chat applications using locally stored conversation history. We’ll walk through a Python-based approach that dynamically retrieves and rewrites prior turns based on semantic similarity, leveraging embeddings, token limits, and summarisation to provide relevant memory windows to your model. Attendees will learn how to structure past interactions, filter for importance, and integrate efficient recall mechanisms to ensure local LLMs stay coherent, concise, and contextually aware. Lucas Jeanniot, Eliatra |
3:00pm-3:15pm | Break |
3:15pm-4:00pm |
Commoditizing Inference: Why Your Query Language Should Speak AI
AI model inference is making its way into every modern search stack, powering semantic retrieval, result re-ranking, and text generation. In this talk, we’ll explore what it means to commoditize inference as a query-native primitive, just like filters or scoring functions. After a quick overview of Elasticsearch's inference APIs, we’ll walk through how inference can be invoked directly from the query DSL. We'll discuss the benefits of integrating these primitives into the query layer (simplicity, composability, and accessibility) as well as the tradeoffs compared to managing inference in your application code. Finally, we’ll introduce new inference primitives in Elasticsearch Query Language (ES|QL) (TEXT_EMBEDDING, COMPLETION, and RERANK) and show how they bridge the gap between the low-level control of code and the declarative expressiveness of a DSL. The session will be practical and example-driven, with plenty of examples for search practitioners and analysts. Aurelien Foucret, Elastic |
4:00pm-4:45pm |
Hybrid search: Lessons learned
Designing hybrid search solutions requires constant decision-making: it's a balancing act not only to find the right hardware resources, but also to choose what to research when the team's capacity is limited. In this talk we'll explain how we designed a system that works for both RAG and conventional search use cases. We'll zoom in on the reusable nature of a nested index structure, how to build hybrid queries, how to control latency, how to tune relevance for an LLM, how to manage costs, and other practical aspects that a search team has to deal with. This is also a story about rolling out a new way of information retrieval in an organization that has traditionally relied on keyword matching: hybrid search doesn't sell itself; it needs to be explained and well prototyped. Join this session, learn about our lessons learned, and walk away with practical strategies, real-world examples, and pitfalls to avoid when bringing hybrid search into production in enterprise environments. Tom Burgmans, Wolters Kluwer & Mohit Sidana, Wolters Kluwer |
4:45pm-5:00pm | Closing day 2 |