Haystack EU 2025
Talks from the Search & Relevance Community at the Haystack Conference!
Below is our draft schedule, which is subject to change.
Please read our Event Safety, Code of Conduct and COVID-19 Policy.
Our venue is TUECHTIG, Oudenarder Straße 16, 13347 Berlin.
Tuesday, September 23rd, 2025
Time (CET) | Track 1 |
---|---|
8:00am-9:00am | Registration |
9:00am-9:15am | Introduction and Welcome |
9:15am-10:00am |
Keynote
The new age of AI-fueled automation and experiences has reframed the search value story, and the value of your work. Not long ago, search and automation tools were commodities, and improvement budgets were limited to only the top use cases. The rapid improvement of LLMs, though, has radically changed the ROI model for search relevance. Today, the long tail of use cases is within reach, and the addition of agentic users is pushing old value propositions to new highs. Through this, the core value of search has been unlocked and multiplied. David Louis Hollembaek, Veeva Systems |
10:00am-10:45am |
From BM25 to Mixture-of-Encoders: Evaluations for Next-Gen Search and Retrieval Systems
Modern user queries require a mix of structured and unstructured data to achieve satisfactory retrieval performance, and this is where traditional search methods fall short. In this talk, we dive into retrieval evaluation, comparing keyword, vector, hybrid, and late-interaction models with Superlinked’s mixture-of-encoders approach. We examine how each approach fares in real-world scenarios (e.g. a query for “5 guests under $200 with 4.8+ rating”). Using benchmark datasets and real production use cases, we share metrics, evaluation methodology, and common pitfalls. We introduce Superlinked’s mixture-of-encoders approach, where dedicated encoders for different data types (text, numbers, and categories), combined with LLM-driven query understanding, enable more accurate and scalable retrieval. Finally, we discuss how to productionize this system and share use cases from travel to e-commerce, pointing toward the future of multi-attribute, metadata-aware embedding search. Filip Makraduli, Superlinked |
10:45am-11:00am | Break |
11:00am-11:45am |
How we scaled an internal GenAI platform at Bayer to over 60,000 users supporting hundreds of use cases
At Bayer, we started building an internal GenAI platform called myGenAssist a little over two years ago. What started out as a small prototype has now grown into a global platform used by tens of thousands of employees every month across Bayer and its subsidiaries. In this presentation, I will explain the advanced features of the platform (research agents connected to many internal and external data sources, a low-code workflow automation engine, and more), how we were able to scale it, and the challenges we faced along the way. I will offer insights into the technical architecture, design decisions, and considerations, as well as the non-technical factors influencing the success or failure of a product like this. Hendrik Hogertz, Bayer |
11:45am-12:30pm |
Billion-scale hybrid retrieval in a single query
Vector databases have become the de facto solution for embedding-based retrieval, but they reveal their limits as users realize that similarity is not relevance. As a workaround, current solutions offer "hybrid retrieval" implemented as separate queries on disjoint indexes with late fusion of partial results based on ranks or scores. In this talk, we present a fundamentally different model for billion‑scale hybrid retrieval, built from the ground up to address these challenges. Our storage format and query engine were designed to unify dense and sparse vectors, keywords, filters, and user‑defined scoring functions into a single distributed query, without relying on separate indexes and late fusion. This approach gives us a flexible query language that enables search practitioners to optimize relevance in their respective domains without having to manage and sync multiple data stores. Marek Galovic, TopK, Inc. |
12:30pm-1:30pm | Lunch |
1:30pm-2:15pm |
Evolving DeepSearch Agent: Iterative enhancements for scalable, high-recall enterprise search
Search in enterprise settings is uniquely challenging: documents are siloed, context is implicit, and user queries are often ambiguous. At Box, we built the DeepSearch Agent, a multi-step, agentic retrieval system powered by LangGraph, to address these complexities. Unlike traditional search systems, DeepSearch orchestrates embedding- and keyword-based retrieval, LLM-based re-ranking, semantic filtering, and metadata extraction in a dynamic, modular flow. Each component was iteratively improved and deployed in production, enabling rapid experimentation and measurable improvements in answer recall. This talk walks through our journey evolving DeepSearch from static rerankers to a fully agentic system, including how we built evaluation loops, debug tools, and fallback strategies to make it reliable at scale. Attendees will leave with architecture patterns and practical lessons for building composable, AI-powered retrieval systems tailored to real-world enterprise use cases. Roy Shubhro, Box |
2:15pm-3:00pm |
Agentic search tuning: faster and better
Getting good results from a search engine is hard. Too hard. We all know the virtuous circle: search algorithms, then search measurement and analysis, then search tuning, and back to search algorithms. New algorithms are available from search vendors, and there are more and more tools for measurement and analysis (OpenSearch UBI, OpenSource Connections Quepid, OpenSearch Search Relevance Workbench). But experimentation is slow and tuning is still manual. Now, though, by taking advantage of LLM-based agents combined with interleaved A/B testing, we can automate the process, making it faster and more accurate. Building on an agentic infrastructure, we create a collection of agents, each specialized in a particular business problem and incorporating a variety of search strategies. The agents not only create tests and evaluate them, but also orchestrate their deployment. Stavros Macrakis, OpenSearch @ AWS |
3:00pm-3:15pm | Break | |
3:15pm-4:00pm |
Improving Relevance in RAG - Lessons learned from the LiveRAG challenge
Any successful Retrieval-Augmented Generation (RAG) system hinges on its ability to retrieve relevant context. This talk presents key lessons learned from our participation in this year's SIGIR LiveRAG challenge, where we were tasked with building a RAG system to generate accurate answers under time constraints. We will deconstruct the RAG pipeline we developed, focusing on our journey to improve the retrieval component. Specifically, we'll highlight how synthetic data generation was crucial in the evaluation process. Beyond the theoretical, we'll also share practical insights into the engineering challenges of running a large-scale RAG system locally and the solutions we implemented to overcome them. This talk is for anyone interested in building more robust and reliable RAG applications. Matthias Krüger, OpenSource Connections |
4:00pm-4:45pm |
Harnessing AI to strengthen trustworthy information
Misinformation travels faster than ever, but AI can help us fight back. In this session, we share how we built a platform combining search, an intelligent assistant, and a Retrieval-Augmented Generation (RAG) system to support fact-checking and news classification. By empowering journalists and editors with actionable insights, we explore how AI can promote transparency, strengthen critical thinking, and rebuild trust in information. Lucian Precup, Adelean |
4:45pm-5:30pm |
Lightning Talks
Quick discussions about anything around search relevance! Various Speakers, Various Companies |
5:30pm | Closing day 1 |
5:30pm |
Haystack Europe Social (included with registration). All attendees are welcome. The location is Vagabund Brauerei Kesselhaus, just behind TUECHTIG; there will be drinks and snacks available, plus perhaps some games! |
Wednesday, September 24th, 2025
Time (CET) | Track 1 |
---|---|
9:00am-9:15am | Introduction and Welcome Back |
9:15am-10:00am |
Hybrid Image Search at Scale: Lessons in Accuracy, Latency, and Cost
When users can't describe what they want, they show it. Image search has emerged as a powerful way to capture user intent in e-commerce. But building a system that is accurate, fast, merch-oriented, and cost-effective at industrial scale is no easy feat. In this talk, we'll share how the Search & Publication team at Adeo (Leroy Merlin group) built a scalable hybrid image search engine serving over 10 million products, combining visual embeddings, textual signals, and knowledge-graph-enhanced metadata. We will share how we built a hybrid architecture combining image embeddings and LLM-based lexical search, backed by our Knowledge Graph, and our journey into vector quantization techniques (bf16, int8, int4, int2, BBQ in Elasticsearch) and how they impacted latency, precision, and cost trade-offs. François Gaillard, Adeo Services & Guilherme De Freitas Guitte, Adeo Services |
10:00am-10:45am |
Women of Search Present
tba, tba |
10:45am-11:00am | Break |
11:00am-11:45am |
From LLM-as-a-Judge to Human-in-the-Loop: Rethinking Evaluation in RAG and Search
Everyone’s using LLMs as judges. In this talk, we’ll explore techniques for LLM-as-a-judge evaluation in Retrieval-Augmented Generation (RAG) systems, where prompts, filters, and retrieval strategies create endless variations. But this raises the question: how do you evaluate the judges? Elo ratings in chess calculate the relative skill levels of players based on their game results, with higher ratings indicating stronger players. We introduce RAGElo, an Elo-style ranking framework that uses LLMs to compare outputs without needing gold answers, bringing structure to subjective judgments at scale. Then we showcase the integration of RAGElo into the Search Relevance Workbench, released in OpenSearch 3: a human-in-the-loop toolkit that lets you dig deep into search results, compare configurations, and spot issues that metrics miss. Together, these tools balance automation and intuition, helping you build better retrieval and generation systems with confidence. Fernando Rejon Barrera, Zeta Alpha & Daniel Wrigley, OpenSource Connections |
11:45am-12:30pm |
Beyond Keywords: Measuring Multimodal Search Quality
Modern search systems return both text and images, requiring evaluation beyond traditional keyword matching. Our approach leverages embedding models to assess search quality across all modalities simultaneously. We present an open-source tool that quantifies semantic relevance between queries and multimodal results, enabling developers to measure and improve search experiences. This framework captures nuanced relationships between text and visual content, providing actionable metrics for optimizing multimodal search systems. Philippe Bouzaglou, Vectra |
12:30pm-1:30pm | Lunch |
1:30pm-2:15pm |
Future-Proofing E-commerce Search Architecture for Conversational Commerce and Beyond
Tired of slow, irrelevant search results costing you conversions? We were too. This session pulls back the curtain on how we transformed our e-commerce search platform, improving hybrid search latency by triple digits and cutting infrastructure costs by 20% while simultaneously delivering improved customer experience. Discover our federated and distributed architecture, a powerful blend of precise keyword search and advanced semantic models, engineered to deliver highly relevant results even for complex or niche queries. We'll discuss how this architecture not only boosted conversions for challenging zero- and low-result sets but also created a flexible foundation for future innovations like RAG for conversational commerce. Learn the practical strategies and architectural patterns that allowed us to achieve these significant gains, enabling faster feature delivery and a measurable impact on our bottom line. Jens Kürsten, OTTO GmbH & Co. KGaA |
2:15pm-3:00pm |
Smart Recall: Enhancing Local LLM Conversations with Embedding-Aware Context Retrieval
How can you make your local LLM feel less forgetful? This session will introduce a practical service architecture for improving contextual continuity in chat applications using locally stored conversation history. We’ll walk through a Python-based approach that dynamically retrieves and rewrites prior turns based on semantic similarity, leveraging embeddings, token limits, and summarisation to provide relevant memory windows to your model. Attendees will learn how to structure past interactions, filter for importance, and integrate efficient recall mechanisms to ensure local LLMs stay coherent, concise, and contextually aware. Lucas Jeanniot, Eliatra |
3:00pm-3:15pm | Break |
3:15pm-4:00pm |
Commoditizing Inference: Why Your Query Language Should Speak AI
AI model inference is making its way into every modern search stack, powering semantic retrieval, result re-ranking, and text generation. In this talk, we’ll explore what it means to commoditize inference as a query-native primitive, just like filters or scoring functions. After a quick overview of Elasticsearch's inference APIs, we’ll walk through how inference can be invoked directly from the query DSL. We'll discuss the benefits of integrating these primitives into the query layer (simplicity, composability, and accessibility) as well as the tradeoffs compared to managing inference in your application code. Finally, we’ll introduce new inference primitives in Elasticsearch Query Language (ES|QL) (TEXT_EMBEDDING, COMPLETION, and RERANK) and show how they bridge the gap between the low-level control of code and the declarative expressiveness of a DSL. The session will be practical and example-driven, with plenty of examples for search practitioners and analysts. Aurelien Foucret, Elastic |
4:00pm-4:45pm |
Hybrid search: Lessons learned
Designing hybrid search solutions requires constant decision-making: it's a balancing act not only to find the right hardware resources, but also to choose what to research when the team's capacity is limited. In this talk we'll explain how we designed a system that works for both RAG and conventional search use cases. We'll zoom in on the reusable nature of a nested index structure, how to build hybrid queries, how to control latency, how to tune relevance for an LLM, how to manage costs, and other practical aspects that a search team has to deal with. This is also a story about rolling out a new way of information retrieval in an organization that has traditionally relied on keyword matching: hybrid search doesn't sell itself; it needs to be explained and well prototyped. Join this session, learn about our lessons learned, and walk away with practical strategies, real-world examples, and pitfalls to avoid when bringing hybrid search into production in enterprise environments. Tom Burgmans, Wolters Kluwer & Mohit Sidana, Wolters Kluwer |
4:45pm-5:00pm | Closing day 2 |