Smart Recall: Enhancing Local LLM Conversations with Embedding-Aware Context Retrieval

Lucas Jeanniot • Location: TUECHTIG • Haystack EU 2024

How can you make your local LLM feel less forgetful? This session introduces a practical service architecture for improving contextual continuity in chat applications using locally stored conversation history. We’ll walk through a Python-based approach that dynamically retrieves and rewrites prior turns based on semantic similarity, combining embeddings, token limits, and summarisation to give your model a relevant memory window. Attendees will learn how to structure past interactions, filter them for importance, and integrate efficient recall mechanisms so that local LLMs stay coherent, concise, and contextually aware.
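The abstract does not spell out the talk's implementation, so the following is only a minimal sketch of the general idea: score past turns against the new query, keep the most similar ones, fall back to a shortened version when a turn would overflow the token budget, and hand the survivors to the model in chronological order. It assumes a toy bag-of-words vector in place of a real embedding model, whitespace word counts in place of a real tokenizer, and simple truncation as a stand-in for summarisation. The names `Turn`, `build_memory_window`, `min_score`, and `summary_tokens` are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # e.g. "user" or "assistant"
    text: str


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def count_tokens(text: str) -> int:
    """Rough token estimate (assumption): whitespace-separated words."""
    return len(text.split())


def compress(text: str, max_tokens: int) -> str:
    """Placeholder for summarisation (assumption): keep the first max_tokens words."""
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens]) + " ..."


def build_memory_window(history: list[Turn], query: str, token_budget: int,
                        min_score: float = 0.3, summary_tokens: int = 8) -> list[Turn]:
    """Select past turns relevant to `query` that fit within `token_budget`."""
    q = embed(query)
    scored = [(cosine(q, embed(t.text)), i, t) for i, t in enumerate(history)]
    # Most similar turns first; discard anything below the similarity threshold.
    ranked = sorted((s for s in scored if s[0] >= min_score),
                    key=lambda s: s[0], reverse=True)
    selected: list[tuple[int, Turn]] = []
    used = 0
    for _, idx, turn in ranked:
        text = turn.text
        if used + count_tokens(text) > token_budget:
            # Too long to include verbatim: try the shortened version instead.
            text = compress(text, summary_tokens)
        cost = count_tokens(text)
        if used + cost > token_budget:
            continue  # still does not fit, skip this turn
        selected.append((idx, Turn(turn.role, text)))
        used += cost
    # Restore chronological order so the model sees turns as they happened.
    return [t for _, t in sorted(selected, key=lambda p: p[0])]
```

In a real service, `embed` would call a sentence-embedding model, `count_tokens` would use the LLM's own tokenizer, and `compress` would ask the model (or a smaller one) for a summary. Ranking by similarity but emitting in chronological order is a design choice: relevance decides what survives the budget, while the original order keeps the recalled context readable.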


Lucas Jeanniot

Eliatra

Lucas is a machine learning engineer at Eliatra. Driven by curiosity and a passion for data analysis, he enjoys the challenge of crafting efficient algorithms and refining data pipelines to make sense of complex information. With a practical focus, he likes applying technology to optimize processes and improve decision-making in search and retrieval applications.