Better Semantic Search with Hybrid (Sparse-Dense) Search

Roie Schwaber-Cohen • Location: Theater 5 • Haystack 2023

Vector search has become increasingly popular, especially with the recent growth of dense embedding models. However, these models require large amounts of data for training and fine-tuning, which is a problem when data is scarce and domain-specific terminology is crucial. Before dense embedding models were widely used, keyword-based algorithms like TF-IDF and BM25, which produce sparse embeddings, were the go-to solutions. These algorithms perform well when the query contains the exact terms used in the documents, but they don’t allow us to query naturally, as we often don’t know the exact terms we’re looking for. Dense embeddings, on the other hand, allow us to search based on the intended "semantic meaning" rather than the exact term. Hybrid search aims to combine the strengths of both: the exact term matching of sparse embeddings and the semantic matching of dense embeddings. This approach has the potential to significantly improve the accuracy and usefulness of vector search in a wide range of situations. In this talk, we’ll learn how we can leverage hybrid search to build better semantic search applications.
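To make the idea of combining sparse and dense signals concrete, here is a minimal sketch in plain Python. It is not taken from the talk. It assumes one common way of blending the two scores, a weighted sum controlled by a hypothetical `alpha` parameter, and it uses raw term counts as a stand-in for TF-IDF or BM25 weights. The documents, the hand-written dense vectors, and all function names are illustrative assumptions.

```python
import math
from collections import Counter

def sparse_vector(text):
    """Toy sparse embedding: term counts (a stand-in for TF-IDF/BM25 weights)."""
    return Counter(text.lower().split())

def sparse_cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def dense_cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(dense_sim, sparse_sim, alpha=0.5):
    """Assumed blending rule: alpha=1.0 is dense only, alpha=0.0 is sparse only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * dense_sim + (1.0 - alpha) * sparse_sim

def rank(query_text, query_dense, docs, alpha=0.5):
    """Return document ids sorted by hybrid score, best first."""
    query_sparse = sparse_vector(query_text)
    scored = []
    for doc_id, (text, dense) in docs.items():
        score = hybrid_score(
            dense_cosine(query_dense, dense),
            sparse_cosine(query_sparse, sparse_vector(text)),
            alpha,
        )
        scored.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(scored, reverse=True)]

# Hypothetical corpus: the dense vectors are made up by hand, not model output.
docs = {
    "a": ("reset your password from the account page", [0.9, 0.1, 0.0]),
    "b": ("error E1234 means the disk is full", [0.1, 0.9, 0.2]),
}

# A domain-specific query whose (made-up) dense vector lands near the wrong document.
query_text = "E1234"
query_dense = [0.8, 0.3, 0.0]

dense_only = rank(query_text, query_dense, docs, alpha=1.0)
sparse_only = rank(query_text, query_dense, docs, alpha=0.0)
blended = rank(query_text, query_dense, docs, alpha=0.3)
```

In this toy setup, the dense-only ranking puts document "a" first because the made-up query vector sits closer to it, while the sparse signal recovers document "b" through the exact match on the term "E1234". Lowering `alpha` shifts weight toward that exact match. Real systems would tune the weighting and normalize scores according to the models and index in use.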


Roie Schwaber-Cohen

Pinecone

Roie is a staff developer advocate at Pinecone. He is a full-stack software engineer with a deep passion for AI and data-intensive applications.