Search Engines: Combining Inverted and ANN Indexes for Scale
Anubhav Bindlish • Location: TUECHTIG • Haystack EU 2023
Search engines have traditionally employed inverted indexes to quickly filter documents. With the rise of vector embeddings and large language models, search engines are now adding ANN indexes.
Combining inverted indexes and ANN indexes in the same system introduces a number of implementation challenges, including:
- How to handle the large amount of RAM required to hold vector data and index structures
- How to distribute an ANN graph across multiple shards and avoid expensive reindexing
- How to update vector embeddings or metadata quickly
- How to avoid contention between heavy indexing and vector search
We will discuss these challenges and how to design a system that can efficiently leverage multiple indexes in parallel for hybrid search. We’ll also discuss how combining traditional and new approaches to search in one system can yield better results than running two separate database solutions.
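As a rough illustration of what hybrid search means here, the sketch below filters documents through a small inverted index and then ranks the survivors by vector similarity. It is a toy under stated assumptions, not the system described in the talk: the corpus, the names `DOCS`, `build_inverted_index`, and `hybrid_search` are all hypothetical, and an exact cosine scan stands in for a real ANN index.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus: doc id -> (text, embedding). Values are made up.
DOCS = {
    1: ("red running shoes", [0.9, 0.1]),
    2: ("blue running shoes", [0.1, 0.9]),
    3: ("red rain jacket", [0.8, 0.2]),
}


def build_inverted_index(docs):
    """Map each term to the set of doc ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, (text, _embedding) in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(docs, index, required_terms, query_vec, k=2):
    """Filter by terms (AND), then rank the survivors by vector similarity."""
    # Step 1: inverted-index filter. Intersect the posting sets of all terms.
    postings = [index.get(term, set()) for term in required_terms]
    candidates = set.intersection(*postings) if postings else set(docs)
    # Step 2: exact similarity scan over the candidates.
    # Assumption: a production system would query an ANN index here instead.
    ranked = sorted(
        candidates,
        key=lambda doc_id: cosine(docs[doc_id][1], query_vec),
        reverse=True,
    )
    return ranked[:k]


INDEX = build_inverted_index(DOCS)
```

Calling `hybrid_search(DOCS, INDEX, ["red"], [1.0, 0.0])` restricts the candidates to the documents containing "red" before any vector comparison happens. Filtering first is only one possible ordering; a real engine may instead search the ANN index first or run both in parallel, which is the design space the talk addresses.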
Anubhav joined Rockset as a software engineer in 2021 and has been working in the data indexing and query execution space. Prior to this, he worked at Meta Platforms (Facebook) for 5 years on the Integrity Infrastructure team, building a platform that employed ML rules to keep bad actors off Facebook.