Distributing Vector Databases: Vector Indexing at Distributed SQL Scale
Michael Goddard • Location: Theater 7 • Haystack 2025
“This talk starts with a strongly stated central premise: vector indexing on a distributed SQL database presents some difficult challenges, yet the result can be extremely rewarding. Specifically, it’s difficult to keep the index from becoming a bottleneck in a distributed system, and equally challenging to keep the index up to date without degrading the performance of the system or the quality of search results.
In this session we’ll explain the important advantages of distributed SQL, namely resilience, scalability, and data locality, and then go into detail on Cockroach-SPANN (C-SPANN), the distributed indexing protocol that Cockroach Labs has invented and implemented to build a distributed vector index that is both fast and easy to keep up to date.
With C-SPANN, CockroachDB can efficiently answer approximate nearest neighbor queries with high accuracy, low latency, and fresh results, all at the scale of our largest customers, with millions or even billions of indexed vectors.
Join us in this talk to learn more about how vector data search is evolving on top of the distributed SQL infrastructure that’s become so critical to modern global applications. You’ll also see a demo of C-SPANN in action.”
Michael Goddard
Cockroach Labs