From 0 to Production with BBQ at GitHub
Session Abstract
Rolling out semantic search is easy, right? Just turn on some vectors and bim bam boom, you have vector search… Right? It turns out that when you're GitHub-sized, it's not quite that easy. We'll walk through the process we followed, the lessons we learned, and how you can build a plan to make your own vector search rollout easier.
Session Description
Let's turn back the clock and walk through the steps we took to roll out vector search with BBQ quantization at GitHub.
##### Timeline:
- Let's do semantic search – by the way, what does semantic search actually do?
- Our MVP – taught us nothing
- What do you mean we need capacity? – the challenges of calculating compute for search at scale
- Indexing in prod – now we index a large subset of our data with BBQ… or we would've, but we hit ALL the bugs with BBQ and Elasticsearch's reindex
- We built a new MVP! – here is where our learnings about vector search started to become material (including linear retrievers, scoring, oversampling, num_candidates, and more!)
- Our fifth(?) attempt to ingest data – where our sharding strategy for full-text search collided with our index settings and the way Lucene merges work for vectors.
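To ground a few of the concepts named above, here is a minimal sketch of what a BBQ-quantized vector field and an oversampled kNN query look like in Elasticsearch. This is an illustration, not GitHub's actual configuration: the field name, dimensions, and oversampling factor are hypothetical, and the `bbq_hnsw` index option assumes a recent Elasticsearch 8.x release.

```python
# Sketch of an Elasticsearch dense_vector mapping using BBQ quantization,
# plus a kNN query body showing num_candidates-based oversampling.
# All names ("embedding", 768 dims) are hypothetical examples.

bbq_mapping = {
    "mappings": {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,
                "similarity": "cosine",
                # BBQ (Better Binary Quantization) over an HNSW graph.
                "index_options": {"type": "bbq_hnsw"},
            }
        }
    }
}


def knn_query(query_vector, k=10, oversample=5):
    """Build a kNN search body that oversamples candidates.

    With quantized vectors, visiting more candidates than k
    (num_candidates = k * oversample) before final scoring helps
    recover the recall lost to quantization.
    """
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": k * oversample,
        }
    }


body = knn_query([0.1] * 768, k=10, oversample=5)
print(body["knn"]["num_candidates"])  # 50
```

Tuning the oversampling factor is exactly the kind of trade-off (recall vs. latency and compute) the capacity-planning and MVP steps above had to work through.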
Throughout, I'll cover the tools and techniques we used to figure out what was happening and how we worked through it. Finally, I'll show you the roadmap we now have in place so other (internal) users at GitHub can begin to build semantic search experiences across the entire company.