From 0 to Production with BBQ at GitHub
Session Abstract
Rolling out semantic search is easy, right? Just turn on some vectors and bim bam boom, you have vector search… Right? It turns out that when you're GitHub-sized, it's not quite that easy. We'll walk through the process we followed, the lessons we learned, and how you can build a plan to make your own vector search rollout easier.
Session Description
Let's turn back the clock and walk through the steps we took to roll out vector search with BBQ quantization at GitHub.
##### Timeline:
- Let's do semantic search – by the way, what does semantic search actually do?
- Our MVP – taught us nothing
- What do you mean we need capacity? – the challenges of calculating compute for search at scale
- Indexing in prod – now we index a large subset of our data with BBQ… or we would've, but we hit ALL the bugs with BBQ and Elasticsearch's reindex
- We built a new MVP! – here is where our learnings about vector search started to become material (including linear retrievers, scoring, oversampling, num_candidates, and more!)
- Our fifth(?) attempt to ingest data – where our sharding strategy for full-text search collided with our index settings and the way Lucene merges work for vectors.
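To ground a few of the concepts named above, here is a minimal sketch of what a BBQ-quantized vector field and an oversampled kNN query look like in Elasticsearch. This is an illustration, not GitHub's actual configuration: the field name, dimensions, and oversampling factor are hypothetical, and the `bbq_hnsw` index option assumes a recent Elasticsearch 8.x release.

```python
# Sketch of an Elasticsearch dense_vector mapping using BBQ quantization,
# plus a kNN query body showing num_candidates-based oversampling.
# All names ("embedding", 768 dims) are hypothetical examples.

bbq_mapping = {
    "mappings": {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,
                "similarity": "cosine",
                # BBQ (Better Binary Quantization) over an HNSW graph.
                "index_options": {"type": "bbq_hnsw"},
            }
        }
    }
}


def knn_query(query_vector, k=10, oversample=5):
    """Build a kNN search body that oversamples candidates.

    With quantized vectors, visiting more candidates than k
    (num_candidates = k * oversample) before final scoring helps
    recover the recall lost to quantization.
    """
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": k * oversample,
        }
    }


body = knn_query([0.1] * 768, k=10, oversample=5)
print(body["knn"]["num_candidates"])  # 50
```

Tuning the oversampling factor is exactly the kind of trade-off (recall vs. latency and compute) the capacity-planning and MVP steps above had to work through.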
Throughout, I'll cover the tools and techniques we used to figure out what was happening and how we worked through it. Finally, I'll show you the roadmap we now have in place so other (internal) users at GitHub can begin to build semantic search experiences across the entire company.