Learning to hybrid search: combining BM25, neural embeddings and customer behavior into an ultimate ranking ensemble

Roman Grebennikov • Location: Theater 5 • Haystack 2023

Traditional term search has good precision but lacks semantics. Neural search is good at semantics but misses customer behavior. The learning-to-rank (LTR) approach adapts to customer behavior, but only if your baseline retrieval is good enough.

The current hype around neural search can give the impression that it’s the ultimate solution to all the problems of legacy term search and LTR: supposedly, you only need to fine-tune a neural network on all the data you have, and it will pick up every dependency between queries, documents and customer behavior. But what if, instead of replacing A with B, you could combine the strengths of all these approaches?

In this talk, we will take e-commerce search on Amazon’s ESCI dataset as an example and compare traditional text matching and LTR approaches with neural search methods on real data. We will show how combining multiple old and new approaches in a single hybrid system can deliver better results than any of them separately.

Roman Grebennikov

Delivery Hero SE

Principal Engineer at Delivery Hero SE, working on search personalization and recommendations. A pragmatic fan of functional programming, learning-to-rank models and performance engineering.