Scalable Semantic Search for Online Learning Applications

Kazem Jahanbakhsh • Location: Theater 7 • Haystack 2022

Semantic search is one of Course Hero's key products: a student can type a question from their course and get an answer from hundreds of millions of answered questions. A sentence embedding model sits at the core of semantic search; we use it to generate a vector index of questions. One challenge is picking the best embedding model in terms of accuracy and running time. We designed an evaluation framework that uses Quora duplicate questions and Faiss similarity search. Using it, we showed that a few of the pre-trained Sentence-BERT models outperform the Universal Sentence Encoder. This led us to run an A/B experiment, which showed that Sentence-BERT could improve the search coverage rate by 35%. Next, to improve semantic search performance further, we have started fine-tuning the Sentence-BERT models with our search engagement data. We will present some of our findings and the challenges we have encountered while working on the semantic search problem.
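The evaluation idea described above (index one question from each duplicate pair, query with the other, and check whether the known duplicate comes back among the nearest neighbors) can be sketched roughly as follows. This is a minimal illustration, not the framework from the talk: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real sentence embedding model such as Sentence-BERT, the brute-force `top_k` stands in for a Faiss index, the recall@k metric is an assumed choice, and the example questions are made up rather than taken from the Quora dataset.

```python
import math
import zlib

DIM = 64  # assumed toy dimension; real sentence embeddings are much larger


def toy_embed(text, dim=DIM):
    """Hypothetical placeholder encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def top_k(query_vec, index_vecs, k):
    """Brute-force inner-product search (stand-in for a similarity index)."""
    scores = [
        (sum(q * d for q, d in zip(query_vec, vec)), i)
        for i, vec in enumerate(index_vecs)
    ]
    # Highest score first; ties broken by lower index for determinism
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]


def recall_at_k(pairs, corpus, embed, k=1):
    """Fraction of queries whose known duplicate appears in the top k results."""
    index_vecs = [embed(doc) for doc in corpus]
    hits = sum(
        1
        for query, target_idx in pairs
        if target_idx in top_k(embed(query), index_vecs, k)
    )
    return hits / len(pairs)


# Made-up illustrative data: indexed questions, plus (query, index of duplicate)
corpus = [
    "how do i learn python quickly",
    "what is the capital of france",
    "how can i solve a quadratic equation",
]
pairs = [
    ("what is the fastest way to learn python", 0),
    ("which city is the capital of france", 1),
    ("what is the method to solve a quadratic equation", 2),
]

for k in (1, 2, 3):
    print(f"recall@{k} = {recall_at_k(pairs, corpus, toy_embed, k):.2f}")
```

To compare candidate models under this setup, one would pass a different `embed` function for each model and record both the recall and the wall-clock encoding time, since the abstract names accuracy and running time as the two selection criteria.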


Kazem Jahanbakhsh

Course Hero

Kazem Jahanbakhsh is a Staff ML Engineer at Course Hero. He has worked on one of Course Hero's recommendation systems, which uses an ML demand-forecasting model to predict which course documents a student will need in the following week. Kazem has also been involved in improving Course Hero's search engine algorithms, including the query understanding model, semantic search embedding models, and nearest-neighbor search.