A Cheap Trick for Semantic Question Answering for the GPU-challenged

Sujit Pal • Location: Theater 7

The ability to handle long, question-style queries is often de rigueur for modern search engines. Search giants such as Bing and Google are addressing this by building Large Language Models (LLMs) into their search pipelines. Unfortunately, this approach requires large investments in infrastructure and involves high operational costs. It can also lead to a loss of user confidence when the LLM hallucinates non-factual answers.

A best practice for designing search pipelines is to make the search layer as cheap and fast as possible, and move heavyweight operations into the indexing layer. With that in mind, we present an approach that uses LLMs during indexing to generate questions from passages, and then matches incoming questions against these generated questions at search time, using either text-based or vector-based matching. We believe this approach can provide good quality question answering capabilities for search applications and address the cost and confidence issues mentioned above.
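As a rough illustration of this index-time / search-time split, the sketch below generates candidate questions from each passage with an off-the-shelf doc2query-style model at index time, then matches an incoming question against the generated questions with a sentence-embedding encoder at query time. The model names (doc2query/msmarco-t5-base-v1, all-MiniLM-L6-v2), the sample passages, and the in-memory list standing in for a real index are illustrative assumptions, not details from the talk.

```python
# Minimal sketch: generate questions at index time, match incoming questions at search time.
# Model choices and data below are illustrative assumptions, not the talk's actual setup.
from transformers import T5ForConditionalGeneration, T5Tokenizer
from sentence_transformers import SentenceTransformer, util

# --- Index time: generate candidate questions from each passage ---
qgen_name = "doc2query/msmarco-t5-base-v1"  # assumed question-generation model
qgen_tok = T5Tokenizer.from_pretrained(qgen_name)
qgen = T5ForConditionalGeneration.from_pretrained(qgen_name)

passages = [
    "Aspirin is used to reduce fever and relieve mild to moderate pain.",
    "Metformin is a first-line medication for the treatment of type 2 diabetes.",
]

index = []  # (generated question, source passage) pairs, standing in for a real index
for passage in passages:
    inputs = qgen_tok(passage, return_tensors="pt", truncation=True)
    outputs = qgen.generate(
        **inputs, max_length=64, num_return_sequences=3, do_sample=True, top_k=10
    )
    for out in outputs:
        question = qgen_tok.decode(out, skip_special_tokens=True)
        index.append((question, passage))

# --- Search time: embed the incoming question and match the generated questions ---
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed sentence encoder
gen_questions = [q for q, _ in index]
gen_embeddings = encoder.encode(gen_questions, convert_to_tensor=True)

query = "What drug is commonly prescribed first for type 2 diabetes?"
query_embedding = encoder.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_embedding, gen_embeddings)[0]
best = int(scores.argmax())
print("Matched generated question:", index[best][0])
print("Answer passage:", index[best][1])
```

For text-based rather than vector-based matching, the generated questions could instead be indexed as an extra field in a standard engine such as Solr, so the search-time step stays a plain keyword query; the heavyweight LLM work happens only once, at indexing.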


Sujit Pal

Elsevier Health

Sujit Pal works at Elsevier Health Markets as Technical Research Director. His introduction to search began as part of a team at CNET Networks that implemented a MySQL-based replacement for an AltaVista search engine. He was an early internal user of the Solr search engine before it became an Apache project. Later he helped build a Lucene-based medical search engine for Healthline, and then led the development of a Solr-based variant of that engine for Elsevier, which combined many emergent Solr features with Natural Language Processing and Machine Learning technologies. At Elsevier, he has worked on various search-adjacent functionality, applying ML techniques to improve search quality.