Learning to Boost - Logistic Regression to Optimize Elasticsearch Boosts
Nina Xu and Jenna Bellassai • Location: Conference Room and Online • Back to Haystack 2021
Choosing field boost values can make or break your Elasticsearch query. One popular data-driven approach to identifying the relative importance of fields is Learning to Rank (LTR). However, LTR typically requires fitting a complex machine learning model and incorporating a separate plugin or service to run it in production. Beyond manual tuning or grid search, is there a middle ground that’s data-driven but easier to implement? In this talk, we introduce an approach in which we fit a logistic regression model to directly determine optimal Elasticsearch boost values. We will cover parsing search explanations for historical queries to create the features, assigning pairwise labels based on a judgment list, and evaluating the boosts the model produces. While not a replacement for Learning to Rank, this automated approach led to a 1.2% increase in MAP@5 over the guess-and-check boosts that took 6 months to develop, and it enables quick iteration on future query changes.
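To make the idea concrete, here is a minimal sketch of pairwise logistic regression over per-field scores. It is not the speakers' implementation. It assumes the query's final score is a boost-weighted sum of per-field scores, that those per-field scores have already been parsed from search explanations, and that the judgment list gives binary grades. The field names, the numbers, and the helpers `pairwise_examples`, `fit_logistic`, and `score` are all hypothetical.

```python
import math

# Hypothetical per-field scores for documents returned by ONE query, as they
# might be parsed from Elasticsearch explain output, plus a relevance grade
# from a judgment list. Field names and numbers are made up for illustration.
FIELDS = ["title", "body"]
docs = [
    {"scores": [3.0, 1.2], "grade": 1},  # relevant
    {"scores": [2.5, 1.0], "grade": 1},  # relevant
    {"scores": [0.5, 0.8], "grade": 0},  # not relevant
    {"scores": [1.0, 0.3], "grade": 0},  # not relevant
]

def pairwise_examples(docs):
    """Build (score_difference, label) pairs from docs with different grades.

    Label is 1 when the first doc of the pair is the more relevant one.
    Each pair appears in both orders, so no intercept term is needed.
    """
    examples = []
    for a in docs:
        for b in docs:
            if a["grade"] == b["grade"]:
                continue
            diff = [x - y for x, y in zip(a["scores"], b["scores"])]
            examples.append((diff, 1 if a["grade"] > b["grade"] else 0))
    return examples

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(examples, lr=0.1, epochs=500):
    """Fit intercept-free logistic regression with batch gradient descent."""
    n_features = len(examples[0][0])
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for x, y in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(n_features):
                grad[i] += (p - y) * x[i]
        w = [wi - lr * g / len(examples) for wi, g in zip(w, grad)]
    return w

def score(field_scores, weights):
    """Boost-weighted sum of per-field scores (assumed scoring model)."""
    return sum(w * s for w, s in zip(weights, field_scores))

examples = pairwise_examples(docs)
weights = fit_logistic(examples)
boosts = dict(zip(FIELDS, weights))  # candidate boost per field
print(boosts)
```

Only the relative sizes of the learned weights matter for ranking, so they can be rescaled before being used as boosts. On real data a weight may come out negative, which would need separate handling before it could be used as a boost.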
Nina Xu is a Data Scientist at Guru, where she is helping Guru fulfill its vision of bringing teams the knowledge they need to do their best work, when they need it. She currently focuses on using machine learning to improve search relevance in Guru. Nina has also worked on improving Guru’s AI Suggest Expert feature, which guides teams in choosing the right subject-matter experts to be responsible for the right pieces of knowledge. Before transitioning into a career in data science, Nina was an Assistant Professor at Bucknell University, where she taught college statistics courses. She holds a PhD in Biostatistics from New York University, where she used statistical methods to study the long-term health effects of the 9/11 World Trade Center disaster.
Jenna Bellassai is a Data Scientist on the search team at Guru. Prior to working at Guru, she worked as a Data Scientist at Picwell. She previously contributed to the Data-Driven Interactive Narrative Engine at the USC Institute for Creative Technologies and volunteered as the media coordinator for the Philadelphia chapter of Women in Data. At Guru, she is focused on improving search relevance so that teams can more easily find trusted information to do their best work.