Phrase Query Completion with Apache Solr and SuggestComponent

Tomasz Sobczak • Haystack 2018

Query completion is one of the fundamental features of search. It is almost always the first interaction between the user and the search application. When the user types a few characters, the auto-suggester should immediately offer relevant content. It improves search precision, acts as a discovery tool, and can increase conversion rates in e-commerce. Most practitioners would probably agree that the biggest challenge is preparing the completer's data: text processing and analysis dictate the two basic functions of a suggester, matching and retrieving.

We love Apache Solr and use it in a large share of our customer projects. However, in many of the cases we have encountered, the Suggester needs to operate on long text fields, such as product descriptions or news articles. There is no easy way to use it efficiently on such fields, since almost all lookup implementations either return the entire field content or behave in ways our customers find non-intuitive. In this talk we focus on making Apache Solr's SuggestComponent work well with large text fields. We present some of the approaches we use to handle this, from simple ideas to more advanced NLP tools. Our goal in suggesting qualified phrases from chunks of text is to deliver both a good user experience and good system efficiency.
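As a rough illustration of the "simple ideas" end of that range, the sketch below pulls short word n-grams out of long texts and ranks them by frequency, so that a prefix returns a phrase instead of the whole field. This is not the method presented in the talk and it does not call Solr. The function names (extract_phrases, build_dictionary, suggest), the n-gram sizes, and the sample texts are all hypothetical; in a real setup such a phrase list would presumably be loaded into a suggester dictionary instead of being searched in memory.

```python
import re
from collections import Counter


def extract_phrases(text, min_words=2, max_words=3):
    """Yield lowercase word n-grams that do not cross sentence boundaries."""
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"\w+", sentence)
        for n in range(min_words, max_words + 1):
            for i in range(len(words) - n + 1):
                yield " ".join(words[i:i + n])


def build_dictionary(documents, min_words=2, max_words=3):
    """Count how often each candidate phrase occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(extract_phrases(doc, min_words, max_words))
    return counts


def suggest(dictionary, prefix, limit=5):
    """Return phrases starting with the prefix, most frequent first."""
    prefix = prefix.lower()
    matches = [(p, c) for p, c in dictionary.items() if p.startswith(prefix)]
    # Sort by descending count, then alphabetically for a stable order.
    matches.sort(key=lambda pc: (-pc[1], pc[0]))
    return [phrase for phrase, _ in matches[:limit]]


if __name__ == "__main__":
    # Hypothetical product descriptions, used only for illustration.
    docs = [
        "Red running shoes for men. Lightweight running shoes.",
        "Trail running shoes with grip.",
    ]
    phrases = build_dictionary(docs)
    print(suggest(phrases, "run"))
```

Raw frequency is the crudest possible ranking signal; the NLP-based approaches mentioned above would replace the n-gram extraction step with something that recognizes meaningful phrases.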

Tomasz Sobczak

I work as a consultant and team leader providing search-driven solutions (enterprise and e-commerce search) as well as Apache Solr training and consultancy. I help organizations store, manage, and find information. I'm an advisor who builds search strategies and manages findability investments. On a daily basis I focus on information retrieval, information architecture, search relevancy, and search analytics. On the technology side, I'm an Apache Solr and Elasticsearch expert. Last but not least, I'm responsible for business development at Findwise Poland. My main areas of interest are search solutions, machine learning, and data science.