The Solr Synonyms Maze: Pros, Cons, and Pitfalls of Various Synonyms Usage Patterns

Bertrand Rigaldies • Haystack 2018

Synonyms in Solr have historically been a charged topic, with ongoing and lively discussions within the search community about various aspects of synonym usage, such as index- vs. query-time trade-offs, token graph challenges (a.k.a. "sausagization") with multi-term synonyms, and the various parsed query outcomes, to name just a few. In this talk, we will review and summarize the current state of the "synonyms art" in Solr by inventorying and demonstrating various synonym usages, along with their pros, cons, and pitfalls. For example, we will lift the hood and examine synonym expansion in token graphs and the corresponding parsed queries. We will also examine how to configure single- and multi-term synonyms, and the ways they can interfere within an analysis chain. Finally, we will tiptoe into the topic of "term taxonomies" as a possibly better approach than "flat" synonyms. In closing, we will offer practical recommendations based on our experience as search consultants.

Bertrand Rigaldies

Bertrand has 25+ years of experience leading people and projects in the conception, architecture, design, development, and support of innovative software solutions across various industries and application domains. As a lifelong software developer, Bertrand has built a variety of cool systems and applications, such as a neural network-based Optical Character Recognition (OCR) system and a voice recognition-based telephone interview system in the 1990s, IP telephony and unified email and voice messaging systems in the 2000s, and large data submission, processing, warehousing, reporting, and analytics systems in the 2010s.

As an experienced data architect, application developer, and search engineer (his latest passion), Bertrand can engineer a solution that meets your information retrieval goals.