Tim Allison • Location: Theater 4 • Haystack 2020
This talk builds on work by Simon Hughes and others to apply genetic algorithms (GAs) and random search to finding optimal parameters for relevance ranking. While manual tuning can be useful, the parameter space is too vast to be confident that one has found optimal parameters without overfitting. We’ll present Quaerite (https://github.com/tballison/quaerite), an open source toolkit that allows users to specify experiment parameters and then run a random search and/or a GA to identify the best settings given ground truth.
We’ll offer an overview of mapping parameter space to a GA problem in both Solr and Elasticsearch, the importance of the baked-in n-fold cross-validation, and the surprises and successes found with deployed search systems.
Tim has been working in natural language processing since 2002. In the last 5+ years, his focus has shifted to content/metadata extraction (and evaluation), advanced search and relevance tuning. Tim is the founder of Rhapsode Consulting LLC, and he currently works as a data scientist at NASA's Jet Propulsion Laboratory. Tim is a member of the Apache Software Foundation (ASF), the chair/VP of Apache Tika, and a committer on Apache OpenNLP (2020), Apache Lucene/Solr (2018), Apache PDFBox (2016) and Apache POI (2013). Tim holds a Ph.D. in Classical Studies, and in a former life, he was a professor of Latin and Greek.