From clicks to models, the Wikimedia LTR pipeline

Erik Bernhardson • Back to Haystack 2018

This talk will share a high-level overview of the pipeline that transforms user click behavior into labeled data, and then into LTR models, for Wikimedia sites. The main stages are: join web and search logs -> normalize queries -> group "same" normalized queries -> sample a subset of the grouped queries -> learn relevance -> collect feature data from the plugin -> split/fold into sets for cross-validation -> tune hyperparameters -> deploy. I hope to at least touch on various problems we've run into along the way.
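The early stages of that pipeline (normalize, group, sample, label, fold) can be sketched in Python. This is a minimal illustration under stated assumptions, not Wikimedia's code. The `SearchEvent` schema, the normalization rules, the traffic-based sampling, the click-through-rate label (a simple stand-in for whatever relevance model the real pipeline learns) and the round-robin fold assignment are all hypothetical choices. Feature collection, hyperparameter tuning and deployment are not shown.

```python
import random
import re
import unicodedata
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchEvent:
    # Hypothetical schema for one joined web+search log row.
    query: str      # raw query text as typed by the user
    page_id: int    # a result page shown for that query
    clicked: bool   # whether the user clicked that result


def normalize_query(query: str) -> str:
    """Hypothetical normalization: NFKC, casefold, punctuation to spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKC", query).casefold()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def group_by_normalized(events):
    """Pool log rows whose queries normalize to the same string."""
    groups = defaultdict(list)
    for event in events:
        groups[normalize_query(event.query)].append(event)
    return dict(groups)


def sample_groups(groups, min_events, max_groups, seed=0):
    """Keep groups with enough traffic, then randomly sample up to max_groups of them."""
    eligible = sorted(q for q, evs in groups.items() if len(evs) >= min_events)
    rng = random.Random(seed)
    chosen = rng.sample(eligible, min(max_groups, len(eligible)))
    return {q: groups[q] for q in chosen}


def ctr_labels(group_events):
    """Naive relevance label (stand-in): click-through rate per page within one group."""
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for event in group_events:
        shown[event.page_id] += 1
        clicks[event.page_id] += int(event.clicked)
    return {page: clicks[page] / shown[page] for page in shown}


def assign_folds(groups, n_folds):
    """Assign whole query groups to CV folds so a query never spans two folds."""
    return {q: i % n_folds for i, q in enumerate(sorted(groups))}
```

For example, with toy data (made-up page IDs), `group_by_normalized` pools `"Albert Einstein"` and `"albert  einstein!"` under the key `"albert einstein"`, and `ctr_labels` then gives each page in that group its click rate. Folding by normalized query, rather than by individual row, is one way to keep near-duplicate queries from leaking between training and validation sets.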


Erik Bernhardson

Erik is a senior software engineer and the technical lead for the Search Platform team at the Wikimedia Foundation. In this capacity he recently led the team in replacing a hand-built ranking function with machine-learned ranking.