From clicks to models: the Wikimedia LTR pipeline
Erik Bernhardson • Haystack 2018
I will share a high-level overview of the pipeline that transforms user click behavior into labeled data, and then into learning-to-rank (LTR) models, for Wikimedia sites. Primarily this is: join web and search logs -> normalize queries -> group "same" normalized queries -> sample a subset of the grouped queries -> learn relevance -> collect feature data from the plugin -> split/fold into sets for cross-validation (CV) -> hyperparameter tuning -> deploy. I hope to at least touch on various problems we've run into along the way.
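The data-preparation half of that pipeline (normalize, group, sample, label, fold) can be sketched in Python. This is a minimal illustration under stated assumptions, not Wikimedia's implementation: the log row layout, the normalization rules, uniform sampling, exact-match grouping, and every function name here are hypothetical, and a naive per-page click-through rate stands in for a proper click model in the "learn relevance" step. Feature collection, hyperparameter tuning, and deployment are omitted.

```python
import random
import re
import unicodedata
from collections import defaultdict

# Hypothetical row layout, as might come from joining web and search logs:
# (session_id, raw_query, page_id, clicked)
LogRow = tuple[str, str, int, bool]


def normalize_query(query: str) -> str:
    """Assumed normalization: NFKC, lowercase, punctuation to spaces, collapse whitespace."""
    q = unicodedata.normalize("NFKC", query).lower()
    q = re.sub(r"[^\w\s]", " ", q)
    return " ".join(q.split())


def group_by_normalized_query(rows: list[LogRow]) -> dict[str, list[LogRow]]:
    """Treat queries as the "same" when their normalized strings match exactly (assumption)."""
    groups: dict[str, list[LogRow]] = defaultdict(list)
    for row in rows:
        groups[normalize_query(row[1])].append(row)
    return dict(groups)


def sample_groups(groups: dict[str, list[LogRow]], max_groups: int, seed: int = 0) -> dict[str, list[LogRow]]:
    """Keep a uniform random subset of query groups (a real pipeline may stratify)."""
    keys = sorted(groups)
    if len(keys) > max_groups:
        keys = random.Random(seed).sample(keys, max_groups)
    return {k: groups[k] for k in keys}


def naive_relevance(group_rows: list[LogRow]) -> dict[int, float]:
    """Stand-in for a click model: fraction of impressions of each page that were clicked."""
    shown: dict[int, int] = defaultdict(int)
    clicked: dict[int, int] = defaultdict(int)
    for _session, _query, page_id, was_clicked in group_rows:
        shown[page_id] += 1
        clicked[page_id] += int(was_clicked)
    return {page: clicked[page] / shown[page] for page in shown}


def assign_folds(query_keys, n_folds: int, seed: int = 0) -> dict[str, int]:
    """Assign whole query groups to CV folds so no query appears in more than one fold."""
    keys = sorted(query_keys)
    random.Random(seed).shuffle(keys)
    return {key: i % n_folds for i, key in enumerate(keys)}
```

Folds are assigned per normalized-query group rather than per log row; splitting rows of the same query across folds would let the model see near-duplicates of its evaluation queries during training and inflate CV scores.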
Erik is a senior software engineer and the technical lead for the Search Platform team at the Wikimedia Foundation. In this capacity, he recently led the team in replacing a hand-built ranking function with machine-learned ranking.