How to Build your Training Set for a Learning to Rank Project

Alessandro Benedetti • Location: Theater 4 • Back to Haystack 2020

Learning to rank (LTR from now on) is the application of machine learning techniques, typically supervised, to the formulation of ranking models for information retrieval systems. With LTR becoming increasingly popular (Apache Solr has supported it since January 2017), organisations struggle with the problem of how to collect and structure the relevance signals necessary to train their ranking models. This talk is a technical guide to exploring and mastering various techniques for generating your training set(s) correctly and efficiently.

Expect to learn how to:

model and collect the necessary feedback from the users (implicit or explicit)
calculate for each training sample a relevance label that is meaningful and unambiguous (Click Through Rate, Sales Rate ...)
transform the collected raw data into an effective training set (in the numerical vector format most LTR training libraries expect)

Join us as we explore real-world scenarios and dos and don'ts from the e-commerce industry.
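The last two steps above (deriving a relevance label from interaction data and writing numerical training vectors) can be sketched in Python as follows. This is a minimal illustration, not the method presented in the talk. It assumes a hypothetical interaction log where impressions and clicks are already aggregated per (query, document) pair and feature values are already extracted. The CTR thresholds, the `min_impressions` cut-off and every function name are illustrative assumptions. The output follows the plain-text `label qid:<id> <feature>:<value>` layout read by tools such as RankLib and SVMrank.

```python
from typing import List, Optional, Tuple

# Hypothetical aggregated row: (query_id, doc_id, impressions, clicks, feature_vector)
Row = Tuple[int, str, int, int, List[float]]


def ctr_label(impressions: int, clicks: int,
              thresholds: Tuple[float, ...] = (0.05, 0.15, 0.3),
              min_impressions: int = 10) -> Optional[int]:
    """Map click-through rate to a graded label 0..len(thresholds).

    Returns None when there are too few impressions: a CTR computed on very
    little traffic is ambiguous, so the sample is skipped instead of being
    labelled irrelevant. Thresholds and cut-off are illustrative assumptions.
    """
    if impressions < min_impressions:
        return None
    ctr = clicks / impressions
    # One grade for every threshold the CTR reaches or exceeds.
    return sum(1 for t in thresholds if ctr >= t)


def to_ranklib_line(label: int, qid: int, features: List[float], doc_id: str) -> str:
    """Format one sample as '<label> qid:<qid> 1:<v1> 2:<v2> ... # <doc_id>'."""
    feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1))
    return f"{label} qid:{qid} {feats} # {doc_id}"


def build_training_set(rows: List[Row]) -> List[str]:
    """Label every row and emit lines grouped by query id."""
    lines = []
    # Stable sort: samples of the same query stay contiguous and keep their order.
    for qid, doc_id, impressions, clicks, features in sorted(rows, key=lambda r: r[0]):
        label = ctr_label(impressions, clicks)
        if label is None:
            continue
        lines.append(to_ranklib_line(label, qid, features, doc_id))
    return lines


if __name__ == "__main__":
    sample: List[Row] = [
        (1, "docA", 100, 40, [0.5, 3.0]),
        (1, "docB", 100, 2, [0.1, 1.0]),
        (2, "docC", 3, 1, [0.2, 2.0]),    # too few impressions: skipped
        (2, "docD", 50, 10, [0.9, 0.0]),
    ]
    for line in build_training_set(sample):
        print(line)
```

Discarding low-traffic pairs rather than labelling them 0 is one way to keep labels unambiguous. A Sales Rate label could be derived the same way by swapping clicks for purchases.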


Alessandro Benedetti

Sease

Alessandro Benedetti is an R&D Software Engineer, Search Consultant and founder of Sease. His focus is on R&D in Information Retrieval, Information Extraction, Natural Language Processing, and Machine Learning. He firmly believes in Open Source as a way to build a bridge between Academia and Industry and to facilitate the progress of applied research. At Sease, Alessandro works on Search/Machine Learning R&D projects, trainings and consultancies. When he isn't working on client projects, he actively contributes to the open source community and presents applications of leading-edge techniques in real-world scenarios at meetups and conferences such as ECIR, the Lucene/Solr Revolution, Fosdem, ApacheCon, Haystack and Open Source Summit.