Building Retrieval Test Collections

Ellen Voorhees • Location: Theater 5 • Haystack 2022

Information retrieval test collections (benchmark search tasks consisting of a corpus, a query set, relevance judgments, and associated evaluation metrics) are foundational infrastructure for offline evaluation of search systems. High-quality test collections accelerate the development of effective search algorithms and facilitate technology transfer, but building large-scale, representative test collections is challenging. Over the past thirty years, the Text REtrieval Conference (TREC, trec.nist.gov) has built test collections for a variety of search tasks, using different techniques as each task and budget required. Recent examination of TREC collections shows that some have withstood the test of time, while others have weaknesses that are hard to detect. This talk will recap lessons learned from building dozens of test collections and the best practices they suggest for building your own collection for your own problem.
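To make the four components concrete, here is a minimal Python sketch of how a corpus, a query set, and relevance judgments (qrels) combine with an evaluation metric to score a system's ranked output offline. The class and function names are hypothetical, and mean average precision is assumed here only as one common metric choice; the talk does not prescribe a particular data layout or metric. Unjudged documents are treated as non-relevant, a common scoring convention.

```python
from dataclasses import dataclass


@dataclass
class RetrievalCollection:
    """Hypothetical container for the parts of a test collection."""
    corpus: dict[str, str]             # doc_id -> document text
    queries: dict[str, str]            # query_id -> query text
    qrels: dict[str, dict[str, int]]   # query_id -> {doc_id: relevance grade}


def average_precision(ranking: list[str], judgments: dict[str, int]) -> float:
    """Average precision of one ranked list; unjudged docs count as non-relevant."""
    relevant = {doc for doc, grade in judgments.items() if grade > 0}
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at each relevant document's rank
    return total / len(relevant)


def mean_average_precision(run: dict[str, list[str]], coll: RetrievalCollection) -> float:
    """Mean AP over every query in the collection; queries absent from the run score 0."""
    scores = [
        average_precision(run.get(qid, []), coll.qrels.get(qid, {}))
        for qid in coll.queries
    ]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    coll = RetrievalCollection(
        corpus={"d1": "...", "d2": "...", "d3": "..."},
        queries={"q1": "building test collections"},
        qrels={"q1": {"d1": 1, "d2": 0, "d3": 1}},
    )
    run = {"q1": ["d3", "d2", "d1"]}  # one system's ranked output
    print(mean_average_precision(run, coll))
```

Because the judgments are fixed, any number of systems can be rescored against the same collection without new human assessment, which is what makes a test collection reusable infrastructure.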


Ellen Voorhees

National Institute of Standards and Technology

Ellen Voorhees is a Fellow at the US National Institute of Standards and Technology (NIST), where for most of her tenure she managed the Text REtrieval Conference (TREC) project, which develops the infrastructure required for large-scale evaluation of search engines. Voorhees received a B.Sc. in computer science from the Pennsylvania State University, and M.Sc. and Ph.D. degrees in computer science from Cornell University. Prior to joining NIST, she was a Senior Member of Technical Staff at Siemens Corporate Research in Princeton, NJ, where her work on intelligent agents applied to information access resulted in three patents. Voorhees is a Fellow of the ACM and an inaugural member of ACM SIGIR's Academy.