Semantic Product Search – Vector Search for E-Commerce

Simon Hughes • Location: Conference Room and Online • Haystack 2021

Information retrieval is undergoing a paradigm shift away from the techniques that have prevailed for the past few decades. The focus is increasingly moving away from keyword- and entity-driven search, and the inverted index that supports those approaches, in favor of more complex models supported by dense rather than sparse index structures. Neural IR models for retrieval and ranking are becoming increasingly popular, but building and scaling these systems presents many challenges. In this talk we present an overview of the current state of Neural IR from the perspective of a large e-commerce company. Topics covered include extracting signals from the clickstream, transformer models and ‘do you need one?’, augmenting the inverted index by predicting keywords, and hard negative mining. We will also cover an exciting new research area, framing semantic search as an Extreme Multi-Label Classification problem, and explain why the future of semantic search may lie in machine-learned indexes.


Simon Hughes

Simon has a PhD in Computer Science from DePaul with a concentration in NLP and machine learning, and has over 8 years of experience working as a data scientist and 15 years of experience in software development. He has worked on multiple search and recommender engines, including building job and resume search engines, and works on Home Depot’s e-commerce search platform in his current role. He is currently the Principal Data Scientist on the core search team, leading initiatives to improve overall relevancy and conversion on the search platform. He is also co-author of 3 SIGIR papers published during his time at Home Depot and 11 papers on applying AI for educational purposes, and has given many industry talks over the years on semantic search and relevancy tuning.