AI-based approaches to improve data quality before indexing
Lucian Precup • Location: Theater 5 • Back to Haystack 2022
It is well known that good data makes great search experiences. Or, put the other, less positive, way around: garbage in, garbage out. AI-powered search usually focuses on the search itself, improving relevance on top of an already existing index. In this talk we will focus on data ingestion: the optimizations and improvements that AI and machine learning algorithms can bring to data quality before the data is indexed. Some examples are enriching data with automatic categorization, improving OCR output, improving media file transcriptions, and improving crawling and web page parsing. All of this is considered in the context of data for search engines, a use case that calls for, or makes possible, some specific optimizations.
Lucian Precup is the CTO of all.site, the collaborative search engine developed at Station F in Paris. With his colleagues at Adelean, Lucian develops solutions for indexing, searching and analyzing data. Lucian regularly shares his knowledge at specialized conferences and organizes the Search & Data Meetup.