Searching through large graphs using Elasticsearch

Radu Pop • Location: Theater 7 • Haystack 2022

The National Audiovisual Institute (INA) is a repository of all French audiovisual archives and has been responsible for archiving over 180 radio and television services, 24/7, since 1995. The metadata generated to describe this content amounts to the equivalent of over 50 million documents (images, audio and video fragments, text excerpts). Because the content is so heterogeneous, the data model is directly inspired by the conceptual models of cultural heritage and is represented as a large graph with complex relations between generic entities. The challenge in building a global search engine for this use case is twofold: indexing speed and the high-performance implementation of complex full-text search capabilities. Our talk describes the key choices made for the graph representation, which facilitate the indexing of the documents, as well as the technical framework set up around Elasticsearch, which implements the dedicated search APIs required by different functional areas.

Radu Pop

Radu provides consulting services as a Solutions Architect at Adelean. He handles projects around Elasticsearch and Adelean’s A2 search technology. He oversees the integration and evolution of search engines within large e-commerce platforms, marketplaces or organizations' data lakes. Prior to joining Adelean, Radu gained solid experience in web archiving, operating large-scale crawling systems in the context of several European research projects. He holds a PhD in Computer Science and an MSc in Distributed Systems.