MiniCoil: A Hybrid Sparse Retrieval Model for Scalable and Context-Aware Semantic Search
David Myriel • Location: Theater 7 • Haystack 2025
“The MiniCoil Sparse Retrieval Model introduces an innovative approach to scalable semantic search by blending the interpretability of sparse retrieval with the contextual depth of dense embeddings. Designed for high performance with minimal computational overhead, MiniCoil strikes a balance between efficiency and semantic richness.
At its core, MiniCoil generates a compact, sparse representation by starting from transformer-based embeddings and passing them through trained, meaning-preserving layers that substantially reduce their dimensionality. The result retains semantic information while remaining computationally efficient.
The hybrid design supports dynamic vocabulary expansion and falls back to BM25 for out-of-vocabulary terms, so retrieval stays reliable across diverse datasets. Additionally, MiniCoil’s domain-agnostic architecture makes it a general-purpose retrieval model that can still be fine-tuned for specialized applications such as legal and medical search.
MiniCoil is ideal for powering search engines, enterprise knowledge systems, and conversational AI.
This session will delve into its core architecture, training methodologies, and practical use cases, equipping attendees with actionable insights to develop efficient, context-aware retrieval systems that balance speed, accuracy, and interpretability.”
David Myriel
Qdrant
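
The two mechanisms the abstract describes, reducing a contextual embedding to a few dimensions per word and falling back to BM25 for out-of-vocabulary terms, can be illustrated with a toy sketch. This is not the MiniCoil implementation: the `PROJECTIONS` matrices, the `(word, dim)` key layout, the example embeddings, and the IDF values are all hypothetical stand-ins chosen so the arithmetic is easy to follow. A real system would obtain the contextual embeddings from a transformer and the projections from training.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # common BM25 defaults

# Hypothetical per-word projection (2 x 4), standing in for the trained,
# dimensionality-reducing layers. Words without an entry are out of vocabulary.
PROJECTIONS = {
    "bank": [[1.0, 0.0, 0.5, 0.0],
             [0.0, 1.0, 0.0, 0.5]],
}


def reduce_embedding(word, embedding):
    """Project a contextual embedding to a unit-length low-dimensional vector."""
    reduced = [sum(w * x for w, x in zip(row, embedding))
               for row in PROJECTIONS[word]]
    norm = math.sqrt(sum(v * v for v in reduced)) or 1.0
    return [v / norm for v in reduced]


def bm25_weight(tf, doc_len, avg_doc_len, idf):
    """Standard BM25 term weight for one term in one document."""
    return idf * tf * (K1 + 1) / (tf + K1 * (1 - B + B * doc_len / avg_doc_len))


def encode(tokens, idf=None, avg_doc_len=None):
    """Encode (word, contextual_embedding) pairs as a sparse {key: weight} dict.

    With idf=None the input is treated as a query and every term gets weight 1.0.
    For repeated words, this sketch uses the embedding of the first occurrence.
    """
    tf = Counter(word for word, _ in tokens)
    first_embedding = {}
    for word, emb in tokens:
        first_embedding.setdefault(word, emb)

    sparse = {}
    for word, count in tf.items():
        if idf is None:
            weight = 1.0
        else:
            weight = bm25_weight(count, len(tokens), avg_doc_len, idf.get(word, 0.0))
        if word in PROJECTIONS:
            # In-vocabulary: spread the term weight over the reduced dimensions.
            for dim, value in enumerate(reduce_embedding(word, first_embedding[word])):
                sparse[(word, dim)] = weight * value
        else:
            # Out-of-vocabulary: plain BM25 term weight, no semantic component.
            sparse[(word, "bm25")] = weight
    return sparse


def score(query, doc):
    """Sparse dot product over the keys shared by query and document."""
    return sum(w * doc[key] for key, w in query.items() if key in doc)


# Toy contextual embeddings for two senses of "bank" (made-up numbers).
money = [1.0, 0.0, 0.2, 0.0]
river = [0.0, 1.0, 0.0, 0.2]
idf = {"bank": 1.0, "qdrant": 2.0}  # hypothetical IDF values

doc_finance = encode([("bank", money), ("qdrant", None)], idf, avg_doc_len=2)
doc_river = encode([("bank", river), ("qdrant", None)], idf, avg_doc_len=2)
query = encode([("bank", money), ("qdrant", None)])

print(score(query, doc_finance), score(query, doc_river))
```

Both documents contain the same words, so plain BM25 would score them identically. In this sketch the financial document ranks higher because the reduced vector for "bank" matches the query's sense, while the out-of-vocabulary term "qdrant" contributes its ordinary BM25 weight to both.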