Sparse Retrieval | Minimalist Innovation LLC

The Best of Both Worlds: Learned Sparse Retrieval (SPLADE) For Entity Resolution

Entity resolution breaks when exact matching is too brittle and dense vectors blur identities. This post introduces SPLADE, a learned sparse retrieval model that keeps inverted indexes and token-level explainability while adding transformer-powered expansion and reweighting. We walk through where SPLADE beats BM25 and dense search, where it can fail (on SKUs and through over-expansion), and how to run it in Postgres/ParadeDB for large-scale product, customer, or patient identity resolution.
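To make "expansion and reweighting" concrete, here is a minimal sketch of how a SPLADE-style sparse vector can be built and scored. It assumes the max-pooled `log(1 + ReLU(logit))` aggregation used in later SPLADE variants; the function names and the toy logits are hypothetical, and no real model is called.

```python
import math

def splade_weights(token_logits):
    """Collapse per-position vocabulary logits into one sparse term->weight map.

    token_logits: list of dicts (one per input position), vocab term -> logit.
    Assumption: max pooling over positions of log(1 + ReLU(logit)).
    """
    weights = {}
    for logits in token_logits:
        for term, logit in logits.items():
            w = math.log1p(max(logit, 0.0))  # negative logits collapse to 0
            if w > weights.get(term, 0.0):   # zero weights are never stored
                weights[term] = w
    return weights

def sparse_score(query_w, doc_w):
    """Dot product over shared terms, plus the per-term contributions."""
    parts = {t: query_w[t] * doc_w[t] for t in query_w if t in doc_w}
    return sum(parts.values()), parts

# Toy logits (made up): the query says "laptop" but also activates "notebook".
query_w = splade_weights([{"laptop": 2.0, "notebook": 1.0, "banana": -3.0}])
doc_w = splade_weights([{"notebook": 1.5, "computer": 0.5}, {"notebook": 0.2}])
score, parts = sparse_score(query_w, doc_w)
```

Because the match happens on the expanded term "notebook", the record is retrieved even though the literal query term never appears in it, and `parts` shows exactly which term produced the score. The same mechanism is what can over-expand on opaque identifiers such as SKUs.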

Gandhinath Swaminathan

3 days ago · 10 min read

When “Almost” Isn’t Good Enough: Why Top Engineers Still Rely On BM25

BM25 looks old on paper, but it still decides which records are worth comparing when identifiers can’t afford to be “almost” right. This post walks through the TF‑IDF roots of BM25, how k1 and b shape the scoring curve, and why Lucene, Elasticsearch, and OpenSearch still rely on it. You’ll see how term statistics, not embeddings, keep product codes, SKUs, and customer records anchored during entity resolution.
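As a rough illustration of how k1 and b shape the scoring curve, the sketch below scores a single term with the textbook BM25 formula and a Lucene-style non-negative IDF. The function name and the corpus statistics are assumptions for illustration; the defaults k1 = 1.2 and b = 0.75 are the commonly used ones.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document (textbook BM25)."""
    # Rare terms (low doc_freq) get a higher IDF; the +1 keeps it non-negative.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly long documents are penalized (b=0 disables it).
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # k1 controls how quickly repeated occurrences of the term saturate.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

# Hypothetical corpus statistics: 10,000 records, term appears in 50 of them.
stats = dict(doc_len=100, avg_doc_len=100, n_docs=10_000, doc_freq=50)
first_gain = bm25_term_score(2, **stats) - bm25_term_score(1, **stats)
late_gain = bm25_term_score(10, **stats) - bm25_term_score(9, **stats)
```

With these numbers `late_gain` is much smaller than `first_gain`: the tenth occurrence of a term adds far less than the second, which is the saturation behavior k1 governs.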

Gandhinath Swaminathan

Jan 8 · 5 min read

Heterogeneous Knowledge Graphs: Multi-Hop Reasoning Beyond Pairwise Matching