top of page


When “Almost” Isn’t Good Enough: Why Top Engineers Still Rely On BM25
BM25 looks old on paper, but it still decides which records are worth comparing when identifiers can’t afford to be “almost” right. This post walks through the TF‑IDF roots of BM25, how k1 and b shape the scoring curve, and why Lucene, Elasticsearch, and OpenSearch still rely on it. You’ll see how term statistics, not embeddings, keep product codes, SKUs, and customer records anchored during entity resolution.

Gandhinath Swaminathan
Jan 85 min read
bottom of page