AI & Analytics


Why Probabilistic Record Linkage Still Matters
Probabilistic record linkage still matters because identity data is messy and match decisions carry real financial and compliance risk. This article explains the intuition behind Fellegi–Sunter and Bayesian record linkage, shows how they control false merges and splits across noisy customer and product records, and points to modern tools and books that help you put these ideas into practice.

Gandhinath Swaminathan
2 days ago · 5 min read


Heterogeneous Knowledge Graphs: Multi-Hop Reasoning Beyond Pairwise Matching
Pairwise matching treats each comparison as a one-off. A persistent knowledge graph turns product mentions, manufacturers, model numbers, attributes, and price bins into typed nodes and relations. Matching becomes neighborhood comparison: multi-hop paths (convergent evidence) can beat any single similarity score.

Gandhinath Swaminathan
2 days ago · 7 min read


From Inverted Index to Attention Graph: Turning SPLADE Tokens Into ER Decisions
False entity merges don’t just dirty data. They distort inventory, pricing, and forecasts, then every model and report built on top. Learned sparse retrieval improves recall, but it can still treat records like unordered tokens. This post adds token-to-token attention as a structural check so near-duplicates pass and lookalikes fail, with a trail you can audit.

Gandhinath Swaminathan
3 days ago · 3 min read


The Best of Both Worlds: Learned Sparse Retrieval (SPLADE) For Entity Resolution
Entity resolution breaks when exact matching is too brittle and dense vectors blur identities. This post introduces SPLADE, a learned sparse retrieval model that keeps inverted indexes and token-level explainability while adding transformer-powered expansion and reweighting. We walk through where SPLADE beats BM25 and dense search, where it can fail on SKUs and over-expansion, and how to run it in Postgres/ParadeDB for large-scale product, customer, or patient identity.

Gandhinath Swaminathan
3 days ago · 10 min read


Hybrid Search and Reciprocal Rank Fusion: Building the Bridge Between Lexical and Semantic
Entity resolution struggles when systems must choose between the rigid precision of BM25 and the fuzzy flexibility of Vector Search. Part 4 reveals why simple linear weighting fails and introduces Reciprocal Rank Fusion (RRF) as the superior alternative. We explore the architectural shift to Hybrid Search, demonstrating how to merge rank positions rather than raw scores using Spring Boot and ParadeDB.

Gandhinath Swaminathan
Jan 14 · 7 min read
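
The rank-merging idea in the Reciprocal Rank Fusion teaser above can be sketched in a few lines. This is a minimal illustration, not the article's Spring Boot and ParadeDB implementation: the function name `reciprocal_rank_fusion`, the sample record IDs, and the smoothing constant `k=60` (a commonly used default) are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs using rank positions only.

    Each list contributes 1 / (k + rank) for every ID it contains,
    so raw BM25 or vector scores never need to share a scale.
    `k` is an assumed smoothing constant, not a value from the article.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical candidate lists from a lexical and a vector retriever.
lexical_hits = ["sku-a", "sku-b", "sku-c"]
vector_hits = ["sku-b", "sku-c", "sku-d"]
fused = reciprocal_rank_fusion([lexical_hits, vector_hits])
```

Records that appear in both lists rise above records that only one retriever found, which is the behavior a linear blend of incomparable scores struggles to guarantee.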


When “Almost” Isn’t Good Enough: Why Top Engineers Still Rely On BM25
BM25 looks old on paper, but it still decides which records are worth comparing when identifiers can’t afford to be “almost” right. This post walks through the TF‑IDF roots of BM25, how k1 and b shape the scoring curve, and why Lucene, Elasticsearch, and OpenSearch still rely on it. You’ll see how term statistics, not embeddings, keep product codes, SKUs, and customer records anchored during entity resolution.

Gandhinath Swaminathan
Jan 8 · 5 min read
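
To make the role of k1 and b in the BM25 teaser above concrete, here is a rough sketch of the per-term BM25 score. It assumes the widely used defaults `k1=1.2` and `b=0.75` and one common IDF variant; the function name `bm25_term_score` and its parameters are illustrative and are not taken from the article or from any specific library.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document with BM25.

    k1 controls how quickly repeated occurrences stop adding score.
    b controls how strongly long documents are penalized.
    The IDF form below is one common variant (an assumption here).
    """
    if tf <= 0:
        return 0.0
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


# Hypothetical corpus statistics: 1000 records, term appears in 10 of them.
once = bm25_term_score(1, 100, 100, 1000, 10)
five_times = bm25_term_score(5, 100, 100, 1000, 10)
fifty_times = bm25_term_score(50, 100, 100, 1000, 10)
```

The score grows with term frequency but flattens out, so a rare token such as a SKU matters far more than a common word repeated many times.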


How Data Structures Build the Bridge from Exact Matching to Semantic Search
Exact match is easy. Similarity is hard. This post climbs the ladder of structures that make vector lookups fast: linked lists (slow scans), skip lists (express lanes), small-world graphs, and HNSW. Then it shows how pgvector brings HNSW into PostgreSQL so entity resolution can happen where your records already live.

Gandhinath Swaminathan
Jan 5 · 8 min read


Why Data Leaders Are Quietly Outpacing the AI Hype
While most organizations chase the latest AI trends, data leaders are building something different: reliable foundations. This isn’t about deploying more agents faster—it’s about assets with lineage, harmonization rules, and semantic definitions that make every AI decision trustworthy. Discover why speed without discipline turns into liability, and how to fund the running back while the quarterback steals the headlines.

Gandhinath Swaminathan
Dec 18, 2025 · 6 min read


Transforming Business with Predictive Analytics Insights
In today's landscape, anticipating the future is a necessity. Predictive analytics insights act as a radar, transforming historical data into a forecast of future outcomes, helping leaders move from guesswork to data-driven decisions. By harnessing this power, companies can unlock new operational efficiencies, predict customer churn, optimize inventory, and drive sustainable growth. This article explores how to use these insights, which tools to choose, and how to build a dat

Gandhinath Swaminathan
Nov 9, 2025 · 4 min read


Agentic mesh for Analytics: Stop moving data. Start asking questions.
You’ve spent millions on analytics, yet every critical question is a six-week research project. The problem isn’t your data; it’s the hidden "translation tax" you pay on every query. An Agentic Mesh, built on a Headless BI architecture, eliminates this tax. It stops the endless data movement and empowers your team to get verified answers to complex business questions in minutes, not months. This isn't magic; it's modern data architecture. Stop translating. Start asking.

Gandhinath Swaminathan
Nov 3, 2025 · 3 min read