Data Strategy


Why Probabilistic Record Linkage Still Matters
Probabilistic record linkage still matters because identity data is messy and match decisions carry real financial and compliance risk. This article explains the intuition behind Fellegi–Sunter and Bayesian record linkage, shows how they control false merges and splits across noisy customer and product records, and points to modern tools and books that help you put these ideas into practice.

Gandhinath Swaminathan
2 days ago · 5 min read
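As a rough illustration of the Fellegi–Sunter intuition mentioned in the teaser above, the sketch below sums per-field log-likelihood ratios into a single match weight. The field names and the m/u probabilities are hypothetical values chosen for the example, not figures from the article; in practice they are estimated from data.

```python
import math

def match_weight(agreements, m_probs, u_probs):
    """Sum Fellegi-Sunter field weights for one record pair.

    m = P(field agrees | records are a true match)
    u = P(field agrees | records are not a match)
    Agreement adds log2(m/u); disagreement adds log2((1-m)/(1-u)).
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = m_probs[field], u_probs[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1.0 - m) / (1.0 - u))
    return total

# Hypothetical probabilities, for illustration only.
m_probs = {"name": 0.9, "zip": 0.8}
u_probs = {"name": 0.1, "zip": 0.2}

both_agree = match_weight({"name": True, "zip": True}, m_probs, u_probs)
both_differ = match_weight({"name": False, "zip": False}, m_probs, u_probs)
```

Pairs whose weight clears an upper threshold are linked, pairs below a lower threshold are rejected, and the band in between goes to review. Where those thresholds sit is the lever that trades false merges against false splits.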


Heterogeneous Knowledge Graphs: Multi-Hop Reasoning Beyond Pairwise Matching
Pairwise matching treats each comparison as a one-off. A persistent knowledge graph turns product mentions, manufacturers, model numbers, attributes, and price bins into typed nodes and relations. Matching becomes neighborhood comparison: multi-hop paths (convergent evidence) can beat any single similarity score.

Gandhinath Swaminathan
2 days ago · 7 min read


From Inverted Index to Attention Graph: Turning SPLADE Tokens Into ER Decisions
False entity merges don’t just dirty data. They distort inventory, pricing, and forecasts, and then every model and report built on top of them. Learned sparse retrieval improves recall, but it can still treat records like unordered tokens. This post adds token-to-token attention as a structural check so near-duplicates pass and lookalikes fail, with a trail you can audit.

Gandhinath Swaminathan
3 days ago · 3 min read


The Best of Both Worlds: Learned Sparse Retrieval (SPLADE) For Entity Resolution
Entity resolution breaks when exact matching is too brittle and dense vectors blur identities. This post introduces SPLADE, a learned sparse retrieval model that keeps inverted indexes and token-level explainability while adding transformer-powered expansion and reweighting. We walk through where SPLADE beats BM25 and dense search, where it can fail on SKUs and over-expansion, and how to run it in Postgres/ParadeDB for large-scale product, customer, or patient identity.

Gandhinath Swaminathan
3 days ago · 10 min read


Hybrid Search and Reciprocal Rank Fusion: Building the Bridge Between Lexical and Semantic
Entity resolution struggles when systems must choose between the rigid precision of BM25 and the fuzzy flexibility of Vector Search. Part 4 reveals why simple linear weighting fails and introduces Reciprocal Rank Fusion (RRF) as the superior alternative. We explore the architectural shift to Hybrid Search, demonstrating how to merge rank positions rather than raw scores using Spring Boot and ParadeDB.

Gandhinath Swaminathan
Jan 14 · 7 min read
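To make "merge rank positions rather than raw scores" concrete, here is a minimal Python sketch of Reciprocal Rank Fusion. The article itself works in Spring Boot and ParadeDB; the function name, the sample SKU ids, and the choice of k = 60 (the constant commonly used with RRF) are assumptions made for this illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids using only rank positions.

    Each list contributes 1 / (k + rank) per id, with rank starting at 1.
    Raw BM25 or vector scores never enter the formula, so their scales
    do not need to be comparable.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from a lexical and a vector query.
bm25_hits = ["sku-17", "sku-42", "sku-08"]
vector_hits = ["sku-42", "sku-99", "sku-17"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Ids that appear near the top of both lists rise to the front of `fused`, while an id that only one retriever found still keeps a place further down.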


When “Almost” Isn’t Good Enough: Why Top Engineers Still Rely On BM25
BM25 looks old on paper, but it still decides which records are worth comparing when identifiers can’t afford to be “almost” right. This post walks through the TF‑IDF roots of BM25, how k1 and b shape the scoring curve, and why Lucene, Elasticsearch, and OpenSearch still rely on it. You’ll see how term statistics, not embeddings, keep product codes, SKUs, and customer records anchored during entity resolution.

Gandhinath Swaminathan
Jan 8 · 5 min read
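The role of k1 and b is easier to see in code. The sketch below scores a single query term with the textbook BM25 formula and the Lucene-style smoothed IDF; the function name and the defaults k1 = 1.2 and b = 0.75 are illustrative assumptions, and production engines may differ in constant factors.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document (textbook BM25).

    k1 controls how quickly repeated occurrences saturate.
    b controls how strongly longer-than-average documents are penalised.
    """
    # Smoothed inverse document frequency: rare terms weigh more.
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalisation: equals 1.0 for an average-length document.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * norm)

# Hypothetical corpus statistics: 1000 documents, term found in 10 of them.
def score_for_tf(tf):
    return bm25_term_score(tf, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10)

first_repeat_gain = score_for_tf(2) - score_for_tf(1)
tenth_repeat_gain = score_for_tf(10) - score_for_tf(9)
```

The first repeat of a term adds far more than the tenth, which is the saturation that k1 governs. Setting b to 0 switches length normalisation off entirely.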


The Solution Architect Role: Guiding Business Innovation with Clarity and Purpose
Solution architects aren’t just “technical.” They translate business intent into systems that scale, integrate, and stay secure—preventing costly silos and rework. In this post, I break down what solution architects actually do, why the role is highly valued, and how enterprise solution architects align AI, data, and platforms to long-term strategy. You’ll also get practical ways to leverage architects for measurable outcomes.

Gandhinath Swaminathan
Jan 7 · 4 min read


How One Invisible Data Problem Quietly Destroys Your Churn Models, Your Pricing, and Your AI Agents
Healthcare providers track the same patient under five name variations. Retailers can't tell when the same SKU is under two different codes. CPG companies buy demand data showing one product with three different names across channels. Supply chains list separate suppliers that are actually the same company. Every week. Same problem. Different domain. Your data doesn't know what it's describing.

Gandhinath Swaminathan
Jan 2 · 6 min read


Why Data Leaders Are Quietly Outpacing the AI Hype
While most organizations chase the latest AI trends, data leaders are building something different: reliable foundations. This isn’t about deploying more agents faster—it’s about assets with lineage, harmonization rules, and semantic definitions that make every AI decision trustworthy. Discover why speed without discipline turns into liability, and how to fund the running back while the quarterback steals the headlines.

Gandhinath Swaminathan
Dec 18, 2025 · 6 min read


Optimizing Business Analytics for Better Insights
Data is your competitive advantage—but only if it tells a clear story. In this post, I share what transforms business intelligence from overwhelming to actionable: quality data, the right metrics, intuitive tools, skilled teams, and a culture that values evidence. Learn the practical steps to audit your setup, eliminate noise, automate workflows, and embed analytics into daily decisions. Because sustainable growth isn't about chasing trends—it's about refining what matters.

Gandhinath Swaminathan
Dec 8, 2025 · 4 min read


Agentic mesh for Analytics: Stop moving data. Start asking questions.
You’ve spent millions on analytics, yet every critical question is a six-week research project. The problem isn’t your data; it’s the hidden "translation tax" you pay on every query. An Agentic Mesh, built on a Headless BI architecture, eliminates this tax. It stops the endless data movement and empowers your team to get verified answers to complex business questions in minutes, not months. This isn't magic; it's modern data architecture. Stop translating. Start asking.

Gandhinath Swaminathan
Nov 3, 2025 · 3 min read