Data Strategy

Why Probabilistic Record Linkage Still Matters

Probabilistic record linkage still matters because identity data is messy and match decisions carry real financial and compliance risk. This article explains the intuition behind Fellegi–Sunter and Bayesian record linkage, shows how they control false merges and splits across noisy customer and product records, and points to modern tools and books that help you put these ideas into practice.

Gandhinath Swaminathan

2 days ago · 5 min read

Heterogeneous knowledge graph diagram showing product entity resolution with typed nodes (mentions, organizations, models, attributes) connected by colored relationship edges (madeby, hasmodel, hasattr). Multiple convergent paths highlighted between two mentions, illustrating multi-hop reasoning for entity matching.

Heterogeneous Knowledge Graphs: Multi-Hop Reasoning Beyond Pairwise Matching

Pairwise matching treats each comparison as a one-off. A persistent knowledge graph turns product mentions, manufacturers, model numbers, attributes, and price bins into typed nodes and relations. Matching becomes neighborhood comparison: multi-hop paths (convergent evidence) can beat any single similarity score.

Gandhinath Swaminathan

2 days ago · 7 min read

Feature illustration of a Sony PS‑LX350H turntable with SPLADE token weights on the left and a token‑to‑token attention graph on the right, showing sparse retrieval turning into an entity-resolution decision.

From Inverted Index to Attention Graph: Turning SPLADE Tokens Into ER Decisions

False entity merges don’t just dirty data. They distort inventory, pricing, and forecasts, and then every model and report built on top of them. Learned sparse retrieval improves recall, but it can still treat records like unordered tokens. This post adds token-to-token attention as a structural check so near-duplicates pass and lookalikes fail, with a trail you can audit.

Gandhinath Swaminathan

3 days ago · 3 min read

The Best of Both Worlds: Learned Sparse Retrieval (SPLADE) For Entity Resolution

Entity resolution breaks when exact matching is too brittle and dense vectors blur identities. This post introduces SPLADE, a learned sparse retrieval model that keeps inverted indexes and token-level explainability while adding transformer-powered expansion and reweighting. We walk through where SPLADE beats BM25 and dense search, where it can fail on SKUs and over-expansion, and how to run it in Postgres/ParadeDB for large-scale product, customer, or patient identity.

Gandhinath Swaminathan

3 days ago · 10 min read

Abstract visualization showing geometric lexical data patterns merging with flowing semantic vector networks, with particles fusing at the convergence point, representing hybrid search combining BM25 and vector similarity.

Hybrid Search and Reciprocal Rank Fusion: Building the Bridge Between Lexical and Semantic

Entity resolution struggles when systems must choose between the rigid precision of BM25 and the fuzzy flexibility of Vector Search. Part 4 reveals why simple linear weighting fails and introduces Reciprocal Rank Fusion (RRF) as the superior alternative. We explore the architectural shift to Hybrid Search, demonstrating how to merge rank positions rather than raw scores using Spring Boot and ParadeDB.

Gandhinath Swaminathan

Jan 14 · 7 min read

Warehouse worker scanning package labels with a handheld barcode scanner.

When “Almost” Isn’t Good Enough: Why Top Engineers Still Rely On BM25

BM25 looks old on paper, but it still decides which records are worth comparing when identifiers can’t afford to be “almost” right. This post walks through the TF‑IDF roots of BM25, how k1 and b shape the scoring curve, and why Lucene, Elasticsearch, and OpenSearch still rely on it. You’ll see how term statistics, not embeddings, keep product codes, SKUs, and customer records anchored during entity resolution.

Gandhinath Swaminathan

Jan 8 · 5 min read

Eye-level view of a modern office workspace with a laptop and architectural blueprints to represent a solution architect planning business technology.

The Solution Architect Role: Guiding Business Innovation with Clarity and Purpose

Solution architects aren’t just “technical.” They translate business intent into systems that scale, integrate, and stay secure—preventing costly silos and rework. In this post, I break down what solution architects actually do, why the role is highly valued, and how enterprise solution architects align AI, data, and platforms to long-term strategy. You’ll also get practical ways to leverage architects for measurable outcomes.

Gandhinath Swaminathan

Jan 7 · 4 min read

Diagram showing a single Sony turntable model with three conflicting names and SKU codes as it appears across CRM, inventory management, and pricing systems, illustrating how product fragmentation creates mismatched records.

How One Invisible Data Problem Quietly Destroys Your Churn Models, Your Pricing, and Your AI Agents

Healthcare providers track the same patient under five name variations. Retailers can't tell when the same product is listed under two different SKUs. CPG companies buy demand data showing one product with three different names across channels. Supply chains list multiple suppliers that are actually the same company. Every week. Same problem. Different domain. Your data doesn't know what it's describing.

Gandhinath Swaminathan

Jan 2 · 6 min read

Two American football players in Seattle navy and neon green uniforms on a stadium field. The player in the foreground wears number 3 with 'AI ENGINEERING' on his jersey, while the player in the background wears number 24 with 'DATA ENGINEERING' on his jersey, symbolizing a collaborative technical offensive strategy.

Why Data Leaders Are Quietly Outpacing the AI Hype

While most organizations chase the latest AI trends, data leaders are building something different: reliable foundations. This isn’t about deploying more agents faster—it’s about assets with lineage, harmonization rules, and semantic definitions that make every AI decision trustworthy. Discover why speed without discipline turns into liability, and how to fund the running back while the quarterback steals the headlines.

Gandhinath Swaminathan

Dec 18, 2025 · 6 min read

Eye-level view of a modern office workspace with multiple screens showing data charts

Optimizing Business Analytics for Better Insights

Data is your competitive advantage—but only if it tells a clear story. In this post, I share what transforms business intelligence from overwhelming to actionable: quality data, the right metrics, intuitive tools, skilled teams, and a culture that values evidence. Learn the practical steps to audit your setup, eliminate noise, automate workflows, and embed analytics into daily decisions. Because sustainable growth isn't about chasing trends—it's about refining what matters.

Gandhinath Swaminathan

Dec 8, 2025 · 4 min read

Conceptual data architecture showing an intelligent network of connected nodes, representing an Agentic Mesh built on Headless BI.

Agentic Mesh for Analytics: Stop Moving Data. Start Asking Questions.

You’ve spent millions on analytics, yet every critical question is a six-week research project. The problem isn’t your data; it’s the hidden "translation tax" you pay on every query. An Agentic Mesh, built on a Headless BI architecture, eliminates this tax. It stops the endless data movement and empowers your team to get verified answers to complex business questions in minutes, not months. This isn't magic; it's modern data architecture. Stop translating. Start asking.

Gandhinath Swaminathan

Nov 3, 2025 · 3 min read