How One Invisible Data Problem Quietly Destroys Your Churn Models, Your Pricing, and Your AI Agents
- Gandhinath Swaminathan

- Jan 2
- 6 min read
The Problem You Didn’t Know Was Costing You Real Money
I once watched a CFO stop listening to her own dashboard. She’d built it with a six-figure team, but every morning it told her one thing, and every afternoon her finance controller’s spreadsheet told her the opposite.
Same data. Different answers.
That’s when I understood: her real problem wasn’t math. It was that her systems didn’t speak the same language about her business.
Look at these three entries for the exact same Sony turntable:
Sony Turntable - PSLX350H
Sony PS-LX350H Belt-Drive Turntable
Sony Record Player, Model PS-LX350H, Black

Yes, it’s the same hardware, but it has three digital identities. Because of that, your inventory system sees overstocks that don’t exist and shortages that do.
This breaks everything downstream.
Since your system sees three products instead of one, stock reports will claim you’re overstocked on an item you’re actually sold out of. Then, your pricing engine runs tests on what it thinks are different items, accidentally dropping the price on your best-seller under a different name.
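To make that concrete, here is a minimal sketch (pandas, with illustrative quantities) of what those three identities do to a stock report:

```python
import pandas as pd

# Three system records that are really one physical product (quantities are illustrative).
stock = pd.DataFrame({
    "product_name": [
        "Sony Turntable - PSLX350H",
        "Sony PS-LX350H Belt-Drive Turntable",
        "Sony Record Player, Model PS-LX350H, Black",
    ],
    "on_hand": [40, 0, 0],
})

# What the system reports: one "overstocked" SKU and two stockouts.
print(stock)

# What is physically true once the three identities resolve to one record:
print(stock["on_hand"].sum())  # 40 units of a single product, not three separate SKUs
```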
The result isn’t a theoretical "data quality" issue. It’s dollars. Leaving. Quietly.
Recurring Problems Scale with Growth: The Crisis Your Business Can’t Outrun
This happens in every industry: banking, retail, healthcare, CPG, SaaS, and insurance. Everyone hits the same wall the moment they have more than one system describing the same thing.
2011 at Microsoft: The Telemetry Trap
I saw this at scale with Windows crash reports—millions of errors a day. Our job was to figure out which crashes were actually the same issue so we could prioritize fixes. One system logged an error as Function: contoso!CViewReportTask::Run+0x102, while another used a different name for the same failure. I built a matching tool using "Jaccard similarity." It worked—until the message formats changed. Then it broke. I was trying to solve a problem of meaning (identity) with a tool that only looked at text (syntax).
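For context, the core of that kind of tool is no more sophisticated than this sketch (illustrative signature strings and tokenization, not the original code):

```python
import re

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over the tokens of two crash signatures."""
    tokens_a = {t for t in re.split(r"[\s!:+.]+", a.lower()) if t}
    tokens_b = {t for t in re.split(r"[\s!:+.]+", b.lower()) if t}
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# Two logs describing the same failure, formatted differently (illustrative strings).
sig_a = "Function: contoso!CViewReportTask::Run+0x102"
sig_b = "contoso.exe CViewReportTask Run offset 0x102"

print(jaccard(sig_a, sig_b))  # "similar enough" today; a format change shifts the tokens and the score
```

The moment a team renames a frame or changes a delimiter, the token overlap collapses and so does the score, even though the underlying failure is identical.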
2015–2018 at a Major Coffee Company: The Demand Gap
Their CPG division bought demand data from Nielsen and Circana. One physical product had three different names because it was packaged differently for different retailers. Internal systems used one SKU, Costco used another, and Fred Meyer used a third. Stock got misallocated. Inventory sat dead in one channel while another ran short. Demand planning cannot work when one product looks like three.
2025 with a Data Startup: The Hidden Signal
Just this year, I worked with a startup selling data products. They used different labels for datasets: “Customer Records – Tech Industry” vs “Technology Sector Decision Makers.” When we looked at churn, one variant was performing significantly worse. It wasn't that the customers were different; it was that the specific combination of filters—hidden behind those fragmented names—produced worse outcomes. The signal was there, but it was buried in naming that the system couldn't parse.
Fractured Identity: The Wall Every Industry Eventually Hits
Every week I talk to teams in the same predicament:
Financial services: They can’t match the same customer across loan and investment systems. KYC is fractured. AML is blind.
Retailers: They can’t tell when the same SKU is being sold under different UPCs.
Healthcare: They track the same patient under five slightly different name variations.
Supply chain: They treat three “different” suppliers as distinct when they are really two companies operating under different legal names.
Systems can’t agree on what’s the same. And because systems can’t agree, neither can you.
The Cracks in Your Foundation: The Real Price of Fragmented Identity
Fragmented identity isn’t a technical glitch; it is a structural instability in your foundation and a hard ceiling on your company’s intelligence. When identity is fractured, your organization cannot see, learn, or move as a single cohesive unit.

Your Product 360 is Split Into Pieces
When the same product has different names depending on the retailer, you misallocate stock because one code shows high demand while its alias looks dead.
The cost: Working capital tied up in the wrong stock and massive planning errors.
Your Churn Models are Learning from Lies
If the system treats the same item as separate products, the real culprit—the one bad variant—stays hidden. High-risk segments get grouped with low-risk segments, diluting the warning signal until it's useless.
The cost: Wasted marketing spend and lost customers you could have saved.
Agentic AI: The Speed Multiplier of Errors
Traditional data methods move at a human pace, but AI agents operate without friction. When you plug an autonomous agent into a fragmented data foundation, it doesn't just fail; it fails at machine speed. These agents are designed to execute, not to compensate for a broken foundation. Deploy them on top of unstable data and you aren't investing in intelligence—you are subsidizing a system that scales every mistake automatically.
Consider the real-world fallout:
Pricing agents see high demand under one name but low demand under an alias. They cut the price on what looks like a failing item, accidentally tanking the margin on your best-seller.
Supply chain agents see a stock shortage in one region and an excess in another. Because the data foundation is fractured, they don't realize it is the same product and fail to rebalance.
Fraud agents flag your best customers because their fragmented footprints look suspicious in isolation, killing the customer experience.
The cost: You are moving from contained, manual data errors to automated financial liabilities. Until you solve the identity foundation, every dollar spent on AI is a questionable investment. You are training models on a warped history and forcing agents to act on a false present.
Why Your Current Stack Can't Solve Identity
Exact Joins: The Paradox of Pre-harmonization
SELECT * FROM a JOIN b ON a.name = b.name. The result is almost nothing. Exact equality requires you to harmonize data before you join it. It only works after you've already solved the problem.
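A toy illustration of the paradox, using pandas (assumed table and column names) and the turntable records from earlier:

```python
import pandas as pd

# Each system describes the same physical product under a different name.
inventory = pd.DataFrame({"name": ["Sony Turntable - PSLX350H"], "on_hand": [40]})
catalog   = pd.DataFrame({"name": ["Sony PS-LX350H Belt-Drive Turntable"], "list_price": [229.99]})

# The equivalent of SELECT * FROM a JOIN b ON a.name = b.name
joined = inventory.merge(catalog, on="name", how="inner")
print(len(joined))  # 0 rows: the join only succeeds if the names were already harmonized
```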
Fuzzy Matching: The Ambiguity Trap
LIKE '%Sony%Turntable%'. Tighten the pattern and you miss matches; loosen it and you drown in false positives. You are left with a field of ambiguity where you can never be 100% certain which records represent the same entity.
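Here is the trap in miniature, with the LIKE pattern rewritten as regular expressions and one hypothetical accessory record added:

```python
import re

records = [
    "Sony Turntable - PSLX350H",                   # the product
    "Sony PS-LX350H Belt-Drive Turntable",         # the same product
    "Sony Record Player, Model PS-LX350H, Black",  # the same product
    "Sony Turntable Dust Cover",                   # a different product (hypothetical accessory)
]

loose = re.compile(r"sony.*turntable", re.IGNORECASE)  # ~ LIKE '%Sony%Turntable%'
tight = re.compile(r"sony.*ps-lx350h", re.IGNORECASE)  # tightened on the model code

print([r for r in records if loose.search(r)])  # pulls in the dust cover, misses the "Record Player" variant
print([r for r in records if tight.search(r)])  # misses "PSLX350H" because that record drops the hyphen
```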
String Metrics: Similarity is Not Identity
Levenshtein or Jaccard scores tell you how similar two strings are, not whether they refer to the same entity. High similarity can mask that you’re comparing an item to its accessory; low similarity can hide that you're comparing the same product under different marketing names.
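You can see the mismatch with Python's built-in difflib standing in for an edit-distance score (the exact numbers will vary; the ordering is the point):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], a stand-in for Levenshtein-style metrics."""
    return SequenceMatcher(None, a, b).ratio()

product   = "Sony Turntable - PSLX350H"
accessory = "Sony Turntable Dust Cover"                   # a different entity (hypothetical record)
same_item = "Sony Record Player, Model PS-LX350H, Black"  # the same entity, renamed for another channel

print(similarity(product, accessory))  # higher score: shared words, different thing
print(similarity(product, same_item))  # lower score: different words, same thing
```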
General NLP: Contextual Blindness
Tools like spaCy tag “Sony” as an organization but struggle with domain-specific codes like “PS-LX350H.” They can’t distinguish “Black” (the color) from “Black Magic” (the product name).
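If you want to check this yourself, the experiment is a few lines, assuming spaCy and its small English model are installed (the entities it produces vary by model version):

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Sony Record Player, Model PS-LX350H, Black")
for ent in doc.ents:
    print(ent.text, ent.label_)

# A general-purpose model typically recognizes "Sony" as an organization,
# but it has no concept of "PS-LX350H" as a model number and no way to tell
# the color "Black" from a product line that happens to be named "Black".
```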
Taxonomies: The Maintenance Trap
You build a hierarchy, then a provider updates theirs. Now you’re maintaining four different taxonomies that don't match. You’re just hiring more people to manually manage the mess.
Manual Review: Obsolete Before Completion
Hire an army of analysts to check rows by hand and a million rows still takes months. By the time they finish, new data has arrived and the old work is obsolete.
Even The Best Architecture Can’t Fix a Broken Identity
Snowflake doesn’t care. Neither does your lakehouse. A perfectly modeled data mesh sitting on top of fragmented identity is just doing the wrong math faster: the pipes are modernized, but the water is still contaminated.
The real question for leadership is: “How do I make my systems understand what these records represent, not just what they say?” Once a system can reason about what a record actually is, the superficial differences stop mattering.
What This Means for Your Business
You have two options:
Keep patching the symptoms: This feels cheaper today, but it is a mounting tax on your operations that eventually collapses under the weight of AI.
Fix the foundation: Solve for identity first, and the accuracy of everything else follows.
Most teams choose the first path because it’s easier to ignore in the short term. But when an AI agent makes a $2M pricing mistake because it matched the wrong products, the cost becomes impossible to hide.
I’ve taken the first path more times than I can count, and I’ve paid the price for it every time. In my next post, I will share my observations on the modern ways businesses try to solve this and why they still fall short.
For now, the reality is simple: Until you solve identity, every agent you deploy is learning from a warped history and acting on a false present.

