Most AI projects that disappoint trace back to the same root cause: the data was not ready. Recommendations that miss, forecasts that drift, segments that don’t hold up: these are rarely model problems. They are data problems wearing a model’s clothes. Before any of the clever work pays off, your data has to be reliable, joined up, and understood. This is the unglamorous groundwork that quietly determines whether everything built on top of it succeeds.
This article covers the foundational data work eCommerce brands need in place, what “ready” actually means, and how to get there without a multi-year platform programme.
Why data readiness comes first
AI does not create signal from nothing. It amplifies what is already in your data — including the errors, gaps, and inconsistencies. Feed it duplicated customers, inconsistent product categories, or events that fire unreliably, and it will confidently learn the wrong things. The cost shows up later, as a recommendation engine that suggests out-of-stock items or a churn model that flags loyal customers.
Getting the foundations right is the difference between an AI roadmap that compounds and one that stalls. It is the first dependency in any eCommerce AI roadmap, and it is why we treat it as the entry point to data insights work.
The four foundations
1. A single, reliable view of the customer
Most brands hold customer data in fragments — the commerce platform, email tool, helpdesk, loyalty system — each with its own idea of who the customer is. The same person appears as three records, guest checkouts float unattached, and lifetime value is impossible to calculate.
The foundation is identity resolution: a consistent way to recognise the same customer across systems and stitch their history together. You do not need a full customer data platform on day one, but you do need a deliberate answer to “how do we know two records are the same person?” Get this right and segmentation, lifetime value, and personalisation all become possible. We go deeper in our piece on the unified customer data model.
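As an illustration, here is a minimal sketch of one deliberate answer to that question, assuming a simple deterministic rule: two records belong to the same person when their normalised email matches. The record fields (`source`, `id`, `email`) and the function names are hypothetical, and a real implementation would usually draw on more identifiers and a review step for ambiguous matches.

```python
from collections import defaultdict


def normalise_email(email):
    """Trim and lower-case an email so trivial variants compare equal."""
    return email.strip().lower()


def resolve_identities(records):
    """Group records from different systems under one customer key.

    Assumption: a matching normalised email means the same person.
    Records without an email are kept apart rather than guessed at.
    """
    customers = defaultdict(list)
    for record in records:
        email = record.get("email")
        if not email:
            # No identifier to match on: park the record under its own key
            key = f"unmatched:{record['source']}:{record['id']}"
        else:
            key = normalise_email(email)
        customers[key].append(record)
    return dict(customers)


# Hypothetical records from three systems
records = [
    {"source": "shop", "id": "1001", "email": "Ana@Example.com "},
    {"source": "email_tool", "id": "e-77", "email": "ana@example.com"},
    {"source": "helpdesk", "id": "h-5", "email": None},
]
unified = resolve_identities(records)
```

Here the shop and email-tool records collapse into one customer, while the helpdesk record with no email stays unattached until someone decides how to match it.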
2. Clean, structured product data
Your catalogue is the other half of every recommendation, search result, and merchandising decision. Foundational product data means:
- Consistent categorisation — one taxonomy, applied uniformly, not three overlapping ones.
- Complete, structured attributes — colour, size, material, compatibility — in fields, not buried in free-text descriptions.
- Accurate, real-time stock and price.
- Stable product identifiers that survive re-imports and feed changes.
Thin or inconsistent attributes are the most common reason semantic search and recommendations underperform. The fix is editorial discipline, not a bigger model.
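One way to make that discipline measurable is a completeness check over the catalogue. The sketch below assumes each product is a dictionary with a `sku` field and that colour, size, and material are the attributes that matter; both the field names and the required list are hypothetical and would depend on your category.

```python
# Hypothetical list of attributes every product should carry as structured fields
REQUIRED_ATTRIBUTES = ("colour", "size", "material")


def missing_attributes(product, required=REQUIRED_ATTRIBUTES):
    """Return the required attributes that are absent or blank on a product."""
    return [
        name
        for name in required
        if not str(product.get(name) or "").strip()
    ]


def completeness_report(products):
    """Map each SKU with gaps to the list of attributes it is missing."""
    report = {}
    for product in products:
        gaps = missing_attributes(product)
        if gaps:
            report[product["sku"]] = gaps
    return report


# Hypothetical catalogue rows
catalogue = [
    {"sku": "TSHIRT-01", "colour": "navy", "size": "M", "material": "cotton"},
    {"sku": "TSHIRT-02", "colour": "red", "size": " ", "material": None},
]
report = completeness_report(catalogue)
```

A report like this gives the merchandising team a concrete worklist rather than a general instruction to "improve the data".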
3. Trustworthy behavioural events
Personalisation, funnel analysis, and conversion work all depend on knowing what customers do: views, searches, add-to-carts, purchases. The foundation here is an event tracking plan — a documented, consistent schema for what you capture, named the same way everywhere, firing reliably.
In our experience this is where audits find the worst surprises: events that stopped firing after a site change, the same action tracked under three names, or purchase events missing on mobile. Untrustworthy events quietly corrupt every downstream metric.
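A tracking plan is easier to enforce when it exists as data that events can be checked against. The following is a minimal sketch, assuming a plan that maps each agreed event name to the properties it must carry; the event names and properties shown are hypothetical examples, not a recommended standard.

```python
# Hypothetical tracking plan: agreed event names and their required properties
TRACKING_PLAN = {
    "product_viewed": {"product_id", "platform"},
    "product_searched": {"query", "platform"},
    "added_to_cart": {"product_id", "quantity", "platform"},
    "order_completed": {"order_id", "total", "platform"},
}


def validate_event(event):
    """Return a list of problems; an empty list means the event fits the plan."""
    name = event.get("name")
    if name not in TRACKING_PLAN:
        return [f"unknown event name: {name!r}"]
    present = set(event.get("properties", {}))
    missing = TRACKING_PLAN[name] - present
    return [f"missing property: {prop}" for prop in sorted(missing)]


ok = validate_event(
    {"name": "product_viewed", "properties": {"product_id": "P1", "platform": "ios"}}
)
renamed = validate_event({"name": "add_to_cart", "properties": {}})
incomplete = validate_event(
    {"name": "order_completed", "properties": {"order_id": "O-9"}}
)
```

Run against a sample of live events, a check like this surfaces the same action tracked under a second name (`renamed`) or a purchase event arriving without its required fields (`incomplete`).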
4. Defined, agreed metrics
If “active customer” or “conversion rate” means something different in each team’s report, no model and no dashboard will resolve the confusion. A foundation includes a shared metric dictionary: agreed definitions, owned and documented, so the numbers reconcile. Our piece on eCommerce KPIs that matter covers which to standardise first.
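A metric dictionary can start as a small, version-controlled structure that pairs each definition with an owner and a single calculation. The sketch below is illustrative only: the definition of conversion rate as orders divided by sessions, the owner name, and the class and function names are all assumptions, and your agreed definition may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    definition: str  # the agreed wording, in plain language
    owner: str       # the team accountable for the definition


# Hypothetical entry; the definition itself is an assumption for illustration
METRIC_DICTIONARY = {
    "conversion_rate": MetricDefinition(
        name="conversion_rate",
        definition="Completed orders divided by sessions in the same period",
        owner="analytics",
    ),
}


def conversion_rate(orders, sessions):
    """Compute conversion rate exactly as the dictionary entry defines it."""
    if sessions == 0:
        return 0.0  # avoid division by zero for periods with no traffic
    return orders / sessions


rate = conversion_rate(orders=30, sessions=1200)
```

The point of the design is that every report calls the same function, so the number cannot quietly diverge between teams.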
How to assess where you stand
Run a focused readiness audit before committing to any AI build. Work through these questions honestly:
- Can you produce a complete order and interaction history for a single customer across all channels in minutes, not days?
- Is your product taxonomy consistent, and are key attributes structured rather than free-text?
- Do your behavioural events fire reliably on every platform, under documented names?
- Do your core metrics reconcile across teams and tools?
- Do you know your data’s provenance — where each field comes from and how fresh it is?
A “no” or “not sure” to any of these is a foundation to fix before, not after, you build on it.
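The provenance question in particular lends itself to an automated check. As a rough sketch, assuming you record a source and a maximum acceptable age for each field, the function below flags fields that are stale or have never been updated. The field names, sources, and thresholds are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical provenance register: where each field comes from and
# how old it may be before it should no longer be trusted
FIELD_PROVENANCE = {
    "stock_level": {"source": "warehouse_feed", "max_age": timedelta(minutes=15)},
    "customer_email": {"source": "commerce_platform", "max_age": timedelta(days=1)},
}


def stale_fields(last_updated, now):
    """Return fields whose last update is missing or older than allowed."""
    stale = []
    for field, meta in FIELD_PROVENANCE.items():
        updated = last_updated.get(field)
        if updated is None or now - updated > meta["max_age"]:
            stale.append(field)
    return stale


# Fixed timestamps so the example is reproducible
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
last_updated = {
    "stock_level": now - timedelta(hours=2),
    "customer_email": now - timedelta(hours=1),
}
stale = stale_fields(last_updated, now)
```

In this example the stock feed is two hours old against a fifteen-minute allowance, so `stock_level` is flagged while `customer_email` passes.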
A pragmatic sequence
You do not need to perfect everything at once. Sequence the work against what your first AI use cases actually require.
- Start with the use case, work backwards to the data. If your first project is recommendations, prioritise product attributes and behavioural events; identity resolution can follow.
- Fix tracking before you fix history. Reliable forward-looking events compound; back-filling broken history rarely pays off.
- Document as you go. The metric dictionary and event plan are living artefacts, not one-off documents.
- Treat governance as part of the foundation. Privacy, consent, and retention belong here from the start — see GDPR and AI in eCommerce for the obligations that shape what you can store and use.
Common pitfalls
- Buying a platform to fix a discipline problem. A CDP will faithfully unify messy data into messy unified data. Sort definitions and tracking first.
- Treating data work as a one-time project. Catalogues change, sites get rebuilt, events break. Foundations need ownership and ongoing maintenance.
- Over-engineering for scale you don’t have. Match the investment to your actual volume and roadmap; a mid-sized brand rarely needs an enterprise data lake on day one.
- Skipping the audit because it’s tedious. The audit is the cheapest insurance you will buy.
The payoff
Solid foundations are quietly transformative. With a reliable customer view, clean catalogue, trustworthy events, and agreed metrics, every later project starts from a position of strength: recommendations that land, forecasts you can trust, segments that behave as expected. The work is unglamorous, but it is the highest-leverage investment most brands can make — because everything else depends on it.
If you would like a clear-eyed assessment of your data readiness before committing to an AI programme, get in touch and we will help you find the gaps that matter most.