The most common reason AI projects disappoint isn't the AI. It's the data. The best model in the world can't help you if your data is scattered across disconnected spreadsheets, full of duplicates, or locked in systems nobody can access.
Before investing in any AI project, answer the data questions first.
What data readiness actually means
Data readiness isn't about having "big data" or a data lake. For most businesses, it's about three things:
- Quality: Is the data accurate, complete, and consistent?
- Accessibility: Can the AI system actually reach the data it needs?
- Structure: Is the data organised in a way the AI can use?
You don't need perfect data. No dataset is perfect. But you need data that's good enough for the specific use case you're targeting.
The data readiness checklist
- Where does the relevant data live? Which systems, which databases, which file shares? Can you list them?
- Is it up to date? When was it last updated? Is it maintained regularly or does it go stale?
- How complete is it? What percentage of records have all the fields you need?
- Is it accessible programmatically? Can you get it via API, database query, or file export? Or is it locked behind a UI?
- Is there sensitive data? Personal information, financial data, health records? This affects how you can process and store it.
- Is there enough of it? AI models need examples. Fifty records is probably not enough. With 5,000, you're usually in a reasonable range for most business applications, though the right number depends on the use case.
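Several of the checklist questions can be answered with a few lines of code once you have an export. The following is a minimal sketch, not a definitive audit: the sample records, the field names (`name`, `email`, `phone`, `updated`), the 365-day staleness threshold, and the fixed reference date are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical sample records; in practice these would come from an export or query.
RECORDS = [
    {"name": "Acme Ltd", "email": "ops@acme.example", "phone": "555-0100", "updated": date(2024, 5, 1)},
    {"name": "Birch & Co", "email": "", "phone": "555-0101", "updated": date(2023, 1, 15)},
    {"name": "Cedar LLC", "email": "hi@cedar.example", "phone": None, "updated": date(2024, 4, 20)},
    {"name": "Dune Inc", "email": "info@dune.example", "phone": "555-0103", "updated": date(2024, 5, 3)},
]

REQUIRED_FIELDS = ["name", "email", "phone"]  # assumed fields for the target use case


def is_filled(value):
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def completeness(records, required_fields):
    """Share of records (0.0 to 1.0) with every required field filled."""
    if not records:
        return 0.0
    complete = sum(all(is_filled(r.get(f)) for f in required_fields) for r in records)
    return complete / len(records)


def stale_share(records, as_of, max_age_days):
    """Share of records whose last update is older than max_age_days."""
    if not records:
        return 0.0
    stale = sum((as_of - r["updated"]).days > max_age_days for r in records)
    return stale / len(records)


summary = {
    "record_count": len(RECORDS),
    "completeness": completeness(RECORDS, REQUIRED_FIELDS),
    # A fixed reference date keeps the example reproducible; use date.today() in practice.
    "stale_share": stale_share(RECORDS, as_of=date(2024, 6, 1), max_age_days=365),
}
print(summary)
```

Running the same three numbers (count, completeness, staleness) against each source system gives a quick, comparable picture of where the gaps are.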
Common gaps we find
- Siloed data: Customer info in the CRM, order history in the ERP, support tickets in a separate system. Nothing connected.
- Inconsistent formatting: Phone numbers in five different formats. Addresses sometimes structured, sometimes free text. Product names that don't match across systems.
- Missing historical data: The system only started tracking the data you need six months ago. Trend analysis and predictions usually need more history than that.
- No programmatic access: The data exists, but only a person can get it, through a web interface, one record at a time.
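Inconsistent formatting is often the cheapest of these gaps to fix. As a hedged sketch of the phone number case: the function below strips punctuation and maps several common spellings to one canonical form. The function name `normalise_phone`, the default country code `"1"`, and the fixed 10-digit national length are assumptions chosen for illustration; real numbering plans vary, and a dedicated phone-parsing library is usually the safer choice in production.

```python
import re


def normalise_phone(raw, default_country_code="1", national_length=10):
    """Return the number as +<country><national>, or None if it is not recognisable.

    Assumes, for illustration only, that national numbers have a fixed length.
    """
    digits = re.sub(r"\D", "", raw or "")  # drop everything except digits
    if len(digits) == national_length:
        return f"+{default_country_code}{digits}"
    has_country_code = (
        len(digits) == national_length + len(default_country_code)
        and digits.startswith(default_country_code)
    )
    if has_country_code:
        return f"+{digits}"
    return None  # flag for manual review rather than guessing


samples = [
    "(555) 010-0199",
    "555.010.0199",
    "+1 555 010 0199",
    "1-555-010-0199",
    "5550100199",
    "ext. 12",
]
normalised = [normalise_phone(s) for s in samples]
print(normalised)
```

Returning `None` for anything unrecognised is a deliberate choice here: it keeps bad values visible for review instead of silently writing a wrong number back.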
Getting AI-ready without overthinking it
- Start with one use case: Don't try to consolidate all your data. Focus on the data needed for one specific AI application.
- Clean what you need: You don't need to clean everything. Just the data for your target use case.
- Connect the sources: Build the integrations to bring the relevant data together. This investment pays off for AI and for any other work that relies on the same data.
- Accept imperfection: You can start a RAG (retrieval-augmented generation) system with imperfect data and improve it over time. Waiting for perfect data means waiting forever.
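To make "connect the sources" concrete for a single use case, here is a minimal sketch that combines CRM, ERP, and support data into one record per customer. It assumes the three exports share a `customer_id` key, which is often not true in practice and may itself be the first thing to fix. All record layouts, field names, and the `build_customer_view` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical exports from three separate systems, keyed by a shared customer_id.
crm_customers = [
    {"customer_id": "C1", "name": "Acme Ltd"},
    {"customer_id": "C2", "name": "Birch & Co"},
]
erp_orders = [
    {"customer_id": "C1", "order_total": 120.0},
    {"customer_id": "C1", "order_total": 80.0},
    {"customer_id": "C3", "order_total": 45.0},  # no matching CRM record
]
support_tickets = [
    {"customer_id": "C2", "subject": "Invoice question"},
]


def group_by_customer(rows):
    """Index a list of rows by their customer_id."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["customer_id"]].append(row)
    return grouped


def build_customer_view(customers, orders, tickets):
    """Combine the three sources into one record per CRM customer.

    Also returns the customer IDs that appear in orders but not in the CRM,
    so mismatches are surfaced instead of silently dropped.
    """
    orders_by_customer = group_by_customer(orders)
    tickets_by_customer = group_by_customer(tickets)

    view = {}
    for customer in customers:
        cid = customer["customer_id"]
        customer_orders = orders_by_customer.get(cid, [])
        view[cid] = {
            "name": customer["name"],
            "order_count": len(customer_orders),
            "order_total": sum(o["order_total"] for o in customer_orders),
            "ticket_subjects": [t["subject"] for t in tickets_by_customer.get(cid, [])],
        }

    unmatched = sorted({o["customer_id"] for o in orders} - set(view))
    return view, unmatched


view, unmatched = build_customer_view(crm_customers, erp_orders, support_tickets)
print(view)
print("Orders with no CRM customer:", unmatched)
```

The list of unmatched IDs is the "accept imperfection" part: the combined view is usable straight away, and the mismatches become a concrete clean-up list rather than a reason to delay.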
Data readiness isn't a project you finish. It's a habit you build. Start where you are, fix what matters most, and improve as you go.
For a structured assessment, work through our AI readiness checklist. If you're evaluating specific AI approaches, RAG vs fine-tuning covers the data requirements for each.