# Two ways to customise AI
When a general-purpose language model doesn't know enough about your business, you have two main options to make it more useful:
- RAG (retrieval-augmented generation): Keep the model as-is, but give it your documents as context at query time.
- Fine-tuning: Retrain the model on your specific data so it learns your domain, style, or terminology.
Both approaches work, but they solve different problems. Let's break them down.
## RAG: giving context at query time
RAG doesn't change the model. Instead, it retrieves relevant documents from your data and includes them in the prompt. The model generates an answer grounded in that context.
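The retrieve-then-generate flow can be sketched in a few lines. This is a minimal illustration, not a production pattern: retrieval here is naive keyword overlap over a tiny hypothetical document store, where a real system would use embeddings and a vector database.

```python
# Minimal RAG sketch: retrieve relevant documents, then ground the
# prompt in them. The documents and retrieval method are illustrative.

DOCUMENTS = {
    "refunds": "Refunds are processed within 10 business days of approval.",
    "shipping": "Standard shipping within Australia takes 3-7 business days.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the query."""
    words = set(query.lower().split())
    scored = sorted(
        DOCUMENTS.values(),
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Assemble a grounded prompt: retrieved context plus the question."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The prompt returned by `build_prompt` is what gets sent to the off-the-shelf model; the model itself is never modified.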
Advantages:
- No training required: Works with off-the-shelf models (GPT-4, Claude)
- Always current: When you update a document, the system's answers update too
- Auditable: Every answer can be traced back to source documents
- Data stays separate: Your data never enters the model's weights
- Fast to deploy: Days to weeks, not weeks to months
## Fine-tuning: changing the model
Fine-tuning takes a pre-trained model and continues training it on your specific data or examples. The model's weights are adjusted to better reflect your domain.
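The work in fine-tuning is mostly in preparing training examples. As a rough sketch, many providers accept training data as JSONL, one example per line; the `prompt`/`completion` field names below follow one common convention, but the exact schema depends on your provider, so check its documentation.

```python
import json

# Hypothetical training examples for a support-ticket classifier.
examples = [
    {"prompt": "Classify: 'Invoice overdue 30 days'", "completion": "billing"},
    {"prompt": "Classify: 'Password reset not working'", "completion": "support"},
]

def to_jsonl(rows: list[dict]) -> str:
    """Serialise training examples as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)
```

A real fine-tune needs far more than two examples; the point is that every behaviour you want must be demonstrated in curated data like this before training starts.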
Advantages:
- Consistent style and format: The model learns your tone, terminology, and output patterns
- Faster inference: No retrieval step needed. Knowledge is baked in
- Smaller context needed: The model already "knows" the domain, so prompts can be shorter
- Better for specialised tasks: Classification, extraction, and formatting tasks benefit from fine-tuning
Disadvantages:
- Requires training data (high-quality examples)
- Expensive and time-consuming to train
- Knowledge goes stale as your data changes, and updating it means retraining
- No source citations, so you can't trace answers to documents
- Still hallucinates. Fine-tuning doesn't eliminate fabrication
## Side-by-side comparison
| Factor | RAG | Fine-Tuning |
|---|---|---|
| Setup time | Days to weeks | Weeks to months |
| Training data needed | No (just documents) | Yes (curated examples) |
| Data freshness | Always current | Stale until retrained |
| Source citations | Yes | No |
| Hallucination control | Good (grounded) | Moderate |
| Style/format control | Moderate (via prompting) | Strong |
| Inference cost | Higher (retrieval + generation) | Lower (generation only) |
| Training cost | None | Significant |
| Privacy | Data stays in vector DB | Data enters model weights |
## When to use which
Use RAG when:
- You need answers grounded in specific, current documents
- Source attribution and auditability matter
- Your data changes frequently
- You want to keep data separate from the model
- You're building knowledge Q&A, search, or support systems
Use fine-tuning when:
- You need the model to consistently use a specific tone, format, or terminology
- You're building a classifier, extractor, or formatter for a narrow task
- Latency is critical and you can't afford the retrieval step
- You have high-quality training examples and the domain is stable
## Our recommendation
For the vast majority of Australian business use cases, start with RAG. It's faster to deploy, easier to maintain, more transparent, and handles 80% of "make AI know about our data" requirements.
Fine-tune only when RAG genuinely isn't enough, typically for specialised classification tasks or when you need very specific output formatting that prompting can't achieve.
And you can combine them. Use RAG for knowledge retrieval and a fine-tuned model for the generation layer that produces output in your exact format. Best of both worlds.
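The hybrid can be sketched like so. `call_finetuned_model` is a placeholder for your provider's API call to a fine-tuned model, and `retrieve` is whatever retrieval function your RAG layer already provides; both names are assumptions for illustration.

```python
def call_finetuned_model(prompt: str) -> str:
    """Placeholder for an API call to a hypothetical fine-tuned model."""
    return f"[styled answer grounded in: {prompt}]"

def answer(query: str, retrieve) -> str:
    """RAG supplies the facts; the fine-tuned model supplies the format."""
    context = "\n".join(retrieve(query))
    return call_finetuned_model(f"Context:\n{context}\n\nQuestion: {query}")

# Usage: pass any retrieval function that maps a query to documents.
result = answer("What is the refund window?",
                lambda q: ["Refunds are processed within 10 business days."])
```

Retrieval keeps the answer current and traceable; the fine-tuned generator keeps the output in your house style.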
## Key takeaways
- RAG gives a general model access to your data at query time. Fine-tuning changes the model itself.
- RAG is faster to deploy, easier to update, and keeps your data out of the model weights.
- Fine-tuning is better for teaching the model a specific style, format, or domain language.
- For most business applications, start with RAG. Only fine-tune if RAG genuinely isn't enough.