Vector Databases Explained: The Storage Layer for AI Search
How vector databases store meaning and enable AI-powered search. What they are, how they work, and which options to consider for your RAG system.
A vector database is a specialised database designed to store and search vectors: lists of numbers that represent the meaning of text, images, or other data. In the context of AI and RAG systems, it's where you store the embeddings of your document chunks.
Traditional databases search by exact match or keyword. Vector databases search by similarity. This means you can find content that's semantically similar to a query, even if it uses completely different words.
Here's a simplified version:
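The "closeness" is measured with a similarity metric such as cosine similarity (how closely two vectors point in the same direction) or Euclidean distance (the straight-line distance between them). In practice, you don't need to work through the maths yourself. The database handles it.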
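If you'd like to see what those two measures look like, here is a minimal sketch in plain Python. The three-number vectors are made up for illustration; real embeddings come from an embedding model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors: closer to 0.0 means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy embeddings (hand-written, not produced by a real model).
leave_policy = [0.9, 0.1, 0.2]
holiday_entitlements = [0.8, 0.2, 0.3]
server_setup = [0.1, 0.9, 0.7]

# The two HR-related vectors score as more similar than the unrelated one.
print(cosine_similarity(leave_policy, holiday_entitlements))
print(cosine_similarity(leave_policy, server_setup))
print(euclidean_distance(leave_policy, holiday_entitlements))
print(euclidean_distance(leave_policy, server_setup))
```

Note the direction of each measure: a higher cosine similarity means a closer match, while a lower Euclidean distance means a closer match.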
Why this matters: A user searching for "annual leave policy" will find a document titled "Employee Holiday Entitlements" because the meanings are similar, even though the words are different.
In a RAG system, the vector database is the retrieval engine. When a user asks a question:
Without a vector database, you'd need to search by keyword, which misses synonyms, paraphrases, and conceptual matches. Keyword search finds "leave policy." Vector search also finds "time off entitlements," "holiday allowance," and "PTO guide."
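To make the difference concrete, here is a rough sketch of the retrieval step using a tiny in-memory store. `ToyVectorStore`, `keyword_search`, and the hand-written vectors are all hypothetical stand-ins: a real system would get its vectors from an embedding model and delegate the search to one of the databases listed below.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self._items.append((text, vector))

    def search(self, query_vector, top_k=2):
        # Score every stored vector against the query, then keep the best matches.
        scored = [(cosine_similarity(query_vector, vec), text) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]


def keyword_search(texts, query):
    """Naive keyword search: only matches titles containing the exact phrase."""
    return [t for t in texts if query.lower() in t.lower()]


# Toy vectors written by hand for illustration, not real embeddings.
documents = {
    "Employee Holiday Entitlements": [0.8, 0.2, 0.3],
    "PTO guide": [0.85, 0.15, 0.25],
    "Server setup checklist": [0.1, 0.9, 0.7],
}

store = ToyVectorStore()
for title, vector in documents.items():
    store.add(title, vector)

query_text = "annual leave policy"
query_vector = [0.9, 0.1, 0.2]  # assumed embedding of the query text

keyword_hits = keyword_search(list(documents), query_text)
vector_hits = store.search(query_vector, top_k=2)

print("Keyword hits:", keyword_hits)
print("Vector hits:", vector_hits)
```

With these toy vectors, the keyword search finds nothing because no title contains the phrase "annual leave policy", while the vector search returns the two leave-related documents and skips the server checklist. In a RAG system, the retrieved chunks would then be passed to the language model as context.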
| Database | Type | Best for |
|---|---|---|
| pgvector | PostgreSQL extension (free) | Teams already using PostgreSQL, smaller datasets, cost-sensitive |
| Pinecone | Managed cloud service | Fast setup, low ops overhead, scaling without management |
| Weaviate | Open-source / managed | Hybrid search (vector + keyword), complex filtering |
| OpenSearch | Open-source / AWS managed | Existing AWS stacks, combined text + vector search |
| Qdrant | Open-source / managed | High performance, advanced filtering, self-hosted |
| ChromaDB | Open-source | Prototyping, local development, small projects |
For most Australian business RAG deployments, we recommend:
The choice matters less than you might think at the start. For datasets under roughly a million documents, most of these vector databases perform similarly, so the differences rarely decide whether a project succeeds. Start with what's easy and migrate later if needed.