Vector Databases for RAG
How vector databases power retrieval in RAG systems. Choosing the right database, indexing strategies, hybrid search, and scaling for production workloads.
In a RAG system, the vector database is the retrieval layer. It stores document chunks as vector embeddings and, when a question comes in, finds the most semantically similar chunks. Those chunks become the context the LLM uses to generate an answer.
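As a rough illustration of that retrieval step, the sketch below ranks a handful of chunks by cosine similarity to a query vector and keeps the top matches. The chunk texts, the three-dimensional toy embeddings, and the `retrieve` helper are all made up for the example; a real system would use an embedding model and let the vector database do the search.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, c["embedding"]), c["text"]) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy data: hand-written embeddings standing in for real model output.
CHUNKS = [
    {"text": "Refunds are processed within 14 days.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Our office is closed on public holidays.", "embedding": [0.0, 0.2, 0.9]},
    {"text": "Refund requests need an order number.", "embedding": [0.8, 0.3, 0.1]},
]

# The query vector would normally come from embedding the user's question.
context = retrieve([1.0, 0.2, 0.0], CHUNKS, k=2)
```

The two refund-related chunks come back first, and those are what would be passed to the LLM as context.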
If you are new to vector databases, start with Vector Databases Explained for the fundamentals. This article focuses on choosing and using a vector database within a RAG architecture.
The quality of this retrieval step is the single biggest factor in RAG answer quality. A perfect LLM can't give good answers from irrelevant context.
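The decision isn't just about performance benchmarks. Consider how each option below fits your existing stack, your data residency requirements, and the operational effort your team can take on.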
pgvector adds vector storage and similarity search to PostgreSQL. If you already run Postgres, this is the lowest-friction option. No new infrastructure, no new vendor, and your vectors live alongside your relational data.
Good for: Small to medium datasets (up to a few million vectors). Teams that want simplicity. Projects where Postgres is already in the stack.
Limitations: Performance degrades with very large datasets compared to purpose-built vector databases. Indexing options (HNSW, IVFFlat) have trade-offs between build time and query speed.
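For a sense of what the Postgres route looks like in practice, here is a minimal sketch of the SQL involved, held in Python strings. The table name, column names, and index name are hypothetical, and the three-dimensional column is a toy size. The statements would be passed to a PostgreSQL driver that uses `%s` placeholders (psycopg is assumed here but not imported).

```python
# Hypothetical schema; the SQL uses pgvector's vector type, HNSW index, and <=> operator.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id bigserial PRIMARY KEY,
    source_document text NOT NULL,
    content text NOT NULL,
    embedding vector(3)  -- toy dimension; match your embedding model's output size
);
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

SEARCH_SQL = """
SELECT content
FROM chunks
ORDER BY embedding <=> %s::vector  -- <=> is pgvector's cosine distance operator
LIMIT %s;
"""

def to_vector_literal(values):
    """Format a sequence of numbers as a pgvector text literal, e.g. '[1.0,0.2,0.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# Parameters for SEARCH_SQL: the query embedding and the number of chunks to return.
params = (to_vector_literal([1, 0.2, 0]), 5)
```

Swapping `hnsw` for `ivfflat` in the index definition selects the other index type mentioned above, with its own build-time and query-speed trade-offs.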
Pinecone is a fully managed vector database offered as a service, designed specifically for vector search at scale. It handles indexing, scaling, and operations for you.
Good for: Teams that want zero operational overhead. Large-scale deployments. Fast time to production.
Limitations: Vendor lock-in. Data is stored on Pinecone's infrastructure (though they offer AWS PrivateLink and dedicated instances). Cost can scale quickly with volume.
Weaviate is an open-source vector database with built-in hybrid search (vector + BM25 keyword search). It can be self-hosted or used as a managed service, and it has a rich module ecosystem for vectorisation, classification, and more.
Good for: Teams that want hybrid search out of the box. Self-hosted deployments where data must stay on your infrastructure. Complex schema requirements.
Limitations: More complex to operate than pgvector. Resource requirements can be significant for large indexes.
Qdrant is an open-source vector database written in Rust and designed for high performance. It offers strong filtering capabilities and efficient memory usage, and is available as managed cloud or self-hosted.
Good for: High-performance requirements. Large-scale deployments with complex filtering needs.
If you're already on AWS and using OpenSearch for other search needs, its k-NN plugin adds vector search capabilities. Combines full-text search and vector search in one system.
Good for: AWS-native architectures. Teams already using OpenSearch. Hybrid search requirements.
How vectors are indexed affects both query speed and accuracy.
For most RAG deployments, HNSW (Hierarchical Navigable Small World) is the right choice. It typically returns results within a few milliseconds on datasets of up to tens of millions of vectors, with minimal loss of recall compared with exact search.
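One way to quantify how much accuracy an approximate index gives up is recall@k: the share of the true top-k neighbours (from an exact, brute-force search) that the approximate search also returns. The sketch below assumes you already have both result lists as chunk IDs; the IDs shown are invented.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k results that also appear in the approximate top-k."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    found = sum(1 for chunk_id in approx_ids[:k] if chunk_id in exact_top)
    return found / len(exact_top)

# Invented example: the approximate index misses one of the five true neighbours.
exact = [7, 3, 9, 1, 4]    # from brute-force search
approx = [7, 9, 3, 4, 8]   # from the approximate index
recall = recall_at_k(approx, exact, k=5)
```

Averaging this over a sample of real queries gives a practical check on whether the index parameters are costing you relevant chunks.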
Pure vector search finds semantically similar content. That's powerful, but it can miss results that match exact terms. A search for "ISO 27001" should find documents containing that exact phrase, even if the embedding doesn't prioritise it.
Hybrid search combines vector similarity with keyword matching (typically BM25). The results from both are merged using reciprocal rank fusion or a weighted combination.
For business document collections, hybrid search almost always outperforms either approach alone. Enable it if your vector database supports it.
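The reciprocal rank fusion step mentioned above can be sketched in a few lines. Each result list contributes `1 / (k + rank)` to a chunk's score, so chunks ranked well by both searches rise to the top. The constant `k=60` is a commonly used default rather than something this article prescribes, and the chunk IDs are placeholders.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Placeholder results from the two searches, best match first.
vector_hits = ["chunk-a", "chunk-b", "chunk-c"]
keyword_hits = ["chunk-d", "chunk-a", "chunk-b"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Here `chunk-a` and `chunk-b` appear in both lists and end up ahead of `chunk-d`, even though `chunk-d` was the top keyword hit. Because the method uses only ranks, it needs no score normalisation between the two searches.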
For most projects with under 1-2 million vectors, pgvector is perfectly adequate. Switch to a purpose-built vector database when you need higher throughput, larger scale, or features like built-in hybrid search that pgvector doesn't offer natively.
When a document changes, delete its old chunks from the vector store and insert the new ones. Most vector databases support deletion by metadata filter (e.g. delete all chunks where source_document = "policy-v2.pdf"). Build this into your ingestion pipeline.
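The delete-then-insert pattern might look like the following sketch. `InMemoryVectorStore` is a stand-in written for this example, not the API of any particular database, and `fake_embed` is a placeholder for a real embedding call.

```python
class InMemoryVectorStore:
    """Toy stand-in for a vector database that supports deletion by metadata filter."""

    def __init__(self):
        self.records = []

    def insert(self, text, embedding, metadata):
        self.records.append({"text": text, "embedding": embedding, "metadata": metadata})

    def delete_where(self, **filters):
        """Delete records whose metadata matches every filter; return how many were removed."""
        if not filters:
            raise ValueError("refusing to delete without a filter")
        before = len(self.records)
        self.records = [
            r for r in self.records
            if not all(r["metadata"].get(key) == value for key, value in filters.items())
        ]
        return before - len(self.records)

def reindex_document(store, source_document, chunks, embed):
    """Replace all chunks of one document: delete the old ones, then insert the new ones."""
    removed = store.delete_where(source_document=source_document)
    for text in chunks:
        store.insert(text, embed(text), {"source_document": source_document})
    return removed

def fake_embed(text):
    # Placeholder embedding; a real pipeline would call an embedding model here.
    return [float(len(text))]

store = InMemoryVectorStore()
reindex_document(store, "policy-v2.pdf", ["old clause 1", "old clause 2"], fake_embed)
reindex_document(store, "handbook.pdf", ["welcome"], fake_embed)

# The policy document changed: its two old chunks are replaced by three new ones.
removed = reindex_document(store, "policy-v2.pdf", ["new 1", "new 2", "new 3"], fake_embed)
```

Chunks belonging to other documents are left alone, which is why every chunk needs the source document recorded in its metadata at ingestion time.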
You cannot mix embedding models within the same index. The embedding model used for documents must match the one used for queries, because vectors produced by different models are not comparable. If you switch models, you need to re-embed your entire corpus. Choose your embedding model carefully and plan re-embedding into major model upgrade cycles.
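A simple safeguard is to record the embedding model alongside the index and reject anything produced by a different one. The sketch below is illustrative only: `GuardedIndex` and the model names `embed-v1` and `embed-v2` are hypothetical.

```python
class ModelMismatchError(ValueError):
    """Raised when a vector comes from a different model than the index was built with."""

class GuardedIndex:
    def __init__(self, embedding_model, dimensions):
        self.embedding_model = embedding_model
        self.dimensions = dimensions
        self.vectors = []

    def _check(self, vector, model):
        if model != self.embedding_model:
            raise ModelMismatchError(
                f"index uses {self.embedding_model!r}, got vector from {model!r}"
            )
        if len(vector) != self.dimensions:
            raise ModelMismatchError(
                f"expected {self.dimensions} dimensions, got {len(vector)}"
            )

    def add(self, vector, model):
        """Store a document vector after confirming it came from the index's model."""
        self._check(vector, model)
        self.vectors.append(vector)

    def validate_query(self, vector, model):
        """Call before searching so a mismatched query fails loudly."""
        self._check(vector, model)

index = GuardedIndex(embedding_model="embed-v1", dimensions=3)
index.add([0.1, 0.2, 0.3], model="embed-v1")

try:
    index.validate_query([0.1, 0.2, 0.3], model="embed-v2")
    mismatch_caught = False
except ModelMismatchError:
    mismatch_caught = True
```

Failing fast matters here because a mismatched query does not raise an error on its own. It just returns plausible-looking but irrelevant chunks.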
A typical RAG system with 100,000 document chunks using pgvector on a modest cloud instance costs $50-150/month for the vector storage layer. Managed services like Pinecone start at $70/month for a base tier and scale with usage. Enterprise-scale deployments (millions of vectors, high query volume) can cost $500-2,000+/month depending on the provider and configuration.
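Part of what drives these costs is how much vector data has to be stored and kept in memory. As a back-of-the-envelope sketch, the raw size is the number of vectors times the dimensions times the bytes per value. The 1,536 dimensions and 4-byte floats below are assumptions for illustration, and index structures and metadata add overhead on top.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, excluding index and metadata overhead."""
    return num_vectors * dimensions * bytes_per_value

# Assumed: 100,000 chunks, 1,536-dimensional embeddings, 4-byte floats.
size_bytes = raw_vector_bytes(100_000, 1536)
size_mib = size_bytes / (1024 ** 2)
```

Under those assumptions the vectors alone come to roughly 586 MiB, which fits comfortably on a modest instance.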