Vector Databases for RAG

The role of vector databases in RAG

In a RAG system, the vector database is the retrieval layer. It stores document chunks as vector embeddings and, when a question comes in, finds the most semantically similar chunks. Those chunks become the context the LLM uses to generate an answer.

If your general understanding of vector databases is still forming, start with Vector Databases Explained for the fundamentals. This article focuses specifically on choosing and using a vector database within a RAG architecture.

How vector retrieval works

Embed the query. The user's question is converted into a vector using the same embedding model that was used for the document chunks.
Search the index. The vector database finds the chunks whose embeddings are closest to the query embedding. "Closest" is measured by cosine similarity, dot product, or Euclidean distance.
Apply filters. Optionally, filter results by metadata: document type, date range, access permissions, source system.
Return top-k results. The closest matching chunks (typically 3-10) are returned with their metadata and similarity scores.
Feed to the LLM. The retrieved chunks are included in the prompt as context for answer generation.
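
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not how a production vector database is implemented: the chunk layout, the function names (retrieve, build_prompt) and the toy 3-dimensional embeddings are all assumptions made for the example, and the query vector is hard-coded where a real system would call its embedding model. The filter here runs before scoring; real databases differ on whether they filter before or after the vector search.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunks, top_k=3, metadata_filter=None):
    """Return (score, chunk) pairs for the top_k most similar chunks.

    metadata_filter is an optional dict; a chunk must match every key/value.
    """
    candidates = [
        c for c in chunks
        if not metadata_filter
        or all(c["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

def build_prompt(question, results):
    """Place the retrieved chunk texts in the prompt as context."""
    context = "\n\n".join(chunk["text"] for _, chunk in results)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"

# Toy corpus with made-up 3-dimensional embeddings (hypothetical data).
chunks = [
    {"text": "Leave policy: 20 days per year.",
     "embedding": [0.9, 0.1, 0.0], "metadata": {"type": "policy"}},
    {"text": "Office map and parking.",
     "embedding": [0.0, 0.2, 0.9], "metadata": {"type": "guide"}},
    {"text": "Sick leave requires a certificate.",
     "embedding": [0.8, 0.3, 0.1], "metadata": {"type": "policy"}},
]

# In a real system this vector comes from the same embedding model as the chunks.
query_vec = [1.0, 0.2, 0.0]
results = retrieve(query_vec, chunks, top_k=2, metadata_filter={"type": "policy"})
prompt = build_prompt("How much leave do I get?", results)
```

A vector database replaces the linear scan in retrieve with an index, but the inputs and outputs have the same shape.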

The quality of this retrieval step is often the single biggest factor in RAG answer quality. Even a perfect LLM can't give good answers from irrelevant context.

Choosing a vector database

The decision isn't just about performance benchmarks. Consider:

Scale. How many vectors will you store? 10,000 chunks (small project) vs 10 million (enterprise corpus)?
Update frequency. How often do documents change? Daily updates need a database that handles upserts efficiently.
Existing infrastructure. Do you already run PostgreSQL? pgvector avoids adding a new system to your stack.
Metadata filtering. Do you need to filter by document type, date, or permissions alongside vector search?
Managed vs self-hosted. Managed services reduce operational burden. Self-hosted gives you full control and avoids data leaving your infrastructure.
Cost. Some vector databases charge by stored vector count, others by compute, others by query volume. Model the cost at your expected scale.

Popular options

pgvector (PostgreSQL extension)

Adds vector storage and similarity search to PostgreSQL. If you already run Postgres, this is the lowest-friction option. No new infrastructure, no new vendor, and your vectors live alongside your relational data.

Good for: Small to medium datasets (up to a few million vectors). Teams that want simplicity. Projects where Postgres is already in the stack.

Limitations: Performance degrades with very large datasets compared to purpose-built vector databases. Indexing options (HNSW, IVFFlat) have trade-offs between build time and query speed.
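
As a rough sketch of what using pgvector looks like, the statements below create a table, add an HNSW index, and run a filtered similarity query. The table name (chunks), the column names, and the 1536 dimension are hypothetical choices for illustration. The SQL is held in Python strings with %s placeholders in the style of a driver such as psycopg; connecting to the database and executing the statements is left out.

```python
# Schema: one row per chunk, with the embedding stored next to relational columns.
# Table name, column names and dimension are assumptions for this example.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id              bigserial PRIMARY KEY,
    source_document text NOT NULL,
    content         text NOT NULL,
    embedding       vector(1536)
);

-- HNSW index using cosine distance
CREATE INDEX IF NOT EXISTS chunks_embedding_idx
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# <=> is pgvector's cosine distance operator, so smaller means more similar.
SEARCH_SQL = """
SELECT id, content, 1 - (embedding <=> %s::vector) AS cosine_similarity
FROM chunks
WHERE source_document = %s
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

def to_pgvector_literal(values):
    """Format a list of numbers as pgvector's text form, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

def search_params(query_vec, source_document, top_k=5):
    """Build the parameter tuple matching the placeholders in SEARCH_SQL."""
    literal = to_pgvector_literal(query_vec)
    return (literal, source_document, literal, top_k)
```

Because the vectors sit in an ordinary table, the metadata filter is a normal WHERE clause and joins against other relational data work as usual.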

Pinecone

Fully managed vector database as a service. Designed specifically for vector search at scale. Handles indexing, scaling, and operations for you.

Good for: Teams that want zero operational overhead. Large-scale deployments. Fast time to production.

Limitations: Vendor lock-in. Data is stored on Pinecone's infrastructure (though they offer AWS PrivateLink and dedicated instances). Cost can scale quickly with volume.

Weaviate

Open-source vector database with built-in hybrid search (vector + BM25 keyword search). Can be self-hosted or used as a managed service. Rich module ecosystem for vectorisation, classification, and more.

Good for: Teams that want hybrid search out of the box. Self-hosted deployments where data must stay on your infrastructure. Complex schema requirements.

Limitations: More complex to operate than pgvector. Resource requirements can be significant for large indexes.

Qdrant

Open-source, written in Rust, designed for high performance. Strong filtering capabilities and efficient memory usage. Available as managed cloud or self-hosted.

Good for: High-performance requirements. Large-scale deployments with complex filtering needs.

Amazon OpenSearch with vector search

If you're already on AWS and using OpenSearch for other search needs, its k-NN plugin adds vector search capabilities. Combines full-text search and vector search in one system.

Good for: AWS-native architectures. Teams already using OpenSearch. Hybrid search requirements.

Indexing strategies

How vectors are indexed affects query speed and accuracy:

HNSW (Hierarchical Navigable Small World). The most common index type. Builds a graph structure for fast approximate nearest neighbour search. Good balance of speed and accuracy. Higher memory usage.
IVFFlat (Inverted File with Flat storage). Clusters vectors into groups and only searches the most relevant clusters. Faster index building than HNSW. Slightly lower recall.
Flat/brute force. Compares the query against every vector. Perfect accuracy but doesn't scale. Fine for datasets under 50,000 vectors.

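For most RAG deployments, HNSW is the right choice. A well-tuned HNSW index typically answers queries in the low milliseconds on datasets of millions of vectors, with minimal loss of recall. Actual latency depends on hardware, vector dimension, and index parameters, so benchmark on your own data.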
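
To make the recall trade-off concrete, here is a toy comparison of flat search with an IVFFlat-style search. It is a simplified sketch under stated assumptions: the centroids are fixed by hand (a real index learns them, usually with k-means), the data is 2-dimensional, and the parameter name nprobe (how many clusters to search) is borrowed from common usage rather than from any specific database. HNSW is not shown because a faithful graph implementation is much longer.

```python
import math

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_search(query, vectors, k):
    """Brute force: rank every vector by distance (exact results)."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))
    return ranked[:k]

def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Search only the nprobe clusters whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    candidates.sort(key=lambda i: l2(query, vectors[i]))
    return candidates[:k]

# Hypothetical data: six points on a line, two hand-picked centroids.
vectors = [[0.0, 0.0], [2.0, 0.0], [4.9, 0.0], [5.2, 0.0], [9.0, 0.0], [10.0, 0.0]]
centroids = [[1.0, 0.0], [9.0, 0.0]]
lists = build_ivf(vectors, centroids)

query = [5.1, 0.0]
exact = flat_search(query, vectors, k=2)
approx_one = ivf_search(query, vectors, centroids, lists, k=2, nprobe=1)
approx_all = ivf_search(query, vectors, centroids, lists, k=2, nprobe=2)
```

The query sits near the boundary between the two clusters, so probing a single cluster misses one of the two true nearest neighbours. Probing more clusters recovers it at the cost of more distance computations.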

Hybrid search

Pure vector search finds semantically similar content. That's powerful, but it can miss results that match exact terms. A search for "ISO 27001" should find documents containing that exact phrase, even if the embedding doesn't prioritise it.

Hybrid search combines vector similarity with keyword matching (typically BM25). The results from both are merged using reciprocal rank fusion or a weighted combination.

For business document collections, hybrid search almost always outperforms either approach alone. Enable it if your vector database supports it.
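
Reciprocal rank fusion is simple enough to sketch directly. Each document scores 1 / (k + rank) for every result list it appears in, and the scores are summed. The constant k = 60 is the value conventionally used for this method; the document ids and the two input rankings below are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids into one list, best first.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant; larger values flatten the gap between ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results from a vector search and a BM25 keyword search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that appear in both lists (doc_a and doc_c here) rise above those found by only one method. Because the method uses ranks only, the raw BM25 and similarity scores never need to be put on a common scale.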

Scaling considerations

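Embedding dimension matters. Higher-dimension embeddings (1536 for OpenAI's text-embedding-3-small, 3072 for text-embedding-3-large) use more memory per vector. Stored as 32-bit floats, a 1536-dimension vector takes about 6 KB before any index overhead. At millions of vectors, this drives infrastructure costs.
Quantisation reduces cost. Compressing vectors from 32-bit floats to 8-bit integers cuts memory by roughly 4x, and binary quantisation (1 bit per dimension) by roughly 32x, in exchange for some loss of accuracy that is usually small for 8-bit and larger for binary. Most databases support this.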
Partitioning by namespace or collection. In multi-tenant systems, separate each tenant's data into its own namespace. This enforces isolation and improves query performance by searching a smaller index.
Cold vs hot data. Documents that are rarely queried can be stored in cheaper, slower storage. Active document sets stay in the primary index.
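
The memory impact of dimension and quantisation is straightforward arithmetic. The helper below is a back-of-the-envelope sketch: it counts raw vector storage only and ignores index overhead (such as HNSW graph links) and metadata, so real deployments need more. The function name and the 10 million vector corpus are illustrative assumptions.

```python
def raw_vector_memory_gb(num_vectors, dimensions, bits_per_value=32):
    """Raw storage for the vectors alone, in gigabytes (10^9 bytes)."""
    total_bytes = num_vectors * dimensions * bits_per_value / 8
    return total_bytes / 1e9

corpus = 10_000_000  # hypothetical corpus size

float32_1536 = raw_vector_memory_gb(corpus, 1536, bits_per_value=32)  # about 61.44 GB
float32_3072 = raw_vector_memory_gb(corpus, 3072, bits_per_value=32)  # about 122.88 GB
int8_1536 = raw_vector_memory_gb(corpus, 1536, bits_per_value=8)      # about 15.36 GB
binary_1536 = raw_vector_memory_gb(corpus, 1536, bits_per_value=1)    # about 1.92 GB
```

Doubling the dimension doubles the footprint, while 8-bit quantisation brings a 1536-dimension corpus down to a quarter of its full-precision size.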

FAQ

Do I need a dedicated vector database or can I use pgvector?

For most projects with under 1-2 million vectors, pgvector is perfectly adequate. Switch to a purpose-built vector database when you need higher throughput, larger scale, or features like built-in hybrid search that pgvector doesn't offer natively.

How do I handle document updates?

When a document changes, delete its old chunks from the vector store and insert the new ones. Most vector databases support deletion by metadata filter (e.g. delete all chunks where source_document = "policy-v2.pdf"). Build this into your ingestion pipeline.
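
A delete-then-insert update might look like the sketch below. The InMemoryVectorStore class and its delete_where method are hypothetical stand-ins for whatever client your database provides; the point is the shape of the pipeline step, not a real API.

```python
class InMemoryVectorStore:
    """Toy stand-in for a vector database client (hypothetical API)."""

    def __init__(self):
        self.records = []

    def insert(self, chunks):
        self.records.extend(chunks)

    def delete_where(self, **conditions):
        """Delete every record whose metadata matches all conditions."""
        before = len(self.records)
        self.records = [
            r for r in self.records
            if not all(r["metadata"].get(k) == v for k, v in conditions.items())
        ]
        return before - len(self.records)

def replace_document(store, source_document, new_chunks):
    """Remove a document's old chunks, then insert the re-embedded ones.

    new_chunks: list of (text, embedding) pairs for the updated document.
    Returns the number of stale chunks removed.
    """
    removed = store.delete_where(source_document=source_document)
    store.insert([
        {"text": text, "embedding": embedding,
         "metadata": {"source_document": source_document}}
        for text, embedding in new_chunks
    ])
    return removed

store = InMemoryVectorStore()
replace_document(store, "policy-v2.pdf", [("Old clause 1", [0.1, 0.2]), ("Old clause 2", [0.3, 0.4])])
replace_document(store, "handbook.pdf", [("Welcome", [0.5, 0.6])])

# The policy changes: its two old chunks are replaced by one new chunk.
removed = replace_document(store, "policy-v2.pdf", [("New clause", [0.7, 0.8])])
```

Storing the source document identifier in each chunk's metadata at ingestion time is what makes this possible. Without it there is no reliable way to find the stale chunks.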

Can I use multiple embedding models?

Not in the same index. The embedding model used for documents must match the one used for queries. If you switch models, you need to re-embed your entire corpus. Choose your embedding model carefully and plan re-embedding into major model upgrade cycles.
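
One way to guard against mixing models is to record the embedding model alongside the index and reject queries embedded with anything else. The sketch below is an assumption about how such a check could be written in application code; the class names and the model labels are hypothetical, and vector databases do not necessarily enforce this for you.

```python
class EmbeddingModelMismatch(ValueError):
    """Raised when a query was embedded with a different model than the index."""

class VersionedIndex:
    """Records which embedding model (and dimension) an index was built with."""

    def __init__(self, embedding_model, dimensions):
        self.embedding_model = embedding_model
        self.dimensions = dimensions

    def check_query(self, query_model, query_vec):
        """Validate a query before it is sent to the vector database."""
        if query_model != self.embedding_model:
            raise EmbeddingModelMismatch(
                f"index built with {self.embedding_model!r}, "
                f"query embedded with {query_model!r}"
            )
        if len(query_vec) != self.dimensions:
            raise EmbeddingModelMismatch(
                f"expected {self.dimensions} dimensions, got {len(query_vec)}"
            )
        return True

index = VersionedIndex(embedding_model="model-a", dimensions=3)
ok = index.check_query("model-a", [0.1, 0.2, 0.3])
```

The dimension check alone is not enough, since two different models can produce vectors of the same length whose values are not comparable.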

What about cost at scale?

A typical RAG system with 100,000 document chunks using pgvector on a modest cloud instance costs $50-150/month for the vector storage layer. Managed services like Pinecone start at $70/month for a base tier and scale with usage. Enterprise-scale deployments (millions of vectors, high query volume) can cost $500-2,000+/month depending on the provider and configuration.

Key takeaways

The vector database is the retrieval engine of a RAG system. Its quality determines whether the right documents get found.
For most projects, pgvector (PostgreSQL extension) is the smart starting point. No new infrastructure, good enough performance.
Hybrid search (vector + keyword) outperforms either approach alone for most business document collections.
Plan for data updates. A vector database that is hard to update becomes stale fast.

Postgraduate Researcher (AI & RAG), Curtin University - Western Australia