RAG Database Comparison: Open Source and Managed Options by Scale
A practical comparison of RAG databases by scale. Open source and managed options for proof of concept, small volume, larger volume, and enterprise workloads. Plus scaling techniques and how to pick.
Kasun Wijayamanna, Founder & Lead Developer. Postgraduate Researcher (AI & RAG), Curtin University, Western Australia.
What counts as a RAG database
The phrase "RAG database" usually means the vector store that holds your document chunks. In practice, a working RAG system needs three things from its database layer:
Vector similarity search. Find chunks whose embeddings are close to the query embedding.
Metadata filtering. Restrict results by document type, source system, date, tenant, or access permission.
Update and delete operations. Documents change. The store must support clean upserts and deletions, not just inserts.
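The three requirements above can be sketched as a tiny brute-force store. This is a minimal illustration, not how any real database is built. `ToyVectorStore`, its methods and the cosine scoring are hypothetical choices, and the store has no persistence and no index.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class ToyVectorStore:
    """Hypothetical in-memory store: exact (brute-force) search, no persistence."""

    records: dict[str, tuple[list[float], dict]] = field(default_factory=dict)

    def upsert(self, chunk_id: str, vector: list[float], metadata: dict) -> None:
        # Insert or overwrite, so re-ingesting a changed document replaces the old chunk.
        self.records[chunk_id] = (vector, metadata)

    def delete(self, chunk_id: str) -> None:
        # Remove the chunk so it can no longer be retrieved.
        self.records.pop(chunk_id, None)

    def search(self, query: list[float], k: int = 3, where: dict | None = None) -> list[str]:
        where = where or {}
        scored = [
            (cosine(query, vector), chunk_id)
            for chunk_id, (vector, metadata) in self.records.items()
            # Metadata filter: every key in `where` must match exactly.
            if all(metadata.get(key) == value for key, value in where.items())
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [chunk_id for _, chunk_id in scored[:k]]


store = ToyVectorStore()
store.upsert("a", [1.0, 0.0], {"type": "policy"})
store.upsert("b", [0.9, 0.1], {"type": "faq"})
store.upsert("c", [0.0, 1.0], {"type": "policy"})
print(store.search([1.0, 0.0], k=2))                       # ['a', 'b']
print(store.search([1.0, 0.0], where={"type": "policy"}))  # ['a', 'c']
store.delete("a")
print(store.search([1.0, 0.0], k=1))                       # ['b']
```

A real vector database replaces the linear scan with an index and adds durable storage. The contract stays the same: similarity search, metadata filters, upserts and deletes.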
Pure vector libraries (FAISS, Annoy, ScaNN) do the first one well but are not databases. They can save an index to a file, but they have no durable storage engine, no concurrent access control and no transactional updates. They belong inside a database, not in place of one.
A few systems blur the line. PostgreSQL with the pgvector extension is a full relational database that also does vector search. Elasticsearch and OpenSearch are search engines that added vector fields. MongoDB Atlas added vector search to a document database. These hybrid systems often beat pure vector databases on operational simplicity because you already know how to run them.
Open source vs managed
Every database in this article is either open source, available as a managed service, or both.
Open source databases (pgvector, Chroma, LanceDB, Qdrant, Weaviate, Milvus, Vespa, OpenSearch) give you full control of the data, no per-vector pricing, and no vendor lock-in. The cost is operational: you run the cluster, you patch it, you monitor it, you pay for the compute. Most open source vector databases also offer a managed cloud version run by the same company, which is often the pragmatic choice.
Managed services (Pinecone, Vertex AI Search, Azure AI Search, Bedrock Knowledge Bases, MongoDB Atlas, Redis Cloud) trade flexibility for speed of delivery. You skip the operational layer entirely. The trade-off is data residency considerations, less control over indexing, and pricing that scales with usage.
For an Australian business with sensitive data, an open source vector database self-hosted in the AWS Sydney region (ap-southeast-2), Azure Australia East or GCP australia-southeast1 is often the right answer. Data stays on infrastructure you control, and you do not have to operate the storage hardware.
Proof of concept tier
Scale: under 100,000 vectors. A few hundred documents at most. One node, single process, no high availability needed. The goal is to prove the RAG idea works on your data, not to ship to production.
What matters at this tier: how fast you can ingest, query, and tear down. You do not want to spend a week setting up infrastructure to test a hypothesis.
Recommended options for PoC
Chroma (open source). Python-first. It runs in-process with a local on-disk store (SQLite-backed in current releases) or as a small standalone server. Zero setup. Ideal for notebook-driven exploration.
LanceDB (open source). Embedded database, stores vectors and metadata in Lance format on disk. Faster than Chroma for medium PoCs and survives restarts cleanly. Good when the PoC is heading for a small production deployment later.
pgvector (open source). A PostgreSQL extension. If you already have a Postgres dev environment, adding a vector column is one CREATE EXTENSION away. It works comfortably anywhere from a hundred vectors to a million.
Qdrant (open source, local mode). Runs as a single Docker container, or through the Python client's local mode with no server at all. Same query API as the production cloud version, so the path from PoC to production is short.
What to avoid at this tier: anything that requires a managed cloud account, a sales call, or a multi-node deployment. The PoC tier is about iteration speed, not raw performance.
Small volume tier
Scale: 100,000 to 1 million vectors. A few thousand to tens of thousands of documents. Single tenant, modest query volume (under 10 queries per second). This is where most internal knowledge assistants, customer support bots, and small business document search systems land.
What matters: durability, backups, predictable query latency, and ideally hybrid search so exact-term queries (ISO codes, model numbers, policy IDs) still hit.
Recommended options for small volume
pgvector on a managed Postgres (open source). AWS RDS, Azure Database for PostgreSQL, GCP Cloud SQL, or Supabase all support pgvector. You get backups, point-in-time restore, and standard SQL alongside your vectors. The smartest default for most projects in this tier.
Qdrant (open source or Qdrant Cloud). Designed for fast vector search with strong filtering. Self-host on a small VM or use the managed cloud. Built-in scalar quantization brings memory usage down without hurting recall much.
Weaviate (open source or Weaviate Cloud). Strong hybrid search out of the box (vector plus BM25 in a single query). Good schema modelling if your documents have rich metadata.
Pinecone Serverless (managed). Pay per use, no infrastructure to manage. Reasonable choice when the team has no DevOps capacity at all.
Redis Stack with vector search (managed or open source). Works well if Redis is already in your stack and the dataset stays in memory. Latency is excellent.
The deciding factor in this tier is usually existing infrastructure. If you run Postgres, use pgvector. If you run Redis, use Redis vector. If you have nothing, Qdrant Cloud or Pinecone Serverless gives you the fastest path to production.
Larger volume tier
Scale: 1 million to 50 million vectors. Hundreds of thousands of documents, or millions of small chunks (think product catalogues, support ticket archives, multi-year knowledge bases). Query volume of 10 to 200 queries per second. May span multiple tenants.
What matters: index choice, sharding, query latency under load, hybrid search, and ideally quantization to keep memory costs sensible. Single-node databases start to creak. You want something that can scale horizontally.
Recommended options for larger volume
Qdrant Cloud or self-hosted cluster (open source). Built-in horizontal sharding and replication. Quantization (scalar, binary) cuts memory substantially. Int8 scalar quantization alone stores each vector in a quarter of its float32 size. The Rust implementation gives strong per-node throughput.
Weaviate Cloud or self-hosted (open source). Multi-node deployment supported. Hybrid search remains a strong differentiator at this scale.
Milvus (open source) or Zilliz Cloud (managed). Purpose-built for large vector workloads. Multiple index types (HNSW, IVF, DiskANN). Strong story for multi-billion vector deployments, so this tier is comfortable territory.
Pinecone Standard (managed). Scales smoothly into this range. Sharding and replication are handled for you. The price climbs but operational complexity stays flat.
OpenSearch or Elasticsearch with vector fields (open source or managed). Excellent option if you already run one of these for full-text search. Combines BM25 and vectors in a single query. Mature operational tooling.
MongoDB Atlas Vector Search (managed). Sensible if your document data is already in MongoDB. Avoids running a separate vector system.
At this tier, indexing choice matters. HNSW gives the best latency but uses more memory. IVF saves memory at the cost of slightly lower recall. DiskANN (available in Milvus, among other engines) keeps most of the index on disk. It is the right answer when memory cost becomes the bottleneck.
Enterprise tier
Scale: 50 million vectors and up, often into the billions. Multi-tenant by design. Strict requirements around isolation, audit logging, data residency, encryption at rest and in transit, role-based access, and SLAs measured in nines. Query patterns include high concurrency, mixed workloads (search plus filter plus aggregate), and global geographic distribution.
What matters: the maturity of the platform around the database (observability, security, compliance), not just the database itself. The vector search engine is one component in a larger system.
Recommended options for enterprise
Milvus on Kubernetes (open source). The most established open source vector database for billion-vector scale. Used in production at large internet companies. Operationally demanding but gives you full control. Zilliz Cloud is the same engine as a managed service.
Vespa (open source). Originally built by Yahoo for web-scale search. Handles vectors, text, and structured data in one engine. Excellent for hybrid retrieval at scale. Higher learning curve but the ceiling is very high.
Vertex AI Search (Google managed). Includes vector search as part of a broader managed search platform. Good if you are already on GCP. Built-in connectors for common enterprise sources.
Azure AI Search (Microsoft managed). Same idea on Azure. Tightly integrated with Azure OpenAI and Microsoft 365 data. Strong choice for enterprises already running on Microsoft.
Bedrock Knowledge Bases (AWS managed). AWS-native RAG infrastructure. Vectors stored in OpenSearch Serverless, Pinecone, Aurora pgvector, or Redis Enterprise underneath. Saves the team from stitching the pieces together.
Elasticsearch Enterprise (managed or self-hosted with paid licence). Mature search platform with vector support added. Strong for organisations that already have ES expertise in house.
At this tier, the decision is rarely made on benchmarks. It is made on what your security team will approve, what your data residency rules allow, and what your platform team can operate. Pick the option that fits the organisation, not just the workload.
Scaling options and what they unlock
The same database can serve very different scales depending on which scaling features you turn on. Understanding the levers helps you avoid moving databases when load grows.
Index type
The biggest single lever. The index determines how the database finds nearest neighbours without comparing the query against every stored vector.
HNSW (Hierarchical Navigable Small World). The default for most modern vector databases. Search stays fast (often a few milliseconds or less) even at tens of millions of vectors, as long as the graph fits in RAM. Higher memory cost. Supported in pgvector, Qdrant, Weaviate, Milvus, Vespa, Elasticsearch and OpenSearch.
IVF (Inverted File). Clusters vectors and only searches the most relevant clusters. Lower memory than HNSW. Slightly lower recall. Good for very large datasets where memory cost matters.
DiskANN. Keeps most of the index on SSD instead of RAM. Cuts memory cost dramatically. Available in Milvus, among other engines. The right choice past about 100 million vectors, when memory becomes the bottleneck.
ScaNN (Google). Optimised for very large in-memory indexes. Used inside Vertex AI Search.
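To make the IVF idea concrete, here is a toy index in plain Python:

- k-means picks cluster centroids.
- Each vector is filed under its nearest centroid.
- A query scans only the `n_probe` closest clusters.

The names `ToyIVFIndex`, `n_lists` and `n_probe` are illustrative. Real engines add quantization, fast distance kernels and persistence on top of this basic scheme.

```python
from __future__ import annotations

import random


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[list[float]], n_lists: int, iters: int = 10, seed: int = 0) -> list[list[float]]:
    """Plain Lloyd's k-means, seeded from randomly sampled input vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    for _ in range(iters):
        buckets = [[] for _ in range(n_lists)]
        for v in vectors:
            nearest = min(range(n_lists), key=lambda c: sq_dist(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class ToyIVFIndex:
    def __init__(self, vectors: list[list[float]], n_lists: int, seed: int = 0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_lists, seed=seed)
        # Inverted lists: centroid number -> ids of the vectors filed under it.
        self.lists = {c: [] for c in range(n_lists)}
        for i, v in enumerate(vectors):
            self.lists[self._closest_centroids(v, 1)[0]].append(i)

    def _closest_centroids(self, v: list[float], n: int) -> list[int]:
        return sorted(range(len(self.centroids)), key=lambda c: sq_dist(v, self.centroids[c]))[:n]

    def search(self, query: list[float], k: int = 3, n_probe: int = 1) -> list[int]:
        # Scan only the vectors in the n_probe nearest clusters, not the whole corpus.
        candidates = [i for c in self._closest_centroids(query, n_probe) for i in self.lists[c]]
        candidates.sort(key=lambda i: sq_dist(query, self.vectors[i]))
        return candidates[:k]


points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
index = ToyIVFIndex(points, n_lists=2)
print(index.search([5.0, 5.02], k=2))  # [3, 5]: only the far cluster was scanned
```

Raising `n_probe` scans more clusters. That recovers recall at the cost of speed, which is the IVF trade-off described above.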
Quantization
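Compress the stored vectors so they take less memory. A 1536-dimensional float32 vector takes 6 KB (1536 × 4 bytes). Quantized to int8, the same vector takes 1.5 KB. Binary quantization (one bit per dimension) takes 192 bytes. The trade-off is recall. For int8 the drop is often small (frequently under 2%) for a 4x memory saving. Binary quantization loses more accuracy, so it is usually paired with rescoring the top candidates against the full-precision vectors.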
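The storage arithmetic can be checked directly. A simple scalar (int8) quantizer then shows where the small recall loss comes from: each component is rounded onto one of 256 levels.

This sketch assumes a single fixed value range of [-1, 1] for every dimension, which is a simplification. Production engines calibrate ranges from the data. The standard `array` module is used only to show the byte sizes.

```python
from __future__ import annotations

import random
from array import array

DIM = 1536

# Storage per vector at each precision (bytes): float32, int8, 1-bit binary.
print(DIM * 4, DIM * 1, DIM // 8)  # 6144 1536 192


def quantize_int8(vec: list[float], lo: float = -1.0, hi: float = 1.0) -> tuple[list[int], float]:
    """Map floats in [lo, hi] onto the 256 int8 levels [-128, 127]."""
    step = (hi - lo) / 255
    codes = [max(-128, min(127, round((x - lo) / step) - 128)) for x in vec]
    return codes, step


def dequantize_int8(codes: list[int], step: float, lo: float = -1.0) -> list[float]:
    """Approximate reconstruction of the original floats."""
    return [(c + 128) * step + lo for c in codes]


rng = random.Random(42)
vec = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
codes, step = quantize_int8(vec)

full = array("f", vec)      # float32 storage
packed = array("b", codes)  # int8 storage
print(full.itemsize * len(full), packed.itemsize * len(packed))  # 6144 1536

# Each component is off by at most half a quantization step.
max_err = max(abs(a - b) for a, b in zip(vec, dequantize_int8(codes, step)))
print(max_err <= step / 2 + 1e-9)  # True
```

The per-component error is bounded by half a step, about 0.004 here. That is why similarity rankings usually change only slightly.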
Most production-grade vector databases support at least scalar quantization. Qdrant, Milvus, and Vespa support binary quantization, which is the most aggressive compression and the most useful at very large scale.
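Binary quantization keeps only the sign of each component, so comparing two vectors becomes counting differing bits (Hamming distance). Because this is lossy, a common pattern is a two-stage search:

1. Shortlist candidates using the binary codes.
2. Rescore the shortlist with the original float vectors.

The sign-threshold scheme and the tiny 4-dimensional vectors below are illustrative assumptions, not any specific engine's implementation.

```python
from __future__ import annotations


def binarize(vec: list[float]) -> int:
    """One bit per dimension: bit i is set when component i is positive."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


docs = {
    "a": [0.1, 0.05, -0.1, 0.05],   # same signs as the query, small magnitude
    "b": [-0.8, 0.3, 0.5, -0.2],
    "c": [0.9, -0.05, -0.7, 0.4],   # one sign differs, large magnitude
}
codes = {doc_id: binarize(v) for doc_id, v in docs.items()}

query = [0.8, 0.1, -0.5, 0.2]
q_code = binarize(query)

# Stage 1: cheap shortlist on binary codes.
shortlist = sorted(codes, key=lambda d: hamming(q_code, codes[d]))[:2]
# Stage 2: rescore the shortlist with full-precision vectors.
final = sorted(shortlist, key=lambda d: dot(query, docs[d]), reverse=True)
print(shortlist, final)  # ['a', 'c'] ['c', 'a']
print((1536 + 7) // 8)   # 192 bytes per 1536-dim binary code
```

The rescoring step flips the order here: "c" wins on full-precision similarity even though "a" has the closer binary code. That is the accuracy the shortlist-then-rescore pattern recovers.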
Sharding
Split the index across multiple nodes. Each shard holds a portion of the vectors. Queries fan out to all shards and merge results. This is the standard horizontal scaling pattern.
Qdrant, Weaviate, Milvus, Pinecone, Elasticsearch, and OpenSearch all support sharding natively. pgvector does not shard automatically; if you outgrow a single Postgres node, you typically move to a purpose-built vector database rather than sharding pgvector manually.
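The fan-out-and-merge pattern is simple to sketch:

1. Hash each vector ID to a shard.
2. Ask every shard for its local top-k.
3. Merge the partial lists.

Because each shard returns its own best k, the merged result matches a single-node exact search. The CRC32 routing and the exact per-shard search here are assumptions for illustration. Real engines run approximate indexes on each shard and query shards in parallel.

```python
from __future__ import annotations

import heapq
import itertools
import zlib


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_shards(vectors: dict[str, list[float]], n_shards: int) -> list[dict[str, list[float]]]:
    """Route each vector to a shard by a stable hash of its ID."""
    shards = [{} for _ in range(n_shards)]
    for vid, vec in vectors.items():
        shards[zlib.crc32(vid.encode()) % n_shards][vid] = vec
    return shards


def search_shard(shard: dict[str, list[float]], query: list[float], k: int) -> list[tuple[float, str]]:
    """Local top-k on one shard, as (distance, id) pairs sorted nearest first."""
    return heapq.nsmallest(k, ((sq_dist(vec, query), vid) for vid, vec in shard.items()))


def fan_out_search(shards: list[dict[str, list[float]]], query: list[float], k: int) -> list[str]:
    partials = [search_shard(s, query, k) for s in shards]  # sequential here, parallel in practice
    merged = heapq.merge(*partials)  # each partial list is already sorted
    return [vid for _, vid in itertools.islice(merged, k)]


vectors = {f"v{i}": [float(i), 0.0] for i in range(20)}
shards = build_shards(vectors, n_shards=4)
print(fan_out_search(shards, [3.2, 0.0], k=3))  # ['v3', 'v4', 'v2']
```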
Replication
Keep multiple copies of each shard for availability and read throughput. Replication is essential for production deployments and standard in every database listed here at the managed-cloud level. Self-hosted replication adds operational complexity, which is why a managed service starts to look attractive once HA matters.
Hybrid search
Combine vector similarity with keyword (BM25) search. Pure vectors miss exact-term queries. Pure keyword search misses semantic matches. Hybrid search beats either alone for almost every business document collection.
Weaviate, Qdrant, Vespa, Elasticsearch, OpenSearch, MongoDB Atlas, and Azure AI Search support hybrid search natively. Pinecone supports it through sparse-dense vectors. pgvector can do it with a CTE that combines a pg_trgm or full-text query with the vector search.
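One common way to merge a BM25 ranking with a vector ranking is Reciprocal Rank Fusion (RRF). It scores each document by summing 1 / (k + rank) across the lists it appears in.

Engines differ in which fusion method they use by default, so treat this as an illustration of the idea rather than what any particular database does. The document IDs are made up. The value k = 60 is the constant commonly used with RRF.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list get the largest contribution.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


bm25_hits = ["iso-9001-policy", "quality-manual", "audit-guide"]         # exact-term matches
vector_hits = ["quality-manual", "supplier-handbook", "iso-9001-policy"]  # semantic matches
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# ['quality-manual', 'iso-9001-policy', 'supplier-handbook', 'audit-guide']
```

Documents found by both retrievers rise to the top. This is why hybrid search keeps exact identifiers such as a policy number in play alongside semantically related material.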
Hot and cold tiering
Frequently queried vectors stay in fast storage (RAM, NVMe SSD). Rarely queried vectors move to cheaper storage (S3, Azure Blob, Google Cloud Storage). The database reloads them on demand. This is how vector databases handle billions of vectors without blowing the storage budget.
Pinecone, Milvus, and OpenSearch Serverless implement this natively. Self-hosted setups can approximate it with thoughtful collection design.
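The hot/cold idea can be sketched as a bounded least-recently-used cache in front of a slower store. Here a plain dict stands in for object storage, and the "segments" keyed by year are hypothetical units of vectors. Real engines tier at their own granularity and usually load data asynchronously.

```python
from __future__ import annotations

from collections import OrderedDict


class TieredSegmentStore:
    """Hot tier: bounded in-memory LRU. Cold tier: any slower key-value store."""

    def __init__(self, cold_store: dict[str, list[list[float]]], hot_capacity: int):
        self.cold = cold_store  # stand-in for S3, Azure Blob or Google Cloud Storage
        self.hot: OrderedDict[str, list[list[float]]] = OrderedDict()
        self.hot_capacity = hot_capacity
        self.cold_reads = 0

    def get(self, segment_id: str) -> list[list[float]]:
        if segment_id in self.hot:
            self.hot.move_to_end(segment_id)  # mark as most recently used
            return self.hot[segment_id]
        self.cold_reads += 1  # slow path: load the segment on demand
        segment = self.cold[segment_id]
        self.hot[segment_id] = segment
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)  # evict the least recently used segment
        return segment


cold = {year: [[0.0, 1.0]] for year in ("2021", "2022", "2023")}
store = TieredSegmentStore(cold, hot_capacity=2)
for seg in ["2023", "2023", "2022", "2021", "2023"]:
    store.get(seg)
print(store.cold_reads, list(store.hot))  # 4 ['2021', '2023']
```

Frequently accessed segments stay in memory. Rarely accessed ones cost a slower read only when they are actually queried.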
Multi-tenancy
Most enterprise RAG systems serve multiple customers, departments, or business units from the same vector database. There are three common patterns:
Namespaces or collections. Each tenant has its own logical partition inside the same database. Cheap, fast, but tenants share the underlying infrastructure. Use a metadata filter to enforce isolation. Supported by Pinecone, Qdrant, Weaviate, and Milvus.
Separate indexes per tenant. Stronger isolation. Each tenant has a dedicated index. More overhead, especially at hundreds or thousands of tenants. Useful when tenants have very different data sizes or compliance requirements.
Separate databases per tenant. Used only when regulation or contract requires full physical isolation. Most expensive option and operationally heavy.
Metadata filtering performance
Searching only documents from the last 30 days, or only documents tagged as "policy", is a metadata filter applied alongside the vector search. The way the database implements this matters at scale.
Pre-filter implementations apply the metadata filter first and then search vectors within the filtered set. This is fast when the filter is selective and slow when the filter matches most of the corpus. Post-filter implementations search vectors first and then discard results that fail the filter. This keeps the vector search fast, but a selective filter can leave fewer results than you asked for. Qdrant and Weaviate let you tune this. Pinecone and Milvus do it automatically.
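The difference shows up clearly on a toy dataset where the matching documents sit far from the query. Both functions below use exact search as a stand-in for the approximate index. The `oversample` parameter is a hypothetical knob. It illustrates the usual post-filter mitigation: fetch more candidates than you need before filtering.

```python
from __future__ import annotations


def dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pre_filter_search(records: list[dict], query: list[float], k: int, predicate) -> list[int]:
    """Filter first, then rank only the matching records."""
    matching = [r for r in records if predicate(r["meta"])]
    matching.sort(key=lambda r: dist(r["vec"], query))
    return [r["id"] for r in matching[:k]]


def post_filter_search(records: list[dict], query: list[float], k: int, predicate, oversample: int = 1) -> list[int]:
    """Rank first (top k * oversample), then drop records that fail the filter."""
    nearest = sorted(records, key=lambda r: dist(r["vec"], query))[: k * oversample]
    return [r["id"] for r in nearest if predicate(r["meta"])][:k]


def is_policy(meta: dict) -> bool:
    return meta["type"] == "policy"


# Ten chunks on a line; only the two farthest from the query are tagged "policy".
records = [
    {"id": i, "vec": [float(i)], "meta": {"type": "policy" if i >= 8 else "faq"}}
    for i in range(10)
]

print(pre_filter_search(records, [0.0], 3, is_policy))                 # [8, 9]
print(post_filter_search(records, [0.0], 3, is_policy))                # []
print(post_filter_search(records, [0.0], 3, is_policy, oversample=4))  # [8, 9]
```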
A short decision framework
If you want a short answer for picking a database, here it is.
If you already run Postgres and your dataset is under a million vectors, use pgvector. Re-evaluate when you hit 5 million vectors or query latency degrades.
If you have no existing database and no DevOps capacity, use Pinecone Serverless or Qdrant Cloud. The path to production is fastest.
If hybrid search matters from day one, use Weaviate, Elasticsearch with vector, or OpenSearch. They handle vector and BM25 in a single query natively.
If you are heading toward tens of millions of vectors and want to stay on open source, plan for Qdrant, Weaviate, or Milvus.
If you are on AWS and want the least integration work, use Bedrock Knowledge Bases. On Azure, Azure AI Search. On GCP, Vertex AI Search.
If you are at enterprise scale with strict compliance, the database is less important than the platform around it. Pick the one your security team will approve.
FAQ
Should I start with pgvector or jump straight to a vector database?
Start with pgvector if you already run Postgres. Most RAG projects never outgrow it. Move to a purpose-built vector database when you can show a real reason (recall drops, latency exceeds your target, dataset size makes Postgres uncomfortable). Moving too early costs more than moving too late.
What about FAISS, Annoy, and ScaNN? Can I use them as a database?
No. They are libraries for similarity search, not databases. They can serialise an index to disk, but they have no durable storage engine, no concurrent writes, no transactional updates and no metadata filtering. Most production vector databases use FAISS or a similar library inside, with a database layer built around it. Use them directly only inside a single-process application where you control the lifecycle (rebuilds, restarts, deduplication).
Is open source always cheaper than managed?
At small scale, no. The fixed cost of a small managed instance is often lower than the engineering time to set up and operate a self-hosted database. Open source becomes cheaper as the dataset grows, because per-vector pricing on managed services scales linearly while a self-hosted cluster scales more like a step function.
Can I migrate later?
Yes. The data format is the same across most databases (high-dimensional float vectors plus metadata). Re-embedding is rarely needed if you stick with the same embedding model. The bigger question is whether your application code is coupled to a specific database. Use a thin retrieval interface so the underlying store is swappable.
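A thin retrieval interface can be as small as one method. The sketch below uses a Python `Protocol`. `Retriever`, `Hit` and `InMemoryRetriever` are hypothetical names. A real adapter for pgvector, Qdrant or Pinecone would translate `search` into that store's own client calls.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    score: float


class Retriever(Protocol):
    """The only surface the application depends on."""

    def search(self, query_vector: list[float], k: int) -> list[Hit]: ...


class InMemoryRetriever:
    """One interchangeable backend; a database-backed adapter would expose the same method."""

    def __init__(self, vectors: dict[str, list[float]]):
        self.vectors = vectors

    def search(self, query_vector: list[float], k: int) -> list[Hit]:
        def cosine(a: list[float], b: list[float]) -> float:
            norm = math.hypot(*a) * math.hypot(*b)
            return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

        hits = [Hit(cid, cosine(query_vector, v)) for cid, v in self.vectors.items()]
        return sorted(hits, key=lambda h: h.score, reverse=True)[:k]


def build_context(retriever: Retriever, query_vector: list[float]) -> list[str]:
    # Application code sees only the interface, so swapping stores does not touch it.
    return [hit.chunk_id for hit in retriever.search(query_vector, k=2)]


backend = InMemoryRetriever({"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]})
print(build_context(backend, [1.0, 0.0]))  # ['a', 'b']
```

Migrating then means writing one new adapter class and re-loading the same vectors and metadata. The calling code stays untouched.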
How do I handle multi-tenant data isolation in a RAG system?
For most cases, namespaces or collections inside a shared database give the right balance of cost and isolation. Enforce tenancy at the query layer by always filtering on a tenant_id metadata field. For regulated industries (healthcare, finance, government), separate indexes per tenant or separate databases may be required. Match the isolation strategy to the actual regulatory and contractual requirements rather than overengineering.
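Enforcing tenancy at the query layer means the retrieval code always adds the tenant filter itself rather than leaving it to the caller. Below is a minimal sketch, assuming records carry a `tenant_id` metadata field as described above.

`TenantScopedRetriever` is a hypothetical wrapper. In a real system the tenant ID would come from the authenticated session, not a constructor argument.

```python
from __future__ import annotations


class TenantScopedRetriever:
    """Wraps a shared collection so every query is pinned to one tenant."""

    def __init__(self, records: list[dict], tenant_id: str):
        self._records = records
        self._tenant_id = tenant_id

    def search(self, query: list[float], k: int = 3, filters: dict | None = None) -> list[str]:
        filters = dict(filters or {})
        # Reject any attempt to query another tenant's data.
        if filters.get("tenant_id", self._tenant_id) != self._tenant_id:
            raise PermissionError("cross-tenant query rejected")
        filters["tenant_id"] = self._tenant_id  # always applied, never optional
        matching = [
            r for r in self._records
            if all(r["meta"].get(key) == value for key, value in filters.items())
        ]
        matching.sort(key=lambda r: sum((x - y) ** 2 for x, y in zip(r["vec"], query)))
        return [r["id"] for r in matching[:k]]


records = [
    {"id": "acme-1", "vec": [0.0, 0.0], "meta": {"tenant_id": "acme"}},
    {"id": "globex-1", "vec": [0.0, 0.1], "meta": {"tenant_id": "globex"}},
    {"id": "acme-2", "vec": [1.0, 1.0], "meta": {"tenant_id": "acme"}},
]
acme = TenantScopedRetriever(records, tenant_id="acme")
print(acme.search([0.0, 0.1]))  # ['acme-1', 'acme-2']: globex-1 is closer but never returned
```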
Do I need a graph database for advanced RAG?
For most knowledge bases, no. Vector search plus metadata filtering covers the use cases. Graph databases (Neo4j, ArangoDB) make sense when relationships between entities are core to retrieval (think medical records, supply chain entities, organisational hierarchies). Some teams combine a vector database for semantic retrieval with a graph database for entity-relation lookups. This is sometimes called "GraphRAG" and is still an active area of work.
How often should I re-evaluate the database choice?
Annually, or when your scale changes by 10x. Vector database tooling is moving fast. A database that was the right answer 18 months ago may not be today. Keep the retrieval layer behind a clean interface so swapping is not a rewrite.
Key takeaways
There is no single best RAG database. The right one depends on your scale, your existing infrastructure, and how much operational work you want to take on.
For most teams under a million vectors, pgvector on PostgreSQL or a managed Pinecone Serverless tier covers the use case with the least new infrastructure.
At larger volumes, the conversation shifts to indexing choice, sharding, and hybrid search. Qdrant, Weaviate, and Milvus dominate the open source side. Pinecone, Vertex AI Search, and Azure AI Search dominate the managed side.
Enterprise tier is less about database choice and more about isolation, compliance, and the operational maturity of the platform around the database.
You can always migrate. Pick what fits the current scale and re-evaluate annually or when your scale grows by an order of magnitude.