Why deploy RAG on AWS?
For Australian businesses, AWS is the most common choice for production RAG deployments. The reasons are practical:
- Sydney region (ap-southeast-2): All data stays in Australia — important for Privacy Act compliance
- Complete stack: Everything you need is available — compute, storage, vector search, LLM access
- Security: VPCs, encryption, IAM, CloudTrail logging — enterprise-grade by default
- AWS Bedrock: Access to Claude, Mistral, and other models via API, with data never leaving your account
- Most organisations are already on AWS: Reduces the integration overhead
Reference architecture
A typical production RAG deployment on AWS includes:
Ingestion pipeline
- S3: Document storage — PDFs, Word files, HTML. The source of truth.
- Lambda / ECS: Document processing — text extraction, chunking, metadata extraction.
- Amazon Bedrock: Embedding generation (Titan Embeddings or third-party models).
- OpenSearch / Aurora pgvector: Vector storage. Chunks + embeddings + metadata.
Query pipeline
- API Gateway + Lambda / ECS: API layer for user queries.
- OpenSearch / pgvector: Vector search to retrieve relevant chunks.
- Amazon Bedrock: LLM inference (Claude, Mistral) for answer generation.
- CloudWatch: Logging, monitoring, cost tracking.
Front-end
- CloudFront + S3: Static web app (React, Next.js) or integration into your existing app.
- Cognito: User authentication and access control.
Security layers
Network isolation
Deploy all components within a VPC. Use VPC endpoints for AWS services (S3, Bedrock, OpenSearch) so no traffic leaves the private network.
Encryption
- At rest: KMS-managed encryption for S3, OpenSearch, Aurora, and any EBS volumes.
- In transit: TLS everywhere. API calls, database connections, inter-service communication.
Access control
- IAM policies: Least-privilege access for all services and roles.
- Document-level permissions: Tag chunks with access levels and filter at query time.
- Cognito groups: Different users see different documents based on their role.
Audit trail
- CloudTrail: Log every API call and service interaction.
- Application logging: Log queries, retrievals, and generated answers for review.
- Retention policies: Define how long logs are kept, align with compliance requirements.
Australian data residency
For Privacy Act and APRA compliance, ensure:
- All S3 buckets are in
ap-southeast-2 - OpenSearch / Aurora instances are in
ap-southeast-2 - Bedrock model inference happens in
ap-southeast-2(Claude and Titan are available there) - No cross-region replication is enabled unless explicitly required
- CloudFront caching policies don't store sensitive data at edge locations
AWS Bedrock advantage: When you use Bedrock, your data is processed within your AWS account. It's not used to train models and doesn't leave the region.
Cost considerations
RAG on AWS typically costs less than people expect. For a mid-size deployment:
- Bedrock (LLM calls): $0.003–0.015 per 1K input tokens depending on model. Budget for your expected query volume.
- Embeddings: One-time cost per document. Titan Embeddings are very cheap.
- OpenSearch: From ~$200/month for a small cluster. Scales with data volume.
- Compute (Lambda/ECS): Pay per use for Lambda, or fixed for ECS. Depends on query volume.
- Storage (S3): Negligible for document storage.
Total cost for a typical deployment serving ~1,000 queries/day: $500–2,000/month. Significantly less than the manual effort it replaces.
Getting started
- Define your use case: What knowledge domain? What users? What questions?
- Audit your data: What format? How much? How often does it change?
- Set up infrastructure: VPC, S3, Bedrock access, vector database.
- Build ingestion pipeline: Process documents, generate embeddings, store in vector DB.
- Build query pipeline: API → retrieval → generation → response.
- Test with real users: Measure answer quality, iterate on chunking and prompts.
- Harden for production: Add monitoring, logging, error handling, access control.
Key takeaways
- AWS provides a complete stack for secure RAG — compute, storage, vector search, and LLM APIs.
- Deploy in ap-southeast-2 (Sydney) to keep all data in Australia.
- Use VPC endpoints, encryption at rest and in transit, and IAM policies to lock down the system.
- AWS Bedrock lets you use frontier models without your data leaving your AWS account.