Secure RAG on AWS: Architecture, Security & Data Residency

How to deploy a retrieval-augmented generation system on AWS with proper security, VPCs, encryption, and Australian data residency.

Kasun Wijayamanna Founder & Lead Developer Postgraduate Researcher (AI & RAG), Curtin University - Western Australia

Published 20 March 2025 Updated 1 June 2025

Why deploy RAG on AWS?

For Australian businesses, AWS is the most common choice for production RAG deployments. The reasons are practical:

Sydney region (ap-southeast-2): All data stays in Australia, which is important for Privacy Act compliance
Complete stack: Everything you need is available: compute, storage, vector search, LLM access
Security: VPCs, encryption, IAM, and CloudTrail logging. Enterprise-grade by default
AWS Bedrock: Access to Claude, Mistral, and other models via API, with data never leaving your account
Most organisations are already on AWS: Reduces the integration overhead

Reference architecture

A typical production RAG deployment on AWS includes:

Ingestion pipeline

S3: Document storage (PDFs, Word files, HTML). The source of truth.
Lambda / ECS: Document processing: text extraction, chunking, metadata extraction.
Amazon Bedrock: Embedding generation (Titan Embeddings or third-party models).
OpenSearch / Aurora pgvector: Vector storage. Chunks + embeddings + metadata.

Query pipeline

API Gateway + Lambda / ECS: API layer for user queries.
OpenSearch / pgvector: Vector search to retrieve relevant chunks.
Amazon Bedrock: LLM inference (Claude, Mistral) for answer generation.
CloudWatch: Logging, monitoring, cost tracking.

Front-end

CloudFront + S3: Static web app (React, Next.js) or integration into your existing app.
Cognito: User authentication and access control.

Security layers

Network isolation

Deploy all components within a VPC. Use VPC endpoints for AWS services (S3, Bedrock, OpenSearch) so no traffic leaves the private network.

Encryption

At rest: KMS-managed encryption for S3, OpenSearch, Aurora, and any EBS volumes.
In transit: TLS everywhere. API calls, database connections, inter-service communication.

Access control

IAM policies: Least-privilege access for all services and roles.
Document-level permissions: Tag chunks with access levels and filter at query time.
Cognito groups: Different users see different documents based on their role.

Audit trail

CloudTrail: Log every API call and service interaction.
Application logging: Log queries, retrievals, and generated answers for review.
Retention policies: Define how long logs are kept, align with compliance requirements.

Australian data residency

For Privacy Act and APRA compliance, ensure:

All S3 buckets are in ap-southeast-2
OpenSearch / Aurora instances are in ap-southeast-2
Bedrock model inference happens in ap-southeast-2 (Claude and Titan are available there)
No cross-region replication is enabled unless explicitly required
CloudFront caching policies don't store sensitive data at edge locations

AWS Bedrock advantage: When you use Bedrock, your data is processed within your AWS account. It's not used to train models and doesn't leave the region.

Cost considerations

RAG on AWS typically costs less than people expect. For a mid-size deployment:

Bedrock (LLM calls): $0.003–0.015 per 1K input tokens depending on model. Budget for your expected query volume.
Embeddings: One-time cost per document. Titan Embeddings are very cheap.
OpenSearch: From ~$200/month for a small cluster. Scales with data volume.
Compute (Lambda/ECS): Pay per use for Lambda, or fixed for ECS. Depends on query volume.
Storage (S3): Negligible for document storage.

Total cost for a typical deployment serving ~1,000 queries/day: $500–2,000/month. Significantly less than the manual effort it replaces.

Getting started

Define your use case: What knowledge domain? What users? What questions?
Audit your data: What format? How much? How often does it change?
Set up infrastructure: VPC, S3, Bedrock access, vector database.
Build ingestion pipeline: Process documents, generate embeddings, store in vector DB.
Build query pipeline: API → retrieval → generation → response.
Test with real users: Measure answer quality, iterate on chunking and prompts.
Harden for production: Add monitoring, logging, error handling, access control.

Key takeaways

AWS provides a complete stack for secure RAG: compute, storage, vector search, and LLM APIs.
Deploy in ap-southeast-2 (Sydney) to keep all data in Australia.
Use VPC endpoints, encryption at rest and in transit, and IAM policies to lock down the system.
AWS Bedrock lets you use frontier models without your data leaving your AWS account.

Postgraduate Researcher (AI & RAG), Curtin University - Western Australia

View profile →