© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build scalable multi-tenant databases with Amazon Aurora
DAT318
Anum Jang Sher
Senior Product Manager
AWS
Peter Fein
Staff Engineer 2
VMware
Agenda
Multi-tenant design patterns
Deep dive into Aurora’s key capabilities
Customer journey: VMware Aria Cost powered by CloudHealth (formerly CloudHealth)
Amazon Aurora
MySQL- and PostgreSQL-compatible relational database built for the cloud
High availability and cross-Region disaster recovery
Fully managed: no hardware provisioning, patching, setup, or backups
Autoscaling compute, storage, and IO
Performance of commercial databases at 1/10th the price
The fastest growing service in the history of AWS
Multi-tenant database design considerations
Isolation
Scale
Cost
Operational complexity
Single tenant vs. multi-tenant
Single tenant per cluster:
Highest degree of isolation
Tenant-level controls
Costly
Operational complexity
Multi-tenant cluster:
Better utilization
Improved cost
Limited tenant-level tuning
Continuous monitoring
[Diagram: one cluster per tenant (Tenant 1 through Tenant n) next to a single cluster shared by Tenant 1, Tenant 2, through Tenant n]
Multi-tenant database isolation
Single schema (one Aurora cluster shared by Tenant 1, Tenant 2, through Tenant n):
TenantID  ItemID  Quantity
Tenant1   TRFD32  10
Tenant2   RFDQ32  4
Tenant3   TAED15  5
Each tenant with their own schema/database (one Aurora cluster, Tenant 1 and Tenant 2):
Schema 1
ItemID  Quantity
TRFD32  10
RFDQ32  4
TAED15  5
Schema 2
ItemID  Quantity
TRFD32  10
RFDQ32  4
TAED15  5
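The two isolation models on this slide can be sketched with Python's built-in sqlite3 module standing in for an Aurora cluster. That substitution, the table name `inventory`, the helper names, and the use of ATTACH to emulate one schema per tenant are all assumptions made so the example runs anywhere; none of them come from the talk.

```python
import sqlite3


def pooled_demo():
    """Single schema: all tenants share one table keyed by tenant_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory (tenant_id TEXT, item_id TEXT, quantity INTEGER)"
    )
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?)",
        [("Tenant1", "TRFD32", 10), ("Tenant2", "RFDQ32", 4), ("Tenant3", "TAED15", 5)],
    )
    return conn


def pooled_items(conn, tenant_id):
    """Isolation here depends on every query filtering by tenant_id."""
    cur = conn.execute(
        "SELECT item_id, quantity FROM inventory WHERE tenant_id = ? ORDER BY item_id",
        (tenant_id,),
    )
    return cur.fetchall()


def schema_demo(items_by_schema):
    """Schema per tenant, emulated by attaching one in-memory database per schema."""
    conn = sqlite3.connect(":memory:")
    for schema, items in items_by_schema.items():
        if not schema.isidentifier():  # schema names cannot be bound as parameters
            raise ValueError(f"unsafe schema name: {schema!r}")
        conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
        conn.execute(f"CREATE TABLE {schema}.inventory (item_id TEXT, quantity INTEGER)")
        conn.executemany(f"INSERT INTO {schema}.inventory VALUES (?, ?)", items)
    return conn


def schema_items(conn, schema):
    """No tenant filter needed: the schema itself is the isolation boundary."""
    if not schema.isidentifier():
        raise ValueError(f"unsafe schema name: {schema!r}")
    cur = conn.execute(
        f"SELECT item_id, quantity FROM {schema}.inventory ORDER BY item_id"
    )
    return cur.fetchall()
```

In the pooled model a forgotten WHERE clause leaks data across tenants, while in the schema-per-tenant model the connection's schema context does the scoping, at the price of more objects to manage.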
Scalable architecture
Separation of storage and compute
Purpose-built, log-structured, distributed storage designed for cloud databases
6 copies across 3 AZs for high availability, durability, and performance
Storage grows or shrinks based on data size
Compute scales independently
Up to 15 low-latency readers to scale reads
[Diagram: compute layer (one writer, two readers) on top of a storage layer spanning Availability Zones 1, 2, and 3]
Scale globally
EU West
Aurora Global Database
Low-latency read scalability across Regions, typically <1 sec lag
Up to 5 secondary Regions, 90 readers
Write forwarding to send occasional writes from secondary Regions, available for Aurora MySQL
Fast cross-Region disaster recovery, typically <1 min downtime
[Diagram: Region 1 primary cluster (writer, reader, storage) replicates to Region 2 secondary cluster (readers, storage); write forwarding sends writes from the secondary cluster to the primary]
Scaling connections
Multi-tenant applications keep lots of connections open for quick response times
Most connections might be idling
Idling connections consume database resources
Overprovisioning to handle connections
Preserve database resources for running your workload
Application
Amazon RDS Proxy
A fully managed, highly available database proxy for Amazon RDS and Amazon Aurora
Increase app availability and reduce DB failover times
Manage app data security through integration with AWS Secrets Manager and IAM authentication
Pool and share DB connections for improved app scaling
[Diagram: Application, RDS Proxy, Aurora]
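To illustrate the idea of pooling and sharing connections, here is a toy pool in which many short-lived client requests reuse a small, fixed set of database connections. It is a sketch of the general technique only, not of how RDS Proxy is implemented; `TinyPool`, `run_clients`, and the sqlite3 stand-in database are hypothetical.

```python
import queue
import sqlite3
from contextlib import contextmanager


class TinyPool:
    """Shares a fixed number of database connections among many callers."""

    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def borrow(self, timeout=1.0):
        # Blocks (up to timeout) while every connection is in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Hand the connection back so the next caller can reuse it.
            self._idle.put(conn)


def run_clients(pool, client_count):
    """Simulate client_count requests and count the distinct connections used."""
    seen = set()
    for _ in range(client_count):
        with pool.borrow() as conn:
            conn.execute("SELECT 1").fetchone()
            seen.add(id(conn))
    return len(seen)
```

For example, `run_clients(TinyPool(2, lambda: sqlite3.connect(":memory:")), 20)` serves twenty requests over two database connections, which is the effect the slide describes as preserving database resources for the workload.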
Single tenant vs. multi-tenant
Single tenant per cluster:
Highest degree of isolation
Tenant-level controls
Costly
Operational complexity
Multi-tenant cluster:
Better utilization
Improved cost
Limited tenant-level tuning
Continuous monitoring
Mix of single-tenant and multi-tenant clusters
[Diagram: single-tenant clusters (Tenant 1 through Tenant m) alongside a shared cluster holding Tenant 1, Tenant 2, through Tenant n]
Database capacity: Cost vs. management
Provision for peak: expensive, underutilized
Insufficient capacity: experience degradation
OR
Continuously monitor and scale: difficult, requires experts, involves downtime
[Diagram: single-tenant clusters and a shared multi-tenant cluster (Tenant 1, Tenant 2, through Tenant n)]
Aurora Serverless v2
Pay-per-use, autoscaling configuration for Amazon Aurora
Database scales within the min/max range based on the application workload
Capacity is measured in Aurora capacity units (ACUs)
1 ACU comes with 2 GiB of memory; CPU and networking are similar to provisioned Aurora instances
Fine-grained scaling in increments as small as 0.5 ACU (1 GiB)
[Chart: capacity scaling between Minimum capacity (ACUs) and Maximum capacity (ACUs)]
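The ACU arithmetic on this slide can be worked through in a few lines. The sketch below models memory only and assumes capacity is simply the smallest 0.5 ACU step that covers a memory need, clamped to the configured range; the real service weighs CPU, memory, and network together, and `target_capacity` is a hypothetical helper, not an AWS API.

```python
import math

GIB_PER_ACU = 2.0  # from the slide: 1 ACU comes with 2 GiB of memory
ACU_STEP = 0.5     # from the slide: increments as small as 0.5 ACU


def acu_memory_gib(acus):
    """Memory that comes with a given capacity."""
    return acus * GIB_PER_ACU


def target_capacity(needed_gib, min_acu, max_acu):
    """Smallest capacity in 0.5 ACU steps covering needed_gib, clamped to [min, max]."""
    steps = math.ceil(needed_gib / (GIB_PER_ACU * ACU_STEP))
    acus = steps * ACU_STEP
    return max(min_acu, min(max_acu, acus))
```

With a range of 0.5 to 16 ACUs, a 5 GiB working set maps to 2.5 ACUs, while a 100 GiB need is capped at the 16 ACU maximum.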
Scaling factors
Network throughput
Memory utilization
CPU utilization
Predictable scaling rate
The bigger the instance, the faster the scaling rate
Redistributing tenants
Evolving tenant requirements
Redistribute tenants to balance workload across clusters
Backup and restore is time-consuming, depending on the database size
Costly
Aurora cluster
Fast database clones
A clone creates a new cluster that uses the same storage as the original cluster, with a copy-on-write protocol
Creating a clone takes a few minutes and is independent of data size
Operations on the clone do not affect the source cluster
Pay for changed data only
[Diagram: source cluster and its fast database clone]
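Copy-on-write can be shown with a toy page store: a clone starts out sharing every page with its source, and a page becomes private only when one side writes to it. This is a conceptual sketch with a hypothetical `Volume` class, not a description of Aurora's storage internals.

```python
class Volume:
    """Toy copy-on-write volume: pages are shared until one side writes."""

    def __init__(self, layers=()):
        self._layers = layers  # frozen, shared page maps, newest first
        self._delta = {}       # private pages written since the last clone

    def read(self, page_no):
        for pages in (self._delta, *self._layers):
            if page_no in pages:
                return pages[page_no]
        return None

    def write(self, page_no, data):
        # Writes never touch shared layers, so the other side is unaffected.
        self._delta[page_no] = data

    def clone(self):
        # Freeze the current private pages as a shared layer. No page data is
        # copied, so clone time does not depend on how much data is stored.
        self._layers = (self._delta, *self._layers)
        self._delta = {}
        return Volume(self._layers)

    def private_pages(self):
        """Pages held only by this volume, i.e. the changed data."""
        return len(self._delta)
```

After `clone = source.clone()`, a write on either side leaves the other side's view unchanged, and `private_pages()` counts only what has diverged, which mirrors the slide's point about paying for changed data only.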
Monitoring
Instance: Amazon CloudWatch
Operating system: Amazon RDS Enhanced Monitoring
Database engine: Amazon RDS Performance Insights
Database engine: Amazon DevOps Guru for RDS
VMware Aria Cost
Cost and operational efficiency journey
VMware Aria Cost powered by CloudHealth
Unified multi-cloud management solution
SaaS born on AWS
20K+ customers
45 TB Relational Data
45 PB in S3
Original shard architecture
Primary DB for the platform
Wide set of applications
Heavy R/W query load
Multi-tenant from the start
Organically grown MySQL on EC2 c5 with Chef
Binlog replicas standby for failover only
Shard count rapidly increased to support growth
A single "shard": MySQL 5.6 primary instance plus MySQL 5.6 replica instance
166 shards, ~30 TB total
Max 200 customers per shard
Multi-tenant design pattern
Groups of tenants on a single schema
~200 max premigration
Tenant data co-located in tables with strict ID-based segmentation
Originally one schema per MySQL DB instance
Migration enabled multiple schemas on a single Aurora cluster
Some important tenants get their own schema
Migration schema consolidation goal:
EC2 MySQL (one schema per instance)
Schema1          Schema2          Schema3
Tenant ID  Qty   Tenant ID  Qty   Tenant ID  Qty
1          10    4          10    7          10
2          4     5          3     8          4
3          5     6          15    9          5
Aurora cluster (multiple schemas on one cluster)
Schema1          Schema2          Schema3
Tenant ID  Qty   Tenant ID  Qty   Tenant ID  Qty
1          10    4          10    7          10
2          4     5          4     8          4
3          5     6          5     9          5
Multi-tenant data management
db1 vs tenant shards
db1 for reference data and unique id generation
Sync process replicates data to tenant shards
Extensive home-grown software to manage
Shard and customer context for query execution
Tools to run cross-shard queries
No changes for Aurora migration
[Diagram: db1 (reference data) replicates to the tenant shards (tenant-scoped data)]
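The talk does not show the home-grown tooling, so the following is only a guess at its general shape: a directory that resolves a tenant to its shard before a query runs, plus a helper that fans a query out across all shards. `ShardDirectory`, the `items` table, and the sqlite3 in-memory shards are hypothetical stand-ins.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory stand-in for one tenant shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (tenant_id INTEGER, qty INTEGER)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    return conn


class ShardDirectory:
    """Maps each tenant to the shard that holds its data."""

    def __init__(self):
        self._shards = {}
        self._tenant_to_shard = {}

    def add_shard(self, name, conn, tenant_ids):
        self._shards[name] = conn
        for tenant_id in tenant_ids:
            self._tenant_to_shard[tenant_id] = name

    def for_tenant(self, tenant_id):
        """Shard and customer context: pick the connection for this tenant."""
        return self._shards[self._tenant_to_shard[tenant_id]]

    def tenant_total(self, tenant_id):
        cur = self.for_tenant(tenant_id).execute(
            "SELECT COALESCE(SUM(qty), 0) FROM items WHERE tenant_id = ?",
            (tenant_id,),
        )
        return cur.fetchone()[0]

    def cross_shard_total(self):
        """Cross-shard query: run the same statement everywhere and combine."""
        return sum(
            conn.execute("SELECT COALESCE(SUM(qty), 0) FROM items").fetchone()[0]
            for conn in self._shards.values()
        )
```

Because the application only ever asks the directory for a connection, moving a shard from EC2 MySQL to an Aurora cluster changes the directory entry rather than the calling code, which is consistent with the slide's note that no changes were needed for the Aurora migration.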
Challenges premigration
Limited capability to scale
Overprovisioning
Dev and operations impact
Project goals
Migrate 166 EC2 MySQL shards to Aurora clusters
Simplify operations and reduce interventions
Support horizontal and vertical multi-tenant growth
Improve platform performance and reliability
Stabilize costs
Alternatives?
Timeline
January, Research and PoC: performance benchmark; SQL compatibility; AWS training and KT; design review
April, Migration starts: first Aurora cluster; proxy integration; tooling; provisioning
June, Tuning: schema consolidation; AWS support; performance review; application optimization
September, Bulk migration: full automation; RI and cost analysis; application optimization; post-mortem
Migration process
100% replication compatibility between MySQL 5.6 and Aurora
Supports multiple schemas on an Aurora cluster
No platform downtime
Fully automated for bulk migrations
[Diagram: EC2 MySQL Schema 1 (tenants 1 to 3) migrating into an Aurora cluster that holds Schema 1 (tenants 1 to 3), Schema 2 (tenants 4 to 6), and Schema 3 (tenants 7 to 9)]
Steps:
1. MySQL dump (to SQL files)
2. MySQL load
3. Binlog replication
4. EC2 in read-only
5. Flip DNS to Aurora
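The five migration steps lend themselves to an ordered runbook, which is one plausible way to automate bulk migrations. The sketch below only enforces step order and adds a cutover guard; the guard (refusing to flip DNS until replication lag reaches zero) and all names are assumptions for illustration, since the slides do not describe the automation itself.

```python
from dataclasses import dataclass, field

STEPS = [
    "mysql_dump",          # 1. MySQL dump to SQL files
    "mysql_load",          # 2. MySQL load into the Aurora cluster
    "binlog_replication",  # 3. Binlog replication from EC2 MySQL to Aurora
    "source_read_only",    # 4. EC2 MySQL switched to read-only
    "flip_dns",            # 5. Flip DNS to Aurora
]


@dataclass
class ShardMigration:
    shard: str
    done: list = field(default_factory=list)
    replica_lag_seconds: float = float("inf")  # unknown until replication reports

    def next_step(self):
        return STEPS[len(self.done)] if len(self.done) < len(STEPS) else None

    def complete(self, step):
        if step != self.next_step():
            raise RuntimeError(f"{step} is out of order; expected {self.next_step()}")
        # Assumed guard: cut over only once Aurora has applied every change.
        if step == "flip_dns" and self.replica_lag_seconds > 0:
            raise RuntimeError("replica has not caught up; refusing to flip DNS")
        self.done.append(step)
```

Putting the source in read-only mode before the DNS flip is what lets the replica drain to zero lag, so the application sees the same data on both sides of the switch.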
Storage
128 TB cluster limit for vertical and horizontal multi-tenant growth
Only pay for the storage we use
No EBS management or upsizing
Largest tenant uses 3.2 TB
Readers
Initially used two-node clusters
Application query load shifted to readers
Vertical scaling
Job read processing load increases
Horizontal scaling
Tenant size increases
Zero downtime scaling operations
Basic cluster scaling patterns:
Horizontal (read): a writer with one reader grows to a writer with two readers
Vertical (write): writer and readers move from r6g.4xl to r6g.8xl
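Shifting application query load to readers implies some routing decision in the client. A minimal sketch of that decision follows; the endpoint host names are placeholders and the statement classification is deliberately simplistic, so treat it as an illustration of the idea rather than the routing VMware used.

```python
READ_PREFIXES = ("select", "show", "explain")


def choose_endpoint(
    sql,
    in_transaction=False,
    writer="writer.example.internal",  # placeholder for the cluster writer endpoint
    reader="reader.example.internal",  # placeholder for the cluster reader endpoint
):
    """Send plain reads to the reader endpoint and everything else to the writer."""
    statement = sql.lstrip().lower()
    is_read = statement.startswith(READ_PREFIXES)
    # Reads inside a transaction stay on the writer so they see its own changes;
    # SELECT ... FOR UPDATE takes locks and therefore also goes to the writer.
    if is_read and not in_transaction and "for update" not in statement:
        return reader
    return writer
```

Adding readers (horizontal scaling) raises capacity only for the statements this function sends to the reader endpoint, while write-heavy growth still calls for a larger writer instance.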
RDS Proxy
RDS Proxy with each Aurora cluster; all traffic goes through the proxy
Supports Aurora writer and reader endpoints
Connection scaling for multiple schemas
10-to-1 connection compression
Graceful failover
Monitoring
All monitoring through CloudWatch metrics piped into VMware Wavefront
Slow query log analysis in CloudWatch
Lambda for events and automated operations
RDS Performance Insights
Single pane approach
Identify problematic queries (full SQL text)
Make scaling decisions
Easy to use
Performance
1952 r6g vCPUs replaced 5312 c5 vCPUs, a 64% reduction
7x customer increase per cluster
Faster writes
Faster DDL changes
Query cache handles 40% of read queries
Parameter groups for tuning
Read replicas handle failover
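As a rough cross-check, the headline ratios can be re-derived from numbers quoted in the deck: the vCPU counts on this slide, the premigration maximum of about 200 customers per shard, the 1500+ customers per shard after migration, and the move from 166 shards to 62 clusters. Several of these inputs are approximate ("max", "1500+"), so the results are indicative only.

```python
def percent_reduction(before, after):
    """Reduction from before to after, as a percentage of before."""
    return (before - after) / before * 100


# vCPUs: 5312 c5 vCPUs before, 1952 r6g vCPUs after
vcpu_reduction = percent_reduction(5312, 1952)

# Customers per shard or cluster: about 200 before, 1500+ after
customer_growth = 1500 / 200

# Database count: 166 EC2 MySQL shards before, 62 Aurora clusters after
cluster_reduction = percent_reduction(166, 62)
```

The vCPU figure comes out at a little over 63 percent, in the same range as the reduction quoted on the slide, and the customer ratio of 7.5 agrees with the stated 7x increase per cluster.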
Post-migration architecture
62 Aurora clusters
10K+ DB clients in EC2, EKS, EMR
1500+ customers per shard
Room to scale
[Diagram: 62x Aurora clusters, each with an RDS Proxy, a provisioned writer, two provisioned readers, and a cluster volume. Clients: Amazon EC2 Auto Scaling (300x), Amazon EMR (400x), and Amazon Elastic Kubernetes Service (Amazon EKS) with 300 nodes and 6000 pods]
Operations
Large savings for operations team
No new shards, no storage upsizing, no maintenance windows
Chef replaced with Terraform, CloudFormation, and Lambda
Reliable and quiet
Costs
Blue: Premigration EC2 infrastructure
Red: RDS Aurora + Proxy
20% reduction after the final RI purchase
Savings keep growing over time
Key learnings
Knowledge building is critical
Upfront training
Connected with service teams for deeper insights
Aurora + RDS Proxy a great pair
Aurora cluster performance exceeded expectations
Plan for legacy applications using Aurora reader endpoints
Wishlist:
Client host visibility with RDS Proxy
RDS Performance Insights tailored for Aurora
Thank you!
Please complete the session
survey in the mobile app
Anum Jang Sher
anujangs@amazon.com
Peter Fein