Scaling the Platform for your Startup
Dean Bryen, AWS Solutions Architecture
Peter Mounce, Senior Software Developer at JUST EAT
Why are you here?
• Building the technology platform for your startup
• You want to prepare for success
• Learn about design patterns & scalability
• A pragmatic approach for startups
Priorities for startups
• Racing within a window of opportunity
• Small team with no legacy
• Focus on solving a problem
• Avoid over-engineering & re-engineering
• Reduce risk of failure when you go viral
A scalable architecture
• Can support growth in users, traffic, data size
• Without practical limits
• Without a drop in performance
• Seamlessly: just by adding more resources
• Efficiently: in terms of cost per user
Day 1 – Dev & private beta
Single host
[Diagram: www.example.com resolves through Amazon Route 53 (DNS service) to an Elastic IP on THE server (e.g. Apache, MySQL), launched from a server image (AMI).]
Day 2 - Public beta
We need a bigger server
• Add larger & faster storage (EBS)
• Use the right instance type
• Easy to change instance sizes
• Not our long-term strategy
  – Will hit a limit eventually
  – No fault tolerance
Separating web and DB
• More capacity
• Scale each tier individually
• Tailor instance for each tier
  – Instance type
  – Storage
• Security
  – Security groups
  – DB in a private VPC subnet
But how do I choose what DB technology I need?
SQL? NoSQL?
Why start with a Relational DB?
• SQL is versatile & feature-rich
• Lots of existing code, tools, knowledge
• Clear patterns to scalability (for read-heavy apps)
• Reality: eventually you will have a polyglot data layer
  – There will be workloads where NoSQL is a better fit
  – Combination of both Relational and NoSQL
  – Use the right tool for each workload
Key Insight: Relational Databases are Complex
• Our experience running Amazon.com taught us that relational databases can be a pain to manage and operate with high availability
• Poorly managed relational databases are a leading cause of lost sleep and downtime in the IT world!
• Especially for startups with small teams
Relational Databases: MySQL, Aurora, PostgreSQL, Oracle, SQL Server
Amazon RDS and Aurora: fully managed; zero admin
Improving efficiency
Offload static content
• Amazon S3: highly available hosting that scales
  – Static files (JavaScript, CSS, images)
  – User uploads
• S3 URLs: serve directly from S3
• Let the web server focus on dynamic content
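Serving "directly from S3" usually means the page links to the object's S3 URL instead of a path on the web server. A minimal sketch of such a URL helper, assuming a hypothetical bucket name and region, and assuming the objects are readable by browsers (for example, publicly readable):

```python
from urllib.parse import quote

# Hypothetical bucket and region: replace with your own.
STATIC_BUCKET = "example-startup-static"
STATIC_REGION = "eu-west-1"


def static_url(key: str) -> str:
    """Return a virtual-hosted-style S3 URL for a static asset.

    Pages link to this URL, so browsers download the file from S3 and the
    web server only handles dynamic requests.
    """
    # Drop any leading slash and percent-encode the key, keeping "/" separators
    encoded_key = quote(key.lstrip("/"), safe="/")
    return f"https://{STATIC_BUCKET}.s3.{STATIC_REGION}.amazonaws.com/{encoded_key}"


print(static_url("/css/site.css"))
# https://example-startup-static.s3.eu-west-1.amazonaws.com/css/site.css
```

In a template, every `<link>` or `<img>` would call a helper like this, so moving assets behind CloudFront later only means changing the host in one place.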
Amazon CloudFront
• Worldwide network of edge locations
• Cache on the edge
  – Reduce latency
  – Reduce load on origin servers
  – Static and dynamic content
  – Even a few seconds of caching for popular content can have a huge impact
• Connection optimizations
  – Optimize transfer route
  – Reuse connections
  – Benefits even non-cacheable content
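Why a few seconds of caching matters can be shown with a back-of-the-envelope model. This is an idealised assumption (one edge location, one popular object, steady traffic, exactly one origin fetch per cache expiry), not a description of how CloudFront manages its caches, and the numbers are illustrative rather than measured:

```python
def origin_fetches_per_second(request_rate: float, ttl_seconds: float) -> float:
    """Idealised origin load for one popular object at one edge location.

    Assumes steady traffic and that each cache expiry triggers exactly one
    origin fetch, so the origin sees at most one request per TTL window.
    """
    return min(request_rate, 1.0 / ttl_seconds)


def cache_hit_ratio(request_rate: float, ttl_seconds: float) -> float:
    """Fraction of requests answered from the edge cache under the same model."""
    return 1.0 - origin_fetches_per_second(request_rate, ttl_seconds) / request_rate


# 500 requests/s for one page, cached for only 5 seconds:
print(origin_fetches_per_second(500, 5))  # 0.2 origin requests/s instead of 500
print(round(cache_hit_ratio(500, 5), 4))  # 0.9996
```

The origin load depends on the TTL, not on the request rate, which is why short TTLs still help most for the most popular content.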
CloudFront for static & dynamic content
[Diagram: Amazon Route 53 points to a CloudFront distribution. The css/*, js/* and Images/* behaviors serve static content from an S3 bucket; the Default (*) behavior serves dynamic content from EC2 instance(s).]
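A CloudFront distribution chooses a cache behavior by matching the request path against each behavior's path pattern in order, falling back to the Default (*) behavior. The sketch below mimics that routing for the diagram's patterns. The origin names are hypothetical, and Python's `fnmatchcase` stands in for CloudFront's own matcher (both treat `*` as any run of characters and are case-sensitive):

```python
from fnmatch import fnmatchcase

# Ordered (path pattern, origin) pairs mirroring the diagram; first match wins.
# Origin names are hypothetical.
CACHE_BEHAVIORS = [
    ("css/*", "s3-static-bucket"),
    ("js/*", "s3-static-bucket"),
    ("Images/*", "s3-static-bucket"),
    ("*", "ec2-web-servers"),  # Default (*) behavior: dynamic content
]


def pick_origin(request_path: str) -> str:
    """Return the origin that would serve request_path under CACHE_BEHAVIORS."""
    path = request_path.lstrip("/")  # treat "/css/a.css" and "css/a.css" alike
    for pattern, origin in CACHE_BEHAVIORS:
        # fnmatchcase is case-sensitive, like CloudFront path patterns
        if fnmatchcase(path, pattern):
            return origin
    raise LookupError("no behavior matched")  # unreachable while "*" is last


print(pick_origin("/css/site.css"))     # s3-static-bucket
print(pick_origin("/restaurants/123"))  # ec2-web-servers
```

Because matching is case-sensitive, `/images/logo.png` (lower-case) would fall through to the web servers; the asset paths and the patterns have to agree exactly.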
Database caching
• Faster response from RAM
• Reduce load on database
[Diagram: application server in front of Amazon ElastiCache and an RDS database]
1. If data in cache, return result
2. If not in cache, read from DB
3. And store in cache
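The three steps above are the cache-aside pattern (ElastiCache's documentation calls it lazy loading). A minimal sketch, assuming a plain dict in place of ElastiCache, a caller-supplied function in place of the RDS query, and an illustrative TTL so stale entries eventually expire:

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class CacheAside:
    """Cache-aside (lazy-loading) reads.

    A dict stands in for Amazon ElastiCache and `load_from_db` for a query
    against the RDS database; the TTL is an illustrative assumption.
    """

    def __init__(self, load_from_db: Callable[[Hashable], object], ttl_seconds: float = 60.0):
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._cache: Dict[Hashable, Tuple[float, object]] = {}  # key -> (expires_at, value)

    def get(self, key: Hashable) -> object:
        now = time.monotonic()
        entry = self._cache.get(key)
        # 1. If the data is in the cache and still fresh, return it
        if entry is not None and entry[0] > now:
            return entry[1]
        # 2. If not in the cache (or expired), read from the database...
        value = self._load_from_db(key)
        # 3. ...and store it in the cache for later requests
        self._cache[key] = (now + self._ttl, value)
        return value


# Usage: record how often the (fake) database is actually queried
db_calls = []


def fake_db_lookup(restaurant_id):
    db_calls.append(restaurant_id)
    return {"id": restaurant_id, "name": f"Restaurant {restaurant_id}"}


cache = CacheAside(fake_db_lookup, ttl_seconds=60)
cache.get(42)
cache.get(42)
print(db_calls)  # [42]: the second read was served from the cache
```

The TTL bounds how stale a cached row can get; writes that must be visible immediately would also need to update or delete the cached entry.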
Amazon ElastiCache: in-memory cache
• Simple to deploy
• Managed
  – Automatically replaces failed nodes
  – Patch management
• Elastic
• Compatible (Memcached and Redis)
Day 3 – Paying customers
High Availability
[Diagram, built up over three slides: a web server, an RDS DB instance and ElastiCache node 1 in Availability Zone a, with static assets in an S3 bucket behind Amazon CloudFront and www.example.com on Amazon Route 53 (DNS service); a second web server is added in Availability Zone b, then Elastic Load Balancing in front of both web servers.]
Elastic Load Balancing
• Managed load balancing service
• Fault tolerant
• Health checks
• Distributes traffic across AZs
• Elastic: automatically scales its capacity
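How health checks and cross-AZ distribution fit together can be sketched with a toy balancer that only sends traffic to targets whose last health check passed. Round-robin selection and the server names are simplifying assumptions; this is not a description of how Elastic Load Balancing actually picks targets:

```python
import itertools
from typing import Callable, List


class RoundRobinBalancer:
    """Toy load balancer that skips targets failing their health check.

    Round-robin is an assumption for illustration, not ELB's algorithm.
    """

    def __init__(self, targets: List[str], is_healthy: Callable[[str], bool]):
        self._targets = targets
        self._is_healthy = is_healthy
        self._cycle = itertools.cycle(targets)

    def next_target(self) -> str:
        # Try each target at most once per call, skipping unhealthy ones
        for _ in range(len(self._targets)):
            target = next(self._cycle)
            if self._is_healthy(target):
                return target
        raise RuntimeError("no healthy targets")


# Hypothetical web servers, one per Availability Zone
servers = {"web-az-a": True, "web-az-b": True}  # name -> last health-check result
lb = RoundRobinBalancer(list(servers), is_healthy=lambda name: servers[name])
print([lb.next_target() for _ in range(4)])  # ['web-az-a', 'web-az-b', 'web-az-a', 'web-az-b']
servers["web-az-a"] = False  # the server in AZ a fails its health check
print([lb.next_target() for _ in range(2)])  # ['web-az-b', 'web-az-b']
```

With one server per AZ, losing a whole Availability Zone degrades capacity but keeps the site up, which is the point of the two-AZ layout.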
High Availability
[Diagram: the same architecture, with an RDS DB standby added in Availability Zone b.]
Data layer HA
[Diagram: RDS DB instance and ElastiCache node 1 in Availability Zone a; RDS DB standby and ElastiCache node 2 added in Availability Zone b; two web servers behind Elastic Load Balancing; static assets in an S3 bucket.]
User sessions
• Problem: often stored on local disk (not shared)
• Quick fix: ELB session stickiness
• Solution: DynamoDB
[Diagram: Elastic Load Balancing in front of two web servers; with sessions on local disk, a user is logged in on one server and logged out on the other.]
Amazon DynamoDB
• Managed document and key-value store
• Simple to launch and scale
  – To millions of IOPS
  – Both reads and writes
• Consistent, fast performance
• Durable: perfect for storage of session data
https://github.com/aws/aws-dynamodb-session-tomcat
http://docs.aws.amazon.com/aws-sdk-php/guide/latest/feature-dynamodb-session-handler.html
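The linked Tomcat and PHP handlers do this for real. The idea behind them can be sketched as follows: session state lives in one shared store keyed by session ID, so whichever web server the load balancer picks sees the same login. Here a dict stands in for the DynamoDB table, and the class names and TTL are hypothetical:

```python
import time
import uuid
from typing import Dict, Optional


class SharedSessionStore:
    """Stand-in for a DynamoDB table keyed by session ID (a dict here)."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self._items: Dict[str, dict] = {}
        self._ttl = ttl_seconds

    def create(self, data: dict) -> str:
        session_id = uuid.uuid4().hex
        self._items[session_id] = {"data": data, "expires_at": time.time() + self._ttl}
        return session_id

    def load(self, session_id: str) -> Optional[dict]:
        item = self._items.get(session_id)
        # Missing or expired sessions count as logged out
        if item is None or item["expires_at"] < time.time():
            return None
        return item["data"]


class WebServer:
    """Hypothetical stateless web server: keeps no session state locally."""

    def __init__(self, name: str, sessions: SharedSessionStore):
        self.name = name
        self._sessions = sessions

    def handle(self, session_id: str) -> str:
        data = self._sessions.load(session_id)
        return f"{self.name}: logged in as {data['user']}" if data else f"{self.name}: logged out"


store = SharedSessionStore()
session_id = store.create({"user": "alice"})
web_a, web_b = WebServer("web-a", store), WebServer("web-b", store)
print(web_a.handle(session_id))  # web-a: logged in as alice
print(web_b.handle(session_id))  # web-b: logged in as alice
```

Unlike session stickiness, nothing is lost when a web server is replaced or scaled in, because no server holds the only copy of a session.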
Day 4 – Let’s go viral!
Replace guesswork with elastic IT
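[Chart: capacity vs demand over time. Startups pre-AWS (traditional): fixed capacity means waste ($$$) when it exceeds demand and unhappy customers when demand exceeds it. AWS Cloud: capacity tracks demand.]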
Scaling the web tier
[Diagram, built up over three slides: the Data layer HA architecture, with web servers added behind Elastic Load Balancing, growing from two to four.]
Automatic resizing of compute clusters based on demand
• Control: define minimum and maximum instance pool sizes and when scaling and cool down occurs.
• Integrated with Amazon CloudWatch: use metrics gathered by CloudWatch to drive scaling.
• Instance types: run Auto Scaling for On-Demand and Spot Instances. Compatible with VPC.
aws autoscaling create-auto-scaling-group --auto-scaling-group-name MyGroup --launch-configuration-name MyConfig --min-size 4 --max-size 200 --availability-zones us-west-2c us-west-2b
[Diagram: Amazon CloudWatch triggers an Auto Scaling policy.]
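The control loop behind "CloudWatch triggers an Auto Scaling policy" can be sketched as a threshold rule with a cooldown, clamped to the group's minimum and maximum size. Auto Scaling offers several policy types (simple, step, target tracking); this is a hand-rolled simplification of a simple threshold policy, not the service's algorithm, and the thresholds, step and cooldown are illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class SimpleScalingPolicy:
    """Toy threshold-based scaling policy with a cooldown.

    Thresholds, step size and cooldown are illustrative assumptions, not AWS
    defaults; min_size/max_size mirror --min-size 4 and --max-size 200 above.
    """

    min_size: int = 4
    max_size: int = 200
    scale_out_above: float = 70.0  # e.g. average CPU % reported by CloudWatch
    scale_in_below: float = 30.0
    step: int = 2
    cooldown_seconds: float = 300.0
    last_change_at: float = float("-inf")

    def desired_capacity(self, current: int, metric: float, now: float) -> int:
        # While cooling down after the last change, hold capacity steady
        if now - self.last_change_at < self.cooldown_seconds:
            return current
        if metric > self.scale_out_above:
            target = current + self.step
        elif metric < self.scale_in_below:
            target = current - self.step
        else:
            target = current
        # Never leave the [min_size, max_size] range
        target = max(self.min_size, min(self.max_size, target))
        if target != current:
            self.last_change_at = now
        return target


policy = SimpleScalingPolicy()
print(policy.desired_capacity(current=4, metric=85.0, now=0.0))    # 6: scale out
print(policy.desired_capacity(current=6, metric=90.0, now=100.0))  # 6: still cooling down
print(policy.desired_capacity(current=6, metric=90.0, now=400.0))  # 8: scale out again
```

The cooldown stops the group from reacting again before newly launched instances have had time to affect the metric.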
Prerequisite
Decompose into small, loosely coupled, stateless building blocks
What does this mean in practice?
• Only store transient data on local disk
• Needs to persist beyond a single HTTP request? Then store it elsewhere:
  – User uploads: Amazon S3
  – User sessions: Amazon DynamoDB
  – Application data: Amazon RDS
Having decomposed into small, loosely coupled, stateless building blocks, you can now scale out with ease.
Having done that, you can also scale back with ease.
Take the shortcut
• While this architecture is simple, you still need to deal with:
  – Configuration details
  – Deploying code to multiple instances
  – Maintaining multiple environments (Dev, Test, Prod)
  – Maintaining different versions of the application
• Solution: Use AWS Elastic Beanstalk
AWS Elastic Beanstalk (EB)
• Easily deploy, monitor, and scale three-tier web applications and services
• Infrastructure provisioned and managed by EB
  – You maintain control
• Preconfigured application containers
  – Easily customizable
• Support for these platforms:
Loose coupling with SQS
[Diagram: tight coupling (front end calling the back end directly) contrasted with Front End EC2 Instance → Put Message → SQS → Get Message → Back End EC2 Instance.]
• Place asynchronous tasks into Amazon SQS
• SQS: a buffer that protects backend systems
• Process at own pace
• Respond quickly to end users
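The Put Message / Get Message flow can be sketched in-process, with `queue.Queue` standing in for SQS; with real SQS the front end would call SendMessage and the back end ReceiveMessage followed by DeleteMessage. The photo-resize task is a hypothetical example. Unlike `queue.Queue`, SQS standard queues deliver at least once and do not guarantee order, so real workers should be idempotent:

```python
import queue
import threading

# queue.Queue stands in for an Amazon SQS queue (an assumption for this sketch).
work_queue = queue.Queue()


def front_end_handle_upload(photo_id: str) -> str:
    """Front end: Put Message, then respond to the user straight away."""
    work_queue.put({"task": "resize", "photo_id": photo_id})
    return f"accepted {photo_id}"


def back_end_worker(results: list) -> None:
    """Back end: Get Message in a loop and process at its own pace."""
    while True:
        message = work_queue.get()
        try:
            if message is None:  # sentinel that stops this demo worker
                return
            # Hypothetical slow task; a real worker would do the resize here
            results.append(f"resized {message['photo_id']}")
        finally:
            work_queue.task_done()


results = []
worker = threading.Thread(target=back_end_worker, args=(results,))
worker.start()
print(front_end_handle_upload("p1"))  # accepted p1
front_end_handle_upload("p2")
work_queue.put(None)  # tell the demo worker to stop
worker.join()
print(results)  # ['resized p1', 'resized p2']
```

If uploads spike, messages simply accumulate in the queue; the back end drains them at its own rate instead of being overwhelmed.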
Day 5 – Add more features
[Diagram: AWS services as building blocks between Your Applications and the AWS Global Infrastructure]
• Compute: EC2, ELB, Auto Scaling, Lambda, ECS
• Storage: EBS, S3, Glacier, CloudFront
• Database: DynamoDB, RDS, ElastiCache
• Network: VPC, Direct Connect, Route 53
• Analytics: Kinesis, Data Pipeline, Redshift, EMR
• Application: SQS, SWF, AppStream, Elastic Transcoder, SES, CloudSearch, SNS
• Mobile: Push Notifications, Mobile Analytics, Cognito, Cognito Sync
• Deployment & Management: Elastic Beanstalk, OpsWorks, CloudFormation, CodeDeploy, CodePipeline, CodeCommit
• Security & Administration: CloudWatch, Config, CloudTrail, IAM, Directory, KMS
• Enterprise Applications: WorkSpaces, WorkMail, WorkDocs
AWS building blocks
Inherently scalable & highly available (automated):
• Elastic Load Balancing, Amazon CloudFront, Amazon Route 53, Amazon S3, Amazon SQS, Amazon SES, Amazon CloudSearch, AWS Lambda, …
Scalable & highly available:
• Configurable: Amazon DynamoDB, Amazon Redshift, Amazon RDS, Amazon ElastiCache, …
• With the right architecture: Amazon EC2, Amazon VPC
Stay focused as you scale your team
[Chart: with on-premise infrastructure, 70% of effort goes to managing all of the “undifferentiated heavy lifting” and 30% to your business; with AWS cloud-based infrastructure, 30% goes to configuring your cloud assets and 70% is more time to focus on your business.]
Day 6 – Growing fast
Scaling Relational DBs
• Increase RDS instance specs
  – Larger instance type
  – More storage / more PIOPS
• Read Replicas (Master – Slave)
  – Scale out beyond capacity of single DB instance
  – Available in Amazon RDS for MySQL, PostgreSQL and Amazon Aurora
  – Replicas are subject to replication lag
  – Writes => master
  – Reads with tolerance to stale data => read replica (slave)
  – Reads with need for most recent data => master
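The three routing rules for read replicas translate directly into application code. A minimal sketch, assuming hypothetical endpoint names and round-robin spreading of stale-tolerant reads across replicas:

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Pick a database endpoint per query, following the read-replica rules.

    Endpoint names are hypothetical; round-robin across replicas is an
    illustrative choice.
    """

    def __init__(self, master: str, replicas: List[str]):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, is_write: bool, needs_latest: bool = False) -> str:
        # Writes, and reads that must see the most recent data, go to the master
        if is_write or needs_latest or self._replicas is None:
            return self.master
        # Replicas may lag the master, so only stale-tolerant reads land here
        return next(self._replicas)


router = ReadWriteRouter("master-db", ["replica-1", "replica-2"])
print(router.route(is_write=True))                      # master-db
print(router.route(is_write=False))                     # replica-1
print(router.route(is_write=False, needs_latest=True))  # master-db
print(router.route(is_write=False))                     # replica-2
```

A typical stale-tolerant read is a restaurant listing page; a typical needs-latest read is showing a user the order they just placed.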
Scaling the DB
[Diagram, built up over three slides: four web servers behind Elastic Load Balancing, with RDS DB instance and ElastiCache node 1 in Availability Zone a and RDS DB standby and ElastiCache node 2 in Availability Zone b; one and then two RDS read replicas are added.]
What if your app is write-heavy?
Challenge: you will eventually hit the write throughput or storage limit of the master node.
Solutions:
• Federation (splitting into multiple DBs based on function)
• Sharding (splitting one data set up across multiple hosts)
Database federation
• Split up tables into smaller autonomous databases
• Harder to do cross-function queries
• Essentially delaying the need for sharding
• Won’t help with single huge functions/tables
[Diagram: separate Forums DB, Users DB and Products DB.]
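The "harder to do cross-function queries" cost is easy to see in code: once users and forum posts live in different databases, a JOIN between them has to move into the application. A minimal sketch, assuming two in-memory SQLite databases in place of separate RDS instances and a hypothetical schema:

```python
import sqlite3

# Two independent databases stand in for the Users DB and the Forums DB
# (separate RDS instances in practice); the schema is hypothetical.
users_db = sqlite3.connect(":memory:")
forums_db = sqlite3.connect(":memory:")

users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
users_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

forums_db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)")
forums_db.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(10, 1, "Hello"), (11, 2, "Tips"), (12, 1, "Again")],
)


def posts_with_authors():
    """Cross-function query: the tables no longer share a database, so the
    JOIN is done in application code with one query per database."""
    posts = forums_db.execute("SELECT id, user_id, title FROM posts ORDER BY id").fetchall()
    user_ids = sorted({user_id for _, user_id, _ in posts})
    placeholders = ",".join("?" * len(user_ids))
    names = dict(users_db.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
    ).fetchall())
    return [(post_id, title, names[user_id]) for post_id, user_id, title in posts]


print(posts_with_authors())
# [(10, 'Hello', 'alice'), (11, 'Tips', 'bob'), (12, 'Again', 'alice')]
```

Each database can now be sized and scaled for its own function, at the price of losing cross-database joins and transactions.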
Sharded horizontal scaling
• Each partition hosts a portion of the rows of a table
• More complex at the application layer
• ORM support can help
• No practical limit on scalability
• Operational complexity
• Shard by key space
• RDBMS or NoSQL

User     ShardID
002345   A
002346   B
002347   C
002348   B
002349   A

[Diagram: rows distributed across Shard A, Shard B and Shard C.]
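The user-to-shard table above is a shard directory: the application looks up which shard owns a key before it queries. A minimal sketch using the table's mapping, with hypothetical host names for each shard:

```python
from typing import Dict, Iterable, List

# Shard directory copied from the table above: user ID -> shard.
SHARD_DIRECTORY: Dict[str, str] = {
    "002345": "A",
    "002346": "B",
    "002347": "C",
    "002348": "B",
    "002349": "A",
}

# Hypothetical database hosts for each shard.
SHARD_HOSTS: Dict[str, str] = {
    "A": "shard-a.db.example.internal",
    "B": "shard-b.db.example.internal",
    "C": "shard-c.db.example.internal",
}


def host_for_user(user_id: str) -> str:
    """Route a query for one user's rows to the shard that holds them."""
    return SHARD_HOSTS[SHARD_DIRECTORY[user_id]]


def users_by_shard(user_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group IDs so a multi-user read becomes one query per shard."""
    groups: Dict[str, List[str]] = {}
    for user_id in user_ids:
        groups.setdefault(SHARD_DIRECTORY[user_id], []).append(user_id)
    return groups


print(host_for_user("002347"))                          # shard-c.db.example.internal
print(users_by_shard(["002345", "002346", "002349"]))   # {'A': ['002345', '002349'], 'B': ['002346']}
```

This lookup and grouping is the "more complex at the application layer" cost: every query path needs it, and moving a user between shards means updating the directory and the data together.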
NoSQL data stores
• Trade query & integrity features of Relational DBs for:
  – More flexible data model
  – Horizontal scalability & predictable performance
DynamoDB: provisioned read/write performance per table
Massive and Seamless Scale
• Distributed system that can scale both reads and writes
  – Sharding + Replicas
• Automatic & transparent partitioning:
  – Data set size growth
  – Provisioned capacity increases
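Automatic partitioning rests on hashing each item's partition key so items spread evenly across partitions. A minimal sketch of that hashing step only, assuming a fixed partition count and MD5 chosen purely for illustration (it is not DynamoDB's internal hash, and DynamoDB manages partitions for you):

```python
import hashlib
from collections import Counter


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a partition key to one of `partition_count` partitions by hashing it.

    MD5 is used only because it is deterministic and well spread; it is not
    DynamoDB's internal hash function.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# The same key always lands on the same partition...
assert partition_for("user#002345", 4) == partition_for("user#002345", 4)

# ...and many keys spread roughly evenly across partitions
spread = Counter(partition_for(f"user#{i:06d}", 4) for i in range(10_000))
print(sorted(spread))  # [0, 1, 2, 3]
```

With a plain modulo, changing the partition count remaps most keys; real systems typically assign ranges of the hash space to partitions so that a split only moves the keys in one range.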
Summary
[Diagram, captioned “No limit”: www.example.com on Amazon Route 53 (DNS service), Elastic Load Balancing, an RDS DB instance and standby with four RDS read replicas, ElastiCache nodes 1 to 4 across Availability Zones a and b, an S3 bucket for static assets, DynamoDB, and CloudSearch, Lambda, SES and SQS.]
A quick review
• Keep it simple and stateless
• Make use of managed self-scaling services
• Multi-AZ and Auto Scale your EC2 infrastructure
• Use the right DB for each workload
• Cache data at multiple levels
• Simplify operations with deployment tools
Next steps?
READ!
• aws.amazon.com/documentation
• aws.amazon.com/architecture
• aws.amazon.com/start-ups
ASK FOR HELP!
• forums.aws.amazon.com
• aws.amazon.com/support
Performance testing @ JUST EAT
(Or: DoS yourself every night in production to prove you can take it)
@justeat_tech + @petemounce http://tech.just-eat.com
Please wait while I start my DoS attack...
(Demo: start fake load, show dashboards)
The problem with performance tests & continuous delivery
● Don’t want to sacrifice continuous delivery & decoupled teams
● Don’t want performance to suffer
All the usual problems:
● Bottleneck through single environment
● Individual tests take too long
Why?
Continuously test:
● performance
● capacity
If we find a problem Thursday night:
1. don’t run fake load over the weekend
2. enjoy weekend as normal
3. fix it next week at leisure
Gamble!
OH: “We deploy tens of small changes a day. I bet we won’t break production...”
OH: “Let’s just do it in production with fake traffic at the same time as customers!”
Not that much of a gamble, really
We have tight feedback loops at this point.
Engineers being on call... highly invested in not regressing performance.
How?
● Pick scenarios we care about
● Pick data variations to exercise
● Add header(s) to discriminate fake load vs customer load
And then:
● Run it every night during peak time
● If no alerts fire, we’re good
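The talk does not say which header JUST EAT uses or what each service does with it. A minimal sketch, assuming a hypothetical `X-Fake-Load: true` header and one plausible use: counting fake and customer requests separately so dashboards and alerts can tell the two apart:

```python
from collections import Counter
from typing import Mapping

# Hypothetical header name and value: the talk only says "header(s)".
FAKE_LOAD_HEADER = "X-Fake-Load"
request_counts: Counter = Counter()


def is_fake_load(headers: Mapping[str, str]) -> bool:
    """True if the load-test tool marked this request as fake traffic."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare lower-cased
        if name.lower() == FAKE_LOAD_HEADER.lower():
            return value.strip().lower() == "true"
    return False


def record_request(headers: Mapping[str, str]) -> str:
    """Classify a request so dashboards can show fake and customer load apart."""
    kind = "fake" if is_fake_load(headers) else "customer"
    request_counts[kind] += 1
    return kind


print(is_fake_load({"x-fake-load": "true"}))  # True
print(is_fake_load({"Accept": "text/html"}))  # False
```

Because fake requests are marked rather than routed to a separate environment, they exercise the real production stack while staying distinguishable from customer traffic.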
What did we gain?
● Continuous confidence in capacity
● Continuous confidence in dealing with spikes
● Performance as a 1st-class concern
● Tests become independent of environments’ data
(Remind me to stop my DoS attack now)
(Demo: stop fake load, show dashboards)
Thank You
Yes, we’re recruiting too. http://tech.just-eat.com/jobs