Architecting to be Cloud Native
Guest lecture at Dino Konstantopoulos’ BU MET CS755 Cloud Computing class, 17-April-2014 (7:00 – 9:00 PM EDT)
HELLO, my name is Bill Wilder
Aligning your application’s architecture with the architecture of the cloud… FTW! But the cloud is a friendly place for non-native apps too!
1. You know what “the cloud” is
2. You have an inkling about Amazon Web Services and Windows Azure cloud platforms
3. You understand that such cloud platforms include compute services [like hosted virtual machines (VMs), in both IaaS and PaaS modes], SQL and NoSQL database services, file storage services, messaging, DNS, management, etc.
4. You are interested in understanding cloud-native applications and why that’s better than deploying my old-school app to the cloud “as is”
Roadmap for rest of talk…
1. Lightning-fast overview of Windows Azure
2. Cover three specific patterns for building cloud-native applications
3. Mention some other patterns along the way
• Q&A during talk is okay (time permitting)
• Q&A at end with any remaining time
• Okay to reach out through email or Twitter
So Architecting for the (Windows Azure, AWS, GAE, …) Cloud is Different…
WHY DID THEY (Microsoft, Amazon, Google, …) DO THIS TO US?
But Why?
Know the rules
“If I had asked people what they wanted, they would have said faster horses.”
- Henry Ford
Faster horses would not have addressed the horse manure problem…
Late 1800s: 150k horses in NYC x 20 lbs manure/day/horse = 3 million lbs of manure per day
Know the rules
“If I had asked IT departments what they wanted, they would have said IaaS.”
- Henry Cloud
Cloud Platform Characteristics
• Scaling – or “resource allocation” – is horizontal
  – and ∞ (“illusion of infinite resources”)
• Resources are easily added or released
  – self-service portal or API; cloud scaling is automatable
• Pay only for currently allocated resources
  – costs are operational, granular, controllable, and transparent
• Optimized for cost-efficiency
  – cloud services are multitenant (MT), hardware is commodity
  – MTTR over MTTF
• Rich, robust functionality is simply accessible
  – like an iceberg
Cloud-Native Application Characteristics
• Application architecture is aligned with the cloud platform architecture
  – uses the platform in the most natural way
  – lets the platform do the heavy lifting
www.pageofphotos.com
• Simple idea, simple app
• Two-tiers: web tier (one server) + database
• What’s the problem?
• But… what’s WRONG with this architecture?
• Different ≠ WRONG. Use the right tool for the job. Some apps are simply not a good fit for the cloud.
www.pageofphotos.com
• Simple idea, simple app
• Two-tiers: web tier (one server) + database
• What can go wrong?
• We’ll reexamine
  1. Scaling the web tier
  2. Scaling the service tier
  3. Scaling the data tier
  4. Handling failure
  5. Operational efficiency (scale the app, not the team!)
Horizontal Scaling Compute Pattern
pattern 1 of 3
What’s the difference between performance and scale??
Common Terminology:
• Scaling Up/Down: Vertical Scaling
• Scaling Out/In: Horizontal “Scaling”
• But it really is Horizontal Resource Allocation
• Architectural Decision
  – Big decision… hard to change
Scale Up (and Scale Down??) vs. Horizontal Resourcing
Vertical Scaling (“Scaling Up”)
Resources that can be “Scaled Up”
• Memory: speed, amount
• CPU: speed, number of CPUs
• Disk: speed, size, multiple controllers
• Bandwidth: higher capacity pipe
• … and it sure is EASY
Downsides of Scaling Up
• Hard Upper Limit
• HIGH END HARDWARE, HIGH END CO$T
• Lower value than “commodity hardware”
• May have no other choice (architectural)
This is how the CLOUD works *and* this is how YOUR CLOUD-NATIVE APP WORKS
Load Balancer (Cloud Service)
Managed VMs (Cloud Service)
Example: Web Tier www.pageofphotos.com
Horizontal Scaling Considerations
1. Auto-Scale
  • Bidirectional
2. Nodes can fail
  • Auto-Scale is only one cause
  • Handle shutdown signals
  • Stateless (“like a taxi”) vs. Sticky Sessions
  • Stateless nodes vs. Stateless apps
  • N+1 rule vs. occasional downtime (UX)
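To make the "stateless nodes vs. sticky sessions" point concrete, here is a minimal Python sketch (not from the talk). It assumes a hypothetical shared `SessionStore` standing in for an external storage service and a toy round-robin `LoadBalancer`. All class and method names are illustrative, not a real cloud API. Because no node holds user state, any node can serve any request, and a node can disappear (failure or scale-in) without the user noticing.

```python
class SessionStore:
    """Hypothetical stand-in for an external, shared state store."""
    def __init__(self):
        self._data = {}

    def get(self, user):
        return self._data.get(user, 0)

    def put(self, user, value):
        self._data[user] = value


class WebNode:
    """A stateless web node: all user state lives in the shared store."""
    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, user):
        count = self.store.get(user) + 1
        self.store.put(user, count)
        return f"{self.name}: {user} has made {count} request(s)"


class LoadBalancer:
    """Toy round-robin balancer; no sticky sessions."""
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0

    def remove(self, name):
        # Models a node failing or being released by auto-scale.
        self.nodes = [n for n in self.nodes if n.name != name]

    def route(self, user):
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node.handle(user)


store = SessionStore()
lb = LoadBalancer(WebNode(f"node-{i}", store) for i in range(3))
first = lb.route("alice")    # served by node-0
second = lb.route("alice")   # served by node-1, state still intact
lb.remove("node-2")          # a node goes away
third = lb.route("alice")    # another node picks up where the last left off
```

With sticky sessions, the state would live inside one `WebNode`, and removing that node would lose it.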
How many users does your cloud-native application need before it needs to be able to horizontally scale??
Queue-Centric Workflow Pattern
(QCW for short)
pattern 2 of 3
Extend www.pageofphotos.com example into Service Tier
• QCW enables applications where the UI and back-end services are Loosely Coupled
• (Compare to CQRS at end if there is interest)
QCW Example: User Uploads Photo www.pageofphotos.com
Web Server
Compute Service
Reliable Queue
Reliable Storage
QCW
WE NEED:
• Compute (VM) resources to run our code
• Reliable Queue to communicate
• Durable/Persistent Storage
Where does Windows Azure fit?
QCW [on Windows Azure]
WE NEED:
• Compute (VM) resources to run our code
  – Web Roles (IIS) and Worker Roles (w/o IIS)
• Reliable Queue to communicate
  – Azure Storage Queues
• Durable/Persistent Storage
  – Azure Storage Blobs & Tables; WASD
QCW on Azure: User Uploads a Photo
[Diagram labels: www.pageofphotos.com; Web Role (IIS); Azure Queue (push, pull); Worker Role; Azure Blob]
UX implications: user does not wait for thumbnail (architecture!)
QCW enables Responsive UX
• Response to interactive users is as fast as a work request can be persisted
• Time consuming work done asynchronously
• Comparable total resource consumption, arguably better subjective UX
• UX challenge – how to express Async to users?
  – Communicate Progress
  – Display Final results
  – Long Polling/Web Sockets (e.g., SignalR or Node.io)
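The upload flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the talk's implementation. A dict stands in for reliable blob storage, the standard-library `queue.Queue` stands in for a reliable cloud queue, and `make_thumbnail` is a hypothetical placeholder for real image resizing. The point is that `handle_upload` returns as soon as the work request is persisted, while the slow work happens later in `worker_step`.

```python
import queue
import uuid

blob_storage = {}            # stand-in for reliable blob storage (assumption)
work_queue = queue.Queue()   # stand-in for a reliable queue (assumption)


def handle_upload(photo_bytes):
    """Web tier: persist the photo, enqueue a work request, return at once."""
    blob_name = f"up/{uuid.uuid4()}.png"
    blob_storage[blob_name] = photo_bytes
    work_queue.put(blob_name)        # push: the message is only a pointer to the blob
    return blob_name                 # the user does not wait for the thumbnail


def make_thumbnail(photo_bytes):
    """Hypothetical placeholder for real image resizing."""
    return photo_bytes[:4]


def worker_step():
    """Service tier: pull one work request and do the time-consuming part."""
    blob_name = work_queue.get()     # pull
    thumb_name = blob_name.replace("up/", "thumb/", 1)
    blob_storage[thumb_name] = make_thumbnail(blob_storage[blob_name])
    work_queue.task_done()
    return thumb_name


name = handle_upload(b"fake-png-bytes")
pending_before = work_queue.qsize()  # work is queued, not yet done
thumb = worker_step()
```

In a real deployment the two functions would run on separate web and worker nodes, which is what lets each tier scale on its own.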
QCW enables Scalable App
• Decoupled front/back provides insulation
  – Blocking is Bane of Scalability
  – Order processing partner doing maintenance
  – Twitter down
  – Email server unreachable
  – Internet connectivity interruption
• Loosely coupled, concern-independent scaling
  – (see next slide)
  – Get Scale Units right
  – Key to optimizing operational CO$T$
General Case: Many Roles, Many Queues
[Diagram labels: Web Role (Public, IIS); Web Role (Admin); Worker Role Type 1 instances; Worker Role Type 2 instances; Queue Type 1; Queue Type 2; Queue Type 3]
• Scaling best when Investment ∝ Benefit
• Optimize for CO$T EFFICIENCY
• Logical vs. Physical Architecture depends on current scale
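Reliable Queue & 2-step Delete
[Diagram labels: Web Role (IIS); Queue; Worker Role]
// Web Role: enqueue a pointer to the uploaded photo
var url = "http://pageofphotos.blob.core.windows.net/up/<guid>.png";
queue.AddMessage( new CloudQueueMessage( url ) );
// Worker Role, step 1: get the message (it becomes invisible, but is not removed)
var msg = queue.GetMessage();
(… do some processing then …)
// Worker Role, step 2: delete the message only after processing succeeds
queue.DeleteMessage( msg );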
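The two-step delete can be modeled with a small in-memory queue in Python. This is a sketch only, and the `ReliableQueue` class, its method names, and the 30-second visibility timeout are assumptions for illustration, not the Azure SDK. Getting a message hides it for a while instead of removing it. If the worker dies before deleting it, the message reappears and another worker can retry.

```python
import itertools


class ReliableQueue:
    """Toy queue with a visibility timeout (illustrative, not a real SDK)."""

    def __init__(self):
        self._messages = {}              # id -> [body, visible_at, dequeue_count]
        self._ids = itertools.count(1)

    def add_message(self, body):
        self._messages[next(self._ids)] = [body, 0.0, 0]

    def get_message(self, now, visibility_timeout=30.0):
        """Step 1: hand out a visible message and hide it, without removing it."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + visibility_timeout
                entry[2] += 1
                return msg_id, entry[0]
        return None

    def delete_message(self, msg_id):
        """Step 2: remove the message once processing has succeeded."""
        self._messages.pop(msg_id, None)


q = ReliableQueue()
q.add_message("up/photo-1.png")

first = q.get_message(now=0.0)          # worker A takes the message, then crashes
hidden = q.get_message(now=10.0)        # still invisible to other workers
retry = q.get_message(now=31.0)         # timeout elapsed: visible again
q.delete_message(retry[0])              # worker B finishes and deletes it
after_delete = q.get_message(now=100.0)
```

Time is passed in explicitly as `now` so the behavior is easy to follow. A real queue service tracks it for you.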
QCW requires Idempotency
• Perform an idempotent operation more than once, and the end result is the same as if we did it once
• Example with Thumbnailing (easy case)
• App-specific concerns dictate approaches
  – Compensating action, Last write wins, etc.
• PARTNERSHIP: division of responsibility between cloud platform & app
  – Far cry from database transaction
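Thumbnailing is the easy case because the output name can be derived from the input name, so a repeat simply overwrites the same result. The Python sketch below illustrates that idea under assumptions: a dict stands in for blob storage, and slicing bytes is a placeholder for real resizing.

```python
blobs = {"up/cat.png": b"cat-pixels-full-size"}   # stand-in for blob storage


def thumbnail_name(source_name):
    # Deterministic target name: the same input always maps to the same output.
    return source_name.replace("up/", "thumb/", 1)


def process(source_name):
    """Idempotent: running it again rewrites the same blob with the same bytes."""
    target = thumbnail_name(source_name)
    blobs[target] = blobs[source_name][:3]        # placeholder for real resizing
    return target


process("up/cat.png")
once = dict(blobs)
process("up/cat.png")     # the queue delivered the same message a second time
twice = dict(blobs)
```

An operation such as "increment a counter" or "send an email" would not pass this check, which is where app-specific approaches like compensating actions come in.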
QCW expects Poison Messages
• A Poison Message cannot be processed
  – Error condition for non-transient reason
  – Use dequeue count property
• Be proactive
  – Falling off the queue may kill your system
• Determine a Max Retry policy per queue
  – Delete, put on “bad” queue, alert human, …
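A max-retry policy driven by the dequeue count might look like the following Python sketch. The limit of 3, the message shape, and the `process` function are all assumptions chosen for illustration. Here a message that keeps failing is moved to a "bad" queue instead of being retried forever.

```python
from collections import deque

MAX_DEQUEUE_COUNT = 3   # assumed per-queue policy


def process(body):
    """Hypothetical work function; one input fails for a non-transient reason."""
    if body.startswith("corrupt"):
        raise ValueError("cannot decode photo")
    return f"thumbnail for {body}"


def drain(main_queue, bad_queue):
    done = []
    while main_queue:
        msg = main_queue.popleft()
        msg["dequeue_count"] += 1
        if msg["dequeue_count"] > MAX_DEQUEUE_COUNT:
            bad_queue.append(msg)        # poison: set it aside for a human to inspect
            continue
        try:
            done.append(process(msg["body"]))
        except ValueError:
            main_queue.append(msg)       # not deleted, so it comes back for a retry
    return done


main_queue = deque(
    {"body": b, "dequeue_count": 0} for b in ["a.png", "corrupt.png", "b.png"]
)
bad_queue = deque()
results = drain(main_queue, bad_queue)
```

The healthy messages are processed once each, and the corrupt one ends up on `bad_queue` after its retries are used up.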
QCW requires “Plan for Failure”
• VM restarts will happen
  – Hardware failure, O/S patching, crash (bug)
• Bake in handling of restarts into our apps
  – Restarts are routine: system “just keeps working”
  – Idempotent support is important
  – Event Sourcing (commonly seen with CQRS) may help
• Not an exception case! Expect it!
• Consider N+1 Rule
What’s Up? Reliability as EMERGENT PROPERTY
Columns: Typical Site | Any 1 Role Inst | Overall System
Rows:
• Operating System Upgrade
• Application Code Update
• Scale Up, Down, or In
• Hardware Failure
• Software Failure (Bug)
• Security Patch
What about the DATA?
• You: Azure Web Roles and Azure Worker Roles
  – Taking user input, dispatching work, doing work
  – Follow a decoupled queue-in-the-middle pattern
  – Stateless compute nodes
• Cloud: “Hard Part”: persistent, scalable data
  – Azure Queue & Blob Services
  – Three copies of each byte
  – Blobs are geo-replicated
  – Busy Signal Pattern
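The Busy Signal Pattern is about treating a transient "busy" response from a shared cloud service as a cue to retry rather than as a hard failure. Below is a minimal Python sketch of one common approach, retry with exponential backoff and jitter. The `BusySignal` exception, the `with_retries` helper, and its default parameters are hypothetical, not part of any cloud SDK.

```python
import random
import time


class BusySignal(Exception):
    """Hypothetical stand-in for a transient 'server busy' response."""


def with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Retry a transiently failing operation with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except BusySignal:
            if attempt == max_attempts:
                raise                              # give up: no longer looks transient
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


attempts = []


def flaky_read():
    """Simulated storage call that is busy twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise BusySignal()
    return "blob contents"


delays = []                                        # record waits instead of sleeping
result = with_retries(flaky_read, sleep=delays.append)
```

Passing `sleep` in as a parameter keeps the demo instant. In production code the default `time.sleep` would apply the real waits.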
Database Sharding Pattern
pattern 3 of 3
Database Sharding Pattern
Most Cloud Applications don’t care (much) about (very high) scale
But they do care about developer productivity and operational efficiency
[Diagram: foo.com and bar.com deployment across Azure Cloud and On-prem. Labels:]
• Site-to-Site Virtual Network: VNET in cloud, connected to on-prem
• On-prem database, reached over TDS (native SQL Server TCP-based wire protocol)
• On-prem API, reached over SOAP / REST / HTTP
• bar.com as Azure Cloud Service
• foo.com as Azure Web Site running CMS, with a dedicated MySQL Database to run CMS
• Blob Storage, Global CDN, Public Internet
• Dev Team (Point-to-Site VPN from CoLo Router into Azure)
• Off-site/Travel Dev Team (Point-to-Site VPN from laptop to Azure)
• Content Editing & Site Admin
Azure SQL Database (WASD) is SQL Server Except…
Common
SQL Server Specific (for now)
SQL Database Specific
“Just change the connection string…”
• Full Text Search
• Transparent Data Encryption (TDE)
• Many more…
Limitations
• You need to run it
• Max VM size