BHASKAR SUNKARA AND PETER ABRAMS
PERFORMANCE ON AMAZON AWS
INTRODUCTION
• Founded in April 2008 in San Francisco, venture funded
• Founding principles
  • The move to the cloud presents a new set of challenges
  • New world: constant change (infrastructure, architecture, code)
  • Existing management solutions are not designed for constant change
  • AppDynamics value: enable teams to operate business-critical applications in clouds and guarantee service performance
• Working with Netflix since October 2009
  • Oct. 2009: 150 servers in private data center
  • May 2012: 50 servers in data center, 8,000 servers in EC2
• AppDynamics is Netflix's primary SLA management tool
AGENDA
Differences between Cloud and Physical Datacenter
Performance on AWS
Case Study on AWS
AppDynamics
CLOUD / PHYSICAL DATACENTER: THINGS HAVE CHANGED
EVERYTHING IS SHARED
• Shared/Virtualized Infrastructure • Shared services
Shared Services
The biggest public cloud !!
• S3 • SQS • SDB • EBS • EMR • …
INFRASTRUCTURE
• Machines come and go
• High rate of change
• Capacity is much cheaper
• Capacity can be both increased and decreased
  • In minutes
• Cannot use physical dependencies anymore
  • E.g. static IP mapping between services
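One way to avoid static IP mappings is to have callers look services up by name at call time. The sketch below is illustrative only: `ServiceRegistry`, its methods, and the endpoint strings are hypothetical names, and a production setup would use a shared, replicated registry rather than an in-process dictionary.

```python
class ServiceRegistry:
    """Hypothetical in-memory registry: service name -> live endpoints."""

    def __init__(self):
        self._endpoints = {}  # service name -> list of "host:port" strings
        self._cursors = {}    # service name -> round-robin position

    def register(self, service, endpoint):
        # Called by an instance when it boots.
        endpoints = self._endpoints.setdefault(service, [])
        if endpoint not in endpoints:
            endpoints.append(endpoint)

    def deregister(self, service, endpoint):
        # Called when an instance is terminated or found dead.
        endpoints = self._endpoints.get(service, [])
        if endpoint in endpoints:
            endpoints.remove(endpoint)

    def resolve(self, service):
        """Return the next live endpoint for a service, round-robin."""
        endpoints = self._endpoints.get(service)
        if not endpoints:
            raise LookupError(f"no live instances for {service!r}")
        position = self._cursors.get(service, 0) % len(endpoints)
        self._cursors[service] = position + 1
        return endpoints[position]


registry = ServiceRegistry()
registry.register("orders", "10.0.0.5:8080")  # made-up addresses
registry.register("orders", "10.0.0.9:8080")
endpoint = registry.resolve("orders")  # callers never hard-code an IP
```

Because callers resolve on every request, instances can be added or removed in minutes without touching caller configuration.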
PERFORMANCE MONITORING
• Traditional monitoring: measure
  • CPU and other hardware metrics
  • Code metrics: individual methods etc.
  • Scrape logs for errors etc.
  • Configured by hand
• Cloud monitoring: datacenter tools are a big pain!
  • You were measuring CPU metrics for a bunch of machines
  • Now those are gone, and the new ones are up
  • Who is going to refresh your dashboards?
  • Who is going to clean up the dead instance data?
GOOD PERFORMANCE ON AWS?
(Re)architect your app to
• Work on Amazon
• Take advantage of all that it provides
• Be careful with shared services
Pick the right performance monitoring tools!
Let's not forget managing capacity and cost!
APPLICATION ARCHITECTURE
• Distributed • Service Oriented • Horizontally Scalable
AWS PERFORMANCE FACTSHEET - FROM A HEAVY-DUTY USER: NETFLIX
IF YOU ARE USING SHARED SERVICES
• Measure service performance in isolation
• Stress test the hell out of shared service calls
  • At minimum, double your peak load!
• Look for common patterns out there
  • E.g. SimpleDB needs a cache frontend
• Avoid badly performing shared services
  • EBS?
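The "cache frontend" pattern can be sketched as a read-through cache with a time-to-live. This is a minimal illustration, not how any particular team implemented it: `ReadThroughCache`, `fetch`, and the TTL value are assumptions, and `fetch` stands in for whatever slow shared-service call is being protected.

```python
import time


class ReadThroughCache:
    """Hypothetical TTL cache placed in front of a slow shared service."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch    # callable that performs the real lookup
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so tests can control time
        self._entries = {}     # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: go to the shared service and remember the result.
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

The `hits` and `misses` counters make it possible to measure how much load the cache actually removes from the shared service.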
PERFORMANCE MONITORING
ESTABLISH CRITERIA TO PICK THE RIGHT TOOLS
1. HAS TO BE SERVICE ORIENTED
• Primarily monitor Services not Infrastructure ! • Focus on the application SLAs • Focus on the end user experience
Process Service Order
§ Response times § Load § Error rates § Trends
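As a rough sketch of service-level monitoring, the code below tracks load, average response time, and error rate per service name rather than per machine. `ServiceMonitor`, `ServiceStats`, and the thresholds are hypothetical; trend analysis is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class ServiceStats:
    calls: int = 0        # load
    errors: int = 0
    total_ms: float = 0.0

    @property
    def avg_ms(self) -> float:
        return self.total_ms / self.calls if self.calls else 0.0

    @property
    def error_rate(self) -> float:
        return self.errors / self.calls if self.calls else 0.0


class ServiceMonitor:
    """Hypothetical collector keyed by service name, not by host."""

    def __init__(self):
        self._stats = {}

    def record(self, service, duration_ms, ok=True):
        stats = self._stats.setdefault(service, ServiceStats())
        stats.calls += 1
        stats.total_ms += duration_ms
        if not ok:
            stats.errors += 1

    def stats(self, service):
        return self._stats[service]

    def breaches(self, max_avg_ms, max_error_rate):
        """Names of services whose SLA thresholds are exceeded."""
        return sorted(
            name for name, s in self._stats.items()
            if s.avg_ms > max_avg_ms or s.error_rate > max_error_rate
        )
```

A call such as `monitor.record("Process Order", 120.0)` would be made once per request, regardless of which instance served it.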
2. HAS TO BE DISTRIBUTED
• Tools need to measure health of tiers • Measuring individual servers does not make sense • Services are horizontally scalable
[Diagram: a tier growing from three nodes (ec2-1, ec2-2, ec2-3) to five nodes (ec2-1 through ec2-5)]
You need to know how the cluster/tier performs in terms of average utilization
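A minimal sketch of tier-level aggregation, assuming each sample is a `(tier, node, cpu_percent)` tuple. The function name, the sample layout, and the numbers in the usage lines are illustrative assumptions.

```python
from collections import defaultdict


def tier_utilization(samples):
    """Map each tier to (node_count, average CPU percent).

    samples: iterable of (tier, node, cpu_percent) tuples.
    """
    by_tier = defaultdict(dict)
    for tier, node, cpu_percent in samples:
        by_tier[tier][node] = cpu_percent  # latest sample per node wins
    return {
        tier: (len(nodes), sum(nodes.values()) / len(nodes))
        for tier, nodes in by_tier.items()
    }


# Made-up numbers: the same tier before and after scaling out.
before = [("web", "ec2-1", 90.0), ("web", "ec2-2", 60.0), ("web", "ec2-3", 60.0)]
after = before + [("web", "ec2-4", 20.0), ("web", "ec2-5", 20.0)]
print(tier_utilization(before))  # {'web': (3, 70.0)}
print(tier_utilization(after))   # {'web': (5, 50.0)}
```

The per-node figures vary widely, but the tier average is the number that says whether the service as a whole has headroom.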
3. HAS TO KEEP UP WITH RATE OF CHANGE
• Keep up with machines going up/down
  • Nodes are transient
• Provide a clean view of the current state
  • Clean up dead instances/services
• Maintain a baseline of how the overall tier performs
[Diagram: nodes ec2-1, ec2-2, ec2-3 replaced by ec2-22, ec2-23, ec2-24]
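The two requirements above, dropping dead nodes while keeping a tier-wide baseline, can be sketched with heartbeats and a time-to-live. `TierView`, the TTL, and the sample numbers are assumptions for illustration; the baseline here is a simple running mean that outlives any individual node.

```python
class TierView:
    """Hypothetical view of one tier: live nodes plus a lasting baseline."""

    def __init__(self, ttl_seconds=120.0):
        self._ttl = ttl_seconds
        self._last_seen = {}       # node id -> timestamp of last report
        self._baseline_sum = 0.0   # accumulated across all nodes, ever
        self._baseline_count = 0

    def report(self, node, response_ms, now):
        # Each report doubles as a heartbeat and a baseline sample.
        self._last_seen[node] = now
        self._baseline_sum += response_ms
        self._baseline_count += 1

    def live_nodes(self, now):
        """Drop nodes that stopped reporting, then list the rest."""
        dead = [n for n, seen in self._last_seen.items()
                if now - seen > self._ttl]
        for node in dead:
            del self._last_seen[node]
        return sorted(self._last_seen)

    @property
    def baseline_ms(self):
        if not self._baseline_count:
            return 0.0
        return self._baseline_sum / self._baseline_count
```

When ec2-1 through ec2-3 are replaced by ec2-22 through ec2-24, the old node entries disappear from the view but their samples still contribute to the tier baseline.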
4. CROSS SERVICE TRACING
• Becomes absolutely necessary for truly distributed apps
• Should be able to drill down across services within the context of a single user request
• Should be able to analyze code in every service
• Should be able to point out the impact of using shared services
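The core idea behind cross-service tracing is to carry one request identifier through every hop and attribute time to each service under that identifier. The sketch below assumes a header named `X-Request-Id` and a hypothetical `SpanCollector`; it does not describe how AppDynamics implements tracing.

```python
import uuid
from collections import defaultdict

TRACE_HEADER = "X-Request-Id"  # assumed header name, a common convention


def outgoing_headers(incoming=None):
    """Reuse the caller's request ID, or mint one at the edge."""
    request_id = (incoming or {}).get(TRACE_HEADER) or uuid.uuid4().hex
    return {TRACE_HEADER: request_id}


class SpanCollector:
    """Hypothetical sink for timings reported by every service."""

    def __init__(self):
        self._spans = defaultdict(list)  # request id -> [(service, ms)]

    def record(self, request_id, service, elapsed_ms):
        self._spans[request_id].append((service, elapsed_ms))

    def breakdown(self, request_id):
        """Total time per service for a single user request."""
        totals = defaultdict(float)
        for service, elapsed_ms in self._spans[request_id]:
            totals[service] += elapsed_ms
        return dict(totals)

    def slowest(self, request_id):
        totals = self.breakdown(request_id)
        return max(totals, key=totals.get)
```

With every service recording against the same ID, `breakdown` shows where a single request spent its time, including calls into shared services.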
CROSS SERVICE TRACING
IMPACT OF SHARED SERVICES
5. AUTODISCOVERY / LOW CONFIGURATION MAINTENANCE
• Cannot have configuration based discovery of new instances/services • Baking into AMIs etc. • Should auto-discover new tiers/services
• Cannot have code level configuration • Difficult to maintain with agile development
MANAGING COST – CASE STUDY
MANAGING COST - EISMANN
• Managing capacity == managing cost
  • The cloud isn't free!
• Eismann
  • Frozen food delivery vendor in Germany
  • In production on AWS
  • Has variable capacity based on usage hours
• Use application-level SLAs to determine capacity
  • E.g. Process Order volume == capacity of services on AWS
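Sizing a tier from application-level volume can be sketched as a small calculation. Everything here is an assumption for illustration: the function name, the per-instance throughput, the headroom, and the minimum instance count are not Eismann's actual figures.

```python
import math


def instances_needed(orders_per_minute, orders_per_instance_per_minute,
                     headroom=0.25, min_instances=2):
    """Instances required to serve the current order volume.

    headroom adds spare capacity on top of measured demand;
    min_instances keeps the service redundant during quiet hours.
    """
    if orders_per_instance_per_minute <= 0:
        raise ValueError("per-instance throughput must be positive")
    required = (orders_per_minute * (1.0 + headroom)
                / orders_per_instance_per_minute)
    return max(min_instances, math.ceil(required))


print(instances_needed(1000, 100))  # 13: busy hours
print(instances_needed(50, 100))    # 2: quiet hours fall back to the minimum
```

Driving this from measured Process Order volume, rather than from CPU alone, ties spend directly to the business load being served.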
WHAT IS APPDYNAMICS?
• Fundamentally built for the cloud
• Handles constant change of infrastructure
• Service-oriented SLA management
• Detailed, actionable information on service performance for engineers, architects and operations
• Zero to low configuration
• No code configuration needed for visibility
Did I mention that Eismann is fully deployed on AppDynamics and uses us to manage capacity and SLAs automatically?