AWS Well-Architected Framework June 2018

May 20, 2020


Copyright © 2018 Amazon Web Services, Inc. or its affiliates

Notices

This document is provided for informational purposes only. It represents AWS’s current product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS’s products or services, each of which is provided “as is” without warranty of any kind, whether express or implied. This document does not create any warranties, representations, contractual commitments, conditions or assurances from AWS, its affiliates, suppliers or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

Introduction
    Definitions
    On Architecture
    General Design Principles
The Five Pillars of the Well-Architected Framework
    Operational Excellence
    Security
    Reliability
    Performance Efficiency
    Cost Optimization
The Review Process
Conclusion
Contributors
Further Reading
Document Revisions
Appendix: Well-Architected Questions, Answers, and Best Practices
    Operational Excellence
    Security
    Reliability
    Performance Efficiency
    Cost Optimization


Introduction

The AWS Well-Architected Framework helps you understand the pros and cons of decisions you make while building systems on AWS. By using the Framework you will learn architectural best practices for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. It provides a way for you to consistently measure your architectures against best practices and identify areas for improvement. The process for reviewing an architecture is a constructive conversation about architectural decisions, and is not an audit mechanism. We believe that having well-architected systems greatly increases the likelihood of business success.

AWS Solutions Architects have years of experience architecting solutions across a wide variety of business verticals and use cases. We have helped design and review thousands of customers’ architectures on AWS. From this experience, we have identified best practices and core strategies for architecting systems in the cloud.

The AWS Well-Architected Framework documents a set of foundational questions that allow you to understand if a specific architecture aligns well with cloud best practices. The framework provides a consistent approach to evaluating systems against the qualities you expect from modern cloud-based systems, and the remediation that would be required to achieve those qualities. As AWS continues to evolve, and we continue to learn more from working with our customers, we will continue to refine the definition of well-architected.

This paper is intended for those in technology roles, such as chief technology officers (CTOs), architects, developers, and operations team members. It describes AWS best practices and strategies to use when designing and operating a cloud workload, and provides links to further implementation details and architectural patterns. For more information, see the AWS Well-Architected homepage.

Definitions

Every day experts at AWS assist customers in architecting systems to take advantage of best practices in the cloud. We work with you on making architectural trade-offs as your designs evolve. As you deploy these systems into live environments, we learn how well these systems perform and the consequences of those trade-offs.

Based on what we have learned we have created the AWS Well-Architected Framework, which provides a consistent set of best practices for customers and partners to evaluate architectures, and provides a set of questions you can use to evaluate how well an architecture is aligned to AWS best practices.

The AWS Well-Architected Framework is based on five pillars: operational excellence, security, reliability, performance efficiency, and cost optimization.


The pillars of the AWS Well-Architected Framework

Pillar Name | Description
Operational Excellence | The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.
Security | The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
Reliability | The ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.
Performance Efficiency | The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.
Cost Optimization | The ability to run systems to deliver business value at the lowest price point.

When architecting solutions you make trade-offs between pillars based upon your business context. These business decisions can drive your engineering priorities. You might optimize to reduce cost at the expense of reliability in development environments, or, for mission-critical solutions, you might optimize reliability with increased costs. In ecommerce solutions, performance can affect revenue and customer propensity to buy. Security and operational excellence are generally not traded off against the other pillars.

On Architecture

In on-premises environments customers often have a central team for technology architecture that acts as an overlay to other product or feature teams to ensure they are following best practice. Technology architecture teams are often composed of a set of roles such as Technical Architect (infrastructure), Solutions Architect (software), Data Architect, Networking Architect, and Security Architect. Often these teams use TOGAF or the Zachman Framework as part of an enterprise architecture capability.

At AWS, we prefer to distribute capabilities into teams rather than having a centralized team with that capability. There are risks when you choose to distribute decision-making authority, for example, the risk that teams do not meet internal standards. We mitigate these risks in two ways. First, we have practices [1] that focus on enabling each team to have that capability, and we put in place experts who ensure that teams raise the bar on the standards they need to meet. Second, we put in place mechanisms [2] that carry out automated checks to ensure standards are being met. This distributed approach is supported by the Amazon leadership principles, and establishes a culture across all roles that works back [3] from the customer. Customer-obsessed teams build products in response to a customer need.

For architecture this means that we expect every team to have the capability to create architectures and to follow best practices. To help new teams gain these capabilities or existing teams to raise their bar, we enable access to a virtual community of principal engineers who can review their designs and help them understand what AWS best practices are. The principal engineering community works to make best practices visible and accessible. One way they do this, for example, is through lunchtime talks that focus on applying best practices to real examples. These talks are recorded and can be used as part of onboarding materials for new team members.

AWS best practices emerge from our experience running thousands of systems at internet scale. We prefer to use data to define best practice, but we also use subject matter experts like principal engineers to set them. As principal engineers see new best practices emerge they work as a community to ensure that teams follow them. In time, these best practices are formalized into our internal review processes, as well as into mechanisms that enforce compliance. Well-Architected is the customer-facing implementation of our internal review process, where we have codified our principal engineering thinking across field roles like Solutions Architecture and internal engineering teams. Well-Architected is a scalable mechanism that lets you take advantage of these learnings.

By following the approach of a principal engineering community with distributed ownership of architecture, we believe that a Well-Architected enterprise architecture can emerge that is driven by customer need. For technology leaders (such as CTOs or development managers), carrying out Well-Architected reviews across all your workloads will allow you to better understand the risks in your technology portfolio. Using this approach you can identify themes across teams that your organization could address by mechanisms, trainings, or lunchtime talks where your principal engineers can share their thinking on specific areas with multiple teams.

[1] Ways of doing things, process, standards, and accepted norms.
[2] “Good intentions never work, you need good mechanisms to make anything happen.” (Jeff Bezos) This means replacing a human's best efforts with mechanisms (often automated) that check for compliance with rules or process.
[3] Working backward is a fundamental part of our innovation process. We start with the customer and what they want, and let that define and guide our efforts.

General Design Principles

The Well-Architected Framework identifies a set of general design principles to facilitate good design in the cloud:


• Stop guessing your capacity needs: Eliminate guessing about your infrastructure capacity needs. When you make a capacity decision before you deploy a system, you might end up sitting on expensive idle resources or dealing with the performance implications of limited capacity. With cloud computing, these problems can go away. You can use as much or as little capacity as you need, and scale up and down automatically.

• Test systems at production scale: In the cloud, you can create a production-scale test environment on demand, complete your testing, and then decommission the resources. Because you only pay for the test environment when it's running, you can simulate your live environment for a fraction of the cost of testing on premises.

• Automate to make architectural experimentation easier: Automation allows you to create and replicate your systems at low cost and avoid the expense of manual effort. You can track changes to your automation, audit the impact, and revert to previous parameters when necessary.

• Allow for evolutionary architectures: In a traditional environment, architectural decisions are often implemented as static, one-time events, with a few major versions of a system during its lifetime. As a business and its context continue to change, these initial decisions might hinder the system’s ability to deliver changing business requirements. In the cloud, the capability to automate and test on demand lowers the risk of impact from design changes. This allows systems to evolve over time so that businesses can take advantage of innovations as a standard practice.

• Drive architectures using data: In the cloud you can collect data on how your architectural choices affect the behavior of your workload. This lets you make fact-based decisions on how to improve your workload. Your cloud infrastructure is code, so you can use that data to inform your architecture choices and improvements over time.

• Improve through game days: Test how your architecture and processes perform by regularly scheduling game days to simulate events in production. This will help you understand where improvements can be made and can help develop organizational experience in dealing with events.
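The first principle, scaling capacity up and down automatically instead of guessing, can be illustrated with a small sketch. It assumes a simple proportional rule (grow or shrink the instance count by the ratio of observed load to target load); this is an illustrative rule, not the algorithm of any particular AWS service, and the function name, bounds, and metric are hypothetical.

```python
import math


def desired_capacity(current: int, observed: float, target: float,
                     minimum: int = 1, maximum: int = 20) -> int:
    """Return an instance count proportional to observed/target load.

    Hypothetical helper: `observed` and `target` are values of the same
    metric (for example, average CPU percent per instance).
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    proposed = math.ceil(current * observed / target)
    # Clamp to the configured bounds so a metric spike cannot scale without limit.
    return max(minimum, min(maximum, proposed))
```

With four instances running at 90% CPU against a 60% target, the rule asks for six instances; at 15% CPU it asks for one.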


The Five Pillars of the Well-Architected Framework

Creating a software system is a lot like constructing a building. If the foundation is not solid, structural problems can undermine the integrity and function of the building. When architecting technology solutions, if you neglect the five pillars of operational excellence, security, reliability, performance efficiency, and cost optimization, it can become challenging to build a system that delivers on your expectations and requirements. Incorporating these pillars into your architecture will help you produce stable and efficient systems. This will allow you to focus on the other aspects of design, such as functional requirements.

Operational Excellence

The Operational Excellence pillar includes the ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.

The operational excellence pillar provides an overview of design principles, best practices, and questions. You can find prescriptive guidance on implementation in the Operational Excellence Pillar whitepaper.

Design Principles

There are six design principles for operational excellence in the cloud:

• Perform operations as code: In the cloud, you can apply the same engineering discipline that you use for application code to your entire environment. You can define your entire workload (applications, infrastructure) as code and update it with code. You can script your operations procedures and automate their execution by triggering them in response to events. By performing operations as code, you limit human error and enable consistent responses to events.

• Annotate documentation: In an on-premises environment, documentation is created by hand, used by people, and hard to keep in sync with the pace of change. In the cloud, you can automate the creation of annotated documentation after every build (or automatically annotate hand-crafted documentation). Annotated documentation can be used by people and systems. Use annotations as an input to your operations code.

• Make frequent, small, reversible changes: Design workloads to allow components to be updated regularly. Make changes in small increments that can be reversed if they fail (without affecting customers when possible).

• Refine operations procedures frequently: As you use operations procedures, look for opportunities to improve them. As you evolve your workload, evolve your procedures appropriately. Set up regular game days to review and validate that all procedures are effective and that teams are familiar with them.

• Anticipate failure: Perform “pre-mortem” exercises to identify potential sources of failure so that they can be removed or mitigated. Test your failure scenarios and validate your understanding of their impact. Test your response procedures to ensure that they are effective, and that teams are familiar with their execution. Set up regular game days to test workloads and team responses to simulated events.

• Learn from all operational failures: Drive improvement through lessons learned from all operational events and failures. Share what is learned across teams and through the entire organization.
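As a rough illustration of the first principle above (scripting operations procedures and triggering them in response to events), the following sketch registers runbook functions against event types and dispatches incoming events to them. The event types, field names, and runbook bodies are all hypothetical; a real procedure would call your own tooling rather than return a string.

```python
from typing import Callable, Dict

Runbook = Callable[[dict], str]

# Registry mapping an event type to the scripted procedure that handles it.
RUNBOOKS: Dict[str, Runbook] = {}


def runbook(event_type: str) -> Callable[[Runbook], Runbook]:
    """Decorator that registers a function as the runbook for an event type."""
    def register(func: Runbook) -> Runbook:
        RUNBOOKS[event_type] = func
        return func
    return register


@runbook("disk_full")
def rotate_logs(event: dict) -> str:
    # Placeholder action; a real runbook would perform the rotation.
    return f"rotated logs on {event['host']}"


@runbook("instance_unhealthy")
def replace_instance(event: dict) -> str:
    # Placeholder action; a real runbook would replace the instance.
    return f"replaced instance {event['instance_id']}"


def handle(event: dict) -> str:
    """Run the registered runbook, or report that a human must take over."""
    procedure = RUNBOOKS.get(event["type"])
    if procedure is None:
        return f"no runbook for {event['type']}; escalate to on-call"
    return procedure(event)
```

Because every event of a given type goes through the same function, the response is consistent, and an event with no registered procedure is surfaced explicitly instead of being silently dropped.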

Definition

There are three best practice areas for operational excellence in the cloud:

1. Prepare

2. Operate

3. Evolve

Operations teams need to understand their business and customer needs so they can effectively and efficiently support business outcomes. Operations creates and uses procedures to respond to operational events and validates their effectiveness to support business needs. Operations collects metrics that are used to measure the achievement of desired business outcomes. Everything continues to change: your business context, business priorities, customer needs, and so on. It's important to design operations to support evolution over time in response to change and to incorporate lessons learned through their performance.

Best Practices

Prepare

Effective preparation is required to drive operational excellence. Business success is enabled by shared goals and understanding across the business, development, and operations. Common standards simplify workload design and management, enabling operational success. Design workloads with mechanisms to monitor and gain insight into application, platform, and infrastructure components, as well as customer experience and behavior.

Create mechanisms to validate that workloads, or changes, are ready to be moved into production and supported by operations. Operational readiness is validated through checklists to ensure a workload meets defined standards and that required procedures are adequately captured in runbooks and playbooks. Validate that there are sufficient trained personnel to effectively support the workload. Prior to transition, test responses to operational events and failures. Practice responses in supported environments through failure injection and game day events.

AWS enables operations as code in the cloud and the ability to safely experiment, develop operations procedures, and practice failure. Using AWS CloudFormation enables you to have consistent, templated sandbox, development, test, and production environments with increasing levels of operations control. AWS enables visibility into your workloads at all layers through various log collection and monitoring features. Data on use of resources, application programming interfaces (APIs), and network flow logs can be collected using Amazon CloudWatch, AWS CloudTrail, and VPC Flow Logs. You can use the collectd plugin, or the CloudWatch Logs agent, to aggregate information about the operating system into CloudWatch.
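To make the idea of consistent, templated environments concrete, here is a minimal sketch that generates one CloudFormation template per environment from a single Python function. The application name, log group path, and retention values are assumptions chosen for illustration; only the template structure (`AWSTemplateFormatVersion`, `Resources`, and the `AWS::Logs::LogGroup` resource type with its `LogGroupName` and `RetentionInDays` properties) comes from CloudFormation itself.

```python
import json

# Hypothetical per-environment settings: controls tighten toward production.
ENVIRONMENTS = {
    "sandbox": {"retention_days": 7},
    "test": {"retention_days": 30},
    "production": {"retention_days": 365},
}


def build_template(environment: str) -> dict:
    """Build a CloudFormation template (as a dict) for one environment."""
    settings = ENVIRONMENTS[environment]
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Log group for the {environment} environment",
        "Resources": {
            "AppLogGroup": {
                "Type": "AWS::Logs::LogGroup",
                "Properties": {
                    # "/example-app/..." is a placeholder naming scheme.
                    "LogGroupName": f"/example-app/{environment}",
                    "RetentionInDays": settings["retention_days"],
                },
            }
        },
    }


def render(environment: str) -> str:
    """Serialize the template as JSON, ready to store in version control."""
    return json.dumps(build_template(environment), indent=2, sort_keys=True)
```

Every environment gets the same resources from the same code, and the only differences are the settings listed in one place, which is what keeps sandbox, test, and production comparable.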

The following questions focus on these considerations for operational excellence. (For a list of operational excellence questions, answers, and best practices, see the Appendix.)

OPS 1:  What factors drive your operational priorities?
OPS 2:  How do you design your workload to enable operability?
OPS 3:  How do you know that you are ready to support a workload?

Implement the minimum number of architecture standards for your workloads. Balance the cost to implement a standard against the benefit to the workload and the burden upon operations. Reduce the number of supported standards to reduce the chance that lower-than-acceptable standards will be applied by error. Operations personnel are often constrained resources.

Invest in scripting operations activities to maximize the productivity of operations personnel, minimize error rates, and enable automated responses. Adopt deployment practices that take advantage of the elasticity of the cloud to facilitate pre-deployment of systems for faster implementations.

Operate

Successful operation of a workload is measured by the achievement of business and customer outcomes. Define expected outcomes, determine how success will be measured, and identify the workload and operations metrics that will be used in those calculations to determine if operations are successful. Consider that operational health includes both the health of the workload and the health and success of the operations acting upon the workload (for example, deployment and incident response). Establish baselines from which improvement or degradation of operations will be identified, collect and analyze your metrics, and then validate your understanding of operations success and how it changes over time. Use collected metrics to determine if you are satisfying customer and business needs, and identify areas for improvement.
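One simple way to establish a baseline and detect degradation is sketched below. It assumes the baseline is the mean and sample standard deviation of recent measurements, and flags a new value that lies more than a chosen number of standard deviations away; the threshold of 3.0 and the function names are illustrative choices, not a recommendation from the framework.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def baseline(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation); needs at least two samples."""
    return mean(samples), stdev(samples)


def deviates(value: float, samples: Sequence[float], threshold: float = 3.0) -> bool:
    """True if `value` is more than `threshold` deviations from the baseline."""
    center, spread = baseline(samples)
    if spread == 0:
        # All baseline samples are identical, so any difference is a change.
        return value != center
    return abs(value - center) / spread > threshold
```

For example, against latency samples of 100, 102, 98, 101, and 99 ms, a reading of 101 ms is within the baseline while a reading of 120 ms is flagged.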

Efficient and effective management of operational events is required to achieve operational excellence. This applies to both planned and unplanned operational events. Use established runbooks for well-understood events, and use playbooks to aid in the resolution of other events. Prioritize responses to events based on their business and customer impact. Ensure that if an alert is raised in response to an event, there is an associated process to be executed, with a specifically identified owner. Define in advance the personnel required to resolve an event and include escalation triggers to engage additional personnel, as it becomes necessary, based on impact (that is, duration, scale, and scope). Identify and engage individuals with the authority to decide on courses of action where there will be a business impact from an event response not previously addressed.
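The two rules in this paragraph (every alert has a process and an owner, and escalation is triggered by impact) can be expressed as a short sketch. The `Alert` fields and the default thresholds of 30 minutes and 1,000 customers are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    owner: str               # specifically identified owner
    runbook: str             # associated process to execute
    minutes_open: int        # duration of the event so far
    customers_affected: int  # scale of the impact


def check_actionable(alert: Alert) -> None:
    """Reject alerts that have no owner or no associated process."""
    if not alert.owner or not alert.runbook:
        raise ValueError(f"alert {alert.name!r} needs an owner and a runbook")


def should_escalate(alert: Alert, max_minutes: int = 30,
                    max_customers: int = 1000) -> bool:
    """Escalate when duration or scale exceeds the agreed triggers."""
    check_actionable(alert)
    return (alert.minutes_open > max_minutes
            or alert.customers_affected > max_customers)
```

Defining the triggers in code before an event happens means the decision to engage more people does not depend on judgment made under pressure.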

Communicate the operational status of workloads through dashboards and notifications that are tailored to the target audience (for example, customer, business, developers, operations) so that they may take appropriate action, so that their expectations are managed, and so that they are informed when normal operations resume.

Determine the root cause of unplanned events and unexpected impacts from planned events. This information will be used to update your procedures to mitigate future occurrence of events. Communicate root cause with affected communities as appropriate.

In AWS, you can generate dashboard views of your metrics collected from workloads and natively from AWS. You can leverage CloudWatch or third-party applications to aggregate and present business, workload, and operations level views of operations activities. AWS provides workload insights through logging capabilities including AWS X-Ray, CloudWatch, CloudTrail, and VPC Flow Logs, enabling the identification of workload issues in support of root cause analysis and remediation.

The following questions focus on these considerations for operational excellence.

OPS 4:  What factors drive your understanding of operational health?
OPS 5:  How do you manage operational events?

Routine operations, as well as responses to unplanned events, should be automated. Manual processes for deployments, release management, changes, and rollbacks should be avoided. Releases should not be large batches that are done infrequently. Rollbacks are more difficult in large changes. Failing to have a rollback plan, or the ability to mitigate failure impacts, will prevent continuity of operations. Align metrics to business needs so that responses are effective at maintaining business continuity. One-time decentralized metrics with manual responses will result in greater disruption to operations during unplanned events.
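An automated rollback can be as small as the sketch below: apply the new version, check health, and re-apply the previous version if the check fails. The `apply` and `is_healthy` callables are stand-ins for whatever deployment tooling and health checks you use; they are assumptions of this example.

```python
from typing import Callable


def deploy_with_rollback(current: str, candidate: str,
                         apply: Callable[[str], None],
                         is_healthy: Callable[[], bool]) -> str:
    """Deploy `candidate`; return the version left running afterwards."""
    apply(candidate)
    if is_healthy():
        return candidate
    # The health check failed: restore the last known good version.
    apply(current)
    return current
```

Because the rollback path is the same code path as the deployment, it is exercised by the same automation and does not depend on a manual procedure being remembered during an incident.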

Evolve

Evolution of operations is required to sustain operational excellence. Dedicate work cycles to making continuous incremental improvements. Regularly evaluate and prioritize opportunities for improvement (for example, feature requests, issue remediation, and compliance requirements), including both the workload and operations procedures. Include feedback loops within your procedures to rapidly identify areas for improvement and capture learnings from the execution of operations.

Share lessons learned across teams to share the benefits of those lessons. Analyze trends within lessons learned and perform cross-team retrospective analysis of operations metrics to identify opportunities and methods for improvement. Implement changes intended to bring about improvement and evaluate the results to determine success.

With AWS Developer Tools you can implement continuous delivery build, test, and deployment activities that work with a variety of source code, build, testing, and deployment tools from AWS and third parties. The results of deployment activities can be used to identify opportunities for improvement for both deployment and development. You can perform analytics on your metrics data, integrating data from your operations and deployment activities, to enable analysis of the impact of those activities against business and customer outcomes. This data can be leveraged in cross-team retrospective analysis to identify opportunities and methods for improvement.
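As one example of the kind of analysis described here, the sketch below computes a per-team change failure rate from deployment records, which could feed a cross-team retrospective. The `Deployment` record and its fields are hypothetical; in practice the data would come from your deployment tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Deployment:
    team: str
    failed: bool


def change_failure_rate(deployments: Iterable[Deployment]) -> Dict[str, float]:
    """Return the fraction of failed deployments for each team."""
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for deployment in deployments:
        totals[deployment.team] += 1
        if deployment.failed:
            failures[deployment.team] += 1
    return {team: failures[team] / totals[team] for team in totals}
```

Comparing the rates across teams and over time is one way to spot where a practice used by one team could help another.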

The following questions focus on these considerations for operational excellence.

OPS 6:  How do you evolve operations?

Successful evolution of operations is founded in: frequent small improvements; providing safe environments and time to experiment, develop, and test improvements; and environments in which learning from failures is encouraged. Operations support for sandbox, development, test, and production environments, with increasing level of operational controls, facilitates development and increases the predictability of successful results from changes deployed into production.

Key AWS Services

The AWS service that is essential to Operational Excellence is AWS CloudFormation, which you can use to create templates based on best practices. This enables you to provision resources in an orderly and consistent fashion from your development through production environments. The following services and features support the three areas in operational excellence:

• Prepare: AWS Config and AWS Config rules can be used to create standards for workloads and to determine if environments are compliant with those standards before being put into production.

• Operate: Amazon CloudWatch allows you to monitor the operational health of a workload.

• Evolve: Amazon Elasticsearch Service (Amazon ES) allows you to analyze your log data to gain actionable insights quickly and securely.
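The Prepare item describes checking environments against standards before they go into production. The sketch below shows that idea in plain Python; it does not use the AWS Config API, and the rule names and the shape of the resource dictionaries are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List

# Each standard is a named predicate over a resource description.
RULES: Dict[str, Callable[[dict], bool]] = {
    "encryption_enabled": lambda resource: resource.get("encrypted") is True,
    "owner_tag_present": lambda resource: "owner" in resource.get("tags", {}),
}


def violations(resource: dict) -> List[str]:
    """Return the names of the standards this resource does not meet."""
    return [name for name, check in RULES.items() if not check(resource)]


def ready_for_production(resources: Iterable[dict]) -> bool:
    """True only if every resource meets every standard."""
    return all(not violations(resource) for resource in resources)
```

Keeping the list of rules short mirrors the earlier advice to implement the minimum number of standards, since each rule is something operations must understand and maintain.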

Resources

Refer to the following resources to learn more about our best practices for Operational Excellence.

Documentation

• DevOps and AWS

Whitepaper

• Operational Excellence Pillar

Video

• DevOps at Amazon

Security

The Security pillar includes the ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.

The security pillar provides an overview of design principles, best practices, and questions. You can find prescriptive guidance on implementation in the Security Pillar whitepaper.

Design Principles

There are seven design principles for security in the cloud:

• Implement a strong identity foundation: Implement the principle of least privilege and enforce separation of duties with appropriate authorization for each interaction with your AWS resources. Centralize privilege management and reduce or even eliminate reliance on long-term credentials.

• Enable traceability: Monitor, alert, and audit actions and changes to your environment in real time. Integrate logs and metrics with systems to automatically respond and take action.

• Apply security at all layers: Rather than just focusing on protection of a single outer layer, apply a defense-in-depth approach with other security controls. Apply to all layers (e.g., edge network, VPC, subnet, load balancer, every instance, operating system, and application).

• Automate security best practices: Automated software-based security mechanisms improve your ability to securely scale more rapidly and cost effectively. Create secure architectures, including the implementation of controls that are defined and managed as code in version-controlled templates.

• Protect data in transit and at rest: Classify your data into sensitivity levels and use mechanisms, such as encryption, tokenization, and access control where appropriate.

• Keep people away from data: Create mechanisms and tools to reduce or eliminate the need for direct access or manual processing of data. This reduces the risk of loss or modification and human error when handling sensitive data.

• Prepare for security events: Prepare for an incident by having an incident management process that aligns to your organizational requirements. Run incident response simulations and use tools with automation to increase your speed for detection, investigation, and recovery.
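As one illustration of automating a security best practice, the sketch below scans security group definitions for inbound rules that admit SSH from any address. The input shape follows the `IpPermissions` structure returned by the EC2 `DescribeSecurityGroups` API; the group IDs, the choice of port 22, and the function name are hypothetical and only serve the example.

```python
# Hypothetical check: flag security groups that allow SSH (port 22) from anywhere.
OPEN_CIDR = "0.0.0.0/0"

def open_ssh_groups(security_groups):
    """Return IDs of groups with an inbound rule covering port 22 from 0.0.0.0/0."""
    flagged = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            # An IpProtocol of "-1" means all protocols and all ports.
            all_traffic = rule.get("IpProtocol") == "-1"
            covers_ssh = all_traffic or (
                rule.get("FromPort", 0) <= 22 <= rule.get("ToPort", -1)
            )
            open_to_world = any(
                r.get("CidrIp") == OPEN_CIDR for r in rule.get("IpRanges", [])
            )
            if covers_ssh and open_to_world:
                flagged.append(group["GroupId"])
                break  # one offending rule is enough to flag the group
    return flagged

# Illustrative data only; these groups do not exist anywhere.
sample_groups = [
    {"GroupId": "sg-web", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-admin", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-internal", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}]},
]

print(open_ssh_groups(sample_groups))
```

A check like this could run on a schedule or in a deployment pipeline, so that the control is enforced by software rather than by manual review.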

Definition

There are five best practice areas for security in the cloud:

1. Identity and Access Management

2. Detective Controls

3. Infrastructure Protection

4. Data Protection

5. Incident Response

Before you architect any system, you need to put in place practices that influence security. You will want to control who can do what. In addition, you want to be able to identify security incidents, protect your systems and services, and maintain the confidentiality and integrity of data through data protection. You should have a well-defined and practiced process for responding to security incidents. These tools and techniques are important because they support objectives such as preventing financial loss or complying with regulatory obligations.

The AWS Shared Responsibility Model enables organizations that adopt the cloud to achieve their security and compliance goals. Because AWS physically secures the infrastructure that supports our cloud services, as an AWS customer you can focus on using services to accomplish your goals. The AWS Cloud also provides greater access to security data and an automated approach to responding to security events.

Best Practices

Identity and Access Management

Identity and access management are key parts of an information security program, ensuring that only authorized and authenticated users are able to access your resources, and only in a manner that you intend. For example, you should define principals (that is, users, groups, services, and roles that take action in your account), build out policies aligned with these principals, and implement strong credential management. These privilege-management elements form the core of authentication and authorization.

In AWS, privilege management is primarily supported by the AWS Identity and Access Management (IAM) service, which allows you to control user and programmatic access to AWS services and resources. You should apply granular policies, which assign permissions to a user, group, role, or resource. You also have the ability to require strong password practices, such as complexity level, avoiding re-use, and enforcing multi-factor authentication (MFA). You can use federation with your existing directory service. For workloads that require systems to have access to AWS, IAM enables secure access through roles, instance profiles, identity federation, and temporary credentials.
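To make "granular policies" concrete, here is a minimal sketch of a least-privilege IAM policy document built in Python. The JSON layout (`Version`, `Statement`, `Effect`, `Action`, `Resource`) is the standard IAM policy format; the bucket name and the helper function are hypothetical.

```python
import json

def read_only_bucket_policy(bucket_name):
    """Build an IAM policy document that allows read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }

# "example-reports-bucket" is a placeholder name.
policy = read_only_bucket_policy("example-reports-bucket")
print(json.dumps(policy, indent=2))
```

The policy grants only the two actions a reporting job would need and names a single bucket, rather than using wildcards for actions or resources.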

The following questions focus on these considerations for security. (For a list of security questions, answers, and best practices, see the Appendix.)

SEC 1:  How do you manage credentials for your workload?
SEC 2:  How do you control human access to services?
SEC 3:  How do you control programmatic access to services?

Credentials must not be shared between any user or system. User access should be granted using a least-privilege approach with best practices including password requirements and MFA enforced. Programmatic access including API calls to AWS services should be performed using temporary and limited-privilege credentials such as those issued by the AWS Security Token Service (STS).
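The following sketch shows one way a workload might request temporary credentials from STS. It assumes a client object exposing the `assume_role` operation with the `RoleArn`, `RoleSessionName`, and `DurationSeconds` parameters, as the boto3 STS client does; the role ARN, the session name, and the `FakeStsClient` stand-in are hypothetical and exist only so the example runs without AWS access.

```python
MIN_DURATION_SECONDS = 900  # the shortest session duration STS accepts

def temporary_credentials(sts_client, role_arn, session_name,
                          duration_seconds=MIN_DURATION_SECONDS):
    """Request short-lived credentials by assuming a limited-privilege role."""
    if duration_seconds < MIN_DURATION_SECONDS:
        raise ValueError("STS sessions must last at least 900 seconds")
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,
    )
    return response["Credentials"]

class FakeStsClient:
    """Stand-in for a real STS client; not part of any SDK."""
    def assume_role(self, RoleArn, RoleSessionName, DurationSeconds):
        return {"Credentials": {
            "AccessKeyId": "EXAMPLE-KEY-ID",
            "SecretAccessKey": "example-secret",
            "SessionToken": "example-token",
            "Expiration": "2018-06-01T00:15:00Z",
        }}

# With boto3 installed and configured, boto3.client("sts") could be passed instead.
creds = temporary_credentials(
    FakeStsClient(),
    "arn:aws:iam::123456789012:role/ExampleReadOnly",
    "report-job",
)
print(sorted(creds))
```

Because the credentials expire on their own, a leaked copy is useful to an attacker only for the remainder of the session.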

Detective Controls

You can use detective controls to identify a potential security threat or incident. They are an essential part of governance frameworks and can be used to support a quality process, a legal or compliance obligation, and for threat identification and response efforts. There are different types of detective controls. For example, conducting an inventory of assets and their detailed attributes promotes more effective decision making (and lifecycle controls) to help establish operational baselines. You can also use internal auditing, an examination of controls related to information systems, to ensure that practices meet policies and requirements and that you have set the correct automated alerting notifications based on defined conditions. These controls are important reactive factors that can help your organization identify and understand the scope of anomalous activity.

In AWS, you can implement detective controls by processing logs, events, and monitoring that allows for auditing, automated analysis, and alarming. CloudTrail logs AWS API calls, CloudWatch provides monitoring of metrics with alarming, and AWS Config provides configuration history. Amazon GuardDuty is a managed threat detection service that continuously monitors for malicious or unauthorized behavior to help you protect your AWS accounts and workloads. Service-level logs are also available; for example, you can use Amazon Simple Storage Service (Amazon S3) to log access requests.

The following questions focus on these considerations for security.

SEC 4:  How are you aware of security events in your workload?

Log management is important to a well-architected design for reasons ranging from security or forensics to regulatory or legal requirements. It is critical that you analyze logs and respond to them so that you can identify potential security incidents. AWS provides functionality that makes log management easier to implement by giving you the ability to define a data-retention lifecycle or define where data will be preserved, archived, or eventually deleted. This makes predictable and reliable data handling simpler and more cost effective.
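A data-retention lifecycle for logs can be expressed as configuration. The sketch below builds a rule in the shape that the Amazon S3 lifecycle configuration API expects (`Rules`, `Filter`, `Transitions`, `Expiration`); the `logs/` prefix, the 90-day and 365-day periods, and the helper name are illustrative assumptions rather than recommendations.

```python
def log_lifecycle_rule(prefix, archive_after_days, delete_after_days):
    """Build an S3 lifecycle configuration that archives and later deletes logs."""
    if delete_after_days <= archive_after_days:
        raise ValueError("logs must be archived before they are deleted")
    return {
        "Rules": [
            {
                "ID": f"retain-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move objects to archival storage once they are rarely read.
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                # Delete objects when the retention period ends.
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }

config = log_lifecycle_rule("logs/", archive_after_days=90, delete_after_days=365)
print(config["Rules"][0]["ID"])
```

With boto3, a dictionary like `config` could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`.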

Infrastructure Protection

Infrastructure protection encompasses control methodologies, such as defense in depth, necessary to meet best practices and organizational or regulatory obligations. Use of these methodologies is critical for successful, ongoing operations in either the cloud or on-premises.

In AWS, you can implement stateful and stateless packet inspection, either by using AWS-native technologies or by using partner products and services available through the AWS Marketplace. You should use Amazon Virtual Private Cloud (Amazon VPC) to create a private, secured, and scalable environment in which you can define your topology, including gateways, routing tables, and public and private subnets.

The following questions focus on these considerations for security.

SEC 5:  How do you protect your networks?
SEC 6:  How do you stay up to date with AWS security features and industry security threats?
SEC 7:  How do you protect your compute resources?

Multiple layers of defense are advisable in any type of environment. In the case of infrastructure protection, many of the concepts and methods are valid across cloud and on-premises models. Enforcing boundary protection, monitoring points of ingress and egress, and comprehensive logging, monitoring, and alerting are all essential to an effective information security plan.

AWS customers are able to tailor, or harden, the configuration of an Amazon Elastic Compute Cloud (Amazon EC2), Amazon EC2 Container Service (Amazon ECS) container, or AWS Elastic Beanstalk instance, and persist this configuration to an immutable Amazon Machine Image (AMI). Then, whether triggered by Auto Scaling or launched manually, all new virtual servers (instances) launched with this AMI receive the hardened configuration.

Data Protection

Before architecting any system, foundational practices that influence security should be in place. For example, data classification provides a way to categorize organizational data based on levels of sensitivity, and encryption protects data by way of rendering it unintelligible to unauthorized access. These tools and techniques are important because they support objectives such as preventing financial loss or complying with regulatory obligations.

In AWS, the following practices facilitate protection of data:

• As an AWS customer you maintain full control over your data.

• AWS makes it easier for you to encrypt your data and manage keys, including regular key rotation, which can be easily automated by AWS or maintained by you.

• Detailed logging that contains important content, such as file access and changes, is available.

• AWS has designed storage systems for exceptional resiliency. For example, Amazon S3 Standard, S3 Standard–IA, S3 One Zone-IA, and Amazon Glacier are all designed to provide 99.999999999% durability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.000000001% of objects.

• Versioning, which can be part of a larger data lifecycle management process, can protect against accidental overwrites, deletes, and similar harm.

• AWS never initiates the movement of data between Regions. Content placed in a Region will remain in that Region unless you explicitly enable a feature or leverage a service that provides that functionality.
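The durability figure quoted in the list above can be turned into an expected loss with a little arithmetic. The sketch below uses exact fractions to avoid floating-point rounding; the object count of 10,000,000 is an illustrative assumption.

```python
from fractions import Fraction

# Designed durability of objects over a given year, as a percentage.
DURABILITY_PERCENT = Fraction("99.999999999")

def expected_annual_loss(object_count):
    """Average number of objects expected to be lost in one year."""
    loss_fraction = (100 - DURABILITY_PERCENT) / 100  # 0.000000001% as a fraction
    return object_count * loss_fraction

loss = expected_annual_loss(10_000_000)
years_per_lost_object = 1 / loss

print(float(loss))                 # expected objects lost per year
print(int(years_per_lost_object))  # average years between single-object losses
```

Under these assumptions, storing ten million objects corresponds to an average of one lost object every 10,000 years.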

The following questions focus on these considerations for security.

SEC 8:  How do you classify your data?
SEC 9:  How do you manage data protection mechanisms?
SEC 10:  How do you protect your data at rest?
SEC 11:  How do you protect your data in transit?

AWS provides multiple means for encrypting data at rest and in transit. We build features into our services that make it easier to encrypt your data. For example, we have implemented server-side encryption (SSE) for Amazon S3 to make it easier for you to store your data in an encrypted form. You can also arrange for the entire HTTPS encryption and decryption process (generally known as SSL termination) to be handled by Elastic Load Balancing (ELB).
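One way to enforce encryption at rest and in transit for Amazon S3 is a bucket policy. The sketch below denies requests made without TLS and uploads that do not request server-side encryption, using the `aws:SecureTransport` and `s3:x-amz-server-side-encryption` condition keys. The bucket name and helper function are hypothetical, and the second statement is an assumption about how strictly a team wants to enforce SSE.

```python
def encryption_bucket_policy(bucket_name):
    """Build a bucket policy requiring TLS for all access and SSE for uploads."""
    bucket = f"arn:aws:s3:::{bucket_name}"
    objects = f"{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # In transit: reject any request that does not use HTTPS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket, objects],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # At rest: reject uploads that omit the SSE request header.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects,
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }

# "example-data-bucket" is a placeholder name.
bucket_policy = encryption_bucket_policy("example-data-bucket")
print([s["Sid"] for s in bucket_policy["Statement"]])
```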

Incident Response

Even with extremely mature preventive and detective controls, your organization should still put processes in place to respond to and mitigate the potential impact of security incidents. The architecture of your workload strongly affects the ability of your teams to operate effectively during an incident, to isolate or contain systems, and to restore operations to a known good state. Putting in place the tools and access ahead of a security incident, then routinely practicing incident response through game days, will help you ensure that your architecture can accommodate timely investigation and recovery.

In AWS, the following practices facilitate effective incident response:

• Detailed logging is available that contains important content, such as file access and changes.

• Events can be automatically processed and trigger tools that automate responses through the use of AWS APIs.

• You can pre-provision tooling and a “clean room” using AWS CloudFormation. This allows you to carry out forensics in a safe, isolated environment.

The following questions focus on these considerations for security.

SEC 12:  How do you prepare to respond to an incident?

Ensure that you have a way to quickly grant access for your InfoSec team, and automate the isolation of instances as well as the capturing of data and state for forensics.
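Automated isolation and capture might look like the following sketch. It assumes a client exposing the EC2 `modify_instance_attribute` and `create_snapshot` operations, as the boto3 EC2 client does; the instance, volume, and security group IDs are hypothetical, and `RecordingEc2Stub` is a stand-in so the example runs without AWS access.

```python
def isolate_and_capture(ec2_client, instance_id, volume_ids, quarantine_group_id):
    """Cut an instance off from the network, then snapshot its volumes."""
    # Replace all security groups with one quarantine group (assumed to have
    # no inbound or outbound rules) so the instance can no longer communicate.
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=[quarantine_group_id],
    )
    # Preserve disk state for later forensic analysis.
    snapshot_ids = []
    for volume_id in volume_ids:
        snapshot = ec2_client.create_snapshot(
            VolumeId=volume_id,
            Description=f"Forensic copy of {volume_id} from {instance_id}",
        )
        snapshot_ids.append(snapshot["SnapshotId"])
    return snapshot_ids

class RecordingEc2Stub:
    """Stand-in for a real EC2 client; records calls instead of making them."""
    def __init__(self):
        self.calls = []

    def modify_instance_attribute(self, **kwargs):
        self.calls.append(("modify_instance_attribute", kwargs))

    def create_snapshot(self, **kwargs):
        self.calls.append(("create_snapshot", kwargs))
        return {"SnapshotId": "snap-" + kwargs["VolumeId"][4:]}

stub = RecordingEc2Stub()
snapshots = isolate_and_capture(
    stub, "i-0123", ["vol-0abc", "vol-0def"], "sg-quarantine"
)
print(snapshots)
```

A function like this could be invoked by an event-driven rule, so that containment starts without waiting for a person to act.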

Key AWS Services

The AWS service that is essential to Security is AWS Identity and Access Management (IAM), which allows you to securely control access to AWS services and resources for your users. The following services and features support the five areas in security:

• Identity and Access Management: IAM enables you to securely control access to AWS services and resources. MFA adds an additional layer of protection on user access. AWS Organizations lets you centrally manage and enforce policies for multiple AWS accounts.

• Detective Controls: AWS CloudTrail records AWS API calls. AWS Config provides a detailed inventory of your AWS resources and configuration. Amazon GuardDuty is a managed threat detection service that continuously monitors for malicious or unauthorized behavior. Amazon CloudWatch is a monitoring service for AWS resources, which can trigger CloudWatch Events to automate security responses.

• Infrastructure Protection: Amazon Virtual Private Cloud (Amazon VPC) enables you to launch AWS resources into a virtual network that you've defined. Amazon CloudFront is a global content delivery network that securely delivers data, videos, applications, and APIs to your viewers and integrates with AWS Shield for DDoS mitigation. AWS WAF is a web application firewall that is deployed on either Amazon CloudFront or Application Load Balancer to help protect your web applications from common web exploits.

• Data Protection: Services such as ELB, Amazon Elastic Block Store (Amazon EBS), Amazon S3, and Amazon Relational Database Service (Amazon RDS) include encryption capabilities to protect your data in transit and at rest. Amazon Macie automatically discovers, classifies and protects sensitive data, while AWS Key Management Service (AWS KMS) makes it easy for you to create and control keys used for encryption.

• Incident Response: IAM should be used to grant appropriate authorization to incident response teams and response tools. AWS CloudFormation can be used to create a trusted environment or clean room for conducting investigations. Amazon CloudWatch Events allows you to create rules that trigger automated responses including AWS Lambda.

Resources

Refer to the following resources to learn more about our best practices for Security.

Documentation

• AWS Cloud Security

• AWS Compliance

• AWS Security Blog

Whitepaper

• Security Pillar

• AWS Security Overview

• AWS Security Best Practices

• AWS Risk and Compliance

Video

• AWS Security State of the Union

• Shared Responsibility Overview

Reliability

The Reliability pillar includes the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.

The reliability pillar provides an overview of design principles, best practices, and questions. You can find prescriptive guidance on implementation in the Reliability Pillar whitepaper.

Design Principles

There are five design principles for reliability in the cloud:

• Test recovery procedures: In an on-premises environment, testing is often conducted to prove the system works in a particular scenario. Testing is not typically used to validate recovery strategies. In the cloud, you can test how your system fails, and you can validate your recovery procedures. You can use automation to simulate different failures or to recreate scenarios that led to failures before. This exposes failure pathways that you can test and rectify before a real failure scenario, reducing the risk of components failing that have not been tested before.

• Automatically recover from failure: By monitoring a system for key performance indicators (KPIs), you can trigger automation when a threshold is breached. This allows for automatic notification and tracking of failures, and for automated recovery processes that work around or repair the failure. With more sophisticated automation, it's possible to anticipate and remediate failures before they occur.

• Scale horizontally to increase aggregate system availability: Replace one large resource with multiple small resources to reduce the impact of a single failure on the overall system. Distribute requests across multiple, smaller resources to ensure that they don’t share a common point of failure.

• Stop guessing capacity: A common cause of failure in on-premises systems is resource saturation, when the demands placed on a system exceed the capacity of that system (this is often the objective of denial of service attacks). In the cloud, you can monitor demand and system utilization, and automate the addition or removal of resources to maintain the optimal level to satisfy demand without over- or under-provisioning.

• Manage change in automation: Changes to your infrastructure should be done using automation. The changes that need to be managed are changes to the automation.
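The horizontal scaling principle above can be quantified with a small sketch: the more equally sized resources share the load, the smaller the share of capacity that a single failure removes. The capacity figures and function names are hypothetical.

```python
import math

def capacity_lost_to_single_failure(resource_count):
    """Fraction of total capacity lost when one of N equal resources fails."""
    if resource_count < 1:
        raise ValueError("at least one resource is required")
    return 1 / resource_count

def resources_needed(required_units, unit_capacity, tolerated_failures=1):
    """Resources to provision so demand is still met after the given failures."""
    base = math.ceil(required_units / unit_capacity)
    return base + tolerated_failures

# One large resource: a single failure removes all capacity.
print(capacity_lost_to_single_failure(1))
# Four smaller resources: a single failure removes a quarter of capacity.
print(capacity_lost_to_single_failure(4))
# Serving 100 units with 25-unit resources while tolerating one failure.
print(resources_needed(100, 25))
```

In this example, five 25-unit resources keep the full 100 units available after one failure, whereas matching that guarantee with 100-unit resources would require provisioning a second full-size resource.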

Definition

There are three best practice areas for reliability in the cloud:

1. Foundations

2. Change Management

3. Failure Management

To achieve reliability, a system must have a well-planned foundation and monitoring in place, with mechanisms for handling changes in demand or requirements. The system should be designed to detect failure and automatically heal itself.

Best Practices

Foundations

Before architecting any system, foundational requirements that influence reliability should be in place. For example, you must have sufficient network bandwidth to your data center. These requirements are sometimes neglected (because they are beyond a single project’s scope). This neglect can have a significant impact on the ability to deliver a reliable system. In an on-premises environment, these requirements can cause long lead times due to dependencies and therefore must be incorporated during initial planning.

With AWS, most of these foundational requirements are already incorporated or may be addressed as needed. The cloud is designed to be essentially limitless, so it is the responsibility of AWS to satisfy the requirement for sufficient networking and compute capacity, while you are free to change resource size and allocation, such as the size of storage devices, on demand.

The following questions focus on these considerations for reliability. (For a list of reliability questions, answers, and best practices, see the Appendix.)

REL 1:  How are you managing AWS service limits for your accounts?
REL 2:  How do you plan your network topology on AWS?

AWS sets service limits (an upper limit on the number of each resource your team can request) to protect you from accidentally over-provisioning resources. You will need to have governance and processes in place to monitor and change these limits to meet your business needs. As you adopt the cloud, you may need to plan integration with existing on-premises resources (a hybrid approach). A hybrid model enables the gradual transition to an all-in cloud approach over time. Therefore, it’s important to have a design for how your AWS and on-premises resources will interact as a network topology.
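Monitoring service limits can be as simple as comparing current usage with each limit and flagging anything close to the ceiling. In the sketch below, the 80% warning ratio, the limit names, and all of the numbers are illustrative assumptions; real values would come from your own accounts.

```python
def limits_needing_attention(usage, limits, warn_ratio=0.8):
    """Return each limit whose usage ratio is at or above the warning ratio."""
    flagged = {}
    for name, limit in limits.items():
        if limit <= 0:
            continue  # skip limits that are unset or not applicable
        ratio = usage.get(name, 0) / limit
        if ratio >= warn_ratio:
            flagged[name] = round(ratio, 2)
    return flagged

# Hypothetical limits and usage for one account in one Region.
limits = {"ec2_instances": 20, "vpcs": 5, "elastic_ips": 5}
usage = {"ec2_instances": 17, "vpcs": 2, "elastic_ips": 5}

print(limits_needing_attention(usage, limits))
```

Flagged entries are candidates for a limit increase request or for cleanup of unused resources before the limit blocks a deployment.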

Change Management

Being aware of how change affects a system allows you to plan proactively, and monitoring allows you to quickly identify trends that could lead to capacity issues or SLA breaches. In traditional environments, change-control processes are often manual and must be carefully coordinated with auditing to effectively control who makes changes and when they are made.

Using AWS, you can monitor the behavior of a system and automate the response to KPIs, for example, by adding additional servers as a system gains more users. You can control who has permission to make system changes and audit the history of these changes.
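An automated response to a KPI can be sketched as a proportional scaling rule: adjust the number of servers so that the observed metric moves back toward its target. This is a simplified illustration and not the algorithm used by any AWS service; the metric, target, and bounds are hypothetical.

```python
import math

def desired_capacity(current, metric_value, target_value, minimum, maximum):
    """Scale capacity in proportion to how far the metric is from its target."""
    proposed = math.ceil(current * metric_value / target_value)
    # Keep the result inside the configured bounds.
    return max(minimum, min(maximum, proposed))

# Average CPU at 75% against a 50% target: grow from 4 to 6 servers.
print(desired_capacity(4, 75, 50, minimum=2, maximum=12))
# Average CPU at 20%: shrink, but never below the minimum of 2.
print(desired_capacity(4, 20, 50, minimum=2, maximum=12))
# A spike on 10 servers is capped at the maximum of 12.
print(desired_capacity(10, 90, 50, minimum=2, maximum=12))
```

The minimum and maximum bounds act as guardrails, so an unusual metric reading cannot remove all capacity or create an unbounded number of servers.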

The following questions focus on these considerations for reliability.

REL 3:  How does your system adapt to changes in demand?
REL 4:  How do you monitor AWS resources?
REL 5:  How do you implement change?

When you architect a system to automatically add and remove resources in response to changes in demand, this not only increases reliability but also ensures that business success doesn't become a burden. With monitoring in place, your team will be automatically alerted when KPIs deviate from expected norms. Automatic logging of changes to your environment allows you to audit and quickly identify actions that might have impacted reliability. Controls on change management ensure that you can enforce the rules that deliver the reliability you need.

Failure Management

In any system of reasonable complexity it is expected that failures will occur. It is generally of interest to know how to become aware of these failures, respond to them, and prevent them from happening again.

With AWS, you can take advantage of automation to react to monitoring data. For example, when a particular metric crosses a threshold, you can trigger an automated action to remedy the problem. Also, rather than trying to diagnose and fix a failed resource that is part of your production environment, you can replace it with a new one and carry out the analysis on the failed resource out of band. Since the cloud enables you to stand up temporary versions of a whole system at low cost, you can use automated testing to verify full recovery processes.

The following questions focus on these considerations for reliability.

REL 6:  How do you back up data?
REL 7:  How does your system withstand component failures?
REL 8:  How do you test resilience?
REL 9:  How do you plan for disaster recovery?

Regularly back up your data and test your backup files to ensure you can recover from both logical and physical errors. A key to managing failure is the frequent and automated testing of systems to cause failure, and then observe how they recover. Do this on a regular schedule and ensure that such testing is also triggered after significant system changes. Actively track KPIs, such as the recovery time objective (RTO) and recovery point objective (RPO), to assess a system’s resiliency (especially under failure-testing scenarios). Tracking KPIs will help you identify and mitigate single points of failure. The objective is to thoroughly test your system-recovery processes so that you are confident that you can recover all your data and continue to serve your customers, even in the face of sustained problems. Your recovery processes should be as well exercised as your normal production processes.
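Tracking RTO and RPO during a failure test comes down to comparing timestamps with objectives. The sketch below measures the data-loss window (time from the last good backup to the failure) and the downtime (time from the failure to restored service). The timestamps and objectives are hypothetical.

```python
from datetime import datetime, timedelta

def recovery_report(last_backup, failure, restored, rpo, rto):
    """Compare the recovery achieved in a test with the RPO and RTO objectives."""
    achieved_rpo = failure - last_backup  # data written in this window is lost
    achieved_rto = restored - failure     # service was unavailable for this long
    return {
        "achieved_rpo": achieved_rpo,
        "achieved_rto": achieved_rto,
        "rpo_met": achieved_rpo <= rpo,
        "rto_met": achieved_rto <= rto,
    }

# Hypothetical failure test: backup at 02:00, failure at 05:30, restored at 06:15.
report = recovery_report(
    last_backup=datetime(2018, 6, 1, 2, 0),
    failure=datetime(2018, 6, 1, 5, 30),
    restored=datetime(2018, 6, 1, 6, 15),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=1),
)
print(report["achieved_rpo"], report["rpo_met"])
print(report["achieved_rto"], report["rto_met"])
```

Recording these figures after every test makes it possible to see whether recovery is getting faster or slower as the system changes.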

Key AWS Services

The AWS service that is essential to Reliability is Amazon CloudWatch, which monitors runtime metrics. The following services and features support the three areas in reliability:

• Foundations: IAM enables you to securely control access to AWS services and resources. Amazon VPC lets you provision a private, isolated section of the AWS Cloud where you can launch AWS resources in a virtual network. AWS Trusted Advisor provides visibility into service limits. AWS Shield is a managed Distributed Denial of Service (DDoS) protection service that safeguards web applications running on AWS.

• Change Management: AWS CloudTrail records AWS API calls for your account and delivers log files to you for auditing. AWS Config provides a detailed inventory of your AWS resources and configuration, and continuously records configuration changes. Auto Scaling is a service that provides automated demand management for a deployed workload. CloudWatch provides the ability to alert on metrics, including custom metrics. CloudWatch also has a logging feature that can be used to aggregate log files from your resources.

• Failure Management: AWS CloudFormation provides templates for the creation of AWS resources and provisions them in an orderly and predictable fashion. Amazon S3 provides a highly durable service to keep backups. Amazon Glacier provides highly durable archives. AWS KMS provides a reliable key management system that integrates with many AWS services.

Resources

Refer to the following resources to learn more about our best practices for Reliability.

Documentation

• Service Limits

• Service Limits Reports Blog

• Amazon Virtual Private Cloud

• AWS Shield

• Amazon CloudWatch

• Amazon S3

• AWS KMS

Whitepaper

• Reliability Pillar

• Backup Archive and Restore Approach Using AWS

• Managing your AWS Infrastructure at Scale

• AWS Disaster Recovery

• AWS Amazon VPC Connectivity Options

Video

• How do I manage my AWS service limits?

• Embracing Failure: Fault-Injection and Service Reliability

Product

• AWS Premium Support

• Trusted Advisor

Performance Efficiency

The Performance Efficiency pillar includes the ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.

The performance efficiency pillar provides an overview of design principles, best practices, and questions. You can find prescriptive guidance on implementation in the Performance Efficiency Pillar whitepaper.

Design Principles

There are five design principles for performance efficiency in the cloud:

• Democratize advanced technologies: Technologies that are difficult to implement can become easier to consume by pushing that knowledge and complexity into the cloud vendor’s domain. Rather than having your IT team learn how to host and run a new technology, they can simply consume it as a service. For example, NoSQL databases, media transcoding, and machine learning are all technologies that require expertise that is not evenly dispersed across the technical community. In the cloud, these technologies become services that your team can consume while focusing on product development rather than resource provisioning and management.

• Go global in minutes: Easily deploy your system in multiple Regions around the world with just a few clicks. This allows you to provide lower latency and a better experience for your customers at minimal cost.

• Use serverless architectures: In the cloud, serverless architectures remove the need for you to run and maintain servers to carry out traditional compute activities. For example, storage services can act as static websites, removing the need for web servers, and event services can host your code for you. This not only removes the operational burden of managing these servers, but also can lower transactional costs because these managed services operate at cloud scale.

• Experiment more often: With virtual and automatable resources, you can quickly carry out comparative testing using different types of instances, storage, or configurations.

• Mechanical sympathy: Use the technology approach that aligns best to what you are trying to achieve. For example, consider data access patterns when selecting database or storage approaches.

Definition

There are four best practice areas for performance efficiency in the cloud:

1. Selection

2. Review

3. Monitoring

4. Tradeoffs

Take a data-driven approach to selecting a high-performance architecture. Gather data on all aspects of the architecture, from the high-level design to the selection and configuration of resource types. By reviewing your choices on a cyclical basis, you will ensure that you are taking advantage of the continually evolving AWS Cloud. Monitoring will ensure that you are aware of any deviance from expected performance and can take action on it. Finally, your architecture can make tradeoffs to improve performance, such as using compression or caching, or relaxing consistency requirements.
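Two of the tradeoffs named above, compression and caching, can be illustrated with the Python standard library. Compression spends CPU time to reduce the bytes stored or transferred, and caching spends memory to avoid repeating work. The payload and the `lookup` function are hypothetical stand-ins for real data and a real expensive operation.

```python
import functools
import zlib

# Compression: trade CPU time for fewer bytes.
payload = b"well-architected " * 1000
compressed = zlib.compress(payload)
print(len(payload), len(compressed))

# Caching: trade memory for fewer repeated computations.
call_count = 0

@functools.lru_cache(maxsize=128)
def lookup(key):
    """Stand-in for an expensive operation such as a remote query."""
    global call_count
    call_count += 1
    return key.upper()

lookup("region")
lookup("region")  # served from the cache; the function body does not run again
print(call_count)
```

Whether either tradeoff pays off depends on the workload, which is why the measurements described in this section matter.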

Best Practices

Selection

The optimal solution for a particular system will vary based on the kind of workload you have, often with multiple approaches combined. Well-architected systems use multiple solutions and enable different features to improve performance.

In AWS, resources are virtualized and are available in a number of different types and configurations. This makes it easier to find an approach that closely matches your needs, and you can also find options that are not easily achievable with on-premises infrastructure. For example, a managed service such as Amazon DynamoDB provides a fully managed NoSQL database with single-digit millisecond latency at any scale.

The following questions focus on these considerations for performance efficiency. (For a list of performance efficiency questions, answers, and best practices, see the Appendix.)

PERF 1:  How do you select the best performing architecture?

When you select the patterns and implementation for your architecture, use a data-driven approach to find the optimal solution. AWS Solutions Architects, AWS Reference Architectures, and AWS Partner Network (APN) Partners can help you select an architecture based on what we have learned, but data obtained through benchmarking or load testing will be required to optimize your architecture.

Your architecture will likely combine a number of different architectural approaches (for example, event-driven, ETL, or pipeline). The implementation of your architecture will use the AWS services that are specific to the optimization of your architecture’s performance. In the following sections we look at the four main resource types that you should consider (compute, storage, database, and network).

Compute

The optimal compute solution for a particular system may vary based on application design, usage patterns, and configuration settings. Architectures may use different compute solutions for various components and enable different features to improve performance. Selecting the wrong compute solution for an architecture can lead to lower performance efficiency.

In AWS, compute is available in three forms: instances, containers, and functions:

• Instances are virtualized servers and, therefore, you can change their capabilities with the click of a button or an API call. Because in the cloud resource decisions are no longer fixed, you can experiment with different server types. At AWS, these virtual server instances come in different families and sizes, and they offer a wide variety of capabilities, including solid-state drives (SSDs) and graphics processing units (GPUs).

• Containers are a method of operating system virtualization that allows you to run an application and its dependencies in resource-isolated processes.

• Functions abstract the execution environment from the code you want to execute. For example, AWS Lambda allows you to execute code without running an instance.

The following questions focus on these considerations for performance efficiency.

PERF 2:  How do you select your compute solution?

When architecting your use of compute you should take advantage of the elasticity mechanisms available to ensure you have sufficient capacity to sustain performance as demand changes.
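One way to reason about elasticity is a proportional scaling rule: size the fleet so that a per-instance metric returns to a target value. The function name, the 60% target, and the capacity bounds below are assumptions chosen for illustration; this is a sketch of the idea, not the algorithm any AWS service uses.

```python
import math

def desired_capacity(current, metric, target, minimum=1, maximum=20):
    """Scale the instance count in proportion to metric/target, within bounds."""
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    wanted = math.ceil(current * metric / target)
    return max(minimum, min(maximum, wanted))

# Four instances at 90% average CPU with a 60% target: scale out.
print(desired_capacity(current=4, metric=90.0, target=60.0))
# Four instances at 30% average CPU with a 60% target: scale in.
print(desired_capacity(current=4, metric=30.0, target=60.0))
```

The minimum and maximum bounds keep the rule from scaling to zero or growing without limit when the metric is noisy.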

Storage

The optimal storage solution for a particular system will vary based on the kind of access method (block, file, or object), patterns of access (random or sequential), throughput required, frequency of access (online, offline, archival), frequency of update (WORM, dynamic), and availability and durability constraints. Well-architected systems use multiple storage solutions and enable different features to improve performance.

In AWS, storage is virtualized and is available in a number of different types. This makes it easier to match your storage methods more closely with your needs, and also offers storage options that are not easily achievable with on-premises infrastructure. For example, Amazon S3 is designed for 11 nines of durability. You can also change from using magnetic hard disk drives (HDDs) to SSDs, and easily move virtual drives from one instance to another in seconds.

The following questions focus on these considerations for performance efficiency.

PERF 3:  How do you select your storage solution?

When you select a storage solution, ensuring that it aligns with your access patterns will be critical to achieving the performance you want.

Database

The optimal database solution for a particular system can vary based on requirements for availability, consistency, partition tolerance, latency, durability, scalability, and query capability. Many systems use different database solutions for various subsystems and enable different features to improve performance. Selecting the wrong database solution and features for a system can lead to lower performance efficiency.

Amazon RDS provides a fully managed relational database. With Amazon RDS, you can scale your database's compute and storage resources, often with no downtime. Amazon DynamoDB is a fully managed NoSQL database that provides single-digit millisecond latency at any scale. Amazon Redshift is a managed petabyte-scale data warehouse that allows you to change the number or type of nodes as your performance or capacity needs change.

The following questions focus on these considerations for performance efficiency.

PERF 4:  How do you select your database solution?

Although a workload’s database approach (RDBMS, NoSQL) has significant impact on performance efficiency, it is often an area that is chosen according to organizational defaults rather than through a data-driven approach. As with storage, it is critical to consider the access patterns of your workload, and also to consider if other non-database solutions could solve the problem more efficiently (such as using a search engine or data warehouse).

Network

The optimal network solution for a particular system will vary based on latency, throughput requirements and so on. Physical constraints such as user or on-premises resources will drive location options, which can be offset using edge techniques or resource placement.

In AWS, networking is virtualized and is available in a number of different types and configurations. This makes it easier to match your networking methods more closely with your needs. AWS offers product features (for example, Enhanced Networking, Amazon EBS-optimized instances, Amazon S3 transfer acceleration, dynamic Amazon CloudFront) to optimize network traffic. AWS also offers networking features (for example, Amazon Route 53 latency routing, Amazon VPC endpoints, and AWS Direct Connect) to reduce network distance or jitter.

The following questions focus on these considerations for performance efficiency.

PERF 5:  How do you configure your networking solution?

When selecting your network solution, you need to consider location. With AWS, you can choose to place resources close to where they will be used to reduce distance. By taking advantage of Regions, placement groups, and edge locations you can significantly improve performance.
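To illustrate placing resources close to where they are used, the sketch below picks the Region with the lowest median measured latency for one group of users. The latency numbers are made up, and the Region codes were chosen arbitrarily for the example; in practice you would gather measurements from your own users.

```python
from statistics import median

def closest_region(latency_ms_by_region):
    """Return the Region whose median measured latency is lowest."""
    return min(latency_ms_by_region,
               key=lambda region: median(latency_ms_by_region[region]))

# Hypothetical round-trip times (ms) measured from one group of users.
measurements = {
    "us-east-1": [82, 85, 80, 120],
    "eu-west-1": [24, 22, 30, 25],
    "ap-southeast-2": [290, 300, 285, 310],
}

print(closest_region(measurements))
```

The median is used instead of the mean so that a single slow measurement does not decide the outcome.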

Review

When architecting solutions, there is a finite set of options that you can choose from. However, over time new technologies and approaches become available that could improve the performance of your architecture.

Using AWS, you can take advantage of our continual innovation, which is driven by customer need. We release new Regions, edge locations, services, and features regularly. Any of these could improve the performance efficiency of your architecture.

The following questions focus on these considerations for performance efficiency.

PERF 6:  How do you evolve your workload to take advantage of new releases?

Understanding where your architecture is performance-constrained will allow you to look out for releases that could alleviate that constraint.

Monitoring

After you have implemented your architecture you will need to monitor its performance so that you can remediate any issues before your customers are aware. Monitoring metrics should be used to raise alarms when thresholds are breached. The alarm can trigger automated action to work around any badly performing components.

Amazon CloudWatch provides the ability to monitor and send notification alarms. You can use automation to work around performance issues by triggering actions through Amazon Kinesis, Amazon Simple Queue Service (Amazon SQS), and AWS Lambda.

The following questions focus on these considerations for performance efficiency.

PERF 7:  How do you monitor your resources to ensure they are performing as expected?

Ensuring that you do not see too many false positives, or are overwhelmed with data, is key to having an effective monitoring solution. Automated triggers avoid human error and can reduce the time to fix problems. Plan for game days where you can conduct simulations in the production environment to test your alarm solution and ensure that it correctly recognizes issues.
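One common way to cut down on false positives is to alarm only when several recent datapoints breach the threshold, rather than on a single spike. The following sketch evaluates an "M out of N" rule over a metric series. The function name, parameter names, threshold, and sample values are assumptions for illustration, and the code does not call any AWS API.

```python
def in_alarm(datapoints, threshold, evaluation_periods=5, datapoints_to_alarm=3):
    """True if at least `datapoints_to_alarm` of the last
    `evaluation_periods` datapoints are above `threshold`."""
    window = datapoints[-evaluation_periods:]
    breaches = sum(1 for value in window if value > threshold)
    return breaches >= datapoints_to_alarm

# One isolated spike in the last five datapoints: no alarm.
spiky_latency_ms = [110, 120, 480, 115, 118, 125, 119]
print(in_alarm(spiky_latency_ms, threshold=400))      # False

# Four of the last five datapoints breach: alarm.
sustained_latency_ms = [110, 120, 480, 455, 118, 470, 462]
print(in_alarm(sustained_latency_ms, threshold=400))  # True
```

Raising `datapoints_to_alarm` trades faster detection for fewer false alarms, which is the kind of setting a game day can help validate.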

Tradeoffs

When you architect solutions, think about tradeoffs so you can select an optimal approach. Depending on your situation you could trade consistency, durability, and space versus time or latency to deliver higher performance.

Using AWS, you can go global in minutes and deploy resources in multiple locations across the globe to be closer to your end users. You can also dynamically add read-only replicas to information stores such as database systems to reduce the load on the primary database. AWS also offers caching solutions such as Amazon ElastiCache, which provides an in-memory data store or cache, and Amazon CloudFront, which caches copies of your static content closer to end users. Amazon DynamoDB Accelerator (DAX) provides a read-through/write-through distributed caching tier in front of Amazon DynamoDB, supporting the same API, but providing sub-millisecond latency for entities that are in the cache.

The following questions focus on these considerations for performance efficiency.

PERF 8:  How do you use tradeoffs to improve performance?

Tradeoffs can increase the complexity of your architecture and require load testing to ensure that a measurable benefit is obtained.
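As a sketch of the consistency-for-latency tradeoff, the following read-through cache serves a value from memory until a time-to-live (TTL) expires, so reads are faster but can be stale for up to `ttl_seconds`. The class and parameter names are hypothetical, and the `loader` stands in for a slower backing store; this is not the behavior of any particular AWS caching service.

```python
import time

class ReadThroughCache:
    """Cache values from a slower loader; entries may be stale for up to ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]              # cache hit: fast, possibly stale
        value = self._loader(key)        # cache miss: go to the backing store
        self._entries[key] = (value, now + self._ttl)
        return value

# A dict stands in for the primary database in this example.
database = {"price": 10}
cache = ReadThroughCache(loader=database.__getitem__, ttl_seconds=30)

first = cache.get("price")     # miss: loaded from the database
database["price"] = 12         # the source of truth changes
second = cache.get("price")    # hit: still the cached value until the TTL expires
print(first, second)
```

The second read returns the old value, which is the stale-read cost of the tradeoff. A shorter TTL reduces staleness but sends more reads to the backing store, so the right value is something to confirm with load testing.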

Key AWS Services

The AWS service that is essential to Performance Efficiency is Amazon CloudWatch, which monitors your resources and systems, providing visibility into your overall performance and operational health. The following services and features support the four areas in performance efficiency:

• Selection

• Compute: Auto Scaling is key to ensuring that you have enough instances to meet demand and maintain responsiveness.

• Storage: Amazon EBS provides a wide range of storage options (such as SSD and provisioned input/output operations per second (PIOPS)) that allow you to optimize for your use case. Amazon S3 provides serverless content delivery, and Amazon S3 transfer acceleration enables fast, easy, and secure transfers of files over long distances.

• Database: Amazon RDS provides a wide range of database features (such as PIOPS and read replicas) that allow you to optimize for your use case. Amazon DynamoDB provides single-digit millisecond latency at any scale.

• Network: Amazon Route 53 provides latency-based routing. Amazon VPC endpoints and AWS Direct Connect can reduce network distance or jitter.

• Review: The AWS Blog and the What’s New section on the AWS website are resources for learning about newly launched features and services.

• Monitoring: Amazon CloudWatch provides metrics, alarms, and notifications that you can integrate with your existing monitoring solution, and that you can use with AWS Lambda to trigger actions.

• Tradeoffs: Amazon ElastiCache, Amazon CloudFront, and AWS Snowball are services that allow you to improve performance. Read replicas in Amazon RDS can allow you to scale read-heavy workloads.

Resources

Refer to the following resources to learn more about our best practices for Performance Efficiency.

Documentation

• Amazon S3 Performance Optimization

• Amazon EBS Volume Performance

Whitepaper

• Performance Efficiency Pillar

Video

• AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)

• Performance AWS re:Invent 2016: Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization (CMP301)

Cost Optimization

The Cost Optimization pillar includes the ability to run systems to deliver business value at the lowest price point.

The cost optimization pillar provides an overview of design principles, best practices, and questions. You can find prescriptive guidance on implementation in the Cost Optimization Pillar whitepaper.

Design Principles

There are five design principles for cost optimization in the cloud:

• Adopt a consumption model: Pay only for the computing resources that you require and increase or decrease usage depending on business requirements, not by using elaborate forecasting. For example, development and test environments are typically only used for eight hours a day during the work week. You can stop these resources when they are not in use for a potential cost savings of 75% (40 hours versus 168 hours).

• Measure overall efficiency: Measure the business output of the system and the costs associated with delivering it. Use this measure to understand the gains you make from increasing output and reducing costs.

• Stop spending money on data center operations: AWS does the heavy lifting of racking, stacking, and powering servers, so you can focus on your customers and business projects rather than on IT infrastructure.

• Analyze and attribute expenditure: The cloud makes it easier to accurately identify the usage and cost of systems, which then allows transparent attribution of IT costs to individual business owners. This helps measure return on investment (ROI) and gives system owners an opportunity to optimize their resources and reduce costs.

• Use managed and application level services to reduce cost of ownership: In the cloud, managed and application level services remove the operational burden of maintaining servers for tasks such as sending email or managing databases. As managed services operate at cloud scale, they can offer a lower cost per transaction or service.
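The 75% figure in the first principle follows from simple arithmetic: a resource that runs 40 of the 168 hours in a week is billed for about 24% of the hours of one that runs continuously, a saving of roughly 76%, which the text rounds to 75%. A short sketch, assuming a flat hourly rate and no other charges (such as storage for stopped resources):

```python
HOURS_PER_WEEK = 7 * 24  # 168

def savings_fraction(hours_used, hours_total=HOURS_PER_WEEK):
    """Fraction of cost avoided by running only `hours_used` of `hours_total`,
    assuming a flat hourly rate."""
    return 1 - hours_used / hours_total

# Eight hours a day, five days a week.
print(f"{savings_fraction(8 * 5):.0%}")  # 76%
```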

Definition

There are four best practice areas for cost optimization in the cloud:

1. Cost-Effective Resources

2. Matching supply and demand

3. Expenditure Awareness

4. Optimizing Over Time

As with the other pillars, there are tradeoffs to consider. For example, do you want to optimize for speed to market or for cost? In some cases, it’s best to optimize for speed (going to market quickly, shipping new features, or simply meeting a deadline) rather than investing in upfront cost optimization. Design decisions are sometimes guided by haste as opposed to empirical data, as the temptation always exists to overcompensate “just in case” rather than spend time benchmarking for the most cost-optimal system over time. This often leads to drastically over-provisioned and under-optimized deployments, which remain static throughout their life cycle. The following sections provide techniques and strategic guidance for the initial and ongoing cost optimization of your deployment.

Best Practices

Cost-Effective Resources

Using the appropriate instances and resources for your system is key to cost savings. For example, a reporting process might take five hours to run on a smaller server but one hour to run on a larger server that is twice as expensive. Both servers give you the same outcome, but the smaller server incurs more cost over time.
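Working through the example above: if the larger server costs twice as much per hour, the one-hour run costs the equivalent of two small-server hours, while the five-hour run costs five. The hourly rate below is a made-up placeholder; only the ratio matters.

```python
def job_cost(hours, hourly_rate):
    """Total cost of a job that occupies one server for `hours`."""
    return hours * hourly_rate

small_rate = 0.10              # hypothetical price per hour
large_rate = 2 * small_rate    # "twice as expensive"

small_cost = job_cost(5, small_rate)
large_cost = job_cost(1, large_rate)
print(f"small: {small_cost:.2f}, large: {large_cost:.2f}")
```

Under these assumptions the smaller server costs 2.5 times as much per report, even though its hourly price is lower.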

A well-architected system uses the most cost-effective resources, which can have a significant and positive economic impact. You also have the opportunity to use managed services to reduce costs. For example, rather than maintaining servers to deliver email, you can use a service that charges on a per-message basis.

AWS offers a variety of flexible and cost-effective pricing options to acquire instances from EC2 and other services in a way that best fits your needs. On-Demand Instances allow you to pay for compute capacity by the hour, with no minimum commitments required. Reserved Instances allow you to reserve capacity and offer savings of up to 75% off On-Demand pricing. With Spot Instances, you can leverage unused Amazon EC2 capacity at savings of up to 90% off On-Demand pricing. Spot Instances are appropriate where the system can tolerate using a fleet of servers where individual servers can come and go dynamically, such as stateless web servers, batch processing, or when using HPC and big data.
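A rough way to compare these options is to ask at what utilization a commitment pays off. The sketch below assumes a simplified model in which On-Demand capacity is billed only for the hours used, while reserved capacity is billed for every hour of the term at a discounted rate; under that assumption the break-even utilization is one minus the discount. Real pricing has more dimensions (term, payment option, instance type), and the rate and discount here are hypothetical, so treat the numbers as illustrative.

```python
def on_demand_cost(rate, hours_used):
    """On-Demand: pay the full rate, but only for hours actually used."""
    return rate * hours_used

def reserved_cost(rate, discount, hours_in_term):
    """Reserved (simplified): pay a discounted rate for every hour in the term."""
    return rate * (1 - discount) * hours_in_term

def break_even_utilization(discount):
    """Utilization above which the reserved model is cheaper in this simplified view."""
    return 1 - discount

rate, term_hours, discount = 0.10, 8760, 0.40   # hypothetical values
for utilization in (0.25, 0.60, 0.90):
    od = on_demand_cost(rate, utilization * term_hours)
    ri = reserved_cost(rate, discount, term_hours)
    print(f"{utilization:.0%} used: on-demand {od:.2f}, reserved {ri:.2f}")
```

With a 40% discount the two models cost the same at 60% utilization; below that, paying only for hours used is cheaper in this model.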

Appropriate service selection can also reduce usage and costs, such as using CloudFront to minimize data transfer, or completely eliminate costs, such as utilizing Amazon Aurora on RDS to remove expensive database licensing costs.

The following questions focus on these considerations for cost optimization. (For a list of cost optimization questions, answers, and best practices, see the Appendix.)

COST 1:  How do you evaluate cost when you select AWS services?
COST 2:  How do you meet cost targets with resource type and size choices?
COST 3:  How do you use pricing models to reduce cost?
COST 4:  How do you plan for data transfer charges?

By factoring in cost during service selection, and using tools such as Cost Explorer and AWS Trusted Advisor to regularly review your AWS usage, you can actively monitor your utilization and adjust your deployments accordingly.

Matching supply and demand

Optimally matching supply to demand delivers the lowest cost for a system, but there also needs to be sufficient extra supply to allow for provisioning time and individual resource failures. Demand can be fixed or variable, requiring metrics and automation to ensure that management does not become a significant cost.

In AWS, you can automatically provision resources to match demand. Auto Scaling and demand, buffer, and time-based approaches allow you to add and remove resources as needed. If you can anticipate changes in demand, you can save more money and ensure your resources match your system needs.

The following questions focus on these considerations for cost optimization.

COST 5:  How do you match supply of resources with customer demand?

When architecting to match supply against demand, you will want to actively think about the patterns of usage and the time it takes to provision new resources.
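The extra supply discussed in this section can be estimated from two inputs: how much demand can grow while new resources are being provisioned, and how many individual resource failures you want to absorb. A minimal sketch, with hypothetical parameter names and numbers and an assumed linear growth in demand:

```python
import math

def required_instances(current_demand, growth_per_minute, provisioning_minutes,
                       capacity_per_instance, spare_for_failures=1):
    """Instances needed to cover demand until new capacity arrives,
    plus spares to tolerate individual failures."""
    demand_at_arrival = current_demand + growth_per_minute * provisioning_minutes
    return math.ceil(demand_at_arrival / capacity_per_instance) + spare_for_failures

# 900 requests/s now, growing by 20 requests/s each minute, 5 minutes to provision,
# 100 requests/s per instance, one spare instance.
print(required_instances(900, 20, 5, 100))
```

Shorter provisioning times shrink the headroom you have to pay for, which is one reason provisioning time belongs in the cost discussion.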

Expenditure Awareness

The increased flexibility and agility that the cloud enables encourages innovation and fast-paced development and deployment. It eliminates the manual processes and time associated with provisioning on-premises infrastructure, including identifying hardware specifications, negotiating price quotations, managing purchase orders, scheduling shipments, and then deploying the resources. However, the ease of use and virtually unlimited on-demand capacity may require a new way of thinking about expenditures.

Many businesses are composed of multiple systems run by various teams. The capability to attribute resource costs to the individual business or product owners drives efficient usage behavior and helps reduce waste. Accurate cost attribution also allows you to understand which products are truly profitable, and allows you to make more informed decisions about where to allocate budget.

In AWS you can leverage Cost Explorer to track your spend, and gain insights into exactly where you spend. Using AWS Budgets, you can send notifications if your usage or costs are not in line with your forecasts. You can use tagging on resources to apply business and organization information to your usage and cost. This provides additional insights for optimization from an organization perspective.

The following questions focus on these considerations for cost optimization.

COST 6:  How do you monitor usage and cost?
COST 7:  How do you govern AWS usage?
COST 8:  How do you decommission resources?

You can use cost allocation tags to categorize and track your AWS usage and costs. When you apply tags to your AWS resources (such as EC2 instances or S3 buckets), AWS generates a cost and usage report with your usage and your tags. You can apply tags that represent business categories (such as cost centers, system names, or owners) to organize your costs across multiple services.
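To show how tags turn a flat bill into per-owner totals, the sketch below groups cost line items by a tag key. The record layout (`service`, `cost`, `tags`) and the `cost-center` tag are assumptions for the example; they are not the column names of an actual AWS report.

```python
from collections import defaultdict

def cost_by_tag(line_items, tag_key, untagged_label="(untagged)"):
    """Sum cost per value of `tag_key`; items without the tag are grouped together."""
    totals = defaultdict(float)
    for item in line_items:
        label = item.get("tags", {}).get(tag_key, untagged_label)
        totals[label] += item["cost"]
    return dict(totals)

# Hypothetical line items.
line_items = [
    {"service": "EC2", "cost": 120.0, "tags": {"cost-center": "web"}},
    {"service": "S3", "cost": 30.0, "tags": {"cost-center": "web"}},
    {"service": "RDS", "cost": 200.0, "tags": {"cost-center": "analytics"}},
    {"service": "EC2", "cost": 45.0, "tags": {}},
]

print(cost_by_tag(line_items, "cost-center"))
```

Keeping an explicit bucket for untagged items makes gaps in tagging visible instead of silently dropping that spend.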

Combining tagged resources with entity lifecycle tracking (employees, projects) makes it possible to identify orphaned resources or projects that are no longer generating value to the business and should be decommissioned. You can set up billing alerts to notify you of predicted overspending, and the AWS Simple Monthly Calculator allows you to calculate your data transfer costs.
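A minimal sketch of the orphan check described above: compare each resource's owner tag against the set of owners that are still active. The resource IDs, the `owner` tag key, and the data layout are hypothetical; in practice the active set would come from whatever system tracks your employees or projects.

```python
def find_orphans(resources, active_owners, owner_tag="owner"):
    """Return IDs of resources whose owner tag is missing or no longer active."""
    return [
        resource["id"] for resource in resources
        if resource.get("tags", {}).get(owner_tag) not in active_owners
    ]

# Hypothetical inventory and list of active projects.
resources = [
    {"id": "vol-1", "tags": {"owner": "project-apollo"}},
    {"id": "vol-2", "tags": {"owner": "project-retired"}},
    {"id": "i-3", "tags": {}},
]
active_owners = {"project-apollo"}

print(find_orphans(resources, active_owners))
```

Resources with no owner tag are reported alongside those with a retired owner, since neither can be attributed to anyone who would notice if they were removed.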

Optimizing Over Time

As AWS releases new services and features, it is a best practice to review your existing architectural decisions to ensure they continue to be the most cost-effective. As your requirements change, be aggressive in decommissioning resources, entire services, and systems that you no longer require.

Managed services from AWS can often significantly optimize a solution, so it is essential to be aware of new managed services and features as they become available. For example, running an Amazon RDS database can be cheaper than running your own database on Amazon EC2.

The following questions focus on these considerations for cost optimization.

COST 9:  How do you evaluate new services?

When regularly reviewing your deployments, assess how newer services can help save you money. For example, Amazon Aurora on RDS could help you reduce costs for relational databases.

Key AWS Services

The tool that is essential to Cost Optimization is Cost Explorer, which helps you gain visibility and insights into your usage, across your systems and throughout your business. The following services and features support the four areas in cost optimization:

• Cost-Effective Resources: You can use Cost Explorer for Reserved Instance recommendations, and see patterns in how much you spend on AWS resources over time. Amazon CloudWatch and Trusted Advisor can help you right-size your resources. You can use Amazon Aurora on RDS to remove database licensing costs. AWS Direct Connect and Amazon CloudFront can be used to optimize data transfer.

• Matching supply and demand: Auto Scaling allows you to add or remove resources to match demand without overspending.

• Expenditure Awareness: AWS Cost Explorer allows you to view and track your usage in detail. AWS Budgets will notify you if your usage or spend exceeds actual or forecast budgeted amounts.

• Optimizing Over Time: The AWS News Blog and the What’s New section on the AWS website are resources for learning about newly launched features and services. AWS Trusted Advisor inspects your AWS environment and finds opportunities to save you money by eliminating unused or idle resources or committing to Reserved Instance capacity.

Resources

Refer to the following resources to learn more about our best practices for Cost Optimization.

Documentation

• Analyzing Your Costs with Cost Explorer

• AWS Cloud Economics Center

• AWS Detailed Billing Reports

Whitepaper

• Cost Optimization Pillar

Video

• Cost Optimization on AWS

Tool

• AWS Total Cost of Ownership (TCO) Calculators

• AWS Simple Monthly Calculator

The Review Process

The review of architectures needs to be done in a consistent manner, with a blame-free approach that encourages diving deep. It should be a light-weight process (hours not days) that is a conversation and not an audit. The purpose of reviewing an architecture is to identify any critical issues that might need addressing or areas that could be improved. The outcome of the review is a set of actions that should improve the experience of a customer using the workload.

As discussed in the “On Architecture” section, you will want each team member to take responsibility for the quality of its architecture. We recommend that the team members who build an architecture use the Well-Architected Framework to continually review their architecture, rather than holding a formal review meeting. A continuous approach allows your team members to update answers as the architecture evolves, and improve the architecture as you deliver features.

AWS Well-Architected is aligned to the way that AWS reviews systems and services internally. It is premised on a set of design principles that influences architectural approach, and questions that ensure that people don’t neglect areas that often featured in Root Cause Analysis (RCA). Whenever there is a significant issue with an internal system, AWS service, or customer we look at the RCA to see if we could improve the review processes we use.

Reviews should be applied at multiple times in a workload lifecycle: early on in the design phase to avoid one-way doors¹ that are difficult to change, and then before the go live date. Post go live your workload will continue to evolve as you add new features and change technology implementations. The architecture of a workload changes over time. You will need to follow good hygiene practices to stop its architectural characteristics from degrading as you evolve it. As you make significant architecture changes you should follow a set of hygiene processes including a Well-Architected review.

If you want to use the review as a one-time snapshot or independent measurement you will want to ensure you have all the right people in the conversation. Often we find that reviews are the first time that a team truly understands what they have implemented. An approach that works well when reviewing another team's workload is to have a series of informal conversations about their architecture where you can glean the answers to most questions. You can then follow up with one or two meetings where you can gain clarity or dive deep on areas of ambiguity or perceived risk.

¹ Many decisions are reversible, two-way doors. Those decisions can use a light-weight process. One-way doors are hard or impossible to reverse and require more inspection before making them.

Here are some suggested items to facilitate your meetings:

• A meeting room with whiteboards

• Printouts of any diagrams or design notes

• Action list of questions that require out-of-band research to answer (for example, “did we enable encryption or not?”)

After you have done a review you should have a list of issues that you can prioritize based on your business context. You will also want to take into account the impact of those issues on the day-to-day work of your team. If you address these issues early you could free up time to work on creating business value rather than solving recurring problems. As you address issues you can update your review to see how the architecture is improving.

While the value of a review is clear after you have done one, you may find that a new team might be resistant at first. Here are some objections that can be handled through educating the team on the benefits of a review:

• “We are too busy!” (Often said when the team is getting ready for a big launch.)

• If you are getting ready for a big launch you will want it to go smoothly. The review will allow you to understand any problems you might have missed.

• We recommend that you carry out reviews early in the design lifecycle to uncover risks and develop a mitigation plan aligned with the feature delivery roadmap.

• “We don’t have time to do anything with the results!” (Often said when there is an immovable event, such as the Super Bowl, that they are targeting.)

• These events can’t be moved. Do you really want to go into it without knowing the risks in your architecture? Even if you don’t address all of these issues you can still have playbooks for handling them if they materialize.

• “We don’t want others to know the secrets of our solution implementation!”

• If you point the team at the questions that are in the Appendix of the Well-Architected Framework whitepaper, they will see that none of the questions reveal any commercial or technical proprietary information.

As you carry out multiple reviews with teams in your organization you might identify thematic issues. For example, you might see that a group of teams has clusters of issues in a particular pillar or topic. You will want to look at all your reviews in a holistic manner, and identify any mechanisms, training, or principal engineering talks that could help address those thematic issues. After you have done a review you will likely have a list of issues that you can prioritize based on your business context. You will also want to take into account the impact of those issues on your team’s day-to-day work. Addressing these issues could free up time to create business value rather than “putting out fires.” As you address issues you can update your review to see how the architecture is improving.

Conclusion

The AWS Well-Architected Framework provides architectural best practices across the five pillars for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. The Framework provides a set of questions that allows you to review an existing or proposed architecture. It also provides a set of AWS best practices for each pillar. Using the Framework in your architecture will help you produce stable and efficient systems, which allow you to focus on your functional requirements.

Contributors

The following individuals and organizations contributed to this document:

• Philip Fitzsimons: Sr. Manager Well-Architected, Amazon Web Services

• Rodney Lester: Reliability Lead Well-Architected, Amazon Web Services

• Nathan Besh: Cost Lead Well-Architected, Amazon Web Services

• Brian Carlson: Operations Lead Well-Architected, Amazon Web Services

• Jon Steele: Sr. Technical Account Manager, Amazon Web Services

• Ryan King: Technical Program Manager, Amazon Web Services

• Ben Potter: Security Lead Well-Architected, Amazon Web Services

• Erin Rifkin: Senior Product Manager, Amazon Web Services

• Max Ramsay: Principal Security Solutions Architect, Amazon Web Services

• Scott Paddock: Security Solutions Architect, Amazon Web Services

• Callum Hughes: Solutions Architect, Amazon Web Services

Further Reading

AWS Disaster Recovery

AWS KMS

AWS Security Best Practices

AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)

AWS Amazon VPC Connectivity Options

AWS Cloud Economics Center

AWS Cloud Security

AWS Compliance

AWS Detailed Billing Reports

AWS Limit Monitor

AWS Premium Support

AWS Risk and Compliance

AWS Security Blog

AWS Security Overview

AWS Security State of the Union

AWS Shield

AWS Simple Monthly Calculator

AWS Total Cost of Ownership (TCO) Calculators

AWS Well-Architected homepage

Amazon S3

Amazon CloudWatch

Amazon EBS Volume Performance

Amazon S3 Performance Optimization

Amazon Virtual Private Cloud

Amazon leadership principles

Analyzing Your Costs with Cost Explorer

Backup Archive and Restore Approach Using AWS

Cost Optimization on AWS

Cost Optimization Pillar

DevOps and AWS

DevOps at Amazon

Embracing Failure: Fault-Injection and Service Reliability

How do I manage my AWS service limits?

Managing your AWS Infrastructure at Scale

Operational Excellence Pillar

Performance AWS re:Invent 2016: Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization (CMP301)

Performance Efficiency Pillar

Reliability Pillar

Security Pillar

Service Limits

Service Limits Reports Blog

Shared Responsibility Overview

TOGAF

Trusted Advisor

Zachman Framework

Document Revisions

Major revisions:

Date            Description
June 2018       Updates to simplify question text, standardize answers, and improve readability.
November 2017   Operational Excellence moved to front of pillars and rewritten so it frames other pillars. Refreshed other pillars to reflect evolution of AWS.
November 2016   Updated the Framework to include operational excellence pillar, and revised and updated the other pillars to reduce duplication and incorporate learnings from carrying out reviews with thousands of customers.
November 2015   Updated the Appendix with current Amazon CloudWatch Logs information.
October 2015    Original publication.

Appendix: Well-Architected Questions, Answers, and Best Practices

Operational Excellence

Prepare

OPS 1  What factors drive your operational priorities?

Operational priorities are the focus areas of your operations efforts. Clearly define and agree to your operations priorities to maximize the benefits of your operations efforts.

Best practices:

• Evaluate business needs: Involve both business and development teams when setting operational priorities to ensure that you have a thorough understanding of what operational support is required to achieve business outcomes.

• Evaluate compliance requirements: Consider external factors, such as regulatory compliance requirements and industry standards to ensure you are aware of potential obligations that may mandate or emphasize specific operational priorities.

• Evaluate risk: Operational priorities are frequently tradeoffs between competing interests. For example, accelerating speed to market for new features may be emphasized over cost optimization. Consider risks against potential benefits to ensure you are making informed decisions when setting your operational priorities.

OPS 2  How do you design your workload to enable operability?

The majority of the lifetime of a workload is typically spent in an operating state. Consider operations needs as a part of system design to help you enable long term sustainment of your workload.

Best practices:

• Share design standards: Share existing best practices, guidance, and governance requirements across teams, and include shared design standards in system design to reduce complexity and maximize the benefits from development efforts. Ensure that procedures exist to request changes, additions, and exceptions to design standards to support continual improvement and innovation.

• Design for cloud operations: Leverage features of cloud environments in your workload design (e.g. elasticity, on-demand scalability, pay-as-you-go pricing, automation) to enable operations capabilities such as rapid improvement iterations and lower risk experimentation.

• Provide insights into workload behavior: Build instrumentation into your system design (for example logs, metrics, and counters) to enable your understanding of what is going on in the system and allow you to measure performance of the system across individual components.

• Provide insights into customer behavior: Build instrumentation into your system design (for example logs, metrics, and counters) to enable your understanding of how the customer uses the system and the quality of the customer experience.

• Implement practices that reduce defects, ease remediation, and improve flow: Adopt approaches that improve flow and that enable fast feedback on quality, refactoring, and bug fixing to help you to rapidly identify and remediate issues introduced through deployment activities.

• Mitigate deployment risks: Use approaches such as frequent small reversible changes, automated deployments, testing, canary or one-box deployments, blue-green, etc. to help you to limit the impact of issues introduced through the implementation of changes and enable rapid recovery.

OPS 3  How do you know that you are ready to support a workload?

Evaluate the operational readiness of your workload, processes and procedures, and personnel to help you understand the operational risks related to your workload.

Best practices:

• Continuous improvement culture: Cultivate a continuous improvement culture to empower your personnel to identify and act upon opportunities for improvement. Develop a continuous improvement culture by emphasizing that change is constant, that failure is expected, and that improvement and innovation are achieved through experimentation. Provide a safe environment for experimentation where it is accepted that experiments do not always achieve desired outcomes.

• Share understanding of the value to the business: Have cross-team understanding of the criticality of the workload to the business with procedures to engage across teams for resources when needed to help you address operational issues.

• Ensure personnel capability: Have a mechanism to validate that you have an appropriate number of trained personnel to provide support for operational needs. Train personnel and adjust personnel capacity as necessary to maintain effective support.

• Documented accessible governance and guidance: Publish standards that are accessible, readily understood, and are measurable for compliance to help guide and educate your personnel enabling their compliance. Ensure that procedures exist to request changes, additions, and exceptions to standards to support continual improvement and innovation.

• Use checklists: Use checklists to ensure you have a consistent evaluation of your readiness to operate a workload. Checklists should include at a minimum the operational readiness of the teams and the workload, and security considerations. Checklist elements may be hard requirements or risk based decisions may be made to operate a workload that does not satisfy all requirements. Checklist elements may be specific to a workload, architecture, or may be implementation dependent. Script and automate checklists where appropriate to ensure consistency, speed execution, and limit human error.

• Use runbooks: Have runbooks to help enable consistent and prompt responses to well-understood events through documented procedures to achieve specific outcomes. Effective procedures should contain the minimum information for an adequately skilled person to achieve a desired outcome. Script and automate runbooks where appropriate to ensure consistency, speed responses, and limit human error.

• Use playbooks: Have playbooks to help enable consistent and prompt responses to failure scenarios by documenting the processes to identify underlying issues. Effective processes should guide an adequately skilled person through identifying potential sources of failure, isolating faults, determining root cause, and remediation. Script and automate playbooks where appropriate to ensure consistency, speed responses, and limit human error.

• Practice recovery: Identify potential failure scenarios, remove the sources of failure where possible, develop and test responses to failures to limit their impact when they occur and help ensure prompt and effective responses.

Operate

OPS 4  What factors drive your understanding of operational health?

Define metrics for the evaluation of your workload and processes to help you understand operations effectiveness in supporting business outcomes. Capture and analyze metrics to gain visibility to processes and events so that you can take appropriate action.

Best practices:

• Define expected business and customer outcomes: Have a documented definition of what success looks like for the workload from a business and customer perspective to use as evaluation criteria when evaluating operations success.

• Identify success metrics: Define metrics to measure the behavior of the workload against business and customer expectations to know if your workload is achieving them.

• Identify workload metrics: Define metrics to measure the status of the workload and its components against the success metrics to know if your workload is achieving them.

• Identify operations metrics: Define metrics to measure the execution of operations activities (runbooks and playbooks) to know if you are achieving operational outcomes that support the business needs.

• Established baselines: Establish baselines for metrics to provide expected values as the basis for comparisons.

• Collect and analyze metrics: Perform regular proactive reviews of metrics to identify trends and determine where appropriate responses are needed.

• Validate insights: Review your analysis results and responses with cross-functional teams and business owners to help establish common understanding, identify additional impacts, and determine courses of action. Adjust responses as appropriate.

• Business-level view of operations: Create a business-level view of operations to help you determine if you are satisfying needs and to help identify areas that need improvement to reach business goals.
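
The baseline and metric-review practices above can be illustrated with a small sketch. It assumes a baseline is summarized as the mean and sample standard deviation of past observations, and that a value more than three standard deviations away deserves a response. Both choices, the sample data, and the function names are illustrative assumptions and are not prescribed by the Framework.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical samples as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)


def deviates(value, baseline, tolerance=3.0):
    """Return True when value is more than `tolerance` standard deviations from the baseline mean."""
    average, spread = baseline
    if spread == 0:
        # A perfectly flat history: any different value is a deviation.
        return value != average
    return abs(value - average) > tolerance * spread


# Hypothetical median response times in milliseconds from past reviews.
history = [120, 118, 125, 122, 119, 121, 123]
baseline = build_baseline(history)

for observed in (124, 180):
    status = "investigate" if deviates(observed, baseline) else "within baseline"
    print(f"{observed} ms: {status}")
```

A fixed tolerance is only a starting point. Metrics with strong daily or weekly patterns usually need a baseline per time window rather than a single global one.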

OPS 5  How do you manage operational events?

Prepare and validate procedures to respond to operational events to help you minimize their potential disruption to your workload.

Best practices:

• Determine priority of operational events based on business impact: Base the priority of events on business impact to ensure that when multiple events require intervention those that are most significant to the business are addressed first. For example, impacts can include loss of life or injury, financial loss, or damage to reputation or trust.

• Processes for event, incident, and problem management: Have processes to address observed events, events that require intervention (incidents), and events that require intervention and either recur or cannot currently be resolved (problems). Use these processes to help you to mitigate the impact of those events on the business and your customers by ensuring timely and appropriate responses.

• Process per alert: Have a well-defined response (runbook or playbook), with a specifically identified owner, for any event for which you raise an alert. By doing so you ensure effective and prompt responses to operations events and prevent actionable events from being obscured by less valuable notifications.

• Identify decision makers: Identify decision makers who are empowered to determine operations actions on behalf of their organizations. Escalate to decision makers as appropriate when operations activities may impact their business outcomes to ensure informed decisions. Runbooks and playbooks should be pre-approved by decision makers where appropriate to ensure prompt responses to events.

• Defined escalation paths: Define escalation paths in your runbooks and playbooks, including what triggers escalation and the procedures for escalation, and specifically identify owners for each action, to help ensure effective and prompt responses to operations events. Escalations may include third parties.

• Push notifications: Direct communications to your users (for example, with email or SMS) when the services they consume are impacted, and when they return to normal operating conditions, to enable them to take appropriate action in response.

• Communicate status through dashboards: Provide dashboards tailored to their target audiences (for example, internal technical teams, leadership, and customers) to communicate the current operating status of the business and provide metrics of interest. Examples include Amazon CloudWatch dashboards, AWS Personal Health Dashboard, and Service Health Dashboard.

• Process for root cause analysis: Have a process to identify and document the root cause of an event so that you can develop mitigations to limit or prevent recurrence and you can develop procedures for prompt and effective responses. Communicate root cause as appropriate, tailored to target audiences.
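
As a rough illustration of prioritizing events by business impact and keeping one process per alert, the sketch below orders alerts by an impact scale and flags any alert that has no owner or procedure. The `Impact` scale, the `Alert` fields, and the sample alert names are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Impact(IntEnum):
    """Hypothetical impact scale; a lower value is addressed first."""
    LOSS_OF_LIFE_OR_INJURY = 0
    FINANCIAL_LOSS = 1
    REPUTATION_OR_TRUST = 2
    NO_BUSINESS_IMPACT = 3


@dataclass(frozen=True)
class Alert:
    name: str
    impact: Impact
    owner: str      # person or team accountable for the response
    procedure: str  # runbook or playbook to follow


def triage(alerts):
    """Order alerts so the most significant business impact comes first."""
    return sorted(alerts, key=lambda alert: alert.impact)


def missing_process(alerts):
    """Return alerts that lack an identified owner or a documented procedure."""
    return [a for a in alerts if not a.owner or not a.procedure]


alerts = [
    Alert("checkout-errors", Impact.FINANCIAL_LOSS, "payments-team", "runbook-checkout"),
    Alert("status-page-stale", Impact.REPUTATION_OR_TRUST, "", ""),
    Alert("door-lock-failure", Impact.LOSS_OF_LIFE_OR_INJURY, "facilities", "playbook-safety"),
]

print([a.name for a in triage(alerts)])
print([a.name for a in missing_process(alerts)])
```

An alert that shows up in `missing_process` is a candidate either for a new runbook or for removal, since alerts without a response tend to obscure the actionable ones.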

Evolve

OPS 6  How do you evolve operations?

Dedicate time and resources for continuous incremental improvement to help evolve the effectiveness and efficiency of your operations.

Best practices:

• Process for continuous improvement: Dedicate time and resources within your operations processes to make continuous incremental improvements possible. Regularly evaluate and prioritize opportunities for improvement to help focus resources where they will have the greatest benefits.

• Define drivers for improvement: Identify your drivers for improvement to help you evaluate and prioritize opportunities. Evaluate opportunities for improvement by considering desired features, capabilities, and improvements; unacceptable issues, bugs, and vulnerabilities; and updates required to maintain compliance with policy or support from a third party.

• Implement Feedback loops: Include feedback loops in your procedures to help enable your identification of issues and areas in need of improvement.

• Document and share lessons learned: Document and share lessons learned from the execution of operations activities so that you can leverage them internally and across teams. Analyze trends in lessons learned to identify issues and areas to investigate for improvement opportunities.

• Perform operations metrics reviews: Regularly perform retrospective analysis of operations metrics with cross-team participants from different areas of the business to identify opportunities for improvement, potential courses of action, and to share lessons learned.

Security

Identity and Access Management

SEC 1  How do you manage credentials for your workload?

Credentials include passwords, tokens, and keys that grant access directly or indirectly to manage your workload. Protect credentials with appropriate mechanisms to help you reduce the risk of accidental or malicious use.

Best practices:

• Enforce use of multi-factor authentication (MFA): Enforce multi-factor authentication (MFA) with software or hardware tokens to provide additional access control.

• Enforce password requirements: Enforce the minimum length and complexity of passwords to help protect against brute force and other password attacks.

• Rotate credentials regularly: Rotate credentials regularly to help reduce the risk of old credentials being used by previous systems or personnel.

• Audit credentials periodically: Audit credentials to ensure that the appropriate controls (e.g. MFA) are enforced, that credentials are rotated regularly, and that they carry an appropriate access level.

• Using centralized identity provider: An identity provider or directory service is used to authenticate users in a centralized place, reducing the requirement for multiple credentials and management complexity.
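
A periodic credential audit of the kind described above could look like the following sketch. It works on an in-memory list of records instead of calling any AWS API. The `Credential` fields, the 90-day rotation window, and the sample users are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Credential:
    user: str
    mfa_enabled: bool
    last_rotated: date


def audit(credentials, today, max_age_days=90):
    """Return (user, finding) pairs for missing MFA and overdue rotation."""
    max_age = timedelta(days=max_age_days)
    findings = []
    for cred in credentials:
        if not cred.mfa_enabled:
            findings.append((cred.user, "MFA not enabled"))
        if today - cred.last_rotated > max_age:
            findings.append((cred.user, "rotation overdue"))
    return findings


# Hypothetical inventory, for example exported from a credential report.
inventory = [
    Credential("alice", mfa_enabled=True, last_rotated=date(2018, 5, 1)),
    Credential("bob", mfa_enabled=False, last_rotated=date(2018, 1, 1)),
]

for user, finding in audit(inventory, today=date(2018, 6, 1)):
    print(f"{user}: {finding}")
```

Passing `today` explicitly keeps the check deterministic, which makes it easy to run the same audit in a scheduled job and in tests.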

SEC 2  How do you control human access to services?

Control human access to services with appropriately defined, limited, and segregated access to help you reduce the risk of unauthorized access.

Best practices:

• Credentials are not shared: Credentials are not shared between any users to help segregation of users and traceability.

• User life-cycle managed: Access is managed through employee life-cycle policies to grant only valid users access.

• Minimum privileges: Users are granted only the minimum privileges needed to accomplish their job to reduce the risk of unauthorized access.

• Access requirements clearly defined: Access requirements are clearly defined for a user's job function or role to reduce the risk of unnecessary privileges.

• Access is granted through roles or federation: Using roles allows for secure cross-account access and federated users.

SEC 3  How do you control programmatic access to services?

Control programmatic or automated access to services with appropriately limited short-term credentials and roles to help you reduce the risk of unauthorized access.

Best practices:

• Credentials are not shared: Credentials are not shared for programmatic access between any systems.

• Dynamic authentication: Credentials are dynamically acquired and frequently rotated from a service or system.

• Minimum privileges: Programmatic access requirements are clearly defined with only minimum privileges granted to the system to reduce the risk of unauthorized access.

• Access requirements clearly defined: Access requirements are clearly defined to reduce the risk of unnecessary privileges.

Detective Controls

SEC 4  How are you aware of security events in your workload?

Capture and analyze logs and metrics to gain visibility to security threats and events so that you can take appropriate action.

Best practices:

• Logging enabled where available: Enabling logging for all services and functions improves visibility of events.

• Analyzing AWS CloudTrail: CloudTrail trails should be automatically analyzed for suspicious behavior.

• Analyzing logs centrally: All logs should be collected centrally and automatically analyzed to detect suspicious behavior.

• Monitoring and alerting for key metrics and events: Key metrics and events related to security should be monitored with automated alerts.

• AWS marketplace or APN partner solution enabled: A solution from the AWS Marketplace or from an APN Partner.

Infrastructure Protection

SEC 5  How do you protect your networks?

Public and private networks and services require multiple layers of defense to help protect your workloads from network-based threats.

Best practices:

• Controlling traffic in Virtual Private Cloud (VPC): Using a VPC to isolate and control workload traffic.

• Controlling traffic at the boundary: Control traffic at the boundary or network edge of the workload to take advantage of the first opportunity to control traffic and provide a layer of protection.

• Controlling traffic using available features: Controlling traffic using available features, including security groups, Network ACLs and subnets, adds layers of protection.

• AWS marketplace or APN partner solution enabled: A solution from the AWS Marketplace or from an APN Partner.

SEC 6  How do you stay up to date with AWS security features and industry security threats?

Staying up to date and implementing AWS and industry best practices including services and features can improve the security of your workload. Being aware of the latest security threats will help you build a threat model to identify and implement protective controls.

Best practices:

• Evaluating new security services and features: Explore security services and features as they are released to identify appropriate protections to improve your security posture.

• Using security services and features: Adopting the use of security services and features will help you implement controls to protect your workload.

SEC 7  How do you protect your compute resources?

Configure compute resources with manageable components to protect and monitor their integrity so that you can take appropriate action.

Best practices:

• Hardening default configurations: Configure and harden compute resources to meet your requirements to help improve the security of your compute.

• Checking file integrity: Perform file integrity checking to gain visibility of unauthorized changes so that you can take appropriate action.

• Intrusion detection enabled: Intrusion detection controls including host-based agents provide additional visibility.

• AWS marketplace or APN partner solution enabled: A solution from the AWS Marketplace or from an APN Partner.

• Configuration management tool: Enforce secure configurations by default automatically by using a configuration management service or tool.

• Patching and scanning for vulnerabilities: Apply patches regularly and scan for vulnerabilities using a tool to help protect against new threats.

Data Protection

SEC 8  How do you classify your data?

Classification provides a way to categorize data, based on levels of sensitivity, to help you determine appropriate protective controls.

Best practices:

• Use a data classification schema: Classify data with a data classification schema.

• Data classification applied: Data is protected according to its classification in the schema.

SEC 9  How do you manage data protection mechanisms?

Data protection mechanisms include services and keys that protect data in transit and at rest. Protect these services and keys to help you reduce the risk of unauthorized access to systems and data.

Best practices:

• Use a secure key management service: Using a key management service such as AWS KMS or CloudHSM.

• Use service level controls: Using built-in service level controls such as Amazon S3 encryption with KMS managed keys.

• Use client side key management: We manage keys using client side techniques.

• AWS Marketplace or APN Partner solution: Using an AWS Marketplace or APN Partner solution.

SEC 10  How do you protect your data at rest?

Protecting your data at rest reduces the risk of unauthorized access or loss.

Best practices:

• Encrypting at rest: Data at rest is encrypted.

SEC 11  How do you protect your data in transit?

Protecting your data in transit reduces the risk of unauthorized access or exposure.

Best practices:

• Encrypting in transit: TLS or equivalent is used for communication as appropriate.

Incident Response

SEC 12  How do you prepare to respond to an incident?

Prepare to investigate and respond to security incidents to help you minimize potential disruptions to your workload.

Best practices:

• Pre-provisioned access: InfoSec has the right access or can gain access quickly. This access should be pre-provisioned so that an appropriate response can be made to an incident.

• Pre-deployed tools: InfoSec has the right tools pre-deployed into AWS so that an appropriate response can be made to an incident.

• Run game days: Incident response simulations are conducted regularly, and lessons learned are incorporated into the architecture and operations.

Reliability

Foundations

REL 1  How are you managing AWS service limits for your accounts?

AWS accounts are provisioned with default service limits to prevent new users from accidentally provisioning more resources than they need. There are also limits on how often you can call APIs to protect AWS infrastructure. Evaluate your AWS service needs and request appropriate changes to your limits for each region.

Best practices:

• Active monitoring and managing limits: Evaluate your potential usage on AWS via Amazon CloudWatch, or a third party product, increase your regional limits appropriately, and allow planned growth in usage.

• Implemented automated monitoring and management of limits: Implement tools using AWS SDKs to alert you when thresholds are being approached. If you have Enterprise Support, you can also automate the limit increase request.

• Aware of fixed service limits: Be aware of unchangeable service limits and architect around these.

• Ensure there is a sufficient gap between the current service limit and the max usage to accommodate for fail over: A fail over is when a facility fails. In AWS, this may be an isolation zone in your architecture, an Availability Zone (AZ), or an AWS Region. When a failover of an isolation zone or AZ occurs, your automation may make requests for resources before the failed resources are terminated. This may cause you to exceed planned limits. Ensure you can request resources for a failure of isolation zones before resources have been fully decommissioned.

• Service limits are managed across all relevant accounts and regions: If you are using multiple AWS accounts or AWS Regions, ensure you request the same limits in all environments in which you run your production workloads.
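
The gap between a service limit and peak usage during a fail over can be checked with simple arithmetic. The sketch below assumes replacement resources are requested before the failed ones are released, so both count against the limit for a while. The function names, the 80 percent alert threshold, and the instance counts are illustrative assumptions.

```python
def headroom(limit, current, failover_extra=0):
    """Resources still available under the limit at peak usage during a fail over."""
    return limit - (current + failover_extra)


def should_alert(limit, current, failover_extra=0, alert_percent=80):
    """Return True when peak usage reaches `alert_percent` of the limit."""
    peak = current + failover_extra
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return peak * 100 >= limit * alert_percent


# Hypothetical workload: three Availability Zones with 30 instances each.
in_use = 3 * 30
# If one zone fails, 30 replacements may be requested before the
# failed instances are terminated.
replacements = 30

for limit in (100, 200):
    print(limit, headroom(limit, in_use, replacements), should_alert(limit, in_use, replacements))
```

With a limit of 100 the headroom is negative, meaning the replacement requests would be rejected during the fail over even though steady-state usage of 90 fits under the limit.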

REL 2  How do you plan your network topology on AWS?

Applications can exist in one or more environments: EC2-Classic, the default VPC, or VPC(s) created by you. Network considerations such as system connectivity, Elastic IP address and public IP address management, VPC and private address management, and name resolution are fundamental to using resources in the cloud. Well planned and documented deployments are essential to reduce the risk of overlap and contention.

Best practices:

• Connectivity back to data center is not needed: If you do not need connectivity to an existing on-premises network, then planning can be simplified.

• Highly available connectivity between AWS and on-premises environment is implemented: Use multiple Direct Connect circuits and multiple VPN tunnels. Use multiple Direct Connect locations for high availability. If you use multiple AWS Regions, you will also need multiple Direct Connect locations in at least 2 regions. You may want to evaluate AWS Marketplace appliances that terminate VPNs. If you use AWS Marketplace appliances, deploy redundant instances for high availability in different Availability Zones.

• Highly available network connectivity for the users of the workload is implemented: Use a highly available DNS, load balancing, and/or reverse proxy as the public facing endpoint of your application. You may want to evaluate AWS Marketplace appliances for load balancing or proxying.

• Using non-overlapping private IP address ranges in multiple VPCs: The IP ranges of each of your VPCs should not conflict if they are peered or connected via VPN. The same is true for private connectivity to your on-premises environments and other cloud providers.

• IP subnet allocation accounts for expansion and availability: Individual Amazon VPC IP address ranges should be large enough to accommodate an application’s requirements, including factoring in future expansion and allocation of IP addresses to subnets across Availability Zones. Additionally, keep some IP addresses available for possible future expansion.
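
Overlapping private address ranges can be caught before VPCs are peered or connected to an on-premises network. The sketch below uses Python's standard `ipaddress` module to compare every pair of planned ranges. The network names and CIDR blocks are hypothetical.

```python
from ipaddress import ip_network
from itertools import combinations


def overlapping_ranges(cidrs):
    """Return pairs of names whose CIDR blocks share at least one address."""
    networks = {name: ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (first, second)
        for (first, net_a), (second, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical address plan for two VPCs and an on-premises network.
plan = {
    "vpc-a": "10.0.0.0/16",
    "vpc-b": "10.0.128.0/20",   # falls inside vpc-a's range
    "on-premises": "192.168.0.0/24",
}

print(overlapping_ranges(plan))
```

Running a check like this against the full address plan, including other cloud providers, is cheaper than renumbering a VPC after a conflict is discovered.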

Change Management

REL 3  How does your system adapt to changes in demand?

A scalable system provides elasticity to add and remove resources automatically so that they closely match the current demand at any given point in time.

Best practices:

• Workload scales automatically: Use automatically scalable services, such as Amazon S3, Amazon CloudFront, Auto Scaling, Amazon DynamoDB, Amazon Aurora, Elastic Load Balancing, and AWS Lambda or automation created using third party tools and/or AWS SDKs.

• Workload is load tested: Adopt a load testing methodology to measure if scaling activity will meet workload requirements.

REL 4  How do you monitor AWS resources?

Logs and metrics are a powerful tool for gaining insight into the health of your workloads. Youcan configure your system to monitor logs and metrics and send notifications when thresholdsare crossed or significant events occur. Ideally, when low-performance thresholds are crossed orfailures occur, the system has been architected to automatically self-heal or scale in response.

Best practices:

• Monitoring the workload in all tiers: Monitor the tiers of the workload with Amazon CloudWatch or third-party tools. Monitor AWS services with Personal Health Dashboard.

• Notifications are sent based on the monitoring: Organizations that need to know receive notifications when significant events occur.

• Automated responses are performed for events: Use automation to take action when an event is detected; for example, to replace failed components.

• Reviews are conducted regularly: Frequently review the monitoring of the system based on significant events and changes to evaluate the architecture and implementation.
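
The flow from metric to notification to automated response can be sketched in a few lines. This is a simplified model under stated assumptions: the state names, the rule that several consecutive datapoints must breach the threshold, and the `notify` and `remediate` callbacks are all hypothetical and do not represent a specific monitoring product.

```python
def alarm_state(datapoints, threshold, periods):
    """Return "ALARM" when the last `periods` datapoints all exceed the threshold."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    if len(datapoints) < periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"

def respond(state, notify, remediate):
    """Send a notification and run the automated action when the alarm fires."""
    if state == "ALARM":
        notify("threshold breached")
        remediate()

# Hypothetical CPU readings, one per monitoring period.
readings = [50, 85, 90, 95]
state = alarm_state(readings, threshold=80, periods=3)
respond(state, notify=print, remediate=lambda: print("replacing failed component"))
```

Requiring several consecutive breaches is one way to avoid reacting to a single noisy datapoint.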

REL 5  How do you implement change?

Uncontrolled changes to your environment make it difficult to predict the effect of a change. Controlled changes to provisioned AWS resources and workloads are necessary to ensure that the workloads and the operating environment are running known software and can be patched or replaced in a predictable manner.

Best practices:

• Changes are deployed with automation: Deployments and patching are automated.

Failure Management

REL 6  How do you back up data?

Back up data, applications, and operating environments (defined as operating systems configured with applications) to meet requirements for mean time to recovery (MTTR) and recovery point objectives (RPO).

Best practices:

• Data is backed up manually: Important data is backed up using Amazon S3, Amazon EBS snapshots, or third-party software to meet RPO.

• Data is backed up using automated processes: Automate backups using AWS features (for example, snapshots of Amazon RDS and Amazon EBS, versions on Amazon S3, etc.), AWS Marketplace solutions, or third-party solutions.

• Periodic recovery of the data is done to verify backup integrity and processes: Validate that your backup process implementation meets Recovery Time Objective and Recovery Point Objective through a recovery test.

• Backups are secured and encrypted: See the AWS Security Best Practices whitepaper.
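
Whether a backup schedule meets an RPO comes down to the longest interval during which data could be lost. The sketch below computes that interval from a list of backup timestamps; the timestamps, the function names, and the 12-hour objective are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta

def worst_case_data_loss(backup_times, now):
    """Largest gap between consecutive backups, including the gap up to `now`."""
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))

def meets_rpo(backup_times, now, rpo):
    """True when no gap between backups exceeds the recovery point objective."""
    if not backup_times:
        return False
    return worst_case_data_loss(backup_times, now) <= rpo

# Hypothetical backup history for one day.
backups = [datetime(2018, 6, 1, 0), datetime(2018, 6, 1, 6), datetime(2018, 6, 1, 18)]
now = datetime(2018, 6, 1, 20)
print(worst_case_data_loss(backups, now))
print(meets_rpo(backups, now, rpo=timedelta(hours=12)))
```

A check like this covers only the backup interval; a recovery test is still needed to confirm that the backups can be restored within the RTO.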

REL 7  How does your system withstand component failures?

If your workloads have a requirement, implicit or explicit, for high availability and low mean time to recovery (MTTR), architect your workloads for resiliency and distribute your workloads to withstand outages.

Best practices:

• Monitoring is done at all layers of the workload to detect failures: Continuously monitor the health of your system and report degradation as well as complete failure.

• Deployed to multiple Availability Zones; Multiple AWS Regions if required: Distribute the workload across multiple Availability Zones and AWS Regions (for example, DNS, ELB, Application Load Balancer, API Gateway).

• Has loosely coupled dependencies: Dependencies such as queuing systems, streaming systems, workflows, and load balancers are loosely coupled.

• Has implemented graceful degradation: When a component’s dependencies are unhealthy, the component itself does not report as unhealthy. It can continue to serve requests in a degraded manner.

• Automated healing implemented on all layers: Use automated capabilities upon detection of failure to perform an action to remediate.

• Notifications are sent upon availability impacting events: Notifications are sent upon detection of any significant events, even if they were automatically healed.
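
Graceful degradation, as described in the list above, can be illustrated with a component that keeps answering when its dependency fails. This is a minimal sketch; the recommendation service, the `fetch` callable, and the in-memory cache are hypothetical stand-ins.

```python
def get_recommendations(user_id, fetch, cache):
    """Serve fresh data when the dependency is healthy, cached or empty data otherwise."""
    try:
        items = fetch(user_id)
        cache[user_id] = items  # remember the last good answer
        return {"items": items, "degraded": False}
    except Exception:
        # The dependency is unhealthy: answer from the cache instead of failing.
        return {"items": cache.get(user_id, []), "degraded": True}

def healthy(user_id):
    return ["item-a", "item-b"]

def failing(user_id):
    raise ConnectionError("dependency unavailable")

cache = {}
print(get_recommendations("u1", healthy, cache))   # fresh result
print(get_recommendations("u1", failing, cache))   # last good result, marked degraded
print(get_recommendations("u2", failing, cache))   # nothing cached, empty but still a response
```

The `degraded` flag lets monitoring record the event and send a notification even though the caller still received a response.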

REL 8  How do you test resilience?

Test the resilience of your workload to help you find latent bugs that only surface in production. Exercise these tests regularly.

Best practices:

• Use a playbook: Have a playbook for failure scenarios that have not been anticipated.

• Inject failures to test: Test failures regularly, ensuring coverage of failure pathways.

• Schedule game days: Use game days to regularly exercise your failure procedures.

• Conduct root cause analysis (RCA): Review system failures based on significant events to evaluate the architecture and identify the root cause. Have a method of communicating these causes to others as needed.
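
Failure injection can start small, at the level of a single call. The sketch below wraps a function so that chosen calls fail, then exercises a retry path against it. Both the `FaultInjector` class and the retry helper are hypothetical illustrations and are not part of any AWS tool.

```python
class FaultInjector:
    """Wrap a callable and raise on chosen call numbers (1-based)."""

    def __init__(self, func, fail_on):
        self.func = func
        self.fail_on = set(fail_on)
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        if self.calls in self.fail_on:
            raise RuntimeError(f"injected failure on call {self.calls}")
        return self.func(*args, **kwargs)

def call_with_retry(func, attempts):
    """The failure pathway under test: retry up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except RuntimeError as error:
            last_error = error
    raise last_error

# Fail the first two calls, then let the third succeed.
flaky = FaultInjector(lambda: "ok", fail_on=[1, 2])
print(call_with_retry(flaky, attempts=3), "after", flaky.calls, "calls")
```

Because the failures are scheduled rather than random, the same scenario can be replayed on every test run or game day.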

REL 9  How do you plan for disaster recovery?

Disaster recovery (DR) is critical should restoration of data be required from backup methods. Your definition of and execution on the objectives, resources, locations, and functions of this data must align with your RTO and RPO.

Best practices:

• Recovery objectives are defined: Recovery time objective (RTO) and recovery point objective (RPO) are defined.

• Recovery strategy is defined: A disaster recovery (DR) strategy has been defined to meet objectives.

• Configuration drift is managed: Ensure that AMIs and the system configuration state are up-to-date at the DR site or region, as well as the limits on AWS services.

• Test and validate disaster recovery implementation: Regularly test failover to DR to ensure that RTO and RPO are met.

• Recovery is automated: Use AWS or third-party tools to automate system recovery.
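
Configuration drift between the primary and DR environments can be detected by comparing recorded settings. The following is a minimal sketch; the setting names, AMI identifiers, and limit values are invented for illustration, and how the settings are collected is left out.

```python
def find_drift(primary, recovery):
    """Return {setting: (primary_value, recovery_value)} for every setting that differs."""
    keys = sorted(set(primary) | set(recovery))
    return {
        key: (primary.get(key), recovery.get(key))
        for key in keys
        if primary.get(key) != recovery.get(key)
    }

# Hypothetical recorded state of each environment.
primary = {"ami": "ami-app-v42", "instance_limit": 100, "db_engine": "5.7"}
recovery = {"ami": "ami-app-v40", "instance_limit": 20, "db_engine": "5.7"}

for setting, (expected, actual) in find_drift(primary, recovery).items():
    print(f"{setting}: primary={expected} dr={actual}")
```

An empty result means the recorded settings match; a setting missing from one side appears with `None` for that side.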

Performance Efficiency

Selection

PERF 1  How do you select the best performing architecture?

Often, multiple approaches are required to get optimal performance across a workload. Well-architected systems use multiple solutions and enable different features to improve performance.

Best practices:

• Benchmarking: Load test a known workload on AWS and use that to make the best selection.

• Load test: Deploy the latest version of your system on AWS using different resource types and sizes, use monitoring to capture performance metrics, and then make a selection based on a calculation of performance and cost.
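
The "calculation of performance and cost" mentioned above can be as simple as cost per unit of work. The sketch below ranks load test results that way; the resource names, hourly prices, and throughput figures are made-up placeholders and are not AWS prices.

```python
def cost_per_million_requests(hourly_price, requests_per_second):
    """Cost of serving one million requests at the measured sustained throughput."""
    requests_per_hour = requests_per_second * 3600
    return hourly_price / requests_per_hour * 1_000_000

def pick_best(results, min_rps):
    """Cheapest option per request among those that meet the throughput requirement."""
    eligible = [r for r in results if r["rps"] >= min_rps]
    if not eligible:
        raise ValueError("no tested option meets the throughput requirement")
    return min(eligible, key=lambda r: cost_per_million_requests(r["price"], r["rps"]))

# Hypothetical load test results: hourly price and sustained requests per second.
results = [
    {"name": "small", "price": 0.10, "rps": 200},
    {"name": "medium", "price": 0.20, "rps": 500},
    {"name": "large", "price": 0.40, "rps": 800},
]
print(pick_best(results, min_rps=150)["name"])
print(pick_best(results, min_rps=600)["name"])
```

Filtering on the requirement first keeps the cheapest-per-request option from winning when it cannot carry the load.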

PERF 2  How do you select your compute solution?

The optimal compute solution for a particular system varies based on application design, usage patterns, and configuration settings. Architectures may use different compute solutions for various components and enable different features to improve performance. Selecting the wrong compute solution for an architecture can lead to lower performance efficiency.

Best practices:

• Consider options: Consider the different options of using instances, containers, and functions to get the best performance.

• Consider instance configuration options: If you use instances, consider configuration options such as family, instance sizes, and features (GPU, I/O, burstable).

• Consider container configuration options: If you use containers, consider configuration options such as memory, CPU, and tenancy configuration of the container.

• Consider function configuration options: If you use functions, consider configuration options such as memory, runtime, and state.

• Use elasticity: Use elasticity (for example, AWS Auto Scaling, Amazon Elastic Container Service, and AWS Lambda) to meet changes in demand.

PERF 3  How do you select your storage solution?

The optimal storage solution for a system varies based on the kind of access method (block, file, or object), patterns of access (random or sequential), throughput required, frequency of access (online, offline, archival), frequency of update (WORM, dynamic), and availability and durability constraints. Well-architected systems use multiple storage solutions and enable different features to improve performance.

Best practices:

• Consider characteristics: Consider the different characteristics (for example, shareable, file size, cache size, access patterns, latency, throughput, persistence of data) you require to select the services you need to use, such as Amazon S3, Amazon EBS, Amazon Elastic File System (Amazon EFS), and Amazon EC2 instance store.

• Consider configuration options: Consider configuration options such as PIOPS, SSD, magnetic, and Amazon S3 Transfer Acceleration.

• Consider access patterns: Optimize for how you use storage systems based on access patterns (for example, striping, key distribution, and partitioning).
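
Key distribution, one of the access-pattern optimizations named above, is easiest to see with an example. The sketch below hashes sequential keys so that they spread across a fixed number of partitions rather than all landing together. It illustrates the general technique only; the key format and partition count are assumptions, and actual storage services manage partitioning in their own ways.

```python
import hashlib
from collections import Counter

def partition_for(key, partitions):
    """Map a key to a partition using a stable hash, so the result never changes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions

def spread(keys, partitions):
    """Count how many keys land on each partition."""
    counts = Counter(partition_for(key, partitions) for key in keys)
    return [counts.get(p, 0) for p in range(partitions)]

# Hypothetical sequential keys, which would otherwise sort next to each other.
keys = [f"order-{n}" for n in range(1000)]
print(spread(keys, partitions=4))
```

A stable hash is used instead of Python's built-in `hash()` because the built-in value for strings changes between interpreter runs.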

PERF 4  How do you select your database solution?

The optimal database solution for a system varies based on requirements for availability, consistency, partition tolerance, latency, durability, scalability, and query capability. Many systems use different database solutions for various sub-systems and enable different features to improve performance. Selecting the wrong database solution and features for a system can lead to lower performance efficiency.

Best practices:

• Consider characteristics: Consider the different characteristics (for example, availability, consistency, partition tolerance, latency, durability, scalability, and query capability) so that you can select the best performing database approach (for example, relational, NoSQL, warehouse, and in-memory).

• Consider configuration options: Consider configuration options such as storage optimization, database level settings, memory, and cache.

• Consider access patterns: Optimize how you use database systems based on your access patterns (for example, indexes, key distribution, partition, and horizontal scaling).

• Consider other approaches: Consider other approaches to provide queryable data, such as search indexes, data warehouses, and Big Data.

PERF 5  How do you configure your networking solution?

The optimal network solution for a system varies based on latency, throughput requirements, and so on. Physical constraints such as user or on-premises resources drive location options, which can be offset using edge techniques or resource placement.

Best practices:

• Consider location: Consider location options (for example, AWS Region, Availability Zone, placement groups, and edge locations) to reduce network latency.

• Consider service features: Consider service features to optimize network traffic; for example, EC2 instance network capability, Enhanced Networking, Amazon EBS-optimized instances, Amazon S3 Transfer Acceleration, and dynamic content delivery with Amazon CloudFront.

• Consider networking features: Consider networking features (for example, Amazon Route 53 latency-based routing, Amazon VPC endpoints, AWS Direct Connect) to reduce network distance or jitter.

Review

PERF 6  How do you evolve your workload to take advantage of new releases?

When architecting solutions, there is a finite set of options that you can choose from. However, over time, new technologies and approaches become available that could improve the performance of your architecture.

Best practices:

• Use process for evaluation: Have a process to evaluate new resource types and sizes. Re-run performance tests to evaluate any improvements in performance efficiency.

Monitoring

PERF 7  How do you monitor your resources to ensure they are performing as expected?

System performance can degrade over time. Monitor system performance to identify this degradation and remediate internal or external factors, such as the operating system or application load.

Best practices:

• Monitor: Use Amazon CloudWatch, third-party, or custom monitoring tools to monitor performance.

• Alarm-based notifications: Receive an automatic alert from your monitoring systems if metrics are out of safe bounds.

• Trigger-based actions: Set alarms that cause automated actions to remediate or escalate issues.

Tradeoffs

PERF 8  How do you use tradeoffs to improve performance?

When architecting solutions, actively considering tradeoffs enables you to select an optimal approach. Often you can improve performance by trading consistency, durability, and space for time and latency.

Best practices:

• Use services: Use services that improve performance, such as Amazon ElastiCache, Amazon CloudFront, and AWS Snowball.

• Use patterns: Use patterns to improve performance, such as caching, read replicas, sharding, compression, and buffering.
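
Caching is the most direct example of trading space for time. The sketch below uses Python's `functools.lru_cache` to keep recent results in memory so that repeated lookups skip the slow call; the `load_profile` function and the list that records backend calls are hypothetical stand-ins for a database or API request.

```python
from functools import lru_cache

backend_calls = []  # records each time the slow backend is actually reached

@lru_cache(maxsize=128)
def load_profile(user_id):
    """Return a profile, going to the backend only on a cache miss."""
    backend_calls.append(user_id)  # stands in for a slow database or API call
    return f"profile-for-{user_id}"

load_profile(1)
load_profile(1)  # served from memory, no backend call
load_profile(2)
print(backend_calls)
print(load_profile.cache_info())
```

The tradeoff is visible in the sketch: memory is spent on up to 128 cached results, and a cached value can be stale if the backend changes, which is the consistency cost described in the paragraph above.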

Cost Optimization

Cost-Effective Resources

COST 1  How do you evaluate cost when you select AWS services?

Amazon EC2, Amazon EBS, and Amazon S3 are building-block AWS services. Managed services, such as Amazon RDS and Amazon DynamoDB, are higher level, or application level, AWS services. By selecting the appropriate building blocks and managed services, you can optimize your architecture for cost. For example, using managed services, you can reduce or remove much of your administrative and operational overhead, freeing you to work on applications and business-related activities.

Best practices:

• Select services for cost reduction: Analyze the available services and select those that give the lowest price point.

• Optimize for license costs: Use open source software on services such as Linux for EC2 workloads, Aurora for Oracle database workloads, and Redshift for data analytics to reduce cost.

• Optimize using serverless and container-based approach: Use AWS Lambda, Amazon S3 for static websites, Amazon DynamoDB, and Amazon ECS to reduce overall business cost.

• Optimize using appropriate storage solutions: Use the most cost-effective storage solution based on usage patterns (for example, Amazon EBS cold storage, Amazon S3 Standard-Infrequent Access, and Amazon Glacier).

• Optimize using appropriate databases: Use Amazon RDS (Aurora, PostgreSQL, MySQL, SQL Server, Oracle Database) or Amazon DynamoDB (or other key-value stores, NoSQL alternatives) where appropriate.

• Optimize using other application-level services: Use Amazon SQS, Amazon SNS, and Amazon Simple Email Service (Amazon SES) where appropriate.

COST 2  How do you meet cost targets with resource type and size choices?

Ensure that you choose the appropriate AWS resource size for the task at hand. AWS encourages the use of benchmarking assessments to ensure that the type you choose is optimized for its workload.

Best practices:

• Metrics-driven resource sizing: Use performance metrics to select the right size and type to optimize for cost. Appropriately provision throughput, sizing, and storage for services such as Amazon EC2, Amazon DynamoDB, Amazon EBS (PIOPS), Amazon RDS, Amazon EMR, networking, and so on.

COST 3  How do you use pricing models to reduce cost?

Use the pricing model that is most appropriate for your workload to minimize expense. The optimal deployment could be fully On-Demand Instances, a mix of On-Demand and Reserved Instances, or you might include Spot Instances, where applicable.

Best practices:

• Reserved capacity and commit deals: Regularly analyze usage and purchase Reserved Instances accordingly; for example, Amazon EC2, Amazon DynamoDB, Amazon RDS, and Amazon CloudFront.

• Spot Instances: Use Spot Instances (for example, Spot block or fleet) for select workloads such as EC2 Auto Scaling, AWS Batch, and Amazon EMR.

• Consider region cost: Factor costs into AWS Region selection.
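
Analyzing usage before purchasing reserved capacity usually means finding the break-even point. The sketch below compares paying on demand for the hours actually used with paying a reserved rate for every hour of the term. The prices are invented placeholders, and the model ignores upfront payment options and other pricing details.

```python
def break_even_utilization(on_demand_hourly, reserved_effective_hourly):
    """Fraction of the term an instance must run before a reservation costs less."""
    return reserved_effective_hourly / on_demand_hourly

def cheaper_option(hours_used, hours_in_term, on_demand_hourly, reserved_effective_hourly):
    """Compare total cost for the term under each pricing model."""
    on_demand_cost = hours_used * on_demand_hourly
    reserved_cost = hours_in_term * reserved_effective_hourly  # paid whether used or not
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"

# Hypothetical prices in dollars per hour, over a one-year term.
HOURS_IN_YEAR = 8760
print(break_even_utilization(0.10, 0.06))
print(cheaper_option(7008, HOURS_IN_YEAR, 0.10, 0.06))   # running about 80% of the year
print(cheaper_option(2000, HOURS_IN_YEAR, 0.10, 0.06))   # running under a quarter of the year
```

With these placeholder prices, an instance needs to run roughly 60% of the term before the reservation pays off.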

COST 4  How do you plan for data transfer charges?

Ensure that you monitor data transfer charges so that you can make architectural decisions that might alleviate some of these costs. For example, if you are a content provider and have been serving content directly from an S3 bucket to your end users, you might be able to significantly reduce your costs if you push your content to the Amazon CloudFront content delivery network (CDN). Remember that a small yet effective architectural change can drastically reduce your operational costs.

Best practices:

• Optimize: Optimize application design, WAN acceleration, Multi-AZ, and AWS Region selection for data transfer.

• Use a content delivery network (CDN): Use a CDN where applicable.

• Use AWS Direct Connect: Analyze the situation and use AWS Direct Connect where applicable.

Matching supply and demand

COST 5  How do you match supply of resources with customer demand?

For an architecture that is balanced in terms of spend and performance, ensure that everything you pay for is used and avoid significantly underutilizing instances. A skewed utilization metric in either direction has an adverse impact on your business, in either operational costs (degraded performance due to over-utilization), or wasted AWS expenditures (due to over-provisioning).

Best practices:

• Demand-based approach: Use automatic scaling to respond to variable demand.

• Buffer-based approach: Buffer work (for example, using Amazon Kinesis or Amazon SQS) to defer work until there is sufficient capacity to process it.

• Time-based approach: Examples of a time-based approach include following the sun, turning off development and test instances over the weekend, following quarterly or annual schedules (for example, Black Friday).
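
The time-based approach reduces to a schedule that says when a resource should be running. The sketch below encodes a weekday, working-hours rule for development and test instances; the hours are assumptions, and the step that actually starts or stops instances is omitted.

```python
from datetime import datetime

def should_run(moment, start_hour=8, end_hour=18):
    """True on weekdays from start_hour (inclusive) to end_hour (exclusive)."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start_hour <= moment.hour < end_hour

print(should_run(datetime(2018, 6, 4, 10)))   # Monday morning
print(should_run(datetime(2018, 6, 2, 10)))   # Saturday morning
print(should_run(datetime(2018, 6, 4, 20)))   # Monday evening
```

A scheduler can evaluate a rule like this periodically and stop any instance for which it returns `False`.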

Expenditure Awareness

COST 6  How do you monitor usage and cost?

Establish policies and procedures to monitor, control, and appropriately assign your costs. Leverage tools provided by AWS for visibility into who is using what, and at what cost. This provides you with a deeper understanding of your business needs and your teams’ operations.

Best practices:

• Tag all resources: Tag all resources that can be tagged to be able to correlate changes in the AWS bill to changes in infrastructure and usage.

• Use billing and cost management tools: Have a standard process to load and interpret the detailed billing reports or use Cost Explorer. Monitor usage and spend regularly using Amazon CloudWatch or a third-party provider where applicable (for example, Cloudability, CloudCheckr, and CloudHealth).

• Notifications: Let key members of your team know if your spend moves outside of well-defined limits.

• Business outcome allocation: Use this method to allocate workload costs back to business outcomes and revenue; for example, by using tagging.
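
Tagging pays off when billing data is grouped by tag. The sketch below totals cost per tag value and flags groups that exceed a limit. The line-item layout, tag key, and budget figures are hypothetical and do not reflect the format of any AWS billing report.

```python
from collections import defaultdict

def cost_by_tag(line_items, tag_key):
    """Total cost per tag value; resources without the tag are grouped as "untagged"."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)

def over_budget(totals, budgets):
    """Names of groups whose spend has moved past their defined limit."""
    return sorted(name for name, spent in totals.items() if spent > budgets.get(name, float("inf")))

# Hypothetical billing line items.
items = [
    {"resource": "web-1", "cost": 120.0, "tags": {"team": "storefront"}},
    {"resource": "web-2", "cost": 30.0, "tags": {"team": "storefront"}},
    {"resource": "etl-1", "cost": 45.5, "tags": {"team": "analytics"}},
    {"resource": "vol-9", "cost": 10.0, "tags": {}},
]
totals = cost_by_tag(items, "team")
print(totals)
print(over_budget(totals, {"storefront": 100.0, "analytics": 50.0}))
```

The size of the "untagged" group is itself a useful measure of how completely resources have been tagged.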

COST 7  How do you govern AWS usage?

Establish policies and mechanisms to make sure that appropriate costs are incurred while objectives are achieved. By employing a checks-and-balances approach through tagging and IAM controls, you can innovate without overspending.

Best practices:

• Establish groups and roles: Use governance mechanisms to control who can spin up instances and resources in each group; for example, dev, test, and prod groups. This applies to AWS services and third-party solutions.

• Track project lifecycle: Track, measure, and audit the lifecycle of projects, teams, and environments to avoid using and paying for unnecessary resources.

COST 8  How do you decommission resources?

Implement change control and resource management from project inception to end-of-life so that you can identify necessary process changes or enhancements where appropriate. Work with AWS Support for recommendations on how to optimize your project for your workload: for example, when to use AWS Auto Scaling, AWS OpsWorks, AWS Data Pipeline, or the different Amazon EC2 provisioning approaches, or review AWS Trusted Advisor cost optimization recommendations.

Best practices:

• Automate decommission: Design your system to gracefully handle resource termination as you identify and decommission non-critical or unneeded resources with low utilization.

• Defined process: Have a process in place to identify and decommission orphaned resources.

Optimizing Over Time

COST 9  How do you evaluate new services?

As AWS releases new services and features, it is a best practice to review your existing architectural decisions to ensure they continue to be the most cost effective.

Best practices:

• Scheduled reviews: Meet regularly with an AWS Solutions Architect, consultant, or account team, and consider which new services or features provide lower overall cost.

• Establish a cost optimization function: Create a team that regularly reviews cost and usage across the business.

• Review and analyze workload: Have a process for reviewing new services, resource types, and sizes. Re-run performance tests to evaluate any reduction in cost.
