Page 1
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Development Workflows with Docker and Amazon ECS
Jon Todd, Chief Architect, Okta
Tim Secor, Manager of Developer Productivity, Okta
Danielle Greshock, Manager, Solutions Architecture, AWS
CON302
December 1, 2016
Page 2
What to Expect from the Session
• Review the CI/CD Pipeline
• How would you use containers with CI/CD?
• Okta Engineering: How they work and ship code
• CI with Docker and ECS
Page 3
The Continuous Everything… Nirvana
Goal, Design, Develop, Deploy, Test, Run and monitor
Continuous integration
Continuous delivery
Continuous deployment
Continuous feedback
Page 4
Virtual machine vs. container
Page 5
Why Use Containers for Continuous Delivery?
• Roll out features as quickly as possible
• Predictable and reproducible environment
• They are immutable! They will run the same in every
environment
• Fast feedback
Page 6
The Lifecycle:
Stage 1 – Source
Page 7
Docker and Docker Toolbox
• Docker (Linux kernel 3.10 or later)
• Docker Toolbox or Docker Beta (OS X, Windows)
• Define app environment with Dockerfile
Page 8
Dockerfile
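# Base image with Ruby 2.2.2 preinstalled
FROM ruby:2.2.2
# System packages needed to compile native gem extensions
RUN apt-get update -qq && apt-get install -y build-essential libpq-dev
RUN mkdir -p /opt/web
# Install gems before adding the app so this layer stays cached until the Gemfile changes
WORKDIR /tmp
ADD Gemfile /tmp/
ADD Gemfile.lock /tmp/
RUN bundle install
# Add the application code
ADD . /opt/web
WORKDIR /opt/web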
Page 9
Docker Compose
Define and run multi-container applications:
1. Define app environment with Dockerfile
2. Define services that make up your app in docker-compose.yml
3. Run docker-compose up to start and run entire app
Page 10
The Lifecycle:
Stage 2 – Build
Page 11
Containers as Build Execution Environment
Page 12
Containers as Build Artifacts
Page 13
Amazon EC2 Container Registry
• Security
  • IAM resource-based policies
  • CloudTrail audit logs
  • Images encrypted in transit and at rest
• Easily manage & deploy images
  • Tight integration with ECS
  • Integration with Docker toolset
  • AWS Management Console & AWS CLI
• Reliability & performance
  • S3-backed
Page 14
The Lifecycle:
Stage 3 – Test
Page 15
Running Tests Inside a Container
Usual Docker commands available within your test
environment
Run the container with the commands necessary to
execute your tests, e.g.:
docker run web bundle exec rake test
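The command above can also be driven from a build script. The following is a minimal sketch, assuming an image tagged web as in the example. The helper names docker_test_command and run_tests are hypothetical, and the --rm flag is an addition so that the container is removed once the tests finish.

```python
import subprocess


def docker_test_command(image, test_cmd):
    """Build the argument list for running a test command in a fresh container.

    --rm removes the container when the command exits, so every run starts clean.
    """
    return ["docker", "run", "--rm", image] + list(test_cmd)


def run_tests(image, test_cmd):
    """Run the tests in a container and return its exit code (0 means success)."""
    return subprocess.run(docker_test_command(image, test_cmd)).returncode
```

Calling run_tests("web", ["bundle", "exec", "rake", "test"]) is equivalent to the slide's command with --rm added; a CI job would fail the build when the returned exit code is non-zero.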
Page 16
Running Tests Against a Container
Start a container running in detached mode with an
exposed port serving your app
Run browser tests or other black box tests against the
container, e.g., headless browser tests
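Black box tests should only start once the app inside the container is accepting connections. The following is a minimal sketch of that wait step, assuming the container was started with something like docker run -d -p 8080:80 web (the port numbers are illustrative). The helper name wait_for_port is hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the published port is being served.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A test runner could call wait_for_port("localhost", 8080) and launch the browser tests only when it returns True.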
Page 17
The Lifecycle:
Stage 4 – Deploy
Page 18
Amazon EC2 Container Service
• Highly scalable container management service
• Easily manage clusters for any scale
• Flexible container placement
• Integrated with other AWS services
• Extensible
• ECS concepts
  • Cluster and container instances
  • Task definition and task
Page 19
AWS Elastic Beanstalk
• Deploy and manage applications without worrying about
the infrastructure
• Elastic Beanstalk manages your database, Elastic Load
Balancing, ECS cluster, monitoring, and logging
• Docker support
  • Single container (on EC2)
  • Multi container (on ECS)
Page 20
Amazon ECS CLI
• Easily create ECS clusters & supporting resources
such as EC2 instances
• Run Docker Compose configuration files on ECS
• Available today – http://amzn.to/1jBf45a
Page 21
Continuous Delivery
Workflows
Page 22
Continuous Delivery To ECS with Jenkins
1. Code push triggers build
2. Build image from sources
3. Run test on image
4. Push image to Docker registry
5. Update service
6. Pull image
Page 23
Continuous Delivery To ECS with Jenkins
Easy deployment
Developers – Merge into master, done!
Jenkins build steps
Trigger via webhooks, monitoring, Lambda
Build Docker image via Build and Publish plugin
Push Docker image into registry
Register updated job with ECS API
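The last build step can be sketched as two ECS API calls, reading "register updated job" as registering a new task definition revision (an interpretation, not something the slide spells out). The function name deploy_new_image is hypothetical; the ecs argument is assumed to behave like boto3.client("ecs"), whose register_task_definition and update_service calls are used here.

```python
def deploy_new_image(ecs, cluster, service, family, container_definitions):
    """Register a new task definition revision and point the service at it.

    `ecs` is assumed to behave like boto3.client("ecs").
    """
    registered = ecs.register_task_definition(
        family=family,
        containerDefinitions=container_definitions,
    )
    task_def_arn = registered["taskDefinition"]["taskDefinitionArn"]
    # Updating the service makes ECS start tasks from the new revision.
    ecs.update_service(cluster=cluster, service=service, taskDefinition=task_def_arn)
    return task_def_arn
```

Passing the client in as an argument keeps the step easy to exercise with a stub before it is wired into a Jenkins job.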
Page 24
Continuous Delivery To ECS with CodePipeline
1. Code push triggers pipeline
2. Lambda function creates EC2 instance
3. Image is built and pushed to ECR
4. Lambda function terminates EC2 instance
5. Lambda function deploys new task revision to ECS
Page 25
Continuous Delivery To ECS with CodePipeline
• Lambda custom actions
  • Create and terminate EC2 instance
  • Update ECS service
• EC2 instance uses user data to build an image and push it to ECR
Page 26
Continuous Delivery To ECS with Shippable
Page 28
Millions of People Use Okta Every Day
Page 29
An identity platform for developers
1. Connect to any data source
Page 30
© Okta and/or its affiliates. All rights reserved.
An identity platform for developers
2. Customizable login w/ MFA
Page 31
An identity platform for developers
3. Support all application types w/
modern identity standards
Page 32
An identity platform for developers
Learn more at: developer.okta.com
Page 33
The case for ECS & Docker
Page 34
The problem
Inspired by: http://dev2ops.org/2010/02/what-is-devops/
[Diagram 1: Dev ("I want change") and Ops ("I want stability") separated by a wall of turmoil]
[Diagram 2: Dev and Ops separated by a domain boundary made up of container frameworks, cluster scheduler, and continuous integration]
Page 35
Options
Container frameworks: LXC
Cluster schedulers: Amazon ECS
Page 36
Okta’s CI with ECS
Page 38
Okta Engineering—How Do We Work, How Do
We Ship Our Code?
• 200 engineers, split into teams with embedded
specialists
• One-week sprints, with weekly deploys to production
• Capability to do more than one hotfix per day at
customers’ request or for bugs found in CI or pre-prod
• Every merge to master is a potential release candidate
Page 39
Okta Engineering—How Do We Test Our
Code?
• Every topic branch goes through the same rigor in testing as release candidates.
• Passing automated tests is enforced at commit time.
• Largest repo: 33K tests, takes 60 minutes (22 parallel
runs)
• Smallest repo: 100 tests, 5 minutes
• The Developer Productivity team is responsible for
supporting engineering.
Page 40
Challenge of Developer Productivity Team
• Developer experience
• Quality
• Cost
• Cloud first
Page 41
Challenge of Developer Productivity Team
• Developer experience: developers expect fast turnaround time and reliable results
Page 42
Challenge of Developer Productivity Team
• Quality: we need to run all the tests required to guarantee quality
Page 43
Challenge of Developer Productivity Team
• Cost: we need to run an infrastructure which is as cost-effective as possible
Page 44
Challenge of Developer Productivity Team
• Cloud first: we aim to use cloud services first, wherever possible
Page 46
CI Using Open Source, Monolithic Applications
Page 48
Vision
• Clean testing environments
• Dynamic worker scaling
• Spot Instances for cost
• Versioned testing
• Improved queuing system
• Less infrastructure flakiness
• The correct privileges, to
maintain security
Page 49
Vision
• Clean testing environments
  • Isolate test environments from one another, in both parallel and serial runs
Page 50
Vision
• Dynamic worker scaling
  • Workers should survive the loss of their build server
  • Worker pool should scale quickly
  • Number of workers should not affect memory footprint of build server
Page 51
Vision
• Spot Instances for cost
  • Run our services at cheaper rates: we have many short-lived tasks and can tolerate a few failures
Page 52
Vision
• Versioned testing
  • Enable testing of infrastructure changes in topic branches
Page 53
Vision
• Improved queuing system
  • Should survive build server reboots
  • Shouldn’t be tied to specific workers or build servers
  • Centralized
  • Should have good visibility
  • Re-queuing of lost tasks
Page 54
Vision
• Less infrastructure flakiness
  • Push testing and creation of test machines to developers
Page 55
Vision
• The correct privileges, to maintain security
  • Launch tasks in secure environments
Page 58
ECS and Docker
• AWS + Java app tailored to Okta process
• Immutable and disposable build workers—created for
one-time use, destroyed when job is done
• Near ZERO cost on weekends, scales with load
• ECS allows us to maximize usage of EC2 instances
• Same containers for multiple types and numbers of
builds
• Same AMI can run multiple Docker images
Page 59
Amazon ECS
IAM separation per service
• Either one service per cluster, or use the new IAM functionality for ECS
Sharing the Docker daemon to allow running Docker within Docker
Pre-fetching large data blobs and making them available on the hosts is an option
Multiple containers: MySQL, Redis, Kinesalite
Page 60
Docker Update
• Update Dockerfile and our CI system builds the new image,
uploading it to our repository
• Update task definition for cluster updates
Page 61
Docker Conventions
• Dockerfiles live with project code, versioned together
• docker-compose used for development, so a clone plus
build will have a full service running locally
• Single repo for library and third-party service definitions
• Secrets or any form of config are NEVER baked into containers
• Start from minimal, audited base OS
• Strict rules around “FROM” clause
• Build owns creating immutable version and publishing
Page 62
Docker Build Process
Page 63
Task Definitions
{
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:262205085595:task-definition/base-container-box-task:1",
  "containerDefinitions": [
    {
      "memory": 15000,
      "essential": true,
      "mountPoints": [
        {
          "containerPath": "/usr/bin/docker",
          "sourceVolume": "docker_daemon",
          "readOnly": null
        },
        {
          "containerPath": "/var/run/docker.sock",
          "sourceVolume": "docker_socket",
          "readOnly": null
        }
Page 64
Task Definitions
      ]
    }
  ],
  "volumes": [
    {
      "host": {
        "sourcePath": "/var/run/docker.sock"
      },
      "name": "docker_socket"
    },
    {
      "host": {
        "sourcePath": "/usr/bin/docker"
      },
      "name": "docker_daemon"
    }
  ],
  "family": "base-container-box-task"
}
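In the task definition above, each mount point refers by name to a host volume declared under "volumes", which is how the Docker binary and socket of the host are shared with the container. The following is a minimal sketch of a consistency check for that pairing; the helper name undeclared_volumes is hypothetical, and TASK_DEFINITION is a trimmed copy of the excerpt.

```python
TASK_DEFINITION = {
    "family": "base-container-box-task",
    "containerDefinitions": [
        {
            "memory": 15000,
            "essential": True,
            "mountPoints": [
                {"containerPath": "/usr/bin/docker", "sourceVolume": "docker_daemon"},
                {"containerPath": "/var/run/docker.sock", "sourceVolume": "docker_socket"},
            ],
        }
    ],
    "volumes": [
        {"name": "docker_socket", "host": {"sourcePath": "/var/run/docker.sock"}},
        {"name": "docker_daemon", "host": {"sourcePath": "/usr/bin/docker"}},
    ],
}


def undeclared_volumes(task_definition):
    """Return sourceVolume names used by mount points but missing from "volumes"."""
    declared = {v["name"] for v in task_definition.get("volumes", [])}
    missing = set()
    for container in task_definition.get("containerDefinitions", []):
        for mount in container.get("mountPoints", []):
            if mount["sourceVolume"] not in declared:
                missing.add(mount["sourceVolume"])
    return sorted(missing)
```

Running a check like this in CI before registering a task definition would catch a renamed or deleted volume early.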
Page 65
Clean Testing Environments
• Docker images
• Nearly instant machine refresh
• Easy for users to create and upload images that have
been tested to work locally
• Efficient machine use
• ECS with ECR and private repository back end
Page 66
Dynamic Worker Scaling
[Diagram: SQS, SNS, Lambda, and ECS; Lambda handles scaling and bin packing]
Page 67
Dynamic Worker Scaling
Lambda allocates jobs using bin packing
This is one of the changes we had to make in order to use ECS for long-running tasks, rather than services spread across many stateless instances
Lambda disconnects unneeded nodes from the cluster, allowing them to self-terminate when they are idle
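The idea of bin packing jobs onto instances can be sketched as follows. The slides do not say which heuristic the Lambda function uses, so first-fit decreasing on memory is an assumption here, and the function name pack_jobs and the example sizes are hypothetical.

```python
def pack_jobs(jobs, instance_capacity):
    """Assign jobs to as few instances as possible using first-fit decreasing.

    jobs: dict mapping job name -> memory required (MiB)
    instance_capacity: memory available on each instance (MiB)
    Returns a list of instances, each a list of job names.
    """
    instances = []  # each entry: {"free": remaining MiB, "jobs": [names]}
    # Place the largest jobs first; small jobs then fill the remaining gaps.
    for name, need in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        if need > instance_capacity:
            raise ValueError(f"{name} needs {need} MiB, more than one instance holds")
        for inst in instances:
            if inst["free"] >= need:
                inst["free"] -= need
                inst["jobs"].append(name)
                break
        else:
            # No existing instance has room, so open a new one.
            instances.append({"free": instance_capacity - need, "jobs": [name]})
    return [inst["jobs"] for inst in instances]
```

Packing tightly, instead of spreading tasks evenly, leaves whole instances empty, and those are the ones that can be disconnected from the cluster and left to terminate.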
Page 72
Dynamic Worker Scaling
Page 73
Spot Instances
• We use Spot Instances across all Availability Zones
• Manually switch between On-Demand and Spot
Instances 3 times per week during Spot price spikes
• We are planning on moving to Spot Fleet soon
• We set our bid at the On-Demand price, so we lose build workers whenever the Spot price rises above it
• 4000-6000 instance hours per day, about 1500 Spot
losses per week
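The figures above imply roughly one Spot loss for every 19 to 28 instance-hours of work. The following is a small worked calculation, assuming the daily instance-hours hold for all seven days of the week; the function name is hypothetical.

```python
def hours_per_spot_loss(instance_hours_per_day, losses_per_week):
    """Average instance-hours of work between two Spot interruptions."""
    return instance_hours_per_day * 7 / losses_per_week


low = hours_per_spot_loss(4000, 1500)   # 28,000 instance-hours / 1,500 losses
high = hours_per_spot_loss(6000, 1500)  # 42,000 instance-hours / 1,500 losses
```

With builds that take an hour or less, an interruption rate in this range affects only a small share of jobs, which is why re-queuing lost tasks is enough to absorb it.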
Page 77
Versioned Jobs
Scripts are checked into repositories, which makes a transition to Docker jobs easy
Page 78
Versioned Jobs with ECS
• Versioned build and test scripts can now be run in
versioned Docker containers, using versioned task
definitions
• Creates extreme flexibility
• CloudFormation allows us to stand up whole new
clusters with all different versions in a matter of minutes
for long term testing
Page 79
ECS + Docker Problems
• Docker containers not launching
• ECS agent failing
• Docker containers stopping
• Incompatibility with certain services
• Docker OS availability
• Cleanup - AWS has made this configurable
• Image size
Page 80
Amazon Web Services
EC2, SQS, Lambda, ECS, S3, RDS, Amazon Kinesis, Spot Instances, ECR, CloudFormation, SNS, CloudWatch, CloudTrail
Page 81
Building CI with Amazon Web Services
Page 83
Expand Use
• Use ECS for more services
• Allow developers to control their test suites and Docker
images more directly
• Developer environments
  • Use Docker for local long-running services
  • Use a VM running the same OS version
  • Remote updates to keep it in line with CD system
  • Aim to enable running CD containers right out of the box
Page 84
ECS Services In Production
Page 85
Requirements
• Support for our multi-AZ & multi-region architecture
• Compliance – SOC2 type 2, HIPAA, ISO 27001, FedRAMP
• Least-privilege principle - independent IAM roles per service
• Host to host encryption
• Deployment support for:
• Rollback
• Canary
• Blue-green
• 0-downtime deployments
Page 87
0-Downtime Testing
https://github.com/jontodd/aries
Page 88
Test Assumptions
• ECS config
  • Agent version 1.11.0
  • Docker version 1.11.2
• Cluster config
  • 8 instances backed by ASG
• ASG config
  • 8 instances across 3 AZs
  • Default termination policy
  • 5 min health check grace period
• ELB
  • Timeout 4s
  • Interval 5s
  • Unhealthy threshold 2
  • Healthy threshold 10
  • Enable connection draining 300s timeout
• Load generation
  • 16 threads
• Throughput
  • Interactive ➔ 490 r/s
  • 10s long poll ➔ 1.5 r/s
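The ELB settings above determine how quickly an instance leaves or rejoins rotation. As a rough approximation that ignores the check timeout and the timing of the first check, the delay is the check interval multiplied by the threshold. The function name below is hypothetical.

```python
def seconds_to_state_change(interval_s, threshold):
    """Approximate time for the load balancer to flip an instance's health state.

    It needs `threshold` consecutive check results, one per interval.
    """
    return interval_s * threshold


to_unhealthy = seconds_to_state_change(5, 2)   # interval 5s, unhealthy threshold 2
to_healthy = seconds_to_state_change(5, 10)    # interval 5s, healthy threshold 10
```

Under these assumptions a failing instance keeps receiving traffic for about 10 seconds, and a recovered one waits about 50 seconds before it is back in service.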
Page 89
Operation | Interactive errors (~70ms latency, 490rps) | Long poll errors (~10s latency, 1.5rps)
Upsize ECS service 4 → 8 | 0 | 0
Downsize ECS service 8 → 4 | 0 | 0
Deploy ECS service – 50% min healthy | 0 | 0
Stop task* | 0 | 0
Downsize Auto Scaling group | 0 | 0
Terminate EC2 instance | 0 | 0
Stop Docker daemon (service docker stop)* | 0 | 0
Stop EC2 instance** | 0 | 0
Kill Docker container (docker kill <containerId>)* | 2 | 2
Fail health check | 450 | 5
* No intention of running operation in practice
** Caused inconsistent state
Page 90
Workflow
[Diagram components]
Dev: Git repo (Dockerfile, docker_compose.yml), CI pipeline, Docker registry (Artifactory) holding promoted artifacts
Test / Preview / Production: Conductor (bastion ECS controller), application YAML, ECS cluster with ECS service and ECS canary service, Auto Scaling group, launch config, EC2, ELB
Labels: "Deploy new version", "Images pulled when tasks start"
Page 91
Application definition
• Developers define YAML for
their application
• Deploy time configuration is
supplied to the ECS task
definition
• Secrets are pulled by the
application at startup
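Supplying deploy-time configuration to a task definition can be sketched as merging key/value pairs into the "environment" list of a container definition. The schema of Okta's application YAML is not shown in the slides, so a plain dict stands in for it here, and the function name with_deploy_config is hypothetical.

```python
import copy


def with_deploy_config(container_definition, config):
    """Return a copy of an ECS container definition with config added as environment variables.

    Existing variables with the same name are overridden. Secrets are deliberately
    not handled here; per the slide, the application pulls them itself at startup.
    """
    merged = {e["name"]: e["value"] for e in container_definition.get("environment", [])}
    merged.update({k: str(v) for k, v in config.items()})
    result = copy.deepcopy(container_definition)
    # ECS expects environment variables as a list of name/value string pairs.
    result["environment"] = [{"name": k, "value": v} for k, v in sorted(merged.items())]
    return result
```

Returning a copy keeps the image and base definition identical across Test, Preview, and Production, with only the injected values differing per environment.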
Page 93
Feature requests
• Dynamic port mapping (Application load balancing)
• Service autoscaling
• Per-container IAM roles
• Per-container security groups
• Bin-packing scheduler
Page 94
Lessons learned
• /etc/ecs/ecs.config
  • ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION for forensics (default 1hr)
  • ECS_LOGLEVEL=debug
• Tune ELB health check
• Docker 1.10 for security enhancements
• Canary & blue/green: separate services attached to the same ELB
• ECS is incredibly easy to get up and running
• The ecosystem is changing quickly
Page 95
Thank you!
Jon Todd – @JonToddDotCom
Tim Secor - @TimSecor
Danielle Greshock – [email protected]
Page 96
Remember to complete
your evaluations!