Smart Platform Infrastructure
How we are learning to let our team sleep at night
James Huston, DevOpsDays Charlotte
February 2017
whoami
• James Huston - Director of Platform Engineering @ Red Ventures
• Over the last 20 years I have been on teams that:
• Tried a lot of things, some worked, some didn’t
• Learned a lot of do’s and don’ts
The Team
Thomas Hopkins, Ryan Ruscett
Alfonso Cabrera, Garrett Johnson, Mike Guthrie
So what do I have to share?
• Sleep
• Operations -vs- Platform Ops
• Infrastructure (AWS)
• Monitoring and Alerting
• Security
• Workflows
• Documentation
• Docker
Sleep
• Our jobs are 24/7/365
• Small teams
• Resource bound
• To be successful, we need sleep
Operations -vs- Platform Ops
Operations:
• Deeper knowledge
• Correct -vs- Fast
• Snowflakes?
Platform Ops:
• Wide breadth of knowledge
• Fast turnaround, or self-service
• Automate all the things
Platform Ops
The platform enables developers to safely and consistently perform their own operations and build resilient, secure applications.
Infrastructure
• Traditional Operations - Healthy Infrastructure
• Linux in your datacenter
• Apps on top of that
• Platform Ops - Healthy Applications
• AWS/Azure/Google
• Managed services
• Apps on top of that
Monitoring and Alerting
• You are likely underestimating their importance
• Integrate them from the beginning; don’t bolt them on
• Make sure your alerts go to the correct people
• Don’t create alerts that you are going to ignore!
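One way to make sure alerts reach the correct people, and to keep alerts nobody will act on out of the pager, is an explicit routing table. A minimal sketch, assuming a hypothetical ownership map and alert shape (OWNERS, FALLBACK, Alert and route are illustrative names, not from any particular tool); in practice a service such as PagerDuty usually holds this mapping.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ownership table: service name -> on-call rotation to page.
OWNERS = {
    "checkout-api": "payments-oncall",
    "search": "search-oncall",
}

# Hypothetical catch-all rotation for services with no listed owner.
FALLBACK = "platform-oncall"


@dataclass
class Alert:
    service: str
    message: str
    actionable: bool  # does a human need to act on this right now?


def route(alert: Alert) -> Optional[str]:
    """Return the rotation to page, or None if the alert should not page anyone."""
    if not alert.actionable:
        # Non-actionable alerts belong on a dashboard, not in someone's pager.
        return None
    return OWNERS.get(alert.service, FALLBACK)
```

Keeping the fallback rotation explicit means an alert from an unowned service still pages someone instead of silently going nowhere.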
Infrastructure Layout
[Diagram: Staging and Production environments]
Our Infrastructure
Infrastructure - Why Is It Important?
• Take advantage of Auto Scaling for both scaling and self-healing
• Design to be secure from the start
• Design with monitoring and alerting built in
• Build your infrastructure in a standard, documented, reproducible way
Immutable Infrastructure
• First line of debugging: remove the machine and let it get replaced
• Avoid snowflakes/unicorns as much as possible
• Replace (don’t patch) machines for security reasons
• Easy to implement (in the cloud, anyhow)
• Salt/Chef/Puppet - use them for initial configuration; don’t push changes to running machines
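The "remove the machine and let it get replaced" step can be scripted. A minimal sketch, assuming the instance belongs to an EC2 Auto Scaling group and that the caller passes in a boto3 Auto Scaling client; terminate_instance_in_auto_scaling_group is a real method on that client, while replace_instance is a hypothetical helper name.

```python
from typing import Any, Dict


def replace_instance(autoscaling_client: Any, instance_id: str) -> Dict[str, Any]:
    """Terminate one instance so its Auto Scaling group launches a fresh one.

    ShouldDecrementDesiredCapacity=False leaves the group's desired capacity
    unchanged, so the group replaces the instance instead of shrinking.
    """
    return autoscaling_client.terminate_instance_in_auto_scaling_group(
        InstanceId=instance_id,
        ShouldDecrementDesiredCapacity=False,
    )
```

Usage would look like replace_instance(boto3.client("autoscaling"), "i-0123456789abcdef0") with a hypothetical instance ID. Passing the client in, rather than creating it inside the helper, keeps it testable without AWS credentials.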
Program and Automate
• Build reproducible, repeatable infrastructure
• Team review of changes before they are made
• Pull requests
• Easy Rollback
• Shareable and reusable modules
• https://github.com/segmentio/stack
Terraform
• Plays nice with Most of the Things
• Multiple cloud providers, VMware, OpenStack
• Grafana, DataDog, New Relic, PagerDuty, Logentries
• MySQL, PostgreSQL
• Program all the things - Except Snowflakes
Terraform -vs- CloudFormation
Terraform:
• State
• Fast
• Admin Access
CloudFormation:
• No State
• Not so fast
• AWS Service Catalog
Security - SSO
• Don’t underestimate the power of the dark side, or your need to use Single Sign-On (SSO)
• Active Directory, LDAP, Okta for AWS/Apps
• JumpCloud or LDAP for EC2 instances
• Avoid tools that don’t support SSO (GitHub.com) in favor of tools that do (GitHub Enterprise)
Security
• Don’t share SSH keys among your team(s). Ever.
• 0.0.0.0/0 on a security group that isn’t attached to a public ELB? That’s likely bad.
• e.g. consider a future VPN or Direct Connect link
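A check for world-open rules is easy to automate. A minimal sketch, assuming ingress rules have already been flattened into a hypothetical IngressRule record (one group, port and CIDR each; real data could come from the EC2 DescribeSecurityGroups API) and that you already know which groups belong to public ELBs.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class IngressRule:
    # Hypothetical flattened rule: one security group, one port, one CIDR.
    group_id: str
    port: int
    cidr: str


def world_open_rules(rules: Iterable[IngressRule], public_lb_groups: Set[str]) -> List[IngressRule]:
    """Return rules open to the whole internet on groups that are not public ELBs."""
    flagged = []
    for rule in rules:
        network = ipaddress.ip_network(rule.cidr)
        # 0.0.0.0/0 and ::/0 both have a prefix length of 0 (every address).
        if network.prefixlen == 0 and rule.group_id not in public_lb_groups:
            flagged.append(rule)
    return flagged
```

Testing prefixlen == 0 catches both 0.0.0.0/0 and ::/0, so IPv6 rules aren't missed.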
Developer Workflows
• Automation is key
• Use standard tooling (Makefile, shell scripts, etc)
• Bamboo -vs- Jenkins
• Centralization
• Provide guardrails and let teams with the expertise control their own destiny
• Documentation of workflows is critically important
Documentation
• README.md - keep docs with your projects
• Centralize infrastructure, CI/CD, and other core docs
• Make it mandatory in governance
• Set a good example!
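Making docs mandatory is easier when CI enforces it. A minimal sketch, assuming a hypothetical repository layout with one project per top-level directory; projects_missing_readme is an illustrative name.

```python
from pathlib import Path
from typing import List


def projects_missing_readme(root: Path) -> List[str]:
    """Return names of project directories under root that have no README.md."""
    missing = []
    for project in sorted(root.iterdir()):
        # Only look at visible directories; skip files and things like .git.
        if not project.is_dir() or project.name.startswith("."):
            continue
        # Compare case-insensitively so README.md, Readme.md, etc. all count.
        names = {child.name.lower() for child in project.iterdir()}
        if "readme.md" not in names:
            missing.append(project.name)
    return missing
```

A CI job could fail the build whenever the returned list is non-empty, so the rule is enforced rather than just written down.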
Docker
Security info à la Jérôme Petazzoni (https://jpetazzo.github.io/) http://bit.ly/1t1DG3Q
Docker
• Don’t run things as root
• Update often!
• For real security, run all filesystems read-only
• Use small (Alpine, Debian) base images
• Use only approved images
• Update them often
• Windows? All of the above.
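Some of these rules can be checked before an image is ever built. A minimal sketch of a Dockerfile policy check, assuming a hypothetical allow-list of approved base images; it only flags unapproved FROM images and a final stage with no non-root USER, and it ignores line continuations. Read-only filesystems are a runtime setting rather than a Dockerfile one (e.g. docker run --read-only), so they are out of scope here.

```python
from typing import List, Set

# Hypothetical allow-list of approved base images.
APPROVED_BASES = {"alpine:3.5", "debian:jessie-slim"}


def lint_dockerfile(text: str, approved: Set[str] = APPROVED_BASES) -> List[str]:
    """Return policy problems found in a Dockerfile's text (simplified parser)."""
    problems = []
    user = "root"  # conservative default: assume root until a USER instruction is seen
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        instruction, _, arg = line.partition(" ")
        instruction = instruction.upper()
        if instruction == "FROM":
            words = arg.split()
            image = words[0] if words else ""
            if image not in approved:
                problems.append(f"unapproved base image: {image}")
            # Assume root again for each new stage (a base image could set its own USER).
            user = "root"
        elif instruction == "USER":
            # USER may be "name", "uid", or "name:group"; keep only the user part.
            user = arg.strip().split(":")[0]
    if user in ("root", "0"):
        problems.append("final stage runs as root; add a non-root USER")
    return problems
```

Running a check like this in CI turns "use only approved images" and "don't run as root" from guidelines into guardrails.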
Docker
• KISS - Keep It Simple Stupid!
Drumroll Please
The “Cloud” makes Platform Ops a reality. We can now program and automate “all the things” and we have the tools to make our infrastructure and applications maintain and heal themselves …
And we get to sleep at night
411
James Huston
Director of Platform Engineering @ Red Ventures
@hustonjs