ChatOps at Shopify

ChatOps at Shopify - USENIX · ChatOps at Shopify. Inviting Bots in our Day-to-Day OperationsYour. Production Engineering. 1200+ $40b Rails 40+ 500k+ 10k. ChatOps. Conversation Driven

Jun 14, 2020


dariahiddleston
Transcript
Page 1:

ChatOps at Shopify

Page 2:
Page 3:

Inviting Bots in our Day-to-Day Operations

Page 4:

Production Engineering

Page 5:

1200+ $40b

Rails 40+

500k+ 10k

Page 6:

ChatOps

Page 7:

Conversation Driven Development

Page 8:

Conversation Driven Development

How do you choose a Chat Bot?

Page 9:

ChatOps at Shopify

Page 10:

# defining a new command
command :find_answer, 'answer', help: 'the answer to life, universe, and everything'

def find_answer
  reply(42)
end

$ spy find_answer

42

$ spy help find_answer

the answer to life, universe, and everything

Adding Commands
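
The slide shows only the command definition, not the framework behind it. As a rough illustration of how such a DSL could be wired up, here is a minimal sketch. The names MiniBot, AnswerBot, dispatch, and the in-memory replies list are hypothetical and not taken from spy. The meaning of the second argument ('answer') is not stated on the slide, so the sketch stores it as an opaque label.

```ruby
# Hypothetical mini-framework; none of these class or method names come from spy.
class MiniBot
  Command = Struct.new(:handler, :label, :help)

  # Per-class registry mapping a command name to its handler and help text.
  def self.commands
    @commands ||= {}
  end

  # DSL entry point mirroring the slide: command :name, 'label', help: '...'
  def self.command(handler, label, help: '')
    commands[handler.to_s] = Command.new(handler, label, help)
  end

  attr_reader :replies

  def initialize
    @replies = []
  end

  # Collect replies in memory instead of posting them to a chat room.
  def reply(message)
    @replies << message.to_s
  end

  # Route one line of input, e.g. "find_answer" or "help find_answer".
  def dispatch(line)
    name, *args = line.split
    if name == 'help'
      entry = self.class.commands[args.first]
      reply(entry ? entry.help : "unknown command: #{args.first}")
    elsif (entry = self.class.commands[name])
      public_send(entry.handler, *args)
    else
      reply("unknown command: #{name}")
    end
    @replies.last
  end
end

class AnswerBot < MiniBot
  command :find_answer, 'answer', help: 'the answer to life, universe, and everything'

  def find_answer
    reply(42)
  end
end
```

With this sketch, AnswerBot.new.dispatch('find_answer') returns the string "42", and dispatch('help find_answer') returns the help text, matching the two chat exchanges on the slide.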

Page 11:

Infrastructure

Page 12:

[Diagram: two regions connected to the Internet, each showing edge routers, load balancers, hosts running web servers and job servers, and DB hosts (writer, reader, standby).]

A Global Scale Resilient Web App

CDN

Page 13:

● spy cdn show traffic

● spy cdn backend [region]

Page 14:

● spy nginx status

● spy profile nginx lua cpu

Page 15:

● spy revisions

● spy shops

● spy profile shopify

Page 16:

● spy resque [dc=x]

● spy job working [dc=x]

Page 17:

● spy shards

● spy shard load

Page 18:

Failovers

● spy failover shopify pod :pod to :location

Page 19:

A pod is an isolated set of shops.

Page 20:

[Diagram: Pods. A region with load balancers, hosts running web servers, shared workers, and pods (Pod 2, Pod 5, Pod 9, ... Pod N). Pod components shown: dedicated workers, Redis, Memcache, MySQL. Labels: Active, Passive.]

Page 21:

Pods

● spy pods

Page 22:

Failovers

● spy failover shopify pod :ids to :location
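
The command signatures in this deck use :placeholders such as :ids and :location. How spy parses them is not shown, so the following is only a sketch of one way such a pattern could be matched against chat input. The helpers compile_pattern and match_command are hypothetical, and so is the assumption that each placeholder captures one whitespace-free word.

```ruby
# Hypothetical helper: turns a pattern with :placeholders into an anchored regex.
def compile_pattern(pattern)
  source = pattern.split.map do |token|
    if token.start_with?(':')
      # Assumption: each placeholder captures exactly one whitespace-free word.
      "(?<#{token.delete_prefix(':')}>\\S+)"
    else
      Regexp.escape(token)
    end
  end.join('\s+')
  Regexp.new("\\A#{source}\\z")
end

# Returns a hash of captured placeholder values, or nil when the input does not match.
def match_command(pattern, input)
  match = compile_pattern(pattern).match(input.strip)
  match && match.named_captures
end
```

For example, matching the input "failover shopify pod 5,9 to east" (made-up values) against 'failover shopify pod :ids to :location' yields a hash with "ids" set to "5,9" and "location" set to "east". An input missing the "to :location" part yields nil.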

Page 23:

Failovers: User Authentication

Page 24:
Page 25:
Page 26:

And More...

● spy chef environment :environment :server

● spy newrelic :app

● spy datadog :metric

Page 27:

Incident Management

Page 28:
Page 29:

The Incident Manager On Call (IMOC)’s role is to lead the incident response.

Page 30:

➔ Shit breaks
➔ Detection
➔ Start Incident
➔ Communicate
➔ Fix
➔ Stop Incident
➔ Document (Service Disruption)
➔ Investigation
➔ Root Cause Analysis (RCA)
➔ Action Items
➔ Resolution

Incident Response

Page 31:

• spy page

• spy incident

• spy status

Page 32:

Shit Breaks

➔ spy page imoc “order notifications not going out”

Page 33:

Start Incident

➔ spy incident start me order fraud analysis outage

Page 34:

Communicate

➔ spy incident tldr

Page 35:

Communicate with other teams

➔ spy incident tell :team message

➔ spy page datastores

Page 36:

Actions

Page 37:

Third Party Services

➔ spy status

➔ spy status :provider :status for :feature

➔ spy pager imoc res 123

Page 38:

Reminders

- when: [30, stop]
  command: :check_status_page
- when: 120
  command: :notify_support_atc
  message: 'Spy has notified the Support Response Manager (SRM) on your behalf.'
- when: 120
  command: :srm_fill_out_doc
- when: 300
  message: 'You should coordinate external comms with the support incident responder.'
- when: 600
  command: :srm_checking_in
- when: [3600]
  command: :notify_imoc_team
- when: stop
  message: 'Please create a Service Disruptions report.
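
The slide does not say how these reminders are evaluated. One plausible reading, used here purely as an assumption, is that a numeric `when` is the number of seconds since the incident started and `stop` fires when the incident is stopped. The sketch below selects the reminders due for a given event under that reading. REMINDERS (a subset of the slide's entries), due_reminders, and the use of the symbol :stop for the bare `stop` are all illustrative choices, not spy's actual code.

```ruby
# Assumed data shape, transcribed from part of the slide; :stop stands in for the bare `stop`.
REMINDERS = [
  { when: [30, :stop], command: :check_status_page },
  { when: 120, command: :notify_support_atc },
  { when: 300, message: 'You should coordinate external comms with the support incident responder.' },
  { when: [3600], command: :notify_imoc_team }
].freeze

# Returns the reminders whose trigger equals the given event.
# An event is either elapsed seconds (Integer, assumed unit) or :stop.
def due_reminders(reminders, event)
  # Array() normalizes a single trigger (120) and a list ([30, :stop]) to the same shape.
  reminders.select { |reminder| Array(reminder[:when]).include?(event) }
end
```

Under these assumptions, due_reminders(REMINDERS, 120) selects the :notify_support_atc entry, and due_reminders(REMINDERS, :stop) selects the :check_status_page entry, which is listed for both 30 and stop.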

Page 39:

Stop Incident

➔ spy incident stop

Page 40:

Generating the SD report

➔ spy incident note &

Page 41:

spy helps us to reduce the impact and duration of incidents.

Page 42:

Developer Onboarding

Page 43:

Learn commands by seeing others execute them.

Page 44:

Hit The Ground Running

● spy github add user :user :team

● spy circle add my_new_shiny_project

● spy buildkite add my_new_shiny_repo

● spy shipit lock :stack *message

Page 45:

Deploying Code

Page 46:

Resiliency. What if Slack is down?

Page 47:

Benefits & Lessons Learned

Page 48:

● Increased sharing and focus

● Shortened feedback loop

● Eliminated manual toil

● Smoother incident handling

● Faster onboarding experience

But, we have also learned ...

Page 49:

Summary

Infrastructure

Incident Management

Developer Onboarding

Page 50:
Page 51:

Thanks! @niyodanie

Page 52:

Another Shopify Talk

Page 53:

Questions? @niyodanie