New tricks with IBM ChatOps: Achieve Site Reliability Engineering operations with legacy solutions

Feb 16, 2022


dariahiddleston
Transcript
Page 1: New tricks with IBM ChatOps : Achieve Site Reliability ...

New tricks with IBM ChatOps:

Achieve Site Reliability Engineering operations with legacy solutions

Cloud Service Management and Operations (CSMO)

Environment Ops

DevOps

Site Reliability Engineering

Service Management, ITIL, IT4IT, ZeroOutage

Page 2: New tricks with IBM ChatOps : Achieve Site Reliability ...

By nature, IT operations and developers have opposite goals

Both development and IT operations need to meet business demands:

Todd

Operations / Admin

Responsible for infrastructure, security, and management of the environment.

IT Operations and Administrators

• Quickly set up & manage modern, flexible, and compliant hybrid clouds

• Integrate with new & existing management tools and processes

Jane

Enterprise Developer

Responsible for modernizing existing applications and creating new Cloud Native Workloads.

Developers

• Rapidly create new applications, optimize existing ones, and securely connect their applications with data and services across all clouds

Page 5: New tricks with IBM ChatOps : Achieve Site Reliability ...

Jane

Enterprise Developer

Responsible for modernizing existing applications and creating new Cloud Native Workloads.

Both development and IT operations need to meet business demands:

Todd

Operations / Admin

Responsible for infrastructure, security, and management of the environment.

Change that can be managed

Stability that enables change

Unified, flexible, CSMO toolchain

Page 6: New tricks with IBM ChatOps : Achieve Site Reliability ...

Dedicated Ops tools vs ChatOps

Page 7: New tricks with IBM ChatOps : Achieve Site Reliability ...

Dedicated ITSM tools vs ChatOps

Instant collaboration between SMEs …

• Various Operations roles

• Developers

• Vendor / Provider

… and between Humans and Applications (ITSM, DevOps, etc.) through Bots

Solving through “Swarming”

Persistent audit of communication

Traditional Help Desk tools

L1

L2

L3, SMEs

Modern ChatOps tools

Plus

• Email

• Phone

• Bridge Calls

• Instant Messaging (Skype, WhatsApp)

• …

+ Physical war rooms

Microsoft Teams

Page 8: New tricks with IBM ChatOps : Achieve Site Reliability ...

Incident Management Tool Chain

Phases: Monitor (Multiple Clouds), Analyze, Plan, Execute

Tool chain components:

• Monitoring

• Event Management

• Notification

• Collaboration

• Runbooks

• Dashboards & Reporting

• Ticketing & Trending

• Logging

• Datalake & Analytics

• Tracing

• Incident

Page 9: New tricks with IBM ChatOps : Achieve Site Reliability ...

Incident Management Tool Chain with ChatOps

Phases: Monitor (Multiple Clouds), Analyze, Plan, Execute

Tool chain components:

• Monitoring

• Event Management

• Notification

• Collaboration

• Runbooks

• Dashboards & Reporting

• Ticketing & Trending

• Logging

• Datalake & Analytics

• Tracing

• Incident

Page 10: New tricks with IBM ChatOps : Achieve Site Reliability ...

What is ChatOps?

Operations

Chat Bots

Collaboration platform

Platform based

• Consumer to Business

• Business to Business

Dedicated app/website

• Consumer to Business

• Business to Business

• Individual app/website. No sharing or collaboration

Siloed Operations

Non-operational collaboration

CHATOPS

• Human-human collaboration

• Simple automations

• Advanced automations

Page 11: New tricks with IBM ChatOps : Achieve Site Reliability ...

IBM Cloud / ChatOps / November, 2018 / © 2018 IBM Corporation

Forrester Research: ChatBots for IT Operations 2019

Page 12: New tricks with IBM ChatOps : Achieve Site Reliability ...

The inherent benefits of ChatOps in modernizing operations, DevOps and SRE

Integrate your tools. Simplify your processes. Automate everything.

• Higher Transparency & Efficiency

• Lower Costs

• Better MTTR and other KPIs

For existing Operations personas:

• Operators

• Site Reliability Engineers

• Level 1-2-3 support

For new personas:

• Developers

• Managers

• Line of Business Owners

• Subject Matter Experts

Benefits

• Availability of tools in a familiar environment, improved ramp-up time and reduced friction to adopt new tools.

• Reduced dependence on traditional operations personas, more collaboration.

• Use your own tools in the new platform.

• Gain the benefits of a dynamic & flexible collaboration platform in parallel to traditional process-oriented tooling – more transparency.

Benefits

• Reduction of “Waste by Motion”: no needless context-switching between tools, easier and closer collaboration between humans.

• Reduction of “Waste by Transport”: no copy/paste between tools, faster access to information (swivel chair operations)

• More opportunity to learn from others.

Jane

Enterprise Developer

Todd

Operations / Admin

Page 13: New tricks with IBM ChatOps : Achieve Site Reliability ...


Entry points will be different from organization to organization. Some stages may be skipped.

Entry points into ChatOps

Page 14: New tricks with IBM ChatOps : Achieve Site Reliability ...

Before ChatOps

Ad-hoc human-human collaboration

Examples:

• Communication using phone calls, conference calls, WhatsApp/SMS, etc.

• Physical co-location for war rooms, etc.

Business Value:

• This is the way we’ve always done it, comfort zone

Silos: Operations Support level 1, 2, 3; Subject Matter Experts; Developers

CSMO tools: Netcool Operations Insight / Cloud Event Management, IBM Monitoring, Alert Notification, Prometheus, Grafana, 3rd Party Solutions, others…

Page 15: New tricks with IBM ChatOps : Achieve Site Reliability ...

ChatOps level 1

Collaboration tools: Slack, Mattermost, Microsoft Teams, others…

• Human-to-human collaboration. CSMO tooling remains the same.

• Slight change to processes:
  • Create dedicated channels for Sev1 incidents
  • Document activities in collaboration tool

• Business Value:
  • Persistent record of incident
  • Remote/virtual war room
  • Clear communications
  • Reduce Mean Time to Know

CSMO tools: Netcool Operations Insight / Cloud Event Management, IBM Monitoring, Alert Notification, Prometheus, Grafana, 3rd Party Solutions, others…
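The "dedicated channels for Sev1 incidents" step can be sketched as a small naming helper. This is a minimal sketch, not part of the original deck: the `sev1-<incident>-<summary>` convention and both function names are assumptions, and the request body targets Slack's `conversations.create` Web API method (other chat platforms need a different call).

```python
import re

MAX_CHANNEL_NAME = 80  # Slack limits channel names to 80 characters


def sev1_channel_name(incident_id: str, summary: str) -> str:
    """Build a chat-safe channel name from an incident id and summary."""
    raw = f"sev1-{incident_id}-{summary}".lower()
    # Keep only lowercase letters, digits, hyphens and underscores.
    slug = re.sub(r"[^a-z0-9_-]+", "-", raw)
    # Collapse repeated hyphens and trim them from both ends.
    slug = re.sub(r"-{2,}", "-", slug).strip("-")
    return slug[:MAX_CHANNEL_NAME].rstrip("-")


def create_channel_request(incident_id: str, summary: str) -> dict:
    """Return a JSON body for Slack's conversations.create method."""
    return {"name": sev1_channel_name(incident_id, summary), "is_private": False}
```

A predictable name keeps every Sev1 channel easy to find later, which supports the "persistent record of incident" value listed on this slide.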

Page 16: New tricks with IBM ChatOps : Achieve Site Reliability ...

ChatOps level 2a

Incoming integrations

Collaboration tools: Slack, Mattermost, Microsoft Teams, others…

• Human-to-human collaboration.

• Monitoring tools send notifications and information to collaboration tool.

• Slight change to processes:
  • Send event information to channel automatically
  • Send closing event when incident is resolved
  • Send notification of new deployment, issue, pull request, etc.

• Business Value:
  • Reduction of Mean Time To Detect, Mean Time To Inform

CSMO tools: Netcool Operations Insight / Cloud Event Management, IBM Monitoring, Alert Notification, Prometheus, Grafana, 3rd Party Solutions, others…
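An incoming integration of this kind might look like the sketch below, which turns an event into a chat message and posts it to an incoming webhook. The event dictionary shape is an assumption; its `Node`, `Summary` and `Severity` keys and the 0 to 5 severity scale follow Netcool conventions, and the webhook URL is whatever the chat platform issues for the channel.

```python
import json
import urllib.request

# Severity labels on the Netcool scale, 0 (clear) to 5 (critical).
SEVERITY_LABELS = {0: "Clear", 1: "Indeterminate", 2: "Warning",
                   3: "Minor", 4: "Major", 5: "Critical"}


def event_to_message(event: dict) -> dict:
    """Turn an event record into an incoming-webhook payload."""
    severity = SEVERITY_LABELS.get(event["Severity"], "Unknown")
    if event["Severity"] == 0:
        # A clear event doubles as the "closing event" for the channel.
        text = f"Resolved: {event['Summary']} on {event['Node']}"
    else:
        text = f"[{severity}] {event['Summary']} on {event['Node']}"
    return {"text": text}


def post_to_channel(webhook_url: str, event: dict) -> int:
    """POST the payload to the webhook URL and return the HTTP status."""
    body = json.dumps(event_to_message(event)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Keeping the message formatting separate from the HTTP call makes the formatting testable without a live webhook.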

Page 17: New tricks with IBM ChatOps : Achieve Site Reliability ...

ChatOps level 2b

Outgoing integrations

Collaboration tools: Slack, Mattermost, Microsoft Teams, others…

• Human-to-human collaboration.

• Pull data from monitoring tools into collaboration tool.

• Slight change to processes. Examples:
  • Query CMDB for information
  • Query Ticketing system
  • Query Metrics

• Business Value:
  • Reduce Mean Time to Identify, Mean Time to Know

CSMO tools: Netcool Operations Insight / Cloud Event Management, IBM Monitoring, Alert Notification, Prometheus, Grafana, 3rd Party Solutions, others…
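The query examples above can be illustrated with a small command dispatcher. This is a sketch under stated assumptions: the `cmdb` and `ticket` commands, the in-memory `CMDB` and `TICKETS` dictionaries and their fields are all hypothetical stand-ins for real back-end queries.

```python
# Hypothetical stand-ins for a CMDB and a ticketing system.
CMDB = {"db01": {"owner": "Database team", "service": "Payments"}}
TICKETS = {"INC0042": {"status": "In progress", "assignee": "Todd"}}


def lookup_ci(name: str) -> str:
    """Answer a 'cmdb <name>' query."""
    record = CMDB.get(name)
    if record is None:
        return f"No CMDB entry for {name}"
    return f"{name}: owner={record['owner']}, service={record['service']}"


def lookup_ticket(ticket_id: str) -> str:
    """Answer a 'ticket <id>' query."""
    record = TICKETS.get(ticket_id)
    if record is None:
        return f"No ticket {ticket_id}"
    return f"{ticket_id}: status={record['status']}, assignee={record['assignee']}"


COMMANDS = {"cmdb": lookup_ci, "ticket": lookup_ticket}


def handle_command(message: str) -> str:
    """Parse '<command> <argument>' and return the reply for the channel."""
    parts = message.split()
    if len(parts) != 2 or parts[0].lower() not in COMMANDS:
        return "Usage: " + " | ".join(f"{name} <id>" for name in sorted(COMMANDS))
    return COMMANDS[parts[0].lower()](parts[1])
```

Because the reply lands in the shared channel, everyone in the incident sees the same answer without opening the source tool.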

Page 18: New tricks with IBM ChatOps : Achieve Site Reliability ...

ChatOps level 3

Collaboration tools: Slack, Mattermost, Microsoft Teams, others…

• Human-to-human collaboration.

• Automated interactions with monitoring tools from within collaboration tools

• Larger process changes:
  • Update ticket/event status
  • Execute runbooks and view responses

• Business Value:
  • Reduce Mean Time to Identify, Mean Time to Know, Mean Time to Repair

CSMO tools: Netcool Operations Insight / Cloud Event Management, IBM Monitoring, Alert Notification, Prometheus, Grafana, 3rd Party Solutions, others…
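Updating an event and running a runbook from chat could be modelled as below. This is a minimal in-memory sketch: the `Event` class, the `RUNBOOKS` table and the `restart-web` runbook are hypothetical, and a real integration would call the event manager and runbook service instead.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    serial: int
    summary: str
    acknowledged: bool = False
    journal: list = field(default_factory=list)


# Hypothetical runbooks: each is a function that returns its output as text.
RUNBOOKS = {"restart-web": lambda: "web service restarted"}


def acknowledge(events: dict, serial: int, user: str) -> str:
    """Mark an event as acknowledged and record who did it."""
    event = events.get(serial)
    if event is None:
        return f"Event {serial} not found"
    event.acknowledged = True
    event.journal.append(f"Acknowledged by {user} from chat")
    return f"Event {serial} acknowledged by {user}"


def run_runbook(events: dict, serial: int, name: str, user: str) -> str:
    """Run a named runbook for an event and return its response."""
    event = events.get(serial)
    runbook = RUNBOOKS.get(name)
    if event is None or runbook is None:
        return f"Unknown event or runbook: {serial} / {name}"
    output = runbook()
    event.journal.append(f"{user} ran {name}: {output}")
    return f"{name} finished: {output}"
```

Writing each action to the event journal keeps the audit trail in the management tool as well as in the chat history.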

Page 19: New tricks with IBM ChatOps : Achieve Site Reliability ...

ChatOps level 4

Collaboration tools: Slack, Mattermost, Microsoft Teams, others…

• Human-to-human collaboration.

• Bots interact with humans and tools within the collaboration channels.

• Larger process changes:
  • Relay conversation into ticketing system
  • Monitor for key words and send updates
  • Update Knowledge Base
  • Improved interactions (Virtual agents)

• Business Value:
  • Processes are streamlined, manual toil is replaced by automation.
  • Addition of security/RBAC layer
  • Continuous improvement and learning.
  • Leveraging ChatOps between processes

CSMO tools: Netcool Operations Insight / Cloud Event Management, IBM Monitoring, Alert Notification, Prometheus, Grafana, 3rd Party Solutions, others…
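Three of the level 4 ideas (relaying the conversation to a ticket, watching for key words, and an RBAC layer) can be combined in one message handler. This is an illustrative sketch only: the roles, the `!command` prefix, the key words and the reply texts are all assumptions.

```python
import re

# Hypothetical role assignments and per-command permissions.
USER_ROLES = {"todd": "operator", "jane": "developer"}
ALLOWED_ROLES = {"restart": {"operator"}, "status": {"operator", "developer"}}

# Key words the bot listens for in ordinary conversation.
KEYWORD_PATTERN = re.compile(r"\b(outage|rollback|sev1)\b", re.IGNORECASE)


def is_authorized(user: str, command: str) -> bool:
    """RBAC check: is the user's role allowed to run this command?"""
    return USER_ROLES.get(user) in ALLOWED_ROLES.get(command, set())


def on_message(user: str, text: str, ticket_log: list) -> list:
    """Relay the message to the ticket log and return the bot's replies."""
    ticket_log.append(f"{user}: {text}")
    replies = []
    if text.startswith("!"):
        words = text[1:].split()
        command = words[0] if words else ""
        if is_authorized(user, command):
            replies.append(f"Running '{command}' for {user}")
        else:
            replies.append(f"{user} is not allowed to run '{command}'")
    for keyword in sorted({m.lower() for m in KEYWORD_PATTERN.findall(text)}):
        replies.append(f"Noticed key word '{keyword}': notifying incident manager")
    return replies
```

The permission check sits in front of every command, so adding a new command means adding one entry to `ALLOWED_ROLES` rather than new checking code.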

Page 20: New tricks with IBM ChatOps : Achieve Site Reliability ...

ChatOps level 5

Collaboration tools: Slack, Mattermost, Microsoft Teams, others…

• Human-to-human collaboration.

• Interact with monitoring tools from within collaboration tools.

• Bots interact with humans and tools within the collaboration channels.

• Cognitive bots (Cognitive virtual agents):
  • Recommend solutions and/or participants based on history/Knowledge Base
  • Recommend channels where similar discussions took place

• Business Value:
  • Continuous improvement and learning.
  • Easier on-boarding of processes

CSMO tools: Netcool Operations Insight / Cloud Event Management, IBM Monitoring, Alert Notification, Prometheus, Grafana, 3rd Party Solutions, others…
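The idea of recommending solutions and channels from history can be illustrated with a deliberately simple word-overlap (Jaccard) score. This is a stand-in technique, not a cognitive service, and the knowledge base entries, channel names and threshold are hypothetical.

```python
def tokens(text: str) -> set:
    """Lowercase words with surrounding punctuation removed."""
    return {word.strip(".,:;!?").lower() for word in text.split()} - {""}


def jaccard(a: set, b: set) -> float:
    """Share of words the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical knowledge base of past incidents.
KNOWLEDGE_BASE = [
    {"summary": "Database connection pool exhausted on payments service",
     "solution": "Restart connection pool", "channel": "#sev1-inc0042"},
    {"summary": "Disk full on web server",
     "solution": "Rotate logs", "channel": "#sev1-inc0057"},
]


def recommend(new_summary: str, min_score: float = 0.2) -> list:
    """Return past incidents similar to the new one, best match first."""
    query = tokens(new_summary)
    scored = [(jaccard(query, tokens(item["summary"])), item)
              for item in KNOWLEDGE_BASE]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored]
```

Each result carries both the earlier solution and the channel where it was discussed, matching the two kinds of recommendation listed on this slide.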

Page 21: New tricks with IBM ChatOps : Achieve Site Reliability ...

Incident Lifecycle with ChatOps

Demo time!

Page 22: New tricks with IBM ChatOps : Achieve Site Reliability ...

Architecture: The “old dog”

Manage-to environment: IBM Cloud, Systems, Edge

Management services: Monitoring, Runbooks, Dashboards, Topology, AIOps, DevOps, Tickets

Manage-from environment: NOI (Omnibus, Impact, MessageBus Probe)

Page 23: New tricks with IBM ChatOps : Achieve Site Reliability ...

The first new trick

Components: Slack

1. Send Events

Manage-to environment: IBM Cloud, Systems, Edge

Management services: Monitoring, Runbooks, Dashboards, Topology, AIOps, DevOps, Tickets

Manage-from environment: NOI (Omnibus, Impact, MessageBus Probe)

Page 24: New tricks with IBM ChatOps : Achieve Site Reliability ...

The 2nd new trick

Components: Slack, Hubot

1. Send Events

2. Respond to direct commands

3. Respond to key words

Manage-to environment: IBM Cloud, Systems, Edge

Management services: Monitoring, Runbooks, Dashboards, Topology, AIOps, DevOps, Tickets

Manage-from environment: NOI (Omnibus, Impact, MessageBus Probe)

Page 25: New tricks with IBM ChatOps : Achieve Site Reliability ...

A 3rd new trick

Components: Slack, Hubot, Cloud Functions

1. Send Events

2. Respond to direct commands

3. Respond to key words

4. Respond to buttons and dialogs

Manage-to environment: IBM Cloud, Systems, Edge

Management services: Monitoring, Runbooks, Dashboards, Topology, AIOps, DevOps, Tickets

Manage-from environment: NOI (Omnibus, Impact, MessageBus Probe)

Page 26: New tricks with IBM ChatOps : Achieve Site Reliability ...

Trick #4

Components: Slack, Hubot, Cloud Functions

1. Send Events

2. Respond to direct commands

3. Respond to key words

4. Respond to buttons and dialogs

5. Update events

Manage-to environment: IBM Cloud, Systems, Edge

Management services: Monitoring, Runbooks, Dashboards, Topology, AIOps, DevOps, Tickets

Manage-from environment: NOI (Omnibus, Impact, MessageBus Probe)
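Steps 4 and 5 (respond to a button, then update the event) might be handled by a small function like the sketch below. It assumes Slack's current `block_actions` interaction payload, delivered as a form field named `payload`; the `acknowledge` and `resolve` action ids, the use of the button value as the event serial, and the in-memory `events` dictionary are hypothetical. The deck does not show the actual Cloud Functions code.

```python
import json
from urllib.parse import parse_qs


def parse_interaction(form_body: str) -> dict:
    """Extract user, action and value from a block_actions request body."""
    payload = json.loads(parse_qs(form_body)["payload"][0])
    action = payload["actions"][0]
    return {
        "user": payload["user"]["id"],
        "action": action["action_id"],
        "value": action["value"],
    }


def apply_action(events: dict, interaction: dict) -> str:
    """Update the in-memory event that the pressed button refers to."""
    serial = int(interaction["value"])
    if serial not in events:
        return f"Event {serial} not found"
    if interaction["action"] == "acknowledge":
        events[serial]["Acknowledged"] = 1
    elif interaction["action"] == "resolve":
        events[serial]["Severity"] = 0  # severity 0 means clear
    else:
        return f"Unknown action {interaction['action']}"
    return f"Event {serial}: {interaction['action']} by {interaction['user']}"
```

In a real deployment `apply_action` would write back to the event manager rather than a dictionary, and the request signature would be verified before the payload is trusted.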

Page 27: New tricks with IBM ChatOps : Achieve Site Reliability ...

What a good dog!

Components: Slack, Hubot, Cloud Functions

1. Send Events

2. Respond to direct commands

3. Respond to key words

4. Respond to buttons and dialogs

5. Update events

6. Execute commands

Manage-to environment: IBM Cloud, Systems, Edge

Management services: Monitoring, Runbooks, Dashboards, Topology, AIOps, DevOps, Tickets

Manage-from environment: NOI (Omnibus, Impact, MessageBus Probe)

Page 28: New tricks with IBM ChatOps : Achieve Site Reliability ...

Many good dogs!

Components: Slack, Mattermost, MS Teams, Hubot, Cloud Functions

1. Send Events

2. Respond to direct commands

3. Respond to key words

4. Respond to buttons and dialogs

5. Update events

6. Execute commands

Manage-to environment: IBM Cloud, Systems, Edge

Management services: Monitoring, Runbooks, Dashboards, Topology, AIOps, DevOps, Tickets

Manage-from environment: NOI (Omnibus, Impact, MessageBus Probe)

Page 29: New tricks with IBM ChatOps : Achieve Site Reliability ...

Real Case

ChatOps level 2b – Pulling information into collaboration channel

Previous process: 15 minutes

• Identify system-related or server-related acronyms

• Contact Change and Release Management

• Extract report or consult Maximo databases

• Identify possible offending changes

• Identify the affected business service

Current process: 20 seconds

• Interact with the robot by command

Transfer information to Incident Management

97.8% reduction in operational effort

IBM & Customer confidential
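The 97.8% figure follows directly from the two durations on this slide, as this short check shows. The variable names are illustrative, and the speed-up factor is derived here rather than quoted from the deck.

```python
previous_seconds = 15 * 60   # previous process: 15 minutes
current_seconds = 20         # current process: 20 seconds

# Share of the original effort that is no longer needed.
reduction = (previous_seconds - current_seconds) / previous_seconds
# How many times faster the new process is.
speedup = previous_seconds / current_seconds

print(f"Reduction in effort: {reduction:.1%}")   # 97.8%
print(f"Speed-up factor: {speedup:.0f}x")        # 45x
```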

Page 30: New tricks with IBM ChatOps : Achieve Site Reliability ...

ChatOps level 4 – Processes starting to change, bots and automation leading to much higher velocity and transparency

Diagram elements:

• Network zones: INTERNET, DMZ, CEMEXNET

• Collaboration commands

• Cloud Services

• Webhooks (DMZ)

• Grafana

• Netcool

• App commands: SQL, PowerShell, node-omnibus

• Connections: HTTPS (REST), HTTPS, HTTPS

Customer and IBM Confidential

Page 31: New tricks with IBM ChatOps : Achieve Site Reliability ...

Further reading and questions

Existing lab material: http://ibm.biz/csmo-chatops-lab

Reach out to me directly for material that’s in [email protected] / @flyingbarron

ChatOps and the Moon Landing: http://ibm.biz/csmo-apollo-chatops