
Building SRE Within the Enterprise Through Istio

Hybrid Specialist: Shawn Ho

shawnho@google.com

1. What is SRE?

Product Lifecycle

Concept → Business → Development → Operations → Market

Agile solves this

DevOps solves this

Developers: Agility

Operators: Stability

Dev & Ops’ KPIs aren't Aligned

What is the relationship between DevOps and SRE?

● DevOps is an abstract concept: a set of guidelines and disciplines for breaking down silos between development and operations.

● SRE is Google's concrete, realized practice of DevOps.

“Class SRE implements DevOps”

Self-Service Platform

Monitoring Automation

Developers

Class SRE = REAL PERSON

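#1. Decisions based on data: every decision is grounded in data.

#2. Be user centric: even if every monitoring metric looks normal, if customers feel the system is unstable, then the system is unstable.

#3. Blameless culture & shared responsibility: breaking down departmental silos starts with sharing responsibility across departments (Developers, Operators, Leaders).

A system failure is not only the operators' responsibility; code quality, technical debt, and other factors are all possible causes.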

2. How to Implement SRE by Istio/Anthos?

Istio in 2 minutes

[Diagram: Istio architecture]

Data plane: Service A and Service B, each with a sidecar proxy; HTTP, gRPC, TCP; Routing + Secure Naming; mTLS; Ingress Gateway and Egress Gateway; JWT + TLS; Local Authz; Internal App 1; External App 1; Perimeter security policies

Istio Control Plane: Control Plane API on K8S API Server; Galley; Citadel (cert issuance); Policy Enforcement + Reporting (logging plugin, monitoring plugin)

Legend: Data flow; Control + metrics flow

What does SRE implement on Platform?

Metrics & monitoring

● SLO
● Dashboard
● Analytics

Capacity planning

● Forecasting
● Demand-driven
● Performance

Change management

● Release process
● Consulting design
● Automations

Emergency response

● Oncall
● Incident analysis
● Postmortems

Culture

● Toil management
● Blamelessness
● Shared responsibility

Monitoring and Incident Management

Understand system architecture

Understand the system architecture and deployed topology.

System monitoring

Monitor the system by gathering blackbox and whitebox metrics.

SLIs and SLOs are extracted from the metrics and logs.

The information is visualized through dashboards.

Log handling

Managing planned events (release, maintenance)

Incident handling

Create an incident ticket.

Roll back the change to resolve the incident.

Investigate the root cause with logging, monitoring metrics, and debugging.

Postmortem

Review the incident and prepare a plan to prevent recurrence.

What to Monitor?

SLO = SLI + Target

Example: “99% of REST API calls will complete in less than 100 ms every week”

Here the SLI is the fraction of REST API calls that complete in less than 100 ms, and the target is 99% every week.

SLI (service level indicator): a well-defined measure of 'good enough'

• used to specify SLO/SLA

SLO (service level objective): a top-line target for the fraction of good interactions

• specifies goals (SLI + Target)

SLA (service level agreement): consequences

• SLA = (SLO + margin) + consequences = SLI + Target + consequences
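
The SLI/SLO relationship above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `LatencySLO`, `latency_sli`, and `meets_slo` are hypothetical, and the sample latencies are invented, not taken from the talk or from any Istio or Anthos API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencySLO:
    """Hypothetical container for an SLO: an SLI definition plus a target."""
    threshold_ms: float  # a call counts as "good" if it finishes faster than this
    target: float        # required fraction of good calls, e.g. 0.99


def latency_sli(latencies_ms, threshold_ms):
    """SLI: fraction of calls that completed in less than threshold_ms."""
    if not latencies_ms:
        raise ValueError("no calls in the measurement window")
    good = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return good / len(latencies_ms)


def meets_slo(latencies_ms, slo):
    """SLO check: the measured SLI must reach the target."""
    return latency_sli(latencies_ms, slo.threshold_ms) >= slo.target


# Invented one-week sample: 995 fast calls and 5 slow ones.
week = [20.0] * 995 + [250.0] * 5
slo = LatencySLO(threshold_ms=100.0, target=0.99)

print("SLI:", latency_sli(week, slo.threshold_ms))
print("SLO met:", meets_slo(week, slo))
```

An SLA would add a margin and consequences on top of this check; those are contractual and are deliberately left out of the sketch.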

Error Budget

Product management & SRE define an availability target.

• 100% minus the availability target is a “budget of unreliability” (the error budget).

| Availability SLO | Allowed unavailability per year | per quarter | per 30 days | Error budget remaining at a 1% error rate (%) |
|---|---|---|---|---|
| 90% | 36.5 days | 9 days | 3 days | 90 |
| 95% | 18.25 days | 4.5 days | 1.5 days | 80 |
| 99% | 3.65 days | 21.6 hours | 7.2 hours | 0 |
| 99.5% | 1.83 days | 10.8 hours | 3.6 hours | -100 |
| 99.9% | 8.76 hours | 2.16 hours | 43.2 minutes | -900 |
| 99.95% | 4.38 hours | 1.08 hours | 21.6 minutes | -1900 |
| 99.99% | 52.6 minutes | 12.96 minutes | 4.32 minutes | -9900 |
| 99.999% | 5.26 minutes | 1.30 minutes | 25.9 seconds | -99900 |

Error Budget (Availability)
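
The error budget figures can be reproduced with a short calculation. The sketch below assumes the windows are 365 days per year and 90 days per quarter, and that the last column is the share of the budget left over when the observed error rate is 1%; the function names are hypothetical.

```python
from datetime import timedelta


def error_budget(slo_percent):
    """Error budget as a fraction: 100% minus the availability target."""
    return (100.0 - slo_percent) / 100.0


def allowed_downtime(slo_percent, window):
    """Unavailability permitted within `window` without breaking the SLO."""
    return timedelta(seconds=window.total_seconds() * error_budget(slo_percent))


def budget_remaining_percent(slo_percent, error_rate_percent):
    """Share of the error budget left (in %) at a given observed error rate.

    A negative result means the budget is overspent.
    """
    budget_percent = 100.0 - slo_percent
    return (budget_percent - error_rate_percent) / budget_percent * 100.0


# Assumed window lengths (365-day year, 90-day quarter).
windows = {
    "year": timedelta(days=365),
    "quarter": timedelta(days=90),
    "30 days": timedelta(days=30),
}

for slo in (90.0, 99.0, 99.9):
    downtime = {name: allowed_downtime(slo, w) for name, w in windows.items()}
    remaining = round(budget_remaining_percent(slo, 1.0), 6)
    print(slo, downtime, remaining)
```

With a 99.9% target, a sustained 1% error rate spends ten times the budget, which is why the remaining budget is far below zero.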

Demo with Anthos: Monitoring + Incident Management

● Topology

● SLO/SLI Metrics

● Blackbox/Whitebox

● Log Viewer

● Tracing/Tracing Report

[Demo screenshots: Topology, Blackbox, Whitebox, Logging, Tracing]

Error Budget Burn Down Rate
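
One common way to quantify how fast the budget is being spent is a burn rate: the observed error rate divided by the error rate the SLO allows. The sketch below assumes that definition; the function names and the 30-day window are illustrative choices, not something the slides specify.

```python
def burn_rate(observed_error_rate, slo_target):
    """Observed error rate divided by the budgeted error rate.

    A value of 1.0 spends the budget exactly over the SLO window;
    higher values exhaust it early.
    """
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("an SLO target of 100% leaves no error budget")
    return observed_error_rate / budget


def hours_until_exhausted(rate, window_hours, budget_left_fraction=1.0):
    """Hours until the remaining budget is gone if the burn rate holds."""
    if rate <= 0:
        return float("inf")
    return window_hours * budget_left_fraction / rate


# Assumed scenario: 99.9% SLO, 1% of requests failing, 30-day window.
rate = burn_rate(observed_error_rate=0.01, slo_target=0.999)
print("burn rate:", round(rate, 3))
print("hours left:", round(hours_until_exhausted(rate, window_hours=30 * 24), 1))
```

Alerting on the burn rate rather than on raw error counts ties the alert to how soon the SLO would be violated.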

Demo with Anthos: Proactively Reducing Error Budget Consumption

● Alert Setting

● Canary Deployment

● Cross-Region Deployment

[Diagram: Clients reach two Kubernetes Engine clusters, Taiwan-1 and Singapore, through Cloud Load Balancing]

Capacity planning

Plan for organic growth

Increased product adoption and usage by customers.

Determine inorganic growth

Sudden jumps in demand due to feature launches, marketing campaigns, etc.
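
The two kinds of growth can be combined in a simple demand projection. This is a rough sketch only: compound monthly growth stands in for organic growth, and per-month multipliers stand in for launches or campaigns. The function name, the growth model, and all numbers are assumptions for illustration.

```python
def forecast_demand(current_peak, monthly_growth, months, launch_multipliers=None):
    """Project peak demand per month.

    Organic growth compounds every month; inorganic growth is modelled as a
    one-off multiplier applied to the months listed in launch_multipliers.
    """
    launch_multipliers = launch_multipliers or {}
    forecast = []
    organic = current_peak
    for month in range(1, months + 1):
        organic *= 1.0 + monthly_growth
        forecast.append(organic * launch_multipliers.get(month, 1.0))
    return forecast


# Assumed inputs: 1000 requests/s today, 10% organic growth per month,
# and a feature launch expected to double demand in month 2.
for month, demand in enumerate(forecast_demand(1000.0, 0.10, 3, {2: 2.0}), start=1):
    print(month, round(demand, 1))
```

Capacity would then be provisioned against the highest projected month plus whatever headroom the team chooses.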

Change Management

Roughly 70% of outages are due to changes in a live system

[Diagram labels: Clients; Kubernetes Configuration Service; Continuous Deployment; Multiple Instances; Cloud Source Repositories; On-Premise (On-Prem1); Anthos Hub Service]

Demo with Anthos: The Power of GitOps

Summary + Call for Action

● SRE has 3 key principles:

○ Decisions based on data (meaningful monitoring)

○ Be user centric (blackbox testing)

○ Blameless culture & shared responsibility (share the responsibility and work together)

● Kubernetes is a perfect platform to implement SRE

○ SLI + SLO + Error Budget

○ Watch for the budget burn rate

○ Establish CI + CD with GitOps

● Pick a system and build your SRE practices

Cover images used with permission. These books can be found on shop.oreilly.com.
