Story of our own Monitoring Agent in golang @dxhuy LINE corp

GOCON Autumn (Story of our own Monitoring Agent in golang)

Jan 21, 2018


Huy Do
Transcript
Page 1: GOCON Autumn (Story of our own Monitoring Agent in golang)

Story of our own Monitoring Agent in golang

@dxhuy

LINE corp

Page 2: GOCON Autumn (Story of our own Monitoring Agent in golang)

Introduction

• @dxhuy
• Vietnamese
• Building monitoring stack at LINE

Page 3: GOCON Autumn (Story of our own Monitoring Agent in golang)

My goal today

• Join GoConference without lottery

Page 4: GOCON Autumn (Story of our own Monitoring Agent in golang)

My goal today

• Show that this is not 100% true

Page 5: GOCON Autumn (Story of our own Monitoring Agent in golang)

Today's takeaways

→ Anatomy of a monitoring agent
→ How to design one
→ Challenges and lessons learned

Page 6: GOCON Autumn (Story of our own Monitoring Agent in golang)

Monitoring Agent !?

Page 7: GOCON Autumn (Story of our own Monitoring Agent in golang)
Page 8: GOCON Autumn (Story of our own Monitoring Agent in golang)
Page 9: GOCON Autumn (Story of our own Monitoring Agent in golang)

• Small application that runs on the host machine
• Collects host machine metrics
  • Request latency?
  • MySQL load?
  • Redis hit/miss rate?
  • ...
• Aggregates metrics (sum / avg / histogram ...)
• Sends them to a collector server → alert / chart ...
• statsd / collectd / telegraf ...
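The aggregation step can be sketched as follows. This is a minimal illustration, assuming raw samples are plain float64 values collected during one flush interval; `Summary` and `Aggregate` are hypothetical names and not taken from the agent described in this talk.

```go
package main

import "fmt"

// Summary holds the aggregates for one flush interval (hypothetical type).
type Summary struct {
	Count int
	Sum   float64
	Avg   float64
	Min   float64
	Max   float64
}

// Aggregate reduces raw samples to a single Summary, so the agent can
// send one record per interval instead of every sample.
func Aggregate(samples []float64) Summary {
	if len(samples) == 0 {
		return Summary{}
	}
	s := Summary{Count: len(samples), Min: samples[0], Max: samples[0]}
	for _, v := range samples {
		s.Sum += v
		if v < s.Min {
			s.Min = v
		}
		if v > s.Max {
			s.Max = v
		}
	}
	s.Avg = s.Sum / float64(s.Count)
	return s
}

func main() {
	latenciesMs := []float64{12, 8, 20, 40}
	fmt.Printf("%+v\n", Aggregate(latenciesMs))
}
```

Aggregating on the host keeps the traffic to the collector roughly constant per interval, whatever the request rate.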

Page 10: GOCON Autumn (Story of our own Monitoring Agent in golang)
Page 11: GOCON Autumn (Story of our own Monitoring Agent in golang)

Not a generic log transfer

Page 12: GOCON Autumn (Story of our own Monitoring Agent in golang)

Why not reuse existing technology?

• Scale problem
  • We need to write our own stack
• Various environment problem
• Management problem
• Development velocity problem

Page 13: GOCON Autumn (Story of our own Monitoring Agent in golang)

Let's start writing our own

Page 14: GOCON Autumn (Story of our own Monitoring Agent in golang)

Language

Page 15: GOCON Autumn (Story of our own Monitoring Agent in golang)
Page 16: GOCON Autumn (Story of our own Monitoring Agent in golang)
Page 17: GOCON Autumn (Story of our own Monitoring Agent in golang)

Features

Page 18: GOCON Autumn (Story of our own Monitoring Agent in golang)

• Modularity (for user)

• Buffer (prevent data loss)

• Management friendly (for admin)

Page 19: GOCON Autumn (Story of our own Monitoring Agent in golang)

Modularity

• What is modularity?
  • Easy to add new metrics from the user's point of view
  • Pluggable

Page 20: GOCON Autumn (Story of our own Monitoring Agent in golang)

Modularity

• How?
  • Input : get metric
  • Codec : understand metric
  • Output : send metric
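The three stages can be wired together roughly as below. This is a sketch under stated assumptions: the `Input`, `Codec` and `Output` interfaces here are reduced, hypothetical versions (the agent's real interfaces carry more methods), and `staticInput`, `lineCodec` and `memoryOutput` exist only for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Metric is a reduced stand-in for the agent's metric model.
type Metric struct {
	Name string
	Val  float64
}

// Input gets raw data, Codec understands it, Output sends metrics.
type Input interface{ Read() ([]byte, error) }
type Codec interface{ Decode([]byte) ([]Metric, error) }
type Output interface{ Write([]Metric) error }

// staticInput returns a fixed payload (illustration only).
type staticInput struct{ payload string }

func (s staticInput) Read() ([]byte, error) { return []byte(s.payload), nil }

// lineCodec parses "name value" lines and skips malformed ones.
type lineCodec struct{}

func (lineCodec) Decode(b []byte) ([]Metric, error) {
	var ms []Metric
	for _, line := range strings.Split(strings.TrimSpace(string(b)), "\n") {
		f := strings.Fields(line)
		if len(f) != 2 {
			continue
		}
		v, err := strconv.ParseFloat(f[1], 64)
		if err != nil {
			return nil, err
		}
		ms = append(ms, Metric{Name: f[0], Val: v})
	}
	return ms, nil
}

// memoryOutput collects what was sent (illustration only).
type memoryOutput struct{ sent []Metric }

func (m *memoryOutput) Write(ms []Metric) error {
	m.sent = append(m.sent, ms...)
	return nil
}

// RunOnce pushes one batch through the three stages.
func RunOnce(in Input, c Codec, out Output) error {
	raw, err := in.Read()
	if err != nil {
		return err
	}
	ms, err := c.Decode(raw)
	if err != nil {
		return err
	}
	return out.Write(ms)
}

func main() {
	out := &memoryOutput{}
	err := RunOnce(staticInput{"cpu 0.5\nmem 42"}, lineCodec{}, out)
	fmt.Println(out.sent, err)
}
```

Because each stage only sees the interface of its neighbour, a new data source needs a new Input and possibly a new Codec, while the outputs stay untouched.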

Page 21: GOCON Autumn (Story of our own Monitoring Agent in golang)

// Metric is central model for imonD
type Metric struct {
    ProtocolVersion ProtocolVer
    Name            string
    Val             Value
    TimeStamp       time.Time
    Fingerprint     Fingerprint
    Type            MetricType
    Labels          map[string]string
}

Page 22: GOCON Autumn (Story of our own Monitoring Agent in golang)

Input Plugin design

Page 23: GOCON Autumn (Story of our own Monitoring Agent in golang)

Input Plugin design

• Three important things:
  • Process model
  • Plugin model
  • Collecting model (push vs pull)

Page 24: GOCON Autumn (Story of our own Monitoring Agent in golang)

Process model

Single process vs multiple processes

Page 25: GOCON Autumn (Story of our own Monitoring Agent in golang)

Process model

- Advantage: easy management / maintenance

- Disadvantage: one bad plugin can affect the whole agent

Page 26: GOCON Autumn (Story of our own Monitoring Agent in golang)

Plugin model

Same language vs embedded language

Page 27: GOCON Autumn (Story of our own Monitoring Agent in golang)

Plugin model

- Advantage: simple model, better maintenance
- Disadvantage: every time a new plugin is added, the whole agent needs to be restarted
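One common way to get "same language" plugins in Go is a compile-time registry, sketched below. This is an assumption about how such a model can look, not the agent's actual code: `Plugin`, `Factory`, `Register` and `New` are hypothetical names.

```go
package main

import "fmt"

// Plugin is a hypothetical minimal plugin contract.
type Plugin interface{ Name() string }

// Factory builds a plugin from its config section.
type Factory func(conf map[string]string) (Plugin, error)

// registry maps a plugin kind to its factory.
var registry = map[string]Factory{}

// Register is called at start-up for every compiled-in plugin.
func Register(kind string, f Factory) { registry[kind] = f }

// New looks up the factory for the kind named in the config file.
func New(kind string, conf map[string]string) (Plugin, error) {
	f, ok := registry[kind]
	if !ok {
		return nil, fmt.Errorf("unknown plugin kind %q", kind)
	}
	return f(conf)
}

// memcached is an illustrative plugin.
type memcached struct{ endpoint string }

func (m memcached) Name() string { return "memcached@" + m.endpoint }

func init() {
	Register("memcached", func(conf map[string]string) (Plugin, error) {
		return memcached{endpoint: conf["endpoint"]}, nil
	})
}

func main() {
	p, err := New("memcached", map[string]string{"endpoint": "127.0.0.1:11211"})
	fmt.Println(p.Name(), err)
}
```

Since the registry is filled at compile time, adding a plugin means shipping a new binary and restarting, which is the disadvantage listed above.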

Page 28: GOCON Autumn (Story of our own Monitoring Agent in golang)

// InputPlugin represents an input plugin interface
type InputPlugin interface {
    Interval() config.Duration
    GracefulStop() error
    Name() string
    Type() InputType
}

type InputByte interface {
    Decoder() codec.Decoder
    ReadBytesWithContext(ctx context.Context) ([]byte, error)
}

type InputMetrics interface {
    ReadMetricsWithContext(ctx context.Context) (model.Metrics, error)
}

All plugins share the same interface
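The split between a byte-oriented and a metric-oriented reader suggests that the agent dispatches on whichever one a plugin implements. The sketch below shows one way to do that with a type switch; it is an assumption, not the agent's code. `Metric` and `Decoder` are reduced stand-ins for `model.Metrics` and `codec.Decoder`, and `Collect`, `uptimeInput` and `collectDemo` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// Metric and Decoder are reduced stand-ins for the agent's own types.
type Metric struct {
	Name string
	Val  float64
}

type Decoder interface {
	Decode(input []byte) ([]Metric, error)
}

type InputByte interface {
	Decoder() Decoder
	ReadBytesWithContext(ctx context.Context) ([]byte, error)
}

type InputMetrics interface {
	ReadMetricsWithContext(ctx context.Context) ([]Metric, error)
}

// Collect picks the read path based on which interface the plugin implements.
func Collect(ctx context.Context, p interface{}) ([]Metric, error) {
	switch in := p.(type) {
	case InputMetrics:
		return in.ReadMetricsWithContext(ctx)
	case InputByte:
		raw, err := in.ReadBytesWithContext(ctx)
		if err != nil {
			return nil, err
		}
		return in.Decoder().Decode(raw)
	default:
		return nil, fmt.Errorf("plugin %T implements no reader interface", p)
	}
}

// uptimeInput is a hypothetical plugin that produces metrics directly.
type uptimeInput struct{}

func (uptimeInput) ReadMetricsWithContext(ctx context.Context) ([]Metric, error) {
	return []Metric{{Name: "uptime_seconds", Val: 42}}, nil
}

// collectDemo runs Collect against one supported and one unsupported value.
func collectDemo() ([]Metric, error, error) {
	ms, err := Collect(context.Background(), uptimeInput{})
	_, unsupported := Collect(context.Background(), struct{}{})
	return ms, err, unsupported
}

func main() {
	ms, err, unsupported := collectDemo()
	fmt.Println(ms, err, unsupported)
}
```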

Page 29: GOCON Autumn (Story of our own Monitoring Agent in golang)

Collecting model

Push vs pull

Page 30: GOCON Autumn (Story of our own Monitoring Agent in golang)

Collecting model

- Advantage: less impact on the middleware, simple model
- Disadvantage: the application needs to expose something to "pull" (HTTP endpoint / file / ...)

Page 31: GOCON Autumn (Story of our own Monitoring Agent in golang)

func (i *MemcachedInput) ReadMetricsWithContext(ctx context.Context) (model.Metrics, error) {
    // ...
    conn, err := net.DialTimeout("tcp", i.endpoint, i.timeout.Duration)
    if err != nil {
        return nil, err
    }
    defer conn.Close()

    _, err = conn.Write([]byte("stats\n"))
    if err != nil {
        return nil, err
    }
    // ...
    scanner := bufio.NewScanner(conn)
    for scanner.Scan() {
        text := scanner.Text()
        if text == "END" {
            break
        }
        // Split entries which look like: STAT time 1488291730
        entries := strings.Split(text, " ")
        if len(entries) == 3 {
            v, err := strconv.ParseInt(entries[2], 10, 64)
            if err != nil {
                log.Debug("invalid value %s", entries[2])
                continue
            }
            ms = append(ms, *model.NewMetric(
                entries[1],
                model.Value(float64(v)),
                time.Now(),
                model.GaugeType,
            ))
        }
    }
    // ...
    return ms, nil
}

Pull sample: contact the server directly
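The parsing part of a pull plugin like this can be separated from the network code so that it is testable without a running server. The following is a sketch, not the agent's implementation: `ParseStats` and `parseString` are hypothetical names, and `strconv.ParseFloat` is swapped in for `ParseInt` on the assumption that fractional stats (for example CPU usage times) should be kept rather than skipped.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// ParseStats reads memcached "stats" output up to the END line and
// returns the entries whose value parses as a number.
func ParseStats(r io.Reader) (map[string]float64, error) {
	stats := make(map[string]float64)
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		text := scanner.Text() // the scanner strips the trailing \r\n
		if text == "END" {
			break
		}
		entries := strings.Fields(text)
		if len(entries) != 3 || entries[0] != "STAT" {
			continue
		}
		v, err := strconv.ParseFloat(entries[2], 64)
		if err != nil {
			continue // non-numeric values such as a version string
		}
		stats[entries[1]] = v
	}
	return stats, scanner.Err()
}

// parseString is a convenience wrapper for in-memory input.
func parseString(s string) (map[string]float64, error) {
	return ParseStats(strings.NewReader(s))
}

func main() {
	stats, err := parseString("STAT curr_items 10\r\nSTAT version 1.5.6\r\nSTAT get_hits 7\r\nEND\r\n")
	fmt.Println(stats, err)
}
```

Any `io.Reader` works as input, so the same function can be fed a `net.Conn` in the plugin and a string in tests.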

Page 32: GOCON Autumn (Story of our own Monitoring Agent in golang)

Codec Plugin / Output Plugin

Page 33: GOCON Autumn (Story of our own Monitoring Agent in golang)

type Encoder interface {
    Encode(metrics model.Metrics) ([]byte, error)
    Name() string
}

type Decoder interface {
    Decode(input []byte) (model.Metrics, error)
    Name() string
}

Codec interface
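A codec that satisfies both interfaces can be very small. The sketch below uses JSON as the wire format purely for illustration; the talk does not say which formats the agent supports. `Metric`, `Metrics` and `JSONCodec` are hypothetical stand-ins for the agent's `model` and `codec` types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metric and Metrics are reduced stand-ins for the agent's model types.
type Metric struct {
	Name string  `json:"name"`
	Val  float64 `json:"val"`
}

type Metrics []Metric

// JSONCodec implements both the Encode and the Decode side.
type JSONCodec struct{}

func (JSONCodec) Name() string { return "json" }

func (JSONCodec) Encode(ms Metrics) ([]byte, error) { return json.Marshal(ms) }

func (JSONCodec) Decode(input []byte) (Metrics, error) {
	var ms Metrics
	if err := json.Unmarshal(input, &ms); err != nil {
		return nil, err
	}
	return ms, nil
}

func main() {
	c := JSONCodec{}
	b, err := c.Encode(Metrics{{Name: "cpu", Val: 0.5}})
	fmt.Println(string(b), err)
	ms, err := c.Decode(b)
	fmt.Println(ms, err)
}
```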

Page 34: GOCON Autumn (Story of our own Monitoring Agent in golang)

// OutputPlugin represents an output plugin interface
type OutputPlugin interface {
    WriteWithContext(ctx context.Context, metrics model.Metrics) error // for cancellable write
    Encoder() codec.Encoder
    Interval() config.Duration
    GracefulStop() error
    WalReader() wal.LogReader
    Name() string
}

Output interface

Page 35: GOCON Autumn (Story of our own Monitoring Agent in golang)

Buffer design

Page 36: GOCON Autumn (Story of our own Monitoring Agent in golang)

Buffer design

• Each output maintains its own offset i
• The offset is updated when the output succeeds

Page 37: GOCON Autumn (Story of our own Monitoring Agent in golang)

Buffer design

• Advantages
  • When an output fails, just roll back the index
  • Chunks are organized into segments (each segment ~ 1GB)
  • To clean up, just delete old segments that have already been consumed by all outputs
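The per-output offset idea can be illustrated with an in-memory log. This sketch leaves out everything that makes the real buffer hard (disk segments, serialization, synchronization); `Log`, `Pending`, `Commit` and `Reclaimable` are hypothetical names chosen for the example.

```go
package main

import "fmt"

// Log is an in-memory stand-in for the on-disk buffer.
type Log struct {
	entries []string
	offsets map[string]int // committed offset per output
}

// NewLog creates a log and registers the given outputs at offset 0.
func NewLog(outputs ...string) *Log {
	l := &Log{offsets: make(map[string]int)}
	for _, o := range outputs {
		l.offsets[o] = 0
	}
	return l
}

func (l *Log) Append(e string) { l.entries = append(l.entries, e) }

// Pending returns the entries the output has not committed yet.
func (l *Log) Pending(output string) []string {
	return l.entries[l.offsets[output]:]
}

// Commit advances the output's offset after a successful send. A failed
// send just skips Commit, so the next Pending returns the same entries.
func (l *Log) Commit(output string, n int) {
	off := l.offsets[output] + n
	if off > len(l.entries) {
		off = len(l.entries)
	}
	l.offsets[output] = off
}

// Reclaimable is the number of leading entries consumed by every output.
func (l *Log) Reclaimable() int {
	low := len(l.entries)
	for _, off := range l.offsets {
		if off < low {
			low = off
		}
	}
	return low
}

func main() {
	l := NewLog("kafka", "http")
	l.Append("m1")
	l.Append("m2")
	l.Append("m3")
	l.Commit("kafka", 2) // the kafka output succeeded for two entries
	fmt.Println(l.Pending("kafka"), l.Pending("http"), l.Reclaimable())
	l.Commit("http", 1)
	fmt.Println(l.Reclaimable())
}
```

In the segmented design, the value returned by `Reclaimable` would decide which whole segments are old enough to delete.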

Page 38: GOCON Autumn (Story of our own Monitoring Agent in golang)

Buffer design

• Other concerns
  • Serialization
    • It's not hard to write your own serialization method (link: https://www.slideshare.net/dxhuy88/story-writing-byte-serializer-in-golang)
  • mmap vs file read
    • Not much difference in our case
    • mmap index management is cumbersome to write because it has to manipulate addresses at 2^n boundaries
  • Concurrent write vs synchronized write
    • Synchronized write for data safety

Page 39: GOCON Autumn (Story of our own Monitoring Agent in golang)

Buffer design

type LogReader interface {
    Read() (model.Metrics, error)
    Read1() (model.Metrics, error)
    CurrentOffset() int64
    SetOffset(int64) error
    Destroy() error
}

type LogWriter interface {
    Write(*model.Metrics) error
    LastOffset() int64
}

Page 40: GOCON Autumn (Story of our own Monitoring Agent in golang)

Management friendly

• Monitoring agents is f**king hard

• Deploying agents at large scale is painful

Page 41: GOCON Autumn (Story of our own Monitoring Agent in golang)

Potential risk

• Dying without anyone noticing
• Excessive resource consumption
• Buffer overflow
• Dirty data
• Resend storm

Page 42: GOCON Autumn (Story of our own Monitoring Agent in golang)

Resend storm is awful

Page 43: GOCON Autumn (Story of our own Monitoring Agent in golang)

How we solve those problems

• Expose agent state as an HTTP endpoint
  • and monitor them all using prometheus
• Monitor everything
  • Aliveness / CPU / Memory / Output Lag
• Use circuit breaker / jitter resend to prevent resend storm

Page 44: GOCON Autumn (Story of our own Monitoring Agent in golang)

func (b *AutoOpenBreaker) Close() {
    log.Info("close breaker for %v", b.autoOpenTime)
    b.state = CLOSE
    b.closeTime = time.Now()
    go b.autoOpen()
}

func (b *AutoOpenBreaker) open() {
    b.state = OPEN
}

func (b *AutoOpenBreaker) IsOpen() bool {
    return b.state == OPEN
}

func (b *AutoOpenBreaker) autoOpen() {
    // A one-shot timer is enough here; time.Tick would create a repeating ticker.
    <-time.After(b.autoOpenTime)
    log.Info("auto open breaker after %v", b.autoOpenTime)
    b.open()
}

Circuit breaker
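The breaker above changes `b.state` from a background goroutine while other goroutines may read it. One alternative, sketched here as an assumption rather than the agent's design, is to store a deadline and guard it with a mutex, which needs no goroutine at all. `Breaker`, `Trip`, `Allow` and the injectable `now` clock are hypothetical names; neutral verbs are used to avoid confusion about what "open" and "close" mean.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Breaker blocks sends until a deadline passes. It keeps the deadline
// instead of spawning a goroutine, and a mutex makes it safe to share
// between concurrent outputs.
type Breaker struct {
	mu    sync.Mutex
	until time.Time
	now   func() time.Time // injectable clock, handy for tests
}

func NewBreaker(now func() time.Time) *Breaker { return &Breaker{now: now} }

// Trip blocks sends for duration d, counted from the current time.
func (b *Breaker) Trip(d time.Duration) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.until = b.now().Add(d)
}

// Allow reports whether sends may proceed.
func (b *Breaker) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return !b.now().Before(b.until)
}

// breakerDemo drives the breaker with a fake clock.
func breakerDemo() (before, during, after bool) {
	clock := time.Unix(0, 0)
	b := NewBreaker(func() time.Time { return clock })
	before = b.Allow()
	b.Trip(30 * time.Second)
	clock = clock.Add(10 * time.Second)
	during = b.Allow()
	clock = clock.Add(30 * time.Second)
	after = b.Allow()
	return before, during, after
}

func main() {
	fmt.Println(breakerDemo())
}
```

In production code the clock would simply be `time.Now`.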

Page 45: GOCON Autumn (Story of our own Monitoring Agent in golang)

func (i *Output) retry(left int, cancelCtx context.Context, f func() error) error {
    select {
    case <-cancelCtx.Done():
        return fmt.Errorf("got cancelled")
    default: // no-op
    }

    // jitter retry
    m := math.Min(capacity, float64(base*math.Pow(2.0, float64(maxRetry-left))))
    s := rand.Intn(int(m))
    log.Debug("retry sleep %d second", s)
    time.Sleep(time.Duration(s) * time.Second)

    // do some work....
}

Jitter
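The delay calculation in `retry` is capped exponential backoff with random jitter. A standalone version is sketched below, assuming a uniform delay between zero and the ceiling (the snippet above picks whole seconds below the ceiling instead); `Ceiling` and `Jittered` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Ceiling returns min(capacity, base * 2^attempt).
func Ceiling(attempt int, base, capacity time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= capacity {
			return capacity
		}
	}
	if d > capacity {
		return capacity
	}
	return d
}

// Jittered picks a uniform random delay in [0, ceiling]. Spreading the
// retries over the whole window keeps agents that failed together from
// retrying together.
func Jittered(ceiling time.Duration) time.Duration {
	if ceiling <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(ceiling) + 1))
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		c := Ceiling(attempt, time.Second, 30*time.Second)
		fmt.Printf("attempt %d: ceiling %v, sleep %v\n", attempt, c, Jittered(c))
	}
}
```

The guard on `ceiling <= 0` matters: `rand.Intn` and `rand.Int63n` panic when given a non-positive argument.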

Page 46: GOCON Autumn (Story of our own Monitoring Agent in golang)

Agent monitoring using prometheus / grafana

Page 47: GOCON Autumn (Story of our own Monitoring Agent in golang)

Export the agent's own metrics at http://host:port/agent_metrics
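An endpoint like this can be served with the standard library alone, as sketched below. The metric names and the `Render` and `Handler` helpers are hypothetical; the output follows the simple "name value" line style of the Prometheus text format. A real agent would more likely use the official Prometheus Go client, which the talk does not confirm either way.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
)

// Render formats gauges as one "name value" pair per line, sorted by name.
func Render(gauges map[string]float64) string {
	names := make([]string, 0, len(gauges))
	for n := range gauges {
		names = append(names, n)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, n := range names {
		fmt.Fprintf(&b, "%s %g\n", n, gauges[n])
	}
	return b.String()
}

// Handler serves a fresh snapshot on every scrape.
func Handler(snapshot func() map[string]float64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, Render(snapshot()))
	})
}

func main() {
	h := Handler(func() map[string]float64 {
		return map[string]float64{"agent_up": 1, "agent_output_lag_seconds": 2.5}
	})
	// Call the handler in-process instead of binding a port.
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/agent_metrics", nil))
	fmt.Print(rec.Body.String())
}
```

To serve it for real, the handler would be mounted with `http.Handle("/agent_metrics", h)` followed by `http.ListenAndServe`.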

Page 48: GOCON Autumn (Story of our own Monitoring Agent in golang)

Admin page

Page 49: GOCON Autumn (Story of our own Monitoring Agent in golang)

Finally

• Golang is awesome
  • Quick prototype, works everywhere
• Never, ever write your own agent
  • ... unless you have to
  • But it's fun because there are a lot of problems

Page 50: GOCON Autumn (Story of our own Monitoring Agent in golang)

We're hiring