High Performance Logging System for Embedded UNIX and GNU/Linux Applications IEEE RTCSA 2013 (8/21/13) Cisco Systems Jaein Jeong

Dec 23, 2015


High Performance Logging System for Embedded UNIX and GNU/Linux Applications

IEEE RTCSA 2013 (8/21/13)

Cisco Systems

Jaein Jeong


Introduction - Embedded UNIX in many places

Figure: Traditional UNIX Logging System. Labels: App Process (x3), log, syslog, Buffer, syslogd, File System; KERNEL and USER regions.


Problem Statement - Apps slow down with a large amount of logging

• Long latency to logging daemon
• Inefficiency of unbuffered writes to flash FS
• Long latency even with output buffering

Figure: syslogd-based logging to a flash file system (the same diagram is repeated across animation steps). Labels: App Process (x3), log, syslog, Buffer, syslogd, Flash File System; the last step adds Named pipe and Flash Logger; KERNEL and USER regions.
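
The cost of unbuffered writes named on this slide can be illustrated with a small sketch that counts how many write calls reach the underlying file with and without a user-space buffer. This is only an illustration: `CountingRaw`, the message contents, and the 4096-byte buffer size are assumptions, not part of the presented system.

```python
import io

class CountingRaw(io.RawIOBase):
    """In-memory stand-in for a flash-backed file that counts write calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1          # each call would be one write to flash
        self.data += bytes(b)
        return len(b)

def write_messages(raw, messages, buffered):
    """Write all messages and return how many writes reached the raw file."""
    out = io.BufferedWriter(raw, buffer_size=4096) if buffered else raw
    for m in messages:
        out.write(m)
    out.flush()
    return raw.calls

messages = [b"msg %04d\n" % i for i in range(100)]   # 100 messages, 9 bytes each
unbuffered_calls = write_messages(CountingRaw(), messages, buffered=False)
buffered_calls = write_messages(CountingRaw(), messages, buffered=True)
print(unbuffered_calls, buffered_calls)
```

With these assumed sizes all 900 bytes fit in the buffer, so the buffered path issues a single write while the unbuffered path issues one per message.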


Our Approach

• Faster Message Transfer
• Compatibility with Existing Logging Apps
• Destination-Aware Message Formatting


Organization

• Related Work for UNIX Logging Systems
• Background
  – Cisco UCS and Virtual Interface Card (VIC)
  – Evolution of VIC Logging System
• Design Requirements and Implementation
• Evaluation and Optimization
• Conclusion


Related Work - Logging Methods for UNIX Apps

• Not designed for embedded/flash logging
  – Slow msg passing (msg copying over kernel)
  – Unbuffered message writes

Syslog
• Introduced in early 80’s
• Still most notable one

Syslog-ng
• An extension based on nsyslogd
• Reliable transport, encryption, and richer set of information and filtering

Rsyslog
• An extension used in latest distros
• Multi-threading


Background - Cisco UCS and Virtual Interface Card

Figure: Cisco UCS datacenter server system, Cisco UCS server, and Cisco UCS Virtual Interface Card (VIC). Labels: 128 Programmable Virtual Interfaces; Ethernet NICs; Fibre Channel HBAs; 10GBASE-KR Unified Network Fabric, 1 to Each Fabric Extender; VIC ASIC with Mgmt CPU, FCPU 0, FCPU 1.

Mgmt CPU

MIPS proc core (500MHz, MIPS 24Kc)

Embedded Linux (Linux kernel 2.6.23-rc5)


Background - Evolution of VIC Logging System

• Logging from Multiple Processes
• Different Severity Levels
• Formatting and flash writing

• Forwards serious msgs to switches
• Functional, but with worse write performance

• Improves flash write performance of unbuffered syslogd
• Still suffers long latency

Figure: three diagrams, in order. Logd – a simple logging daemon (labels: App Process x3, log, logd, JFFS2, Flash). Unbuffered syslogd (labels: App Process x3, log, syslog, Buffer, syslogd, System Process x3, Switch x2, JFFS2, Flash; KERNEL and USER regions). Buffered syslogd (same labels plus Named pipe and Flash Logger).



Design Requirements - Faster Message Transfer

• Avoid kernel-to-user space msg copying

Figure: Syslogd Logging vs. Mqlogd Logging. Syslogd Logging labels: App Process (x3), log, syslog, Buffer, syslogd, Named pipe, Flash Logger, JFFS2, Flash, System Process (x3), Switch (x2); KERNEL and USER regions. Mqlogd Logging labels: App Process (x3), log, enqueue, Memory Mapped File, dequeue, mqlogd, Named pipe, Flash Logger, JFFS2, Flash, System Process (x3), Switch (x2); KERNEL and USER regions.


Design Requirements - Faster Message Transfer

• Reduce message copying from 4 to 2

Syslogd Logging
1. App local copy
2. Write to kernel buffer
3. Syslogd local copy
4. Write to named pipe

Mqlogd Logging
1’. Write directly to shared memory
2’. Write from shared memory to named pipe

Figure: Syslogd Logging vs. Mqlogd Logging (same diagrams as the previous slide), with copy steps 1 to 4 and 1’ to 2’ marked.
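
The two-copy path on this slide can be sketched with a file-backed shared mapping. This is a minimal illustration, not the mqlogd implementation: both mappings live in one process for simplicity, and the region size, the temporary backing file, and the sample message are assumptions.

```python
import mmap
import os
import tempfile

SIZE = 4096  # assumed size of the shared region

# Back the shared region with a temporary file (a stand-in for the
# memory-mapped file in the mqlogd diagram).
fd, path = tempfile.mkstemp()
os.ftruncate(fd, SIZE)

client_view = mmap.mmap(fd, SIZE)   # mapping a logging client would hold
server_view = mmap.mmap(fd, SIZE)   # mapping the logging server would hold

msg = b"sample log message"         # hypothetical message

# Copy 1': the client writes the message straight into shared memory.
client_view[0:len(msg)] = msg

# Copy 2': the server reads it out of shared memory, e.g. to forward it
# to the named pipe. No kernel socket buffer sits in between.
received = bytes(server_view[0:len(msg)])
print(received)

client_view.close()
server_view.close()
os.close(fd)
os.remove(path)
```

In a real deployment the two mappings would belong to different processes, and access would need the locking and notification described later in the talk.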


Design Requirements - Compatibility with Existing Logging Apps

• Through Logging API
  – Replace syslog() with shared memory lib calls
• Direct Syslog Calls
  – Server receives msgs through UDP Unix socket

Figure: four client/server diagrams. (1) Logging Server (Syslogd): clients klogd, fls, … use the syslog() library call over a UDP Unix Socket. (2) Logging Server (Syslogd): clients mcp, fls, … use the Logging API (log_info(), log_error(), …) on top of the syslog() library call over a UDP Unix Socket. (3) Logging Server (mqlogd): clients klogd, xinetd, … use the syslog() library call over a UDP Unix Socket. (4) Logging Server (mqlogd): clients app1, app2, … use the Logging API (log_info(), log_error(), …) on top of the Shared Memory Logging Library.
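
The direct-syslog compatibility path can be sketched as a server that receives `<PRI>text` datagrams on a Unix datagram socket (the slide's "UDP Unix socket"). This is a hedged illustration: `parse_syslog_datagram` is a hypothetical helper, a connected socket pair stands in for the daemon's listening socket, and the sample message is made up. The only protocol fact relied on is that the syslog priority value encodes facility * 8 + severity.

```python
import re
import socket

def parse_syslog_datagram(data):
    """Split a '<PRI>text' datagram into (facility, severity, text)."""
    m = re.match(rb"<(\d{1,3})>(.*)", data, re.DOTALL)
    if m is None:
        raise ValueError("not a syslog-style datagram")
    pri = int(m.group(1))
    return pri // 8, pri % 8, m.group(2).decode("utf-8", "replace")

# A connected datagram pair stands in for the daemon's Unix socket.
client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

# Legacy client side: priority 14 = facility 1 (user) * 8 + severity 6 (info).
client.send(b"<14>xinetd: service started")

# Server side: receive and decode the message, then hand it to the
# same formatting and output path used for shared-memory clients.
facility, severity, text = parse_syslog_datagram(server.recv(1024))
print(facility, severity, text)

client.close()
server.close()
```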


Design Requirements - Destination-Aware Message Formatting

• Syslogd
  – Working but limited
  – Redundant
  – Coarse time granularity (in seconds)
• Mqlogd
  – Destination-aware formatting with space saving
  – Uses system supported timing (in micro-seconds)
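
One way to picture destination-aware formatting is a single record rendered differently per destination, with microsecond timestamps in both. The slides do not give the actual formats, so the two layouts, the destination names `"flash"` and `"switch"`, and the sample record below are all assumptions.

```python
from datetime import datetime, timezone

def format_record(destination, ts_us, severity, tag, text):
    """Render one log record for a destination (hypothetical layouts)."""
    secs, us = divmod(ts_us, 1_000_000)   # integer split keeps microseconds exact
    t = datetime.fromtimestamp(secs, tz=timezone.utc).replace(microsecond=us)
    if destination == "flash":
        # Compact form to save flash space: time of day, one-letter severity.
        return f"{t:%H:%M:%S.%f} {severity[0].upper()} {tag}: {text}"
    if destination == "switch":
        # Fuller form for a remote reader: date and spelled-out severity.
        return f"{t:%Y-%m-%d %H:%M:%S.%f} {severity.upper()} {tag}: {text}"
    raise ValueError(f"unknown destination: {destination}")

ts_us = 1_377_043_200_000_123   # 2013-08-21 00:00:00.000123 UTC
flash_line = format_record("flash", ts_us, "error", "app1", "request failed")
switch_line = format_record("switch", ts_us, "error", "app1", "request failed")
print(flash_line)
print(switch_line)
```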


Implementation - Shared Memory and Circular Queue

• Notification Mechanism
  – Write-and-select
  – Signal
• Locking Mechanism
  – Semaphore lock
  – Pthread lock

Figure: Logging Clients enqueue into Shared Memory; the Logging Server dequeues; a Logging Event Notification path runs between them. Queue Memory Layout: Circular Queue Header (with Notification Disable Flag), Header Entry, Non-Header Entry (x3).
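
A circular queue with a small header can be sketched as follows. This is a simplified illustration rather than the layout used in the talk: it uses fixed-size entries instead of header and non-header entries, the field order and the 64-byte slot size are assumptions, and locking is omitted (a real shared-memory queue would take the semaphore or pthread lock around both operations).

```python
import struct

HDR = struct.Struct("<III")               # head, tail, notification-disable flag
SLOT = 64                                 # assumed fixed entry size in bytes
ENTRY = struct.Struct(f"<H{SLOT - 2}s")   # length prefix + zero-padded payload

class CircularQueue:
    """Fixed-slot ring over any writable buffer (bytearray or mmap)."""

    def __init__(self, buf, nslots):
        self.buf, self.nslots = buf, nslots
        HDR.pack_into(buf, 0, 0, 0, 0)

    def enqueue(self, msg):
        if len(msg) > SLOT - 2:
            raise ValueError("message too long for one entry")
        head, tail, flag = HDR.unpack_from(self.buf, 0)
        nxt = (tail + 1) % self.nslots
        if nxt == head:
            return False                  # queue full: the request is dropped
        ENTRY.pack_into(self.buf, HDR.size + tail * SLOT, len(msg), msg)
        HDR.pack_into(self.buf, 0, head, nxt, flag)
        return True

    def dequeue(self):
        head, tail, flag = HDR.unpack_from(self.buf, 0)
        if head == tail:
            return None                   # queue empty
        length, data = ENTRY.unpack_from(self.buf, HDR.size + head * SLOT)
        HDR.pack_into(self.buf, 0, (head + 1) % self.nslots, tail, flag)
        return data[:length]

buf = bytearray(HDR.size + 4 * SLOT)
q = CircularQueue(buf, 4)
accepted = [q.enqueue(b"m%d" % i) for i in range(4)]
first = q.dequeue()
print(accepted, first)
```

One slot is kept empty to distinguish a full queue from an empty one, so a queue of 4 slots accepts 3 messages before dropping.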



Evaluation

• Metrics
  – Request Latency
  – Request Drop Rate
• Parameters
  – Number of clients
  – Number of iterations (Depth of queue size)
  – Locking mechanism
  – Notification mechanism
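
The request latency metric can be gathered with a loop of the following shape. This is a generic sketch, not the benchmark used in the talk: `measure_latency_us` is a hypothetical helper and appending to a list stands in for a logging request.

```python
import time

def measure_latency_us(request, iterations):
    """Return (min, average, max) latency of request(i) in microseconds."""
    samples_ns = []
    for i in range(iterations):
        start = time.perf_counter_ns()
        request(i)
        samples_ns.append(time.perf_counter_ns() - start)
    avg_ns = sum(samples_ns) / len(samples_ns)
    return min(samples_ns) / 1000, avg_ns / 1000, max(samples_ns) / 1000

sink = []                       # stand-in for the logging backend
lo, avg, hi = measure_latency_us(sink.append, 1000)
print(f"min={lo:.3f}us avg={avg:.3f}us max={hi:.3f}us")
```

Reporting minimum, average, and maximum separately matters because a logging call that is fast on average can still stall an application on its worst case.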


Performance Results - Performance compared to syslogd

• Avg Latency: >10x speed-up
• Min Latency: >20x speed-up
• Max Latency: >2x speed-up

Figure: Average, Minimum, and Maximum Request Latency - 1 Client (three panels). X axis: Number of Iterations (100, 1000, 5000, 10000, 50000). Y axis: Latency (us). Series: syslogd; mqlogd (select, semaphore); mqlogd (signal, semaphore); mqlogd (select, pthread); mqlogd (signal, pthread).


Performance Results - Effect of Queue Size

• No drops within queue size (e.g. 10000)
• Queue size should be larger than max expected burst size

Figure: Request Drop Rate - 1 Client. X axis: Number of Iterations (100, 1000, 5000, 10000, 50000). Y axis: Percent (0% to 100%). Series: mqlogd (select, semaphore); mqlogd (signal, semaphore); mqlogd (select, pthread); mqlogd (signal, pthread).
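
The sizing rule on this slide can be explored with a toy model of a bounded queue under a burst. This is a simplified model under stated assumptions, not a reproduction of the measured drop rates: `simulate_burst` and its `drain_per_msg` parameter are hypothetical, and the default models the worst case in which the server removes nothing while the burst arrives.

```python
def simulate_burst(burst, capacity, drain_per_msg=0.0):
    """Return the fraction of a burst dropped by a bounded queue.

    drain_per_msg is how many messages the server removes for each
    message a client tries to enqueue (0.0 means no draining at all).
    """
    depth = 0.0
    dropped = 0
    for _ in range(burst):
        depth = max(0.0, depth - drain_per_msg)
        if depth + 1 > capacity:
            dropped += 1          # queue full: this request is lost
        else:
            depth += 1
    return dropped / burst

within = simulate_burst(burst=10000, capacity=10000)
beyond = simulate_burst(burst=50000, capacity=10000)
print(within, beyond)
```

In this worst-case model a burst no larger than the queue loses nothing, while a burst five times the queue size loses every message past the first 10000.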


Performance Results - Effect of Multiple Clients

• Avg request latency increases proportionally
• With 2 clients, requests start to drop at a smaller number of iterations

Figure: Avg Request Latency - 1 and 2 Clients. X axis: Number of Iterations (100, 1000, 5000, 10000, 50000). Y axis: Latency (us). Series: syslogd (1 client); syslogd (2 clients); mqlogd (select, 1 client); mqlogd (select, 2 clients).

Figure: Request Drop Rate - 1 and 2 Clients. X axis: Number of Iterations (100, 1000, 5000, 10000, 50000). Y axis: Percent (0% to 100%). Series: mqlogd (select, 1 client); mqlogd (select, 2 clients).


Performance Results - Effect of Notification Mechanisms

• Makes little difference

Figure: Average, Minimum, and Maximum Request Latency - 1 Client (three panels). X axis: Number of Iterations (100, 1000, 5000, 10000, 50000). Y axis: Latency (us). Series: mqlogd (select, semaphore); mqlogd (signal, semaphore).
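
One plausible reading of the write-and-select mechanism is that a client writes a wake-up byte to a pipe after enqueueing, and the server blocks in `select()` on the read end. The sketch below shows that pattern in a single process; it is an assumption about the mechanism's shape, not the code from the talk.

```python
import os
import select

r, w = os.pipe()

# Server side: nothing has been written yet, so a zero-timeout
# select() reports no readable descriptors.
ready_before, _, _ = select.select([r], [], [], 0)

# Client side: after enqueueing a message, write one byte as a wake-up.
os.write(w, b"\x01")

# Server side: select() now reports the pipe as readable. The server
# drains the byte and would then dequeue messages from shared memory.
ready_after, _, _ = select.select([r], [], [], 1.0)
token = os.read(r, 1)
print(ready_before, ready_after == [r], token)

os.close(r)
os.close(w)
```

The signal-based alternative replaces the pipe write with a signal delivered to the server process; the slide reports that the choice makes little difference to latency.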


Performance Results - Effect of Lock Mechanisms

• Pthread mutex is 40% faster than semaphore.
• Semaphore is used for our production code due to a limitation of pthread mutex lock (Linux kernel 2.6.23-rc5).

Figure: Average, Minimum, and Maximum Request Latency - 1 Client (three panels). X axis: Number of Iterations (100, 1000, 5000, 10000, 50000). Y axis: Latency (us). Series: mqlogd (select, semaphore); mqlogd (select, pthread).


Performance Results - Effect of Client Interface Type

• Logging using UNIX socket interface
  – Backward-compatible interface is no faster
  – About the same level as syslogd
  – For compatibility, not for general use

Figure: Average Request Latency - 1 Client. X axis: Number of Iterations (100, 1000, 5000, 10000, 50000). Y axis: Latency (us). Series: syslogd; mqlogd (select, semaphore); mqlogd (Unix socket).


Optimization - Effects of deferred notification

• Sends one notification for a batch of msgs
• Measured time for host-to-adapter commands (capability & macaddr) with and without logging
• 2x speed-up in latency

Figure: Latency for 'capability' command and Latency for 'macaddr' command. Bars: write-and-select, deferred, syslogd. Y axis in us, split into logging time (us) and msg xfer time (us).
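
Deferred notification, as described in the first bullet of this slide, can be sketched with a pending flag: a client notifies only when no notification is outstanding, and the server clears the flag once it has drained the queue. `DeferredQueue` is a hypothetical single-process model; the counter stands in for the pipe write or signal, and the flag plays a role similar to the notification-disable flag in the queue header.

```python
class DeferredQueue:
    """Queue that sends at most one notification per batch of messages."""

    def __init__(self):
        self.items = []
        self.notify_pending = False   # set while a wake-up is outstanding
        self.notifications = 0        # stand-in for pipe writes or signals

    def enqueue(self, msg):
        self.items.append(msg)
        if not self.notify_pending:
            self.notify_pending = True
            self.notifications += 1   # only the first message of a batch notifies

    def drain(self):
        """Server side: take the whole batch, then re-enable notification."""
        batch, self.items = self.items, []
        self.notify_pending = False
        return batch

q = DeferredQueue()
for i in range(5):
    q.enqueue(f"msg {i}")
first_batch = q.drain()
q.enqueue("msg 5")
second_batch = q.drain()
print(len(first_batch), second_batch, q.notifications)
```

Six messages cost two notifications here instead of six, which is the saving the deferred scheme is after.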


Future Work

• Reduce kernel msg copying even further
• Improve performance with faster lock
• Avoid loss of serious messages

Figure: two diagrams. First: App Process (x3), log, enqueue, Memory Mapped File, dequeue, mqlogd, Named pipe, Flash Logger, File System; KERNEL and USER regions. Second: the same labels without the Named pipe and with a second Memory Mapped File.


Conclusion

• Logging system for embedded UNIX apps
• Up to 100x speed-up in latency, 10x throughput
• Backward Compatibility
• Commercially used in Cisco UCS Virtual Interface Cards