Troubleshooting Redis @charsyam KAKAO

Jan 08, 2017


DaeMyung Kang
Page 1: Troubleshooting redis

Troubleshooting Redis

@charsyam

KAKAO

Page 2: Troubleshooting redis

About me
•Senior Software Engineer at KAKAO

•Redis/Twemproxy Contributor

•Redis-doc project merger

•Apache Tajo Committer

Page 3: Troubleshooting redis

Kakaostory

Page 4: Troubleshooting redis

Kakaostory

DAU: 8M, MAU: 15M

Page 5: Troubleshooting redis

Kakaostory

420M API CALL COUNT

Page 6: Troubleshooting redis

Kakaostory Service Stack
•For Storage
  •MariaDB (Master/Slave for HA)
  •HBase
  •Cassandra
•For Cache
  •Redis
  •Arcus (Memcached variant, open source, supporting collections)

Page 7: Troubleshooting redis

Redis: 5.2TB, 274 servers

(Arcus: 3.3TB, 137 servers)

Page 8: Troubleshooting redis

Why Redis?
•As a look-aside cache for service data
•Examples:
  •User profile information
  •Feeds
  •Activities
  •Friends
  •Notifications

Page 9: Troubleshooting redis

Agenda
•Single Threaded

•Memory Fragmentation

•Redis Troubleshooting cases

•Redis Monitoring

•Redis HA

Page 10: Troubleshooting redis

Single Threaded

Page 11: Troubleshooting redis

Redis Event Loop

(Diagram: Client #1, Client #2, …, Client #N → I/O multiplexing → Redis event loop → processCommand, which executes command #1, command #2, … in turn)

Page 12: Troubleshooting redis

Only One Command at Once
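A toy model makes the consequence concrete. This is an illustrative sketch, not Redis code: it assumes commands are served strictly in arrival order, each running to completion, so every command finishes only after all earlier ones do. The function name `run_one_at_a_time` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (an assumption, not Redis code): commands run strictly one at a
 * time in arrival order. cost_us[i] is how long command i runs; finish_us[i]
 * receives the moment command i completes, measured from time 0. */
static void run_one_at_a_time(const long *cost_us, long *finish_us, size_t n)
{
    long clock_us = 0;
    for (size_t i = 0; i < n; i++) {
        assert(cost_us[i] >= 0);
        clock_us += cost_us[i];   /* the loop is busy until this command ends */
        finish_us[i] = clock_us;
    }
}
```

With costs of {1, 1000000, 1, 1} µs, the two 1 µs commands queued behind a 1-second command do not finish until about 1,000,002 µs and 1,000,003 µs. Their own cost barely matters.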

Page 13: Troubleshooting redis

Long-Running Operations

Page 14: Troubleshooting redis

KEYS
FLUSHALL/FLUSHDB

Lua scripts
MULTI/EXEC

Deleting collections

Page 15: Troubleshooting redis

Why slow?

Page 16: Troubleshooting redis

O(n)

Page 17: Troubleshooting redis

KEYS – Iterating all Keys

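/* Iterate over every key in the current database: O(N) in the number of keys. */
di = dictGetSafeIterator(c->db->dict);
allkeys = (pattern[0] == '*' && pattern[1] == '\0');  /* "KEYS *" matches everything */
while ((de = dictNext(di)) != NULL) {
    ……
    /* each key is glob-matched against the pattern */
    stringmatchlen(pattern, plen, key, sdslen(key), 0)
}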

Page 18: Troubleshooting redis

FlushAll – Deleting all items

/* Visit every bucket and free every entry in its chain: O(N). */
for (i = 0; i < ht->size && ht->used > 0; i++) {
    dictEntry *he, *nextHe;

    if ((he = ht->table[i]) == NULL) continue;
    while (he) {
        nextHe = he->next;
        dictFreeKey(d, he);
        dictFreeVal(d, he);
        zfree(he);
        ht->used--;
        he = nextHe;
    }
}
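The loop can be reproduced on a minimal chained hash table. The types below are simplified, hypothetical stand-ins for Redis's `dict`/`dictEntry`, and keys and values are omitted. The point is that clearing costs one `free()` per entry.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for dictEntry and the dict hash table. */
struct entry {
    struct entry *next;
};

struct table {
    struct entry **buckets;
    unsigned long size;   /* number of buckets */
    unsigned long used;   /* number of entries */
};

/* Pushes a new entry onto bucket idx; returns 0 on success, -1 on OOM. */
static int add_entry(struct table *ht, unsigned long idx)
{
    struct entry *e = malloc(sizeof *e);
    if (e == NULL) return -1;
    e->next = ht->buckets[idx];
    ht->buckets[idx] = e;
    ht->used++;
    return 0;
}

/* Same walk as the FLUSHALL excerpt: every bucket, every chained entry,
 * one free() each. Returns the number of entries freed. */
static unsigned long clear_table(struct table *ht)
{
    unsigned long freed = 0;
    for (unsigned long i = 0; i < ht->size && ht->used > 0; i++) {
        struct entry *he = ht->buckets[i];
        while (he) {
            struct entry *next = he->next;
            free(he);
            ht->used--;
            freed++;
            he = next;
        }
        ht->buckets[i] = NULL;
    }
    assert(ht->used == 0);
    return freed;
}
```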

Page 19: Troubleshooting redis

How slow?

Page 20: Troubleshooting redis

FlushAll

Command     Item Count   Time
flushall    1,000,000    1000ms (1 second)

Page 21: Troubleshooting redis

Delete collections

Type         Item Count   Time
list         1,000,000    1000ms (1 second)
set          1,000,000    1000ms (1 second)
sorted set   1,000,000    1000ms (1 second)
hash         1,000,000    1000ms (1 second)

You can use the *SCAN commands (SSCAN, HSCAN, ZSCAN), available since 2.8.x, to delete large collections a piece at a time.
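The table above works out to roughly 1 µs per element (1,000,000 elements in about 1 second). Deleting in bounded batches keeps each blocking step small: a client would SSCAN/HSCAN/ZSCAN a chunk and then SREM/HDEL/ZREM it. The sketch below only models the step count and the worst-case step size. It does not talk to Redis, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (an assumption, not a Redis API): remove a collection of
 * `remaining` elements at most `batch` elements per step. Returns the largest
 * number of elements removed in one step (the worst single blocking step)
 * and stores the number of steps in *steps. */
static size_t delete_in_batches(size_t remaining, size_t batch, size_t *steps)
{
    assert(batch > 0);
    size_t largest_step = 0;
    *steps = 0;
    while (remaining > 0) {
        size_t n = remaining < batch ? remaining : batch;
        remaining -= n;                 /* one bounded step */
        if (n > largest_step) largest_step = n;
        (*steps)++;
    }
    return largest_step;
}
```

For 1,000,000 elements in batches of 1,000, there are 1,000 steps of at most 1,000 deletions each. At the slide's rate, that is roughly 1 ms per step instead of one 1-second stall.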

Page 22: Troubleshooting redis

Using Multiple Instances in a Physical Server (can use more CPUs)

Page 23: Troubleshooting redis

Fork for Creating RDB, AOF Rewrite

Page 24: Troubleshooting redis

•Up to 2x memory
•Disk IO
•CPU load/usage

Page 25: Troubleshooting redis

CPU: 4 cores, 32G memory

One instance (Mem: 24G) vs. three instances (Mem: 8G + 8G + 8G)

Three smaller instances: more reliable
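The "up to 2x memory" point can be turned into simple arithmetic. This sketch assumes the pessimistic case where copy-on-write ends up duplicating the whole dataset of a forking instance, and it assumes instances fork either one at a time or all at once. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Worst-case memory (GB) while instances fork for RDB/AOF rewrite, assuming
 * a forking instance may need up to its own size again (copy-on-write of
 * every page). If forks are serialized, only the largest instance is
 * charged twice. */
static long peak_memory_gb(const long *sizes_gb, size_t n, bool all_fork_at_once)
{
    long total = 0, largest = 0;
    for (size_t i = 0; i < n; i++) {
        assert(sizes_gb[i] >= 0);
        total += sizes_gb[i];
        if (sizes_gb[i] > largest) largest = sizes_gb[i];
    }
    return all_fork_at_once ? 2 * total : total + largest;
}
```

Under these assumptions, one 24G instance can peak at 48G on a 32G box. Three 8G instances forking one at a time peak at 24G + 8G = 32G, which is why the split is more reliable.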

Page 26: Troubleshooting redis

Set CPU Affinity using taskset
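From the shell this is `taskset -cp <cpu> <pid>`. The Linux-only sketch below does the same thing programmatically with `sched_setaffinity(2)`. The helper names are hypothetical, and pid 0 means "the calling process".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Restrict a process to a single CPU, like `taskset -cp <cpu> <pid>`.
 * Returns 0 on success, -1 on error (errno is set). */
static int pin_to_cpu(pid_t pid, int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(pid, sizeof set, &set);
}

/* Lowest-numbered CPU the process is currently allowed to run on, or -1. */
static int first_allowed_cpu(pid_t pid)
{
    cpu_set_t set;
    if (sched_getaffinity(pid, sizeof set, &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set)) return cpu;
    return -1;
}
```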

Page 27: Troubleshooting redis

Separate the NIC interrupt CPU from the Redis process CPU

Page 28: Troubleshooting redis

Memory Fragmentation

Page 29: Troubleshooting redis

Memory Fragmentation #1 (used_memory vs. RSS)

Page 30: Troubleshooting redis

Memory Fragmentation #2 (used_memory vs. RSS)

We started using Arcus in this case.
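The gap between used_memory and RSS is what INFO reports as mem_fragmentation_ratio (RSS / used_memory). The sketch below computes it from INFO text, assuming the usual "field:value" lines. `info_field` and `fragmentation_ratio` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Finds an integer "field:value" line in INFO output. Whole-name match only,
 * so "used_memory" does not match "used_memory_rss". Returns 1 if found. */
static int info_field(const char *info, const char *field, long long *out)
{
    size_t len = strlen(field);
    const char *p = info;
    while (p && *p) {
        if (strncmp(p, field, len) == 0 && p[len] == ':')
            return sscanf(p + len + 1, "%lld", out) == 1;
        p = strchr(p, '\n');          /* advance to the next line */
        if (p) p++;
    }
    return 0;
}

/* RSS / used_memory; well above 1 means the allocator holds much more memory
 * than Redis is using. Returns -1.0 if a field is missing. */
static double fragmentation_ratio(const char *info)
{
    long long used = 0, rss = 0;
    if (!info_field(info, "used_memory", &used) ||
        !info_field(info, "used_memory_rss", &rss) || used == 0)
        return -1.0;
    return (double)rss / (double)used;
}
```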

Page 31: Troubleshooting redis

Redis Troubleshooting Cases

Page 32: Troubleshooting redis

Problem #1: KEYS

Page 33: Troubleshooting redis

Performance Spike

Page 34: Troubleshooting redis

INFO all
# Commandstats
cmdstat_psetex:calls=2326667,usec=9322929,usec_per_call=4.01
……
cmdstat_pexpire:calls=3695333,usec=10068580,usec_per_call=2.72
cmdstat_keys:calls=249,usec=1000314022,usec_per_call=4017325.50
cmdstat_ping:calls=27005,usec=30027,usec_per_call=1.11
……
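In this output, cmdstat_keys stands out. It shows 249 calls averaging about 4,017,325 µs (roughly 4 seconds) each, against a few µs for the other commands. The sketch below parses lines in the format shown above and flags slow commands. The struct and function names are hypothetical. Newer Redis versions append extra fields, which `sscanf` here simply ignores.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One "cmdstat_<name>:calls=..,usec=..,usec_per_call=.." line. */
struct cmdstat {
    char name[64];
    long long calls;
    long long usec;
    double usec_per_call;
};

/* Returns 1 if the line parsed into *out. */
static int parse_cmdstat(const char *line, struct cmdstat *out)
{
    return sscanf(line, "cmdstat_%63[^:]:calls=%lld,usec=%lld,usec_per_call=%lf",
                  out->name, &out->calls, &out->usec, &out->usec_per_call) == 4;
}

/* On a single-threaded server, each call above the limit stalls every client. */
static int is_slow(const struct cmdstat *s, double limit_usec)
{
    return s->usec_per_call > limit_usec;
}
```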

Page 35: Troubleshooting redis

Slowlog get 10

Page 36: Troubleshooting redis

Rename the KEYS command (rename-command in redis.conf)

Page 37: Troubleshooting redis

Using Scan

Page 38: Troubleshooting redis

Redis Dict Structure

Page 39: Troubleshooting redis

Scan #1

Page 40: Troubleshooting redis

Scan #2

Page 41: Troubleshooting redis

Scan #3
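SCAN's cursor is a bucket index incremented in reverse-binary order: the high bits of the index are incremented first. This ordering lets SCAN keep its guarantee even if the table doubles or halves between calls: every element present for the whole iteration is returned at least once, though possibly more than once. The sketch below follows the increment used in Redis's dictScan(). `next_cursor` and `rev` are names chosen for this example.

```c
#include <assert.h>
#include <limits.h>

/* Reverses the bit order of v. */
static unsigned long rev(unsigned long v)
{
    unsigned long r = 0;
    for (unsigned i = 0; i < sizeof v * CHAR_BIT; i++) {
        r = (r << 1) | (v & 1UL);
        v >>= 1;
    }
    return r;
}

/* Next cursor for a table of size mask + 1 (a power of two). Returns 0 when
 * the scan is complete. */
static unsigned long next_cursor(unsigned long v, unsigned long mask)
{
    v |= ~mask;   /* set unmasked bits so, once reversed, the +1 carries into the bucket bits */
    v = rev(v);
    v++;
    v = rev(v);
    return v;
}
```

For an 8-bucket table (mask 7), starting from cursor 0 visits 4, 2, 6, 1, 5, 3, 7 and then returns 0.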

Page 42: Troubleshooting redis

Problem #2: All Write Commands Fail

Page 43: Troubleshooting redis

“MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.”

Page 44: Troubleshooting redis

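Reason

/* From processCommand(): on a master, reject write commands (and PING) when
 * the last background save failed while stop-writes-on-bgsave-error is on,
 * or when the last AOF write failed. */
if (((server.stop_writes_on_bgsave_err &&
      server.saveparamslen > 0 &&
      server.lastbgsave_status == C_ERR) ||
     server.aof_last_write_status == C_ERR) &&
    server.masterhost == NULL &&
    (c->cmd->flags & CMD_WRITE ||
     c->cmd->proc == pingCommand))
{
    …
}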
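The condition reads more easily as a standalone predicate. The struct below is a simplified, hypothetical stand-in for the few server fields the excerpt uses; it is not the real server struct.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the server fields used in the excerpt. */
struct persist_state {
    bool stop_writes_on_bgsave_err; /* stop-writes-on-bgsave-error yes/no */
    int  save_params;               /* number of configured save points */
    bool last_bgsave_failed;        /* lastbgsave_status == C_ERR */
    bool last_aof_write_failed;     /* aof_last_write_status == C_ERR */
    bool is_slave;                  /* masterhost != NULL */
};

/* True when the excerpt's condition would reject the command. */
static bool rejects_writes(const struct persist_state *s, bool is_write_or_ping)
{
    bool bgsave_broken = s->stop_writes_on_bgsave_err &&
                         s->save_params > 0 &&
                         s->last_bgsave_failed;
    return (bgsave_broken || s->last_aof_write_failed) &&
           !s->is_slave && is_write_or_ping;
}
```

Read this way, `stop-writes-on-bgsave-error no` only clears the RDB branch. A failed AOF write still rejects writes until the underlying disk problem is fixed.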

Page 45: Troubleshooting redis

config set stop-writes-on-bgsave-error no

Page 46: Troubleshooting redis

Problem #3: Using Default Options

Page 47: Troubleshooting redis

Redis as Cache

Page 48: Troubleshooting redis

SAVE 900 1
SAVE 300 10
SAVE 60 10000
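Each line means "snapshot if at least <changes> writes happened and more than <seconds> passed since the last save." The sketch below is a simplified version of the check serverCron() performs. Real Redis also skips it while a child process is running and applies a retry delay after a failed save. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One "save <seconds> <changes>" line. */
struct save_point {
    long seconds;
    long changes;
};

/* True if any save point is satisfied by `dirty` changes made over
 * `seconds_since_save` seconds. With no save points, never true. */
static bool should_bgsave(const struct save_point *sp, size_t n,
                          long dirty, long seconds_since_save)
{
    for (size_t i = 0; i < n; i++) {
        if (dirty >= sp[i].changes && seconds_since_save > sp[i].seconds)
            return true;
    }
    return false;
}
```

On a busy cache, the 60 10000 rule fires roughly every minute, and each firing means a fork plus a full RDB write. Clearing the list with `config set save ""` leaves no save points, so the check never fires.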

Page 49: Troubleshooting redis

Heavy disk IO and high CPU load while creating RDB files

Page 50: Troubleshooting redis

config set save ""

Page 51: Troubleshooting redis

Problem #4: Using Swap Memory

Page 52: Troubleshooting redis

Redis using 28G on a single 32G machine

Page 53: Troubleshooting redis

Migrate or Restart

Page 54: Troubleshooting redis

Monitor the Redis server and keep its memory within bounds

Page 55: Troubleshooting redis

Problem #5: Simultaneous AOF Rewrite

Page 56: Troubleshooting redis

A 256GB single machine running eight Redis instances (26GB each)

Page 57: Troubleshooting redis

Simultaneous AOF Rewrite: all eight Redis instances (26GB each) run an AOF rewrite at the same time

Page 58: Troubleshooting redis

Stop all AOF Rewrites

Page 59: Troubleshooting redis

Turn off Automatic AOF Rewrite

Page 60: Troubleshooting redis

Config set auto-aof-rewrite-percentage 0
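A percentage of 0 disables the automatic trigger entirely. The sketch below is a simplified version of the automatic AOF rewrite check in serverCron(). Real Redis also requires AOF to be on and no child process to be running. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* perc     = auto-aof-rewrite-percentage (0 disables automatic rewrites)
 * min_size = auto-aof-rewrite-min-size, in bytes
 * base     = AOF size after the last rewrite (or at startup), in bytes */
static bool should_rewrite_aof(long long current, long long base,
                               long long min_size, int perc)
{
    if (perc == 0 || current <= min_size)
        return false;
    if (base == 0) base = 1;                        /* avoid division by zero */
    long long growth = current * 100 / base - 100;  /* % growth since base */
    return growth >= perc;
}
```

Instances that were loaded at the same time grow at similar rates, so they tend to cross the threshold together. Disabling the trigger and running rewrites by hand avoids that.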

Page 61: Troubleshooting redis

Manually run AOF rewrites (BGREWRITEAOF), one instance at a time

Page 62: Troubleshooting redis

Problem #6: Replication Is Broken by a Network Line Failure

Page 63: Troubleshooting redis

All Redis replication links are broken by a network line failure

Page 64: Troubleshooting redis

What happens when the network is recovered?

Page 65: Troubleshooting redis

Replication

Master / Slave: replicationCron runs a periodic health check

Page 66: Troubleshooting redis

All slaves automatically try to reconnect to the master.

Page 67: Troubleshooting redis

SLAVEOF NO ONE

Page 68: Troubleshooting redis

Problem #7: Replication Failure

Page 69: Troubleshooting redis

Permission

Page 70: Troubleshooting redis

Memory allocation failure: sysctl vm.overcommit_memory=1

Page 71: Troubleshooting redis

Replication Failure with Output Buffer Size

Page 72: Troubleshooting redis

Hard Limit, Soft Limit

Page 73: Troubleshooting redis

config set client-output-buffer-limit "slave 1024mb 1024mb 60"
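The rule behind "slave <hard> <soft> <seconds>" works like this. Crossing the hard limit disconnects the replica immediately. Staying above the soft limit for more than <seconds> also disconnects it. The redis.conf default for the slave class is 256mb 64mb 60, which a large RDB transfer can exceed, so the slide raises both limits to 1024mb. The sketch below is modeled on checkClientOutputBufferLimits(); the names are hypothetical, times are in seconds, and a limit of 0 means "no limit".

```c
#include <assert.h>
#include <stdbool.h>

struct obuf_limit {
    unsigned long long hard_bytes;
    unsigned long long soft_bytes;
    long soft_seconds;
};

/* When the soft limit was first exceeded (0 = currently below it). */
struct obuf_state {
    long soft_reached_at;
};

/* Returns true if the client should be disconnected. `now` must be > 0. */
static bool obuf_over_limit(const struct obuf_limit *lim, struct obuf_state *st,
                            unsigned long long used, long now)
{
    assert(now > 0);
    bool hard = lim->hard_bytes && used >= lim->hard_bytes;
    bool soft = lim->soft_bytes && used >= lim->soft_bytes;

    if (soft) {
        if (st->soft_reached_at == 0) {
            st->soft_reached_at = now;   /* start the soft-limit timer */
            soft = false;
        } else if (now - st->soft_reached_at <= lim->soft_seconds) {
            soft = false;                /* over the soft limit, not long enough */
        }
    } else {
        st->soft_reached_at = 0;         /* back below the soft limit: reset */
    }
    return hard || soft;
}
```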

Page 74: Troubleshooting redis

Problem #8: Hash Table Expansion

Page 75: Troubleshooting redis

Redis Dict – Hash Table Expansion #1

Page 76: Troubleshooting redis

Redis Dict – Hash Table Expansion #2

Page 77: Troubleshooting redis

Redis Dict – Hash Table Expansion #3

Page 78: Troubleshooting redis

Grows by a factor of two

Page 79: Troubleshooting redis

Maxmemory and freeMemoryIfNeeded

Page 80: Troubleshooting redis

1 Billion items

Page 81: Troubleshooting redis

1,000,000,000 * 4 = 4G

Page 82: Troubleshooting redis

maxmemory = 16G, used_memory = 12G

Page 83: Troubleshooting redis

Hash table expansion is needed.

Page 84: Troubleshooting redis

4G * 2 = 8G. You need 20G (12G + 8G).

Page 85: Troubleshooting redis

20G > 16G (maxmemory)
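The same arithmetic can be expressed in code. Table sizes are powers of two, like `_dictNextPower()`, starting from 4, so 1 billion keys sit in a 2^30 = 1,073,741,824-slot table. During rehashing the old table stays allocated, so the new bucket array must fit on top of current usage. The slide counts 4 bytes per slot. On a 64-bit build each slot holds an 8-byte pointer, so real figures are larger, which only strengthens the point. Function names other than the Redis reference are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Smallest power of two >= size, starting at 4 (like _dictNextPower()). */
static unsigned long next_power(unsigned long size)
{
    unsigned long i = 4;
    if (size >= LONG_MAX) return LONG_MAX;
    while (i < size) i *= 2;
    return i;
}

/* Bytes for a bucket array of `buckets` slots of `slot_bytes` each. */
static unsigned long long table_bytes(unsigned long buckets, unsigned slot_bytes)
{
    return (unsigned long long)buckets * slot_bytes;
}

/* The new table is allocated while the old one still exists. */
static int expansion_fits(unsigned long long used_memory,
                          unsigned long long new_table,
                          unsigned long long maxmemory)
{
    return used_memory + new_table <= maxmemory;
}
```

With the slide's numbers, `expansion_fits(12G, 8G, 16G)` is false.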

Page 86: Troubleshooting redis

Need a feature that can set the initial size of the hash table (not supported):

https://github.com/antirez/redis/pull/2812

Page 87: Troubleshooting redis

Redis Monitoring

Page 88: Troubleshooting redis

Monitoring is as important as management

Page 89: Troubleshooting redis

Redis Monitoring Metrics

Factor                                   System or Redis Info
CPU usage, load                          System
Network inbound/outbound                 System
Client connections, maxclients setting   Info
Key size, processed commands             Redis
Memory usage, RSS (very important)       Redis
Disk usage, IO                           System
Expired keys, evicted keys               Redis
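Counters such as total_commands_processed, expired_keys and evicted_keys are cumulative in INFO. They are most useful as rates between two samples. A steadily non-zero eviction rate on a cache with maxmemory set is a sign of memory pressure. The helpers below are hypothetical. maxclients is assumed to come from CONFIG GET maxclients.

```c
#include <assert.h>

/* Per-second rate of a cumulative INFO counter sampled `seconds` apart.
 * Returns -1.0 if the counter went backwards (restart or CONFIG RESETSTAT). */
static double per_second(long long before, long long after, double seconds)
{
    assert(seconds > 0);
    if (after < before)
        return -1.0;
    return (double)(after - before) / seconds;
}

/* Share of the connection limit in use: connected_clients vs. maxclients. */
static double client_usage_pct(long long connected, long long maxclients)
{
    assert(maxclients > 0);
    return 100.0 * (double)connected / (double)maxclients;
}
```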

Page 90: Troubleshooting redis

Redis HA

Page 91: Troubleshooting redis

Using DNS for Failover

Page 92: Troubleshooting redis

Private internal DNS server with TTL 0

Page 93: Troubleshooting redis

DNS HA Flow
•Detect a failure of Redis A
•Change B so that it can accept writes
•Change the DNS entry for A to point to B
•Send CLIENT KILL to A
•New clients will connect to B
•CONFIG REWRITE on B

Page 94: Troubleshooting redis

JVM: add -Dsun.net.inetaddr.ttl=0

Page 95: Troubleshooting redis

twemproxy: using 0.4.1

Page 96: Troubleshooting redis

Using a Coordinator

Page 97: Troubleshooting redis

Zookeeper

Page 98: Troubleshooting redis

Zookeeper with Redis Information

Page 99: Troubleshooting redis

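Zookeeper with Redis

(Diagram: application servers, ZooKeeper, Redis Shard-1, Shard-2 and Shard-3, and a Redis Cluster Monitor. The monitor runs health checks on the shards and updates the shard info in ZooKeeper; application servers get Redis shard information from ZooKeeper. Events: node add or remove, master change.)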

Page 100: Troubleshooting redis

Summary
•Redis is single-threaded.

•Creating an RDB or rewriting the AOF is expensive.

•Don't use the KEYS command.

•Don't use the default Redis configuration.

•Monitoring is very important.

Page 101: Troubleshooting redis

Thanks