A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers K. V. Rashmi, Nihar Shah, D. Gu, H. Kuang, D. Borthakur, K. Ramchandran
A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruc ...rvinayak/papers/Hitchhiker_slides_sigcomm2014.pdf · • Implementation and evaluation – Facebook data warehouse cluster

Nov 02, 2018

Page 1:

A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers

K. V. Rashmi, Nihar Shah, D. Gu, H. Kuang, D. Borthakur, K. Ramchandran

Page 2:

Need for Redundant Storage in Data Centers

• Frequent unavailability events in data centers
  – unreliable components
  – software glitches, maintenance shutdowns, power failures, etc.

• Redundancy necessary for reliability and availability

Page 3:

Popular Approach for Redundant Storage: Replication

• Distributed file systems used in data centers store multiple copies of data on different machines

• Machines typically chosen on different racks
  – to tolerate rack failures

E.g., Hadoop Distributed File System (HDFS) stores 3 replicas by default

Page 4:

[Figure: HDFS takes a FILE, divides it into blocks (a to j), introduces redundancy by keeping replicas of every block, and stores them distributed across the network, on machines under separate TOR switches connected through an AS/Router.]

Page 5:

Massive Data Sizes: Need Alternative to Replication

• Small to moderately sized data: disk storage is inexpensive
  – replication viable

• No longer true for massive scales of operation
  – e.g., Facebook data warehouse cluster stores multiple tens of Petabytes (PBs)

“Erasure codes” are an alternative

Page 6:

Erasure Codes in Data Centers

• Facebook data warehouse cluster
  – uses Reed-Solomon (RS) codes instead of 3-replication on a portion of the data
  – savings of multiple Petabytes of storage space

Page 7:

                   Replication                 Erasure codes: Reed-Solomon (RS) code
block 1            a                           a     (data block)
block 2            b                           b     (data block)
block 3            a                           a+b   (parity block)
block 4            b                           a+2b  (parity block)
Overhead           2x                          2x
Fault tolerance    tolerates any one failure   tolerates any two failures

In general, erasure codes provide orders of magnitude higher reliability at much smaller storage overheads
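
The claim that the toy RS code survives any two failures can be checked mechanically. The following is a minimal sketch, not code from the talk: it assumes arithmetic modulo the prime 257 as a stand-in for the finite field a real Reed-Solomon implementation would use, and the names `encode`, `decode` and `tolerates_any_two_failures` are illustrative.

```python
from itertools import combinations

P = 257  # assumption: a small prime field standing in for the code's finite field

# Each stored block is a linear combination (coeff_a, coeff_b) of the data (a, b).
ROWS = [(1, 0), (0, 1), (1, 1), (1, 2)]  # a, b, a+b, a+2b


def encode(a, b):
    """Return the four stored blocks of the toy code."""
    return [(ca * a + cb * b) % P for ca, cb in ROWS]


def decode(survivors):
    """Recover (a, b) from a dict {block_index: value} holding two blocks."""
    (i, x), (j, y) = list(survivors.items())[:2]
    (p, q), (r, s) = ROWS[i], ROWS[j]
    det = (p * s - q * r) % P
    inv = pow(det, -1, P)  # Python 3.8+; raises ValueError if det == 0
    # Cramer's rule for the 2x2 system p*a + q*b = x, r*a + s*b = y.
    a = ((s * x - q * y) * inv) % P
    b = ((p * y - r * x) * inv) % P
    return a, b


def tolerates_any_two_failures(a, b):
    """Check every choice of two lost blocks out of four."""
    blocks = encode(a, b)
    for lost in combinations(range(4), 2):
        survivors = {i: blocks[i] for i in range(4) if i not in lost}
        if decode(survivors) != (a % P, b % P):
            return False
    return True
```

With 2x replication of a and b, by contrast, losing both copies of the same block is unrecoverable, which is why only one arbitrary failure is tolerated at the same 2x overhead.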

Page 8:

Outline

• Erasure Codes in Data Centers
  – HDFS

• Impact on the data center network
  – Problem description

• Our system: “Hitchhiker”

• Implementation and evaluation
  – Facebook data warehouse cluster

• Literature

Page 11:

Erasure codes in Data Centers: HDFS-RAID

3-replication: every block a to j is stored three times
• Overhead: 3x
• Cannot tolerate many 3-failures

(10, 4) Reed-Solomon code: data blocks a to j plus parity blocks P1, P2, P3, P4
• Overhead: 1.4x
• Any 10 blocks sufficient
• Can tolerate any 4-failures

Borthakur, “HDFS and Erasure Codes (HDFS-RAID)”
Fan, Tantisiriroj, Xiao and Gibson, “DiskReduce: RAID for Data-Intensive Scalable Computing”, PDSW 09
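
The two overhead figures follow from counting stored blocks per block of user data. A small sketch of that arithmetic (the function name `storage_overhead` is illustrative, and modelling 3-replication as one data block plus two extra copies is an assumption made only for this calculation):

```python
def storage_overhead(data_blocks, redundant_blocks):
    """Stored blocks per block of user data."""
    return (data_blocks + redundant_blocks) / data_blocks


# (10, 4) Reed-Solomon: 10 data blocks plus 4 parity blocks.
rs_overhead = storage_overhead(10, 4)

# 3-replication, modelled as 1 data block plus 2 extra copies.
replication_overhead = storage_overhead(1, 2)
```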

Page 13:

Impact on Data Center Network

• Degraded Reads
  – requesting currently unavailable data
  – on-the-fly reconstruction

• Recovery
  – periodically replace unavailable blocks
  – to ensure desired level of reliability

[Diagram: Storage Layer, Reconstruction Operations, Network Layer]

Page 14:

Impact on Data Center Network

RS codes significantly increase network usage during reconstruction

Page 15:

Impact on Data Center Network

Reconstructing block 1 (a) in the toy example:

                   Replication                  Reed-Solomon code
block 1            a                            a
block 2            a                            b
block 3            b                            a+b
block 4            b                            a+2b
Data read          a                            b, a+b
Network transfer
& disk IO          1x                           2x

Network transfer & disk IO = (#data-blocks) x (size of data to be reconstructed)

In (10, 4) RS, it is 10x
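
As a quick worked instance of the formula above, here is a sketch with illustrative function names. The 256 MB block size is the one shown on the data-transfer slides later in the talk.

```python
def rs_reconstruction_cost(data_blocks, lost_size):
    """Data read and transferred to rebuild `lost_size` units under an RS code."""
    return data_blocks * lost_size


def replication_reconstruction_cost(lost_size):
    """Under replication, one surviving copy of the lost data is read."""
    return lost_size


toy_cost = rs_reconstruction_cost(2, 1)          # toy code with two data blocks
cluster_cost_mb = rs_reconstruction_cost(10, 256)  # (10, 4) RS, 256 MB block
```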

Page 16:

Impact on Data Center Network

Burdens the already oversubscribed Top-of-Rack and higher level switches

[Figure: blocks a, b, a+b and a+2b sit on machines 1 to 4 under separate TOR switches joined by a Router; block a is shown at machine 1.]

Page 17:

Impact on Data Center Network: Facebook Data Warehouse Cluster

• Multiple PB of Reed-Solomon encoded data

• Median of 180 TB transferred across racks per day for RS reconstruction ≈ 5 times that under 3-replication

Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage: A Study on the Facebook Warehouse Cluster”, Usenix HotStorage Workshop 2013

Page 19:

RS codes: The Good and The Bad

• Maximum possible fault-tolerance for given storage overhead
  – storage-capacity optimal
  – (“maximum distance separable” in coding theory parlance)

• Flexibility in choice of parameters
  – Supports any number of data and parity blocks

• Not designed to handle reconstruction operations efficiently
  – negative impact on the network

Goal: Maintain the first two properties, Improve the third

Page 20:

Goal

To build a system with:

Maintain
• Same (optimal) storage requirement and fault tolerance
• Same (complete) flexibility in choice of design parameters

Improve
• Reduced data transfer across network and reduced IO from disk during reconstruction

Page 21:

Hitchhiker

Is a system with:

Maintain
• Same (optimal) storage requirement and fault tolerance
• Same (complete) flexibility in choice of design parameters

Improve
• 25 to 45% less network transfers and disk IO during reconstruction

Page 23:

At an Abstract Level

[Diagram: HITCHHIKER = Hitchhiker’s Erasure Code (on top of the Reed-Solomon Code) + Hop-and-couple (disk layout)]

Page 24:

Hitchhiker’s Erasure Code: Toy Example

Start with the RS code

            1 byte      1 byte
block 1     a1          a2
block 2     b1          b2
block 3     a1+b1       a2+b2
block 4     a1+2b1      a2+2b2

Page 25:

Intermediate Code

Add information from first group on to parities of the second group

            1 byte      1 byte
block 1     a1          a2
block 2     b1          b2
block 3     a1+b1       a2+b2
block 4     a1+2b1      a2+2b2+a1

No extra storage

Page 26:

Storage-optimality of Intermediate Code

Retains failure tolerance of RS codes: can tolerate failure of any 2 nodes

            1 byte      1 byte
block 1     a1          a2
block 2     b1          b2
block 3     a1+b1       a2+b2
block 4     a1+2b1      a2+2b2+a1

Decoding: recover a1 and b1 from the first group, subtract the added a1 from the parity of the second group, then recover a2 and b2

Page 27:

Final Code

            1 byte      1 byte
block 1     a1          a2
block 2     b1          b2
block 3     a1+b1       a2+b2
block 4     a1+2b1      a2+2b2+a1

Invertible operation within a block: in block 4, subtract the second byte from the first

Page 28:

Final Code

            1 byte        1 byte
block 1     a1            a2
block 2     b1            b2
block 3     a1+b1         a2+b2
block 4     2b1-a2-2b2    a2+2b2+a1

Invertible operations within blocks do not change storage or fault tolerance
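
One way to convince oneself that the final toy code still tolerates any two block failures is to check that the symbols of every pair of surviving blocks form an invertible linear system in (a1, b1, a2, b2). The sketch below does this with exact integer determinants. It is an illustration rather than the authors' code, and it assumes arithmetic modulo the prime 257 in place of the actual finite field.

```python
from itertools import combinations

P = 257  # assumption: small prime field used only for this check

# Coefficient vectors over (a1, b1, a2, b2) for the two bytes of each block.
BLOCKS = [
    [(1, 0, 0, 0), (0, 0, 1, 0)],    # block 1: a1, a2
    [(0, 1, 0, 0), (0, 0, 0, 1)],    # block 2: b1, b2
    [(1, 1, 0, 0), (0, 0, 1, 1)],    # block 3: a1+b1, a2+b2
    [(0, 2, -1, -2), (1, 0, 1, 2)],  # block 4: 2b1-a2-2b2, a2+2b2+a1
]


def det(matrix):
    """Exact integer determinant by Laplace expansion along the first row."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def survives(lost):
    """True if the data is recoverable after losing the blocks in `lost`."""
    rows = [row for idx, block in enumerate(BLOCKS)
            if idx not in lost for row in block]
    return det(rows) % P != 0


all_pairs_ok = all(survives(set(pair)) for pair in combinations(range(4), 2))
```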

Page 29:

Efficient Reconstruction

            1 byte        1 byte
block 1     a1            a2
block 2     b1            b2
block 3     a1+b1         a2+b2
block 4     2b1-a2-2b2    a2+2b2+a1

Bytes read to reconstruct block 1: b2, a2+b2, a2+2b2+a1

Data transferred: only 3 bytes (instead of 4 bytes as in RS)
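
The three-byte repair can be written out step by step. This is a minimal sketch of the toy example only, assuming arithmetic modulo the prime 257 as a stand-in for the real finite field; `encode` and `repair_block1` are illustrative names.

```python
P = 257  # assumption: small prime field standing in for the code's finite field


def encode(a1, b1, a2, b2):
    """Final toy code: each block stores two symbols (first byte, second byte)."""
    return [
        (a1 % P, a2 % P),
        (b1 % P, b2 % P),
        ((a1 + b1) % P, (a2 + b2) % P),
        ((2 * b1 - a2 - 2 * b2) % P, (a2 + 2 * b2 + a1) % P),
    ]


def repair_block1(blocks):
    """Rebuild block 1 from the second bytes of blocks 2, 3 and 4 only."""
    b2 = blocks[1][1]            # b2
    parity = blocks[2][1]        # a2 + b2
    piggybacked = blocks[3][1]   # a2 + 2*b2 + a1
    a2 = (parity - b2) % P
    a1 = (piggybacked - a2 - 2 * b2) % P
    symbols_read = 3
    return (a1, a2), symbols_read
```

Plain RS repair of the same block would read two symbols for each of the two bytes, four in total.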

Page 31:

Hitchhiker’s Erasure Code

• Builds on top of RS codes

• Uses our theoretical framework of “Piggybacking”*

• Three versions
  – XOR
  – XOR+
  – non-XOR

* K.V. Rashmi, Nihar Shah, K. Ramchandran, “A Piggybacking Design Framework for Read-and Download-efficient Distributed Storage Codes”, in IEEE International Symposium on Information Theory, 2013.

Page 32:

Hop-and-couple (disk layout)

• Way of choosing which bytes to mix
  – couples bytes farther apart in block
  – to minimize fragmentation of reads during reconstruction

• Translate savings in network-transfer to savings in disk-IO as well
  – By making reads contiguous
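
The effect of the pairing choice on read fragmentation can be illustrated with a small sketch. It assumes a hop length of half the unit size, so that the coupled bytes fall into two contiguous halves. The slide itself only states that coupled bytes lie farther apart, so that choice, like the function names, is an assumption of this example.

```python
def adjacent_pairs(unit_size):
    """Couple byte offsets (0, 1), (2, 3), ... within a unit."""
    return [(i, i + 1) for i in range(0, unit_size, 2)]


def hop_pairs(unit_size):
    """Couple offset i with offset i + hop, taking hop = unit_size // 2 (assumed)."""
    hop = unit_size // 2
    return [(i, i + hop) for i in range(hop)]


def contiguous_runs(offsets):
    """Number of separate contiguous disk reads needed to fetch `offsets`."""
    runs = 0
    previous = None
    for offset in sorted(offsets):
        if previous is None or offset != previous + 1:
            runs += 1
        previous = offset
    return runs


def runs_for_second_bytes(pairs):
    """Reads needed when reconstruction fetches only the second byte of each pair."""
    return contiguous_runs(second for _, second in pairs)
```

For an 8-byte unit, adjacent coupling needs four separate reads to fetch the second byte of every pair, while hop-and-couple needs a single contiguous read.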

Page 33:

RS vs Hitchhiker from the Network’s Perspective…

Page 34:

Data Transfer during Reconstruction in RS-based System

Transfer: 10 full blocks
Connect to 10 machines

[Figure: a stripe of 14 blocks of 256 MB each, data blocks 1 to 10 and parity blocks 11 to 14.]

Page 35:

Data Transfer during Reconstruction in Hitchhiker

Reconstruction of data blocks 1-9:

Transfer: 2 full blocks + 9 half blocks (= 6.5 blocks total)
Connect to 11 machines

[Figure: a stripe of 14 blocks of 256 MB each, data blocks 1 to 10 and parity blocks 11 to 14.]

Page 36:

Data Transfer during Reconstruction in Hitchhiker

Reconstruction of block 10:

Transfer: 13 half blocks (= 6.5 blocks total)
Connect to 13 machines

[Figure: a stripe of 14 blocks of 256 MB each, data blocks 1 to 10 and parity blocks 11 to 14.]
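
The transfer figures on the last three slides can be tallied as follows. This is only a sketch of the arithmetic, using exact fractions. The function names are illustrative, and only the data-block cases stated on the slides are covered.

```python
from fractions import Fraction

BLOCK_MB = 256
DATA_BLOCKS = 10
HALF = Fraction(1, 2)


def rs_transfer_blocks():
    """(10, 4) RS downloads 10 full blocks to rebuild one block."""
    return Fraction(DATA_BLOCKS)


def hitchhiker_transfer_blocks(lost_block):
    """Blocks' worth of data downloaded to rebuild one data block (1 to 10)."""
    if 1 <= lost_block <= 9:
        return 2 + 9 * HALF   # 2 full blocks + 9 half blocks
    if lost_block == 10:
        return 13 * HALF      # 13 half blocks
    raise ValueError("only data blocks 1 to 10 are covered by the slides")


def saving(lost_block):
    """Fraction of RS transfer avoided by Hitchhiker."""
    return 1 - hitchhiker_transfer_blocks(lost_block) / rs_transfer_blocks()
```

Both cases come to 6.5 blocks, that is 1664 MB at a 256 MB block size, a 35% saving over the 2560 MB moved by RS.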

Page 38:

Implementation & Evaluation Setup (1)

• Implemented on top of HDFS-RAID
  – erasure coding module in HDFS based on RS
  – used in the Facebook data warehouse cluster

• Deployed and tested on a 60 machine test cluster at Facebook
  – verified 35% reduction in the network transfers during reconstruction

Page 39:

Implementation & Evaluation Setup (2)

• Evaluation of timing metrics on the Facebook data warehouse cluster in production
  – under real-time production traffic and workloads
  – using Map-Reduce to run encoding and reconstruction jobs, just as HDFS-RAID

Page 40:

Decoding Time

• RS decoding on only half portion of the blocks
• Faster computation for degraded reads and recovery
• XOR versions: 25% less than non-XOR

[Chart annotation: 36% reduction]

Page 41:

Read & Transfer Time

• Read & transfer time 30% lower in Hitchhiker (HH)
• Similar reduction for other block sizes as well

System           Data transfer    Connectivity (#machines)
RS               2.56 GB          10
HH blocks 1-9    1.67 GB          11
HH block 10      1.67 GB          13

[Chart: read & transfer time, Median and 95th %ile]

Page 42:

Encoding Time

[Chart annotation: 72% higher]

Benefits outweigh higher encoding cost in many systems (e.g., HDFS):

• encoding is one time operation
• often run as a background job
• does not fall along any critical path

Page 44:

Existing Systems

• Need additional storage
  – Huang et al. (Windows Azure) 2012, Sathiamoorthy et al. (Xorbas) 2013, Esmaili et al. (CORE) 2013
    • Add additional parities to reduce download
  – Hu et al. (NCFS 2011)

• Highly restricted parameters
  – Khan et al. (Rotated-RS) 2012: #parity ≤ 3
  – Xiang et al., Wang et al. 2010, Hu (NCCloud) et al. 2012: #parity ≤ 2
  – Hitchhiker performs as well or better in these restricted settings too

Page 45:

Hitchhiker: Summary

Code metrics:
  Storage requirement                        Same (optimal)
  Supported parameters                       All
  Fault tolerance                            Same (optimal)

Reconstruction:
  Network transfers                          35% less
  Disk IO                                    35% less
  Data read and transfer time (median)       31.8% less
  Data read and transfer time (95th %ile)    30.2% less
  Computation time (median)                  36.1% less

Encoding:
  Encoding time (median)                     72.1% more

Thanks!

Page 46:

Backup Slides

Page 47:

Hop-and-Couple

• Technique to pair bytes under Hitchhiker’s erasure code

• Makes disk reads during reconstruction contiguous

[Figure: two panels, each showing data units 1 to 10 and parity units 11 to 14 as rows of 1-byte cells. (a) coupling adjacent bytes to form stripes; (b) hop-and-couple, where the coupled bytes (encoded together) are a hop length apart.]