Resilient Algorithms and Data Structures
Giuseppe F. Italiano, Università di Roma “Tor Vergata”

Resilient algorithms data structures intro by Giuseppe F.Italiano

Jan 20, 2017



Some advertising first

School on Graph Theory, Algorithms and Applications Erice (Sicily), September 25 - October 3, 2011

Please send your best students / postdocs / junior …


Memory Errors Common in Practice


Outline of the Talk

1.  Motivation and Model
2.  Resilient Algorithms:
    •  Sorting and Searching
3.  Resilient Data Structures:
    •  Priority Queues
    •  Dictionaries
4.  Conclusions and Open Problems


Memory Errors

Memory error: one or multiple bits read differently from how they were last written.

Many possible causes:
•  electrical or magnetic interference (cosmic rays)
•  hardware problems (bit permanently damaged)
•  corruption in the data path between memories and processing units

Errors in DRAM devices have been a concern for a long time [May & Woods 79, Ziegler et al 79, Chen & Hsiao 84, Normand 96, O’Gorman et al 96, Mukherjee et al 05, …]


Memory Errors

Soft Errors: randomly corrupt bits, but leave no physical damage (cause: cosmic rays)

Hard Errors: corrupt bits in a repeatable manner because of a physical defect, e.g., stuck bits (cause: hardware problems)


Error Correcting Codes (ECC)

Error correcting codes (ECC) allow detection and correction of one or multiple bit errors

Typical ECC is SECDED (i.e., single error correct, double error detect)

Chip-Kill can correct up to 4 adjacent bits at once

ECC has several overheads in terms of performance (33%), size (20%) and money (10%).

ECC memory chips are mostly used in memory systems for server machines rather than for client computers
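To make SECDED concrete, here is a textbook extended Hamming (8,4) code, one classic way to obtain single-error correction and double-error detection: 4 data bits, 3 Hamming parity bits and 1 overall parity bit. This is a minimal sketch for intuition, not how any particular memory controller works; the names `encode` and `decode` are hypothetical, and ECC memory typically uses wider codes (e.g., 64 data bits protected by 8 check bits).

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 form a Hamming(7,4)
    code with parity bits at positions 1, 2, 4 and data at 3, 5, 6, 7."""
    assert len(data) == 4
    c = [0] * 8
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]   # covers positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]   # covers positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]   # covers positions with bit 2 set
    c[0] = sum(c[1:]) % 2       # makes the parity of all 8 bits even
    return c


def decode(word: List[int]) -> Tuple[str, List[int]]:
    """Correct a single flipped bit, or detect (not correct) two flipped bits."""
    c = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos     # XOR of the positions holding a 1
    overall = sum(c) % 2        # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        c[syndrome] ^= 1        # syndrome 0 means the overall parity bit flipped
        status = "corrected"
    else:
        return "double error detected", []
    return status, [c[3], c[5], c[6], c[7]]


cw = encode([1, 0, 1, 1])
one_flip = list(cw)
one_flip[6] ^= 1
print(decode(one_flip))   # ('corrected', [1, 0, 1, 1])
two_flips = list(cw)
two_flips[2] ^= 1
two_flips[5] ^= 1
print(decode(two_flips))  # ('double error detected', [])
```

Any single flipped bit is repaired, while two flipped bits are only reported; a third flip could go unnoticed or be miscorrected, which is why SECDED alone does not give complete fault coverage.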


Impact of Memory Errors

The consequence of a memory error is system-dependent:

1. Correctable errors: fixed by ECC
2. Uncorrectable errors:
   2.1. Detected: explicit failure (e.g., a machine reboot)
   2.2. Undetected:
        2.2.1. Induced failure (e.g., a kernel panic)
        2.2.2. Unnoticed (but the application is corrupted, e.g., segmentation fault, file not found, file not readable, …)


Impact of Memory Errors


How Common are Memory Errors?


How Common are Memory Errors?


How Common are Memory Errors?

[Schroeder et al 2009]: experiments over 2.5 years (Jan 06 – Jun 08) on the Google fleet ($10^4$ machines, ECC memory)

Memory errors are NOT rare events!


Memory Errors

Not all machines (clients) have ECC memory chips.

The increasing demand for larger capacities at low cost only makes the problem more serious (large clusters of inexpensive memories)

Need for reliable computation in the presence of memory faults


Memory Errors

Other scenarios in which memory errors have an impact (and seem to be modeled in an adversarial setting):

•  Memory errors can cause security vulnerabilities:
   –  Fault-based cryptanalysis [Boneh et al 97, Xu et al 01, Bloemer & Seifert 03]
   –  Attacking Java Virtual Machines [Govindavajhala & Appel 03]
   –  Breaking smart cards [Skorobogatov & Anderson 02, Bar-El et al 06]

•  Avionics and space electronic systems: the amount of cosmic rays increases with altitude (soft errors)


Memory Errors in Space


Memory Errors in Space


Memory Errors in Space


Recap on Memory Errors

1. Memory errors are NOT rare: even a small cluster of computers with few GB per node can experience one bit error every few minutes.


I know my PIN number: it’s my name I can’t remember…


Memory Errors

Mem. size    Mean Time Between Failures
512 MB       2.92 hours
1 GB         1.46 hours
16 GB        5.48 minutes
64 GB        1.37 minutes
1 TB         5.13 seconds

In the field study, Google researchers observed mean error rates of 2,000 – 6,000 per GB per year (25,000 – 75,000 FIT/Mbit)
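The table appears consistent with the upper end of that field-study rate: at 6,000 errors per GB per year and an 8,760-hour year, 1 GB sees one error about every 1.46 hours, and the mean time shrinks in proportion to memory size. The sketch below reproduces the table up to rounding under those assumptions (a constant error rate per GB, 1 TB = 1024 GB); `mtbf_hours` and `pretty` are hypothetical helper names.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; assumes a 365-day year


def mtbf_hours(size_gb: float, errors_per_gb_year: float = 6000.0) -> float:
    """Mean time between memory errors, assuming a constant error rate per GB."""
    return HOURS_PER_YEAR / (errors_per_gb_year * size_gb)


def pretty(hours: float) -> str:
    """Format a duration in hours, minutes or seconds, as in the table."""
    if hours >= 1:
        return f"{hours:.2f} hours"
    if hours * 60 >= 1:
        return f"{hours * 60:.2f} minutes"
    return f"{hours * 3600:.2f} seconds"


for label, gb in [("512 MB", 0.5), ("1 GB", 1), ("16 GB", 16), ("64 GB", 64), ("1 TB", 1024)]:
    print(label, pretty(mtbf_hours(gb)))
```

Plugging in the lower rate of 2,000 errors per GB per year instead triples every entry.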


Recap on Memory Errors

2. Memory errors can be harmful: uncorrectable memory errors can cause catastrophic events (reboot, kernel panic, data corruption, …)


I’m thinking of getting back into crime, Luigi. Legitimate business is too corrupt…


A small example

Classical algorithms may not be correct in the presence of (even very few) memory errors

An example: merging two ordered lists A = 1 2 … 10 and B = 11 12 … 20 into Out.

[Figure: a key corrupted to 80 causes two blocks of Θ(n) keys to be output in the wrong relative order, producing Θ(n²) inversions in Out.]
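The effect can be reproduced with a classical merge. In this sketch the corrupted position is an assumption (the figure's exact position is not recoverable), and `merge`, `faithful_inversions` and `demo` are hypothetical helper names. Corrupted keys are recognised by value, which is only safe because the demo picks a corrupted value that no faithful key can have.

```python
from typing import List, Set


def merge(a: List[int], b: List[int]) -> List[int]:
    """Classical two-way merge; its output is sorted only if a and b really are."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def faithful_inversions(seq: List[int], corrupted: Set[int]) -> int:
    """Count out-of-order pairs among keys that were never corrupted."""
    keys = [x for x in seq if x not in corrupted]
    return sum(
        1
        for p in range(len(keys))
        for q in range(p + 1, len(keys))
        if keys[p] > keys[q]
    )


def demo(n: int, k: int) -> int:
    """Merge A = 1..n with B = n+1..2n after one memory error hits A[k]."""
    a = list(range(1, n + 1))
    b = list(range(n + 1, 2 * n + 1))
    bad = 8 * n   # larger than every key of B (equals 80 when n = 10)
    a[k] = bad    # a single corrupted key
    return faithful_inversions(merge(a, b), {bad})


print(demo(10, 4))  # 50
```

With the fifth key of A corrupted, the merge emits all ten keys of B before the faithful keys 6 … 10, giving 10 · 5 = 50 inversions among uncorrupted keys. In general the count is n(n − k − 1), which is Θ(n²) when k is around n/2.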


Recap on Memory Errors

3. ECC may not be available (or may not be enough): there is no ECC in inexpensive memories; ECC does not guarantee complete fault coverage; it is expensive; it halts the system upon detection of uncorrectable errors, causing service disruption; etc.


Resilient Algorithms and Data Structures

Resilient Algorithms and Data Structures: capable of tolerating memory errors on data (even throughout their execution) without sacrificing correctness, performance and storage space

Make sure that the algorithms and data structures we design are capable of dealing with memory errors


Faulty-Memory Model [Finocchi, I. 04]

•  Memory fault = the correct data stored in a memory location gets altered (destructive faults)

•  Faults can appear at any time in any memory location simultaneously

•  Assumptions:
   –  Only O(1) words of reliable memory (safe memory)
   –  Corrupted values indistinguishable from correct ones

Wish to produce correct output on uncorrupted data (in an adversarial model)

•  Even recursion may be problematic in this model, since the recursion stack would have to live in unreliable memory (only O(1) words are safe).


Terminology

δ = known upper bound on the number of memory errors (may be a function of n)

α = actual number of memory errors (occurring during a specific execution)

Note: typically α ≤ δ

All the algorithms / data structures described here need to know δ in advance


Other Faulty Models

The design of fault-tolerant algorithms has received attention for 50+ years

Liar Model [Ulam 77, Renyi 76,…]

Comparison questions answered by a possibly lying adversary. Can exploit query replication strategies.

Fault-tolerant sorting networks [Assaf Upfal 91, Yao Yao 85,…]

Comparators can be faulty. Exploit substantial data replication using fault-free data replicators.

Parallel Computations [Huang et al 84, Chlebus et al 94, …]

Faults on parallel/distributed architectures: PRAM or DMM simulations (rely on fault-detection mechanisms)


Other Faulty Models

Robustness in Computational Geometry [Schirra 00, …]

Faults from unreliable computation (geometric precision) rather than from memory errors

Noisy / Unreliable Computation [Braverman Mossel 08]

Faults (with given probability) from unreliable primitives (e.g., comparisons) rather than from memory errors

Memory Checkers [Blum et al 93, Blum et al 95, …]

Programs are not reliable objects: self-testing and self-correction. Essential error detection and error correction mechanisms.

…


Outline of the Talk

1.  Motivation and Model
2.  Resilient Algorithms:
    •  Sorting and Searching
3.  Resilient Data Structures:
    •  Priority Queues
    •  Dictionaries
4.  Conclusions and Open Problems


Resilient Sorting

We are given a set of n keys that need to be sorted

The values of some keys may get arbitrarily corrupted

We cannot tell which keys are faithful and which are corrupted

Q1. Can we efficiently sort the correct values in the presence of memory errors?
Q2. How many memory errors can we tolerate in the worst case if we wish to maintain optimal time and space?


Terminology

•  Faithful key = never corrupted

•  Faulty key = corrupted

•  Faithfully ordered sequence = ordered except for corrupted keys

•  Resilient sorting algorithm = produces a faithfully ordered sequence (i.e., wish to sort correctly all the uncorrupted keys)

[Figure: the keys 1 2 3 4 5 6 7 8 9 10 together with a corrupted key 80: faithfully ordered]
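The definitions can be checked mechanically. `is_faithfully_ordered` is a hypothetical helper, and it assumes the set of corrupted positions is known, which holds for an outside observer but never for the algorithm itself. The example sequence is an assumed layout of the figure's keys, with 80 as the corrupted key.

```python
from typing import List, Set


def is_faithfully_ordered(seq: List[int], corrupted_positions: Set[int]) -> bool:
    """True if the keys at non-corrupted positions appear in nondecreasing order.

    Analysis tool only: in the faulty-memory model the sorting algorithm
    cannot tell which positions hold corrupted keys."""
    faithful = [x for i, x in enumerate(seq) if i not in corrupted_positions]
    return all(x <= y for x, y in zip(faithful, faithful[1:]))


seq = [1, 2, 3, 4, 5, 80, 6, 7, 8, 9, 10]
print(is_faithfully_ordered(seq, {5}))    # True: only the corrupted 80 is out of place
print(is_faithfully_ordered(seq, set()))  # False: if 80 were faithful, 80 > 6 breaks the order
```

A resilient sorting algorithm is allowed to place the corrupted key anywhere, as long as the check passes for the true set of corrupted positions.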


Trivially Resilient

Resilient variable: consists of (2δ+1) copies $x_1, x_2, \ldots, x_{2\delta+1}$ of a standard variable x

Value of resilient variable given by majority of its copies:
•  cannot be corrupted by faults (at most δ of the 2δ+1 copies can be corrupted, so at least δ+1 copies still hold the correct value)
•  can be computed in linear time and constant space [Boyer Moore 91]

Trivially-resilient algorithms and data structures have Θ(δ) multiplicative overheads in terms of time and space

Note: Trivially-resilient does more than ECC (SECDED, Chip-Kill, ….)
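A resilient variable can be sketched in a few lines: writes overwrite all 2δ+1 copies, and reads take the Boyer–Moore majority vote. The class and method names are hypothetical; the Python list stands in for unreliable memory, while the vote's `candidate` and `count` are the O(1) words that would have to live in safe memory.

```python
from typing import List


def majority(copies: List[int]) -> int:
    """Boyer-Moore majority vote: one pass, O(1) extra space.

    Returns the majority element whenever one exists (more than half the copies)."""
    candidate, count = copies[0], 0
    for v in copies:
        if count == 0:
            candidate, count = v, 1
        elif v == candidate:
            count += 1
        else:
            count -= 1
    return candidate


class ResilientVariable:
    """A value stored as 2*delta + 1 copies in (simulated) unreliable memory."""

    def __init__(self, value: int, delta: int) -> None:
        self.delta = delta
        self.copies = [value] * (2 * delta + 1)

    def write(self, value: int) -> None:
        # O(delta) time: overwrite every copy.
        for i in range(len(self.copies)):
            self.copies[i] = value

    def read(self) -> int:
        # O(delta) time; correct as long as at most delta copies are corrupted.
        return majority(self.copies)


x = ResilientVariable(42, delta=2)
x.copies[0] = 7    # two faults hit two of the five copies
x.copies[3] = 99
print(x.read())    # 42
```

Every read and write costs Θ(δ) time and every variable Θ(δ) space, which is where the multiplicative overhead of trivially-resilient algorithms comes from.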


Trivially Resilient Sorting

Can trivially sort in O(δ n log n) time with up to δ memory errors

⇒ O(n log n) sorting algorithm able to tolerate only O(1) memory errors


Resilient Sorting

Upper Bound [Finocchi, Grandoni, I. 05]: comparison-based sorting algorithm that takes $O(n \log n + \delta^2)$ time to run with up to δ memory errors

⇒ O(n log n) sorting algorithm able to tolerate up to $O((n \log n)^{1/2})$ memory errors

Lower Bound [Finocchi, I. 04]: any comparison-based resilient O(n log n) sorting algorithm can tolerate the corruption of at most $O((n \log n)^{1/2})$ keys
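The tolerance threshold in the upper bound follows directly from its running time: the additive δ² term stays within the optimal sorting bound exactly when

```latex
O(n\log n + \delta^2) = O(n\log n)
\iff \delta^2 = O(n\log n)
\iff \delta = O\!\left((n\log n)^{1/2}\right)
```

The lower bound says no comparison-based resilient algorithm can beat this threshold, so it is tight. The same one-line argument applied to $O(n + \delta^2)$ gives the $O(n^{1/2})$ threshold for integer sorting.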


Resilient Sorting (cont.)

Integer Sorting [Finocchi, Grandoni, I. 05]: randomized integer sorting algorithm that takes $O(n + \delta^2)$ time to run with up to δ memory errors

⇒ O(n) randomized integer sorting algorithm able to tolerate up to $O(n^{1/2})$ memory errors


Resilient Binary Search

[Figure: keys 2 3 4 5 8 9 13 20 26 1 7 80 10, some of them corrupted; search(5) = false even though 5 is stored]

Wish to get correct answers at least on correct keys:

search(s) either finds a key equal to s, or determines that no correct key is equal to s

If only faulty keys are equal to s, the answer is uninteresting (we cannot hope to get a trustworthy answer)
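A single corrupted key is enough to make classical binary search violate this requirement. In the sketch below, which key gets corrupted (the middle key 8 becoming 1) is an assumption chosen for illustration, not necessarily the corruption shown in the figure.

```python
from typing import List


def binary_search(a: List[int], s: int) -> bool:
    """Standard (non-resilient) binary search; assumes a is sorted."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == s:
            return True
        if a[mid] < s:
            lo = mid + 1
        else:
            hi = mid - 1
    return False


keys = [2, 3, 4, 5, 8, 9, 13, 20, 26]
print(binary_search(keys, 5))  # True

keys[4] = 1                    # hypothetical fault: the middle key 8 becomes 1
print(binary_search(keys, 5))  # False, although the faithful key 5 is still stored
```

The corrupted middle key (1 < 5) sends the search into the right half, so the faithful key 5 is never probed.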


Trivially Resilient Binary Search

Can search in O(δ log n) time with up to δ memory errors
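Trivially resilient binary search stores every key as a resilient variable (2δ+1 copies) and reads each probed key by majority, so each of the O(log n) probes costs O(δ). This is a minimal sketch under that assumption; `replicate` and `trivially_resilient_search` are hypothetical names, and the index variables are assumed to live in safe memory.

```python
from typing import List


def majority(copies: List[int]) -> int:
    """Boyer-Moore majority vote over the 2*delta + 1 copies of one key."""
    candidate, count = copies[0], 0
    for v in copies:
        if count == 0:
            candidate, count = v, 1
        elif v == candidate:
            count += 1
        else:
            count -= 1
    return candidate


def replicate(keys: List[int], delta: int) -> List[List[int]]:
    """Store each key as 2*delta + 1 copies (one resilient variable per key)."""
    return [[k] * (2 * delta + 1) for k in keys]


def trivially_resilient_search(rep: List[List[int]], s: int) -> bool:
    """Binary search reading every probed key by majority: O(delta log n) time."""
    lo, hi = 0, len(rep) - 1       # lo, hi, mid: assumed to live in safe memory
    while lo <= hi:
        mid = (lo + hi) // 2
        key = majority(rep[mid])   # O(delta) per probe, O(log n) probes
        if key == s:
            return True
        if key < s:
            lo = mid + 1
        else:
            hi = mid - 1
    return False


rep = replicate([2, 3, 4, 5, 8, 9, 13, 20, 26], delta=2)
rep[4][0] = rep[4][1] = 1          # two faults hit copies of the middle key 8
print(trivially_resilient_search(rep, 5))  # True
```

The answer is now correct despite the faults, but at the cost of Θ(δ) extra time per probe and Θ(δ n) space.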


Resilient Searching

Upper Bounds:
•  Randomized algorithm with O(log n + δ) expected time [Finocchi, Grandoni, I. 05]
•  Deterministic algorithm with O(log n + δ) time [Brodal et al. 07]

Lower Bounds:
•  Ω(log n + δ) lower bound (deterministic) [Finocchi, I. 04]
•  Ω(log n + δ) lower bound on expected time [Finocchi, Grandoni, I. 05]


Resilient Dynamic Programming

d-dim. Dynamic Programming [Caminiti et al. 10]

Running time $O(n^d + \delta^{d+1})$ and space usage $O(n^d + n\delta)$

Can tolerate up to $\delta = O(n^{d/(d+1)})$ memory errors


Outline of the Talk

1.  Motivation and Model
2.  Resilient Algorithms:
    •  Sorting and Searching
3.  Resilient Data Structures:
    •  Priority Queues
    •  Dictionaries
4.  Conclusions and Open Problems


Resilient Data Structures

Data structures are more vulnerable to memory errors than algorithms:
•  Algorithms are affected by errors during their execution
•  Data structures are affected by errors throughout their lifetime


Resilient Priority Queues

Maintain a set of elements under insert and deletemin

insert adds an element

deletemin deletes and returns either the minimum uncorrupted value or a corrupted value

Consistent with resilient sorting
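The deletemin specification can be written as a small predicate. `is_valid_deletemin` is a hypothetical checker, not part of any of the cited data structures; it assumes distinct values and an observer who knows which stored values are faithful and which are corrupted (the priority queue itself cannot know this).

```python
from typing import Set


def is_valid_deletemin(answer: int, faithful: Set[int], corrupted: Set[int]) -> bool:
    """Does `answer` satisfy the resilient deletemin specification?"""
    if answer in corrupted:
        return True  # returning any corrupted value is allowed
    return bool(faithful) and answer == min(faithful)


faithful = {3, 5, 9}
corrupted = {1, 42}
print([is_valid_deletemin(x, faithful, corrupted) for x in (3, 1, 42, 5)])
# [True, True, True, False]
```

Repeatedly deleting the minimum under this rule outputs faithful keys in sorted order, with corrupted values interleaved anywhere, which is exactly a faithfully ordered sequence.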


Resilient Priority Queues

Upper Bound: both insert and deletemin can be implemented in O(log n + δ) time [Jorgensen et al. 07] (based on cache-oblivious priority queues)

Lower Bound: a resilient priority queue with n > δ elements must use Ω(log n + δ) comparisons to answer an insert followed by a deletemin [Jorgensen et al. 07]


Resilient Dictionaries

Maintain a set of elements under insert, delete and search

insert and delete as usual, search as in resilient searching:

search(s) either finds a key equal to s, or determines that no correct key is equal to s

Again, consistent with resilient sorting


Resilient Dictionaries

Randomized resilient dictionary implements each operation in O(log n + δ) time

[Brodal et al. 07]

More complicated deterministic resilient dictionary implements each operation in O(log n + δ) time

[Brodal et al. 07]


Resilient Dictionaries

Pointer-based data structures

Faults on pointers are likely to be more problematic than faults on keys

Randomized resilient dictionaries of Brodal et al. built on top of traditional (non-resilient) dictionaries

Our implementation built on top of AVL trees


Outline of the Talk

1.  Motivation and Model
2.  Resilient Algorithms:
    •  Sorting and Searching
3.  Resilient Data Structures:
    •  Priority Queues
    •  Dictionaries
4.  Conclusions and Open Problems


Concluding Remarks

•  Need for reliable computation in the presence of memory errors

•  Investigated basic algorithms and data structures in the faulty-memory model: we do not wish to detect/correct errors, only to produce correct output on correct data

•  Tight upper and lower bounds in this model

•  After first tests, resilient implementations of algorithms and data structures look promising


Future Work and Open Problems

•  More (faster) implementations, engineering and experimental analysis?

•  Resilient graph algorithms?

•  Lower bounds for resilient integer sorting?

•  Better faulty memory model?

•  Resilient algorithms oblivious to δ?

•  Full repertoire for resilient priority queues (delete, decreasekey, increasekey)?


Thank You!


My memory’s terrible these days…