Page 1: Persistent Data Structures by @aradzie

Persistent Data Structures

Living in a world where nothing changes but everything evolves

- or -

A complete idiot's guide to immutability

Page 2: Persistent Data Structures by @aradzie

Java vs Haskell

Java:

● Warm, soft and cute
● Imperative
● Object oriented
● Just like good old Basic, but with classes

Haskell:

● Strange, unfamiliar alien
● Purely functional
● Everything is different
● Shocking news! It's not like Basic!

Page 3: Persistent Data Structures by @aradzie

Haskell does not have variables!

Imagine a dialect of Java where everything is final by default:

    class LinkedList {
        class Node {
            final Node next, prev;
            final Object value;
        }

        final Node head, tail;

        void add(final Object v) {
            for (final Node n = head; n != null; n = n.next) { ... }
        }
    }

All fields, parameters and variables are automatically immutable; the final is implied everywhere, and there is no way to get rid of it

Page 4: Persistent Data Structures by @aradzie

Haskell does not have variables!


But it doesn't make sense!

It won't work!

It does for me!

Page 5: Persistent Data Structures by @aradzie

What is a variable?

var·y /ˈve(ə)rē/ (vary, varied, varying)

● verb (used with object). Definition: to change or alter, as in form, appearance, character, or substance

● verb (used without object). Definition: to undergo change in appearance, form, substance, character, etc.

● synonyms: modify, mutate

Page 6: Persistent Data Structures by @aradzie

"Variables" in Haskell

● Must be initialized when declared

YES: int a = 1; NO: int a;

● Cannot be reassigned

YES: final int a = 1; NO: a = 2;

These are mathematical variables, not imperative ones!

Page 7: Persistent Data Structures by @aradzie

When everything is immutable

There is no notion of time:

● Functions take old values, produce new values, nothing is changed in-place

● It does not matter when a function was called, it only matters what arguments it was called with

There is no notion of identity:

● Everything is a value, complex data structures are values too

● There is no way to tell if a == b, only if a.equals(b)

● In other words, values are never identical to each other, but may be equal

Page 8: Persistent Data Structures by @aradzie

I want my linked list!

Basic terminology:

● Ephemeral data structure — everything that is not persistent. Most Java data structures (lists, sets, etc.) are ephemeral.

● Persistent data structure — immutable data structure with history. No in-place modifications. Operations on it create new versions. Older versions are always available. That. Is. Simple.

● The persistence property has nothing to do with persistent storage, like disks! This is a completely different story.

Page 9: Persistent Data Structures by @aradzie

I want my linked list!

● In imperative languages, like Java, most data structures are ephemeral by default

Designing persistent data structures is somewhat awkward and not always efficient

● In purely functional languages, like Haskell, all data structures are automatically persistent!

There is just no other way to make data structures

Page 10: Persistent Data Structures by @aradzie

History of updates

Making an update to a persistent data structure instance always creates a new instance that contains this update.

The current version is left unmodified.

Page 11: Persistent Data Structures by @aradzie

Why should I bother?

Is it fun? Hell yeah!

But is it practical? Let's see!

Page 12: Persistent Data Structures by @aradzie

The free lunch is over!

"The biggest sea change in software development since the OO revolution is knocking at the door, and its name is Concurrency." — Herb Sutter

Commodity hardware (my laptop)

The need for writing correct multi-threaded code is constantly increasing

Page 13: Persistent Data Structures by @aradzie

Concurrent data structures are hard!

Want a concurrent ephemeral linked list? Here are some implementation strategies:

● Coarse-grained synchronization
● Fine-grained synchronization
● Optimistic synchronization
● Lazy synchronization

All lock-based — no composition, deadlocks, etc

● Non-blocking synchronization in different flavors

And if you need the size of a list, you are in trouble!

Page 14: Persistent Data Structures by @aradzie

Concurrent data structures are hard!

● Making mutable concurrent data structures requires inter-thread coordination within these structures

● Locks and atomic references all over the place

● Decades of research by academia with many attempts

● Sophisticated algorithms that are hard to reason about, test and prove

● Several different ways to solve the same problems, each with its own pros and cons

Page 15: Persistent Data Structures by @aradzie

Concurrent data structures are hard!


Yes, but are persistent data structures actually simpler?

Page 16: Persistent Data Structures by @aradzie

Just give up mutability!

● Persistent data structures are easy to reason about in a concurrent environment

● The behavior does not depend on how many threads are trying to "modify" it at once

● Therefore persistent data structures are very easy to test and debug

Page 17: Persistent Data Structures by @aradzie

The whole picture

● Persistent data structures alone are not sufficient
  They are an essential part of the picture, but not the whole answer to concurrency

● Inter-thread coordination is needed
  Threads still need to know what the other threads are doing to agree on a common outcome

● But it can be added "outside"
  Which gives us complete separation of concerns
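A minimal sketch of coordination added "outside", under assumptions that are not in the slides: the immutable IntStack and the SharedStack wrapper are hypothetical names, and the only mutable cell is a java.util.concurrent.atomic.AtomicReference that points at the current version. The data structure itself contains no locks.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable stack of ints; no locks, no synchronization inside.
final class IntStack {
    static final IntStack EMPTY = new IntStack(0, null, 0);
    final int value;
    final IntStack next;
    final int size;

    private IntStack(int value, IntStack next, int size) {
        this.value = value;
        this.next = next;
        this.size = size;
    }

    // Returns a new version; the receiver is never modified.
    IntStack push(int v) {
        return new IntStack(v, this, size + 1);
    }
}

// All inter-thread coordination lives here, outside the data structure.
final class SharedStack {
    private final AtomicReference<IntStack> current =
            new AtomicReference<>(IntStack.EMPTY);

    void push(int v) {
        // updateAndGet re-applies the pure function if another thread won the race.
        current.updateAndGet(s -> s.push(v));
    }

    // Any thread can take a consistent snapshot without locking.
    IntStack snapshot() {
        return current.get();
    }

    // Demo helper: several threads push concurrently; returns the final size.
    static int pushConcurrently(int threads, int perThread) {
        SharedStack shared = new SharedStack();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) shared.push(i);
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return shared.snapshot().size;
    }
}
```

Because every version is immutable, a snapshot stays valid no matter what other threads do afterwards; only the single reference swap needs to be atomic.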

Page 18: Persistent Data Structures by @aradzie

The whole picture

Solving concurrency challenge in a modern language:

● Scala Way — Persistent data structures with message passing

● Clojure Way — Persistent data structures with software transactional memory

● Will likely be mixed in the future

Page 19: Persistent Data Structures by @aradzie

Last few words on concurrency

● Persistent data structures are slower than ephemeral ones in sequential use

● But not that much slower!

● We can forgive it, since they give you more functionality, and ephemeral data structures are simply less capable

● And in the multiprocessor era, it is better to make things scalable rather than fast

Page 20: Persistent Data Structures by @aradzie

Efficient persistent data structures

We want persistent data structures to be space and time efficient:

● Structural sharing
  We want to reuse as many fragments of the previous version as possible

● Path copying
  We want to copy as few pieces as possible

● Maybe, just maybe lazy evaluation (where available)
  We don't want nasty pathological cases

Page 21: Persistent Data Structures by @aradzie

A case study

● Let's make some persistent data structures in Java

● All these structures consist of classes with only final fields

● With good amortized asymptotic complexity in most cases

Why are you looking at me?!

Page 22: Persistent Data Structures by @aradzie

Our plan

Let's start with some trivial examples

● Stack

● Queue

● Tree

Then proceed with more advanced structures

● Hash Table

● Finger Tree

Page 23: Persistent Data Structures by @aradzie

Trivial Example — Persistent Stack

    class Stack<T> {
        final T v;            // value stored in this node
        final Stack<T> next;  // rest of the stack; null marks the empty stack

        Stack() { v = null; next = null; }

        Stack(T v, Stack<T> next) { this.v = v; this.next = next; }
        ...

Source Code 1/2

It's just a singly linked list of nodes

Page 24: Persistent Data Structures by @aradzie

Trivial Example — Persistent Stack

    class Stack<T> {
        ...
        // push never modifies this stack; it returns a new one that links to it
        Stack<T> push(T v) {
            return new Stack<T>(v, this);
        }

        T peek() {
            if (next == null) throw new NoSuchElementException();
            return v;
        }

        // pop returns the previous version, which is still intact
        Stack<T> pop() {
            if (next == null) throw new NoSuchElementException();
            return next;
        }
    }

Source Code 2/2

Page 25: Persistent Data Structures by @aradzie

Trivial Example — Persistent Stack

Structural sharing in persistent stack
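The sharing can be observed directly with reference comparison. The sketch below is a self-contained variant of the stack from the previous slides; the names PStack and SharingDemo are hypothetical, and an explicit isEmpty() is assumed.

```java
import java.util.NoSuchElementException;

// Self-contained variant of the persistent stack; PStack is a hypothetical name.
final class PStack<T> {
    final T v;
    final PStack<T> next; // null marks the empty stack

    PStack() { this(null, null); }
    PStack(T v, PStack<T> next) { this.v = v; this.next = next; }

    boolean isEmpty() { return next == null; }

    PStack<T> push(T v) { return new PStack<T>(v, this); }

    T peek() {
        if (isEmpty()) throw new NoSuchElementException();
        return v;
    }

    PStack<T> pop() {
        if (isEmpty()) throw new NoSuchElementException();
        return next;
    }
}

final class SharingDemo {
    public static void main(String[] args) {
        PStack<String> base = new PStack<String>().push("a").push("b");
        PStack<String> left = base.push("c");   // one new version
        PStack<String> right = base.push("d");  // another version branching from base

        // Both versions point at the very same tail nodes; nothing was copied.
        System.out.println(left.pop() == right.pop()); // true
        // The original version is still intact.
        System.out.println(base.peek());                // b
    }
}
```

Each push allocates exactly one node, so keeping many versions alive costs only the nodes in which they differ.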

Page 26: Persistent Data Structures by @aradzie

Trivial Example — Persistent Stack

Looks familiar? The versions tree!

Page 27: Persistent Data Structures by @aradzie

Trivial Example — Persistent Stack

Also known as Spaghetti stack or Cactus stack

Page 28: Persistent Data Structures by @aradzie

Persistent Queue

It's just two stacks combined:

● Back stack to enqueue items
● Front stack to dequeue items

When front stack is empty, reverse back stack and use it as front stack

Page 29: Persistent Data Structures by @aradzie

Persistent Queue

    class Queue<T> {
        // back stack - push elements here
        final Stack<T> b;
        // front stack - pop elements from here
        final Stack<T> f;

        Queue() { b = f = new Stack<T>(); }

        Queue(Stack<T> b, Stack<T> f) { this.b = b; this.f = f; }

        // the front stack is empty only when the whole queue is empty
        boolean isEmpty() { return f.isEmpty(); }
        ...

Source Code 1/3

Page 30: Persistent Data Structures by @aradzie

Persistent Queue

    class Queue<T> {
        ...
        static <T> Queue<T> check(Stack<T> b, Stack<T> f) {
            if (f.isEmpty())
                // front ran out: the reversed back becomes the new front,
                // and the empty front is reused as the new back
                return new Queue<T>(f, b.reverse());
            else
                return new Queue<T>(b, f);
        }

        Queue<T> push(T v) {
            return check(b.push(v), f);
        }

        Queue<T> pop() {
            if (isEmpty()) {
                throw new NoSuchElementException();
            }
            return check(b, f.pop());
        }

Source Code 2/3

Page 31: Persistent Data Structures by @aradzie

Persistent Queue

    class Queue<T> {
        ...
        T peek() {
            if (isEmpty()) {
                throw new NoSuchElementException();
            }
            return f.peek();
        }

    class Stack<T> {
        ...
        Stack<T> reverse() {
            // stacks with zero or one element are their own reverse
            if (isEmpty() || next.isEmpty()) return this;
            Stack<T> r = new Stack<T>();
            for (Stack<T> s = this; !s.isEmpty(); s = s.pop()) {
                r = r.push(s.peek());
            }
            return r;
        }

Source Code 3/3
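Putting the three source slides together, a self-contained version might look like the sketch below. The names QStack and PQueue are hypothetical (chosen to avoid clashing with java.util classes), and the elided parts of the slides are filled in with the simplest code that keeps the invariant "front is empty only if the queue is empty".

```java
import java.util.NoSuchElementException;

// Minimal persistent stack used by the queue; QStack is a hypothetical name.
final class QStack<T> {
    final T v;
    final QStack<T> next; // null marks the empty stack

    QStack() { this(null, null); }
    QStack(T v, QStack<T> next) { this.v = v; this.next = next; }

    boolean isEmpty() { return next == null; }
    QStack<T> push(T v) { return new QStack<T>(v, this); }

    QStack<T> reverse() {
        QStack<T> r = new QStack<T>();
        for (QStack<T> s = this; !s.isEmpty(); s = s.next) r = r.push(s.v);
        return r;
    }
}

// Persistent queue built from two persistent stacks.
final class PQueue<T> {
    private final QStack<T> back;  // push elements here
    private final QStack<T> front; // pop elements from here

    PQueue() { this(new QStack<T>(), new QStack<T>()); }

    private PQueue(QStack<T> back, QStack<T> front) {
        this.back = back;
        this.front = front;
    }

    // Restores the invariant: front is empty only if the whole queue is empty.
    private static <T> PQueue<T> check(QStack<T> back, QStack<T> front) {
        if (front.isEmpty()) return new PQueue<T>(new QStack<T>(), back.reverse());
        return new PQueue<T>(back, front);
    }

    boolean isEmpty() { return front.isEmpty(); }

    PQueue<T> push(T v) { return check(back.push(v), front); }

    T peek() {
        if (isEmpty()) throw new NoSuchElementException();
        return front.v;
    }

    PQueue<T> pop() {
        if (isEmpty()) throw new NoSuchElementException();
        return check(back, front.next);
    }
}
```

For example, after `PQueue<Integer> q3 = new PQueue<Integer>().push(1).push(2).push(3)` and `PQueue<Integer> q2 = q3.pop()`, `q3.peek()` still returns 1 while `q2.peek()` returns 2: both versions remain usable.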

Page 32: Persistent Data Structures by @aradzie

Persistent Queue

Structural sharing in persistent queue

Page 33: Persistent Data Structures by @aradzie

Persistent Queue

Beware pathological cases!

● What if the front stack is empty, but the back stack is full?

● And we are going to pop from the same queue N times

● Then we get N back stack reversals!

● Lazy evaluation to the rescue — use lazy streams instead of strict stacks

Page 34: Persistent Data Structures by @aradzie

Persistent Queue

But there is a better way to design a queue!

Monoidally Annotated 2-3 Finger Tree is a versatile data structure that can be used to build efficient lists, deques, priority queues, interval trees, ropes, etc.

It is more complex, we will take a look at it later.

Page 35: Persistent Data Structures by @aradzie

Persistent Tree

● It is trivial to convert any ephemeral tree to a persistent one by means of path copying

● It works for binary trees, 2-3 trees, B-trees, etc

● The shape of the tree is not affected, only the mutating algorithms change

● In a balanced binary tree only O(log N) nodes need to be copied per update, which is quite efficient

● The secret to all persistent data structures is that they all are trees! (Yes, lists and hash tables are trees too)

Page 36: Persistent Data Structures by @aradzie

Persistent Tree

Page 37: Persistent Data Structures by @aradzie

Simple Persistent Binary Tree

    class SimpleBinaryTree {
        static class Node<K, V> {
            final K key;
            final V value;
            final Node<K, V> l, r;  // left and right subtrees, null when absent

            Node(K key, V value, Node<K, V> l, Node<K, V> r) {
                this.key = key; this.value = value;
                this.l = l; this.r = r;
            }
        }
        ...

Source Code 1/2

Page 38: Persistent Data Structures by @aradzie

Simple Persistent Binary Tree

    class SimpleBinaryTree {
        ...
        static <K extends Comparable<K>, V> Node<K, V> insert(Node<K, V> n, K key, V value) {
            if (n == null) {
                return new Node<K, V>(key, value, null, null);  // empty spot: new leaf
            }
            int cmp = key.compareTo(n.key);
            if (cmp < 0) {
                // copy this node, rebuild the left path, share the right subtree
                return new Node<K, V>(n.key, n.value, insert(n.l, key, value), n.r);
            }
            if (cmp > 0) {
                // copy this node, share the left subtree, rebuild the right path
                return new Node<K, V>(n.key, n.value, n.l, insert(n.r, key, value));
            }
            return new Node<K, V>(key, value, n.l, n.r);  // same key: replace the value
        }

Source Code 2/2
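Path copying can be checked with reference comparisons: after an insert, the subtrees off the search path are the same objects in both versions. The sketch below is a simplified, self-contained variant (int keys, String values, no balancing); the class name PTree and the get method are assumptions added for the demonstration.

```java
// Simplified persistent binary search tree; PTree is a hypothetical name.
final class PTree {
    static final class Node {
        final int key;
        final String value;
        final Node l, r;

        Node(int key, String value, Node l, Node r) {
            this.key = key;
            this.value = value;
            this.l = l;
            this.r = r;
        }
    }

    // Returns the root of a new version; only nodes on the search path are copied.
    static Node insert(Node n, int key, String value) {
        if (n == null) return new Node(key, value, null, null);
        if (key < n.key) return new Node(n.key, n.value, insert(n.l, key, value), n.r);
        if (key > n.key) return new Node(n.key, n.value, n.l, insert(n.r, key, value));
        return new Node(key, value, n.l, n.r);
    }

    // Plain read-only lookup; returns null when the key is absent.
    static String get(Node n, int key) {
        while (n != null) {
            if (key < n.key) n = n.l;
            else if (key > n.key) n = n.r;
            else return n.value;
        }
        return null;
    }

    public static void main(String[] args) {
        Node v1 = insert(insert(insert(null, 5, "five"), 2, "two"), 8, "eight");
        Node v2 = insert(v1, 9, "nine");

        System.out.println(v1.l == v2.l);  // true: left subtree is shared
        System.out.println(v1.r == v2.r);  // false: right path was copied
        System.out.println(get(v1, 9));    // null: the old version is unchanged
        System.out.println(get(v2, 9));    // nine
    }
}
```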

Page 39: Persistent Data Structures by @aradzie

Persistent Tree

Multiple definitions of persistence:

● Immutable data structure with history
● Committed to a persistent storage

Append only databases and file systems:

● CouchDB uses append only B-Tree
● RethinkDB makes append only variant of MySQL
● ZFS, BTRFS implement copy-on-write transactions and snapshots

Nothing is new under the moon!

Page 40: Persistent Data Structures by @aradzie

Persistent Map

    interface Map<K, V> {
        // get value for a key, or null if not found
        V get(K key);
        // make key/value association
        Map<K, V> put(K key, V value);
        // remove key/value association
        Map<K, V> remove(K key);
    }

Remember, no in-place updates
Mutations create new instances

Page 41: Persistent Data Structures by @aradzie

Persistent Map

Implementation Strategy

● Persistent red-black tree for ordered keys
  Time complexity: O(log n)

● Persistent hash table for hashable keys
  Time complexity: O(1) in practice (the trie is at most 7 levels deep for 32-bit hash codes)

Page 42: Persistent Data Structures by @aradzie

Persistent Hash Table

But how do we implement it?Copying the whole table would be too expensive!

Page 43: Persistent Data Structures by @aradzie

Persistent Hash Table

Here's the idea: partition the hash table into smaller pieces and organize them as a persistent tree

Nice idea, but how do we navigate in such a tree?

Page 44: Persistent Data Structures by @aradzie

Prefix Tree/Trie

Hash code is just a string of digits!

Search is guided by individual letters of a string key

Page 45: Persistent Data Structures by @aradzie

Persistent Hash Table in Prefix Tree

Represent 32-bit hash codes as strings of 5-bit symbols:

hashCode = CAFEBABE (base 16)

    level    6   5      4      3      2      1      0
    bits     11  00101  01111  11101  01110  10101  11110
    symbol   3   5      15     29     14     21     30
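The decomposition above is plain shift-and-mask arithmetic. A small sketch, assuming a hypothetical helper class HashSymbols (the shift expression is the same one used by find on a later slide):

```java
// Splits a 32-bit hash code into 5-bit symbols; HashSymbols is a hypothetical name.
final class HashSymbols {
    static final int BITS = 5;
    static final int MASK = (1 << BITS) - 1; // 31, i.e. binary 11111

    // Returns the symbol at the given level (level 0 = lowest 5 bits).
    static int symbol(int hashCode, int level) {
        // >>> is the unsigned shift, so negative hash codes work as well
        return (hashCode >>> (level * BITS)) & MASK;
    }

    public static void main(String[] args) {
        int h = 0xCAFEBABE;
        StringBuilder sb = new StringBuilder();
        for (int level = 6; level >= 0; level--) {
            sb.append(symbol(h, level));
            if (level > 0) sb.append(' ');
        }
        System.out.println(sb); // 3 5 15 29 14 21 30
    }
}
```

Level 6 only has two bits left (32 = 6 * 5 + 2), which is why its symbol can never exceed 3.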

Page 46: Persistent Data Structures by @aradzie

Persistent Hash Table

hashCode = ... xxxxx xxxxx xxxxx xxxxx

Each item is either a key/value pair or a subtree

Page 47: Persistent Data Structures by @aradzie

Persistent Hash Table

    class PersistentHashMap {
        abstract class Item<K, V> {}

        class Node<K, V> extends Item<K, V> {
            // one slot per 5-bit symbol; Java forbids generic array creation, hence the cast
            @SuppressWarnings("unchecked")
            final Item<K, V>[] children = (Item<K, V>[]) new Item[32];
        }

        class Entry<K, V> extends Item<K, V> {
            final int hashCode;
            final K key;
            final V value;
            final Entry<K, V> next;
        }

Source Code 1/2

Page 48: Persistent Data Structures by @aradzie

Persistent Hash Table

    class PersistentHashMap {
        V get(K key) {
            return root.find(key.hashCode(), key, 0);  // start at level 0
        }

        class Node<K, V> extends Item<K, V> {
            V find(int hashCode, K key, int level) {
                // pick the 5-bit symbol for this level
                int index = (hashCode >>> (level * 5)) & 31;
                Item<K, V> item = children[index];
                if (item instanceof Node) {
                    // subtree: descend one level
                    return ((Node<K, V>) item).find(hashCode, key, level + 1);
                }
                if (item instanceof Entry) {
                    // key/value pair: compare the actual key
                    return ((Entry<K, V>) item).find(hashCode, key);
                }
                return null;  // empty slot: key not found
            }

Source Code 2/2

Page 49: Persistent Data Structures by @aradzie

Persistent Hash Table

Do not waste space!

    class PersistentHashMap {
        class Node<K, V> {
            // always 32 slots, even when almost all of them are null
            @SuppressWarnings("unchecked")
            final Item<K, V>[] children = (Item<K, V>[]) new Item[32];
        }

● Most of the children would be null on deeper levels

● The number of arrays grows exponentially as we go deeper

● Need to find a way to compact tree

● Simply get rid of nulls in arrays!

Page 50: Persistent Data Structures by @aradzie

Persistent Hash Table

● Mask is a 32-bit integer whose bits are set to 1 only for those array elements that are not null

● Array stores only non-null elements. Its size is the number of 1 bits in the mask. Array size varies from 2 to 32 elements.

● Overhead for a null array element is just one bit. Quite good!

    class Node<K, V> {
        final int mask;
        // sized to the number of 1 bits in the mask
        @SuppressWarnings("unchecked")
        final Item<K, V>[] children = (Item<K, V>[]) new Item[Integer.bitCount(mask)];
    }

Page 51: Persistent Data Structures by @aradzie

Persistent Hash Table

● To test that the array has an element at index i, simply test if the ith bit in the mask is 1:

    if ((mask & (1 << i)) != 0) { ...

● To get the offset of the ith element in the array, count the number of 1 bits lower than i in the mask:

    int offset = bitCount(mask & ((1 << i) - 1));
    if (children[offset] instanceof ...
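The two bit tricks combine into a compact node. The following is a sketch under assumptions, not the author's implementation: BitmapNode and its with method are hypothetical, children are plain Objects to keep it short, and Integer.bitCount stands in for bitCount.

```java
// Bitmap-compressed node with up to 32 logical slots; BitmapNode is a hypothetical name.
final class BitmapNode {
    final int mask;           // bit i is 1 when logical slot i is occupied
    final Object[] children;  // only the occupied slots, in slot order

    BitmapNode(int mask, Object[] children) {
        this.mask = mask;
        this.children = children;
    }

    boolean has(int i) {
        return (mask & (1 << i)) != 0;
    }

    // Number of occupied slots below slot i = position of slot i in the dense array.
    private int offset(int i) {
        return Integer.bitCount(mask & ((1 << i) - 1));
    }

    Object get(int i) {
        return has(i) ? children[offset(i)] : null;
    }

    // Returns a new node with slot i set to child; the receiver is left untouched.
    BitmapNode with(int i, Object child) {
        int offset = offset(i);
        if (has(i)) {
            Object[] copy = children.clone();
            copy[offset] = child;
            return new BitmapNode(mask, copy);
        }
        Object[] copy = new Object[children.length + 1];
        System.arraycopy(children, 0, copy, 0, offset);
        copy[offset] = child;
        System.arraycopy(children, offset, copy, offset + 1, children.length - offset);
        return new BitmapNode(mask | (1 << i), copy);
    }
}
```

For instance, `new BitmapNode(0, new Object[0]).with(5, "five").with(2, "two")` yields a node whose dense array is `["two", "five"]`: slot 2 sorts before slot 5 even though it was inserted later.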

Page 52: Persistent Data Structures by @aradzie

Persistent List

    interface Seq<T> {
        T head();                    // get first element
        Seq<T> tail();               // get list without first element
        Seq<T> cons(T v);            // prepend element at head
        Seq<T> snoc(T v);            // append element at tail
        Seq<T> concat(Seq<T> that);  // join two lists
        int size();                  // get number of elements
        T get(int index);            // get Nth element
        Seq<T> set(int index, T v);  // set Nth element
    }

Remember, no in-place updates
Mutations create new instances

Page 53: Persistent Data Structures by @aradzie

Persistent List

● There are quite a few ways to implement persistent lists

● But we will not be studying them

● Instead, we will turn our attention to finger trees

● Soon, it will be clear why

Page 54: Persistent Data Structures by @aradzie

Finger Trees

● An incredibly elegant, simple and efficient data structure

● Oh so very versatile, functional programmer's Swiss Army knife

● Basic data structure for building random access sequences, deques, priority queues, ropes, interval trees, etc.

● Let's define it in stages

Page 55: Persistent Data Structures by @aradzie

Persistent leafy 2-3 trees

Let's begin with a simple data structure — leafy 2-3 tree

● Every intermediate node has either two children or three children

● All values are stored in leafs

● Perfectly balanced — all leafs are at the same level

Page 56: Persistent Data Structures by @aradzie

Persistent leafy 2-3 trees

Page 57: Persistent Data Structures by @aradzie

Persistent leafy 2-3 trees

Leafs contain interesting values,

but what is stored in nodes?

Page 58: Persistent Data Structures by @aradzie

Annotated leafy 2-3 trees

● There must be a way to find interesting values in a tree

● We need to guide search from the root of a tree to its leafs

● Let's add special annotations to nodes

● Use these annotations to find values

Page 59: Persistent Data Structures by @aradzie

Size annotated leafy 2-3 trees

● Each intermediate node is annotated with the size of a subtree rooted at this node

● Makes it trivial to find any leaf by its index

● Starting from root, test if index is in the range of its left (middle) or right subtree, and repeat recursively for that subtree, until a leaf is found

Page 60: Persistent Data Structures by @aradzie

Size annotated leafy 2-3 trees

Looks like random access list

Page 61: Persistent Data Structures by @aradzie

Priority annotated leafy 2-3 trees

● Each intermediate node is annotated with the highest priority of an element in its subtree

● Makes it trivial to find value with the highest priority

● Starting from the root, find the subtree with the highest priority and descend recursively into it, until a leaf is found

Page 62: Persistent Data Structures by @aradzie

Priority annotated leafy 2-3 trees

Looks like priority queue

Page 63: Persistent Data Structures by @aradzie

Monoids

● One interface to unify size, priority (and more!) annotations on trees

● A set of values with a "zero" element 0 and a binary associative operation ⊕

● Monoid laws:
  0 ⊕ a = a
  a ⊕ 0 = a
  a ⊕ (b ⊕ c) = (a ⊕ b) ⊕ c

Page 64: Persistent Data Structures by @aradzie

Monoid examples

● Strings with empty string and concatenation
  "" + "a" = "a", "a" + "" = "a"
  "a" + ("b" + "c") = ("a" + "b") + "c"

● Integers with zero and addition
  0 + 1 = 1, 1 + 0 = 1
  1 + (2 + 3) = (1 + 2) + 3

● Integers with one and multiplication
  1 * 2 = 2, 2 * 1 = 2
  2 * (3 * 4) = (2 * 3) * 4

● And many more of them! (Monoids are everywhere)

Page 65: Persistent Data Structures by @aradzie

Monoid interface

    interface Monoid<T extends Monoid<T>> {
        T unit();
        T combine(T that);
    }

    class String implements Monoid<String> {
        ...

        String unit() {
            return "";  // identity: the empty string
        }

        String combine(String that) {
            return this + that;  // associative operation: concatenation
        }
    }

Page 66: Persistent Data Structures by @aradzie

Size monoid

    class Size implements Monoid<Size> {
        final int size;

        Size(int size) { this.size = size; }

        Size unit() {
            return new Size(0);  // identity for addition
        }

        Size combine(Size that) {
            return new Size(this.size + that.size);
        }
    }

Page 67: Persistent Data Structures by @aradzie

Priority monoid

    class Priority implements Monoid<Priority> {
        final int priority;  // a smaller number means a higher priority

        Priority(int priority) { this.priority = priority; }

        Priority unit() {
            return new Priority(Integer.MAX_VALUE);  // identity for min
        }

        Priority combine(Priority that) {
            return new Priority(Math.min(this.priority, that.priority));
        }
    }
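To see why one interface is enough for both annotations, here is a small runnable sketch that folds a list of measures with either monoid. The Monoids.concat helper and its seed parameter are assumptions made for the example; the seed is only needed because unit() is an instance method in this interface.

```java
import java.util.Arrays;
import java.util.List;

interface Monoid<T extends Monoid<T>> {
    T unit();
    T combine(T that);
}

final class Size implements Monoid<Size> {
    final int size;
    Size(int size) { this.size = size; }
    public Size unit() { return new Size(0); }
    public Size combine(Size that) { return new Size(this.size + that.size); }
}

final class Priority implements Monoid<Priority> {
    final int priority; // a smaller number means a higher priority
    Priority(int priority) { this.priority = priority; }
    public Priority unit() { return new Priority(Integer.MAX_VALUE); }
    public Priority combine(Priority that) {
        return new Priority(Math.min(this.priority, that.priority));
    }
}

// Hypothetical helper: folds any list of monoid values into one value.
final class Monoids {
    static <T extends Monoid<T>> T concat(T seed, List<T> items) {
        T acc = seed.unit(); // the seed is only used to obtain the unit element
        for (T item : items) acc = acc.combine(item);
        return acc;
    }

    public static void main(String[] args) {
        Size total = concat(new Size(0),
                Arrays.asList(new Size(1), new Size(2), new Size(3)));
        Priority best = concat(new Priority(0),
                Arrays.asList(new Priority(7), new Priority(3), new Priority(9)));
        System.out.println(total.size);    // 6
        System.out.println(best.priority); // 3
    }
}
```

The same fold computes a subtree size with one monoid and the best priority with the other, which is exactly what an annotated tree node caches.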

Page 68: Persistent Data Structures by @aradzie

But where do we get monoids from?

● Monoids have nice property of composability

● We can get more monoids by combining existing ones

● But where do we get initial monoids to begin with?

● We need a way to measure values!

● Those measures must be monoids, obviously

    interface Measured<M extends Monoid<M>> {
        M measure();
    }

Page 69: Persistent Data Structures by @aradzie

Let's make a sketch of annotated tree

    /**
     * <V> is the type of values
     * <M> is the type of monoidal measures of values
     */
    class Tree<M extends Monoid, V extends Measured<M>>
            implements Measured<M> {

        abstract class Leaf<M, V> extends Tree<M, V> {
            final V value;
            override abstract M measure();
        }

        class Node<M, V> extends Tree<M, V> {
            final Tree<M, V> left, right;
            final M m;  // cached measure of the whole subtree
            Node(Tree<M, V> l, Tree<M, V> r) {
                left = l; right = r;
                m = l.measure().combine(r.measure());
            }
            override final M measure() { return m; }
        }
    }

Pseudocode!

Page 70: Persistent Data Structures by @aradzie

Let's make a sketch of annotated tree

    ...
    class Leaf<V> extends Tree<Size, V> {
        final V value;

        // every leaf counts as one element
        override final Size measure() { return new Size(1); }
    }

    ...
    class Leaf<V> extends Tree<Priority, V> {
        final V value;

        // a leaf's measure is the priority of its value
        override final Priority measure() { return new Priority(value.priority()); }
    }

Pseudocode!
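For readers who want something that compiles, the pseudocode can be turned into plain Java roughly as follows. This is a sketch under assumptions: nodes are binary (as in the pseudocode) rather than 2-3, Elem is a hypothetical value type that measures as size 1, and SizeIndex.get is an added helper that performs the index search described on the size-annotated tree slide.

```java
interface Monoid<M extends Monoid<M>> { M unit(); M combine(M that); }
interface Measured<M extends Monoid<M>> { M measure(); }

final class Size implements Monoid<Size> {
    final int size;
    Size(int size) { this.size = size; }
    public Size unit() { return new Size(0); }
    public Size combine(Size that) { return new Size(size + that.size); }
}

// Hypothetical value type: every element measures as size 1.
final class Elem implements Measured<Size> {
    final String value;
    Elem(String value) { this.value = value; }
    public Size measure() { return new Size(1); }
}

abstract class Tree<M extends Monoid<M>, V extends Measured<M>> implements Measured<M> {}

final class Leaf<M extends Monoid<M>, V extends Measured<M>> extends Tree<M, V> {
    final V value;
    Leaf(V value) { this.value = value; }
    public M measure() { return value.measure(); }
}

final class Node<M extends Monoid<M>, V extends Measured<M>> extends Tree<M, V> {
    final Tree<M, V> left, right;
    final M m; // annotation cached at construction time; never recomputed

    Node(Tree<M, V> left, Tree<M, V> right) {
        this.left = left;
        this.right = right;
        this.m = left.measure().combine(right.measure());
    }
    public M measure() { return m; }
}

final class SizeIndex {
    // Finds the value at the given index by following the size annotations.
    static <V extends Measured<Size>> V get(Tree<Size, V> t, int index) {
        if (index < 0 || index >= t.measure().size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        while (t instanceof Node) {
            Node<Size, V> n = (Node<Size, V>) t;
            int leftSize = n.left.measure().size;
            if (index < leftSize) {
                t = n.left;          // target is in the left subtree
            } else {
                index -= leftSize;   // skip everything in the left subtree
                t = n.right;
            }
        }
        return ((Leaf<Size, V>) t).value;
    }
}
```

Swapping Size for a priority monoid changes what the annotations mean without touching Tree, Leaf or Node, which is the point of abstracting over the monoid.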

Page 71: Persistent Data Structures by @aradzie

But that is not finger tree yet!

Page 72: Persistent Data Structures by @aradzie

Finger Tree

... is just an annotated tree of annotated 2-3 trees!

Page 73: Persistent Data Structures by @aradzie

Finger Tree

Digits, 2-3 trees, fingers and nested levels

Page 74: Persistent Data Structures by @aradzie

Finger Tree

A little bit of Haskell would not hurt:

    data Node v a = Node2 v a a | Node3 v a a a

    data Digit v a = One v a | Two v a a | Three v a a a | Four v a a a a

    data FingerTree v a = Empty
                        | Single a
                        | Deep v (Digit v a) (FingerTree v (Node v a)) (Digit v a)

Page 75: Persistent Data Structures by @aradzie

Finger Tree

    class FingerTree<M extends Monoid<M>, T extends Measured<M>>
            implements Measured<M> {
    }

    class Empty<M extends Monoid<M>, T extends Measured<M>>
            extends FingerTree<M, T> {
    }

    class Single<M extends Monoid<M>, T extends Measured<M>>
            extends FingerTree<M, T> {
        final T v;  // the only element
        final M m;  // its measure
    }

    class Deep<M extends Monoid<M>, T extends Measured<M>>
            extends FingerTree<M, T> {
        final Digit<M, T> prefix;                // left finger
        final FingerTree<M, Node<M, T>> middle;  // nested tree of 2-3 nodes
        final Digit<M, T> suffix;                // right finger
        final M m;                               // measure of the whole tree
    }

Source Code 1/3

Page 76: Persistent Data Structures by @aradzie

Finger Tree

    class Digit<M extends Monoid<M>, T extends Measured<M>>
            implements Measured<M> {
        final M m;  // combined measure of the elements
    }

    class One<M extends Monoid<M>, T extends Measured<M>>
            extends Digit<M, T> {
        final T a;
    }

    class Two<M extends Monoid<M>, T extends Measured<M>>
            extends Digit<M, T> {
        final T a, b;
    }

    class Three<M extends Monoid<M>, T extends Measured<M>>
            extends Digit<M, T> {
        final T a, b, c;
    }

    class Four<M extends Monoid<M>, T extends Measured<M>>
            extends Digit<M, T> {
        final T a, b, c, d;
    }

Source Code 2/3

Page 77: Persistent Data Structures by @aradzie

Finger Tree

    class Node<M extends Monoid<M>, T extends Measured<M>>
            implements Measured<M> {
        final M m;  // combined measure of the children
    }

    class Node2<M extends Monoid<M>, T extends Measured<M>>
            extends Node<M, T> {
        final T a, b;
    }

    class Node3<M extends Monoid<M>, T extends Measured<M>>
            extends Node<M, T> {
        final T a, b, c;
    }

Source Code 3/3

Page 78: Persistent Data Structures by @aradzie

Finger Tree Interface

Basic operations:

● cons, snoc — prepend/append element
● concat — join two trees
● split — find prefix, element and suffix using predicate

Beyond the scope of this presentation, sorry

Page 79: Persistent Data Structures by @aradzie

Finger Tree Performance

Amortized bounds:

    Operation     Finger Tree           List         2-3 Tree
    cons, snoc    O(1)                  O(1)/O(n)    O(log n)
    head, last    O(1)                  O(1)/O(n)    O(log n)
    concat        O(log min(ℓ1, ℓ2))    O(n)         O(log n)
    split         O(log min(n, ℓ-n))    O(n)         O(log n)
    index         O(log min(n, ℓ-n))    O(n)         O(log n)

Page 80: Persistent Data Structures by @aradzie

Thanks!

Questions?