Succinct: Fast Interactive Queries Anurag Khandelwal

SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Jan 19, 2017

Chester Chen
Transcript
Page 1: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Succinct: Fast Interactive Queries

Anurag Khandelwal

Pages 2-10

Interactive Queries at Scale

‣ Search: Tweets by @AMPLab about #Succinct

‣ Regular Expressions: Links to Berkeley or Stanford domains, .*(berkeley|stanford)\.edu

‣ Range Queries: All my Facebook posts between 2013 and 2016

‣ Graph Queries: Friends of my friends who like trekking

‣ Random Access

‣ Aggregate Queries

‣ Updates

Compute Platforms

Query Engines

Data Stores

Pages 11-18

Interactive Queries at Scale

Today’s focus is on two main issues:

‣ Performance degradation when data size > memory

‣ Handling skewed query workloads

[Figure: throughput (Ops) vs. input size (1GB to 128GB), annotated with the maximum sustainable throughput]

Pages 19-23

Our Solution

‣ Compressed representation → More queries in faster storage

‣ Rich functionality directly on compressed representation

  ‣ Search, RegEx, Range queries

‣ Flexible support for different data models

‣ Handles skewed & time-varying workloads

[Figure: BlowFish [NSDI’16], Succinct [NSDI’15], and SuccinctEncryption, supporting Graph Store, KV Store, Columnar Store, Row Store, and Unstructured Data]

Pages 24-35

Existing Techniques

Example: SEARCH( )

Data Scans (Ex: Apache Spark): Low storage, High latency

Indexes (Ex: SOLR): High storage, Low latency

Index: offsets of each symbol in the input

0, 10, 14, 16, 19, 26, 29

1, 4, 5, 8, 20, 22, 24

2, 15, 17, 27

3, 6, 7, 9, 12, 13, 18, 23 ..

11, 21
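To make the storage/latency tradeoff concrete, here is a minimal Scala sketch (not code from the talk). `scanSearch` re-reads the whole input on every query, while `buildIndex` precomputes a word-level inverted index, roughly the kind of structure a search engine keeps, at the cost of storing an offset for every token. The object and method names are hypothetical.

```scala
// Hypothetical sketch contrasting a data scan with an inverted index.
object ScanVsIndex {
  // Data scan: no extra storage, but every query reads the whole input.
  def scanSearch(text: String, pattern: String): List[Int] = {
    require(pattern.nonEmpty, "pattern must be non-empty")
    Iterator
      .iterate(text.indexOf(pattern))(i => text.indexOf(pattern, i + 1))
      .takeWhile(_ >= 0)
      .toList
  }

  // Inverted index: one posting list (token -> start offsets) built up front.
  // A query becomes a single lookup, but every token's offset is stored again.
  def buildIndex(text: String): Map[String, Vector[Int]] =
    "\\S+".r
      .findAllMatchIn(text)
      .toVector
      .groupBy(_.matched)
      .map { case (token, ms) => token -> ms.map(_.start) }

  def indexSearch(index: Map[String, Vector[Int]], token: String): Vector[Int] =
    index.getOrElse(token, Vector.empty)

  // Number of offsets the index stores in addition to the raw text.
  def indexEntries(index: Map[String, Vector[Int]]): Int =
    index.valuesIterator.map(_.size).sum
}
```

Note that the scan matches any substring, whereas this word-level index only answers whole-token queries; the deck's claim for Succinct is index-like latency without the index's extra storage.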

Pages 36-48

Succinct

Queries executed directly on the compressed representation: Low Storage, Low Latency

What makes Succinct unique:

‣ No additional indexes: query responses are embedded in the compressed representation

‣ No data scans: functionality of indexes

‣ No decompression: queries run directly on the compressed representation (except for data access queries)

Succinct:

‣ Scale: in-memory data sizes >= memory capacity

‣ Complex queries: search, range, random access, RegEx

‣ Interactivity: avoids data scans and decompression

Pages 49-57

Succinct Data Representation

Builds on a large body of theory work

Suffix Arrays

‣ Strong functionality (search)

‣ No structure
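A suffix array lists the starting offsets of all suffixes in sorted order, so every occurrence of a pattern sits in one contiguous range that two binary searches can find. The sketch below is a minimal, uncompressed illustration with a naive O(n² log n) construction; the object and method names are hypothetical, and Succinct itself stores a compressed form rather than this plain array.

```scala
// Hypothetical sketch: search with a plain (uncompressed) suffix array.
object SuffixArraySearch {
  // Naive construction: sort suffix start offsets by the suffix text.
  def build(text: String): Array[Int] =
    text.indices.sortBy(i => text.substring(i)).toArray

  // Returns the offsets of all occurrences of `pattern`, in ascending order.
  def search(text: String, sa: Array[Int], pattern: String): Seq[Int] = {
    // First |pattern| characters of the i-th smallest suffix.
    def prefix(i: Int): String =
      text.substring(sa(i), math.min(text.length, sa(i) + pattern.length))

    // First suffix-array index whose prefix is >= pattern (> pattern if `strict`).
    def boundary(strict: Boolean): Int = {
      var lo = 0
      var hi = sa.length
      while (lo < hi) {
        val mid = (lo + hi) >>> 1
        val c = prefix(mid).compareTo(pattern)
        if (c < 0 || (strict && c == 0)) lo = mid + 1 else hi = mid
      }
      lo
    }

    val start = boundary(strict = false)
    val end = boundary(strict = true)
    (start until end).map(i => sa(i)).sorted
  }
}
```

A count query needs only the range size (`end - start`), without touching the offsets themselves.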

Compression?

‣ Sample the suffix array

‣ Store a set of pointers to compute unsampled values on the fly

The pointers possess structure that enables compression!
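One standard way to realize "sample the suffix array, compute the rest on the fly" is a next-pointer array, NPA[i] = ISA[SA[i] + 1], so that SA[i] = SA[NPA[i]] - 1 (modulo the text length). The sketch below assumes value sampling (keep SA[i] only when it is a multiple of the sampling rate), which bounds each lookup to fewer than `rate` hops. It illustrates the general technique under those assumptions rather than Succinct's code: the pointers are left uncompressed, and all names are hypothetical.

```scala
// Hypothetical sketch: sampled suffix array + next-pointer array (NPA).
object SampledSuffixArray {
  def suffixArray(text: String): Array[Int] =
    text.indices.sortBy(i => text.substring(i)).toArray

  // NPA(i) = ISA(SA(i) + 1), wrapping to the start of the text after the last suffix.
  def nextPointers(sa: Array[Int]): Array[Int] = {
    val n = sa.length
    val isa = new Array[Int](n)
    for (i <- sa.indices) isa(sa(i)) = i
    sa.map(v => isa((v + 1) % n))
  }

  // Value sampling: keep SA(i) only when SA(i) is a multiple of `rate`.
  def sample(sa: Array[Int], rate: Int): Map[Int, Int] =
    sa.indices.collect { case i if sa(i) % rate == 0 => i -> sa(i) }.toMap

  // Each NPA hop moves to the suffix starting one position later, so after
  // `hops` hops we hold SA(i) + hops (mod n); subtracting recovers SA(i).
  def lookup(i: Int, samples: Map[Int, Int], npa: Array[Int]): Int = {
    val n = npa.length
    var j = i
    var hops = 0
    while (!samples.contains(j)) {
      j = npa(j)
      hops += 1
    }
    Math.floorMod(samples(j) - hops, n)
  }
}
```

For `"banana$"` with rate 3, only 3 of the 7 suffix-array values are stored, and every other value is recovered in at most 2 hops.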

Pages 58-60

Succinct Data Model

‣ Unstructured data

‣ Key-value stores (Voldemort, Dynamo)

‣ Document stores (Elasticsearch, MongoDB)

‣ Tables (Cassandra, BigTable)

‣ And many more ...

Unified Interface, with all the powerful queries on values, documents, columns

Pages 61-67

Data Model & Functionality

For unstructured data:

[Figure: Original Input and its Succinct representation]

‣ Search: returns the offsets of an arbitrary string in the uncompressed file, e.g. SEARCH( ) = {0, 10, 14, 16, 19, 26, 29}

‣ Extract: returns the data at an arbitrary offset in the uncompressed file, e.g. Extract(0, 5) = { , , , , }

‣ Count: returns the number of occurrences of an arbitrary string in the uncompressed file, e.g. COUNT( ) = 7

‣ Append: appends an arbitrary string to the uncompressed file, e.g. Append( , , , , )

‣ Range Queries, Regular Expressions
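The semantics of these four operations can be pinned down with a plain, uncompressed reference model. The hypothetical `FlatFile` class below answers the same questions by brute force over a Java `StringBuilder`. It says nothing about how Succinct computes them on the compressed representation, only what the answers should be; for example, `count` must equal the number of offsets `search` returns, as with the 7 offsets above.

```scala
// Hypothetical reference model of the unstructured-data interface.
// Everything is computed on the uncompressed text, by brute force.
final class FlatFile(initial: String) {
  private val data = new java.lang.StringBuilder(initial)

  // Search: offsets of every (possibly overlapping) occurrence of `s`.
  def search(s: String): List[Int] = {
    require(s.nonEmpty, "search string must be non-empty")
    val text = data.toString
    Iterator
      .iterate(text.indexOf(s))(i => text.indexOf(s, i + 1))
      .takeWhile(_ >= 0)
      .toList
  }

  // Count: number of occurrences; always equals search(s).size.
  def count(s: String): Int = search(s).size

  // Extract: up to `len` characters starting at `offset`.
  def extract(offset: Int, len: Int): String =
    data.substring(offset, math.min(data.length(), offset + len))

  // Append: new data goes at the end, so existing offsets stay valid.
  def append(s: String): Unit = {
    data.append(s)
  }
}
```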

Pages 68-73

Unifying the Data Models

SEARCH(Column1, )

SEARCH( )
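One way a column-scoped SEARCH can reduce to a plain SEARCH is to flatten each row into the unstructured file and follow every attribute value with a delimiter unique to that attribute, so the column constraint becomes part of the search string. The sketch below is a hypothetical illustration of that idea: the control-character delimiters, the leading sentinel, and all names are assumptions, and it assumes the delimiters never occur in the data.

```scala
// Hypothetical sketch: column-scoped search on a flattened table.
object FlatTable {
  // One delimiter per column; assumed never to occur inside values.
  private def delim(col: Int): Char = (0x01 + col).toChar

  final case class Flat(text: String, rowStarts: Vector[Int], numCols: Int)

  // Row r becomes v0 d0 v1 d1 ... v(k-1) d(k-1); rows are concatenated.
  def flatten(rows: Seq[Seq[String]]): Flat = {
    require(rows.nonEmpty, "need at least one row")
    val numCols = rows.head.size
    val sb = new java.lang.StringBuilder
    sb.append(delim(numCols - 1)) // so column 0 is also preceded by a delimiter
    val starts = rows.map { row =>
      require(row.size == numCols, "all rows need the same number of columns")
      val start = sb.length()
      row.zipWithIndex.foreach { case (v, c) => sb.append(v).append(delim(c)) }
      start
    }.toVector
    Flat(sb.toString, starts, numCols)
  }

  // SEARCH(column, value) becomes a plain substring search for
  // delim(previous column) + value + delim(column); returns matching row ids.
  def searchColumn(f: Flat, col: Int, value: String): List[Int] = {
    val prev = delim(if (col == 0) f.numCols - 1 else col - 1)
    val needle = s"$prev$value${delim(col)}"
    Iterator
      .iterate(f.text.indexOf(needle))(i => f.text.indexOf(needle, i + 1))
      .takeWhile(_ >= 0)
      .map(o => f.rowStarts.lastIndexWhere(_ <= o + 1)) // value starts at o + 1
      .toList
  }
}
```

Because "berkeley" in column 1 is wrapped by different delimiters than "berkeley" in column 0, the two never match each other's search string.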

Pages 74-81

Succinct Architecture

Multi-store Architecture:

‣ SuccinctStore

‣ SuffixStore

‣ LogStore

Data appends
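The slide shows appends flowing into a multi-store design with a SuccinctStore, a SuffixStore, and a LogStore. As a rough illustration only, the hypothetical `MultiStore` below collapses this into two tiers: appends land in a small mutable log, a full log is frozen into an immutable segment (the point where a real system would build its compressed representation), and every query fans out to all segments plus the log. The tier count, the compaction trigger, and all names are assumptions.

```scala
// Hypothetical two-tier sketch of a multi-store layout (not Succinct's API).
final class MultiStore(logCapacity: Int) {
  // Immutable segments, each tagged with the global offset of its first character.
  private var segments = Vector.empty[(Int, String)]
  // Mutable, append-only log that absorbs new data.
  private val log = new java.lang.StringBuilder
  private var logBase = 0

  def append(s: String): Unit = {
    log.append(s)
    if (log.length() >= logCapacity) compact()
  }

  // Freeze the log into an immutable segment; a real system would
  // build its compressed, query-optimized structure at this point.
  def compact(): Unit =
    if (log.length() > 0) {
      segments = segments :+ (logBase -> log.toString)
      logBase += log.length()
      log.setLength(0)
    }

  // Queries fan out to every segment and the log, returning global offsets.
  // Matches that straddle two segments are not found in this toy version.
  def search(p: String): Vector[Int] = {
    require(p.nonEmpty, "pattern must be non-empty")
    (segments :+ (logBase -> log.toString)).flatMap { case (base, text) =>
      Iterator
        .iterate(text.indexOf(p))(i => text.indexOf(p, i + 1))
        .takeWhile(_ >= 0)
        .map(_ + base)
    }
  }
}
```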

Page 82

Succinct on Apache Spark

Pages 83-86

Queries on Compressed RDDs

‣ New functionalities: document store, key-value store (search on documents, values)

‣ Faster operations on RDDs: random access, filters avoid scans

‣ More in-memory: compressed RDDs, no decompression overheads

Pages 87-92

Unstructured data using SuccinctRDD

import edu.berkeley.cs.succinct._                // Import classes

val rdd = ctx.textFile(…).map(_.getBytes)       // Load data
val succinctRDD = rdd.succinct                   // & compress using Succinct

val offsets = succinctRDD.search("Berkeley")     // Find all occurrences of "Berkeley"
val count = succinctRDD.count("Berkeley")        // Count #occurrences of "Berkeley"
val bytes = succinctRDD.extract(50, 100)         // Extract 100 bytes from offset 50

Pages 93-97

Key-Value Store using SuccinctKVRDD

import edu.berkeley.cs.succinct.kv._             // Import classes

// rdd here is an RDD[String] (one value per line), so each value needs .getBytes
val kvRDD = rdd.zipWithIndex.map(t => (t._2, t._1.getBytes))  // key = line index
val succinctKVRDD = kvRDD.succinctKV             // Load data & compress using Succinct

val keys = succinctKVRDD.search("Berkeley")      // Find all keys for values that contain "Berkeley"
val value = succinctKVRDD.get(0)                 // Get value for key 0

Pages 98-103

Evaluation

Dataset: Wikipedia dataset, ~40GB data

Cluster: Amazon EC2, 5 machines, 30GB RAM each

Workload: Search queries, 1-10,000 occurrences

Systems: Spark, Elasticsearch

Caveats: Absolute numbers are dataset dependent

Pages 104-105

Evaluation: Search

Takeaway: Succinct on Apache Spark is 2.5x faster than Elasticsearch while being 2.5x more space efficient. (Data fits in memory for all systems.)

Pages 106-112

Support for Regular Expressions

Applications: Data Cleaning, Information Extraction, Bioinformatics, Document Stores

Operators: Union, Concat, Wildcard, Repeat

Example: .*(berkeley|stanford)\.edu

SuccinctRDD:

// Find all matches for the RegEx ".*(berkeley|stanford)\.edu"
// (triple quotes keep the \. escape intact; "\." is not a valid Scala string escape)
val matches = succinctRDD.regexSearch(""".*(berkeley|stanford)\.edu""")

SuccinctKVRDD:

// Find all keys for values that contain a match for the RegEx ".*(berkeley|stanford)\.edu"
val matchKeys = succinctKVRDD.regexSearch(""".*(berkeley|stanford)\.edu""")
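The operator list suggests how a regular expression can be answered with the same primitive as plain search: find the literal pieces, then combine their match sets. The sketch below is a hypothetical illustration (not Succinct's implementation) of two of the operators, Union and Concat, over sets of (start, end) match intervals. Wildcard (the leading `.*`) and Repeat are omitted; for the literal part of the example, `(berkeley|stanford)\.edu`, it should agree with an ordinary regex engine.

```scala
// Hypothetical sketch: Union and Concat evaluated over match intervals.
object RegexOverSearch {
  // A match is a half-open interval [start, end) in the text.
  type Matches = Set[(Int, Int)]

  // Base case: every occurrence of a literal string (what SEARCH provides).
  def literal(text: String, s: String): Matches = {
    require(s.nonEmpty, "literal must be non-empty")
    Iterator
      .iterate(text.indexOf(s))(i => text.indexOf(s, i + 1))
      .takeWhile(_ >= 0)
      .map(i => (i, i + s.length))
      .toSet
  }

  // Union (a|b): a match of either operand.
  def union(a: Matches, b: Matches): Matches = a ++ b

  // Concat (ab): a match of `a` immediately followed by a match of `b`.
  def concat(a: Matches, b: Matches): Matches = {
    val bByStart: Map[Int, Set[(Int, Int)]] = b.groupBy(_._1)
    for {
      (start, mid) <- a
      (_, end) <- bByStart.getOrElse(mid, Set.empty[(Int, Int)])
    } yield (start, end)
  }
}
```

For example, `concat(union(literal(t, "berkeley"), literal(t, "stanford")), literal(t, ".edu"))` finds `berkeley.edu` and `stanford.edu` in `t` but not `berkeley.org`.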

Pages 113-115

Evaluation: RegEx

Takeaway: Succinct significantly speeds up RegEx queries even when all the data fits in memory for all systems.

Pages 116-123

Succinct on Apache Spark

Already in use at Elsevier Labs

‣ Use case: Annotation Search

Documents, with Annotations:

1, sentence, (0, 15)
2, word, (0, 4)
3, word, (5, 10)
4, word, (11, 15)

“Find sentences that talk about open problems in research”

Query: (remains|is|still)(unknown|unclear|uncertain) within <sentence>, combining a RegEx with an annotation constraint

https://spark-packages.org/package/amplab/succinct
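The annotation query combines a RegEx match with a containment check against annotations of a given type. The hypothetical sketch below assumes annotations carry (id, type, start, end) with `end` exclusive, which is consistent with the example offsets (the sentence (0, 15) spans words (0, 4), (5, 10), and (11, 15)). It runs the regex with `scala.util.matching.Regex` rather than on Succinct, and keeps only matches that fall entirely inside an annotation of the requested type.

```scala
// Hypothetical sketch: RegEx matches restricted to annotations of one type.
object AnnotationSearch {
  // Assumed layout: id, type, and a half-open [start, end) character span.
  final case class Annotation(id: Int, kind: String, start: Int, end: Int)

  def within(doc: String, regex: String, kind: String, anns: Seq[Annotation]): List[(Int, Int)] = {
    val spans = anns.filter(_.kind == kind)
    regex.r
      .findAllMatchIn(doc)
      .map(m => (m.start, m.end))
      .filter { case (s, e) => spans.exists(a => a.start <= s && e <= a.end) }
      .toList
  }
}
```

A usage pattern adapted from the slide, with a space between the two groups, is `within(doc, "(remains|is|still) (unknown|unclear|uncertain)", "sentence", anns)`; a match inside a non-sentence annotation (say, a title) is dropped.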

Pages 124-127

Problem: Skewed Query Workloads

Load distribution across partitions is often non-uniform

‣ Succinct: Larger fraction of queries in main memory

‣ Challenge: skewed load across shards?

‣ Challenge: time-varying loads?

‣ E.g.: Memcached + MySQL deployment @ Facebook

Pages 128-134

Problem: Skewed Query Workloads

Load distribution across partitions is often non-uniform

Traditional approach: Selective Replication

[Figure: #Replicas per partition]

‣ #Replicas ∝ Load

‣ Coarse grained: 1-2× throughput → 2× storage
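Under the traditional approach, a partition's replica count grows with its load, but only in whole copies of the partition. The hypothetical helper below makes "#Replicas ∝ Load" concrete under an assumed per-replica throughput capacity, and shows why the approach is coarse-grained: a load just above one replica's capacity doubles that partition's storage.

```scala
// Hypothetical sketch of selective replication: #replicas grows with load.
object SelectiveReplication {
  // Smallest whole number of replicas whose combined capacity covers `load`
  // (at least one copy is always kept).
  def replicas(load: Double, capacityPerReplica: Double): Int = {
    require(capacityPerReplica > 0, "capacity must be positive")
    math.max(1, math.ceil(load / capacityPerReplica).toInt)
  }

  // Total storage: every replica is a full copy of its partition.
  def storage(loads: Seq[Double], capacityPerReplica: Double, partitionSize: Double): Double =
    loads.map(l => replicas(l, capacityPerReplica) * partitionSize).sum
}
```

With a capacity of 100 queries per replica, a load of 100 needs one replica while a load of 101 needs two, so a 1% load increase costs a full extra copy of the partition.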

Pages 135-143

Succinct + BlowFish

[Figure: throughput vs. storage, with Scans, Indexes, and Succinct marked]

Succinct: storage-performance tradeoff curve for each partition

Pages 144-154

BlowFish: Layered Sampled Array

Recap: Succinct stores a sampled suffix array; unsampled values are computed on the fly.

Original sampled array (Rate = 2): 9 15 3 0 12 8 14 5

Layers:

RATE = 8: 9 12

RATE = 4: 3 14

RATE = 2: 15 0 8 5

Different combinations of layers → different points on the tradeoff curve

Layer additions and deletions

Page 155: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

BlowFish: Layered Sampled Array

Recap: Succinct stores a sampled suffix array; unsampled values are computed on the fly.

Original sampled array (rate = 2): 9 15 3 0 12 8 14 5

RATE = 8 layer: 9 12
RATE = 4 layer: 3 14
RATE = 2 layer: 15 0 8 5

Different combinations of layers → different points on the tradeoff curve

Layer additions and deletions → move along the tradeoff curve
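A minimal Scala sketch of the layered sampled array idea, assuming that layers are keyed by sampling rate and that a layer with rate r holds the samples at suffix-array indices that are multiples of r but not of any coarser rate. This assumption reproduces how the slide splits 9 15 3 0 12 8 14 5 into the RATE = 8, 4 and 2 layers. The names `LsaDemo`, `fromFinest`, `lookup` and `deleteLayer` are hypothetical, not BlowFish's API.

```scala
// Hypothetical sketch of a layered sampled array (LSA); not BlowFish's actual code.
object LsaDemo {
  // Sampling rates from coarsest to finest, as on the slide.
  val Rates: List[Int] = List(8, 4, 2)

  // The layer owning suffix-array index i is the coarsest rate dividing i.
  def layerOf(i: Int): Option[Int] = Rates.find(r => i % r == 0)

  // Slot of index i inside layer r: the coarsest layer holds every multiple
  // of r; finer layers hold only odd multiples of r (i = 2r*k + r).
  def slotOf(i: Int, r: Int): Int = if (r == Rates.head) i / r else i / (2 * r)

  final class Lsa(private var layers: Map[Int, Array[Long]]) {
    // Some(value) if the owning layer is present, None if the value would
    // have to be computed on the fly instead.
    def lookup(i: Int): Option[Long] =
      layerOf(i).flatMap(r => layers.get(r).map(values => values(slotOf(i, r))))

    def deleteLayer(rate: Int): Unit = layers -= rate
    def addLayer(rate: Int, values: Array[Long]): Unit = layers += (rate -> values)

    // Storage cost: number of sampled values currently held.
    def storedValues: Int = layers.values.map(_.length).sum
  }

  // Split samples taken at rate 2 (values(k) = sample at index 2k) into layers.
  def fromFinest(values: Array[Long]): Lsa = {
    val n = values.length * 2
    val layers = Rates.map { r =>
      r -> (0 until n by 2).filter(i => layerOf(i).contains(r)).map(i => values(i / 2)).toArray
    }.toMap
    new Lsa(layers)
  }
}
```

Deleting the RATE = 2 layer halves the stored values and turns lookups at indices 2, 6, 10 and 14 into misses that must be computed on the fly, which is one step along the storage-performance tradeoff curve.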

Page 166: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

BlowFish: Technical Details

‣ How should partitions share cache on a server?

‣ How should partitions share cache across servers?

‣ How should requests be scheduled across replicas?

Unified Solution: Back-pressure style scheduling (cache proportional to load, without explicit coordination)

Figure labels: Low Threshold, High Threshold

Page 168: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

BlowFish: Technical Details

‣ How should partitions share cache on a server?

‣ How should partitions share cache across servers?

‣ How should requests be scheduled across replicas?

Unified Solution: Back-pressure style scheduling

1.5x higher throughput than Selective Replication, within 11% of maximum possible throughput

Figure labels: Low Threshold, High Threshold
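The slides name back-pressure style scheduling with a low and a high threshold but do not spell out the algorithm. The Scala sketch below is one hypothetical reading, assuming that a partition's request-queue length is the load signal, that a queue above the high threshold triggers a layer addition and one below the low threshold a layer deletion, and that each request goes to the replica with the shortest queue. `BackPressureDemo`, `decide` and `pickReplica` are illustrative names only.

```scala
// Hypothetical sketch of queue-driven (back-pressure style) decisions.
object BackPressureDemo {
  sealed trait Action
  case object AddLayer extends Action    // spend more memory to serve load faster
  case object DeleteLayer extends Action // free memory for busier partitions
  case object Keep extends Action

  // A long queue means the partition cannot keep up with its load.
  def decide(queueLength: Int, low: Int, high: Int): Action = {
    require(low <= high, "low threshold must not exceed high threshold")
    if (queueLength > high) AddLayer
    else if (queueLength < low) DeleteLayer
    else Keep
  }

  // Send a request to the replica whose queue is currently shortest.
  def pickReplica(queueLengths: Seq[Int]): Int =
    queueLengths.zipWithIndex.minBy(_._1)._2
}
```

Keeping a gap between the two thresholds presumably prevents a partition from oscillating between adding and deleting the same layer when its queue hovers around a single cutoff.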

Page 172: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Succinct + BlowFish

‣ Standalone system (prototyped & tested)

‣ Spark Package: Succinct on Apache Spark

‣ As libraries (C++, Java, Scala) for ease of integration

Page 173: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Thanks!

succinct.cs.berkeley.edu

Page 174: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Backup Slides

Page 177: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Array of Suffixes (AoS)

Input: banana$

Suffixes: banana$, anana$, nana$, ana$, na$, a$, $

Array of Suffixes (AoS), in lexicographical order: $, a$, ana$, anana$, banana$, na$, nana$

Page 181: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

AoS to Input (AoS2Input) Array

AoS index   0   1    2      3        4         5     6
AoS         $   a$   ana$   anana$   banana$   na$   nana$
AoS2Input   6   5    3      1        0         4     2

Input (positions 0–6): banana$

AoS2Input stores the locations of the suffixes in the input (the suffix array).
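The AoS and AoS2Input arrays for banana$ can be reproduced with a naive Scala sketch that sorts suffix start positions by the suffixes they begin. It assumes `$` sorts before all letters, which holds for Java/Scala string comparison of these characters, and it runs in O(n² log n), so it only illustrates the definition. `SuffixArrayDemo` is a hypothetical name.

```scala
object SuffixArrayDemo {
  // AoS2Input (the suffix array): input positions ordered by their suffixes.
  // String comparison is by UTF-16 code unit, so '$' (36) < 'a' (97).
  def aos2Input(input: String): Array[Int] =
    input.indices.sortBy(i => input.substring(i)).toArray

  // AoS: the suffixes themselves, in lexicographical order.
  def aos(input: String): Array[String] =
    aos2Input(input).map(i => input.substring(i))
}
```

Materializing every suffix is exactly the storage cost that the following slides set out to remove.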

Page 188: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

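Example: search(“an”)

AoS         $   a$   ana$   anana$   banana$   na$   nana$
AoS2Input   6   5    3      1        0         4     2

Input (positions 0–6): banana$

The suffixes starting with “an” (ana$ and anana$) are adjacent in the AoS, so a binary search finds them; their AoS2Input entries give the matching input positions.

search(“an”) = {1, 3}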
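The search can be sketched as two binary searches over the AoS: one finds the first suffix whose prefix is ≥ the pattern, the other the first whose prefix is > it, and every AoS2Input entry in between is a match. This Scala sketch rebuilds the suffix array naively and compares materialized substrings, whereas Succinct does not store the AoS explicitly; `SearchDemo` and its helpers are hypothetical names.

```scala
object SearchDemo {
  def suffixArray(s: String): Array[Int] =
    s.indices.sortBy(i => s.substring(i)).toArray

  // Compare the first p.length characters of the suffix at `start` with p.
  // A suffix shorter than p compares as smaller if it is a prefix of p.
  private def comparePrefix(s: String, start: Int, p: String): Int =
    s.substring(start, math.min(s.length, start + p.length)).compareTo(p)

  // Smallest k in [0, n) with pred(k) true, assuming pred is monotone.
  private def firstIndex(n: Int)(pred: Int => Boolean): Int = {
    var lo = 0
    var hi = n
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (pred(mid)) hi = mid else lo = mid + 1
    }
    lo
  }

  // All input positions where p occurs, in increasing order.
  def search(s: String, p: String): List[Int] = {
    val sa = suffixArray(s)
    val from = firstIndex(sa.length)(k => comparePrefix(s, sa(k), p) >= 0)
    val until = firstIndex(sa.length)(k => comparePrefix(s, sa(k), p) > 0)
    sa.slice(from, until).toList.sorted
  }
}
```

For "an" the two searches return AoS indices 2 and 4, so the matches are AoS2Input[2] = 3 and AoS2Input[3] = 1.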

Page 198: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

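Next Pointer Array: Reducing AoS Size

AoS index   0   1    2      3        4         5     6
AoS         $   a$   ana$   anana$   banana$   na$   nana$
NPA         4   0    5      6        3         1     2

NPA[i] is the AoS index of the suffix obtained by dropping the first character of suffix i (e.g., NPA[2] = 5 because ana$ becomes na$; the suffix $ wraps around to banana$).

Store only the first character of each suffix ($ a a a b n n); the entire suffix can be computed “on the fly” using the Next Pointer Array (NPA).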

Page 205: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

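Next Pointer Array: Reducing AoS Size

AoS index         0   1   2   3   4   5   6
First character   $   a   a   a   b   n   n
NPA               4   0   5   6   3   1   2

Reconstructing the suffix at AoS index 2: read a, follow NPA[2] = 5 and read n, follow NPA[5] = 1 and read a, follow NPA[1] = 0 and read $, giving a → an → ana → ana$.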
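NPA can be built from the suffix array and its inverse, and any suffix can then be spelled out from the first characters alone by repeatedly following NPA. A minimal Scala sketch, assuming the wrap-around convention in which the suffix `$` points to the whole input; `NpaDemo`, `npa` and `extract` are hypothetical names.

```scala
object NpaDemo {
  def suffixArray(s: String): Array[Int] =
    s.indices.sortBy(i => s.substring(i)).toArray

  // NPA(i) = AoS index of the suffix starting one position after suffix i,
  // wrapping from the last input position back to position 0.
  def npa(sa: Array[Int]): Array[Int] = {
    val n = sa.length
    val inverse = new Array[Int](n) // input position -> AoS index
    for (i <- 0 until n) inverse(sa(i)) = i
    Array.tabulate(n)(i => inverse((sa(i) + 1) % n))
  }

  // Rebuild `len` characters of suffix i using only first characters and NPA.
  def extract(firstChars: Array[Char], next: Array[Int], i: Int, len: Int): String = {
    val sb = new StringBuilder
    var k = i
    for (_ <- 0 until len) {
      sb += firstChars(k)
      k = next(k)
    }
    sb.toString
  }
}
```

Only `firstChars` and `next` are consulted during extraction; the suffixes themselves are never stored.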

Page 208: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Next Pointer Array: Reducing AoS Size

AoS index   0   1   2   3   4   5   6
NPA         4   0   5   6   3   1   2

First characters stored once per run: $ (AoS index 0), a (1–3), b (4), n (5–6)
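Because the AoS is sorted, equal first characters form contiguous runs, so mapping an AoS index to its first character needs only the start index of each run. A small Scala sketch with hypothetical names (`FirstColumnDemo`, `runs`, `charAt`); it scans the run starts linearly for clarity, where a binary search would be the usual choice for a large alphabet.

```scala
object FirstColumnDemo {
  // (character, AoS index where its run starts), in AoS order.
  def runs(firstChars: String): Vector[(Char, Int)] =
    firstChars.indices.collect {
      case i if i == 0 || firstChars(i) != firstChars(i - 1) => (firstChars(i), i)
    }.toVector

  // First character of AoS index i: the last run starting at or before i.
  def charAt(runStarts: Vector[(Char, Int)], i: Int): Char =
    runStarts(runStarts.lastIndexWhere(_._2 <= i))._1
}
```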

Page 215: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Reducing the size of AoS2Input

AoS index           0   1   2   3   4   5   6
AoS2Input           6   5   3   1   0   4   2
NPA                 4   0   5   6   3   1   2
Sampled AoS2Input   6   -   3   -   0   -   2

Store only a few sampled values (unsampled values are computed “on the fly” using NPA).
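An unsampled AoS2Input value can be recovered by following NPA until a sampled index is reached. Each NPA step advances one position in the input, so after k steps the sampled value is k larger than the one sought (modulo the input length). The Scala sketch below assumes sampling by AoS index at rate 2, which matches the values 6, 3, 0 and 2 kept on the slide; `SampledSaDemo` is a hypothetical name.

```scala
object SampledSaDemo {
  // Keep AoS2Input values only at AoS indices that are multiples of `rate`.
  def sample(sa: Array[Int], rate: Int): Map[Int, Int] =
    (sa.indices by rate).map(i => i -> sa(i)).toMap

  // Walk NPA from i until a sampled index j is hit after k steps;
  // then AoS2Input(i) = AoS2Input(j) - k (mod n).
  def lookup(samples: Map[Int, Int], next: Array[Int], i: Int): Int = {
    val n = next.length
    var j = i
    var k = 0
    while (!samples.contains(j)) {
      j = next(j)
      k += 1
    }
    ((samples(j) - k) % n + n) % n
  }
}
```

The walk always terminates because NPA visits every AoS index in one cycle, and a coarser rate trades fewer stored values for longer walks.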

Page 223: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Compressing NPA

NPA values for suffixes starting with the same character form an increasing sequence of integers:

$: 4
a: 0 5 6
b: 3
n: 1 2

These sequences can be compressed (e.g., using run-length encoding).

Succinct uses a 2-dimensional representation of NPA:
- better compressibility
- avoids binary search on AoS (lower latency)
- enables a wider range of queries (e.g., RegEx)

See upcoming NSDI paper!
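Why the per-character runs compress well can be illustrated with plain delta (gap) encoding within each run: because the values increase, the gaps are small non-negative numbers. This Scala sketch is a swapped-in illustration, neither the run-length encoding the slide mentions nor Succinct's 2-dimensional representation; `NpaCompressionDemo` and its functions are hypothetical names.

```scala
object NpaCompressionDemo {
  // Split NPA into one run per first character (AoS order is sorted,
  // so distinct characters appear in run order).
  def buckets(firstChars: String, npa: Array[Int]): List[(Char, List[Int])] =
    firstChars.distinct.toList.map { c =>
      c -> firstChars.indices.filter(i => firstChars(i) == c).map(i => npa(i)).toList
    }

  // Store the first value, then the gap to each following value.
  def deltaEncode(xs: List[Int]): List[Int] = xs match {
    case Nil    => Nil
    case h :: t => h :: xs.zip(t).map { case (a, b) => b - a }
  }

  // Prefix sums undo the delta encoding.
  def deltaDecode(ds: List[Int]): List[Int] = ds.scanLeft(0)(_ + _).tail
}
```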

Page 226: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Evaluation: Storage Footprint

Takeaway: Succinct can push >11x more data in memory

server itself but currently does this in the background.

Limitations. The current Succinct prototype suffers from three main limitations. It requires manually handling: (1) coordinator failure; (2) fault tolerance and data durability; and (3) adding new servers to an existing cluster. However, none of these limitations are fundamental. Succinct could use traditional solutions like ZooKeeper [36] for maintaining multiple coordinator replicas with a consistent view. Fault tolerance and data durability can be achieved using standard replication-based or erasure-code-based [35] techniques. Finally, since each SuccinctStore contains multiple partitions, adding a new server simply requires moving some partitions from existing servers to the new server and updating pointers at servers. We plan to incorporate these and evaluate associated overheads in the near future.

7 Evaluation

This section explores whether Succinct's design and algorithms meet Succinct’s goal of supporting fast queries by pushing more data into memory and by operating directly on compressed data. We perform an end-to-end evaluation of Succinct’s memory footprint (§7.1), throughput (§7.2) and latency (§7.3).

Compared systems. We evaluate Succinct using the NoSQL interface extension as described in §5, since it requires strictly more operations than queries on flat files. We compare Succinct against several open-source and industrial systems: (1) MongoDB [6] version 2.6.4 and Cassandra [37] version 2.0.10 with indexes; (2) HyperDex [26] version 1.2 with metadata; and (3) DB-X, an industrial columnar store with data scans. For HyperDex, we encountered a bug, also encountered by other users [4], that crashes the system when the inter-machine latencies are variable. For DB-X, distributed experiments require access to the industrial version. To that end, we only perform microbenchmarks for HyperDex and DB-X on a single machine for Workloads A and C.

We configured each of the systems for a no-failure scenario (no fault tolerance). For MongoDB and Cassandra, we used the most memory-efficient indexes. These indexes do not support substring searches and wildcard searches. HyperDex and DB-X do not support wildcard searches. Thus, the evaluated systems provide slightly weaker functionality than Succinct. Finally, for Succinct, we disabled dictionary encoding to evaluate the performance of Succinct techniques in isolation.

Datasets and Cluster. We use two multi-attribute record datasets, one smallKV and one largeKV from Conviva customers, as shown in Table 2. All our experiments were performed on Amazon EC2 m1.xlarge machines with 15GB RAM and 4 cores, except for DB-X where we used pre-installed r2.2xlarge instances.

Table 2: Datasets used in our evaluation.

Dataset   Key size (Bytes)   Value size (Bytes)   #Attributes   #Records (Millions)
SmallKV   8                  ≈ 140                15            123–1393
LargeKV   8                  ≈ 1300               98            19–200

Table 3: Workloads used in our evaluation. All workloads use a query popularity that follows a Zipf distribution with skewness 0.99, similar to YCSB [22].

Workload   Operations               Remarks
A          100% Reads               YCSB workload C
B          95% Reads, 5% Inserts    YCSB workload D
C          100% Search              -
D          95% Search, 5% Inserts   YCSB workload E

Figure 12 plot: Data Size that Fits in Memory (GB) for SmallKV and LargeKV; legend: MongoDB, Cassandra, HyperDex, Succinct, RAM

Figure 12: Succinct pushes more than 10× as much data into memory as the next best system, while providing similar or stronger functionality.


10 node 150GB cluster

Page 229: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Evaluation: Throughput (95% GET + 5% PUT)

Takeaway: Succinct achieves performance comparable to existing open source systems for queries on primary attributes

10 node 150GB cluster, uniform random access pattern

Page 232: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Evaluation: Throughput (95% SEARCH + 5% PUT)

Takeaway: by pushing more data into faster storage, Succinct provides performance similar to existing systems for 10-11x larger data sizes.

10 node 150GB cluster, search queries with 1-10K occurrences

Page 235: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Evaluation: RegEx Latency

Takeaway: Succinct significantly speeds up RegEx queries even when all the data fits in memory for all systems.

40GB Wikipedia dataset, 5 commonly used RegEx queries

Single EC2 node, 32 vCPUs, 244GB RAM

Page 239: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

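Support for JSON

Search for JSON documents containing “AMPLab”:

val ids1 = succinctJsonRDD.search("AMPLab")

Filter JSON documents where the “city” attribute has value “Berkeley”:

val ids2 = succinctJsonRDD.filter("city", "Berkeley")

Get JSON document with id 0:

val jsonDoc = succinctJsonRDD.get(0)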

Page 241: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Layer Additions & Deletions

RATE = 8 layer: 9 12
RATE = 4 layer: 3 14
RATE = 2 layer: 15 0 8 5

Page 242: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Layer Additions & Deletions

RATE = 8 layer: 9 12
RATE = 4 layer: 3 14

Layer Deletions: simple (the RATE = 2 layer is just dropped)

Page 245: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Layer Additions & Deletions

Layer Addition:

RATE = 8 layer: 9 12
RATE = 4 layer: 3 14
RATE = 2 layer (being added): 15 _ 8 _

Unsampled values already computed during query execution are used to fill the new layer.

Layers in LSA populated opportunistically!!
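Opportunistic population of a new layer can be sketched as a layer whose slots start empty and are filled the first time a query computes the corresponding value anyway. In this hypothetical Scala sketch, `PendingLayer` and `getOrCompute` are illustrative names, and the `compute` argument stands in for the expensive on-the-fly path (for example, walking NPA).

```scala
object OpportunisticLayerDemo {
  // A newly added layer: every slot starts empty.
  final class PendingLayer(size: Int) {
    private val slots = Array.fill[Option[Long]](size)(None)

    def get(slot: Int): Option[Long] = slots(slot)
    def filled: Int = slots.count(_.isDefined)

    // Return the stored value if present; otherwise run the expensive
    // computation once and keep its result for later queries.
    def getOrCompute(slot: Int)(compute: => Long): Long = slots(slot) match {
      case Some(v) => v
      case None =>
        val v = compute
        slots(slot) = Some(v)
        v
    }
  }
}
```

Mirroring the slide, two queries that touch the RATE = 2 slots holding 15 and 8 leave the layer half populated at no extra cost, and a repeated query is then served without recomputation.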

Page 248: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Spatial Skew: load distribution across partitions is heavily skewed

Selective Replication: #Replicas ∝ Load

Plot labels: Object, Load, 1 (Compressed). Annotation: Wasted Cache!

Page 251: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Spatial Skew: load distribution across partitions is heavily skewed

Selective Replication: #Replicas ∝ Load

BlowFish: fractionally change storage just enough to meet load

1.5x higher throughput than Selective Replication, within 10% of optimal

Plot labels: Object, Load; storage from 1 (Compressed) to 10 (Uncompressed)

Page 252: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Changes in Spatial Skew

Page 253: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Changes in Spatial Skew

Study on Facebook Warehouse Cluster

[HotStorage’13]

Page 254: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Changes in Spatial Skew

Transient failures → 90% of failuresStudy on Facebook Warehouse Cluster

[HotStorage’13]

Page 255: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Changes in Spatial Skew

Transient failures → 90% of failures

Replica creation delayed by 15 mins

Study on Facebook Warehouse Cluster

[HotStorage’13]

Page 256: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Changes in Spatial Skew

Transient failures → 90% of failures

Replica creation delayed by 15 mins

Study on Facebook Warehouse Cluster

[HotStorage’13]

Leads to variation in load over time

Page 257: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Changes in Spatial Skew

Study on Facebook Warehouse Cluster [HotStorage’13]

Transient failures → 90% of failures

Replica creation delayed by 15 mins

Leads to variation in load over time

[Figure: Data Partitions and Request Queues for Replica#1, Replica#2 and Replica#3]
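Why transient failures cause load variation can be shown with a toy Python model. It assumes (hypothetically) that requests for a partition are spread evenly over its live replicas, and that a new replica is created only if an outage lasts at least the 15-minute delay mentioned above. A short transient failure therefore never triggers re-replication; the surviving replicas simply absorb the failed replica's share, which with three replicas means 1.5x their normal load. The load values, outage times and function names are made up for illustration.

```python
# Toy model of how a transient replica failure shifts load (illustrative only).
REREPLICATION_DELAY_MIN = 15  # "replica creation delayed by 15 mins"


def replica_loads(total_load, replicas, down):
    """Split total_load evenly across replicas that are up; down replicas get 0."""
    live = [r for r in replicas if r not in down]
    if not live:
        raise ValueError("all replicas are down")
    share = total_load / len(live)
    return {r: (0.0 if r in down else share) for r in replicas}


def triggers_rereplication(start_min, end_min, delay=REREPLICATION_DELAY_MIN):
    """A new replica is only created if the outage outlasts the delay."""
    return (end_min - start_min) >= delay


def timeline(total_load, replicas, outages, minutes):
    """outages: list of (replica, start_min, end_min) transient failures."""
    rows = []
    for t in minutes:
        down = {r for (r, start, end) in outages if start <= t < end}
        rows.append((t, replica_loads(total_load, replicas, down)))
    return rows


replicas = ["Replica#1", "Replica#2", "Replica#3"]
outages = [("Replica#1", 30, 40)]  # a 10-minute transient failure
print(triggers_rereplication(30, 40))  # False: shorter than the 15-minute delay
for t, loads in timeline(1800.0, replicas, outages, range(0, 60, 10)):
    print(t, loads)
```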

Page 272: SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab

Changes in Spatial Skew

[Figure: Load and Throughput (Operations / second, 0 to 3000) and Request Queue Size (0K to 50K) over Time (mins, 0 to 120) for Replica#1, Replica#2 and Replica#3]

Adapts to 3x higher load in < 5 mins
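The load, throughput and request-queue panels are linked: whenever the load on a replica exceeds the throughput it can deliver, the excess accumulates in its request queue. A minimal discrete-time fluid-queue sketch in Python shows the shape of that effect. The step length, the load and capacity values, and the step at which capacity is raised are all hypothetical; the sketch does not model how BlowFish actually adapts, only why the queue grows while throughput lags a 3x load increase and drains once capacity catches up.

```python
def simulate_queue(arrivals, capacities):
    """Discrete-time fluid queue: each step, new requests join the backlog and
    at most `capacity` of them are served. Returns (served, queue) per step."""
    queue = 0
    served_hist, queue_hist = [], []
    for arrived, capacity in zip(arrivals, capacities):
        backlog = queue + arrived
        served = min(backlog, capacity)
        queue = backlog - served
        served_hist.append(served)
        queue_hist.append(queue)
    return served_hist, queue_hist


# Hypothetical scenario: load triples at step 3; capacity is raised at step 5.
arrivals = [600, 600, 600, 1800, 1800, 1800, 1800, 1800, 1800]
capacities = [600, 600, 600, 600, 600, 2400, 2400, 2400, 2400]
served, queue = simulate_queue(arrivals, capacities)
print(served)  # [600, 600, 600, 600, 600, 2400, 2400, 2400, 2400]
print(queue)   # [0, 0, 0, 1200, 2400, 1800, 1200, 600, 0]
```

The queue peaks while capacity lags the load and falls back to zero once the extra capacity has drained the backlog, so the faster a system adapts, the smaller the backlog it accumulates.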